This is the text, more or less, of the talk I gave at NDF2013. It draws heavily on earlier posts, especially Project calabash and the pompously titled Open letter to cultural collecting organisations.
I was tempted not to give this talk after seeing Chris McDowall’s talk on cross-walking collections, but in a way it supports that idea and also plays a bit with what Ed Summers had to say about small data and collections needing to be ‘regions of stability’ in his keynote talk, The web as a preservation medium. And more on that from Michael Lascarides in his talk and this pertinent tweet:
But I gave the talk anyway and this is what I said…
I have to preface this talk by saying that I’ll be talking about one very simple idea – linking a couple of sites to each other – and some more complicated ideas. They’re not really fully formed and aren’t really mine.
But they’re ideas that are floating around – what Virginia Gow refers to as shared ideas, or just ideas whose time has come and that we’re all kind of thinking about. And they’re not ideas I have solutions for, but they seem like problems that are staring us in the face and we should try to solve them.
The Web Team at the Ministry for Culture and Heritage (MCH) looks after all the Ministry’s websites, including our two largest sites, Te Ara – the encyclopedia of New Zealand, and NZ History – New Zealand History Online.
Unlike a lot of the organisations involved with NDF, MCH isn’t a collecting institution. We write text, a lot of it, both in print and online. Te Ara has maybe 3.5 million words; NZ History at least another million.
Where we intersect with collecting institutions is in the thousands of images that illustrate our stories.
We’re probably one of the country’s biggest collection users – Te Ara has 25,000 to 30,000 resources, most from collecting institutions; NZ History has maybe 4,000 or 5,000.
When it comes to collections items, like this gourd, or calabash, or hue from Te Papa, our sites explain their significance and place them in the context of stories that demonstrate their relevance to other items.
I’ll come back to that idea of relevance later, but for now I want to talk about a very simple linking project that we’re working on with Te Papa.
We source images and other media from institutions like Te Papa. Most of these items are available on Te Papa’s website. What we’re doing isn’t rocket science: we’ll be providing links between the two sites based on the item. So if you see it on Te Papa’s site you can go and read more on Te Ara, and if you’re on Te Ara you can find the original version on Te Papa’s site.
Before I get to why we’re doing this, I should say why we’re using Te Papa as a test case.
A big part of it is down to people, and I have to name check Adrian Kingston for coming up with this idea, and seeing this as something that was possible, relatively easy, and worth the effort.
Partly it’s also the synergy between what the two organisations and our websites do: we both take a national view, and in the case of Te Ara, we also take an encyclopedic view and try to cover all aspects of New Zealand culture and history in much the same way that Te Papa’s collections span history, culture, natural sciences and so on.
It’s also partly that the number of Te Papa images used on Te Ara is relatively small, about 500, so we’ve got a manageable set to work with manually. And Te Papa uses persistent identifiers for items on Collections Online. We can’t do this without persistent IDs that will be there forever.
The process is currently manual. It’s basically a spreadsheet of Te Papa images and where they appear on Te Ara. Te Papa staff are going through the spreadsheet and identifying the corresponding IDs and URLs.
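As a rough sketch of what that spreadsheet makes possible, the rows could be turned into a lookup table for each direction of the link. The column names (`te_ara_url`, `te_papa_id`), the sample rows and the URL pattern below are all made up for illustration; the real spreadsheet and the real Collections Online addresses will look different.

```python
import csv
import io
from collections import defaultdict

# Hypothetical spreadsheet export: column names, IDs and URLs are illustrative only.
SAMPLE_CSV = """te_ara_url,te_papa_id
https://te-ara.example/warfare/page-3,12345
https://te-ara.example/rongoa/page-1,12345
https://te-ara.example/collecting/page-2,67890
https://te-ara.example/unmatched/page-1,
"""

# Placeholder pattern; a real one would be built on the institution's persistent IDs.
TE_PAPA_URL = "https://te-papa.example/object/{item_id}"


def build_links(csv_text):
    """Return (page -> item URLs, item ID -> pages) from the spreadsheet rows."""
    page_to_items = defaultdict(list)
    item_to_pages = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        page = (row.get("te_ara_url") or "").strip()
        item_id = (row.get("te_papa_id") or "").strip()
        if not page or not item_id:
            continue  # skip rows that staff have not matched yet
        page_to_items[page].append(TE_PAPA_URL.format(item_id=item_id))
        item_to_pages[item_id].append(page)
    return dict(page_to_items), dict(item_to_pages)


page_to_items, item_to_pages = build_links(SAMPLE_CSV)
```

One table gives each Te Ara page its outbound links; the other gives each item the list of pages that use it, which is what the holding institution would need to link back.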
With 500 items that’s not a major hassle, but at some point I hope we’re going to have to think about how to scale this beyond 500 images and automate it. It’s a pilot so maybe we’ll never have to ask that question, but I’d love to be applying this to some of the larger collections we use like the several thousand images from the Alexander Turnbull Library.
So we’re basically making item-to-item links. Well so what? You can look at this image on Te Ara or you can look at it on Te Papa. Surely if you’ve seen one image you’ve seen them all. But not quite.
What you’re getting from Te Ara is information on the item’s significance. It’s in the story on traditional Māori warfare; it illustrates the page about preparations and entering into battle; and from the caption you learn about its use in this context. Te Ara provides the context that signals why this item is important.
From Te Papa on the other hand you can find out that it’s also called a taha huahua, or calabash, it’s made of harakeke, muka, gourd, dye and was purchased in 1905. You can see the collection it belongs to and what it was influenced by. And all those underlined words link to more items that share those classifications.
So what we’re doing is helping people find the information that’s relevant to them. If they’re interested in the story behind something, get it in one place; if they’re interested in the detail about it, get it from another. What you get in each place is what’s most relevant to where you are but you can easily find other information if that’s what you’re interested in.
I just want to mention a couple of other examples that probably bring this into sharper relief.
This is Abel Tasman. Eric Ketelaar from the University of Amsterdam spoke earlier this year at the GLAM symposium held at Victoria University. I’m hazy on the details but he was talking about the many copies of Tasman’s journals that exist around the world.
Some exist as simple scans; others as transcripts of the Dutch; others as translations. Duplication isn’t the issue; as archivists tell us, lots of copies keep stuff safe. The issue is that none of these copies are linked to the others.
The copy you happen to stumble across directly affects your experience and ability to use the materials.
If you can’t read long-hand the scans are no good to you; if you can’t read Dutch, the transcripts won’t help; if you can’t read English, you might be better off with the Dutch. It wouldn’t be hard to link them together – they’re on the web so it’s just hyperlinks that are needed – so if you find one copy you can get to the copy that works for you.
Another example is images like this one from what’s called the H series – a collection of World War One photographs commissioned by the New Zealand government and taken by Henry Armytage Sanders. The Alexander Turnbull Library holds the original glass plate negatives and copies are held by other organisations around the country.
Auckland Museum for example has copies in photo albums, and that points to the story of how the images were originally used. They were put into albums with captions, and the albums were distributed around the country so that soldiers and their families could order prints. This is why they’re all numbered, so people knew which photo to order.
Again linking the originals to the albums gives people the chance to experience and use the images in different ways. If you want a hi-res copy, the best place to go is the Turnbull; if you want to experience what it was like for a nation to see what the war had been like in page after page of photos, you can view the albums. Making that simple link makes those things possible.
Back to Te Ara and the relevance of items to other items. I’ve written a bit over the last few years about the way that something like Te Ara – but any publication really that uses collection items – is kind of like the meat in the sandwich between collections. Where a publication uses items from different collections it’s effectively creating an inferred relationship between the items and the collections.
I keep coming back to the calabash. Its many names are part of the complexity, as are its many uses. As we’ve seen, it illustrates the story about traditional Māori warfare, but it also illustrates the story about rongoa, or the medicinal use of plants.
Within that story it’s suggesting relationships with items as diverse as other plants, an engraving of a Māori warrior, a Lindauer portrait of the tohunga, Tūhoto Ariki, and a cartoon of Māui.
Through that one story we have inferred relationships forming between Te Papa, Turnbull, Auckland Art Gallery, the Department of Conservation and Godwit Publishing.
Some of these links even start to get a little playful, in the way that Cath Styles talked about in her game Sembl last year, where you let your user make the connections between items.
Just to get a little meta about it, Te Ara’s story on collecting brings together a really wonderful assortment of subjects from Turnbull himself and his book plates to firearms and Barbie dolls. It starts to coalesce around a subject that an institution might not think of, but once it’s in a user’s hands, those connections start to form.
Where else can we go with this? At a simple level item-to-item linking opens up a few options. We can potentially share our content more easily with other organisations if they want it – that saves them the effort of writing new content about their items.
We can also use it as a hook to update copyright or other information when the institution changes their record. Or we could look at pulling their descriptions of items in as alt text for screen readers to use.
We’re also interested in sharing our content with third parties to build new publications, websites or apps. Currently we can only share the text as that’s our copyright, but if we have a direct link to an item, then it’s easier for a third party to find sources of images and be able to negotiate re-use rights directly with the holding institution. Services like Digital NZ could also use the information and map our stories to institution records and expose those relationships through their API.
But more than all that, we could, as Adrian Kingston suggests, start to use the items to catalogue the stories they illustrate.
Te Ara subjects are at a very high level – the story title and page title is in effect the main subject. That’s fair enough, it’s an encyclopedia after all, so the title is a headword, and a headword by default is a subject. But what if we used the items to infer more specific subjects that the story might relate to?
From that we might see that the story about Māori warfare and this image of a taiaha…
…is also a story about woodcarving and the use of materials like feathers, dog hair and flax in Māori society through Te Papa’s catalogue record.
And through that connection it’s related to thousands of items in their collection. Not all the Te Papa subjects will be directly relevant to Te Ara’s story, but by being able to choose which ones are, we can make direct links from a story to a much larger pool of related items on Te Papa’s site.
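To make that a little more concrete, here’s a minimal sketch of the idea: gather the subjects from the catalogue records of the items that illustrate a story, then let an editor drop the ones that aren’t relevant. The item IDs and data structures are invented for the example and don’t reflect how either site actually stores its records; the subject terms are just the ones mentioned above.

```python
# Hypothetical data: the items that illustrate a story, and the terms
# each item's catalogue record carries. IDs are invented for illustration.
story_items = {"Traditional Māori warfare": ["taiaha-1", "calabash-1"]}
item_subjects = {
    "taiaha-1": {"woodcarving", "feathers", "dog hair", "flax"},
    "calabash-1": {"harakeke", "muka", "gourd", "dye"},
}


def candidate_subjects(story, story_items, item_subjects):
    """Collect the subjects of every item that illustrates the story."""
    subjects = set()
    for item_id in story_items.get(story, []):
        subjects |= item_subjects.get(item_id, set())
    return subjects


def approved_subjects(candidates, rejected):
    """Keep only the subjects an editor has not ruled out as irrelevant."""
    return sorted(candidates - set(rejected))


candidates = candidate_subjects("Traditional Māori warfare", story_items, item_subjects)
subjects = approved_subjects(candidates, rejected={"dye"})
```

The editorial step matters: the items suggest subjects, but a person still chooses which suggestions actually describe the story.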
And where this starts to head is towards an idea that Virginia Gow threw at me recently which picks up on some work in the Netherlands. The National History Museum there joined up with some other websites and created what’s basically a trusted network of sites. When one site links to another site in the network, the link gets reciprocated automatically. It’s the sort of thing you could start letting your users do for you.
It shouldn’t be impossible. It’s the kind of thing Facebook does when it lets you tag someone in a post or photo, or that WordPress does when you allow pingbacks to a blog post. What they’re doing is building a system that’s aware of the network it’s part of and letting users take advantage of the network.
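A toy version of that network awareness might look something like the sketch below. It isn’t how the Dutch network, Facebook or WordPress pingbacks actually work; the host names and the `LinkRegistry` class are invented, and the only point is to show a link being reciprocated when both ends belong to a trusted set of sites.

```python
from urllib.parse import urlparse

# Hypothetical set of hosts that have agreed to reciprocate links.
TRUSTED_HOSTS = {"te-ara.example", "te-papa.example"}


class LinkRegistry:
    """Records links between pages and adds the reverse link inside the network."""

    def __init__(self, trusted_hosts):
        self.trusted_hosts = set(trusted_hosts)
        self.links = {}  # source URL -> set of target URLs

    def _is_trusted(self, url):
        return urlparse(url).hostname in self.trusted_hosts

    def add_link(self, source, target):
        """Store source -> target; return True if it was reciprocated."""
        self.links.setdefault(source, set()).add(target)
        # Reciprocate only when both ends belong to the trusted network.
        if self._is_trusted(source) and self._is_trusted(target):
            self.links.setdefault(target, set()).add(source)
            return True
        return False


registry = LinkRegistry(TRUSTED_HOSTS)
reciprocated = registry.add_link(
    "https://te-ara.example/warfare/page-3",
    "https://te-papa.example/object/12345",
)
```

Links to sites outside the network are still recorded, they just don’t get a link back, which is the "trusted" part of the trusted network.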
Simply linking items to items is going to take some work, and it’s obvious we’ll need to work out ways of doing it automatically when we look at a set larger than the Te Ara Te Papa set. Could we let our users do it? Could we just start sharing our data with each other in such a way that machines can start making the matches for us?
It plays into the work that Chris McDowall demoed yesterday – making matches across collections based on people. That sort of thing can be done automatically, by the right person with the right tools. All we need to do is let people like Chris use our websites and collections and see what they can do.
People are easy – ish; so are places, and even some events. They’re hooks that can connect our websites, and connect our content. Subjects and classifications are potentially no different. A little more ambiguous at times but not impossible.
One of the things you notice when you look at collections is they’re never as comprehensive as you’d hope. They’re riddled with historical accidents. No institution has everything related to a subject or person or place. Or think of the Treaty of Waitangi – held by Archives New Zealand, soon to be housed in the building of the National Library, but arguably as relevant to Te Papa’s collection.
Look at any significant artist and see how their works are scattered across museums and galleries around the country (if not the world).
That’s history – different things get picked up by different institutions at different times, and we can’t change it. But for the user it’s infuriating. Why can’t I see all of someone’s work in one place? Or everything on a particular event all together?
The beauty of linking all our content together is that it creates a layer of meaning and use that sits above individual collections. It lets us all play to our strengths. Organisations can maintain their own web presence that talks to their mission, their collection, and their community, but it lets a much wider community tell their own stories that cross-walk all the separate institutions and collections. Through that we and our users could create truly national stories using all the different parts held in different institutions around the country.
That’s taken us away from the simple idea of linking items to items. That network is hard but we need to do it. At the same time, let’s not forget about doing the simple stuff. If there’s stuff you can connect your collection items to, just do that. It’s a start, and if it gives more use and meaning to your users then you’re doing something right.
But keep the hard stuff in mind – agitate for it, remind people why it’s worth doing, do it if you can and share the results with as much of the network as possible. That way we build a richer digital ecosystem for developers and our users.