February 27, 2017

Linked data 101

Ever wonder how movie listings show up at the top of a Google search, even when all you type in is the name of a recent release?

The answer is linked data, which is a specific way data is coded for web searches. Linked data is now available to libraries and it’s very cool. There’s also a fair bit of jargon involved in converting library records to linked data, so we’ll be defining some of the terms to help you turn the corner from “What is this?” to “Oh, this is awesome!”

MARC record:

We’ve all come across the term MARC record, right? Librarians learn about them in cataloging class in graduate school, but unless you’re a cataloger (and I’m not), what you learned about MARC records, other than the name, might be hazy.

A MARC record is a MAchine-Readable Cataloging record: a bibliographic record designed to be read and interpreted by your ILS provider’s software. Each piece of bibliographic information (title, author, series, call number, etc.) carries a signpost, called a field tag, that tells the ILS how to interpret the information. The MARC standard is the basis for cataloging records and is how an ILS search box knows to look in the title field when you search for “Harry Potter and the Sorcerer’s Stone” and choose “title” from a dropdown menu.
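
If you like to see ideas in code, here is a minimal sketch of how tags might steer a search. It is a simplified stand-in, not how any particular ILS works: the record is a plain Python dictionary, real MARC records also carry indicators and subfield codes that are left out here, and the `SEARCH_INDEX_TAGS` mapping and `matches` function are hypothetical names made up for illustration. The tags themselves (100 for the author, 245 for the title, 490 for the series) come from the MARC standard.

```python
# Simplified, hypothetical stand-in for a MARC record. Real records also
# include indicators and subfield codes, which are omitted here.
record = {
    "100": "Rowling, J. K.",                        # main entry, personal name (author)
    "245": "Harry Potter and the sorcerer's stone",  # title statement
    "490": "Harry Potter ; 1",                       # series statement (illustrative value)
}

# Hypothetical mapping from a search dropdown choice to the MARC tags it covers.
SEARCH_INDEX_TAGS = {"title": ["245"], "author": ["100"], "series": ["490"]}


def matches(record, index, query):
    """Return True if the query appears in any field covered by the chosen index."""
    needle = query.casefold()
    return any(
        needle in record.get(tag, "").casefold()
        for tag in SEARCH_INDEX_TAGS[index]
    )


print(matches(record, "title", "Harry Potter"))   # True
print(matches(record, "author", "Harry Potter"))  # False
```

Choosing “title” from the dropdown only changes which tags get searched, which is why the same words find the book in one case and not the other.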

MARC records are descriptive, but they aren’t “linked” in the way linked data is. We’ll look more closely at why that matters a little later in this post.

Structured data:

MARC records are a type of structured data: data that is organized in a predictable way. You've seen it: if you have a spreadsheet of addresses with a column for each field (title in one, first name in another, etc.) so that you can print address labels, you have structured data. If your library has form-based RA, the info you get back is probably structured data: patron email address in one column, five favorite books in another, format preference in another.

The magic of structured data is that it's easier for a machine to consume and understand, but it's also easier for *you* to understand.

Structured data = the inclusive name for ALL of that type of data -- from MARC records to your Excel address book to Census records.
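
To make the spreadsheet idea concrete, here is a small sketch using Python's built-in `csv` module. The form responses, email addresses, and column names are all made up for illustration; the point is only that predictable columns make questions easy to answer.

```python
import csv
import io

# Made-up form responses for illustration; the column names are assumptions.
raw = """email,favorite_book,format_preference
reader1@example.org,W is for Wasted,print
reader2@example.org,Sparkling Cyanide,audiobook
"""

# Every row is read into the same named columns.
rows = list(csv.DictReader(io.StringIO(raw)))

# Because the structure is predictable, one line can answer a question about every row.
audiobook_fans = [row["email"] for row in rows if row["format_preference"] == "audiobook"]
print(audiobook_fans)  # ['reader2@example.org']
```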

Linked data:

MARC records are a form of structured data. Linked data is structured data published to the web according to specific standards so that the relationships between pieces of data can be expressed as links. This means that when you type in a question, the structure allows the computer to infer an answer, based on defined relationships between data points. The term linked data was coined in 2006 by Tim Berners-Lee, director of the World Wide Web Consortium, in a design note about the Semantic Web project. He talked more about the power of linked data in his 2009 TED talk, “The Next Web.”
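
Here is a toy sketch of what “inferring an answer from defined relationships” can look like. It is not how Google or any search engine actually works. The cinemas, cities, and relationship names are invented, and the facts are plain Python tuples rather than real linked data.

```python
# Each fact is a (subject, relationship, object) statement.
# The cinemas and cities are invented for illustration.
facts = [
    ("Rogue One", "is_a", "movie"),
    ("Example Cinema", "is_showing", "Rogue One"),
    ("Example Cinema", "located_in", "Sample City"),
    ("Other Cinema", "is_showing", "Rogue One"),
    ("Other Cinema", "located_in", "Faraway Town"),
]


def objects(subject, relationship):
    """Everything the subject points to through the given relationship."""
    return {o for s, r, o in facts if s == subject and r == relationship}


def subjects(relationship, obj):
    """Everything that points to the object through the given relationship."""
    return {s for s, r, o in facts if r == relationship and o == obj}


def where_is_it_playing(title, city):
    """No single fact answers this; the answer comes from chaining two relationships."""
    return sorted(
        cinema
        for cinema in subjects("is_showing", title)
        if city in objects(cinema, "located_in")
    )


print(where_is_it_playing("Rogue One", "Sample City"))  # ['Example Cinema']
```

No one typed in “Rogue One is playing at Example Cinema in Sample City.” The answer falls out of two separate facts that happen to share a link.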

What does this mean for you, average internet user? If you Google Rogue One, the first page of search results is almost all linked data. Google offers me local showtimes, guessing that I might be interested in seeing this movie and piecing together bits of linked data from movie theater websites to answer an inferred question: “When is Rogue One playing near me?”

I might want to know about the movie, so Google also offers me a knowledge box on the right. This knowledge box pulls together pieces of information from several different websites (IMDb, Rotten Tomatoes, YouTube) to provide me with information to another question the algorithm guesses that I’m asking: “What is Rogue One about? Did it get good reviews?”

The last piece of linked data that Google offers me is snippets of news stories with “Rogue One” as a keyword. Google's algorithm is pulling bits of information from sites it has determined are trusted, that have well-organized, linked data.

Resource:


To understand what a resource is, let’s look at a picture. The image is a linked data page on Sue Grafton from Athens Regional Library System. A resource is a piece of data. This page describes one resource, Sue Grafton, and is linked to hundreds of other resources: Suspenseful, Jo Bannister, W is for Wasted, Agatha Christie’s Sparkling Cyanide, women private investigators, and more. Whether a subject heading, an appeal term, an author, a title of a screenplay, or the title of a book, they are all resources.

Want the jargon? In linked data terms, RDF stands for Resource Description Framework, a set of standards produced by the World Wide Web Consortium. A resource, then, is anything that an RDF graph describes and that can be identified by a URI (Uniform Resource Identifier). A URL, or web address, is a type of URI.
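
For the curious, here is a rough sketch of resources identified by URIs and the statements that link them. The `example.org` addresses and the relationship names (`name`, `wrote`, `writesAbout`) are hypothetical, not real Library.Link addresses or a real vocabulary. The output imitates the general shape of N-Triples, one of the plain-text formats for writing RDF, without handling every detail of that format.

```python
# Hypothetical namespaces for illustration; these are not real Library.Link addresses.
RESOURCE = "http://example.org/resource/"
VOCAB = "http://example.org/vocab/"

# Each statement links one resource to another resource or to a plain text value.
triples = [
    (RESOURCE + "sue-grafton", VOCAB + "name", "Sue Grafton"),
    (RESOURCE + "sue-grafton", VOCAB + "wrote", RESOURCE + "w-is-for-wasted"),
    (RESOURCE + "sue-grafton", VOCAB + "writesAbout", RESOURCE + "women-private-investigators"),
]


def to_ntriples(subject, predicate, obj):
    """Format one statement in N-Triples style: URIs in angle brackets, text in quotes."""
    value = f"<{obj}>" if obj.startswith("http://") else f'"{obj}"'
    return f"<{subject}> <{predicate}> {value} ."


def linked_resources(uri):
    """Return the URIs a resource links to, skipping plain text values."""
    return [o for s, p, o in triples if s == uri and o.startswith("http://")]


for triple in triples:
    print(to_ntriples(*triple))

print(linked_resources(RESOURCE + "sue-grafton"))
```

Because the book and the subject heading have URIs of their own, they can each have their own statements, and that is how one page ends up linked to hundreds of others.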

Library.Link Network

What you’ve read so far has been a lot of technical definitions with some real world examples. The Library.Link Network is where you get the payoff. The Library.Link Network is a network of thousands of libraries’ linked data. In the network, a library’s data isn’t just linked to itself; it is also linked to the data of all those other libraries. Millions of resources from one library are linked to millions of resources from thousands of other libraries.

That’s a lot of big numbers.

Numbers have power. Going from MARC records to linked data resources means more numbers and more power(ful data). Publishing your linked data as part of the Library.Link Network amplifies these numbers. Instead of just describing your library’s data and your library’s story, the Library.Link Network tells the whole library story, because it connects all the individual library stories together.

Now, think about those movie listings and searching Google for Rogue One. Then imagine if libraries across the country had transformed their MARC records into linked data and all published them to the Library.Link Network. In this vision of the future, Google has changed its algorithm to look for linked data from libraries and now, in the knowledge box on the right, next to IMDb and Rotten Tomatoes, is a link that says, “Check to see if your library has this movie.”

This future is only possible with linked data. 

Jennifer Lohmann is a NoveList Consultant.
