The Internet Archive recently released a demo version of its new Open Library project, about which we are very excited.
We’re great fans of the IA, thanks to the wonderful Bookmobile and the all-encompassing awesomeness of their main site, which holds the largest publicly available collection of text, images, audio and video of its kind, as well as the world’s largest history of the web. So when we heard they were turning their attention to paper books, we were looking forward to seeing what they came up with.
Their mission statement is worth reading in full:
What if there was a library which held every book? Not every book on sale, or every important book, or even every book in English, but simply every book—a key part of our planet’s cultural legacy.
First, the library must be on the Internet. No physical space could be as big or as universally accessible as a public web site. The site would be like Wikipedia—a public resource that anyone in any country could access and that others could rework into different formats.
Second, it must be grandly comprehensive. It would take catalog entries from every library and publisher and random Internet user who is willing to donate them. It would link to places where each book could be bought, borrowed, or downloaded. It would collect reviews and references and discussions and every other piece of data about the book it could get its hands on.
But most importantly, such a library must be fully open. Not simply “free to the people,” as the grand banner across the Carnegie Library of Pittsburgh proclaims, but a product of the people: letting them create and curate its catalog, contribute to its content, participate in its governance, and have full, free access to its data. In an era where library data and Internet databases are being run by money-seeking companies behind closed doors, it’s more important than ever to be open. [Source]
But what’s it like, beyond the rhetoric? Well, it’s a collection of listings for every edition of every book that’s ever made it into library classification (or at least, that’s what it will be), as well as scans of those editions that are already in the Archive’s copyright-free library.
If:Book has some quibbles about the presentation, but I’m far more interested in what this means at the level of data and metadata.
For starters, library data is not free. The OCLC, the world’s largest supplier of library data (and recent recipient of much Charkin-praise), is a non-profit which charges for its data feeds. The Open Library plans to build futurelib, an open, universal book catalogue that will contain all books, not just those which arrived recently enough to be covered by the increasingly outmoded ISBN system, or which belong to organisations hooked into the OCLC’s network.
Secondly, an Open Library can consolidate and clarify all these data structures, not enslaved to the horribly outdated Dewey Decimal system, the increasingly subjective and unwieldy Library of Congress Classification system, the publishers’ proprietary and unworkable BIC and ONIX systems, or even the tag-based, user-generated systems of the new wave. Instead, it can provide a translation point between them all, and serve as a rallying call to create new and better schemas.
They plan to consolidate all the information surrounding the book too. Imagine a single place to search for books that holds not only the book itself and its various classifications and summaries, but also reviews at every level (from Amazon one-stars up to scholarly monographs), references and antecedents, cover art through time, location and author data… the possibilities are almost limitless.
So too are the commercial applications: print-on-demand of scanned titles is planned, with open-sourcing of the software that drives the library as the trade-off. It will be interesting to look back in fifty or a hundred years to see how static this project (or a similar one) has become. If ebooks take over, will a project like the OL become a true archive, indexing only the past? Even if that is the outcome, it only strengthens the case for such a project. We look forward to following its progress.
I feel I should offer one correction: when you write that OpenLibrary can “consolidate and clarify all these data structures, not enslaved to the horribly outdated Dewey Decimal system, the increasingly subjective and unwieldy Library of Congress Classification system,” you partly miss the big issue, though I think you’re on target with BIC and ONIX.
LoC and DDC are subject classification systems, not systems of bibliographic description. The library acronyms that I think you want are MARC, and to a certain extent, AACR2 and perhaps FRBR. But I’d argue that these latter two, FRBR especially, will actually have to be among the core documents in the process of “consolidating and clarifying” that you mention.
Comment by Jacob Nadal — August 6, 2007 @ 4:07 pm
Jacob – Thanks for that. I’m not too sure of the distinction you’re drawing between subject classification and bibliographic systems – in the end, they’re all ways of organising books, and creating ways of addressing them – but I don’t know anything about MARC, AACR2 and FRBR, so I’ll look into them.
Comment by James Bridle — August 6, 2007 @ 10:06 pm
Aha, here’s one of the mysteries of library science, then! I’ll try to be brief, but apologies in advance if this is too far down the rabbit hole.
In a conversational sense, you’re entirely correct. When we say “book” we usually let the artifact stand in for its content and vice versa. In creating a library of any sort, though, one has to identify distinct “bibliographic entities” (this copy of that edition of this work by that author). The Anglo-American Cataloging Rules (AACR2) are one system for doing this; the Functional Requirements for Bibliographic Records (FRBR) are a conceptual model for the whole process.
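To make “this copy of that edition of this work by that author” concrete, here is a minimal sketch loosely modelled on FRBR’s four Group 1 entities (work, expression, manifestation, item). The class names follow those FRBR terms, but the fields, the describe() helper and the sample data are hypothetical simplifications, not part of any cataloguing standard’s schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Work:
    """The abstract intellectual creation: 'this work by that author'."""
    title: str
    author: str


@dataclass(frozen=True)
class Expression:
    """One realisation of a work, e.g. the original text or a particular translation."""
    work: Work
    language: str


@dataclass(frozen=True)
class Manifestation:
    """A published embodiment of an expression: roughly, 'that edition'."""
    expression: Expression
    publisher: str
    year: int


@dataclass(frozen=True)
class Item:
    """A single copy of a manifestation: 'this copy'."""
    manifestation: Manifestation
    copy_number: int


def describe(item: Item) -> str:
    """Walk up from the copy to the work to produce a readable description."""
    m = item.manifestation
    e = m.expression
    w = e.work
    return (f"copy {item.copy_number} of the {m.year} {m.publisher} edition "
            f"({e.language}) of '{w.title}' by {w.author}")


# Hypothetical data, for illustration only.
work = Work("Example Title", "A. N. Author")
edition = Manifestation(Expression(work, "English"), "Example Press", 1990)
copy1, copy2 = Item(edition, 1), Item(edition, 2)
print(describe(copy2))
```

The two copies are distinct bibliographic entities at the item level, yet they resolve to the same edition and the same work, which is exactly the distinction everyday speech about “a book” blurs.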
Those individual books might be arranged in physical space by one or more methods. Size, subject systems and date of acquisition are the most common (and I’d assume it’s the first and last that predominate in digital libraries) but they’re referenced through any number of subject classifications. Moreover, as any given book can have numerous subject headings, none of them serve to uniquely identify it.
There’s another level of distinction between a subject heading and a call number (or shelf mark). The latter often does incorporate a subject-based number, but has to add further information to actually address a particular book: e.g. 917.3 might be the DDC number for 372 books in a given library. Adding a Cutter number for the author’s name or the title of a work, and a copy number if there are multiple copies, is necessary to address a distinct book: 917.3 C34 D53 c. 2. (Even then, most library systems actually track books on a unique key that’s invisible to the user.)
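A rough sketch of how the parts of a call number like 917.3 C34 D53 c. 2 fit together, assuming a simple space-separated layout (class number, then Cutter numbers, then an optional “c. N” copy suffix). Real shelflisting rules and Cutter tables are considerably more involved, and the CallNumber class, parse_call_number() helper and the sample shelf are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CallNumber:
    class_number: str            # subject-based part, e.g. a DDC number such as "917.3"
    cutters: Tuple[str, ...]     # Cutter numbers for author name and/or title
    copy: Optional[int] = None   # copy number, only needed if there are several copies

    def __str__(self) -> str:
        parts = [self.class_number, *self.cutters]
        if self.copy is not None:
            parts.append(f"c. {self.copy}")
        return " ".join(parts)


def parse_call_number(text: str) -> CallNumber:
    """Split a call number like '917.3 C34 D53 c. 2' into its parts."""
    tokens = text.split()
    copy = None
    if len(tokens) >= 3 and tokens[-2] == "c.":
        copy = int(tokens[-1])
        tokens = tokens[:-2]
    return CallNumber(tokens[0], tuple(tokens[1:]), copy)


# A hypothetical shelf: several books share the class number 917.3,
# but only the full call number picks out one physical copy.
shelf = [
    parse_call_number("917.3 C34 D53 c. 1"),
    parse_call_number("917.3 C34 D53 c. 2"),
    parse_call_number("917.3 M45 S21"),
]
same_subject = [cn for cn in shelf if cn.class_number == "917.3"]
exact = [cn for cn in shelf if str(cn) == "917.3 C34 D53 c. 2"]
```

Filtering on the class number alone returns every book on that subject; only the full string, Cutters and copy number included, addresses a single copy. The invisible internal key Jacob mentions is not modelled here.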
That still won’t tell you what that particular book is: there’s no way to reverse-decode a call number into a bibliographic description. The same is true, of course, of ISBNs. One can use them as a key to look up individual editions, but that key still has to reference a bibliographic description.
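The point about ISBNs can be illustrated with a short sketch: an ISBN-13 can be checked for internal consistency (its last digit is a check digit computed with alternating weights of 1 and 3), but that check says nothing about which book it identifies. The description only comes from a lookup against a catalogue. The CATALOGUE dictionary, its placeholder record and the helper names are hypothetical; 978-0-306-40615-7 is used only because its check digit is valid.

```python
from typing import Dict, Optional


def normalise_isbn(isbn: str) -> str:
    """Strip hyphens and spaces, keeping only the ASCII digits."""
    return "".join(ch for ch in isbn if ch in "0123456789")


def isbn13_is_valid(isbn: str) -> bool:
    """Check the ISBN-13 check digit (weights alternate 1, 3, 1, 3, ...)."""
    digits = normalise_isbn(isbn)
    if len(digits) != 13:
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == int(digits[12])


# Hypothetical catalogue: the key only means something because it points at a record.
CATALOGUE: Dict[str, Dict[str, str]] = {
    "9780306406157": {"title": "Placeholder title", "author": "Placeholder author"},
}


def look_up(isbn: str) -> Optional[Dict[str, str]]:
    """Return the bibliographic description for an ISBN, or None if we hold none."""
    if not isbn13_is_valid(isbn):
        raise ValueError(f"not a valid ISBN-13: {isbn}")
    return CATALOGUE.get(normalise_isbn(isbn))
```

For example, look_up("9780000000002") passes the check-digit test but returns None: a perfectly well-formed key with no description behind it.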
Machine-Readable Cataloging (MARC) is a format for storing all of that data, used by libraries for the last 30 years or so.
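As a toy illustration of how one MARC record holds identifier, classification and description side by side, here is a simplified, MARC-inspired structure. The tags 020 (ISBN), 082 (Dewey Decimal Classification number), 100 (main entry, personal name), 245 (title statement) and 650 (subject added entry, topical term) are real MARC 21 tags. Everything else is an assumption for illustration: the tuple layout, the placeholder values and the subfield_values() helper. Real records also carry a leader, a directory and indicators, and are usually exchanged in the ISO 2709 format.

```python
from typing import List, Tuple

# One field = (tag, [(subfield code, value), ...]).
# Leader, directory and indicators are deliberately left out.
Field = Tuple[str, List[Tuple[str, str]]]

# Hypothetical record: the tags are MARC 21 tags, the values are placeholders.
record: List[Field] = [
    ("020", [("a", "9780306406157")]),             # ISBN
    ("082", [("a", "917.3")]),                      # Dewey Decimal Classification number
    ("100", [("a", "Author, Placeholder")]),        # Main entry: personal name
    ("245", [("a", "Placeholder title")]),          # Title statement
    ("650", [("a", "Placeholder subject")]),        # Subject added entry: topical term
    ("650", [("a", "Another placeholder subject")]),
]


def subfield_values(record: List[Field], tag: str, code: str = "a") -> List[str]:
    """Collect every value of one subfield across all fields with the given tag."""
    return [value
            for field_tag, subfields in record if field_tag == tag
            for sub_code, value in subfields if sub_code == code]


print(subfield_values(record, "245"))  # the description: what the book is
print(subfield_values(record, "650"))  # repeatable subject headings
```

The classification number sits in just one field among many, and the subject field can repeat, which is why neither a subject heading nor a class number can stand in for the description itself.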
Comment by Jacob Nadal — August 7, 2007 @ 12:32 am
A fine explanation, thank you. It will be interesting to see where the Open Library goes with this – whether they will leave shelf addressing to the individual collections and go with subject classification, or attempt to bridge the gap a little more thoroughly.
Comment by James Bridle — August 7, 2007 @ 1:31 pm
Open Lib’s been having a few interesting discussions about this on their “librarianship” listserv (http://demo.openlibrary.org/about/lib). It looks like they really have their eye on capitalizing on some of the things you can do in a digital library (when many people can share the same copy and you don’t have to put it in just one place on a shelf).
Thanks for your interest, too. I think that bibliographic description and subject classification are things that libraries do very well and are fundamentally necessary, but we absolutely make a mess of presenting them in a useful way to the people they’re supposed to serve.
Comment by Jacob Nadal — August 8, 2007 @ 2:28 pm
Yes, I think that’s exactly the issue here. Library classification grew out of the need for librarians to be able to find things and deliver them to their users, but these systems don’t make much sense when we all want (/need) to be our own librarians.
… Which is not meant in any way to imply that librarians aren’t needed, or don’t have far more to bring to this discussion than your humble &c. &c.
Comment by James Bridle — August 8, 2007 @ 3:55 pm