Inter-operative bookmarking; Gracenote for books.

April 8, 2009


Shared bookmarks are one of the primary drivers of conversation and socialisation on the web. Simple pointers to information are the basic currency of networked communication, and sharing them is one of the most desirable functions of the future book. But, in the book, they’re pretty hard to achieve.

I’ve hit this problem already on bkkeepr, and that’s just with physical books. If two people are reading the same book in two different editions (hardback or paperback, modern or ancient, even in different translations) then the same text doesn’t occur on the same page. (This is one of the main reasons bkkeepr bases itself on ISBNs rather than titles or “works”, but it’s unwieldy and has been, mostly rightly, criticised.)

The problem gets harder with ebooks. My Sony Reader lets me bookmark pages, but there’s no way to transfer or even translate these to another epub reader, let alone another format or edition. I’ve been lurking on the epub-interop group for a while, which has been considering this issue, as well as things like reliable identifiers for epub books, and just keeping your place in different editions (a subset of the bookmark problem).

So, to first principles: a bookmark is a location, right? But it’s a location in an existing text, and the problem comes down to defining a location in a text that moves about, covers different numbers of pages, appears in different formats. But here’s the rub: it’s always the text. (Well, not exactly, but we’ll come to that later.)

I do something quite similar a lot when I’ve read a newspaper or journal article offline and want to find the online version. I just pick a string of words from the text that feels like it contains a reasonably unique (don’t pick me up on that, you know what I mean) set of words or phrasing, and google it in quotes. Works a charm.

Going further, it seems likely you can bookmark anything given a string of sufficient length to be unique (I’m getting something in the back of my head about whole files, and the best model of something being itself, but we’ll ignore that).
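A minimal sketch of the "string of sufficient length" idea, assuming the text is available as plain text and that matching on lower-cased words is good enough. The helper names (`words`, `occurrences`, `shortest_unique_phrase`) are made up for illustration. Starting from the word you want to bookmark, it lengthens the phrase one word at a time until that phrase occurs exactly once in the text.

```python
import re


def words(text):
    """Lower-case the text and split it into word tokens (punctuation dropped)."""
    return re.findall(r"\w+", text.lower())


def occurrences(tokens, phrase):
    """Count how many times the word sequence `phrase` appears in `tokens`."""
    n = len(phrase)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase)


def shortest_unique_phrase(tokens, start):
    """Grow a phrase from word index `start` until it occurs exactly once.

    Returns the phrase as a list of words, or None if even the longest
    phrase starting there is repeated elsewhere in the text.
    """
    for length in range(1, len(tokens) - start + 1):
        phrase = tokens[start:start + length]
        if occurrences(tokens, phrase) == 1:
            return phrase
    return None


tokens = words("The cat sat on the mat and the cat slept")
print(shortest_unique_phrase(tokens, 0))  # ['the', 'cat', 'sat']
print(shortest_unique_phrase(tokens, 4))  # ['the', 'mat']
```

The scan is quadratic in the worst case, which is fine for a sketch but would want a proper index for whole libraries.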

This is where an idea I’ve been toying with for a while comes in: do we need a Gracenote / MusicBrainz for books? A big database containing everything – or at least some kind of hash of everything, a set of unique signatures for each book? Could you take a string of a certain length from anything, submit it to this DB, and get back a title, like holding your phone up to the music with Shazam?
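A rough sketch of what such a signature database might look like, under some loud assumptions: each book is reduced to SHA-1 hashes of overlapping four-word runs ("shingles"), and a passage identifies a book when all of its shingles are on file for that title. The class and function names, the shingle size and the choice of hash are illustrative only; this is not how Gracenote, MusicBrainz or Google Book Search actually work.

```python
import hashlib
import re
from collections import defaultdict

SHINGLE_SIZE = 4  # assumed number of words per signature


def words(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def signatures(text, size=SHINGLE_SIZE):
    """Yield one SHA-1 hex digest per overlapping run of `size` words."""
    tokens = words(text)
    for i in range(len(tokens) - size + 1):
        chunk = " ".join(tokens[i:i + size])
        yield hashlib.sha1(chunk.encode("utf-8")).hexdigest()


class SignatureIndex:
    """Hypothetical in-memory store mapping signatures to book titles."""

    def __init__(self):
        self._titles_by_signature = defaultdict(set)

    def add(self, title, text):
        """Register every signature of `text` under `title`."""
        for sig in signatures(text):
            self._titles_by_signature[sig].add(title)

    def identify(self, passage):
        """Return the titles that contain every signature of `passage`."""
        candidates = None
        for sig in signatures(passage):
            titles = self._titles_by_signature.get(sig, set())
            candidates = set(titles) if candidates is None else candidates & titles
        # A passage shorter than one shingle yields no signatures at all.
        return candidates or set()


index = SignatureIndex()
index.add("Moby-Dick", "Call me Ishmael. Some years ago, never mind how long "
                       "precisely, having little or no money in my purse")
index.add("Pride and Prejudice", "It is a truth universally acknowledged, that a "
                                 "single man in possession of a good fortune, "
                                 "must be in want of a wife")
print(index.identify("never mind how long precisely"))  # {'Moby-Dick'}
```

Storing only hashes means the database never has to hold the text itself, at the cost of exact matching: a single changed word breaks every shingle that touches it.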

… although I’m realising that Google Book Search is pretty much working on that – and it has an API, so. I might put a wrapper on that. (The geek version of a donk.) Unless someone has already… ? (For more on Google Book Search and unique strings, see Dance of the Concords.)

So if you have a string of sufficient length, you’d get a single result, and be able to find the bookmark in a text, even if you didn’t know what the text was before. That’s quite interesting, and new. I think.
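To make the "find the bookmark in a text" step concrete, here is a small sketch in which an edition is modelled (purely as an assumption) as a list of page strings. The same quoted string resolves to a different page in each edition, which is the whole point of treating the bookmark as text rather than as a page number. `locate_page` is a hypothetical helper, and it uses exact matching only.

```python
from bisect import bisect_right


def locate_page(pages, quote):
    """Return the 1-based page on which `quote` starts, or None if absent.

    Pages are joined with a single space so that a quote running across
    a page break can still be found.
    """
    page_starts = []
    position = 0
    for page in pages:
        page_starts.append(position)
        position += len(page) + 1  # +1 for the joining space
    full_text = " ".join(pages)
    index = full_text.find(quote)
    if index == -1:
        return None
    # The number of pages starting at or before `index` is the page number.
    return bisect_right(page_starts, index)


hardback = [
    "It was the best of times, it was the worst of times,",
    "it was the age of wisdom, it was the age of foolishness,",
]
paperback = [
    "It was the best of times,",
    "it was the worst of times,",
    "it was the age of wisdom,",
    "it was the age of foolishness,",
]

bookmark = "the age of wisdom"
print(locate_page(hardback, bookmark))   # 2
print(locate_page(paperback, bookmark))  # 3
```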

There are serious issues with this approach of course, not least that books are edited and do change more than just their page numbering over the course of time, but some kind of clever, fuzzy search or simple string-lengthening might deal with this. And then there are translations: could you bookmark cross-language in this fashion, given a sufficiently clever translation engine?
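One possible reading of "clever, fuzzy search", sketched with Python's standard `difflib`: slide a window the same number of words as the bookmark across the other edition and keep the window that looks most like it. The function name `fuzzy_locate` and the 0.8 similarity threshold are assumptions picked for illustration, and this brute-force scan would be far too slow for a full-length book without some indexing in front of it.

```python
from difflib import SequenceMatcher


def fuzzy_locate(text, quote, threshold=0.8):
    """Find the run of words in `text` most similar to `quote`.

    Returns (word_index, matched_text), or None if no window reaches
    `threshold` on difflib's 0..1 similarity ratio.
    """
    tokens = text.split()
    width = len(quote.split())
    best_score, best_start = 0.0, None
    for start in range(max(1, len(tokens) - width + 1)):
        window = " ".join(tokens[start:start + width])
        score = SequenceMatcher(None, window.lower(), quote.lower()).ratio()
        if score > best_score:
            best_score, best_start = score, start
    if best_start is None or best_score < threshold:
        return None
    return best_start, " ".join(tokens[best_start:best_start + width])


# A bookmark taken from a British edition, resolved in an American one.
american = "He looked up at the color of the sky that morning"
print(fuzzy_locate(american, "the colour of the sky"))
# (4, 'the color of the sky')
```

This copes with respellings and small edits; it does nothing for translations, where the matching would have to happen on meaning rather than characters.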


Photo of bookmarks by FlickrJunkie, used under Creative Commons.


  1. I’m sure I’m missing something, but I can’t see what the value-added is here. We’ve been searching unique strings on Google and other search engines for years — and, when we know what book a passage is likely to be in, via Amazon’s “search inside the book” feature — so what would be new in what you propose? Do you have some means in mind by which people with different editions of a text could find the same passage (other than what we already do, which is to say “it’s two pages from the end of Chapter 4”)? Again, I’m probably not getting what you’re saying, but I’m intrigued.

    Comment by Alan Jacobs — April 8, 2009 @ 7:16 pm

  2. What’s wrong with quotation? If you want to refer to text, cite it. Searching a book search engine by text string if a specific hyperlink isn’t available, as you suggest as the mechanism, seems simple enough.

    In my academic experience, I always loved the well-published editions, normally of canonical classics, that reproduced the original page numbers. My copy of Critique of Pure Reason was paradigmatic: not only did it reproduce the page numbers, but in sections of the text where the first and second editions deviated, the page numbers split off as well, labeled “A” and “B” for the different editions. It was very intuitive. If a scanned book is converted to characters rather than simply an image (necessary for searching anyway, no?) it would seem fairly simple to embed the “original” page numbers in the text. Then one could search by this data, just as one does in class: “original edition page 37, second paragraph…” I tend to think this sort of well-planned editing/publishing effort is a better way of organizing the data than creating a bunch of competing secondary app systems to search the data.

    The difference, to me, seems to be that with music you are referring to a particular sound segment, which is difficult to describe (the one with the part with the ooh-aahs, and the bum-bum-ba-dum…), but in a text the only thing one would want to direct to is text, which is pretty easy to search for already as text, without creating a new category of metadata. Page numbers, a standard for centuries, seem like all the data one might need.

    Unless I’m misinterpreting or you see another context for reference?

    Last thought–I love concordances–SO useful for research of prolific authors. Is there some sort of thematic concordance one could develop by embedding metadata that would be helpful? Like an index in the old school sense? A user-defined subjective electronic card catalog, complete with page and paragraph numbers? Is this more what you’re thinking about? Tags linked to text passages?

    Comment by Adam — April 8, 2009 @ 8:54 pm

  3. @Alan – that’s the point, that we can’t say “it’s two pages from the end of Chapter 4” when we’re working across a bunch of electronic texts – and we need to be a lot more precise about it, in any case. The value add is that we’re trying to solve the problem of location by reconsidering the bookmark as just a string, rather than a string and a location.

    @Adam – using “original” page numbers is the technique that a lot of ereaders (and epub) currently use, but it’s really not good enough. This “original” page number is meaningless in an electronic context, and differs according to platform, format – and what you mean by “original”. Page numbers are no longer meaningful or useful, and we have to break with this metaphor.

    As for indexes and concordances – yes, this is something that editors (or even users) could build into electronic books in a more interesting and useful way than has ever been done before.

    Imagine if you could import differently themed concordances into your book, by critics or professors, according to your own particular interests?

    Comment by James Bridle — April 9, 2009 @ 1:41 pm
