
booktwo.org

Archives (Google)

08/04/09: Inter-operative bookmarking; Gracenote for books.


Shared bookmarks are one of the primary drivers of conversation and socialisation on the web. Simple pointers to information are the basic currency of networked communication, and one of the most desirable functions of the future book. But, in the book, they’re pretty hard to achieve.

I’ve hit this problem already on bkkeepr, and that’s just with physical books. If two people are reading the same book in two different editions (hardback or paperback, modern or ancient, even in different translations) then the same text doesn’t occur on the same page. (This is one of the main reasons bkkeepr bases itself on ISBNs rather than titles or “works”, but it’s unwieldy and has been, mostly rightly, criticised.)

The problem gets harder with ebooks. My Sony Reader lets me bookmark pages, but there’s no way to transfer or even translate these to another epub reader, let alone another format or edition. I’ve been lurking on the epub-interop group for a while, which has been considering this issue, as well as things like reliable identifiers for epub books, and just keeping your place in different editions (a subset of the bookmark problem).

So, to first principles: a bookmark is a location, right? But it’s a location in an existing text, and the problem comes down to defining a location in a text that moves about, covers different numbers of pages, appears in different formats. But here’s the rub: it’s always the text. (Well, not exactly, but we’ll come to that later.)

I do something quite similar a lot, when I’ve read a newspaper or journal article offline and want to find the online version. I just pick a string of words from the text that feels like it contains a reasonably-unique (don’t pick me up on that, you know what I mean) set of words or phrasing, and google it in quotes. Works a charm.

Going further, it seems likely you can bookmark anything given a string of sufficient length to be unique (I’m getting something in the back of my head about whole files, and the best model of something being itself, but we’ll ignore that).
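As a rough illustration of that idea, here is a minimal Python sketch, assuming a bookmark is stored as the shortest run of words that occurs only once in the text. The function names (`make_bookmark`, `resolve_bookmark`) and the sample texts are made up for the example; this is not how bkkeepr or any ebook reader actually works.

```python
def count_occurrences(words, phrase):
    """Count how many times the word list `phrase` appears in `words`."""
    n = len(phrase)
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == phrase)


def make_bookmark(text, word_index, min_words=3):
    """Return the shortest phrase starting at `word_index` that is unique in `text`.

    Hypothetical helper: the phrase is lengthened one word at a time until it
    occurs exactly once. Returns None if no unique phrase exists.
    """
    words = text.split()
    for n in range(min_words, len(words) - word_index + 1):
        phrase = words[word_index:word_index + n]
        if count_occurrences(words, phrase) == 1:
            return " ".join(phrase)
    return None


def resolve_bookmark(text, bookmark):
    """Return the word index where `bookmark` starts in `text`, or None."""
    words = text.split()
    phrase = bookmark.split()
    n = len(phrase)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == phrase:
            return i
    return None


# Two "editions" of the same text: the second has extra front matter,
# so the same words sit at a different position.
hardback = ("It was the best of times, it was the worst of times, "
            "it was the age of wisdom")
paperback = "Preface to this edition. " + hardback

bookmark = make_bookmark(hardback, 6)          # "it was the worst"
print(bookmark)
print(resolve_bookmark(paperback, bookmark))   # 10
```

Splitting on whitespace means line breaks and page layout drop out of the picture, which is the whole point: the bookmark travels with the words, not the page.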

This is where an idea I’ve been toying with for a while comes in: do we need a Gracenote / MusicBrainz for books? A big database containing everything – or at least some kind of hash of everything, a set of unique signatures for each book? Would you be able to take a string of a certain length from anything, submit it to this DB, and get back a title, like holding your phone up to the music with Shazam?
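To make the "hash of everything" idea a little more concrete, here is a toy in-memory sketch. It assumes each book is fingerprinted by hashing every overlapping run of five words; the class name `SignatureIndex`, the shingle size and the sample texts are all invented for illustration, and none of this reflects how Gracenote, MusicBrainz or Google Book Search actually work.

```python
import hashlib
from collections import defaultdict

SHINGLE = 5  # words per fingerprint; an arbitrary choice for this sketch


def shingles(text, size=SHINGLE):
    """Yield every overlapping run of `size` words, lowercased."""
    words = text.lower().split()
    for i in range(len(words) - size + 1):
        yield " ".join(words[i:i + size])


def fingerprint(shingle):
    """Hash one shingle, so the index stores signatures rather than text."""
    return hashlib.sha1(shingle.encode("utf-8")).hexdigest()


class SignatureIndex:
    """Hypothetical lookup table: fingerprint -> titles containing it."""

    def __init__(self):
        self._titles_by_hash = defaultdict(set)

    def add_book(self, title, text):
        for s in shingles(text):
            self._titles_by_hash[fingerprint(s)].add(title)

    def identify(self, snippet):
        """Return titles that contain every shingle of `snippet`."""
        candidates = None
        for s in shingles(snippet):
            titles = self._titles_by_hash.get(fingerprint(s), set())
            candidates = titles if candidates is None else candidates & titles
        return sorted(candidates) if candidates else []


index = SignatureIndex()
index.add_book("A Tale of Two Cities",
               "It was the best of times, it was the worst of times")
index.add_book("Moby-Dick",
               "Call me Ishmael. Some years ago, never mind how long precisely")

print(index.identify("the best of times, it was the worst"))
```

A snippet shorter than five words produces no fingerprints and so no result, which matches the intuition that the string has to be "of sufficient length" before it identifies anything.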

… although I’m realising that Google Book Search is pretty much working on that – and it has an API, so. I might put a wrapper on that. (The geek version of a donk.) Unless someone has already… ? (For more on Google Book Search and unique strings, see Dance of the Concords.)

So if you have a string of sufficient length, you’d get a single result, and be able to find the bookmark in a text, even if you didn’t know what the text was before. That’s quite interesting, and new. I think.

There are serious issues with this approach of course, not least that books are edited and do change more than just their page numbering over the course of time, but some kind of clever, fuzzy search or simple string-lengthening might deal with this. And then there are translations: could you bookmark cross-language in this fashion, given a sufficiently clever translation engine?
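For the "clever, fuzzy search" part, a minimal sketch using `difflib.SequenceMatcher` from the Python standard library: slide a window of the bookmark's length across the new edition and keep the closest match. The function name `fuzzy_locate`, the 0.8 threshold and the sample sentences are assumptions picked for the example, and a real system would need something far faster than this brute-force scan.

```python
from difflib import SequenceMatcher


def fuzzy_locate(text, bookmark, threshold=0.8):
    """Find the word index in `text` whose following words best match `bookmark`.

    Returns (index, score); index is None when the best score is below
    `threshold`. Hypothetical helper for illustration only.
    """
    words = text.split()
    n = len(bookmark.split())
    target = bookmark.lower()
    best_index, best_score = None, 0.0
    for i in range(max(1, len(words) - n + 1)):
        window = " ".join(words[i:i + n]).lower()
        score = SequenceMatcher(None, window, target).ratio()
        if score > best_score:
            best_index, best_score = i, score
    if best_score >= threshold:
        return best_index, best_score
    return None, best_score


# A bookmark taken from a British edition, resolved against a US edition
# in which the spelling has been changed.
us_edition = "We admired the color of the harbor at night from afar"
bookmark = "the colour of the harbour at night"

index, score = fuzzy_locate(us_edition, bookmark)
print(index, round(score, 2))
```

An exact string search would fail here because of the two spelling changes; the fuzzy version still lands on the right spot, and rejects a bookmark that doesn't resemble anything in the text.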

Thoughts?

Photo of bookmarks by FlickrJunkie, used under Creative Commons.

19/03/09: Google lies – but you knew that already, right?


Re: today’s announcement about Google and Sony. It doesn’t appear to be a deal as such, but what’s clear is that half a million scanned books from Google Book Search will be made available as epub files, with millions more to come. Epubs. Ebooks.

Now, cast your mind back, if you will, to the London Book Fair 2007. I was there, twittering and liveblogging away. There were zombies and some book no one had heard of called White Tiger. All very good.

I went to a quite interesting Google presentation, on Google Book Search. Lots of publishers were very nervous about GBS, and Google, with the help of panellists from Berg, Springer and the Cambridge University Press, did a very good job of reassuring them. A lot of publishers went away reassured about Google’s aims and intentions, and no doubt signed up to GBS some time later.

A few weeks later I wrote a piece about this, and raised some questions. I was a bit doubtful when they assured publishers of their good, simple intentions, and felt they were taking advantage of publishers’ (then) fairly minimal comprehension of ebooks and the web. It was reposted at Teleread. Here’s the key quote:

This dilemma increases when you hear what Google are saying about the status of these files. Emphatically they state, and I’m directly quoting Google’s Jason Hanley (Strategic Partner Development Manager) here: “Google Book Search is not an ebook”.

Well, isn’t that interesting. As I said at the time:

This isn’t contrariness. I want digitisation to succeed, but I’ve got some worries about GBS, based on two main observations: Google Book Search isn’t the same as Google Web Search, and Google, if not actually, intentionally lying, is certainly wilfully misleading publishers about its intentions.

The first part of that statement has become obvious with time, although it was all a bit more confusing two years ago. The second part, well. Hello Google ePubs. Surprise!

I could rant on about this for ages, but the core point is simple: Google is not just a search engine, it’s a publisher. Every time I try to defend them, they do something like this, and pretty much justify all those people who want to sue them for copyright infringement for making a “copy” of their website in their index. Start thinking about that.

28/08/07: Errata as Metadata


Too long and too important for a Stop Press post:

Google is throwing away information that is fundamentally characteristic of books—metadata that describe and even determine what books are, as simple and trivial as volume numbers, or artifacts of type design, editing, and artistic production. Books are not, in other words, mere bags of words, but vehicles in which ride a wide sundry of other passengers—metadata, artistic expression, whimsy, and error. Books are born and produced in a rich organizational and information-rich social and economic context, and the willing discard of that context carries with it a loss whose surface manifestation may be amusing, but whose deeper ramifications are profoundly disturbing. [Link]

Even if you don’t want to go down the route of scratch’n’sniff ebooks, we have to recognise that books aren’t just the lit. They are an experience. Google is getting it wrong. Can we do better?

Image courtesy of Bekah Stargazing, Flickr and CC. 1,265 results for photos matching book and smell.

09/08/07: Printing the Obvious


So, what a surprise. Amazon has announced that it’s starting a Lulu-type POD system, through its wholly-owned subsidiary CreateSpace, which has been churning out self-published CDs and DVDs for several years now. The difference to Lulu being that products of said service will be searchable and buyable through the mighty Amazon.com, making them much more discoverable than stuff on Lulu, which is mostly only linked to from authors’ homepages.

There’s a bigger story here though, and it’s linked to this announcement:

The National Archives and Records Administration, the federal government’s official archivist, has entered into an agreement with CreateSpace, an Amazon.com subsidiary, to digitize the motion pictures in its collection. CreateSpace will digitize movies chosen from NARA’s collection of more than 200,000 motion picture titles, most of them public domain. Amazon.com will then make the DVDs available in a DVD-on-demand service ($19.99).

Creating better access to archives is unquestionably A Good Thing, but this way of doing things provokes a number of questions. The NARA claims they can’t possibly afford the costs of digitisation, and so getting Amazon to do it benefits everyone, as they get free, new copies for their archives. Charging for DVD hard copies on Amazon’s part is also justifiable, but what about electronic copies?

The reported trigger for the NARA’s decision was an earlier partnership with Google, which saw a trial run of 101 films made available through Google Video. From 200 requests for the hard copies in the previous year, the movies were seen over 200,000 times when available on the web – a clear indication that the interest was there, but not the availability. Hence the CreateSpace project. The NARA and Amazon executives have made the fascinating and fantastic statement that the material will remain in the public domain, meaning you can copy your CreateSpace DVD as many times as you like. But will they cut out the middleman and make the whole, CreateSpace-digitised archive available online through Google Video or similar?

The question is particularly pertinent because this is exactly what concerns me about Google Book Search: entering into partnership with libraries and archives to digitise public domain content, but not honouring the spirit of that public domain status by making the texts fully available and downloadable (including, particularly, being indexable by other agents). The Amazon/NARA partnership seems almost too good to be true, but public-private partnerships make me nervous (if you live in London, like I do, you’ll know exactly what I mean), and when rights and digital access are involved, I get very nervous indeed.

22/06/07: Friday light relief: Google Fan Fiction

Booktwo.org, always up-to-date with the latest online literary microtrends, is proud to bring you a new subgenre: Google fan fic (or should that be fear fic?). Enjoy.

Google Interiors by Sandra Niehaus:

I realized with a shock that George’s hat was a dense cluster of tiny cameras, forming a rounded beehive of angled, glittering eyes. “We’re from Google Interiors, a new venture sponsored by Google to make every home interior in the world searchable on the internet.”

Robot Exclusion Protocol by Paul Ford:

“Hi! I’m from Google. I’m a Googlebot! I will not kill you.”

I saw the best minds of my generation destroyed by Google by Bruce Sterling (!):

This is Macbeth’s world, and us teenagers just live in it. Dig this: those “Three Weird Sisters”, who mysteriously know everything? They can foretell anything, instantly, like Google? Plus, the witches make it all sound really great – only, in real life, it totally sucks?

The Nine Billion Names of God by Kathy Kachelries:

“Here’s the thing. Google has memorized who you are. It’s memorized all of us, through those little forgotten bits that we leave behind like breadcrumbs. And what’s more important, it’s memorized its own idea of you. Google is omniscient. It’s omniscient and omnipotent. When it cached its cache for the first time, back in 1994, that’s when Google realized what it was.”

And finally, the grandaddy of Google Fan Fic, EPIC 2014 by Robin Sloan and Matt Thompson (an oldie but still a goodie):

In 2014, Googlezon unleashes EPIC, the Evolving Personalized Information Construct, which pays users to contribute any information they know into a central grid, allowing the system to automatically create news tailored to individuals, entirely without journalists. … At its best, EPIC is “a summary of the world — deeper, broader and more nuanced than anything ever available before … but at its worst, and for too many, EPIC is merely a collection of trivia, much of it untrue.”

(See also: Armando Iannucci’s Tesco vs. Denmark: from “Every Little Helps” to “We Control Every Aspect Of Your Lives”.)

30/04/07: Google Book Search: Obfuscation & Mystification


I’ve written about Google Book Search before, but it’s time to do so again – particularly after their PR barrage at the London Book Fair, some aspects of which I wrote up at the time.

For a while now, I’ve been broadly in favour of GBS, at least in as much as it’s forcing publishers to look seriously at digitisation strategies and becoming the driving force for change within the industry. Google’s PR drive has also stepped up a notch, with their flacks becoming increasingly informed about the book trade, a number of high-profile panels at book events, and a rapidly growing number of publishers coming on board. At the LBF, they convinced a fair number more.

So now, as is my wont, I’m the one getting nervous. This isn’t contrariness. I want digitisation to succeed, but I’ve got some worries about GBS, based on two main observations: Google Book Search isn’t the same as Google Web Search, and Google, if not actually, intentionally lying, is certainly wilfully misleading publishers about its intentions.


24/01/07: Unbounded Coverage

In what should be the last of the round-ups of the Google Unbound conference, but probably won’t be, some more commentators:

I’ll stop now.

23/01/07: Guarding the legacy

Today’s Guardian has a short piece with more Google follow-upping:

The iPod has done it with music, Flickr has done it with photos, MySpace has done it with bands and Saatchi is doing it with paintings. The question is: can Google do the same thing with books by creating an international online market place for them enabling readers to download volumes in their entirety – at a price of course – to their iPods, Blackberrys or smartphones?

Luckily, the Guardian’s Vic Keegan is more clued-up than Bryan Appleyard – for example, he’s been trying out iCUE too. He’s also the man behind Shakespeare’s Monkey, he’s active in Second Life, and, at the risk of stalking, he uses Flickr, so he’s rather better qualified to talk about all this.

According to a Guardian column from a couple of weeks back, which I can’t locate online, he also released a book of poems (which may or may not be this one) inside Second Life recently. If anyone can find out any more about this, I’d be very grateful.

[UPDATE:] Thank you, Mr Keegan (see the comments).

22/01/07: Information vs. Knowledge (the Times they are a-changin’)

Lots of recent activity in the British press concerning future books: last weekend’s Sunday Times contained not one but two pieces on the subject.

The first piece, Google plots e-books coup, reports on the Google Unbound conference we mentioned last week. Unfortunately, it’s all fairly techless, reporting that “the internet search giant is working on a system that would allow readers to download entire books to their computers in a format that they could read on screen or on mobile devices such as a Blackberry” (er, Gutenberg?) and “commuters in Japan were already reading entire novels on their mobile phones” – something some of us have been doing for a while in this country too (see iCUE).

It does, however, contain a nice quote from if:book’s Ben Vershbow: “Google seems to be simultaneously petting the industry and saying everything is going to be all right if they just let everything go, but at the same time telling them: ‘We have you guys up against the wall’.”

Serial crank Bryan Appleyard then takes up the story in Could this be the final chapter in the life of the book? Despite some cogent analysis of the Google/Publisher fight – with special mention going to Jean-Noël Jeanneney, president of the Bibliothèque Nationale de France, for his work highlighting the inherent cultural and corporate bias of Google, which makes it far less neutral an information dealer than it would like to present itself as – Appleyard can’t help the hyperbole: “We are, it seems, about to lose physical contact with books, the primary experience and foundation of civilisation for the last 500 years.”

Coming off the back of several paras about academic textbooks, this is unfortunate. Most of the debate about book digitisation is framed in terms of poor authors, starving in garrets, unable to make a penny because of evil copyright-infringers. But the vast, vast majority of digitised content is academic and/or technical; it’s being put out there to help people learn more, better, and more easily; to improve the world. Such works are pure information – their format is simply not important. The heft of a good novel may be pleasing to the bibliophile, but few would go so far as to say they must have the latest X-thousand-page volume of the International Journal of Electrical Engineering in hardback.

Appleyard draws the distinction, with John Sutherland, between the algorithmic search engine and the wisdom of the human-made index. But in the end he totally misunderstands the nature of information, arguing that it is a separate quality to ‘knowledge’, instead of its central, essential building block:

[...] David Worlock of Electronic Publishing Services said, “Ultimately it’s not up to Google or the publishers to decide how books will be read.

“It’s the readers who will have the final say.”

No, it is the teachers who will have the final say. They will determine whether people will read for information, knowledge or, ultimately, wisdom. If they fail and their pupils read only for information, then we are in deep trouble. For the net doesn’t educate and the mind must be primed to deal with its informational deluge. On that priming depends the future of civilisation. How we handle the digitising of the libraries will determine who we are to become.

“The net doesn’t educate”? If Appleyard means by the above that teachers must do more to help pupils learn to navigate the new digital libraries, to harness the flow of information themselves and to make their own judgements about the quality of information, then he is correct. But they’ve been doing that for centuries too, and as resources like Moodle (and Sloodle), the Million Book Project and the now entirely digital Open University show, they are embracing the new mediums with much more enthusiasm than doomsaying journalists.

*

[Update 23/01/07] More evidence of naysaying, or just lazy journalism: Contrary to Appleyard’s assertion that Google Unbound was “an invitation-only conference”, registration was open to all, and rapidly filled up.

16/01/07: Google’s Un-Bound

Google Unbound

This looks like it should be very interesting:

Six centuries ago, a German metalworker tinkered with a wine press, metal alloys and oil based ink, perfecting one of history’s great inventions: the printing press. With the rise of mass publishing, more people than ever were able to access information. Books proliferated. Today, digital technology offers a similar opportunity, and the Internet now represents a powerful platform for promoting and distributing books. Online book sales alone account for nearly four billion dollars in annual US sales—almost 15% of the entire book business. [More]

If anyone is going, I’d love to hear more. Boing Boing’s Cory Doctorow is speaking, so we can hopefully expect to hear more there soon.



James Bridle
booktwo.org
james@booktwo.org