
booktwo.org


Archives (Standards)

09/07/07: The sustainability of the archive


Citing the crucial need to access records on nuclear waste storage, or census returns, in five, 10 or even 100 years’ time, [Natalie Ceeney, chief executive of the National Archives] said: “This is a critical issue for us, and for UK society as a whole. We assume our personal records are secure, we expect our pensions to be paid, but anyone with a floppy disc even three or four years old is already having a hard time finding a computer that will open it.” [Source]

This is undoubtedly one of the most interesting and pertinent articles I’ve seen in the papers for a while: National Archive project to avert digital dark age.

First of all, it makes me nervous that Microsoft is a vocal partner in this. Isn't reliance on one or two companies' proprietary formats what got us into this mess in the first place? MS are renowned for their distaste for open and accessible formats (witness their approach to web standards, embodied in Internet Explorer, or the furore over the BBC's MS-powered iPlayer), so while it is probably necessary that they should be involved to rescue these files, let's hope the Archives have learnt their lesson and are moving towards open, extensible, standards-based code.

I’m going to point again to this article about validation, because I think it says a lot of things very well about the importance of using this kind of code:

This is an attempt to make a code that can go decades and centuries, getting broader in scope without ever shutting out its early versions. Because that's what we need the code to do: this code is for recording what we think. There are no paper backups of the web. Every day we put more on it that we're not putting in our traditional medias. If we don't use extensible code, then our current history evaporates with the next minor tech change. We've never had this problem before. Before a mark on a page could go centuries; there'd always be daylight to read it by. This is a new problem and it required a new solution. [Source]

This is as important in publishing as in any other field. As we move inevitably towards ebooks and beyond, it's very easy to imagine a situation, twenty or thirty years from now, in which a decade-old literary work becomes inaccessible because it was composed on one computer, revised on others, and encoded in an obsolete, proprietary format for distribution - and never once written down on paper.

The solution, I’m afraid, is not to write everything down on paper - there’s too much of it now, and it’s wasteful and irresponsible to boot - but to make sure that we use the best, most open, most public formats right now, for everything we do.

Large sections of the music industry are already moving away from DRM-based systems (e.g. the latest version of iTunes), and publishers should take note rather than go down the bad old routes which, as experience is beginning to show, don't help anyone in the long run. The International Digital Publishing Forum published the latest version of its XML-based Open eBook Publication Structure Specification at the end of last year, and it scored its first victory a few weeks back with its inclusion in the new Adobe Digital Editions (although this still leaves open the possibility of DRM).

Yes, we need to find ways to make sure that authors and others are paid for their work, but we also need to make sure that their works - as well as those pension records and that nuclear waste data - are accessible to future generations. We owe them that.

Image detail from Illuminated by Chronicity, reproduced under CC Licence.

14/11/06: Seeing clearly

As accessibility is the watchword of the web standards movement, it's kind of depressing to hear that traditional publishing is serving the blind and partially sighted community so badly: research for the Royal National Institute of the Blind found that only twelve per cent of maths GCSE textbooks and eight per cent of science GCSE textbooks were available in a format usable by visually impaired children.

The RNIB has led accessibility programmes for years, notably DAISY, and I happen to know it's currently at work on a new XML-based standard for converting all newly published material into accessible formats. While this represents a massive challenge - not least persuading publishers to supply data in whatever format it settles on - it also shows the real benefit of digitisation: true access for all.

More: RNIB Web Access Centre Blog, Right to Read Campaign.

30/10/06: Open Standards

My recent post on Adobe's Acrobat-disguised-as-an-eReader Digital Editions software drew a response from m'learned friends over at Mobileread. Alexander Turcic pointed out that DE supports not only PDFs but also the forthcoming Open eBook Publication Structure (OEBPS), a new standard for content creators and consumers, about which the International Digital Publishing Forum (IDPF) has just published a press release.

The new standard also includes a container standard for packaging ebooks (the Open eBook Publication Structure Container Format, or OCF), and is intended to make producing and distributing ebooks easier and cheaper for all concerned. The IDPF and the OEBPS have some fairly heavyweight backers too: Adobe themselves, unsurprisingly, the Hachette Book Group, ebookseller Mobipocket (another of Amazon's recent acquisitions), Random House, Simon & Schuster, and many others.
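For the technically minded, the packaging idea behind OCF is simple enough to sketch. The Python below is an illustration only, not code from the IDPF or any specification, and it assumes the layout described in the OCF 1.0 specification as I understand it: a ZIP archive whose first entry is an uncompressed file called `mimetype` containing `application/epub+zip`, plus a `META-INF/container.xml` that points at the OEBPS package file. The `build_ocf` function and the `OEBPS/content.opf` path are hypothetical names chosen for the example.

```python
import io
import zipfile

# container.xml tells a reading system where the OEBPS package file lives.
# The path OEBPS/content.opf is an illustrative assumption, not a requirement.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_ocf(package_files: dict[str, str]) -> bytes:
    """Hypothetical helper: pack OEBPS files into an OCF-style ZIP in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry is written first and stored uncompressed, so a
        # reader can identify the file type from the archive's opening bytes.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        # Content documents and the package file can be compressed normally.
        for path, text in package_files.items():
            zf.writestr(path, text, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


if __name__ == "__main__":
    data = build_ocf({"OEBPS/content.opf": "<package/>"})
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        print(zf.namelist())
```

The point is the one this post keeps making: everything inside the container is plain, documented ZIP and XML, so any future tool that can read those formats can get the text back out.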

But the OEBPS isn't the only standard available, and this is where it gets interesting. Its main rival, OpenReader, is a non-proprietary standard which nevertheless includes a standardised DRM. At first glance, this seems at odds with our position on DRM, which is that it's generally a bad thing for readers. But the OEBPS's lack of a standardised DRM means that any publisher can slap its own conditions on its ebooks - meaning, for example, that you could only read a particular book on a Sony Reader, just as DRM-protected tracks from the iTunes Store will only play on an iPod among portable players. And the endorsement of OpenReader by people like John Perry Barlow gives us a great deal of hope.

What is beyond doubt is that a single, consistent standard must be settled upon before the ebook market takes off, or the book world will get into a VHS/Betamax-type fight. The strength of the web's open standards community comes from the fact that grassroots organisations had time to flourish before the corporations stepped in. With the billions of Adobe and the publishing conglomerates depending on this, that won't be the case here. Both standards are based on XML, but there are many significant differences between them, and choosing the right one will be crucial for the future of books.

For the technically minded, specifications for both standards are available at www.openreader.org/spec/ and www.idpf.org/oebps/oebps1.2/. Other useful places for more information include the blog of David Rothman, OpenReader's Director of Strategic Information, at www.teleread.org (and his excellent piece in Publishers Weekly on the overcomplication of ebooks), and the blog of Bill McCoy, Adobe's General Manager of ePublishing Business.



James Bridle
booktwo.org
james@booktwo.org