I’m working on a couple of eBook projects, and thinking about distribution. Sales figures are important: in the music world, we’ve already seen the move to recording downloads in addition to physical sales for compiling charts. (Chris Heathcote has some thoughts on the latter, and notes we’re not yet at the per-play stage – cf. bkkeepr.)
My question is: how do you track, monitor and analyse downloads? Particularly of free ebooks?
Imagine this scenario: there’s a free ebook. It’s hosted in one place, and there’s a single addressable URL to access it. This will probably be a pointer, rather than a direct link to the actual file. This means the file can be delivered, but some analytic measure can also be triggered: recording number of downloads and their point of origin.
Yes, it’s perfectly possible someone will repost the file elsewhere, and this will be untrackable. Without imposing arcane and nasty DRM, we will have to ignore this. We’re also ignoring official (and presumably paid-for and therefore separately tracked) downloads available via eBook vendors elsewhere.
We’re talking about a single, canonical, trackable address for a single eBook. Are people doing this? How? Thoughts and answers in the comments, please.
*
Associated with this, I’ve been thinking a lot about artists’ books. That is, works of art in the form of a book. Ready-mades. Uniques (although the term doesn’t apply in this context). And Zines.
I’m thinking of things like the work of Mark Pawson, and Book Works. And the whole history of artists’ books.
I think there are opportunities and affordances for doing things in the eBook space, with artists. Distribution. Links. Algorithmic transformations.
So, in the tradition of marking out the territory via the strategy of buying domain names, I’ve registered artists ebooks .org. There’s not much there yet. Consider it a starting point.
Thoughts welcome.
If we’re ignoring everything except the single, canonical, trackable address then this could be easily tracked through standard server-side web traffic packages like AWStats (note: Google Analytics and other JavaScript-based tracking applications wouldn’t work).
A better solution, however, would be to have a server-side script (e.g. written in PHP) on the canonical, trackable URL that logs each download into a database.
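As a rough illustration of that idea (in Python rather than PHP, and only a sketch): the script behind the canonical URL writes one row per request and then hands back the real file location for a redirect. The table layout, the `EBOOK_FILE_URL` path and the function names are all assumptions made up for this example.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical location of the real file; the public URL points at this script instead.
EBOOK_FILE_URL = "/files/example-ebook.epub"


def open_log(path=":memory:"):
    """Open (or create) the download log database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS downloads ("
        "downloaded_at TEXT, ip TEXT, referrer TEXT, user_agent TEXT)"
    )
    return conn


def record_download(conn, ip, referrer, user_agent):
    """Log one request, then return the location the server should redirect to."""
    conn.execute(
        "INSERT INTO downloads VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), ip, referrer, user_agent),
    )
    conn.commit()
    return EBOOK_FILE_URL


def downloads_by_referrer(conn):
    """Count downloads per point of origin, most common first."""
    rows = conn.execute(
        "SELECT referrer, COUNT(*) FROM downloads "
        "GROUP BY referrer ORDER BY COUNT(*) DESC, referrer"
    )
    return rows.fetchall()
```

In a real deployment `record_download` would be called from whatever web framework handles the request, with the IP, referrer and user agent taken from the request headers.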
Comment by Paul Watson — September 17, 2009 @ 2:34 pm
Thanks Paul – that was the route I was heading down too. Wondering if there are any other interesting solutions in use out there…
Comment by James Bridle — September 17, 2009 @ 2:57 pm
[…] Booktwo proposes an interesting question about how to track the distribution of an ebook (leaving aside piracy and paid versions). I wonder if there is any answer for this: I’m working on a couple of eBook projects, and thinking about distribution. Sales figures are important: in the music world, we’ve already seen the move to recording downloads in addition to physical sales for compiling charts. (Chris Heathcote has some thoughts on the latter, and notes we’re not yet at the per-play stage – c.f. bkkeepr.) […]
Pingback by How can you track an ebook? | TeleRead: Bring the E-Books Home — September 17, 2009 @ 3:39 pm
I have a proposal, but I don’t know enough about ebooks or rather ePub to be able to know whether it makes sense.
What you are interested in isn’t the number of downloads, but rather the number of installs of the ebook on different reader devices. Therefore you want something that triggers a “hit” when someone uses an ebook for the first time on a new device regardless of where they got it from.
Without going down the path of nasty DRM, does ePub support the concept of an external stylesheet? If it does then you can track access to this stylesheet and providing you use a different unique stylesheet for each publication then you get to track installs regardless of where people got them from.
If ePub also let you run basic JavaScript then you could append an extension to the stylesheet URL that resolved to the same file but encoded some unique identifier of the device, which would give you some kind of total number of devices. Otherwise, every time a user opened the file with the device connected to a different network, the trackers would record a new IP address when in fact it is a second reading by the same reader (assuming that the device doesn’t cache external stylesheets).
Does any of this sound vaguely possible?
Comment by Alex Fiennes — September 17, 2009 @ 4:13 pm
James,
You might consider Scribd as well. There are several advantages:
-Built in tools for views and downloads
-Widgets that others can embed in blog posts, etc
-Highly customizable for how much control you want over cost and accessibility
-Standard social media tools like favorites and commenting
-Built in audience already using the site
Here is my account with a couple of ebooks I uploaded:
http://www.scribd.com/doc/18195075/My-Favorite-Business-Book
Comment by Todd Sattersten — September 17, 2009 @ 6:52 pm
The ePub format expressly forbids including external resources (like images), though in practice many ereaders would probably not care. I know Bookworm will just go request any image, though it wouldn’t do the same for stylesheets.
I would use a simple 1×1 pixel invisible image, which is exactly how marketers track when you’ve opened an HTML email.
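A minimal sketch of what that could look like when generating a chapter's XHTML, assuming a hypothetical tracker at `http://example.org/pixel.gif` and a made-up `book` query parameter. Whether a given reader actually fetches the image is, as noted above, not guaranteed.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

# Hypothetical tracking endpoint that serves a 1x1 transparent image.
TRACKER_URL = "http://example.org/pixel.gif"


def tracking_pixel_tag(book_id):
    """Build an XHTML <img> tag whose request identifies the book being opened."""
    query = urlencode({"book": book_id})
    src = TRACKER_URL + "?" + query
    # quoteattr adds the surrounding quotes and escapes characters such as "&".
    return "<img src=" + quoteattr(src) + ' width="1" height="1" alt=""/>'
```

Each request for the image then shows up in the server's logs like any other hit.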
Comment by Liza Daly — September 18, 2009 @ 1:02 am
A related question is whether to encourage any standards for a book’s canonical URL. Does it include an ISBN or another unique reference for the book (I appreciate there are >1 ISBNs for one ‘book’) and/or should it be more easily human-readable? Or will the URL eventually replace the ISBN anyway? How hackable is it [Bookkake’s are a good start but surely http://bookkake.com/books/venus-in-furs/epub is a bit more elegant than the current]? Generally speaking, what else besides tracking downloads might you want from it?
Comment by Max — September 18, 2009 @ 10:12 am
In addition to downloads, the content distribution network providers, such as Limelight, can report downloads of cached copies back to a provider. You’re still limited to counting files downloaded rather than installs unless there is an authentication process of some sort at install time. Usually that means DRM is involved, though it could be as simple as adding a remote procedure call that pings the provider to say “I’m being installed on another device.” You wouldn’t likely be able to get device information, and the capability would be limited to devices that have a live network connection.
Comment by Mitch Ratcliffe — September 18, 2009 @ 9:47 pm
Just been having a little look around http://www.idpf.org/2007/opf/OPF_2.0_final_spec.html and my gut reaction is that the thing most likely to trigger a network lookup on the bulk of epub readers is an “Out-Of-Line XML Island” in the manifest (see 2.3.1.2).
If one had a single item in the manifest that had a required-namespace that pointed to the booktwo server to grab the schema required to process an XML document, and if one rewrote the epub document on every download such that the URL to the schema contained a different id then I think that the chances are quite high that most readers would download the resource at least once (although they may well cache it after the initial download). The download of this resource could then be tracked by whichever http tracking system you prefer (custom php download, log analysis, ethernet packet monitor, etc etc)
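The "rewrite the epub on every download" step might be sketched like this. It assumes the master epub contains a placeholder string (`{{DOWNLOAD_ID}}`, invented for this example) wherever the unique id should appear, for instance inside the schema URL; it does not attempt the XML Island markup itself.

```python
import io
import uuid
import zipfile

# Hypothetical marker placed in the master epub wherever the id should go.
PLACEHOLDER = b"{{DOWNLOAD_ID}}"
TEXT_SUFFIXES = (".opf", ".xhtml", ".html", ".css")


def personalise_epub(template_bytes, download_id=None):
    """Return (download_id, epub_bytes) with the placeholder replaced by a unique id."""
    download_id = download_id or uuid.uuid4().hex
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(template_bytes)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        # Entries are copied in their original order, so "mimetype" stays first.
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename.endswith(TEXT_SUFFIXES):
                data = data.replace(PLACEHOLDER, download_id.encode("ascii"))
            # Passing the original ZipInfo keeps each entry's compression setting.
            dst.writestr(info, data)
    return download_id, out.getvalue()
```

The returned id would be stored alongside the download record, so that a later request for the tracked resource can be matched back to the copy it came from.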
Comment by Alex Fiennes — September 19, 2009 @ 10:53 am
No PHP scripts or such required–just monitor the URL in your web server’s access logs. See:
http://httpd.apache.org/docs/2.0/logs.html
for an example.
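For instance, a few lines of Python are enough to count hits on one path in a log written in Apache's Common Log Format. This is only a sketch: the path `/ebook` used below is a made-up example, and the pattern assumes the default format with no custom fields at the start of the line.

```python
import re
from collections import Counter

# Matches the start of a Common Log Format line (also fits the combined format).
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def count_downloads(lines, path):
    """Return (successful GET requests for path, distinct client hosts)."""
    hosts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m["method"] == "GET" and m["path"] == path and m["status"] == "200":
            hosts[m["host"]] += 1
    return sum(hosts.values()), len(hosts)
```

Calling `count_downloads(open("access.log"), "/ebook")` would give the total number of successful requests and the number of distinct addresses they came from.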
Comment by Bradley Wright — September 19, 2009 @ 3:38 pm
Thanks all for these suggestions. Lots of things to try.
I think exploring the possibility of rewriting an epub on every download is a particularly interesting one, generating a unique ID every time.
Comment by James Bridle — September 22, 2009 @ 10:18 am
Hello.
Not about the e-book download, but intrigued by the idea of artists’ ebooks. I suspect this stuff might already be happening. But I’m not sure where. McSweeney’s just released their iPhone app, which seems a bit gash by all accounts. But it was more an attempt to mimic the minimal aesthetic of the website, I think.
I’d be interested in seeing books that play with conventions, that rewrite themselves based on how long we’ve been reading, for example. Interactive Fiction (look up Nick Montfort) does some work with this.
Should probably get an ereader before I start getting excited about this, though…
Comment by Kevin O'Neill — September 24, 2009 @ 5:08 pm
Kevin: if you are going to have interactive fiction, and it is running on a device that knows where it is as well as what you are doing with it (iphones and androids) then it may be interesting to get the story-telling algorithms to re-weight their choices based on how people who are already reading things are changing their behaviour as they read them. I’m playing around with a similar idea with algorithmic music generation and listener position…
Comment by Alex Fiennes — September 24, 2009 @ 5:19 pm
I am a librarian at a university in the US. I am commenting on something in one of your posts way back on OCLC. You were looking for a way to search and link to your local library catalogue. You gave a link to the catalogue and I looked at it, and somehow you think it is not adequate. I am not sure why. It looked adequate, with all the standard search features. Actually, using WorldCat would have been much more cumbersome for finding materials solely from your local library, and would have given you basically the same results, assuming all of your library’s resources are in WorldCat; some of their more specialized items may not have been easily enough catalogued to put into WorldCat.
Where I live, we have access to virtually any library catalogue in our state: university, local etc. through a database constructed by our state government for state residents. The database also contains full text magazine and newspaper articles and government documents.
You say that you think data, and I guess that would include data about libraries’ holdings, should be “free.” Who exactly is going to pay to make such data “free” – which would include data entry, upload and maintenance? I don’t know if OCLC is making huge profits or not, but what they do costs money; no one would be able to do it for “free.” For example, our US Library of Congress, which employs thousands of people, receives approximately 10,000 more items than they can catalogue every single day. I think of this when I hear people say things like “every book in the world should be made electronic.”
The main reason we use WorldCat at our library is to locate books we don’t own (there are other uses too). Then people can obtain them through Interlibrary Loan. It is a wonderful service, for the most part with no charge to the patron. Tell me how could this possibly be made “free”?
By the way, a limited number of WorldCat entries are available through the internet, for free, at least in the US. Perhaps some kindly non-profit group could do what OCLC does, but they would still have to charge for their services. It could not be “free.”
Comment by Sue — October 28, 2009 @ 2:01 am
Is it really the number of downloads that’s important, though? I have a ton of ebooks I’ve downloaded and never read. But they were free (and all legal, BTW), and hard drive space is cheap.
I think you’d get a better sense of the book’s popularity by looking at mentions on blogs, Twitter, Facebook, whatever. I may download a ton of ebooks, but I’m only going to mention the ones I read and liked.
What if a million people download it, but don’t read it? Or ten people download it, but each share it with ten friends, who all become big fans of the author? Measuring the number of downloads isn’t going to give you much meaningful data, so I think your time and effort would be better spent thinking about other ways to track the book’s popularity.
Comment by Jon — November 2, 2009 @ 1:05 am
Jon: how about you divide the ebook up into a number of smaller ebooks, but only provide the link to the next (free?) installment at the end of the previous installment. Then you actually get stats as to who is actually reading and how far they get before they give up. I’ve been dabbling with similar things for web pages with javascript trying to work out when people actually scroll down (or not as the case may be) but I suspect that you wouldn’t be able to embed anything this intrusive inside an ebook that was going to be displayed in a reader that you don’t have that much control over.
Comment by Alex Fiennes — November 2, 2009 @ 1:13 am
Jon: in addition to this you could also make it so that the link to the next installment gets generated when the ebook for the previous installment is generated such that the link includes an encoded reference to the previous ebook. That way you could find out who was reading on, and also who was choosing to redistribute their copies to other locations who were then themselves reading further. Obviously the redistribution may or may not be something that you want to encourage depending on the licensing terms…
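As a sketch of how such a link could carry that reference: the next-installment URL includes the id of the copy it was generated for, and the server reads it back when the link is followed. The base URL, the parameter names and the function names are all invented for this example.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoint that serves the next installment.
BASE_URL = "http://example.org/next"


def next_installment_link(book, installment, copy_id):
    """Build the link printed at the end of installment number `installment`."""
    query = urlencode({"book": book, "part": installment + 1, "from": copy_id})
    return f"{BASE_URL}?{query}"


def read_link(url):
    """Recover which copy a reader came from when they follow the link."""
    params = parse_qs(urlparse(url).query)
    return {
        "book": params["book"][0],
        "part": int(params["part"][0]),
        "from_copy": params["from"][0],
    }
```

If the same `from_copy` value turns up from many different readers, that copy has presumably been passed around.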
Comment by Alex Fiennes — November 2, 2009 @ 1:17 am
Alex: What you’re talking about is technically possible, but I think you lose most of the benefits of digital distribution when you try and retain that much control. I don’t know how typical I am, but if I have to come back to you every time I want the next couple of chapters, I’m going elsewhere for my ebooks. There’s too much competition for my time for me to deal with something that inconvenient.
What about just asking the fans? The last page of the ebook lists a web or email address, and the author asks anyone who read the book to just share a quick note on the book, on who they might have shared it with, what they thought, or on anything else that might be relevant? Look at Amazon – they get hundreds of thousands of reviews, and the relationship between Amazon customers and Amazon is nowhere near as personal as the author-reader relationship. Surely someone who loved the book would send the author a note if you make it easy, and someone who hated the book is even more likely to share their opinions. And the people in the middle, relatively indifferent – they’re not likely to be back, so it doesn’t really matter what they thought. You have to start over from scratch with them anyway.
Again, I think focusing on raw download numbers is a waste of time. Focus on the fans, on making it easy for them, and on listening to what they have to say about the book. That’s way more important than the number of people who downloaded and don’t care.
Comment by Jon — November 2, 2009 @ 2:33 am
[…] pleased to announce that Artists’ eBooks, a project first mooted in this post a couple of months ago, is now live at […]
Pingback by Artists’ eBooks formed – artists and writers responding to new technologies | TeleRead: Bring the E-Books Home — November 12, 2009 @ 5:49 pm