Article on Google Books in First Monday

There’s a new article in First Monday that surveys Google Books by examining multiple versions of Laurence Sterne’s The Life and Opinions of Tristram Shandy, Gentleman.* The article’s central point is that the oddities of the book form make books difficult to digitize; however, this good and useful point gets a bit lost in the details.

The article argues that many of the books in Google Books have quality-control problems, and it claims that “quality assurance on the Web is provided either through innovation or through ‘inheritance’,” with Google Books inheriting its quality from the libraries whose collections it scans. This seems to conflate two types of inheritance. Quality assurance can certainly come from innovation (think of the difficult-to-OCR text now used to verify that users are humans and not spam-bots, which also turns those users into a collective OCR engine). Inheritance also makes sense within a technology, in the way an instance inherits attributes from its parent object. It even makes sense rhetorically to see quality being inherited from qualified creators by the works they create (ethos: seeing the author as credible and therefore the work as credible). However, mashing these together so that the credibility of a person or institution flows into a technology, especially a still relatively new technology with changing standards, requirements, and functions, doesn’t make sense. It’s like arguing that because a local bookstore has a pleasant environment and keeps its physical shelves well stocked, it would also be good at mailing books out and maintaining a friendly online presence. It just doesn’t work.

My other issue with the article is that it spends too much time on the details. Although the details are important, they point to a larger issue that the article does raise: books are weird, unwieldy, and hard to work with. The weirdness of books and the incredible effort it takes to digitize them (especially if they aren’t disbound) mean that there will be huge problems. But getting some of the work done is still good, and the problems are a useful lesson in the messiness of digital media. In my work, we disbind some books and not others, based on how much the form of the book matters: books from particular collections, rare editions, significant bindings, books significant for their time period, or books the library wants to keep a bound copy of. If we disbind, the messy process involves metadata creation; cutting the books with various tools, including machetes for the large-format materials; scanning them on high-speed scanners or flatbeds (depending on size); image correction; quality control; OCRing; archiving; and loading. This takes an incredible amount of time and person-power, and it’s messy. We end up with scraps of paper everywhere, the materials leave dust on everything, the OCR isn’t perfect so odd characters show up in the text, and all sorts of strange problems come up at every stage. Google Books probably has a different system, but likely one plagued by the same sorts of difficulties.

Arguing that books are funky, finicky creatures is great, and more people need to hear it. However, the argument in this article gets lost in the details of one book and the unsolved problems it presents. Perhaps I’m being overly defensive of Google Books, but the technology is changing rapidly, and even if the digitized books are horribly broken, print books are already broken for many people. Digitizing books, especially in full text, means they can easily be used with screen readers and viewed with screen-magnification applications. The paper form of the book has long been a problem for people with impaired vision, and only a small subset of books is available in audio, large-print, or braille formats. Digitizing books, no matter how badly, makes them usable by more people. Google is also correcting poor copies by offering better ones, as it has with this copy of Tristram Shandy. Overall, I think the article in First Monday is useful, but it needs several caveats: arguments against digitizing books are often unfair and irrational, and the book form as it is often gets defended despite the many readers who can’t use print versions, or can use them only partially.

All that said, I’m also a bit of a Google fangirl because I think they do great work, so my response is colored by that.

