“Ramping It Up: 10 Lessons Learned in Mass Digitisation” is an excellent report by Rose Holley on the National Library of Australia’s (NLA) large-scale digitisation program. The program aimed to start with one million newspaper pages to ensure that all supports were in place before ramping up digitization to an even higher level for newspapers, books, and journals. The report summarizes 10 key lessons learned, and the full report is online here.
It’s great to see these key lessons highlighted so succinctly. The lessons from the report are:
- Storage: purchase upfront or as soon as needed
- Quality assurance: differs greatly when done for access instead of preservation
- Quality assurance: workloads are uneven, requiring support for flexible staffing
- Digitization contractors: multiple contractors are needed because some will have delays or will be unable to deliver; working with several increases the project coordinator’s workload
- Digitization contractors’ volume: many cannot quickly handle a large volume even if they give assurances that they can
- OCR contractor setup: is difficult, and NLA knows more about the process than the contractors
- Managing digitization contracts: because vendors may deliver late, impacting workflows and incurring costs, contract enforcement and penalties may be required
- Mass digitization workflows: are highly complex and still evolving
- Transparent processes and progress: creating documentation and sharing it widely is helpful for internal and external groups, and it saves time overall
- Public involvement: the public will help if given the opportunity to do so
Storage is an enormous issue because it affects every area. A lack of storage is devastating for processing: it breaks workflows and grinds work to a halt. It sounds extreme, but it’s fairly common. And even when more storage is added, it takes days to move large amounts of content. One of the many (many, many) great things about being at the University of Florida is that it is one of the largest schools around. Computing needs and computing services are scaled accordingly, with commodity-style storage costs, allowing for stable and easily scalable storage.
The UF Libraries conduct digitization for preservation, so we haven’t experienced the quality assurance issues related to digitization for access. For vendors and anyone else doing digitization, the work isn’t easy, and small problems can dramatically impact delivery dates and deliverables overall. Transparency, in sharing and promoting knowledge, is thus essential for all involved. The public is ready to be involved and to assist on all projects, and is even more helpful when given great tools like NLA’s OCR text correction (which the UF Digital Collections hope to emulate in the not-too-distant future). Congrats to Rose Holley on publishing this and sharing NLA’s experience, and congrats to NLA on great work on the project and in service of digital libraries overall!