Closing the digitization gap

MS. Canon. Misc. 378, fol. 136r

Bodleian Digital Library’s Digitization Assistant, Tim, guest blogs about the treasures he finds while migrating and preparing complete, high-fidelity digitized items for Digital Bodleian. The Oxford DPOC Fellows feel lucky to sit across the office from the team that manages Digital Bodleian and so many of our amazing digitized collections.

We might spend most of our time on an industrial estate here at BDLSS, but we still get to do a bit of treasure-hunting now and then. Our kind has fewer forgotten ruins or charming wood-panelled reading rooms than we might like, admittedly – it’s more a matter of rickety MySQL databases and arcane PHP scripts. But the rewards can be great. Recent rummages have turned up a Renaissance masterpiece, a metaphysical manuscript, and the legacy of a Polish queen.

Back in October, Emma wrote about our efforts to identify digital images held by the Bodleian which would make good candidates for Digital Bodleian, but for one reason or another haven’t yet made it onto the site. Since that post was published, we have been making good progress migrating images from our legacy websites, including the Oxford Digital Library and – coming soon to Digital Bodleian – our Luna collection of digitized slides. Many of the remaining unpublished images in our archive are unsuitable for the site, as they don’t constitute full image sets: we’re trying to keep Digital Bodleian a reserve for complete, high-fidelity digitized items, rather than a dumping-ground for fragmentary facsimiles. But among the millions of images are a few sets of fully-photographed books and manuscripts still waiting to be showcased to the public on our digital platform.

A recent Digital Bodleian addition: the Notitia Dignitatum, a hugely important Renaissance copy of a late-Roman administrative text (MS. Canon. Misc. 378).

Identifying these full-colour, complete image sets isn’t as easy as we’d like, thanks to some slightly creaky legacy databases, and the sheer volume of material versus limited staff time. An approach mentioned by Emma has, however, yielded some successes. Taking suggestions from our curators – and, more recently, our Twitter followers – we’ve been able to draw up a digitization wishlist, which also serves as a list of targets for when we go ferreting around in the archive. Most haven’t been fully photographed, but we’ve turned up a clutch of exciting items from these efforts.

Finding the images is only half the hunt, though. To present the digital facsimiles usefully, we need to give them some descriptive metadata. Digital Bodleian isn’t intended to be a catalogue, but we like to provide some information about an item where we have it, and make our digitized collections discoverable, as well as giving context for non-experts. But as with finding images, locating useful metadata isn’t always simple.

Most of the items on Digital Bodleian sit within the Bodleian’s Special Collections. Each object is unique, requiring the careful attention of an expert to be properly catalogued. For this reason, modern cataloguing efforts focus on subsets of the collections. For those not covered by these, often the only published descriptions (if any) are in 19th-century surveys – which can be excellent, but can also be terse or no longer up to date. Other descriptions and scholarly analyses are spread around a variety of published and unpublished material, some of it available in a digital form, most of it not. This all presents a challenge when it comes to finding information to go along with items on Digital Bodleian: much as we’d like to be, Emma and I aren’t yet experts on all the periods, areas and traditions represented in the Bodleian’s holdings.

Another item pulled from the Bodleian’s image archive: a finely decorated 16th-century Book of Hours (MS. Douce 112).

Happily, our colleagues responsible for curating these collections are engaged in constant, dogged efforts to make descriptions more accessible. Especially useful to those of us unable to pop into the Weston to rifle through printed finding aids are a set of TEI-based electronic catalogues*, developed in conjunction with BDLSS. These aim to provide systematically-structured digital catalogue entries for a variety of Western and Oriental Special Collections. They’re fantastic resources, but they represent ongoing cataloguing campaigns, rather than finished products. Nor do they cover all the Special Collections.

Our most valuable resource therefore remains the ever-patient curators themselves. They kindly help us track down information about the items we’re putting on Digital Bodleian from a sometimes-daunting array of potential sources, put us in touch with other experts where required, and are always ready to answer our questions when we need something clarified. This has been enormously helpful in providing descriptions for our new additions to the site.

With this assistance, and the help of our colleagues in the Imaging Studio, who provide similar expertise in tracking down the images, and try hard to squeeze in time to photograph items from the aforementioned wishlist, we’ve managed to get 25 new treasures onto Digital Bodleian since Emma’s post, on top of all the ongoing new photography and migration projects. This totals around 9,300 images altogether, and we have more items on the way (due soon are a couple of Mesoamerican codices and an Old Sundanese text printed on palm leaves from Java). Slowly, we’re closing the gap.

A selection of recent items we’ve dug up from our archives:

MS. Ashmole 304
MS. Ashmole 399
MS. Auct. D. inf. 2. 11
MS. Canon. Bibl. Lat. 61
MS. Canon. Misc. 213
MS. Canon. Misc. 378
MS. Douce 112
MS. Douce 134
MS. Douce 40
MS. Holkham misc. 49
MS. Lat. liturg. e. 17
MS. Lat. liturg. f. 2
MS. Laud Misc. 108
MS. Tanner 307


*Currently live are catalogues of medieval manuscripts, Hebrew manuscripts, Genizah fragments, and union catalogues of Islamicate manuscripts and Shan Buddhist manuscripts in the United Kingdom. Catalogues of Georgian and Armenian manuscripts, to an older TEI standard, are still online and are currently undergoing conversion work. Similar, non-TEI-based resources for Incunables and some of our Chinese Special Collections are also available.

Guest post: The 6-million-image gap

Bodleian Digital Library Systems and Services’ Digital Curator, Emma Stanford, guest blogs for the DPOC project this week. Emma writes about what she is doing to close some of the 6-million-image gap between what’s in our tape archive and what’s available online at Digital.Bodleian. It’s no small task, but sometimes Emma finds some real gems just waiting to be made available to researchers. She also raises some good questions about what metadata we should make available to researchers to interpret our digitized images. Read more from Emma below.

Thanks to Edith’s hard work, we now know that the Bodleian Imaging Services image archive contains about 5.8 million unique images. This is in addition to various images held on hard drives and other locations around the Bodleian, which bring the total up to almost 7 million. Digital.Bodleian, however, our flagship digital image platform, contains only about 710,000 unique images – a mere tenth of our total image archive. What gives?

That 6-million-image gap consists of two main categories:

Images that are online elsewhere (aka the migration backlog). In the decades before Digital.Bodleian, we tried a number of other image delivery platforms that remain with us today: Early Manuscripts at Oxford University, the Toyota City Imaging Project, the Oxford Digital Library, Luna, etc., etc. Edith has estimated that the non-Digital.Bodleian content comprises about 1.4 million images. Some of these images don’t belong in Digital.Bodleian, either because we don’t have rights to the images (for example, Queen Victoria’s Journals) or because they are incomplete selections rather than full image sets (for example, the images in the Bodleian Treasures exhibition). Our goal is to migrate all the content we can to Digital.Bodleian and eventually shut down most of the old sites. We’ve been chipping away at this task very slowly, but there is a lot left to do.

Images that have never been online. Much of Imaging Services’ work is commercial orders: shooting images for researchers, publishers, journalists, etc. We currently store all these images on tape, and we have a database that records the shelfmark, number of images, and list of captured pages, along with information about when and how the images were captured. Searching through this archive for Digital.Bodleian-appropriate images is a difficult task, though. Shelfmark notation isn’t standardized at all, so there are lots of duplicate records. Also, in many cases, just a few pages from a book or manuscript were captured, or the images were captured in black-and-white or grayscale; either way, not suitable for Digital.Bodleian, where we aim to publish fully-digitized works in full colour.
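To make the duplicate-record problem concrete, here is a minimal Python sketch of how variant spellings of one shelfmark might be collapsed to a single comparison key. The normalization rules, the `shelfmark` and `images` field names, and the sample records are all assumptions made for illustration; they are not the actual schema or contents of the Imaging Services database.

```python
import re
from collections import defaultdict


def normalize_shelfmark(raw: str) -> str:
    """Reduce a shelfmark to a crude comparison key (illustrative rules only)."""
    key = raw.casefold()
    # Treat punctuation as spacing, so "MS." and "MS" compare equal.
    key = re.sub(r"[^\w\s]", " ", key)
    # Insert a space wherever a letter run meets a digit run ("douce112").
    key = re.sub(r"(?<=[a-z])(?=\d)|(?<=\d)(?=[a-z])", " ", key)
    # Collapse any repeated whitespace.
    return " ".join(key.split())


def group_records(records):
    """Group order records whose shelfmarks share a normalized key."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize_shelfmark(rec["shelfmark"])].append(rec)
    return dict(groups)


# Invented sample records: three spellings of one shelfmark, plus another item.
records = [
    {"shelfmark": "MS. Douce 112", "images": 12},
    {"shelfmark": "MS Douce 112", "images": 340},
    {"shelfmark": "ms.douce112", "images": 3},
    {"shelfmark": "MS. Tanner 307", "images": 5},
]

groups = group_records(records)
for key, recs in groups.items():
    print(key, [r["images"] for r in recs])
```

A key this crude will certainly merge or split some records wrongly, so in practice the grouped records would probably still need checking by eye.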

I’m working on extracting a list of complete, full-colour image sets from this database. In the meantime, we’ve started approaching the problem from the other direction: creating a list of items that we’d like to have on Digital.Bodleian, and then searching the archive for images of them. To do this, we asked the Bodleian’s manuscript and rare book curators to share with us their lists of “greatest hits”: the Bodleian’s most valuable, interesting, and/or fragile holdings, which would benefit most from online surrogates. We then began going through this list searching for the shelfmarks in the image archive. Mostly, we’ve found only a few images for each shelfmark, but occasionally we hit the jackpot: a complete, full-colour image set of a 13th-century bestiary or a first edition of a Shakespeare play.
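The wishlist-first search described above could be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: the `ImageSet` fields, the colour-mode labels, the idea that a total page count is available for comparison, and the sample shelfmarks are all hypothetical rather than taken from the real archive database.

```python
from dataclasses import dataclass


@dataclass
class ImageSet:
    shelfmark: str
    captured_pages: int  # pages photographed for this record
    total_pages: int     # assumed to be known from a catalogue (hypothetical field)
    colour_mode: str     # "colour", "greyscale" or "bw" (hypothetical labels)


def is_candidate(image_set: ImageSet) -> bool:
    """A set qualifies only if it is full colour and covers every page."""
    return (
        image_set.colour_mode == "colour"
        and image_set.captured_pages >= image_set.total_pages
    )


def find_candidates(wishlist, archive):
    """Return archive records for wishlist shelfmarks that are complete and in colour."""
    wanted = {shelfmark.casefold() for shelfmark in wishlist}
    return [
        s for s in archive
        if s.shelfmark.casefold() in wanted and is_candidate(s)
    ]


# Invented shelfmarks and page counts, for illustration only.
archive = [
    ImageSet("MS. Example 1", 200, 200, "colour"),     # complete, full colour
    ImageSet("MS. Example 2", 4, 180, "colour"),       # only a few pages captured
    ImageSet("MS. Example 3", 150, 150, "greyscale"),  # complete, but not colour
    ImageSet("MS. Example 4", 90, 90, "colour"),       # complete, but not on the wishlist
]
wishlist = ["MS. Example 1", "MS. Example 2", "MS. Example 3"]

candidates = find_candidates(wishlist, archive)
print([s.shelfmark for s in candidates])
```

The exact-match lookup here assumes the shelfmarks are already spelled consistently, which, as noted above, is rarely the case in the archive.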

Going through the archives in this way has underlined for me just how much the Bodleian’s imaging standards have changed in the last two decades. File size has increased, of course, as higher-resolution digital scanning backs have become available; but changes in lighting equipment, book cradles, processing software, rulers and colour charts have all made their mark on our images too. For me, this has raised the question of whether the technical metadata we’re preserving in our archives, about when and how the images were captured, should also be made available to researchers in some way, so that they can make an informed choice about how to interpret the images they encounter on sites like Digital.Bodleian.

In the meantime, here are some of the image sets we’ve pulled out of the archive and put online so far:

Jane Austen’s juvenilia
a 13th-century bestiary
the Oxford Catullus

MS. Bodl. 764, fol. 2r (detail)

