DPOC Project reflections

Outreach and Training Fellow, Sarah, shares some of her reflections about the DPOC project as it draws to a close. Note: she wrote this post on a sunny day in September, before she left for maternity leave. She wants everyone to know things may have changed a bit by now.


It’s a sunny, cool autumn day in September. It is my last day before maternity leave. The project will continue on for another three months, but for me this feels like the end. By the time this is posted, it will be a cold winter day in December and the future of DPOC might have changed. I hope the next few months will bring more announcements, more ideas and more changes.

It seems like just yesterday we were picking a name for the project. Suddenly I am depositing datasets and publications as we begin a massive self-archiving component of the project. Things are starting to wrap up and it feels a little strange. So, I am going to take some time to reflect on what I have learned during the project, not just from Oxford but from the wider digital preservation community.

 

People are constantly being undervalued in favour of technological solutions

It is so easy to just run to technology to solve a digital preservation problem. After all, our collections are digital so the solution must therefore be digital. This means that people are constantly being undervalued, overlooked and not given the opportunity to learn in a field that is always advancing. Technology has a place, of course: it gives us our tools. But that’s just it; they are tools. Tools do not use themselves to their own ends. We need people to use them, to check them and to maintain them. Even in digital preservation, people have a place and we need to accept that. I’m not saying that to ensure we all have a job in the future; I’m saying that because people are the ones who make the decisions, run the quality checking processes and write the documentation. Whatever digital preservation may look like in the future, it needs to have people in it. Technology alone won’t save us.

 

Research and time to learn isn’t encouraged enough

Because of the previous point, existing staff are often stretched to capacity, not necessarily by digital preservation work but by digital work in general. It means there’s no time to advance skills or answer complex questions. Things have to get done and that means that something has to get dropped. Unfortunately, that’s always learning and research. In a field that is always changing, our knowledge and skills have to change too. We expect paper conservators to stay up to date with the current treatments, tools and chemicals. We also expect them to rigorously test and experiment before treating any work on paper. We should expect a similar level of research and care for our digital collections. They can be damaged, altered and lost. Just because they can also be copied easily doesn’t mean they are safe from all of that. A look back into your personal digital life is likely proof enough of that. IT departments are not immune to permanent loss; many of them have yet to adopt good digital preservation practices and so are often at risk.

 

Community and collaboration are everything

In the face of resource constraints, it is always the knowledge of the community that gets things done. It’s the open, collaborative nature of a small group of people that means tools and ideas are shared. Work is undertaken collectively and people are generous with their time and expertise. I’m not sure how digital preservation would advance any other way. As it is, it’s a real struggle to get decent investment in it. Even this project was built on collaboration, which underlines how hard it is to do this in isolation. It’s sad to see that project collaboration coming to a close now; there are so many possibilities for working together in the future. And this is what draws me to digital preservation: knowing there are a lot of smart, generous people to always learn from.

 

Do something, no matter how small

Decisions around digital preservation might be hard to make, but make them. Sometimes there’s so much to do that conversations can jump from one thing to the next with little or no focus. Pick something and do it. There will always be more to consider—more collections, more processes, more tools, more people. The problem is that sometimes all we see are all of the problems and every one of them feels incredibly urgent. But looking to tackle all of the problems at once will likely bury you. I will point back again to the resource constraints, but also to the practicality that we cannot start off doing everything. If we could, the DPOC project and projects like it would never exist. The point is: we can’t. So be strategic. Look for the most important, the quick wins, the practicable: start there. Just don’t try to do it all; you may end up doing nothing.

 

So what is next?

Now that the project is concluding, the question is: has digital preservation become business as usual at Bodleian Libraries? The answer is: we’re not quite there yet, but we’re still fighting for it. At the time of writing this post, there are a number of technical projects starting to improve workflows. There will be more collaborative digital preservation work with the GLAM institutions.

But all of it is project work. However, the fact that there are projects still happening at all gives me hope that we can keep advocating for a longer-term, sustainable programme. This message underpins every project and every report we deliver. That is a good place to start.

Devising Your Digital Preservation Policy: Learnings from the DPOC project

On December 4th the DPOC Policy and Planning Fellows ran a joint workshop in London presenting learnings and experiences of policy writing at CUL and Bodleian Libraries. Supporting the event were also Kirsty Lingstadt (Head of the Digital Library at the University of Edinburgh) and Jenny Mitcham (Head of Good Practice at the Digital Preservation Coalition). Kirsty and Jenny talked about their experience of policy writing in other organisational settings, illustrating how policy writing must be tailored to fit specific institutional contexts but that the broad principles remain the same.

In total 30 attendees partook in the workshop which mixed presentations with round table discussions. To make the event as interactive as possible Mentimeter was used to poll attendees on their own experiences of policy writing. Although the survey only represents a small selection of organisations in the process of writing digital preservation policy, the Fellows wanted to share some of the results in the hope that it will facilitate further discussion. Feel free to use the comments section below to let the project team know if the results from the poll seem familiar (or perhaps unfamiliar).


Question: Do you know who to consult on a digital preservation policy (in your organisation)?

Most workshop participants knew who they needed to consult on digital preservation in their organisation and also had a good working relationship with them. This is the first step when starting a new policy – knowing your organisational culture and context.

Being new to their organisations, the DPOC Fellows spent a lot of time early on in the project reaching out to staff across the libraries. If you are also new to your institution, getting to know those who have been there a long time is an important starting point to understanding what type of policy will suit your organisation’s culture before you begin any writing.

Question: What barriers can you see to developing a digital preservation policy (in your organisation)?

‘Time’ was identified by participants as by far the largest barrier to writing new digital preservation policy. And it is true that policy development does take a lot of time if you want the resulting document to be more than ‘just a paper’ which is filed away at the end of the process.

To get staff on board with new policy, allocating resources for policy consultation is therefore crucial, but the effort involved is not always appreciated by senior management. For example, it took the Fellows between one and two years to develop a new digital preservation policy for their organisations, illustrating why it is important to give staff sufficient time to write policy. While policy consultation took a long time, the DPOC Fellows felt that this was a worthwhile investment for their organisations, as time spent consulting on policy was also a great outreach and learning opportunity for the organisations as a whole.

Question: Does your organisation have a policy template?

Most participants did not have an organisation-wide policy template. However, templates are part of policy best practice. A policy template is a skeleton document which outlines high level sections and headlines which should be included in every organisational policy regardless of topic – from an HR policy to a digital preservation policy, they should all follow the same structure. The purpose of having these standardised headlines is to ensure that staff can easily digest and recognise any policy at a quick glance. Templates can also enforce good document management practices.

If you are interested in finding out more, a high level policy template which was developed for the DPOC project can be requested through the DPOC blog contact form or by emailing the Digital Preservation Coalition.

Question: Where are institutional policies published (in your organisation)?

Once the policy is signed off, it is time to publicise it more widely. Among the workshop participants the most common places to publish policies were either on an institutional website or intranet (although there are other options listed in the word cloud).

As a word of caution, make sure that your organisation is consistent in where it publishes policies and ensure that documents are versioned. The international digital preservation policy review which the Fellows undertook in 2016 (analysing 50 different policies) found that most digital preservation policies do not use any document versioning. No versioning, in combination with the proliferation of different policy publication routes in an organisation, will soon become a real issue when staff try to locate up to date documents. (Again, if your organisation has a good policy template in place you can better enforce versioning!)

One option which was listed several times in the word cloud is to publish policy in an institutional repository; this is primarily useful if you do not have a reliable records management system in your organisation. Using a repository means that you can assign a DOI to the policy for persistent referencing, and has the added benefit that the repository copy becomes the clear canonical copy of the policy.

Question: How long will it take to…?

Participants were asked how long (using multiples of months) they think it would take their organisations to:

  • Draft a policy
  • Have it approved
  • Begin implementation of the policy
  • See real impact and benefits in the organisation

As seen from the chart, the drafting of a policy document is only one small aspect of policy and planning work. This is important to remember if you want to avoid your policy becoming just another ‘piece of paper’ that is filed away and not looked at again after it’s been written. Advocacy, communication and implementation plans continue for years to come after the original document has been drafted.


Where next…

To find out more about policy writing during the DPOC project have a look at this recent blog post from CUL’s Policy and Planning Fellow Somaya Langley and at the workshop presentation slides available through the DPC. The Fellows are also happy to take questions through the blog and encourage use of the comments section.

Project update: available datasets

Oxford’s Outreach and Training Fellow, Sarah, announces the first available datasets from the DPOC project. This is part of the project’s self-archiving initiative, where they will be making sure project outputs have a permanent home.


As the project begins to come to a close (or in my case, maternity leave starts next week), we’ve begun efforts to self-archive the project. We’ll be employing a variety of methods to clean out SharePoint sites and identify records with enduring value to the project. We’ll be crawling websites and Twitter to make sure we have a record for future digital preservation projects to utilise. Most importantly, we’ll give our project outputs a long-term home so they can be reused as necessary.

That permanent home is of course our institutional repositories. Our conference papers, presentations, posters, monograph chapters and journal articles will rest there. But so will numerous datasets and records of reports and other material that will be embargoed. I’ve started depositing my datasets already, into ORA (Oxford University Research Archive).

There are two new datasets now available for download:

You can also find links to them on the Project Resources page. As more project outputs are made available through institutional repositories, we’ll be making more announcements. And at the end of the project, we’ll do a full blog post on how we self-archived the DPOC project, so that the knowledge gained will not be lost after the project ends.


Any tips for how you self-archive a project? Share them in the comments.

Project update

A project update from Edith Halvarsson, Policy and Planning Fellow at Bodleian Libraries. 


Ms Arm.e.1, Folio 23v

Bodleian Libraries’ new digital preservation policy is now available to view on our website, after having been approved by Bodleian Libraries’ Round Table earlier this year.

The policy articulates Bodleian Libraries’ approach and commitment to digital preservation:

“Bodleian Libraries preserves its digital collections with the same level of commitment as it has preserved its physical collections over many centuries. Digital preservation is recognized as a core organizational function which is essential to Bodleian Libraries’ ability to support current and future research, teaching, and learning activities.”

 

Click here to read more of Bodleian Libraries’ policies and reports.

In other related news, we are currently in the process of ratifying a GLAM (Gardens, Libraries and Museums) digital preservation strategy which is due for release after the summer. Our new digitization policy is also in the pipeline and will be made publicly available. Follow the DPOC blog for future updates.

Digital Preservation Roadshow – Part 2

Building on the success of CUL’s digital preservation roadshow kit, the Oxford fellows have begun assembling a local version. The kit is a mixture of samples of old hardware, storage technology, quiz activities, and general “digital preservation swag”.

Pens, pins, and a BBC Micro

We were able to trial run it as part of a GLAM (Gardens, Libraries and Museums) showcase at the Weston Library this January. Among the showcase attendees’ favourite items were an early floppy disk camera (c.1998) and our BBC Micro Computer (1981).

Sony Digital Mavica (MVC-FD7) 

Technical Fellow James Mooney at the Oxford GLAM Showcase

Our floppy disk camera was among the first in the Mavica “FD” series from Sony. Sony produced 3.5″ floppy disk cameras from late 1997 until 2002 (when it moved on to Mavica for CD). MVC-FD7 takes 8-bit images which can be easily transferred to a home computer. This is one of the reasons that the Mavica FD series was so popular – the FAT12 file system and widespread adoption of 3.5″ floppy disk drives in computers made transfer a simple and quick task.

It is easy to forget that the floppy disk camera is really the grandfather of the microSD card!

 

 

BBC Micro

The BBC Micro is well known by most British people who went to school in the 1980s and ’90s – but even today some UK classrooms will feature a BBC Micro for more nostalgic reasons. The BBC Microcomputer series was designed and built by Acorn for the BBC Computer Literacy Project. Most schools in the UK adopted the system, and for many children the BBC BASIC programming language was the first one they learnt.

There is to this day a cult following of BBC Micro educational games, such as Granny’s Garden (1983).


The kit will be displayed in different Oxford libraries throughout 2018 to promote the DPOC training programme and raise awareness of Bodleian Libraries’ new digital preservation policy.


The vision for a preservation repository

Over the last couple of months, work at Cambridge University Library has begun to look at what a potential digital preservation system will look like, considering technical infrastructure, the key stakeholders and the policies underpinning them. Technical Fellow, Dave, tells us more about the holistic vision…


This post discusses some of the work we’ve been doing to lay foundations beneath the requirements for a ‘preservation system’ here at Cambridge. In particular, we’re looking at the core vision for the system. It comes with the standard ‘work in progress’ caveats – do not be surprised if the actual vision varies slightly (or more) from what’s discussed here. A lot of the below comes from Mastering the Requirements Process by Suzanne and James Robertson.

Also – it’s important to note that what follows is based upon a holistic definition of ‘system’ – a definition that’s more about what people know and do, and less about Information Technology, bits of tin and wiring.

Why does a system change need a vision?

New systems represent changes to the existing status-quo. The vision is like the Pole Star for such a change effort – it ensures that people have something fixed to move towards when they’re buried under minute details. When confusion reigns, you can point to the vision for the system to guide you back to sanity.

Plus, as with all digital efforts, none of this is real: there’s no definite, obvious end point to the change. So the vision will help us recognise when we’ve achieved what we set out to.

Establishing scope and context

Defining what the system change isn’t is a particularly good way of working out what it actually represents. This can be achieved by thinking about the systems around the area you’re changing and the information that’s going to flow in and out. This sort of thinking makes for good diagrams: one that shows how a preservation repository system might sit within the broader ecosystem of digitisation, research outputs / data, digital archives and digital published material is shown below.

System goals

Being able to concisely sum up the key goals of the system is another important part of the vision. This is a lot harder than it sounds and there’s something journalistic about it – what you leave out is definitely more important than what you keep in. Fortunately, the vision is about broad brush strokes, not detail, which helps at this stage.

I found some great inspiration in Sustainable Economics for a Digital Planet, which indicated goals such as: “the system should make the value of preserving digital resources clear”, “the system should clearly support stakeholders’ incentives to preserve digital resources” and “the functional aspects of the system should map onto clearly-defined preservation roles and responsibilities”.

Who are we implementing this for?

The final main part of the ‘vision’ puzzle is the stakeholders: who is going to benefit from a preservation system? Who might not benefit directly, but really cares that one exists?

Any significant project is likely to have a LOT of these, so the Robertsons suggest breaking the list down by proximity to the system (using Ian Alexander’s Onion Model), from the core team that uses the system, through the ‘operational work area’ (i.e. those with the need to actually use it) and out to interested parties within the host organisation, and then those in the wider world beyond. An initial attempt at thinking about our stakeholders this way is shown below.

One important thing that we realised was that it’s easy to confuse ‘closeness’ with ‘importance’: there are some very important stakeholders in the ‘wider world’ (e.g. Research Councils or historians) that need to be kept in the loop.

A proposed vision for our preservation repository

After iterating through all the above a couple of times, the current working vision (subject to change!) for a digital preservation repository at Cambridge University Library is as follows:

The repository is the place where the best possible copies of digital resources are stored, kept safe, and have their usefulness maintained. Any future initiatives that need the most perfect copy of those resources will be able to retrieve them from the repository, if authorised to do so. At any given time, it will be clear how the digital resources stored in the repository are being used, how the repository meets the preservation requirements of stakeholders, and who is responsible for the various aspects of maintaining the digital resources stored there.

Hopefully this will give us a clear concept to refer back to as we delve into more detail throughout the months and years to come…

DPOC: 1 year on

Oxford’s Outreach & Training Fellow, Sarah, reflects on how the first year of the DPOC project has gone and looks forward to the big year ahead.


A lot can happen in a year.

A project can finally get a name, a website can launch and a year of auditing can finally reach completion. It has been a long year of lessons and finding things for the Oxford DPOC team.

While project DR@CO and PADLOC never got off the ground, we got the DPOC Project. And with it has come a better understanding of our digital preservation practices at Bodleian Libraries. We’re starting year two with plenty of informed ideas that will lead to roadmaps for implementation and a business case to help continue to move Oxford forward with a digital preservation programme.

Auditing our collections

For the past year, Fellows have been auditing the many collections. The Policy and Planning Fellow spent nearly 6 months tracking down the digitized content of Bodleian Libraries across tape storage and many legacy websites. There was more to be found on hard drives under desks, on network drives and CDs. What Edith found was 20 years of digitized images at Bodleian Libraries. From that came a roadmap and recommendations to improve storage, access and workflows. Changes have already been made to the digitization workflow (we use jpylyzer now instead of JHOVE) and more changes are in progress.

James, the Technical Fellow at Oxford, has been looking at validating and characterising the TIFFs we have stored on tape, especially the half a million TIFFs from the Polonsky Foundation Digitization Project. There were not only some challenges in recovering the files from tape to disk for the characterisation and validation process, but there was also an issue with customising the XML output from JHOVE. James did find a workaround for getting the outputs into a reporting tool for assessment in the end, but not without plenty of trial and error. However, we’re learning more about our digitized collections (and the preservation challenges facing them) and during year 2 we’ll be writing more about that as we continue to roadmap our future digital preservation work.
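To give a flavour of what getting validation output into a reporting tool can involve, here is a minimal sketch in Python. It is not James’s actual workaround, which this post does not describe. It assumes output in the general shape of JHOVE’s XML (one repInfo element per file, with format, version, status and message children) and flattens it into CSV rows. The sample document, the file paths and the function names are all made up for illustration, and real JHOVE output carries far more detail.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Simplified, made-up sample in the general shape of JHOVE's XML output.
# The file paths are hypothetical.
SAMPLE = """<jhove xmlns="http://hul.harvard.edu/ois/xml/ns/jhove" name="Jhove">
  <repInfo uri="example/0001.tif">
    <format>TIFF</format>
    <version>6.0</version>
    <status>Well-Formed and valid</status>
  </repInfo>
  <repInfo uri="example/0002.tif">
    <format>TIFF</format>
    <status>Not well-formed</status>
    <messages>
      <message severity="error">Premature EOF</message>
    </messages>
  </repInfo>
</jhove>
"""

FIELDS = ["uri", "format", "version", "status", "messages"]


def local(tag):
    """Strip any XML namespace so the code does not depend on one namespace URI."""
    return tag.rsplit("}", 1)[-1]


def child_text(elem, name):
    """Return the text of the first direct child called `name`, or an empty string."""
    for child in elem:
        if local(child.tag) == name:
            return (child.text or "").strip()
    return ""


def summarise(xml_text):
    """Turn each repInfo element into one flat dictionary."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter():
        if local(elem.tag) != "repInfo":
            continue
        messages = [
            (m.text or "").strip() for m in elem.iter() if local(m.tag) == "message"
        ]
        rows.append(
            {
                "uri": elem.get("uri", ""),
                "format": child_text(elem, "format"),
                "version": child_text(elem, "version"),
                "status": child_text(elem, "status"),
                "messages": "; ".join(messages),
            }
        )
    return rows


def to_csv(rows):
    """Write the rows as CSV text that a spreadsheet or reporting tool can load."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(summarise(SAMPLE)))
```

Matching on local element names rather than a fixed namespace is a deliberate choice here, since the namespace URI can differ between tool versions.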

Auditing our skills

I spoke to a lot of staff and ran an online survey to understand the training needs of Bodleian Libraries. It is clear that we need to develop a strong awareness about digital preservation and its fundamental importance to the long-term accessibility of our digital collections. We also need to create a strong shared language in order to have these important discussions; this is important when we are coming together from several different disciplines, each with a different language. As a result, some training has begun in order to get staff thinking about the risks surrounding the digital content we use every day, in order to later translate it into our collections. The training and skills gaps identified from the surveys done in year 1 will continue to inform the training work coming in year 2.

 

What is planned for year 2?

Now that we have a clearer picture of where we are and what challenges are facing us, we’ve been putting together roadmaps and risk registers. This is allowing us to look at what implementation work we can do in the next year to set us up for the work of the next 3, 5, 10, and 15 years. There are technical implementations we have placed into a roadmap to address the major risks highlighted in our risk register. This work is hopefully going to include things like implementing PREMIS metadata and file format validation. This work will prepare us for future preservation planning.
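As a rough illustration of how a risk register can feed a roadmap, here is a small Python sketch. It is not the DPOC risk register: the entries loosely echo issues mentioned in this post, but the scores, the 1 to 5 scales and the likelihood-times-impact ranking are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    description: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (minor) to 5 (severe)
    mitigation: str

    @property
    def score(self) -> int:
        # A common simple convention: score = likelihood x impact.
        return self.likelihood * self.impact


def prioritise(register):
    """Order risks from highest to lowest score to suggest roadmap priorities."""
    return sorted(register, key=lambda risk: risk.score, reverse=True)


# Hypothetical entries; the numbers are illustrative only.
REGISTER = [
    Risk("No preservation metadata recorded", 3, 3, "Implement PREMIS metadata"),
    Risk("Digitized images held only on hard drives and CDs", 4, 4, "Move to managed storage"),
    Risk("TIFFs on tape never validated", 3, 4, "Add file format validation"),
]

if __name__ == "__main__":
    for risk in prioritise(REGISTER):
        print(f"{risk.score:>2}  {risk.description} -> {risk.mitigation}")
```

Sorting by a single score is only a starting point; in practice a roadmap would also weigh cost, dependencies and available staff time.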

We also have a training programme roadmap and implementation timeline. While not all of the training can be completed in year 2 of the DPOC project, a start can be made and materials prepared for a future training programme. This includes developing a training roadmap to support the technical implementations roadmap and the overall digital preservation roadmap.

There is also the first draft of our digital preservation policy to workshop with key stakeholders and develop into a final draft. There are roles and responsibilities to review and key stakeholders to work with if we want to make sustainable changes to our existing workflows.

Ultimately, what we are working towards is an organisational change. We want more people to think about digital preservation in their work. We are putting forward sustainable recommendations to help develop an ongoing digital preservation programme. There is still a lot of work ahead of us (well beyond the final year of this project), but we are hoping that what we have started will keep going even after the project reaches completion.

 

 

Visit to the Parliamentary Archives: Training and business cases

Edith Halvarsson, Policy and Planning Fellow at Bodleian Libraries, writes about the DPOC project’s recent visit to the Parliamentary Archives.


This week the DPOC fellows visited the Parliamentary Archives in London. Thank you very much to Catherine Hardman (Head of Preservation and Access), Chris Fryer (Digital Archivist) and Grace Bell (Digital Preservation Trainee) for having us. Shamefully I have to admit that we have been very slow to make this trip; Chris first invited us to visit all the way back in September last year! However, our tardiness to make our way to Westminster was in the end aptly timed with the completion of year one of the DPOC project and planning for year 2.

Like CUL and Bodleian Libraries, the Parliamentary Archives began with a Digital Preservation Project of their own, back in 2010. As of 2015, their project has transitioned into digital preservation in a more programmatic capacity. As CUL and Bodleian Libraries will begin drafting business cases for moving from project to programme in year 2, meeting with Chris and Catherine was a good opportunity to talk about how you start making that tricky transition.

Of course, every institution has its own drivers and risks which influence business cases for digital preservation, but there are certain things which will sound familiar to a lot of organisations. For example, what Parliamentary Archives have found over the past seven years, is that advocacy for digital collections and training staff in digital preservation skills is an ongoing activity. Implementing solutions is one thing, whereas maintaining them is another. This, in addition to staff who have received digital preservation training eventually moving on to new institutions, means that you constantly need to stay on top of advocacy and training. Making “the business case” is therefore not a one-off task.

Another central challenge in terms of building business cases, is how you frame digital preservation as a service rather than as “an added burden”. The idea of “seamless preservation” with no human intervention is a very appealing one to already burdened staff, but in reality workflows need to be supervised and maintained. To sell digital preservation, that extra work must therefore be perceived as something which adds value to collection material and the organisation. It is clear that physical preservation adds value to collections, but the argument for digital preservation can be a harder sell.

Catherine had, however, some encouraging comments on how we can attempt to turn advice about digital preservation into something which is perceived as value adding.  Being involved with and talking to staff early on in the design of new project proposals – rather than as an extra add on after processes are already in place – is an example of this.

Image by James Mooney

All in all, it has been a valuable and encouraging visit to the Parliamentary Archives. The DPOC fellows look forward to keeping in touch – particularly to hear more about the great work the Parliamentary Archives have been doing to provide digital preservation training to staff!

What is holding us back from change?

There are worse spots for a meeting. Oxford. Photo by: S. Mason

Every 3 months the DPOC team gets together in person in either Oxford, Cambridge or London (there’s also been talk of taking a meeting at Bletchley Park sometime). As this is a collaborative effort, these meetings offer a rare opportunity to work face-to-face instead of via Skype with the endless issues around screen sharing and poor connections. Good ideas come when we get to sit down together.

As our next joint board meeting is next week, it was important to look over the work of the past year and make sure we are happy with the plan for year two. Most importantly, we wanted to discuss the messages we need to give our institutions as we look towards the sustainability of our digital preservation activities. How do we ensure that the earlier work and the work being done by us does not get repeated in 2-5 years’ time?

Silos in institutions

This is especially complicated when dealing with institutions like Oxford and Cambridge. We are big and old institutions with teams often working in silos. What does siloing have an effect on? Well, everything. Communication, effort, research—it all suffers. Work done previously is done again. Over and over.

The same problems are being tackled within different silos; this is duplicated and wasted effort if they are not communicating their work to each other. This means that digital preservation efforts can be fractured and imbalanced if institutional collaboration is ignored. We have an opportunity and responsibility in this project to get people together and to get them to talk openly about the digital preservation problems they are each trying to tackle.

Managers need to lead the culture change in the institution

While not always the case, it is important that managers do not just sit back and say “you will never get this to work” or “it has always been this way.” We need them on our side; they are often the gatekeepers of silos. We have to bring them together in order to start opening the silos.

It is within their power to be the agents of change; we have to empower them to believe in changing the habits of our institution. They have to believe that digital preservation is worth it if their teams are to believe it too.

This might be the ‘carrot and stick’ approach or the ‘carrot’ only, but whatever approach is used, there are a number of points we agreed needed to be made clear:

  • our digital collections are significant and we have made assurances about their preservation and long term access
  • our institutional reputation plays a role in the preservation of our digital assets
  • digital preservation is a moving target and we must be moving with it
  • digital preservation will not be “solved” through this project, but we can make a start; it is important that this is not then the end.

Roadmap to sustainable digital preservation

Backing up any messages is the need for a sustainable roadmap. If you want change to succeed and if you want digital preservation to be a core activity, then steps must be actionable and incremental. Find out where you are, where you want to go and then outline the timeline of steps it will take to get there. Consider using maturity models to set goals for your roadmap, such as Kenney and McGovern’s, Brown’s or the NDSA model. Each is slightly different and some might be more suitable for your institution than others, so have a look at all of them.

It’s like climbing a mountain. I don’t look at the peak as I walk; it’s too far away and too unattainable. Instead, I look at my feet and the nearest landmark. Every landmark I pass is a milestone and I turn my attention to the next one. Sometimes I glance up at the peak, still in the distance—over time it starts to grow closer. And eventually, my landmark is the peak.

It’s only when I get to the top that I see all of the other mountains I also have to climb. And so I find my landmarks and continue on. I think of digital preservation in much the same way.

What are your suggestions for breaking down the silos and getting fractured teams to work together? 

Over 20 years of digitization at the Bodleian Libraries

Policy and Planning Fellow Edith writes an update on some of her findings from the DPOC project’s survey of digitized images at the Bodleian Libraries.


Between August and December 2016 I collated information about Bodleian Libraries’ digitized collections. As an early adopter of digitization technology, Bodleian Libraries have made digital surrogates of their collections available online since the early 1990s. A particular favourite of mine, and a landmark among the Bodleian Libraries’ early digital projects, is the Toyota Transport Digitization Project (1996). [Still up and running here]

At the time of the Toyota Project, digitization was still highly specialised and the Bodleian Libraries opted to outsource the digital part to Laser Bureau London. Laser Bureau ‘digitilised’ 35mm image negatives supplied by Bodleian Libraries’ imaging studio and sent the files over on a big bundle of CDs. 1244 images all in all – which was a massive achievement at the time. It is staggering to think that we could now produce the same many times over in just a day!

Since the Toyota Project’s completion twenty years ago, Bodleian Libraries have continued large scale digitization activities in-house via their commercial digitization studio, outsourced to third party suppliers, and in project partnerships. With generous funding from the Polonsky Foundation the Bodleian Libraries are now set to add over half a million image surrogates of Special Collection manuscripts to their image portal – Digital.Bodleian.

What happens to 20 years’ worth of digitized material? Since 1996 both Bodleian Libraries and digitization standards have changed massively. Early challenges around storage alone have meant that content inevitably has been squirreled away in odd locations and created to the varied standards of the time. Profiling our old digitized collections is the first step to figuring out how these can be brought into line with current practice and be made more visible to library users.

“So what is the extent of your content?”, librarians from other organisations have asked me several times over the past few months. In the hope that it will be useful for other organisations trying to profile their legacy digitized collections, I thought I would present some figures here on the DPOC blog.

When tallying up our survey data, I came to a total of approximately 134 million master images, primarily in TIFF and JP2 format. For very early digitization projects, however, the idea of ‘master files’ was not yet developed, so master and access files will, in these cases, often be one and the same.

The largest proportion of content, some 127,000,000 compressed JP2s, was created as part of the Google Books project up to 2009 and is available via Search Oxford Libraries Online. These add up to 45 TB of data. The library further holds three archives of digitized image content, totalling 5.8 million images (99.4 TB), primarily created in TIFF by the Bodleian Libraries’ in-house digitization studio. These figures do not include back-ups – with which we start getting into quite big numbers.

Of the remaining 7 million digitized images which are not from the Google Books project, 2,395,000 are currently made available on a Bodleian Libraries website. In total the survey examined content from 40 website applications and 24 exhibition pages. 44% of the images which are made available online were, at the time of the survey, hosted on Digital.Bodleian, 4% on ODL Greenstone and 1% on Luna. The latter two are currently in the process of being moved onto Digital.Bodleian. At least 6% of content from the sample was duplicated across multiple website applications and is a candidate for deduplication. Another interesting fact from the survey is that JPEG, JP2 (transformed to JPEG on delivery) and GIF are by far the most common access/derivative formats on Bodleian Libraries’ website applications.
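For anyone wanting to run a similar back-of-the-envelope check on their own survey figures, here is a minimal Python sketch using only the numbers quoted above. The function and variable names are made up for illustration, the figures are the approximate ones from this post, and the average file sizes assume decimal terabytes (1 TB = 10^12 bytes), which the survey itself does not specify.

```python
# Approximate figures as quoted in this post (not exact survey data).
TOTAL_MASTER_IMAGES = 134_000_000
GOOGLE_BOOKS_JP2 = 127_000_000
GOOGLE_BOOKS_TB = 45
IN_HOUSE_IMAGES = 5_800_000
IN_HOUSE_TB = 99.4
ONLINE_IMAGES = 2_395_000

BYTES_PER_TB = 10**12  # assumption: decimal terabytes
BYTES_PER_MB = 10**6   # assumption: decimal megabytes


def average_mb(images: int, terabytes: float) -> float:
    """Rough average file size in MB for a collection."""
    return terabytes * BYTES_PER_TB / images / BYTES_PER_MB


def survey_summary() -> dict:
    """Derive a few headline numbers from the quoted figures."""
    non_google = TOTAL_MASTER_IMAGES - GOOGLE_BOOKS_JP2
    return {
        "non_google_images": non_google,
        "online_share_of_non_google": ONLINE_IMAGES / non_google,
        "avg_google_jp2_mb": average_mb(GOOGLE_BOOKS_JP2, GOOGLE_BOOKS_TB),
        "avg_in_house_tiff_mb": average_mb(IN_HOUSE_IMAGES, IN_HOUSE_TB),
    }


if __name__ == "__main__":
    for name, value in survey_summary().items():
        print(f"{name}: {value:,.2f}")
```

Treat the results as rough indicators only: the inputs are rounded, and the gap between the average compressed JP2 and the average in-house TIFF mostly reflects the difference in format and compression.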

The final digitized image survey report has now been reviewed by the Digital Preservation Coalition and is being looked at internally. Stay tuned to hear more in future blog posts!