Project update: available datasets

Oxford’s Outreach and Training Fellow, Sarah, announces the first available datasets from the DPOC project. This is part of the project’s self-archiving initiative, where they will be making sure project outputs have a permanent home.


As the project begins to come to a close (or in my case, maternity leave starts next week), we’ve begun efforts to self-archive the project. We’ll be employing a variety of methods to clean out SharePoint sites and identify records with enduring value to the project. We’ll be crawling websites and Twitter to make sure we have a record for future digital preservation projects to utilise. Most importantly, we’ll give our project outputs a long-term home so they can be reused as necessary.

That permanent home is, of course, our institutional repositories. Our conference papers, presentations, posters, monograph chapters and journal articles will rest there. So will numerous datasets, and records of reports and other material that will be embargoed. I’ve already started depositing my datasets into ORA (Oxford University Research Archive).

There are two new datasets now available for download:

You can also find links to them on the Project Resources page. As more project outputs are made available through institutional repositories, we’ll be making more announcements. And at the end of the project, we’ll do a full blog post on how we self-archived the DPOC project, so that the knowledge gained will not be lost after the project ends.


Any tips for how you self-archive a project? Share them in the comments.

Project update

A project update from Edith Halvarsson, Policy and Planning Fellow at Bodleian Libraries. 


Ms Arm.e.1, Folio 23v

Bodleian Libraries’ new digital preservation policy is now available to view on our website, after having been approved by Bodleian Libraries’ Round Table earlier this year.

The policy articulates Bodleian Libraries’ approach and commitment to digital preservation:

“Bodleian Libraries preserves its digital collections with the same level of commitment as it has preserved its physical collections over many centuries. Digital preservation is recognized as a core organizational function which is essential to Bodleian Libraries’ ability to support current and future research, teaching, and learning activities.”

 

More of Bodleian Libraries’ policies and reports are available to read on the website.

In other related news, we are currently in the process of ratifying a GLAM (Gardens, Libraries and Museums) digital preservation strategy, which is due for release after the summer. Our new digitization policy is also in the pipeline and will be made publicly available. Follow the DPOC blog for future updates.

Digital Preservation Roadshow – Part 2

Building on the success of CUL’s digital preservation roadshow kit, the Oxford fellows have begun assembling a local version. The kit is a mixture of samples of old hardware, storage technology, quiz activities, and general “digital preservation swag”.

Pens, pins, and a BBC Micro

We were able to trial run it as part of a GLAM (Gardens, Libraries and Museums) showcase at the Weston Library this January. Among the showcase attendees’ favourite items were an early floppy disk camera (c.1998) and our BBC Micro Computer (1981).

Sony Digital Mavica (MVC-FD7) 

Technical Fellow James Mooney at the Oxford GLAM Showcase

Our floppy disk camera was among the first in Sony’s Mavica “FD” series. Sony produced 3.5″ floppy disk cameras from late 1997 until 2002 (when it moved on to the Mavica CD series). The MVC-FD7 takes 8-bit images which can easily be transferred to a home computer. This is one of the reasons the Mavica FD series was so popular: the FAT12 file system and the widespread adoption of 3.5″ floppy disk drives in computers made transfer a simple and quick task.

It is easy to forget that the floppy disk camera is really the grandfather of the microSD card!
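For anyone now faced with reading old Mavica floppies, it helps to know what “FAT12” means in practice. The sketch below is illustrative only: it parses the BIOS Parameter Block at the start of a FAT boot sector and applies the cluster-count rule from Microsoft’s FAT specification (fewer than 4,085 data clusters means FAT12). It assumes you already have a raw 512-byte boot sector from a disk image, and the class and function names are hypothetical; it is not a full file-system reader and does not handle the FAT32 layout.

```python
import struct
from dataclasses import dataclass

# Little-endian layout of the first 36 bytes of a FAT boot sector:
# jump instruction, OEM name, then the BIOS Parameter Block (BPB) fields.
BPB_FORMAT = "<3s8sHBHBHHBHHHII"


@dataclass
class BootSector:
    bytes_per_sector: int
    sectors_per_cluster: int
    reserved_sectors: int
    num_fats: int
    root_entries: int
    total_sectors: int
    sectors_per_fat: int

    @property
    def cluster_count(self) -> int:
        """Number of data clusters, computed as in Microsoft's FAT specification."""
        # Each root directory entry is 32 bytes; round up to whole sectors.
        root_dir_sectors = (self.root_entries * 32 + self.bytes_per_sector - 1) // self.bytes_per_sector
        first_data_sector = (self.reserved_sectors
                             + self.num_fats * self.sectors_per_fat
                             + root_dir_sectors)
        return (self.total_sectors - first_data_sector) // self.sectors_per_cluster

    @property
    def is_fat12(self) -> bool:
        # The FAT type is decided by cluster count alone; FAT12 means fewer than 4085.
        return self.cluster_count < 4085


def parse_boot_sector(sector: bytes) -> BootSector:
    """Parse the BPB from a raw 512-byte boot sector (FAT12/FAT16 layout only)."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot-sector signature")
    (_jump, _oem, bytes_per_sector, sectors_per_cluster, reserved, num_fats,
     root_entries, total16, _media, sectors_per_fat, _per_track, _heads,
     _hidden, total32) = struct.unpack_from(BPB_FORMAT, sector)
    # A 16-bit total of 0 means the real sector count is in the 32-bit field.
    return BootSector(bytes_per_sector, sectors_per_cluster, reserved, num_fats,
                      root_entries, total16 or total32, sectors_per_fat)
```

You would feed it the first 512 bytes of a disk image file (for example `open("disk.img", "rb").read(512)`, where the file name is hypothetical). Assuming a standard 1.44 MB DOS-formatted floppy (512-byte sectors, 2 FATs of 9 sectors, 224 root entries, 2,880 sectors), the calculation gives 2,847 clusters, well inside the FAT12 range, which is part of why any PC of the era could read these disks.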

 

 

BBC Micro

The BBC Micro is well known to most British people who went to school in the 1980s and ’90s, but even today some UK classrooms feature a BBC Micro for more nostalgic reasons. The BBC Microcomputer series was designed and built by Acorn for the BBC Computer Literacy Project. Most schools in the UK adopted the system, and for many children BBC BASIC was the first programming language they learnt.

There is to this day a cult following of BBC Micro educational games, such as Granny’s Garden (1983).


The kit will be displayed in different Oxford libraries throughout 2018 to promote the DPOC training programme and raise awareness of Bodleian Libraries’ new digital preservation policy.


The vision for a preservation repository

Over the last couple of months, work at Cambridge University Library has begun to explore what a potential digital preservation system might look like, considering technical infrastructure, the key stakeholders and the policies underpinning them. Technical Fellow Dave tells us more about the holistic vision…


This post discusses some of the work we’ve been doing to lay foundations beneath the requirements for a ‘preservation system’ here at Cambridge. In particular, we’re looking at the core vision for the system. It comes with the standard ‘work in progress’ caveats – do not be surprised if the actual vision varies slightly (or more) from what’s discussed here. A lot of the below comes from Mastering the Requirements Process by Suzanne and James Robertson.

Also – it’s important to note that what follows is based upon a holistic definition of ‘system’ – a definition that’s more about what people know and do, and less about Information Technology, bits of tin and wiring.

Why does a system change need a vision?

New systems represent changes to the existing status quo. The vision is like the Pole Star for such a change effort: it ensures that people have something fixed to move towards when they’re buried under minute details. When confusion reigns, you can point to the vision for the system to guide you back to sanity.

Plus, as with all digital efforts, none of this is tangible: there’s no definite, obvious end point to the change. So the vision will help us recognise when we’ve achieved what we set out to do.

Establishing scope and context

Defining what the system change isn’t is a particularly good way of working out what it actually represents. This can be achieved by thinking about the systems around the area you’re changing and the information that’s going to flow in and out. This sort of thinking makes for good diagrams: one that shows how a preservation repository system might sit within the broader ecosystem of digitisation, research outputs / data, digital archives and digital published material is shown below.

System goals

Being able to concisely sum up the key goals of the system is another important part of the vision. This is a lot harder than it sounds, and there’s something journalistic about it: what you leave out is definitely more important than what you keep in. Fortunately, the vision is about broad brush strokes, not detail, which helps at this stage.

I found some great inspiration in Sustainable Economics for a Digital Planet, which indicated goals such as: “the system should make the value of preserving digital resources clear”, “the system should clearly support stakeholders’ incentives to preserve digital resources” and “the functional aspects of the system should map onto clearly-defined preservation roles and responsibilities”.

Who are we implementing this for?

The final main part of the ‘vision’ puzzle is the stakeholders: who is going to benefit from a preservation system? Who might not benefit directly, but really cares that one exists?

Any significant project is likely to have a LOT of these, so the Robertsons suggest breaking the list down by proximity to the system (using Ian Alexander’s Onion Model), from the core team that uses the system, through the ‘operational work area’ (i.e. those with the need to actually use it) and out to interested parties within the host organisation, and then those in the wider world beyond. An initial attempt at thinking about our stakeholders this way is shown below.

One important thing that we realised was that it’s easy to confuse ‘closeness’ with ‘importance’: there are some very important stakeholders in the ‘wider world’ (e.g. Research Councils or historians) that need to be kept in the loop.

A proposed vision for our preservation repository

After iterating through all the above a couple of times, the current working vision (subject to change!) for a digital preservation repository at Cambridge University Library is as follows:

The repository is the place where the best possible copies of digital resources are stored, kept safe, and have their usefulness maintained. Any future initiatives that need the most perfect copy of those resources will be able to retrieve them from the repository, if authorised to do so. At any given time, it will be clear how the digital resources stored in the repository are being used, how the repository meets the preservation requirements of stakeholders, and who is responsible for the various aspects of maintaining the digital resources stored there.

Hopefully this will give us a clear concept to refer back to as we delve into more detail throughout the months and years to come…

DPOC: 1 year on

Oxford’s Outreach & Training Fellow, Sarah, reflects on how the first year of the DPOC project has gone and looks forward to the big year ahead.


A lot can happen in a year.

A project can finally get a name, a website can launch and a year of auditing can finally reach completion. It has been a long year of lessons and finding things for the Oxford DPOC team.

While the project names DR@CO and PADLOC never got off the ground, we did get the DPOC Project. And with it has come a better understanding of our digital preservation practices at Bodleian Libraries. We’re starting year two with plenty of informed ideas that will lead to roadmaps for implementation and a business case to help continue to move Oxford forward with a digital preservation programme.

Auditing our collections

For the past year, the Fellows have been auditing our many collections. The Policy and Planning Fellow spent nearly 6 months tracking down the digitized content of Bodleian Libraries across tape storage and many legacy websites. There was more to be found on hard drives under desks, on network drives and on CDs. What Edith found was 20 years of digitized images at Bodleian Libraries. From that came a roadmap and recommendations to improve storage, access and workflows. Changes have already been made to the digitization workflow (we now use jpylyzer instead of JHOVE) and more changes are in progress.

James, the Technical Fellow at Oxford, has been looking at validating and characterising the TIFFs we have stored on tape, especially the half a million TIFFs from the Polonsky Foundation Digitization Project. Not only were there challenges in recovering the files from tape to disk for the characterisation and validation process, but there were also issues with customising JHOVE’s XML output. In the end James did find a workaround for getting the outputs into a reporting tool for assessment, but not without plenty of trial and error. However, we’re learning more about our digitized collections (and the preservation challenges facing them), and during year 2 we’ll be writing more about that as we continue to roadmap our future digital preservation work.
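The post doesn’t say what James’s workaround was, so the following is not it. It is a minimal sketch of one common approach: flattening a JHOVE XML report into CSV rows that a spreadsheet or reporting tool can ingest. It assumes the report uses JHOVE’s usual `<repInfo>` elements with `<format>`, `<status>` and `<messages>` children, and that error messages carry `severity="error"`; because the JHOVE namespace URI has changed between releases, the code deliberately ignores namespaces. The function names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop any '{namespace}' prefix, since the JHOVE namespace differs between releases."""
    return tag.rsplit("}", 1)[-1]


def summarise_jhove(xml_text: str) -> list[dict]:
    """Flatten each <repInfo> in a JHOVE XML report into one summary row."""
    root = ET.fromstring(xml_text)
    rows = []
    for rep in root.iter():
        if local(rep.tag) != "repInfo":
            continue
        row = {"uri": rep.get("uri", ""), "format": "", "status": "", "errors": 0}
        for child in rep:
            name = local(child.tag)
            if name in ("format", "status"):
                row[name] = (child.text or "").strip()
            elif name == "messages":
                # Count only messages JHOVE flags as errors (assumed attribute).
                row["errors"] = sum(1 for msg in child if msg.get("severity") == "error")
        rows.append(row)
    return rows


def to_csv(rows: list[dict]) -> str:
    """Serialise the summary rows as CSV for a spreadsheet or reporting tool."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["uri", "format", "status", "errors"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

The input would come from running JHOVE with its XML handler, something like `jhove -m TIFF-hul -h xml -o report.xml <file>`; one flat row per file makes it easy to filter for anything that is not “Well-Formed and valid”.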

Auditing our skills

I spoke to a lot of staff and ran an online survey to understand the training needs of Bodleian Libraries. It is clear that we need to develop a strong awareness of digital preservation and its fundamental importance to the long-term accessibility of our digital collections. We also need to create a strong shared language in order to have these important discussions; this matters when we are coming together from several different disciplines, each with its own language. As a result, some training has begun to get staff thinking about the risks surrounding the digital content they use every day, so that they can later apply that thinking to our collections. The training and skills gaps identified from the surveys done in year 1 will continue to inform the training work coming in year 2.

 

What is planned for year 2?

Now that we have a clearer picture of where we are and what challenges are facing us, we’ve been putting together roadmaps and risk registers. This is allowing us to look at what implementation work we can do in the next year to set us up for the work of the next 3, 5, 10, and 15 years. There are technical implementations we have placed into a roadmap to address the major risks highlighted in our risk register. This work is hopefully going to include things like implementing PREMIS metadata and file format validation. This work will prepare us for future preservation planning.
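To make “implementing PREMIS metadata” a little more concrete, here is a minimal sketch of a PREMIS 3 object record for a single file, carrying a SHA-256 fixity value, its size and a format name. It is not the project’s planned implementation: the “local” identifier type, the choice of SHA-256 and the function names are assumptions for illustration, and the output has not been validated against the PREMIS schema here.

```python
import hashlib
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("premis", PREMIS_NS)
ET.register_namespace("xsi", XSI_NS)


def p(tag: str) -> str:
    """Qualify a tag name with the PREMIS 3 namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def premis_file_object(identifier: str, content: bytes, format_name: str) -> ET.Element:
    """Build a minimal PREMIS <object> for one file, with a SHA-256 fixity value."""
    obj = ET.Element(p("object"), {f"{{{XSI_NS}}}type": "premis:file"})

    ident = ET.SubElement(obj, p("objectIdentifier"))
    ET.SubElement(ident, p("objectIdentifierType")).text = "local"  # hypothetical scheme
    ET.SubElement(ident, p("objectIdentifierValue")).text = identifier

    chars = ET.SubElement(obj, p("objectCharacteristics"))
    ET.SubElement(chars, p("compositionLevel")).text = "0"  # no extra compression/encryption layer
    fixity = ET.SubElement(chars, p("fixity"))
    ET.SubElement(fixity, p("messageDigestAlgorithm")).text = "SHA-256"
    ET.SubElement(fixity, p("messageDigest")).text = hashlib.sha256(content).hexdigest()
    ET.SubElement(chars, p("size")).text = str(len(content))

    fmt = ET.SubElement(chars, p("format"))
    designation = ET.SubElement(fmt, p("formatDesignation"))
    ET.SubElement(designation, p("formatName")).text = format_name
    return obj


if __name__ == "__main__":
    record = premis_file_object("example-0001.tif", b"example bytes", "TIFF")
    print(ET.tostring(record, encoding="unicode"))
```

Recording the checksum at ingest is what later makes file format validation and fixity checking auditable: a re-computed digest that no longer matches the stored one is a clear signal that something has changed.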

We also have a training programme roadmap and implementation timeline. While not all of the training can be completed in year 2 of the DPOC project, a start can be made and materials prepared for a future training programme. This includes developing a training roadmap to support the technical implementations roadmap and the overall digital preservation roadmap.

There is also the first draft of our digital preservation policy to workshop with key stakeholders and develop into a final draft. There are roles and responsibilities to review and key stakeholders to work with if we want to make sustainable changes to our existing workflows.

Ultimately, what we are working towards is an organisational change. We want more people to think about digital preservation in their work. We are putting forward sustainable recommendations to help develop an ongoing digital preservation programme. There is still a lot of work ahead of us, well beyond the final year of this project, but we are hoping that what we have started will keep going even after the project reaches completion.

 

 

Visit to the Parliamentary Archives: Training and business cases

Edith Halvarsson, Policy and Planning Fellow at Bodleian Libraries, writes about the DPOC project’s recent visit to the Parliamentary Archives.


This week the DPOC fellows visited the Parliamentary Archives in London. Thank you very much to Catherine Hardman (Head of Preservation and Access), Chris Fryer (Digital Archivist) and Grace Bell (Digital Preservation Trainee) for having us. Shamefully I have to admit that we have been very slow to make this trip; Chris first invited us to visit all the way back in September last year! However, our tardiness to make our way to Westminster was in the end aptly timed with the completion of year one of the DPOC project and planning for year 2.

Like CUL and Bodleian Libraries, the Parliamentary Archives first began with their own Digital Preservation Project, back in 2010. Their project has since transitioned into digital preservation in a more programmatic capacity as of 2015. As CUL and Bodleian Libraries will begin drafting business cases for moving from project to programme in year 2, meeting with Chris and Catherine was a good opportunity to talk about how you start making that tricky transition.

Of course, every institution has its own drivers and risks which influence business cases for digital preservation, but there are certain things which will sound familiar to a lot of organisations. For example, what the Parliamentary Archives have found over the past seven years is that advocacy for digital collections and training staff in digital preservation skills are ongoing activities. Implementing solutions is one thing, whereas maintaining them is another. This, in addition to staff who have received digital preservation training eventually moving on to new institutions, means that you constantly need to stay on top of advocacy and training. Making “the business case” is therefore not a one-off task.

Another central challenge in building business cases is how you frame digital preservation as a service rather than as “an added burden”. The idea of “seamless preservation” with no human intervention is a very appealing one to already burdened staff, but in reality workflows need to be supervised and maintained. To sell digital preservation, that extra work must therefore be perceived as something which adds value to collection material and the organisation. It is clear that physical preservation adds value to collections, but the argument for digital preservation can be a harder sell.

Catherine did, however, have some encouraging comments on how we can attempt to turn advice about digital preservation into something which is perceived as adding value. Being involved with, and talking to, staff early on in the design of new project proposals, rather than as an add-on after processes are already in place, is one example of this.

Image by James Mooney

All in all, it has been a valuable and encouraging visit to the Parliamentary Archives. The DPOC fellows look forward to keeping in touch, particularly to hear more about the great work the Parliamentary Archives have been doing to provide digital preservation training to staff!

What is holding us back from change?

There are worse spots for a meeting. Oxford. Photo by: S. Mason

Every 3 months the DPOC team gets together in person in either Oxford, Cambridge or London (there’s also been talk of taking a meeting at Bletchley Park sometime). As this is a collaborative effort, these meetings offer a rare opportunity to work face-to-face instead of via Skype, with its endless issues around screen sharing and poor connections. Good ideas come when we get to sit down together.

As our next joint board meeting is next week, it was important to look over the work of the past year and make sure we are happy with the plan for year two. Most importantly, we wanted to discuss the messages we need to give our institutions as we look towards the sustainability of our digital preservation activities. How do we ensure that earlier work, and the work we are doing now, does not get repeated in two to five years’ time?

Silos in institutions

This is especially complicated when dealing with institutions like Oxford and Cambridge. We are big and old institutions with teams often working in silos. What does siloing have an effect on? Well, everything. Communication, effort, research—it all suffers. Work done previously is done again. Over and over.

The same problems are being tackled within different silos; this is duplicated and wasted effort if they are not communicating their work to each other. This means that digital preservation efforts can be fractured and imbalanced if institutional collaboration is ignored. We have an opportunity and responsibility in this project to get people together and to get them to talk openly about the digital preservation problems they are each trying to tackle.

Managers need to lead the culture change in the institution

While this is not always the case, it is important that managers do not just sit back and say “you will never get this to work” or “it has always been this way.” We need them on our side; they are often the gatekeepers of silos. We have to bring them together in order to start opening up the silos.

It is within their power to be the agents of change; we have to empower them to believe in changing the habits of our institution. They have to believe that digital preservation is worth it if their teams are going to believe it too.

This might be the ‘carrot and stick’ approach or the ‘carrot’ only, but whatever approach is used, there are a number of points we agreed needed to be made clear:

  • our digital collections are significant and we have made assurances about their preservation and long term access
  • our institutional reputation plays a role in the preservation of our digital assets
  • digital preservation is a moving target and we must be moving with it
  • digital preservation will not be “solved” through this project, but we can make a start; it is important that this is not then the end.

Roadmap to sustainable digital preservation

Backing up any messages is the need for a sustainable roadmap. If you want change to succeed and you want digital preservation to be a core activity, then steps must be actionable and incremental. Find out where you are and where you want to go, and then outline the timeline of steps it will take to get there. Consider using maturity models to set goals for your roadmap, such as Kenney and McGovern’s, Brown’s or the NDSA model. Each is slightly different, and some might be more suitable for your institution than others, so have a look at all of them.

It’s like climbing a mountain. I don’t look at the peak as I walk; it’s too far away and too unattainable. Instead, I look at my feet and the nearest landmark. Every landmark I pass is a milestone and I turn my attention to the next one. Sometimes I glance up at the peak, still in the distance—over time it starts to grow closer. And eventually, my landmark is the peak.

It’s only when I get to the top that I see all of the other mountains I also have to climb. And so I find my landmarks and continue on. I consider digital preservation a bit of the same thing.

What are your suggestions for breaking down the silos and getting fractured teams to work together? 

Over 20 years of digitization at the Bodleian Libraries

Policy and Planning Fellow Edith writes an update on some of her findings from the DPOC project’s survey of digitized images at the Bodleian Libraries.


Between August and December 2016 I collated information about Bodleian Libraries’ digitized collections. As an early adopter of digitization technology, Bodleian Libraries have made digital surrogates of their collections available online since the early 1990s. A particular favourite of mine, and a landmark among the Bodleian Libraries’ early digital projects, is the Toyota Transport Digitization Project (1996), which is still up and running.

At the time of the Toyota Project, digitization was still highly specialised and the Bodleian Libraries opted to outsource the digital part to Laser Bureau London. Laser Bureau ‘digitilised’ 35mm image negatives supplied by Bodleian Libraries’ imaging studio and sent the files over on a big bundle of CDs. 1244 images all in all – which was a massive achievement at the time. It is staggering to think that we could now produce the same many times over in just a day!

Since the Toyota Project’s completion twenty years ago, Bodleian Libraries have continued large-scale digitization activities: in-house via their commercial digitization studio, outsourced to third-party suppliers, and in project partnerships. With generous funding from the Polonsky Foundation, the Bodleian Libraries are now set to add over half a million image surrogates of Special Collections manuscripts to their image portal, Digital.Bodleian.

What happens to 20 years’ worth of digitized material? Since 1996 both Bodleian Libraries and digitization standards have changed massively. Early challenges around storage alone have meant that content inevitably has been squirreled away in odd locations and created to the varied standards of the time. Profiling our old digitized collections is the first step to figuring out how these can be brought into line with current practice and be made more visible to library users.

“So what is the extent of your content?”, librarians from other organisations have asked me several times over the past few months. In the hope that it will be useful for other organisations trying to profile their legacy digitized collections, I thought I would present some figures here on the DPOC blog.

When tallying up our survey data, I came to a total of approximately 134 million master images, primarily in TIFF and JP2 format. In very early digitization projects, however, the idea of ‘master files’ had not yet developed, and in these cases master and access files will often be one and the same.

The largest proportion of content, some 127,000,000 compressed JP2s, was created as part of the Google Books project up to 2009 and is available via Search Oxford Libraries Online. These add up to 45 TB of data. The library further holds three archives totalling 5.8 million images (99.4 TB) of digitized content, primarily created in TIFF by the Bodleian Libraries’ in-house digitization studio. These figures do not include back-ups, which would take us into much bigger numbers.
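For anyone profiling their own legacy collections, it can help to turn counts and totals like these into average file sizes and storage shares. The short calculation below uses only the figures quoted above; it assumes decimal units (1 TB = 10^12 bytes), since the survey does not say whether TB or TiB is meant, and like the survey it excludes back-ups.

```python
TB = 10**12  # assumption: decimal terabytes
MB = 10**6   # decimal megabytes

# (image count, total bytes) as reported in the survey; back-ups excluded.
surveyed = {
    "Google Books JP2": (127_000_000, 45 * TB),
    "In-house studio TIFF": (5_800_000, 99.4 * TB),
}


def mean_file_size_mb(count: int, total_bytes: float) -> float:
    """Average size of a single image, in decimal megabytes."""
    return total_bytes / count / MB


for name, (count, size) in surveyed.items():
    print(f"{name}: {mean_file_size_mb(count, size):.2f} MB per image")

combined = sum(size for _, size in surveyed.values())
jp2_share = surveyed["Google Books JP2"][1] / combined
print(f"JP2 share of the {combined / TB:.1f} TB in these two groups: {jp2_share:.0%}")
```

On these assumptions the compressed JP2s average roughly 0.35 MB each and the studio TIFFs roughly 17 MB each. So the Google Books material makes up about 95% of the images (127 of 134 million) but only about 31% of the bytes across these two groups, which is why image counts alone are a poor guide to storage planning.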

Of the remaining 7 million digitized images which are not from the Google Books project, 2,395,000 are currently made available on a Bodleian Libraries website. In total the survey examined content from 40 website applications and 24 exhibition pages. At the time of the survey, 44% of the images made available online were hosted on Digital.Bodleian, 4% on ODL Greenstone and 1% on Luna. The latter two are currently in the process of being moved onto Digital.Bodleian. At least 6% of content from the sample was duplicated across multiple website applications and is a candidate for deduplication. Another interesting finding from the survey is that JPEG, JP2 (transformed to JPEG on delivery) and GIF are by far the most common access/derivative formats on Bodleian Libraries’ website applications.
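The survey doesn’t say how the duplicates were identified. One simple way to flag byte-identical copies spread across several website applications is to hash every file and group the paths by digest. The sketch below assumes the content has been copied into local directories, and the function names are hypothetical. It will not catch the same image re-encoded as a different derivative (say a JPEG on one site and a JP2 on another); that would need image-similarity comparison instead.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large TIFFs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under `root` by content hash, keeping only groups of two or more."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Each group in the result is a set of byte-identical files. Which copy to keep (for example, the one on Digital.Bodleian) is a curatorial decision the script deliberately leaves to a person.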

The final digitized image survey report has now been reviewed by the Digital Preservation Coalition and is being looked at internally. Stay tuned to hear more in future blog posts!

(Mis)Adventures in guest blogging

Sarah shares her recent DPC guest blogging experience. The post is available to read at: http://www.dpconline.org/blog/beware-of-the-leopard-oxford-s-adventures-in-the-bottom-drawer 


As members of the Digital Preservation Coalition (DPC), we have the opportunity to contribute to their blog on issues in digital preservation. As the Outreach & Training Fellow at Oxford, I take on that task when it’s our turn to contribute.

You would think that because I contribute to this blog regularly, I’d be an old hand at blogging. It turns out that writer’s block can hit at precisely the worst possible time. But I forced out what I could and then turned to the other Fellows at Oxford for support. Edith and James both added their own work to the post.

With a final draft ready, the day approached when we could submit it to the blog. Even the technically minded struggle with technology now and again. First, there was the challenge of uploading images; it only took about 2 or 3 tries, and then I deleted the evidence of my mistakes. Finally, I clicked ‘submit’ and waited for confirmation.

And I waited…

And got sent back to the homepage. Then I got a ‘failure notice’ email that said “I’m afraid I wasn’t able to deliver your message to the following addresses. This is a permanent error; I’ve given up. Sorry it didn’t work out.” What just happened? Did it work or not?

So I tried again….

And again…

And again. I think I submitted 6 more times before I emailed the DPC to ask what I had done wrong. I had done NOTHING wrong, except press ‘submit’ too much. There were as many copies waiting for approval as there were times I had hit ‘submit’. There was no way to delete the evidence, so I couldn’t avoid that embarrassment.

Minus those technological snafus, everything worked and the DPOC team’s first guest blog post is live! You can read the post here for an Oxford DPOC project update.

Now that I’ve got my technological mistakes out of the way, I think I’m ready to continue contributing to the wider digital preservation community through guest blogging. We are a growing (but still relatively small) community, and sharing our knowledge, ideas and experiences freely through blogs is important. We rely on each other to navigate a field where things can be complex and ever-changing. Journals and project websites date quickly, but community-driven and non-profit blogs remain a good source of relevant and immediate information. They are a valuable part of my digital preservation work and I am happy to be giving back.

 

Save Comic Sans

Happy April Fools’ Day! This was the joke post put out by the DPOC team. Though none of the following post is true (Comic Sans is going nowhere so far as we know), it is important to think about the preservation of font files. Have you ever noticed that if a particular font is not installed on your computer, some files can look completely different? Suddenly specialised font files become an important part of the digital file (maintaining its original look and feel), and preserving them becomes important. Just something to think about.


Save Comic Sans!

We were deeply saddened by today’s news that Microsoft Office products will in the future stop supporting the iconic Comic Sans font. The decision comes as a direct reaction to the slow decline in popularity and uptake from the Microsoft user community. The font became a staple in the mid-1990s, but has seen a backlash, particularly from the media industry, over the last few years. Repeated ridicule from leading public relations agencies and graphic designers has inevitably led to the drastic response from Microsoft.

‘Ban Comic Sans’, a fanatic society of typographic purists, have after an extensive smear campaign fought over a 15-year period finally won their case. “Clearly, Comic Sans as a voice conveys silliness, childish naiveté, irreverence, and is far too casual[…]”, they comment gleefully following the news from Microsoft Head Office.

(Above: Propaganda spread by the “group” Ban Comic Sans http://bancomicsans.com/propaganda/)

As preservation professionals and historians, we feel that it is our duty to speak up for all the other lovers of the font: fans who have for years been shamed into silence by the widespread acceptance of these fanatical views. The digital preservation of Comic Sans is not only about safeguarding 20 years of cultural history, but also about doing the right thing for our children and grandchildren. As a small tribute, and as a show of our appreciation, www.dpoc.ac.uk will from now on only blog in Comic Sans. We refuse to say RIP to the font – we say it is time to fight the good fight.

If you have an anecdote about a time you enjoyed Comic Sans – please comment below and show your support. Perhaps we can make a difference together.