Cambridge University Libraries inaugural Digital Preservation Policy

The inaugural Cambridge University Libraries Digital Preservation Policy was published last week. Somaya Langley (Cambridge Policy & Planning Fellow) provides some insight into the policy development process and announces a policy event in London, presented in collaboration with Edith (Oxford Policy & Planning Fellow), to be held in early December 2018.


In December 2016, I started the digital preservation policy development process for Cambridge University Library (CUL), which has finally culminated in a published policy.

Step one

I began with a ‘quick and dirty’ policy gap analysis at CUL. What I discovered was not so much that there were a few gaps in the existing policy landscape, but rather that there was a dearth of much-needed policies. The gap analysis found that a few key policies did exist for different audiences (some intended to guide CUL, some to guide researchers and some meant for all staff and researchers working at the University of Cambridge). While my counterpart at Oxford found duplication in policies across the Bodleian Libraries and the University of Oxford, I mostly found chasms.

Next step

The second step in the policy development process was to meet an immediate need from staff by adding some “placeholder” digital preservation statements to the Collection Care and Conservation Policy, which was under review at the time. In the longer term, it might be ideal to have a single preservation policy (encompassing the conservation and preservation of both physical and digital collection items), but CUL’s digital preservation maturity and skill capabilities are too low at present. Attention needed to be drawn specifically to how to manage digital content, hence the need for a separate Cambridge University Libraries Digital Preservation Policy.

That said, like everything else I’ve been doing at Cambridge, policy needed to be addressed holistically. I was able to spend about two full weeks of work (spread over several months in early 2017) contributing to the review of the Collection Care and Conservation Policy. As a result, it now includes some statements that will support better care for digital (and audiovisual) content still remaining on carriers that are yet to be transferred.

Collaborative development

Then in June 2017 we moved on to undertaking policy development collaboratively. Part of this was an international digital preservation policy review, looking at dozens of different policies (and some strategies). Edith wrote about the policy development process back in the middle of last year.

The absolute lion’s share of the work was carried out by my Oxford counterparts, Edith and Sarah. Due to other work priorities, I didn’t have much available time during this stage. This is why it is so important to have a team – whether this is a co-located team or distributed across an organisation or multiple organisations – when working in the digital preservation space. I really can’t thank them enough for carrying the load for this task.

Policy template

My contribution was to develop a generic policy template, for use in both our organisations. For those that know me, you will know I prefer to ‘borrow and adapt’ rather than reinvent the wheel. So I used the layout of policies from a previous workplace and constructed a template for use by CUL and the Bodleian Libraries. I was particularly keen to ensure what I developed was generic, so that it could be used for any type of policy development in future.

This template has now been provided to the Digital Preservation Coalition, who will make it available with other documents in the coming years, so that this groundwork doesn’t have to be repeated by every other organisation that still needs to develop a digital preservation policy (or other policies). Our international digital preservation maturity and resourcing survey (another blog post on this is still to follow) found that at least 42% of organisations internationally still do not have a digital preservation policy.

Who has a digital preservation policy?

What next?

Due to other work priorities, drafting the digital preservation policy didn’t properly commence until earlier this year. But by this point I had a good handle on my organisation’s specific:

  • Challenges and issues related to digital content (not just preservation and management concerns)
  • High-level ‘profile’ of digital collections, right across all content ‘classes’
  • Gaps in policy, standards, procedures and guidelines (PSPG) as well as strategy
  • Appreciation of a wide range of digital preservation policies (internationally)
  • Digital preservation maturity (holistic, not just technical) – based on maturity assessments using several digital preservation maturity models
  • Governance (related to policy and strategy)
  • Language relevant to my organisation
  • Responsibilities across the organisation
  • Relevant legislation (UK/EU)

This shaped my approach to drafting a digital preservation policy that would meet CUL’s needs.

Approach

I realised that CUL required a comprehensive policy that would fill the many gaps which other policies would ideally cover. I should note that there are many ways of producing a policy, and it has to be tailored to meet the needs of your organisation. (You can compare this with Edith’s digital preservation policy for the Bodleian Libraries, Oxford.)

The next steps involved:

  • Gathering requirements (this had already taken place during 2017)
  • Setting out a high-level structure/list of points to address
  • Defining the stakeholder group membership (and ways of engaging with them)
  • Setting the frame of the task ahead
  • Agreeing on the scope (this changed from ‘Cambridge University Library’ to ‘Cambridge University Libraries’, encompassing CUL’s affiliate and dependent libraries)

Then came the iterative process of:

  1. Drafting policy statements and principles
  2. Meeting with the stakeholder group and discussing the draft
  3. Gathering feedback on the policy draft (internally and externally)
  4. Incorporating feedback
  5. Circulating a new version of the draft
  6. Developing associated documentation (to support the policy)

Once a final version had been reached, this was followed by the approvals and ratification process.

What do we have?

Last week, the inaugural Cambridge University Libraries Digital Preservation Policy was published (which was not without a few more hurdles).

It has been an ‘on again, off again’ process that has taken 23 months in total. Now we can say, for CUL and the University of Cambridge, that:

“Long-term preservation of digital content is essential to the University’s mission of contributing to society through the pursuit of education, learning, and research.”

This complements some of our other CUL policies.

What now?

This is never the end of a policy process. Policy should be a ‘living and breathing’ process, with the policy document itself serving purely as a record of the agreed decisions and principles.

So, of course there is more to do. “But what’s that?”, I hear you say.

Join us

There is so much more that Edith and I would like to share with you about our policy development journey over the past two years of the Digital Preservation at Oxford and Cambridge (DPOC) project.

So much so that we’re running an event in London on Tuesday 4th December 2018 on Devising Your Digital Preservation Policy, hosted by the DPC. (There is one seat left – if you’re quick, that could be you).

We’re also lucky to be joined by two ‘provocateurs’ for the day:

  • Kirsty Lingstadt, Head of Digital Library and Deputy Director of Library and University Collections, University of Edinburgh
  • Jenny Mitcham, Head of Good Practice and Standards, Digital Preservation Coalition (who has just landed in her new role – congrats & welcome to Jenny!)

There is so much more I could say about policy development in relation to digital content, but I’ll leave it there. I do hope you get to hear Edith and me wax lyrical about this.

Thank-yous

Finally, I must thank my Cambridge Polonsky team members, Edith Halvarsson (my Oxford counterpart), plus Paul Wheatley and William Kilbride from the DPC. Policy can’t be developed in a void and their contributions and feedback have been invaluable.

Electronic lab notebooks and digital preservation: part II

In her previous blog post on electronic lab notebooks (ELNs), Sarah outlined a series of research questions that she wanted to pursue to see what could be preserved from an ELN. Here are some of her results.


In my last post, I listed a number of questions that I wanted to answer regarding the use of ELNs at Oxford, since IT Services is currently running a pilot with LabArchives.

Those questions were:

  1. Authenticity of research – are timestamps and IP addresses retained when the ELN is exported from LabArchives?
  2. Version/revision history – Can users export all previous versions of data? If not users, then can IT Services? Can the information on revision history be exported, even if not the data?
  3. Commenting on the ELN – are comments on the ELN exported? Are they retained if deleted in revision history?
  4. Export – What exactly can be exported by a user? What does it look like? What functionality do you have with the data? What is lost?

What did I find out?

I started by looking at the IT Services webpage on ELNs. It mentions what you can download (HTML or PDF), but says little about long-term retention. There’s a lot of useful advice on getting started with ELNs and on how to use the notebook, though.

The Professional version, which staff and academics can use, offers two modes of export:

  • Notebook to PDF
  • Offline Notebook – HTML

When you request one of these exports, LabArchives will email it to the email address associated with your account. This should happen within 60 minutes, and you will then have 24 hours to download the file. So, the question is: what do you get with each?

PDF

There are two options when you go to download your PDF: 1) including comments and 2) including empty folders.

So, this means that comments are retained in the PDF and they look something like this:

It also means that, where possible, previews of images and documents show up in the PDF, as do the latest timestamps.

What you lose is:

  • previous versions and revision history
  • the ability to use files – these will have to be downloaded and saved separately (but this was expected from a PDF)

What you get:

  • a tidy, printable version of a lab notebook in its most recent iteration (including information on who generated the PDF and when)

What the PDF cover of a lab notebook looks like.

Offline HTML version

In this version, you are delivered a zip file which contains a number of folders and documents.

All of the attachments are stored in the attachments folder, both as originals and as thumbnails (which are just low-resolution JPEGs used by LabArchives).

How does the HTML offline version stack up? Overall, the functionality for browsing is pretty good and latest timestamps are retained. You can also directly download the attachments on each page.

In this version, you do not get the comments. You also do not get any previous versions, only the latest files, updates and timestamps. But unlike the PDF, it is easy to navigate and the uploaded attachments can be opened, which have not been compressed or visibly changed.

I would recommend taking a copy of both versions, since each offers different functions. However, neither offers a comprehensive export. Still, the most recent timestamps are useful for authenticity, though it would be even better if checksums were generated for files on upload and supplied in a manifest file with the HTML export.
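Until such a manifest is provided, one can at least be created locally once the offline HTML export has been downloaded and unzipped. The sketch below is our own illustration, not a LabArchives feature: it assumes the export sits in a local folder and writes a SHA-256 manifest in the two-space layout that `sha256sum` prints, so the files can later be re-checked with `sha256sum -c` run from inside that folder. Bear in mind that a manifest made at download time only shows that files have not changed since you downloaded them; it says nothing about their state on upload.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(export_dir: Path, manifest_name: str = "manifest-sha256.txt") -> Path:
    """Write one '<digest>  <relative path>' line per file under export_dir.

    The manifest file itself is skipped, so re-running gives the same result
    as long as the exported files are unchanged.
    """
    export_dir = Path(export_dir)
    manifest = export_dir / manifest_name
    lines = []
    for path in sorted(export_dir.rglob("*")):
        if path.is_file() and path != manifest:
            relative = path.relative_to(export_dir).as_posix()
            lines.append(f"{sha256_of(path)}  {relative}")
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

For example, `write_manifest(Path("labarchives_offline_export"))` would write `manifest-sha256.txt` into that folder; the folder name here is hypothetical.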

Site-wide backup

Neither export option open to academics or staff produces a comprehensive copy of the ELN; something is lost in the export. However, as part of their Enterprise agreement, LabArchives does offer an annual site-wide backup to local IT Services. This includes all timestamps, comments and versions: the copy contains everything. This is promising, and all academics should be aware of it, because they can request a copy from IT Services and should then be able to get a full, comprehensive backup of their ELN. It also means that IT Services, as well as LabArchives, is preserving a copy of the ELNs.

So, we are going to follow up with IT Services, to talk about how they will preserve and provide access to these ELN backups as part of the pilot. Many of you will have similar conversations with your own IT departments over time, as you will need to work closely with them to ensure good digital preservation practices.

And these are some of the questions you may want to consider asking when talking with your IT department about the preservation of ELNs:

  • How many backups? Where are the backups stored? What media are being used? Are backups checked and restored as part of testing and maintenance? How often is the media refreshed?
  • What about fixity?
  • What about the primary storage? Is it checked or refreshed regularly? Is there any redundancy if that primary storage is online? If it is offline, how can it be requested by staff?
  • What metadata is being kept and created about the different notebooks?
  • What file formats are being retained? Is any information being recorded about the different file formats? Presumably, with research data, there will be a large variety of formats.
  • How long are these annual backups being retained?
  • Is your IT department actively going to share the ELNs with staff?
  • If it is currently the PI and department’s responsibility to store physical notebooks, what will be the arrangement with electronic ones?

Got anything else you would ask your IT department when looking into preserving ELNs? Share in the comments below.

Electronic lab notebooks and digital preservation: part I

Outreach and Training Fellow, Sarah, writes about a trial of electronic lab notebooks (ELN) at Oxford. She discusses the requirements and purpose of the ELN trial and raises lingering questions around preserving the data from ELNs. This is part I of what will be a 2-part series.


At the end of June, James and I attended a training course on electronic lab notebooks (ELNs). IT Services at the University of Oxford is currently running a trial of LabArchives’ ELN offering. The course was intended to introduce departments and researchers to the trial and to encourage them to start their own ELN.

Screenshot of a LabArchives electronic lab notebook

When selecting an ELN for Oxford, IT Services considered a number of requirements. Those that were most interesting from a preservation perspective included:

  • the ability to download the data to store in an institutional repository, like ORA-data
  • the ability to upload and download data in arbitrary formats and to have it bit-preserved
  • the ability to upload and download images without any unrequested lossy compression

Moving from paper-based lab notebooks to an ELN is intended to help a lot with compliance as well as collaboration. For example, the government requires every scientist to keep a record of every chemical used for their lifetime. This has a huge impact on the Chemistry Department; the best way to search for a specific chemical is to be able to do so electronically. There are also costs associated with storing paper lab notebooks. There’s also the risk of damage to the notebook in the lab. In some ways, an electronic lab notebook can solve some of those issues. Storage will likely cost less and the risk of damage in a lab scenario is minimised.

But how do we preserve that electronic record for every scientist for at least the duration of their life? And what about beyond that?

One of the researchers presenting on their experience using LabArchives’ ELN stated, “it’s there forever.” Even today, there’s still an assumption that data online will remain online forever, and more broadly that data will last forever. In reality, without proper management this will almost certainly not be the case. While IT Services will be exporting the ELNs for backup purposes, management and retention periods for those exports were not detailed.

There’s also a file upload limit of 250MB per individual file, meaning that large datasets will need to be stored somewhere else. There’s no limit to the overall size of the ELN at this point, which is useful, but individual file limits may prove problematic for many researchers over time (this has already been an issue for me when uploading zip files to SharePoint).
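Before moving a set of lab files into the ELN, it can help to know in advance which files will hit the per-file limit and need a home elsewhere. A minimal sketch: it walks a local folder and lists every file above a size threshold. We have assumed that “250MB” means 250 × 1,000,000 bytes, the stricter of the two common readings; we have not checked how LabArchives actually counts it, so adjust `UPLOAD_LIMIT_BYTES` if needed.

```python
from pathlib import Path

# Assumption: "250MB" read as 250 * 1,000,000 bytes (stricter than 250 * 1024 * 1024).
UPLOAD_LIMIT_BYTES = 250 * 1000 * 1000


def files_over_limit(folder, limit_bytes=UPLOAD_LIMIT_BYTES):
    """Return (relative path, size in bytes) for every file larger than limit_bytes."""
    folder = Path(folder)
    too_big = []
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            if size > limit_bytes:
                too_big.append((path.relative_to(folder).as_posix(), size))
    return too_big
```

Anything this returns would need to be stored somewhere else, for example in a data repository, with a reference to it recorded in the notebook.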

After learning how researchers (from PIs to PhD students) are using ELNs for lab work and seeing a few demos of the many features of LabArchives’ ELN, we were left with a few questions. We decided to create our own ELN (available to us for free during the trial period) in order to investigate these questions further.

The questions around preserving ELNs are:

  1. Authenticity of research – are timestamps and IP addresses retained when the ELN is exported from LabArchives?
  2. Version/revision history – Can users export all previous versions of data? If not users, then can IT Services? Can the information on revision history be exported, even if not the data?
  3. Commenting on the ELN – are comments on the ELN exported? Are they retained if deleted in revision history?
  4. Export – What exactly can be exported by a user? What does it look like? What functionality do you have with the data? What is lost?

There’s potential for ELNs to open up collaboration and curation in lab work by allowing notes and raw data to be kept together, and by facilitating sharing and fast searching. However, the long-term preservation implications are still unclear, and many people still seem complacent about the associated risks.

We’re starting our LabArchives ELN now, in the hope of answering some of those questions. We also hope to make some recommendations for preservation and highlight any concerns we find.


Anyone have an experience preserving ELNs? What challenges and issues did you come across? What recommendations would you have for researchers or repository staff to facilitate preservation? 

Digital Preservation at Oxford Open Days

Oxford Fellow, Sarah, describes the DPOC team’s pop-up exhibition “Saving Digital,” held at the Radcliffe Science Library during Oxford Open Days #OxOpenDay. The post describes the equipment and games the team showcased over the two days and some of the goals they had in mind for this outreach work.


On 27 June and 28 June, Oxford ran Open Days for prospective students. The city was alive with open doors and plenty of activity. It was the perfect opportunity for us to take our roadshow kit out and meet prospective students with a pop-up exhibition called “Saving Digital”. The Radcliffe Science Library (RSL) on Parks Road kindly hosted the DPOC team and all of our obsolete media for two days in their lounge area.

The pop-up exhibition hosted at the RSL

We set up our table with a few goals in mind:

  • to educate prospective students about the rapid pace of technological change and the concern about how we’re going to read digital data off old media in the future (we educated a few parents as well!)
  • to speak with library and university staff about their digital dilemmas and what we at the digital preservation team could do about it
  • to raise awareness of the urgent need for digital preservation in all of our lives and to inform more people about our project (#DPOC)

To achieve this, we first drew people in with two things: retro gaming and free stuff.

Last minute marketing to get people to the display. It worked!

Our two main games were the handheld game, Galaxy Invader 1000, and Frak! for the BBC Micro.

Frak! on the BBC Micro. The yellow handheld console to the right is Galaxy Invader 1000.

Galaxy Invader 1000 by CGL (1980) is a handheld game, which plays a version of Space Invaders. This game features a large multi-coloured display and 3 levels of skill. The whole game was designed to fit in 2 kilobytes of memory. 

Frak! was released for the BBC Micro in 1984 under the Aardvark software label. It was praised for its excellent graphics and gameplay. In this side-scrolling game, you play a caveman named Trogg. The aim of the game is to cross a series of platforms while avoiding dangers, including monsters named Poglet and Hooter. Trogg is armed with a yo-yo for defence.

Second, we gave them some digestible facts, both in poster form and by talking with them:

Saving Digital poster

Third, we filled the rest of the table with obsolete media and handheld devices from roughly the last forty years, just a small sample of what was available! This let visitors hold some of the media of the past and marvel at how little it could hold, yet how much it could do for its time. Then we asked them how they would read the data off it today. That probably concerned the parents more than their children, as several admitted to still having important material on VHS or miniDV tapes, or on 3.5-inch disks! It got everyone thinking, at least.

A lot of obsolete media all in one place.

Lastly, we had an enthusiastic team wearing branded t-shirts designed to emulate our most popular first-generation badge, which was pink with a 3.5-inch disk in the middle. We gave away the last of those badges during Open Days! But don’t worry, we have some great second-generation badges to collect now.

An enthusiastic team always helps. Especially if they are willing to demo the equipment.


A huge thank you to the RSL for hosting us for two days—we’ll be back on the 16th of July if you missed us and want to visit the exhibition! We’ll have a few extra retro games on hand and some more obsolete storage media!

Our poster was found on display in the RSL.

Update on the training programme pilot

Sarah, Oxford’s Outreach and Training Fellow, has been busy since the new year designing and running a digital preservation training programme pilot in Oxford. It consisted of one introductory course on digital preservation and six workshops. Below is an update on what she did for the pilot and what she has learnt over the past few months.


It’s been a busy few months for me, so I have been quiet on the blog. Most of my time and creative energy has been spent on this training programme pilot. In total, there were seven courses and over 15 hours of material. In the end, I trialled the courses with over 157 people from the Bodleian Libraries and the various Oxford college libraries and archives. Many attendees came back for more than one course, but some attended only one.

The trial gave me an opportunity to test out different ideas and topics. Attendees were good at giving feedback, both during the course and afterwards via an online survey. This has given me further ideas and a chance to see what works and what doesn’t. I’ve been able to improve the experience each time, but there’s still more work to be done. Even so, I’ve already learned a lot about digital preservation and teaching.

Below are some of the most important lessons I’ve learned from the training programme pilot.

Time: You always need more

I found that I almost always ran out of time at the end of a course, which left no time for questions or to finish that last demo. Most of my courses would have benefited from less content, shorter exercises, or simply being 30 minutes longer.

Based on feedback from attendees, I’ll be making adjustments to every course. Some will be longer. Some will have shorter exercises with more optional components and some will have slightly less content.

While you might budget 20 minutes for an activity, you will likely use 5-10 minutes more. But it varies every time, depending on the attendees. Some groups might have a lot of questions, while others will be quieter. It’s better to overestimate the time and end early than to rush to cover everything. People need a chance to process the information you give them.

Facilitation: You can’t go it alone

In only one of my courses did I have to facilitate alone. I was run off my feet for the 2 hours because I was the only one answering questions during exercises for 15 attendees. That doesn’t sound like a lot, but I was hoarse by the end from speaking for almost 2 hours!

Always get help with facilitation—especially for workshops. Someone to help:

  • answer questions during exercises,
  • get some of the group idea exercises/conversations started,
  • make extra photocopies or print outs, and
  • load programs and files onto computers—and then help delete them after.

It is possible to run training courses alone, but having an extra person makes things run smoother and saves a lot of time. Edith and James have been invaluable support!

Demos: Worth it, but things often go wrong

Demos were vital to illustrate concepts, but they were also sometimes clunky and time-consuming to manage. I wrote up demo sheets to help. The demos relied on software or the Internet, both of which can and will go wrong. Patience is key; so is accepting that sometimes things will not go right. Processes might take a long time to run, or the course might end before the demo is over.

The more you practise on the computer you will be using, the more likely it is that things will go right. But that’s not always an option. If it isn’t, always have a backup plan. Or just apologise, explain what should have happened and move on. Attendees are generally forgiving, and sometimes it can be turned into a really good teaching moment.

Exercises: Optional is the way to go

Unless you put out a questionnaire beforehand, it is incredibly hard to judge the skill level of your attendees. It’s best to prepare for all levels. Start each exercise slowly and build in plenty of optional work for people who work faster.

In most of my courses I was too ambitious for the time allowed. I wanted them to learn and try everything. Sometimes I wasn’t asking the right questions on the exercises either. Testing exercises and timing people is the only way to tailor them. Now that I have run the workshops and seen the exercises in action, I have a clearer picture of what I want people to learn and accomplish—now I just have to make the changes.

Future plans

There are courses I would love to run in the future (like data visualisation and digital forensics), but I did not have time to develop them. I’d like to place them on a roadmap for future training, as well as reach out more to the Oxford colleges, museums and other departments. I would also like to tailor the introductory course a bit more for different audiences.

I’d like to get involved with developing courses like Digital Preservation Carpentry that the University of Melbourne is working on. The hands-on workshops excited and challenged me the most. Not only did others learn a lot, but so did I. I would like to build on that.

At the end of this pilot, I have seven courses that I will finalise and make available under a Creative Commons licence. What I learned when developing these courses is that there aren’t always many good templates available on the Internet to use as a starting point: you have to ask around for people willing to share.

So, I am hoping to take the work that I’ve done and share it with the digital preservation community. I hope these will be useful resources that can be reused and repurposed, or at the very least a starting point for inspiration (basic speaker notes included).

These will be available via the DPOC website sometime this summer, once I have been able to make the changes necessary to the slides and exercises—along with course guidance material. It has been a rewarding experience (as well as an exhausting one); I look forward to developing and delivering more digital preservation training in the future.

Digital preservation with limited resources

What should my digital preservation strategy be, if I do not have access to repository software or a DAMS system?

At Oxford, we recently received this question from a group of information professionals working for smaller archives. This will be a familiar scenario for many – purchasing and running repository software will require a regular dedicated budget, which many archives in the UK do not currently have available to them.

So what intermediate solutions could an archive put in place to improve its chances of not losing digital collection content until such time as it can invest in repository software? This blog post summarises some key points from our meeting with the archivists, and we hope they may be useful for other organisations that are asking the same question.


Protect yourself against human error

CC-BY KateMangoStar, Freepik

Human error is one of the major risks to digital content. It is not uncommon for users to accidentally drag files or folders to the wrong place, or to delete content by mistake. It is therefore important to have strict user restrictions in place which limit who can delete, move and edit digital collections. For this purpose, you need to ensure that you have defined an “archives directory” which is separate from any “working directories” where users can still edit and actively work with content.

If you have IT support available to you, then speak to them about setting up new user restrictions.


Monitor fixity

CC-BY Dooder, Freepik

However, even with strong user restrictions in place, human error can occur. In addition to enforcing stronger user restrictions in the “archives directory”, tools like Fixity from AVP can be used to spot if content has been moved between folders, deleted, or edited. By running regular Fixity reports an archivist can spot any suspicious looking changes.

We are aware that time constraints are a major factor which inhibits staff from adding additional tasks to their workload, but luckily Fixity can be set to run automatically on a weekly basis, providing users with an email report at the end of the week.
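For readers curious about what such a tool does under the hood, the sketch below shows the basic idea in Python. It is not how AVP’s Fixity is implemented, and it has none of its scheduling or email features. It records a SHA-256 checksum for every file in the “archives directory”, then compares two of these snapshots to flag files that were edited, moved, deleted or newly added. Treating a checksum that disappears from one path and reappears at another as a “move” is our own simplifying assumption; identical duplicate files can make that guess ambiguous.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1024 * 1024):
    """SHA-256 of one file, read in chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(archive_dir):
    """Map each file's path (relative to archive_dir) to its SHA-256 checksum."""
    archive_dir = Path(archive_dir)
    return {
        path.relative_to(archive_dir).as_posix(): file_sha256(path)
        for path in archive_dir.rglob("*")
        if path.is_file()
    }


def compare(old, new):
    """Classify the differences between an earlier and a later snapshot."""
    report = {"changed": [], "moved": [], "deleted": [], "new": []}
    # Same path in both snapshots but a different checksum: the content was edited.
    for rel in sorted(old.keys() & new.keys()):
        if old[rel] != new[rel]:
            report["changed"].append(rel)
    # Paths that exist only in the later snapshot, grouped by checksum.
    added = sorted(new.keys() - old.keys())
    unmatched = set(added)
    added_by_checksum = {}
    for rel in added:
        added_by_checksum.setdefault(new[rel], []).append(rel)
    # A vanished path whose checksum turns up at a new path is reported as a move;
    # otherwise the file is reported as deleted.
    for rel in sorted(old.keys() - new.keys()):
        candidates = added_by_checksum.get(old[rel], [])
        if candidates:
            target = candidates.pop(0)
            unmatched.discard(target)
            report["moved"].append((rel, target))
        else:
            report["deleted"].append(rel)
    report["new"] = sorted(unmatched)
    return report
```

In practice you would save each snapshot (for example with `json.dump`) and compare this week’s against last week’s, reviewing anything listed under `changed`, `moved` or `deleted`.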


Understand how your organisation does back-ups

CC-BY Shayne_ch13, Freepik

A common IT retention period for back-ups of desktop computers is 14 days. The two-week period enables disaster recovery of working environments, to ensure that business can continue as usual. However, a 14-day back-up is not the same as preservation storage, and it is not a suitable solution for archival collections.

In this scenario, where content is stored on a file system with no versioning, the archivist has only 14 days to spot any issues and retrieve an older back-up before it is too late. So please don’t go on holiday or get ill for long! Even with tools like Fixity, fourteen days is an unrealistic turnaround time (assuming the issue is spotted at all).

If possible, try to make the case to your organisation that you require more varied types of back-ups for the “archives directory”. These should include back-ups which are retained for at least a year. Using a mix of tape storage and/or cloud service providers can be a less expensive way of storing additional back-ups which do not require ongoing access. It is an investment worth making.

As a note of warning, though: you are still only dealing in back-ups. This is not archival storage. If there are issues with multiple back-ups (due to, for example, transfer or hardware errors), you can still lose content. The longer-term goal, once better back-ups are in place, should be to monitor the fixity of multiple copies of content from the “archives directory”. (For more information about the difference between back-ups used for regular IT purposes and storage for digital preservation, see the DPC Handbook.)


Check that your back-ups work

Once you have got additional copies of your collection content, remember to check that you can retrieve them again from storage.

Many organisations have been in the position of thinking they have backed up their content, only to find out when they need their back-ups that these were not created properly. By testing retrieval, you can protect your collections against this particular risk.
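A test restore can be checked systematically rather than just eyeballed. The sketch below compares a copy retrieved from back-up against the original directory and lists anything missing, unexpected or altered. The function names are hypothetical, and SHA-256 is assumed as the checksum.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_under(root: Path) -> set:
    """Relative paths of all files below root."""
    return {p.relative_to(root) for p in root.rglob("*") if p.is_file()}


def verify_restore(original_dir: Path, restored_dir: Path) -> list:
    """Compare a test restore with the original; an empty list means a match."""
    originals = files_under(original_dir)
    restored = files_under(restored_dir)
    problems = [f"missing from restore: {rel}" for rel in sorted(originals - restored)]
    problems += [f"unexpected in restore: {rel}" for rel in sorted(restored - originals)]
    for rel in sorted(originals & restored):
        if file_digest(original_dir / rel) != file_digest(restored_dir / rel):
            problems.append(f"content differs: {rel}")
    return problems


if __name__ == "__main__":
    # Two throwaway folders standing in for the live archive and a restored copy.
    with tempfile.TemporaryDirectory() as live, tempfile.TemporaryDirectory() as restored_copy:
        Path(live, "minutes.txt").write_text("AGM minutes")
        Path(restored_copy, "minutes.txt").write_text("AGM minutes")
        print(verify_restore(Path(live), Path(restored_copy)))  # prints []
```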


But… what do I do if my organisation does not do back-ups at all?
Although a 14-day back-up retention period is common in many businesses, it is far from the reality in which certain types of archives operate. A small community organisation may, for example, do all its business on a laptop or workstation which is shared by all staff (including the archive).

This is a dangerous position to be in, as hardware failure can cause immediate and total loss. There is not a magic bullet for solving this issue, but some of the advice which Sarah (Training and Outreach Fellow at Bodleian Libraries) has provided in her Personal Digital Archiving Course could apply.

Considerations from Sarah’s course include:

  • Create back-ups on additional removable hard drive(s) and store them in a different geographical location from the main laptop/workstation
  • Make use of free cloud storage allowances (do check the licenses though to see what you are agreeing to – it’s not where you would want to put your HR records!)
  • Again – remember to check your back-ups!
  • For digitised images and video, consider using the Internet Archive’s Gallery as an additional copy (note that this is open to the public and requires assigning a CC-BY license). If you like the work that the Internet Archive does, you can donate to them here.
  • Apply batch-renaming tools to file names to ensure that they contain understandable metadata in case they are separated from their original folders
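As an illustration of the last point, the sketch below folds a file's folder names into its file name, so that 'Events/2017 Summer Fete/IMG_001.jpg' becomes 'CHS_Events_2017-Summer-Fete_IMG_001.jpg'. The naming convention, the 'CHS' prefix and the function names are all hypothetical. The rename function defaults to a dry run that only reports the planned renames. It is not idempotent: running it twice would add the prefix twice.

```python
import re
from pathlib import Path, PurePosixPath


def slug(text: str) -> str:
    """Replace runs of characters that are awkward in file names with '-'."""
    return re.sub(r"[^A-Za-z0-9._]+", "-", text).strip("-")


def proposed_name(relative_path: PurePosixPath, prefix: str) -> str:
    """Fold the folder names into the file name, joined with '_'."""
    folders = [slug(part) for part in relative_path.parts[:-1]]
    return "_".join([prefix, *folders, slug(relative_path.name)])


def batch_rename(root: Path, prefix: str, dry_run: bool = True) -> list:
    """Rename every file under root within its own folder.

    With dry_run=True (the default) nothing is changed; the planned
    (old, new) pairs are simply returned for checking.
    """
    plan = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = PurePosixPath(path.relative_to(root).as_posix())
        target = path.with_name(proposed_name(rel, prefix))
        if target != path:
            plan.append((path, target))
    if not dry_run:
        for old, new in plan:
            if new.exists():
                raise FileExistsError(new)  # never overwrite an existing file
            old.rename(new)
    return plan
```

Reviewing the dry-run output before renaming anything keeps the original names on record, which is worth documenting as well.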

(Email us if you would like to get a copy of Sarah’s lecture slides with more information)


Document all of the above

CC-BY jcomp, Freepik

Make sure to write down all the decisions you have made regarding back-ups, monitoring, and other activities. This allows for succession planning and ensures that you have a paper trail in place.


Stronger in numbers

CC-BY, Kjpargeter, Freepik

Licenses, contracts and ongoing management are expensive. Another avenue to consider is looking to peer organisations to lower some of these costs. This could include entering into joint contracts with tape storage providers, or consortium models for using repository software. One example of an initiative which has done this is the NEA (Network Electronic Archive) group, which has been an established repository for over ten years, supporting 28 small Danish archives.


Summary:
These are some of the considerations which may lower the risk of losing digital collections. Do you have any other ideas (or practical experience) of managing and preserving digital collections with limited resources, and without using a repository or DAMS?

Gathering the numbers: a maturity and resourcing survey for digital preservation

The ability to compare ourselves to peer institutions is key when arguing the case for digital preservation within our own organisations. However, finding up-to-date and accurate information is not always straightforward.

The Digital Preservation at Oxford and Cambridge (DPOC) project has joined forces with the Digital Preservation Coalition (DPC) to gather some of the basic numbers that can assist staff in seeking to build a business case for digital preservation in their local institution.

We need your input to make this happen!

The DPOC and the DPC have developed a survey aimed at gathering basic data about maturity levels, staff resources, and the policy and strategy landscapes of institutions currently doing or considering digital preservation activities. (The survey intentionally does not include questions about the type or size of the data organisations are required to preserve.)

Completing the survey will only take 10-20 minutes of your time, and will help us better understand the current digital preservation landscape. The survey can be taken at: https://cambridge.eu.qualtrics.com/jfe/form/SV_brWr12R8hMwfIOh

Deadline for survey responses is: Thursday 31 May 2018.

For those wanting to know upfront what questions are asked in the survey, here is the full set of Survey Questions (PDF). Please keep in mind that the survey is interactive and you may not see all of the questions when filling it in online (some questions only appear depending on your previous responses). Responses must be submitted through the online survey.

Anonymised data gathered as part of this maturity and resourcing survey will be made available via this DPOC website.

For any questions about the survey and its content, please contact: digitalpreservation@lib.cam.ac.uk

Email preservation 2: it is hard, but why?

A post from Sarah (Oxford) with input from Somaya (Cambridge) about the 24 January 2018 DPC event on email archiving from the Task Force on Technical Approaches to Email Archives.

The day’s discussion centred on what the task force had learnt during its year of work. Personal and public stories are buried in email, and considerable amounts of email have been lost over previous decades. We should be treating email as data (it allows us to understand other datasets). Current approaches to collecting and preserving email don’t work because they are not scalable. Integrating artificial intelligence and machine learning, including natural language processing, is important for addressing email archives (this is already taking place in the legal profession with ‘predictive coding’ and clustering technologies).


Back in July, Edith attended the first DPC event on email preservation, presented by the Task Force on Technical Approaches to Email Archives. She blogged about it here. In January this year, Somaya and I attended the second event, again hosted by the DPC.

Under the framework of five working groups, this task force spent 12 months (2017) focusing on five separate areas of the final report, which is due out around May this year:

  • The Why: Overview / Introduction
  • The When/Who/Where: Email Lifecycles Perspectives
  • The What: The Needs of Researchers
  • The How: Technical Approaches and Solutions
  • The Path Forward: Sustainability & Community Development

The approach being taken is technical rather than policy-focused. Membership of the task force includes the DPC, representatives from universities and national institutions around the world, and technology companies including Google and Microsoft.

You can view the slides here from the presentation by Chris Prom (University of Illinois Urbana-Champaign, who authored the 2011 DPC Technology Watch Report on Preserving Email) and Kate Murray (Library of Congress and contributor to FADGI) about the work they have been doing. Until the final report is published, I have been reviewing the preliminary draft (of June 2017) and other available documents to help develop my email preservation training course for Oxford staff in April.

So, when it comes to email preservation, most of the tools and discussions focus on processing email archives. Very little of the discussion has to do with preserving email archives over time. There’s a very good reason for this: processing email archives is the bottleneck in the process, the point at which most institutions are still stuck. It is hard to make decisions about preservation when there is no means of collecting email archives or processing them in a timely manner.

There were many excellent questions and proposed solutions from the speakers at the January event. Below are some of the major points from the day that have informed my thinking on how to frame training on email preservation:

Why are email archives so hard to process?

  1. They are big. Few people cull their emails, and over time they build up. ‘Reply’ and ‘reply all’ functions expand email chains, and attachments are growing in size and diversity. It takes a donor a while to prepare their email archives, and longer still for an institution to transfer and process them.
  2. They are full of sensitive information, which is hard to find. Many open source technology assisted review (TAR) tools miss sensitive information. Software used for ‘predictive coding’ and machine learning for reviewing email archives is well beyond the budgets of heritage institutions. Manual review is far too labour-intensive.
  3. There is no one tool that can do it all. Email preservation requires ‘tool chaining’ in order to transfer, migrate and process email archives. There is a very wide variety of email software programs, which in turn create many different email file formats. Many of the tools used in email archive processing are not compatible with all of these file types, so multiple file format migrations are needed to allow for processing. For a list of some of the currently available tools, see the Task Force’s list here.
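To give a flavour of a single link in such a tool chain, the sketch below uses Python's standard-library mailbox module to split an mbox file (a common export format) into one .eml file per message. It is an illustration under assumptions rather than a recommended workflow. It does not handle other formats such as Outlook's PST, and the output naming scheme and function name are hypothetical.

```python
import mailbox
import tempfile
from pathlib import Path


def mbox_to_eml(mbox_path: Path, out_dir: Path) -> int:
    """Write each message in an mbox file to its own numbered .eml file.

    Returns the number of messages written, so the count can be
    checked against the source as part of the transfer record.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    source = mailbox.mbox(str(mbox_path), create=False)
    count = 0
    try:
        for index, message in enumerate(source, start=1):
            # as_bytes() serialises headers and body (attachments included),
            # without the mbox "From " separator line.
            (out_dir / f"{index:06d}.eml").write_bytes(message.as_bytes())
            count += 1
    finally:
        source.close()
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        box_path = Path(tmp, "sample.mbox")
        box = mailbox.mbox(str(box_path))
        box.add("From: donor@example.org\nSubject: Minutes\n\nSee attached.\n")
        box.add("From: donor@example.org\nSubject: Re: Minutes\n\nThanks.\n")
        box.close()
        print(mbox_to_eml(box_path, Path(tmp, "eml")))  # prints 2
```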

What are some of the solutions?

  1. Tool chaining will continue. For now, it appears that tool chaining is here to stay, often mixing proprietary and open source tools to get workflows running smoothly. This means institutions will need to invest in establishing email processing workflows: the software, people who know how to handle different email formats, etc.
  2. What about researchers? Access to emails is tightly controlled due to sensitivity restrictions, but is there scope to get researchers to help with the review? If they use the collection for research, could they also be responsible for flagging anything deemed sensitive? How could this be done ethically?
  3. More automation. Better tool development to assist with TAR. Reviewing processes must become more automated if email archives are ever to be processed. The scale of work is increasing, and traditional appraisal approaches (handling one document at a time) and records schedules are no longer suitable.
  4. Focus on bit-level preservation first. Processing of email archives can come later, but preserving them needs to start on transfer. (But we know users want access, and our institutions want to provide this access to email archives.)
  5. Perfection is no longer possible. While archivists would like to be precise, in ‘scaling up’ email archive processing we need to think about it as ‘big data’ and take a ‘good enough’ approach.

Towards a common understanding?

Cambridge Outreach and Training Fellow, Lee, describes the rationale behind trialling a recent workshop on archival science for developers, as well as reflecting on the workshop itself. Its aim was to give all those working in digital preservation within the organisation a better understanding of each other’s work, to improve co-operation for a sustainable digital preservation effort.


Quite often, there is a perceived language barrier due to the wide range of practitioners who work in digital preservation. We may be using the same words, but there’s not always a shared understanding of what they mean. This became clear when I was sitting next to my colleague, a systems integration manager, at an Archivematica workshop in September. Whilst not a member of the core Cambridge DPOC team, our colleague is a key member of our extended digital preservation network at Cambridge University Library, and is key to developing, understanding and retaining digital preservation knowledge in the institution.

For those from a recordkeeping background, the design principles behind the front end of Archivematica should be obvious, as it incorporates both traditional principles of archival practice and features of the OAIS model. However, coming from a systems integration point of view, my colleague needed me to translate words such as ‘accession’, ‘appraisal’ and ‘arrangement’, whose meanings many of us with an archival education take for granted.

I asked my colleague if an introductory workshop on archival science would be useful, and she said, “yes, please!” Thus, the workshop was born. Last week, a two-and-a-half-hour workshop was trialled for our developer and systems integration colleagues. The aim of the workshop was to help them understand what archivists are taught on postgraduate courses and how this teaching informs their practice. After gathering the attendees’ impressions of an archivist and the things that they do (see image), the workshop practically explored how an archivist would acquire and describe a collection. The workshop was based on an imaginary company, complete with a history, a description of its business units and examples of potential records it would deposit. There were practical exercises on making an accession record, appraising a collection, artificial arrangement and subsequent description using ISAD(G).

Sticky notes about archivists from a developer point of view.

Having seen how an archivist would approach a collection, the workshop moved on to explaining physical storage and preservation before turning to digital preservation, specifically looking at OAIS and then examples of digital preservation software systems. One exercise was to get the attendees to use what they had learned in the workshop to see where archival ideas mapped onto the systems.

The workshop tried to demonstrate how archivists have approached digital preservation armed with the professional skills and knowledge that they have. The idea was to show teams working with archivists on digital preservation how archivists think, and how and why some of the tools and products are designed the way they are. My hope was for ‘IT’ to understand the depth of knowledge that archivists have, in order to help everyone work together on a collaborative digital preservation solution.

Feedback was positive and the workshop will be run again in the New Year. Similarly, I’m hoping to devise a course from a developer perspective that will help archivists communicate more effectively with developers. Ultimately, both groups will be working from a better understanding of each other’s professional skill sets. Co-operation and collaboration on digital preservation projects will become much easier across disciplines, and we’ll have a better-informed (and more relaxed) environment in which to share practices and thoughts.

Advocating for digital preservation

Bodleian Libraries and Cambridge University Library are entering the last phase of the DPOC project, in which they are starting to write up business cases for digital preservation. In preparation, the Fellows attended DPC’s “advocacy briefing day” in London. Policy and Planning Fellow, Edith, blogs about some of the highlights and lessons from the day.


This week I had the pleasure of attending DPC’s advocacy training day. It was run by Catherine Heaney, the founder of DHR Communications and a veteran when it comes to advocating for digital heritage. Before the event, I thought I had a clear idea of what advocacy means in broad terms. You invite yourself into formal meetings and try to deliver measured facts and figures which will be compelling to the people in front of you – right?

Well… not quite, it turns out. Many of these assumptions were turned on their head during this session. Here are my four favourite pieces of (sometimes surprising) advocacy advice from Catherine.

Tip 1: Advocacy requires tenaciousness

The scenario which was described above is what communications professionals might call “the speech” – but it is only one little part of effective advocacy. “The digital preservation speech” is important, but it is not necessarily where you will get the most buy-in for digital preservation. Research has shown that one-off communications like these are usually not effective.

In fact, all of those informal connections and conversations you have with colleagues also count as advocacy, and may reap greater benefits due to their frequency. And if one of these colleagues is talented at influencing others, they can be invaluable in advocating for digital preservation when you are not there in person.

Lesson learnt: you need to keep communicating the message whenever and wherever you can if you want it to seep into people’s consciousness. Since digital preservation issues do not crop up that often in popular culture and the news, it is up to us to deliver, re-deliver… and then re-deliver the message if we want it to stick.

Tip 2: Do your background research

When you know that you will be interacting with colleagues and senior management, it is important to do your background research and find out which argument will most appeal to the person you are meeting. Having a bog-standard ‘speech’ about digital preservation which you pull out on all occasions is not the most effective approach. To make your case, the problem you are attempting to solve should also reflect the goals and challenges that the person you are advocating to is facing.

The aspects of digital preservation that appeal will differ depending on the role, concerns and responsibilities of the person you are advocating to. Are they concerned with:

  • Legal or reputational risk?
  • Financial costs and return on investment?
  • Being seen as someone at the forefront of the digital preservation field?
  • Creating reproducible research?
  • Collecting unique collections?
  • Or perhaps the opportunity to collaborate cross-institutionally?

Tip 3: Ensure that you have material for a “stump speech” ready

Tailoring your message to the audience is important, and this will be easier if you have material ready at hand which you can pick and choose from. Catherine suggested preparing a folder of stories, case studies, data and facts about digital preservation which you can cut and paste from to suit the occasion.

What is interesting though is the order of that list of “things to collect”:

  1. Stories
  2. Case studies
  3. Data and facts

The ranking is intentional. We tend to think that statistics and raw data will convince people, as they appeal to logic. In fact, your argument will be stronger if your pitch starts with a narrative (a story) about WHY we need digital preservation, followed by case studies to illustrate your point. Catherine advises that only once the audience is listening should you bring out the data and facts. This approach is both more memorable and more effective in capturing your audience’s attention.

Tip 4: Personalise your follow up

This connects to tip 2 – about knowing your audience. Catherine advised that, although it may feel strange at first, writing a personalised follow up message is a very effective tool. When you do have the chance to present your case to an important group within your organisation, the follow up message can further solidify that initial pitch (again – see tip 1 about repeated communication).

By taking notes about the concerns or points raised during a meeting, you have the opportunity to write personalised messages which capture and refer back to the concerns raised by each particular person. The personalised message also has the additional benefit of opening up a channel for future communication.


This was just a small subsection of all the interesting things we talked about on the advocacy briefing day. For some more information have a look at the hashtag for the day #DPAdvocacy.