Devising Your Digital Preservation Policy: Learnings from the DPOC project

On December 4th the DPOC Policy and Planning Fellows ran a joint workshop in London presenting learnings and experiences of policy writing at CUL and Bodleian Libraries. Supporting the event were also Kirsty Lingstadt (Head of the Digital Library at the University of Edinburgh) and Jenny Mitcham (Head of Good Practice at the Digital Preservation Coalition). Kirsty and Jenny talked about their experience of policy writing in other organisational settings, illustrating how policy writing must be tailored to fit specific institutional contexts but that the broad principles remain the same.

In total 30 attendees partook in the workshop which mixed presentations with round table discussions. To make the event as interactive as possible Mentimeter was used to poll attendees on their own experiences of policy writing. Although the survey only represents a small selection of organisations in the process of writing digital preservation policy, the Fellows wanted to share some of the results in the hope that it will facilitate further discussion. Feel free to use the comments section below to let the project team know if the results from the poll seem familiar (or perhaps unfamiliar).


Question: Do you know who to consult on a digital preservation policy (in your organisation)?

Most workshop participants knew who they needed to consult on digital preservation in their organisation and also had a good working relationship with them. This is the first step when starting a new policy – knowing your organisational culture and context.

Being new to their organisations, the DPOC Fellows spent a lot of time early on in the project reaching out to staff across the libraries. If you are also new to your institution, getting to know those who have been there a long time is an important starting point for understanding what type of policy will suit your organisation’s culture before you begin any writing.

Question: What barriers can you see to developing a digital preservation policy (in your organisation)?

‘Time’ was identified by participants as by far the largest barrier to writing new digital preservation policy. And it is true that policy development does take a lot of time if you want the resulting document to be more than ‘just a paper’ which is filed away at the end of the process.

To get staff onboard with new policy, allocating resources for policy consultation is therefore crucial, yet the effort involved is not always appreciated by senior management. For example, it took the Fellows between one and two years to develop a new digital preservation policy for their organisations, illustrating why it is important to give staff sufficient time to write policy. While policy consultation took a long time, the DPOC Fellows felt that this was a worthwhile investment for their organisations, as time spent consulting on policy was also a great outreach and learning opportunity for the organisations as a whole.

Question: Does your organisation have a policy template?

Most participants did not have an organisation-wide policy template. However, templates are part of policy best practice. A policy template is a skeleton document which outlines the high-level sections and headlines which should be included in every organisational policy regardless of topic – from an HR policy to a digital preservation policy, they should all follow the same structure. The purpose of having these standardised headlines is to ensure that staff can easily digest and recognise any policy at a quick glance. Templates can also enforce good document management practices.

If you are interested in finding out more, a high level policy template which was developed for the DPOC project can be requested through the DPOC blog contact form or by emailing the Digital Preservation Coalition.

Question: Where are institutional policies published (in your organisation)?

Once the policy is signed off, it is time to publicise it more widely. Among the workshop participants the most common places to publish policies were either on an institutional website or intranet (although there are other options listed in the word cloud).

As a word of caution, make sure that your organisation is consistent in where it publishes policies and ensure that documents are versioned. The international digital preservation policy review which the Fellows undertook in 2016 (analysing 50 different policies) found that most digital preservation policies do not use any document versioning. No versioning, in combination with the proliferation of different policy publication routes in an organisation, will soon become a real issue when staff try to locate up to date documents. (Again, if your organisation has a good policy template in place you can better enforce versioning!)

One option which was listed several times in the word cloud is to publish policy in an institutional repository; this is primarily useful if you do not have a reliable records management system in your organisation. Using a repository means that you can assign a DOI to the policy for persistent referencing, with the added benefit that the repository version becomes the clear canonical copy of the policy.

Question: How long will it take to…?

Participants were asked how long (using multiples of months) they think it would take their organisations to:

  • Draft a policy
  • Have it approved
  • Begin implementation of the policy
  • See real impact and benefits in the organisation

As seen from the chart, the drafting of a policy document is only one small aspect of policy and planning work. This is important to remember if you want to avoid your policy becoming just another ‘piece of paper’ that is filed away and not looked at again after it’s been written. Advocacy, communication and implementation plans continue for years to come after the original document has been drafted.


Where next…

To find out more about policy writing during the DPOC project have a look at this recent blog post from CUL’s Policy and Planning Fellow Somaya Langley and at the workshop presentation slides available through the DPC. The Fellows are also happy to take questions through the blog and encourage use of the comments section.

Memory Makers: Digital preservation skills and how to get them

The Memory Makers Conference was hosted at Amsterdam Museum in the Netherlands 29th-30th November. Bodleian Libraries’ Policy and Planning Fellow, Edith Halvarsson, attended.


The Memory Makers conference in Amsterdam brought together training providers from the private, higher education and continuing education sector to discuss digital preservation skills, how to get them (and how to retain them).

In my experience, research on skills development is often underrepresented at digital preservation conferences, and when such talks are included the attendance tends to be lower than for technology based strands. However, taking a 1.5 day deep dive into this topic is one of the most interesting and thought-provoking activities I’ve done this year and I am happy that NDE and DPC decided to highlight this area by giving it its own conference. So in this blog I wanted to summarise some of the thoughts that have stayed with me since coming back from Amsterdam.

The expectation gap

‘The expectation gap’ is something which we have discussed in a roundabout way among the Fellows over the past years, but it was a presentation by Dr Sarah Higgins which really put words onto this phenomenon for me. The notion of an ‘expectation gap’ also nicely frames why we need to think seriously about lifelong learning and competency frameworks.

Sarah has been teaching Information Management to Masters Students at Aberystwyth University (Wales) for almost a decade and has been observing both the development of the programme and the career trajectories of students graduating into the field. In this time there’s been a growing gap between what employers expect of students in terms of digital preservation skills and what certified MA programmes can offer.

The bodies which certify Information Management courses in the UK (CILIP and ARA) still only require minimal digital skills as part of their competency frameworks. This has made it challenging to argue for new and mandatory digital preservation related modules on UK MA programmes. MA programmes have definitely shifted to begin meeting the digital preservation challenge, but they are still at an early stage.

So while UK Information Management courses continue to frame a lot of teaching around physical collections, the expectations of digital skills from organisations hiring recent graduates from these programmes have skyrocketed. This has made the gap between reality and fantasy even larger. There has been a growing trend for organisations to hire new graduates and expect them to be the magic bullet; the readymade lone experts in all areas of digital preservation who do not require any further development or support ever again. Many of Sarah’s graduates who began working on digital preservation/curation/archiving projects after graduation were essentially ‘set up to fail’ – not a nice or fair place to be in your first job.

Dr Natalie Harrower: https://twitter.com/natalieharrower/status/1068124988358709254

Developing skills frameworks

To meet the challenge of unclear competency expectations, Sharon McMeekin (Head of Training and Skills at DPC) called for continued development of skills frameworks such as DigCurV. While DigCurV has been immensely valuable (we have for example drawn on it continuously in the DPOC project), the digital preservation field has matured a lot over the past couple of years and new learnings could now be incorporated into the model. A useful new addition to DigCurV, Sharon argued, would be to create more practitioner levels which reflect the expected skills progression over 1-10 years for new graduates entering the field.

If such frameworks were taken on by certifying bodies, they could potentially both temper unrealistic job descriptions and help staff argue for professional development opportunities.

Lifelong learning

In her talk, Sarah strongly argued that we should expect recent Information Management graduates to also require more workplace based training after graduation. A two-year MA programme is not the endpoint for learning, especially in a quickly moving and developing field. This means that ongoing learning opportunities must also be considered by hiring organisations.

It was refreshing to hear from the British Library, who strongly subscribe to this idea. The British Library team teach introductory courses on digital preservation and run drop-in lab sessions for all library staff on a yearly basis.

Micky Lindlar: https://twitter.com/MickyLindlar/status/1068155027108306944

But the digital preservation team also engages with a wide range of training opportunities that are perhaps not considered traditional Information Management skills. Maureen Pennock (Head of Digital Preservation at the BL) argued that skills for digital preservation are not necessarily unique to the field, and can be acquired in places which you may not initially have considered. Such skills include project management, social media management, presentation delivery, and statistical analysis. It should be noted, though, that Maureen also strongly stated that no one person should be expected to be an expert in all these areas at the same time.

Learning collaboratively

Another set of presentations which I really enjoyed was focused on “collaborative learning”. Puck Huijtsing (Netwerk Oorlogsbronnen) challenged why we are so attached to lecture style learning which we are familiar with from school and higher education. She argued that collaborative learning has been shown to be a successful model when training people to take on a new craft (and she believes that digital preservation is a craft). Puck went on to elaborate on Amsterdam’s strong history of craft guilds and how these taught and shared new skills, arguing that it could potentially be a more accessible and sustainable model for workplace based training.

A number of successful training models presented by the Netherlands Institute for Sound and Vision then illustrated how collaborative hands-on workshops can be delivered in practice. In one workshop series delivered by the institute, participants were asked to undertake small projects which focused on discrete digital collection material which they had a pre-existing relationship with. The institute’s research indicates that this model is successful in aiding retention and uptake of digital preservation and archiving skills. These are workshops which we are also keen to test out at Bodleian Libraries next year to see if they are received well by staff.

Summary

It is clear from the Memory Makers conference that there are a lot of people out there who care about learning and professional development in the digital preservation field. This blog only summarises a small section of all the excellent work that was presented over 1.5 days, and I would encourage others to look at presentation slides and the Twitter hashtag for the event (#MemoryMakers18) if this is a topic which interests you as well.

Cambridge University Libraries inaugural Digital Preservation Policy

The inaugural Cambridge University Libraries Digital Preservation Policy was published last week. Somaya Langley (Cambridge Policy & Planning Fellow) provides some insight into the policy development process and announces a policy event in London, presented in collaboration with Edith (Oxford Policy & Planning Fellow), to be held in early December 2018.


In December 2016, I started the digital preservation policy development process for Cambridge University Library (CUL), which has finally culminated in a published policy.

Step one

Commencing with a ‘quick and dirty’ policy gap analysis at CUL, what I discovered was not so much that there were some gaps in their existing policy landscape but rather that there was a dearth of much-needed policies. The gap analysis at CUL found that a few key policies did exist for different audiences (some intended to guide CUL, some to guide researchers and some meant for all staff and researchers working at the University of Cambridge). While my counterpart at Oxford found there was duplication in their policies across Bodleian Libraries and the University of Oxford, I mostly found chasms.

Next step

The second step in the policy development process was attempting to meet an immediate need from staff, by adding some “placeholder” digital preservation statements into the Collection Care and Conservation Policy that was currently under review. In the longer term, while it might be ideal to combine a preservation policy into one (encompassing the conservation and preservation of physical and digital collection items), CUL’s digital preservation maturity and skill capabilities are too low at present. Focus needed to be really drawn to how to manage digital content, hence the need for a separate Cambridge University Libraries Digital Preservation Policy.

That said, like everything else I’ve been doing at Cambridge, it needed to be addressed holistically. And policy is no exception. Being able to undertake about two full weeks of work (spanning several months in early 2017) contributing to the review of the Collection Care and Conservation Policy has meant including some statements in this policy that will support better care for digital (and audiovisual) content still remaining on carriers (that are yet to be transferred).

Collaborative development

Then in June 2017 we moved onto undertaking policy development collaboratively. Part of this was to do an international digital preservation policy review – looking at dozens of different policies (and some strategies). Edith wrote about the policy development process back in the middle of last year.

The absolute lion’s share of the work was carried out by my Oxford counterparts, Edith and Sarah. Due to other work priorities, I didn’t have much available time during this stage. This is why it is so important to have a team – whether this is a co-located team or distributed across an organisation or multiple organisations – when working in the digital preservation space. I really can’t thank them enough for carrying the load for this task.

Policy template

My contribution was to develop a generic policy template, for use in both our organisations. For those that know me, you will know I prefer to ‘borrow and adapt’ rather than reinvent the wheel. So I used the layout of policies from a previous workplace and constructed a template for use by CUL and the Bodleian Libraries. I was particularly keen to ensure what I developed was generic, so that it could be used for any type of policy development in future.

This template has now been provided to the Digital Preservation Coalition, who will make it available with other documents in the coming years – so that some of this groundwork doesn’t have to be carried out by every other organisation still needing to do digital preservation policy (or other policy) development. We found in our international digital preservation maturity and resourcing survey (another blog post on this is still to follow) that at least 42% of organisations internationally still do not have a digital preservation policy.

Who has a digital preservation policy?

What next?

Due to other work priorities, drafting the digital preservation policy didn’t properly commence until earlier this year. But by this point I had a good handle on my organisation’s specific:

  • Challenges and issues related to digital content (not just preservation and management concerns)
  • High-level ‘profile’ of digital collections, right across all content ‘classes’
  • Gaps in policy, standards, procedures and guidelines (PSPG) as well as strategy
  • Appreciation of a wide-range of digital preservation policies (internationally)
  • Digital preservation maturity (holistic, not just technical) – based on maturity assessments using several digital preservation maturity models
  • Governance (related to policy and strategy)
  • Language relevant to my organisation
  • Responsibilities across the organisation
  • Relevant legislation (UK/EU)

This informed my approach to drafting a digital preservation policy that would meet CUL’s needs.

Approach

I realised that CUL required a comprehensive policy that would fill the many gaps that ideally other policies would cover. I should note that there are many ways of producing a policy, and it does have to be tailored to meet the needs of your organisation. (You can compare with Edith’s digital preservation policy for the Bodleian Libraries, Oxford.)

The next steps involved:

  • Gathering requirements (this had already taken place during 2017)
  • Setting out a high-level structure/list of points to address
  • Defining the stakeholder group membership (and ways of engaging with them)
  • Setting the frame of the task ahead
  • Agreeing on the scope (this changed from ‘Cambridge University Library’ to ‘Cambridge University Libraries’ – encompassing CUL’s affiliate and dependent libraries)

Then came the iterative process of:

  1. Drafting policy statements and principles
  2. Meeting with the stakeholder group and discussing the draft
  3. Gathering feedback on the policy draft (internally and externally)
  4. Incorporating feedback
  5. Circulating a new version of the draft
  6. Developing associated documentation (to support the policy)

Once a final version had been reached, this was followed by the approvals and ratification process.

What do we have?

Last week, the inaugural Cambridge University Libraries Digital Preservation Policy was published (which was not without a few more hurdles).

It has been an ‘on again, off again’ process that has taken 23 months in total. Now we can say for CUL and the University of Cambridge that:

“Long-term preservation of digital content is essential to the University’s mission of contributing to society through the pursuit of education, learning, and research.”

Which complements some of our other CUL policies.

What now?

This is never the end of a policy process. Policy should be a ‘live and breathing’ process, with the policy document itself purely being there to keep a record of the agreed upon decisions and principles.

So, of course there is more to do. “But what’s that?”, I hear you say.

Join us

There is so much more that Edith and I would like to share with you about our policy development journey over the past two years of the Digital Preservation at Oxford and Cambridge (DPOC) project.

So much so that we’re running an event in London on Tuesday 4th December 2018 on Devising Your Digital Preservation Policy, hosted by the DPC. (There is one seat left – if you’re quick, that could be you).

We’re also lucky to be joined by two ‘provocateurs’ for the day:

  • Kirsty Lingstadt, Head of Digital Library and Deputy Director of Library and University Collections, University of Edinburgh
  • Jenny Mitcham, Head of Good Practice and Standards, Digital Preservation Coalition (who has just landed in her new role – congrats & welcome to Jenny!)

There is so much more I could say about policy development in relation to digital content, but I’ll leave it there. I do hope you get to hear Edith and me wax lyrical about this.

Thank-yous

Finally, I must thank my Cambridge Polonsky team members, Edith Halvarsson (my Oxford counterpart), plus Paul Wheatley and William Kilbride from the DPC. Policy can’t be developed in a void and their contributions and feedback have been invaluable.

Reflections on the International Conference on Digital Preservation (iPres) 2018

The iPres conference celebrated its fifteenth birthday in 2018. Bodleian Libraries’ Policy and Planning Fellow, Edith, discusses her take on this year’s conference theme.  


In 2003 a small international meeting, hosted by the Chinese Academy of Science, prompted the creation of what is today iPres (the International Conference on Digital Preservation). The conference has since grown massively; this year almost 500 delegates attended. To celebrate its fifteenth birthday, iPres 2018 had a self-reflecting theme, considering how the theory of digital preservation has today matured into a community of practice.

In the three years that I’ve worked in the digital preservation field, I have often felt that I have the same conversations on repeat. Which is not to say that I do not love having them! However, the opportunity to reflect on significant developments in digital preservation since 2003 is comforting and shows how these conversations eventually do have lasting impact. Knowing how far the community has come in the past fifteen years opens up my imagination around where digital preservation might be by 2033. And despite current world challenges I am very optimistic!


So what did iPres 2018 have to say about developments since 2003?

1) We now have a joint vocabulary

Barbara Sierman, of the Koninklijke Bibliotheek, commented that a development which is particularly striking to her is that digital preservation today has a shared vocabulary. In the early 2000s even defining the issues around preservation was a barrier when speaking to colleagues. The fact that we now have a shared vocabulary, comments Sierman, means that practitioners are able to present their research and practices at conferences such as iPres.

This is something hugely valuable and does show that digital preservation is emerging as a distinct discipline. Importantly, having established a vocabulary and theories also enables the digital preservation community to challenge and test these very notions and use them as a reference point for new ones.

Twitter – @euanc – https://twitter.com/euanc/status/1044941732155215873


2) More people see the value of digital preservation

“The ability to authenticate and validate turns out to be a superpower in an era where data and truth has become a key economic product.”

This was a comment from William Kilbride (Digital Preservation Coalition) on growing interest in the field. I agree that public awareness of digital collecting and digital preservation is something which appears to have changed rapidly in the last year or so. I think there is a growing consciousness that the internet is not permanent and that your digital life has value. My personal observation has been that recent events (such as Cambridge Analytica as well as the stricter General Data Protection Regulation in the EU) have prompted more people to see their social media and other data as something they can make decisions about. This is for example the first year when friends have started asking me how to extract and preserve their social media!


3) Digital preservation is becoming more Business-as-Usual (but we are not completely there yet)

Twitter-@karirene69, https://twitter.com/karirene69/status/1045014419045064704

In the panel Taking Stock after 15 Years, Maureen Pennock, of the British Library, reflected on the role of research in developing digital preservation as a field. Many of the research projects undertaken from the late 1990s to the 2000s profoundly shaped the field, and without them we would today not have sustainable digital collecting programmes in place in some organisations.

Having the space to undertake innovative research will always be important to ensure that digital preservation can address emerging challenges. It is also highly encouraging that BAU digital preservation programmes are now becoming more common and that organisations are collecting at large and automated scales. However, Pennock warns that there is a difference between research and practice and that the latter needs to function outside the remit of discrete research project funding. This is still an ongoing challenge to BAU practices for digital preservation.


And what about the future?

It is always hard to predict which topics are “fads” and which ones make a more lasting impact. However, a hot topic this year (which divided opinions) was whether or not digital preservation should develop into a separate profession with its own code of ethics. The development of digital preservation as a profession could be an important advocacy tool. Conversely, it also runs the risk of isolating digital preservation activities by framing them as something separate from the work of other professions such as archivists, records managers and librarians.

Twitter – @mopennock – https://twitter.com/mopennock/status/1044944038170972161

Now that we have the vocabularies, theories, practices, and attention of the media (as outlined above) – should we instead be making a more concerted effort to integrate with library, archives and other research conferences? This will no doubt be a continued area of discussion for iPres 2019 and beyond!

International Digital Preservation Day 2017 #IDPD17

It is International Digital Preservation Day. Today, around the world we celebrate the field that is fighting against time and technology to make sure that our digital “things” survive. And in turn, we are trying to make time and technology work with us.


We’re the people that see a 5.25” floppy disk and think “I bet I can read that. I wonder what I’ll find?” and we’re already making a list of where we can find the hardware and software to read it. We’re already dating it to work out what kind of files would be on it and what software created those files—can we still find them? We’re willing to try, because every day that disk is ageing and every day increases the possibility that when we get around to reading it, the data might be corrupted.

We’re the people fighting against the inevitable technological obsolescence, juggling media carriers, file formats, technological failures, software obsolescence and hardware degradation. It is like a carefully coordinated dance, where one wrong thing can end up in some sort of error. A file can’t open, or if I can open it what am I even staring at? We’re trying to save our digital world, before it degrades and corrupts.

It’s not always that dire, but it’s the knowledge that if something gets overlooked, at some point – often in the blink of an eye – something will be lost. Something will be damaged. It’s like playing a kind of Russian roulette, except for those of us who are custodians of unique digital collections, we can’t take those chances. We cannot lose our digital assets, our digital “things” that we collect on behalf of the public, or for compliance reasons, or because we are keeping a record of the now for the future. After all, we have stories to tell, histories to save – what is it that we want to leave for the future?

If we don’t consider preserving our digital “things” now, then we might not leave a story behind to tell.

For some reason, while this is an issue we all struggle with (raise your hand if you’ve lost a digital file in your life or if your computer/tablet/phone has crashed and you lost everything and didn’t have a backup) digital preservation is still something people don’t know about or just don’t talk about. Why is something that we are all struggling with ignored so much? Is it because we’re not speaking up enough? Is it because people just lose their stuff and move on, forgetting about it? When so much of our lives’ records are now only digital, how can we just forget what we lose? How can we not care?

The truth is we should. And we should all be looking to digital preservation in one form or another. From individuals to big business, digital preservation matters. It’s not just for the cultural heritage and higher education institutions to “do” or to “worry” about. It involves you too.

The good news is that the world is starting to catch on. They are starting to look to us, the digital preservation practitioners, to see what they should do. They are starting to worry, starting to see the cracks in the digital world. Nothing lasts forever and sometimes in the digital world, it can be gone in a second with just a flick of a switch. Maybe it lives on somewhere, on those motionless hard drives, but without active management and commitment, even those hard drives will fail you some day. The events around the Gothamist’s shutdown of its online news sites (inc. DCist and LAist) have highlighted this. The recent Slate article on streaming-only services has us worried about preservation of TV and film content that is born digital and so centralised that it cannot rely on a LOCKSS-based approach (Lots of Copies Keeps Stuff Safe).

These are of course just some of the things we need to worry about. Just some of the things we’ll have to try to save. There’s still the other approximately 2.5 quintillion bytes (roughly 2.5 exabytes or 2.5 billion gigabytes) of data being created around the world each day to worry about. We’re not going to keep it all, but we’re going to want to keep some of it. And that some of it is rapidly increasing.
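For anyone who wants to check the unit conversion behind that daily figure, here is a small sketch. It assumes decimal (SI) units, where a gigabyte is 10^9 bytes and an exabyte is 10^18 bytes; the names `BYTES_PER_DAY` and `daily_volume` are made up for this illustration.

```python
# Convert the quoted daily data volume into exabytes and gigabytes.
# Assumption: decimal (SI) units, not binary units such as GiB or EiB.
BYTES_PER_DAY = 2_500_000_000_000_000_000  # 2.5 quintillion bytes
GIGABYTE = 10**9
EXABYTE = 10**18


def daily_volume(bytes_per_day: int) -> dict:
    """Express a byte count in exabytes and gigabytes."""
    return {
        "exabytes": bytes_per_day / EXABYTE,
        "gigabytes": bytes_per_day / GIGABYTE,
    }


volume = daily_volume(BYTES_PER_DAY)
print(f"{volume['exabytes']} EB or {volume['gigabytes']:,.0f} GB per day")
```

With binary units the numbers would come out slightly smaller, which is one reason figures like this are best read as rough orders of magnitude.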

So this International Digital Preservation Day, I encourage everyone to think about their digital lives, at home and at work, and think about what you need to do to make your digital “things” last. There is a field of experts in the world, who are here to help. We are no further than a tweet away. We survive by collaborating and helping each other. And we’re here to help you save the bits.


Want to learn more?

Visit the Digital Preservation Coalition for advice, reports and further information: http://www.dpconline.org/ 

Speak to the digital preservation hive mind on Twitter using any of these hashtags: #digitalpreservation #digipres #digpres

For more International Digital Preservation Day activities, visit: http://www.dpconline.org/events/international-digital-preservation-day or check out the hashtag #IDPD17

Digital Preservation futurology

I fancy attempting futurology, so here’s a list of things I believe could happen to ‘digital preservation systems’ over the next decade. I’ve mostly pinched these ideas from folks like Dave Thompson, Neil Jefferies, and my fellow Fellows. But if you see one of your ideas, please claim it using the handy commenting mechanism. And because it’s futurology, it doesn’t have to be accurate, so kindly contradict me!

Ingest becomes a relationship, not a one-off event

Many of the core concepts underpinning how computers are perceived to work are crude, paper-based metaphors – e.g. ‘files’, ‘folders’, ‘desktops’, ‘wastebaskets’ etc – that don’t relate to what your computer’s actually doing. (The early players in office computing were typewriter and photocopier manufacturers, after all…) These metaphors have succeeded at getting everyone to use computers, but they’ve also suppressed various opportunities to work smarter.

The concept of ingesting (oxymoronic) ‘digital papers’ is obviously heavily influenced by this paper paradigm.  Maybe the ‘paper paradigm’ has misled the archival community about computers a bit, too, given that they were experts at handling ‘papers’ before computers arrived?

As an example of what I mean: in the olden days (25 whole years ago!), Professor Plum would amass piles of important papers until the day he retired / died, and then, and only then, could these personal papers be donated and archived. Computers, of course, make it possible for the Prof both to keep his ‘papers’ where he needs them, and donate them at the same time, but the ‘ingest event’ at the centre of current digital preservation systems still seems to be underpinned by a core concept of ‘piles of stuff needing to be dealt with as a one-off task’. In future, the ‘ingest’ of a ‘donation’ will actually become a regular, repeated set of occurrences based upon ongoing relationships between donors and collectors, and forged initially when Profs are but lowly postgrads. Personal Digital Archiving and Research Data Management will become key; and ripping digital ephemera from dying hard disks will become less necessary as they do.

The above depends heavily upon…

Object versioning / dependency management

Of course, if Dr. Damson regularly donates materials from her postgrad days onwards, some of these may be updates to things donated previously. Some of them might have mutated so much since the original donation that they can be considered ‘child’ objects, which may have ‘siblings’ with ‘common ancestors’ already extant in the archive. Hence preservation systems need to manage multiple versions of ‘digital objects’, and the relationships between them.

Some of the preservation systems we’ve looked at claim to ‘do versioning’ but it’s a bit clunky – just side-by-side copies of immutable ‘digital objects’, not records of the changes from one version to the next, and with no concept of branching siblings from a common parent. Complex structures of interdependent objects are generally problematic for current systems. The wider computing world has been pushing at the limits of the ‘paper-paradigm’ immutable object for a while now (think Git, Blockchain, various version control and dependency management platforms, etc). Digital preservation systems will soon catch up.
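To make the ‘siblings with common ancestors’ idea a bit more concrete, here is a minimal Python sketch of a version lineage. It is not how any particular preservation system works: the `Version` class, the identifiers and the donation history are all made up for illustration, and it only records parentage, not the changes between versions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    """One immutable snapshot of a digital object, linked to its parent."""
    version_id: str
    parent: Optional["Version"] = None
    note: str = ""


def ancestors(version: Version) -> list[Version]:
    """Return the version itself followed by its chain of parents, oldest last."""
    chain = []
    current: Optional[Version] = version
    while current is not None:
        chain.append(current)
        current = current.parent
    return chain


def common_ancestor(a: Version, b: Version) -> Optional[Version]:
    """Find the most recent version that both a and b descend from."""
    seen = {v.version_id for v in ancestors(a)}
    for candidate in ancestors(b):
        if candidate.version_id in seen:
            return candidate
    return None


# Hypothetical donation history: a thesis that is updated, then branches into two siblings.
draft = Version("thesis-v1", note="original donation")
revised = Version("thesis-v2", parent=draft, note="updated donation")
article = Version("article-v1", parent=revised, note="child: journal article")
talk = Version("talk-v1", parent=revised, note="child: conference talk")

print(common_ancestor(article, talk).version_id)
```

Here the article and the talk are siblings whose common parent is the revised thesis. Storing what changed between versions (as Git does) would need more than this parent pointer.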

Further blurring of the object / metadata boundary

What’s more important, the object or the metadata? The ‘paper-paradigm’ has skewed thinking towards the former (the sacrosanct ‘digital object’, comparable to the ‘original bit of paper’), but after you’ve digitised your rare book collection, what are Humanities scholars going to text-mine? It won’t be images of pages – it’ll be the transcripts of those (i.e. the ‘descriptive metadata’)*. Also, when seminal papers about these text mining efforts are published, how is this history of the engagement with your collection going to be recorded? Using a series of PREMIS Events (that future scholars can mine in turn), perhaps?

The above talk of text mining and contextual linking of secondary resources raises two more points…

* While I’m here, can I take issue with the term ‘descriptive metadata’? All metadata is descriptive. It’s tautological; like saying ‘uptight Englishman’. Can we think of a better name?

Ability to analyse metadata at scale

‘Delivery’ no longer just means ‘giving users a viewer to look at things one-by-one with’ – it now also means ‘letting people push their Natural Language or image processing algorithms to where the data sits, and then coping with vast streams of output data’.

Storage / retention informed by well-understood usage patterns

The fact that everything’s digital, and hence easier to disseminate and link together than physical objects, also means better understanding how people use our material. This doesn’t just mean ‘wiring things up to Google Analytics’ – advances in bibliometrics that add social / mainstream media analysis, and so forth, to everyday citation counts present opportunities to judge the impact of our ‘stuff’ on the world like never before. Smart digital archives will inform their storage management and retention decisions with this sort of usage information, potentially in fully or semi-automated ways.
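As a very rough illustration of what ‘semi-automated’ might look like, here is a Python sketch that turns usage signals into a suggested storage tier. Everything in it is an assumption made for the example: the signals, the weights, the thresholds and the tier names.

```python
from dataclasses import dataclass


@dataclass
class UsageStats:
    """Hypothetical per-item usage signals gathered over the past year."""
    downloads: int
    citations: int
    media_mentions: int


def usage_score(stats: UsageStats) -> int:
    # Illustrative weights: a citation or media mention counts for more than a download.
    return stats.downloads + 10 * stats.citations + 5 * stats.media_mentions


def suggest_tier(stats: UsageStats) -> str:
    """Suggest a storage tier; a curator would still review the suggestion."""
    score = usage_score(stats)
    if score >= 100:
        return "fast-disk"
    if score >= 10:
        return "standard-disk"
    return "offline-tape"


print(suggest_tier(UsageStats(downloads=250, citations=3, media_mentions=1)))
print(suggest_tier(UsageStats(downloads=2, citations=0, media_mentions=0)))
```

The point is only that the decision rule is explicit and reviewable; low usage on its own would be a poor reason to discard anything.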

Ability to get data out, cleanly – all systems are only ever temporary!

Finally – it’s clear that there are no ‘long-term’ preservation system options. The system you procure today will merely be ‘custodian’ of your materials for the next ten or twenty years (if you’re lucky). This may mean moving heaps of content around in future, but perhaps it’s more pragmatic to think of future preservation systems as more like ‘lenses’ that are laid on top of more stable data stores to enable as-yet-undreamt-of functionality for future audiences?

(OK – that’s enough for now…)

Six Priority Digital Preservation Demands

Somaya Langley, Cambridge Policy and Planning Fellow, talks about her top 6 demands for a digital preservation system.


Photo: Blazej Mikula, Cambridge University Library

As a former user of one digital preservation system (Ex Libris’ Rosetta), I have spent a few years frustrated by the gap between what activities need to be done as part of a digital stewardship end-to-end workflow – including packaging and ingesting ‘information objects’ (files and associated metadata) – and the maturity level of digital preservation systems.

Digital Preservation Systems Review

At Cambridge, we are looking at different digital preservation systems and what each one can offer. This has involved talking to both vendors and users of systems.

When I’m asked about what my top digital preservation system current or future requirements are, it’s excruciatingly hard to limit myself to a handful of things. However, having previously been involved in a digital preservation system implementation project, there are some high-level takeaways from past experiences that remain with me.

Shortlist

Here’s the current list of my six top ‘digital preservation demands’ (aka user requirements):

Integration (with various other systems)

A digital preservation ‘system’ is only one cog in a wheel within a much larger machine; one piece of a much larger puzzle. There is an entire ‘digital ecosystem’ that this ‘system’ should exist within, and end-to-end digital stewardship workflows are of primary importance. The right amount of metadata and/or files should flow from one system to another. We must also know where the ‘source of truth’ is for each bit.

Standards-based

This seems like a no-brainer. We work in Library Land. Libraries rely on standards. We also work with computers and other technologies that also require standard ways (protocols etc.) of communicating.

For files and metadata to flow from one system to another – whether via import, ingest, export, migration or an exit strategy from a system – we already spend a bunch of time creating mappings and crosswalks from one standard (or implementation of a standard) to another. If we don’t use (or fully implement) existing standards, this means we risk mangling data, context or meaning; potentially losing or not capturing parts of the data; or just wasting a whole lot of time.

Error Handling (automated, prioritised)

There’s more work to be done in managing digital materials than there are people to do it. Content creation is increasing at exponential rates, while the number of staff (with the right skills) just isn’t. We have to be smart about how we work. This requires prioritisation.

We need to have smarter systems that help us. This includes helping to prioritise where we focus our effort. Digital preservation systems are increasingly incorporating new third-party tools. We need to know which tool reports each error and whether these errors are show-stoppers or not. (For example: is the content no longer renderable versus a small piece of non-critical descriptive metadata that is missing?) We have to accept that, for some errors, we will never get around to addressing them.
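Here is a minimal Python sketch of the kind of triage I have in mind: each error records which tool reported it and how severe it is, and show-stoppers float to the top. The severity levels, tool names and messages are hypothetical, not taken from any real system.

```python
from dataclasses import dataclass

# Hypothetical severity levels; the lowest number is the most urgent.
SEVERITY_ORDER = {"show-stopper": 0, "warning": 1, "minor": 2}


@dataclass
class ToolError:
    tool: str        # which third-party tool reported the error
    object_id: str   # which information object it concerns
    severity: str    # one of the keys in SEVERITY_ORDER
    message: str


def triage(errors: list[ToolError]) -> list[ToolError]:
    """Sort errors so show-stoppers come first; order within a level is preserved."""
    return sorted(errors, key=lambda e: SEVERITY_ORDER[e.severity])


def needs_attention(errors: list[ToolError]) -> list[ToolError]:
    """Keep only the errors classed as show-stoppers."""
    return [e for e in errors if e.severity == "show-stopper"]


errors = [
    ToolError("metadata-checker", "obj-002", "minor", "optional description field missing"),
    ToolError("format-validator", "obj-001", "show-stopper", "file cannot be rendered"),
    ToolError("format-validator", "obj-003", "warning", "file not well-formed but opens"),
]

for e in triage(errors):
    print(f"[{e.severity}] {e.tool}: {e.object_id}: {e.message}")
```

Anything left in the ‘minor’ pile can then be consciously parked rather than silently ignored.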

Reporting

We need to be able to report to different audiences. The different types of reporting classes include (but are not limited to):

  1. High-level reporting – annual reports, monthly reports, reports to managers, projections, costings etc.
  2. Collection and preservation management reporting – reporting on successes and failures, overall system stats, rolling checksum verification etc.
  3. Reporting for preservation planning purposes – based on preservation plans, we need to be able to identify subsections of our collection (configured around content types, context, file format and/or whatever other parameters we choose to use) and report on potential candidates that require some kind of preservation action.
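To show what sits behind a ‘rolling checksum verification’ report, here is a small Python sketch that checks one slice of a checksum manifest at a time, so that a scheduled job could work through the whole collection over successive runs. The manifest layout, the function names and the three status labels are assumptions for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_batch(manifest: dict[str, str], root: Path, start: int, size: int) -> dict[str, str]:
    """Check one slice of the manifest and report 'ok', 'mismatch' or 'missing' per file."""
    report = {}
    for name in sorted(manifest)[start:start + size]:
        path = root / name
        if not path.is_file():
            report[name] = "missing"
        elif sha256_of(path) == manifest[name]:
            report[name] = "ok"
        else:
            report[name] = "mismatch"
    return report


# Demonstration with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_bytes(b"hello")
    (root / "b.txt").write_bytes(b"world")
    manifest = {
        "a.txt": hashlib.sha256(b"hello").hexdigest(),
        "b.txt": hashlib.sha256(b"changed").hexdigest(),  # simulates a file that no longer matches
        "c.txt": hashlib.sha256(b"gone").hexdigest(),     # listed in the manifest but never written
    }
    report = verify_batch(manifest, root, start=0, size=3)
    print(report)
```

A collection management report would then aggregate these per-file results into the successes and failures mentioned above.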

Provenance

We need to best support – via metadata – where a file has come from. This, for want of a better approach, is currently being handled by the digital preservation community through documenting changes as Provenance Notes. Digital materials acquired into our collections are not just the files, they’re also the metadata. (Hence, why I refer to them as ‘information objects’.) When an ‘information object’ has been bundled, and is ready to be ingested into a system, I think of it as becoming an ‘information package’.

There’s a lot of metadata (administrative, preservation, structural, technical) that appears along the path from an object’s creation until the point at which it becomes an ‘information package’. We need to ensure we’re capturing and retaining the important components of this metadata. Those components we deem essential must travel alongside their associated files into a preservation system. (Not all files will have any or even the right metadata embedded within the file itself.) Standardised ways of handling information held in Provenance Notes (whether these are from ‘outside of the system’ or created by the digital preservation system) and event information, so that it can be interrogated and reported on, are crucial.
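As a sketch of what ‘interrogated and reported on’ could mean in practice, here is a small Python example that holds provenance as structured events rather than free-text notes. The field names are my own and only loosely echo the idea of a preservation event; this is not a PREMIS serialisation, and the sample events are invented.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceEvent:
    object_id: str     # the information object the event relates to
    event_type: str    # e.g. "transfer", "ingest"
    timestamp: str     # ISO 8601 in UTC, so plain text sorting is chronological
    agent: str         # person or software responsible
    source: str        # "external" (before the system) or "system"
    note: str = ""


def history(events: list[ProvenanceEvent], object_id: str) -> list[ProvenanceEvent]:
    """Return one object's events in chronological order."""
    return sorted((e for e in events if e.object_id == object_id), key=lambda e: e.timestamp)


def counts_by_type(events: list[ProvenanceEvent]) -> Counter:
    """Summarise how many events of each type were recorded, for reporting."""
    return Counter(e.event_type for e in events)


events = [
    ProvenanceEvent("obj-001", "ingest", "2017-11-02T09:30:00Z", "preservation system", "system"),
    ProvenanceEvent("obj-001", "transfer", "2017-10-20T14:00:00Z", "donor", "external",
                    note="copied from donor's laptop to a USB drive"),
    ProvenanceEvent("obj-002", "ingest", "2017-11-02T09:45:00Z", "preservation system", "system"),
]

print([e.event_type for e in history(events, "obj-001")])
print(counts_by_type(events))
```

Because the ‘external’ transfer and the ‘system’ ingest share one structure, both can be queried together, which is much harder when the former lives only in a free-text note.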

Managing Access Rights

Facilitating access is not black and white. Collections are not simply ‘open’ or ‘closed’. We have a myriad of ways that digital material is created and collected; we need to ensure we can provide access to this content in a variety of ways that support both the content and our users. This can include access from within an institution’s building, via a dedicated computer terminal, online access to anyone in the world, mediated remote access, access to only subsets of a collection, support for embargo periods, ensuring we respect cultural sensitivities or provide access to only the metadata (perhaps as large datasets) and more.
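To illustrate why ‘open’ versus ‘closed’ is too blunt, here is a minimal Python sketch of an access decision that combines several of these conditions. The policy fields, the three access levels and the rule that embargoed items still expose their metadata are all assumptions for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AccessPolicy:
    """Hypothetical access settings for one collection item."""
    closed: bool = False                    # e.g. culturally sensitive material
    onsite_only: bool = False               # reading room or dedicated terminal
    embargo_until: Optional[date] = None
    metadata_only: bool = False


def access_level(policy: AccessPolicy, today: date, user_onsite: bool) -> str:
    """Return 'none', 'metadata' or 'full' for a given user and date."""
    if policy.closed:
        return "none"
    if policy.embargo_until is not None and today < policy.embargo_until:
        return "metadata"   # assumption: embargoed items still expose their description
    if policy.metadata_only:
        return "metadata"
    if policy.onsite_only and not user_onsite:
        return "metadata"
    return "full"


embargoed = AccessPolicy(embargo_until=date(2030, 1, 1))
reading_room = AccessPolicy(onsite_only=True)

print(access_level(embargoed, date(2025, 6, 1), user_onsite=True))
print(access_level(reading_room, date(2025, 6, 1), user_onsite=False))
print(access_level(reading_room, date(2025, 6, 1), user_onsite=True))
```

Real policies would need more dimensions (subsets of a collection, mediated remote access), but even this small rule set already produces more than two outcomes.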

We must set a goal of working towards providing access to our users in the many different (and new-ish) ways they actually want to use our content.

It’s imperative to keep in mind the whole purpose of preserving digital materials is to be able to access them (in many varied ways). Provision of content ‘viewers’ and facilitating other modes of access (e.g. to large datasets of metadata) are essential.

Final note: I never said addressing these concerns was going to be easy. We need to factor each in and make iterative improvements, one step at a time.

Designing digital preservation training – it’s more than just talking

Sarah, Oxford’s Outreach and Training Fellow, writes about the ‘training cycle’ and concludes that delivering useful training is more than just talking at learners.


We have all been there before: trying to keep our eyes open as someone drones on in the front of the room, while the PowerPoint slides seem to contain a novella that hurts your eyes to squint to read. That’s not how training is supposed to go.

Rather, engaging your learner in a variety of activities will help them retain knowledge. And in a field like digital preservation, the more hands-on the training, the better. So often we talk about concepts or technical tools, but we very rarely provide examples, demonstrate them, or (better yet) have staff experiment with them.

And training is just one small part of the training process. I’ve learned there are many steps involved in developing a course that will be of use to staff. Most of your time will not be spent in the training room.

Identifying Learners’ Needs

Often easier said than done. It’s better to prepare for all types of learners and pitch the material to a wide audience. With hands-on tasks, it’s possible to have additional work prepared for advanced learners, so they don’t get bored while other learners are still working through the task.

Part of the DPOC project has been about finding the gaps in digital preservation skills and knowledge, so that our training programmes can better meet staff’s needs. What I am learning is that I need to cast my net wide to reach everyone!

Planning and Preparation

The hard bit. Start with what your outcomes are going to be and try not to put too many into a session. It’s too easy to be extra ambitious. Once you have them, then you pick your activities, gather your materials (create that PowerPoint) and practise! Never underestimate the value of practising your session on your peers beforehand.

Teaching and Learning

The main event. It’s important to be confident, open and friendly as a trainer. I admit, I stand in the bathroom and do a “Power Pose” for a few minutes to psyche myself up. You are allowed nerves as a trainer! It’s important to be flexible during the course.

Assessment

Because training isn’t just about Teaching and Learning. That only accounts for 1/5th of the training cycle. Assessment is another 1/5th and if that’s going to happen during the course, then it needs to be planned. Using a variety of the activities listed below will help with that. Be aware though: activities almost always take longer than you plan!

Activities to facilitate learning:

  • questioning
  • group activities, such as case studies, card sorting, mindmapping, etc.
  • hands-on tasks with software
  • group discussions
  • quizzes and games
  • modelling and demonstrations followed by an opportunity to practise the skill

Evaluation

Your evaluation is crucial to this. Make notes after your session on what you liked and what you need to fix. Peer evaluation is also important and sending out surveys immediately after will help with response rates. However, if you can do a paper evaluation at the end of the course, your response rates will be higher. Use that feedback to improve the course, tweak activities and content, so that you can start all over again.

Planning is a verb

In her DPC webinar on October 19, Nancy McGovern (MIT Libraries) spoke about ‘Preservation Planning and Maturity Modelling’. Maturity models are a great way to measure our progress as we look to solve some of our institutions’ digital preservation issues. Without them, digital preservation would be an unending task with no benchmarks, no goals. And one of the things that stuck out in the talk was some words of wisdom from Nancy:

Planning is a verb, it is not something you can do once and you’re done.

This is something that I think sits at the heart of digital preservation: this is not something we “do” and we’re done. Technology is constantly changing and requires continual monitoring for new tools, applications, and obsolescence. This constantly shifting environment means there is no single, one-time solution to digital preservation. It is a coordinated effort between “technology, decision-making, and people.” None of these things remains constant; all are ever-changing. Decision-making tools (such as policies) and people (skills) are also the hardest part of digital preservation, because there is no one-size-fits-all for either one. In comparison, technology is relatively easy to manage and plan for.

Having maturity models provides the stepping stones for developing technology, decision-making, and people. If viewed all at once, the task of implementing a sustainable digital preservation programme seems unlikely, but following steps makes it manageable and measurable. One such maturity model is The Five Organizational Stages of Digital Preservation (from Kenney & McGovern):

  1. Acknowledge: Understanding that digital preservation is a local concern;
  2. Act: Initiating digital preservation projects;
  3. Consolidate: Segueing from projects to programs;
  4. Institutionalize: Incorporating the larger environment; and
  5. Externalize: Embracing inter-institutional collaboration and dependency.

(this is just one of many maturity models available, but it was referenced in the webinar)

And when Nancy spoke about this maturity model, she stressed that your organisation might reach a level 5, but it might not stay a level 5 forever. The loss of an integral staff member, a shift in technology, or even starting a new digital collection or department would shift the balance again. This discussion only further reinforced for me that digital preservation is not something you can “set and forget,” but an on-going process.

Planning is also an important function in the OAIS reference model (preservation planning sits over the entire model). It is about monitoring external environments and recommending revisions or changes where necessary. Planning is essentially the “safeguard against a constantly evolving user and technology environment” (Lavoie, 2014). This means that where people and technology are involved, we are facing an ever-changing future; we must continually monitor and plan in order to provide long-term access to our digital assets.

After all, planning is a verb, isn’t it?


What do you think? Is digital preservation a solution you can do once and be done with or does it require ongoing support and development? Or something else entirely? Join the discussion below:

Come Join the DPOC Team!

The Bodleian Libraries, University of Oxford are looking for the third Polonsky Fellow (Technical Officer/Research Software Engineer) to join the team! 

As a Technical Officer/Research Software Engineer at Oxford you will undertake research and training to build upon your expertise in the technical issues surrounding digital preservation and your awareness of the tools, systems and projects that seek to address these issues. You will also develop and/or implement digital preservation applications and services within the Bodleian Libraries, contribute to the development of a business case and sustainability plan for digital preservation operations, disseminate the key findings of your work to at least one conference and submit one journal article per year based on your work in collaboration with colleagues.

If you’re interested in joining this project and want more information, apply here.


*Remember you get to work with these great team members at Oxford and Cambridge!