The Ethics of Working in Digital Preservation

Since joining the DPOC project in 2016, I have been espousing the need for holistic approaches to digital preservation. This has very much been about how skills development, policy, strategy, workflows and much more need to be included as part of a digital preservation offering. Digital preservation is never just about the tech. There is a concern I must raise: how we play nice together.

Since first drafting this post in October 2017, there have been several events I would be remiss not to mention. Ethics and how we conduct ourselves in professional contexts have been brought into the current social consciousness by the #metoo movement and the recent matter regarding Chris Bourg’s keynote at the Code4Lib conference.

Working Together

We know digital preservation can't be done alone, and I believe the digital preservation community is well on the way to accepting this. No single person can hold all the information about every type of file, standard, operating system, disk file system, policy, carrier, hardware, peripheral, protocol, copyright and piece of legislation, as well as undertake advocacy, negotiate suitably with donors and so on.

Dream Team – Library of Congress Digital Preservation Outreach and Education Training Materials

For each digital preservation activity, we need a ‘dream team’. This is a term Emma Jolley (Curator of Digital Archives, National Library of Australia) incorporated into the 2015 Library of Congress Digital Preservation Outreach & Education (DPOE) Train the Trainer education programme I took part in. This understanding of the needs of complementary skills, knowledge and approaches very much underpins the Polonsky Digital Preservation Project.

Step by Step, Hand in Hand

If I think back to my time working in digital preservation in the mid-2000s, it was a far more isolating experience than it is now. Remembering the challenges we were discussing back then, it doesn’t feel as if the field has progressed all that much. It may just be slow going. Or perhaps it’s fear of making a wrong decision?

As humans, we know we have the capacity to learn from mistakes. We've likely had someone tell us about the time they (temporarily or permanently) lost data. The short lifespan of media carriers, the inter-dependencies between different components, changes to services where data may be stored 'in the cloud' and the limited availability of devices (hardware or software) to read and interpret the data all mean that digital content is fragile (for many reasons, not only technical ones) and continually at risk.

There are enough lessons of data loss out there in the wider world that it is imperative we acknowledge these situations and learn from them. Nor should we have to face these kinds of stressful situations alone; we should work step by step, hand in hand, supporting each other.

Acknowledging Failure

Over recent years, the international arts and cultural sector has begun to share examples of failures. While it is easy to share successes, it’s far harder to openly share information about failures. Failure in current western society is definitely not a desirable outcome. Yet we learn from failure. As a response to ‘ideas’ festivals and TED talks, events such as Failure Lab have been gaining momentum.

The need to share (in considered ways) about failures in digital preservation is relatively new, though it's not an entirely new concept. (The now infamous story of how parts of Toy Story 2 were deleted has helped illustrate the need to regularly check backup functions.) More recently, at PASIG 2017, one of the most memorable presentations of the whole conference was Eduardo Del Valle's Sharing my loss to protect your data: A story of unexpected data loss and how to do real preservation. I believe I speak for many of the PASIG conference attendees when I say how valuable a presentation this was.

In May 2017, the Digital Preservation Coalition ran possibly the most useful event I attended in all of 2017: Digital Preservationists Anonymous (aka Fail Club). We were able to share our war stories under the safety and security of the Chatham House Rule and learn a great deal from each other that we can take forward in our work at our respective institutions. Hearing an organisation that is further ahead describe the tricky things they've encountered helps us progress better and faster.

iPres 2017 and the Operational Pragmatism Panel

Yet there are other problematic issues within the field of digital preservation. It’s not always an easy field to work in; it doesn’t yet have the diversity it needs, nor necessarily respect the diversity of views already present.

Operational Pragmatism in Digital Preservation: Establishing context-aware minimum viable baselines was a panel session I facilitated at iPres 2017, held in September 2017 in Kyoto, Japan. The discussion was set out as a series of 'provocations' (developed collaboratively by the panellists) about different aspects of digital preservation. (Future blog posts about the topics and views presented during the panel discussion are yet to be published.) I had five experienced panellists representing a range of countries they've worked in around the world (Canada, China, France, Kenya, the Netherlands, the UK and the USA), plus myself (originally from Australia). Another eight contributors (from Australia, Germany, New Zealand, the UK and the USA) also fed into forming the panel topic, the panel member makeup or the provocations. Each panellist was allocated a couple of minutes to present their point of view in response to each provocation, and then the discussion was opened up to the wider audience. It was never going to be an easy panel. I was asking a lot of my panellists: they each had to respond to one challenging question after another, providing a simple answer to each (one that could be used to inform decisions about the 'bare minimum' work that could be done in each digital preservation scenario). This was no small feat.

Rather than the traditional panel presentation, where only a series of experts get to speak, it was intended as a more inclusive discussion. The discussion was widened to include the audience in good faith, so that audience members could share openly throughout, if they wished. However, it became apparent that there were some other dynamics at play.

One Person Alone is Never Enough

Since I first commenced working in digital preservation in 2005, I have witnessed the passion and commitment to viewpoints that individuals within this field hold. I expected a lively discussion and healthy debate, potentially with opposing views between the panellists (who had been selected to represent different GLAM sectors, organisation sizes, nations, cultures, backgrounds and approaches to digital preservation).

As I was facilitating the panellists for this demanding session, I had organised an audience facilitator (someone well-established within the digital preservation community). Unfortunately, due to circumstances out of our control, this person was unable to be present (and an experienced replacement was unable to be found at short notice). This situation left my panellists open to criticism. One panellist was on the receiving end of a disproportionate amount of scrutiny from the audience. Despite attempts, as a lone facilitator, I was unable to defuse the situation. After the panel session finished, several audience members remarked that they didn’t feel comfortable participating in the discussion.

Facilitating a safe environment for both panellists and for the wider audience to debate topics they are passionate about is vitally important, yet this failed to occur in this instance. As a result, the panel were unable to summarise and present conclusions about possible ‘minimum baselines’ for each of the provocations. It’s clear in this instance that a single facilitator was not enough.

Community Responsibility

In this respect, we have failed as a community. While we may have vastly differing viewpoints, it is essential we cultivate environments where people feel safe to express their views and have them received in a professional and respectful manner. The digital preservation community is growing – in both size and diversity. We are aware we need to put in place, improve or refresh our technical infrastructures. Now is also the time to look at how we handle our social infrastructure. It is my opinion that there is a place in the digital preservation field for a wide range of individuals with a vast variety of backgrounds and skills.

There are people already working in digital preservation who have great skills. They might not all be software developers, but they know how to project manage, speak, write and problem-solve, and they are subject matter experts in a wide range of areas. The value of diversity is well established. If we only have coders, computer scientists or individuals from any one background working in the field of digital preservation, then surely we will fail.

Moving Forward

In the hours and days following the panel, I reached out to my communities online for pointers to Codes of Ethics, Codes of Conduct and other articles discussing challenging situations in similar industries. Borrowing from other industries and adapting to fit the context at hand has always been important to me. I don’t want to reinvent the wheel and would prefer to learn from others’ experiences. The panel ‘provocations’ presented were not contentious, yet how the discussion evolved throughout the duration of the panel somewhat echoes other events that have occurred within the tech industry.

At the time of publishing this post, neither the digital preservation community nor iPres has a Code of Conduct or Code of Ethics. The lack of an iPres Code of Conduct has been mentioned in previous years. For iPres 2018, developing a Code of Conduct has become a priority. However, it shouldn't have taken us this long to put frameworks of this type in place, given we all know we must work collaboratively if we are to succeed. Back in 1997, UNESCO suggested that if audiovisual archiving were a profession, it would also require a Code of Ethics (Audiovisual archives: a practical reader – section 4, pages 15-17).

Codes of Conduct and Codes of Ethics are a starting point. Several examples include:

There's a longer list of Codes of Conduct and Codes of Ethics that has been compiled in the six months since iPres 2017. Even the Loop electronic music makers' summit (an initiative of the Ableton software company) that I attended last November in Berlin had a thorough Code of Conduct in place.

Building Better Communities

Codes are not enough. This is about building better communities.

A 2016 article emerging from the tech community has a list of suggestions for facilitating the development of ‘plumbers’ (and therefore functional infrastructure) rather than ‘rock stars’, under the section titled: “How do we as a community prevent rock stars?”.

Building and maintaining infrastructure is typically neither fun nor sexy – but this is what digital preservation demands. Without working collaboratively and inclusively, we will not be able to acquire, preserve or provide access to the digital content we are the stewards of, because we won't fully understand the contexts of the individuals producing that content if we don't have the same kind of diversity within our own field of digital preservation.

Diversity may not be easy, but neither is digital preservation. While it might not be rocket science per se, we’re accustomed to working on hard and complex things. Here are some suggestions to help us take the next step(s):

  • Organisers: encourage, model and – where necessary – enforce 'good practice' behaviour codes
  • Participants: recognise, appreciate and celebrate the privilege of being able to debate digital preservation as part of what we do. Allow and encourage minority, less confident and new voices to hold an equal place in our discussions
  • Everyone: recognise and work towards addressing our own unconscious biases and privileges

Like Kenney and McGovern's Three-Legged Stool for Digital Preservation (a model our DPOC project is very much based on), where organisational infrastructure, the resources framework and technological infrastructure are of equal importance, the complexity of the digital preservation challenge is best addressed through multiple perspectives. We must model and welcome the benefits of our diversity. Each of us brings something unique, and every skill or bit of knowledge is valuable.

The vision for a preservation repository

Over the last couple of months, work at Cambridge University Library has begun on what a potential digital preservation system might look like, considering the technical infrastructure, the key stakeholders and the policies underpinning it all. Technical Fellow, Dave, tells us more about the holistic vision…


This post discusses some of the work we’ve been doing to lay foundations beneath the requirements for a ‘preservation system’ here at Cambridge. In particular, we’re looking at the core vision for the system. It comes with the standard ‘work in progress’ caveats – do not be surprised if the actual vision varies slightly (or more) from what’s discussed here. A lot of the below comes from Mastering the Requirements Process by Suzanne and James Robertson.

Also – it’s important to note that what follows is based upon a holistic definition of ‘system’ – a definition that’s more about what people know and do, and less about Information Technology, bits of tin and wiring.

Why does a system change need a vision?

New systems represent changes to the status quo. The vision is like the Pole Star for such a change effort – it ensures that people have something fixed to move towards when they're buried under minute details. When confusion reigns, you can point to the vision for the system to guide you back to sanity.

Plus, as with all digital efforts, none of this is real: there’s no definite, obvious end point to the change. So the vision will help us recognise when we’ve achieved what we set out to.

Establishing scope and context

Defining what the system change isn't is a particularly good way of working out what it actually represents. This can be achieved by thinking about the systems around the area you're changing and the information that's going to flow in and out. This sort of thinking makes for good diagrams: a diagram of how a preservation repository system might sit within the broader ecosystem of digitisation, research outputs / data, digital archives and digital published material appears below.

System goals

Being able to concisely sum up the key goals of the system is another important part of the vision. This is a lot harder than it sounds and there's something journalistic about it – what you leave out is definitely more important than what you keep in. Fortunately, the vision is about broad brush strokes, not detail, which helps at this stage.

I found some great inspiration in Sustainable Economics for a Digital Planet, which indicated goals such as: “the system should make the value of preserving digital resources clear”, “the system should clearly support stakeholders’ incentives to preserve digital resources” and “the functional aspects of the system should map onto clearly-defined preservation roles and responsibilities”.

Who are we implementing this for?

The final main part of the ‘vision’ puzzle is the stakeholders: who is going to benefit from a preservation system? Who might not benefit directly, but really cares that one exists?

Any significant project is likely to have a LOT of these, so the Robertsons suggest breaking the list down by proximity to the system (using Ian Alexander’s Onion Model), from the core team that uses the system, through the ‘operational work area’ (i.e. those with the need to actually use it) and out to interested parties within the host organisation, and then those in the wider world beyond. An initial attempt at thinking about our stakeholders this way is shown below.

One important thing that we realised was that it’s easy to confuse ‘closeness’ with ‘importance’: there are some very important stakeholders in the ‘wider world’ (e.g. Research Councils or historians) that need to be kept in the loop.

A proposed vision for our preservation repository

After iterating through all the above a couple of times, the current working vision (subject to change!) for a digital preservation repository at Cambridge University Library is as follows:

The repository is the place where the best possible copies of digital resources are stored, kept safe, and have their usefulness maintained. Any future initiatives that need the most perfect copy of those resources will be able to retrieve them from the repository, if authorised to do so. At any given time, it will be clear how the digital resources stored in the repository are being used, how the repository meets the preservation requirements of stakeholders, and who is responsible for the various aspects of maintaining the digital resources stored there.

Hopefully this will give us a clear concept to refer back to as we delve into more detail throughout the months and years to come…

Planning your (digital) funeral: for projects

Cambridge Policy & Planning Fellow, Somaya, writes about her paper and presentation from the Digital Cultural Heritage Conference 2017. The conference paper, Planning for the End from the Start: an Argument for Digital Stewardship, Long-Term Thinking and Alternative Capture Approaches, makes the case for considering digital preservation at the start of a digital humanities project and provides useful advice for digital humanities researchers to apply in their current projects.


In August I presented at the Digital Cultural Heritage 2017 international conference in Berlin (incidentally, my favourite city in the whole world).

Berlin – view from the river Spree. Photo: Somaya Langley

I presented the Friday morning plenary session on Planning for the End from the Start: an Argument for Digital Stewardship, Long-Term Thinking and Alternative Capture Approaches – otherwise known as: 'planning for your funeral when you are conceived'. The presentation represents challenges faced by both Oxford and Cambridge, and the thinking behind it was developed collaboratively with my Oxford Policy & Planning counterpart, Edith Halvarsson.

We decided it was a good idea to present on this topic to an international digital cultural heritage audience, who are likely to experience challenges similar to those of our own researchers. It is based on some common digital preservation use cases that we are finding in each of our universities.

The Scenario

A Digital Humanities project receives project funding and develops a series of digital materials as part of the research project, and potentially some innovative tools as well. For one reason or another, ongoing funding cannot be secured and so the PIs/project team need to find a new home for the digital outputs of the project.

Example Cases

We have numerous examples of these situations at Cambridge and Oxford. Many projects containing digital content that needs to be ‘rehoused’ are created in the online environment, typically as websites. Some examples include:

Holistic Thinking

We believe that thinking holistically right at the start of a project can provide options further down the line, should an unfavourable funding outcome be received.

So it is important to consider holistic thinking, specifically a Digital Stewardship approach (incorporating Digital Curation & Digital Preservation).

Models for Preservation

Digital materials don't necessarily exist in a static form, and often they don't exist in isolation. It's important to think about digital content as being part of a lifecycle, managed by a variety of different workflows. Digital materials are also subject to many risks, so these need to be considered too.

Some models to frame thinking about digital materials:

Documentation

It is incredibly important to document your project. When handing over responsibility for your digital materials and data, also hand over the documentation: whoever takes on hosting or preserving your digital project will need to rely on this information. Also ensure the implementation of standards, metadata schemas, persistent identifiers and so on.

This can include providing associated materials, such as:

Data Management Plans

Some better use of Data Management Plans (DMPs) could be:

  • Submitting DMPs alongside the data
  • Writing DMPs as dot-points rather than prose
  • Including Technical Specifications such as information about code, software, software versions, hardware and other dependencies (a sketch of what this could capture follows the example below)

An example of a DMP from Cambridge University’s Dr Laurent Gatto: Data Management Plan for a Biotechnology and Biological Sciences Research Council
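
As a hedged illustration of the technical-specification point above, the sketch below shows the kind of dot-point detail that could travel with a DMP and the deposited data in a machine-readable form. Every value here is a placeholder, not a recommendation of specific tools or versions.

```python
import json

# A hypothetical, dot-point style technical specification that could accompany
# a DMP and the deposited data; every value below is a placeholder.
technical_specification = {
    "code_repository": "https://example.org/my-project/code",   # placeholder URL
    "software": [
        {"name": "Python", "version": "3.6"},
        {"name": "MySQL", "version": "5.7"},
    ],
    "operating_system": "Ubuntu 16.04",
    "hardware": "standard virtual machine, 4 GB RAM",
    "dependencies": ["numpy", "custom ingest scripts (see code repository)"],
}

# Submitting this alongside the data keeps it both human- and machine-readable.
print(json.dumps(technical_specification, indent=2))
```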

Borrowing from Other Disciplines

Rather than having to 'reinvent the wheel', we should also consider borrowing from other disciplines. For example, borrowing from the performing arts, we might provide similar documents and information such as:

  • Technical Rider (a list of requirements for staging a music gig or theatre show)
  • Stage Plots (layout of instruments, performers and other equipment on stage)
  • Input Lists (ordered list of the different audio channels from your instruments/microphones etc. that you’ll need to send to the mixing desk)

For digital humanities projects and other complex digital works, providing simple and straightforward information about data flows (including inputs and outputs) will greatly assist digital preservationists in determining where something has broken in the future.

Several examples of Technical Riders can be found here:

Approaches

Here are some approaches to consider for the interim digital preservation of digital materials:

Bundling & Bitstream Preservation

The simplest and most basic approach may be to just zip up files and undertake bitstream preservation. Bitstream preservation only ensures that the zeroes and ones that went into a ‘system’ come out as the same zeroes and ones. Nothing more.
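
If you do go down the 'zip it up' route, recording a checksum alongside the bundle makes it possible to confirm later that the same zeroes and ones come back out. The sketch below is a minimal illustration using only Python's standard library; the file and directory names are placeholders, and in practice a packaging convention such as BagIt would handle this more thoroughly.

```python
import hashlib
import json
import zipfile
from pathlib import Path

def bundle_and_checksum(source_dir: str, bundle_path: str, manifest_path: str) -> None:
    """Zip up a project directory and record a SHA-256 checksum for the bundle."""
    source = Path(source_dir)
    with zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        for file in sorted(source.rglob("*")):
            if file.is_file():
                bundle.write(file, file.relative_to(source))

    sha256 = hashlib.sha256()
    with open(bundle_path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            sha256.update(chunk)

    manifest = {"bundle": bundle_path, "algorithm": "sha256", "checksum": sha256.hexdigest()}
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))

def verify_bundle(manifest_path: str) -> bool:
    """Re-hash the bundle and compare against the stored checksum (a basic fixity check)."""
    manifest = json.loads(Path(manifest_path).read_text())
    sha256 = hashlib.sha256()
    with open(manifest["bundle"], "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            sha256.update(chunk)
    return sha256.hexdigest() == manifest["checksum"]

if __name__ == "__main__":
    # Placeholder paths – substitute your own project directory and filenames.
    bundle_and_checksum("my_project_files", "my_project.zip", "my_project_manifest.json")
    print("Fixity intact:", verify_bundle("my_project_manifest.json"))
```

Running the fixity check on a schedule is what turns 'a zip file on a drive' into basic bitstream preservation.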

Exporting / Migrating

Consider exporting digital materials and/or data plus metadata into recognised standards as a means of migrating into another system.

For databases, the SIARD (Software Independent Archiving of Relational Databases) standard may be of use.
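
To illustrate the general idea of exporting data plus metadata (rather than the SIARD format itself, which has its own tooling and a much richer structure), here is a hedged sketch that dumps the tables of an SQLite database to CSV files with a small JSON metadata sidecar. The database path and output directory are placeholders.

```python
import csv
import json
import sqlite3
from pathlib import Path

def export_database(db_path: str, output_dir: str) -> None:
    """Dump each table to CSV and write a small JSON metadata sidecar.

    A simplified stand-in for a proper SIARD export: it captures the data and
    basic structural metadata, but not constraints, views, users or the full
    schema documentation a SIARD package would hold.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()

    # Table names come from sqlite_master itself, so they are safe to interpolate here.
    tables = [row[0] for row in cursor.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]

    metadata = {"source_database": db_path, "tables": {}}
    for table in tables:
        columns = [col[1] for col in cursor.execute(f"PRAGMA table_info({table})")]
        rows = cursor.execute(f"SELECT * FROM {table}").fetchall()

        with open(out / f"{table}.csv", "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(columns)
            writer.writerows(rows)

        metadata["tables"][table] = {"columns": columns, "row_count": len(rows)}

    (out / "export_metadata.json").write_text(json.dumps(metadata, indent=2))
    conn.close()

# Example (placeholder paths): export_database("project.db", "project_export")
```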

Hosting Code

Consider hosting code within your own institutional repository or digital preservation system (if your organisation has access to this option), or on a service such as GitHub.

Packing it Down & ‘Putting on Ice’

You may need to consider 'packing up' your digital materials in a way that lets you 'put them on ice', so that – when funding is secured in the future – they can be brought back to life relatively simply.

An example of this is the work that Peter Sefton, from the University of Sydney in Australia, has been trialling. Based on Omeka, he has created a version of the code called OzMeka. This is an attempt at a standardised way of handling research project digital outputs that have been presented online. One example of this is Dharmae.

Alternatively, King's Digital Lab provides infrastructure for eResearch and Digital Humanities projects that ensures the foundations of digital projects are stable from the get-go and mitigates risks around the longer-term sustainability of the digital content created as part of those projects.

Maintaining Access

This could be done through traditional web archiving approaches, such as using web archiving tools (Heritrix or HTTrack), or by downloading video materials using Video Download Helper. Alternatively, if you are part of an institution, the Internet Archive's Archive-It service may be something you want to consider; they can work with your institution to implement it.

Hosted Infrastructure Arrangements

Another option is finding another organisation to take on the hosting of your service. If you do manage to negotiate this, you will need to put in place a contract or Memorandum of Understanding (MOU), as well as handing over the various documentation mentioned earlier.

Video Screen Capture

A simple way of attempting to document a journey through a complex digital work (not necessarily online – this can apply to other complex interactive digital works as well) may be to record a video screen capture.

Kymata Atlas – Video Screen Capture still

Alternatively, you can record a journey through an interactive website using Webrecorder, developed by Rhizome, which will produce WARC web archive files.
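
If you want to check what actually ended up inside a WARC file produced this way, the warcio Python library (maintained by the Webrecorder developers) can list the captured records. A minimal sketch, with a placeholder filename:

```python
from warcio.archiveiterator import ArchiveIterator  # pip install warcio

def list_captured_pages(warc_path: str) -> None:
    """Print the target URI of each response record in a WARC file."""
    with open(warc_path, "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type == "response":
                print(record.rec_headers.get_header("WARC-Target-URI"))

if __name__ == "__main__":
    list_captured_pages("my-recorded-session.warc.gz")  # placeholder filename
```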

Documenting in Context

Another means of understanding complex digital objects is to document the work in the context in which it was experienced. One example of this is the work of Robert Sakrowski and Constant Dullaart, netart.database.

An example of this is the work of Dutch and Belgian net.artists JODI (Joan Heemskerk & Dirk Paesmans) shown here.

JODI – netart.database

Borrowing from documenting and archiving in the arts, an approach of 'documenting around the work' might be suitable – for example, photographing and videoing interactive audiovisual installations.

Web Archives in Context

Another opportunity to understand websites – if they have been captured by the Internet Archive – is to view them using another tool developed by Rhizome: oldweb.today.

Below is the Cambridge University Library website as it appeared in 1997, shown in a Netscape 3.04 browser.

Cambridge University Library website in 1997 via oldweb.today

Conclusions

While there is no one perfect solution, and each approach has its own pros and cons, combining different methods might keep your digital materials available beyond the lifespan of your project. These methods will help ensure that digital material is suitably documented, preserved and potentially accessible – so that both you and others can use the data in an ongoing manner.

Consider:

  • How do you want to preserve the data?
  • How do you want to provide access to your digital material?
  • Developing a strategy that combines several different methods.

Finally, I think this excerpt is relevant to how we approach digital stewardship and digital preservation:

“No man is an island entire of itself; every man is a piece of the continent, a part of the main” – Meditation XVII, John Donne

We are all in this together and rather than each having to troubleshoot alone and building our own separate solutions, it would be great if we can work to our strengths in collaborative ways, while sharing our knowledge and skills with others.

Digital Preservation futurology

I fancy attempting futurology, so here’s a list of things I believe could happen to ‘digital preservation systems’ over the next decade. I’ve mostly pinched these ideas from folks like Dave Thompson, Neil Jefferies, and my fellow Fellows. But if you see one of your ideas, please claim it using the handy commenting mechanism. And because it’s futurology, it doesn’t have to be accurate, so kindly contradict me!

Ingest becomes a relationship, not a one-off event

Many of the core concepts underpinning how computers are perceived to work are crude, paper-based metaphors – e.g. ‘files’, ‘folders’, ‘desktops’, ‘wastebaskets’ etc – that don’t relate to what your computer’s actually doing. (The early players in office computing were typewriter and photocopier manufacturers, after all…) These metaphors have succeeded at getting everyone to use computers, but they’ve also suppressed various opportunities to work smarter, too.

The concept of ingesting (oxymoronic) ‘digital papers’ is obviously heavily influenced by this paper paradigm.  Maybe the ‘paper paradigm’ has misled the archival community about computers a bit, too, given that they were experts at handling ‘papers’ before computers arrived?

As an example of what I mean: in the olden days (25 whole years ago!), Professor Plum would amass piles of important papers until the day he retired / died, and then, and only then, could these personal papers be donated and archived. Computers, of course, make it possible for the Prof both to keep his 'papers' where he needs them and to donate them at the same time, but the 'ingest event' at the centre of current digital preservation systems still seems to be underpinned by a core concept of 'piles of stuff needing to be dealt with as a one-off task'. In future, the 'ingest' of a 'donation' will become a regular, repeated set of occurrences based upon ongoing relationships between donors and collectors, forged initially when Profs are but lowly postgrads. Personal Digital Archiving and Research Data Management will become key, and ripping digital ephemera from dying hard disks will become less necessary as those practices take hold.

The above depends heavily upon…

Object versioning / dependency management

Of course, if Dr. Damson regularly donates materials from her postgrad days onwards, some of these may be updates to things donated previously. Some of them might have mutated so much since the original donation that they can be considered ‘child’ objects, which may have ‘siblings’ with ‘common ancestors’ already extant in the archive. Hence preservation systems need to manage multiple versions of ‘digital objects’, and the relationships between them.

Some of the preservation systems we’ve looked at claim to ‘do versioning’ but it’s a bit clunky – just side-by-side copies of immutable ‘digital objects’, not records of the changes from one version to the next, and with no concept of branching siblings from a common parent. Complex structures of interdependent objects are generally problematic for current systems. The wider computing world has been pushing at the limits of the ‘paper-paradigm’ immutable object for a while now (think Git, Blockchain, various version control and dependency management platforms, etc). Digital preservation systems will soon catch up.
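
To make the idea of version relationships more concrete, here is a toy sketch of a parent/child version graph. It is purely illustrative – it does not reflect how any particular preservation system models versions – but it shows the difference between side-by-side immutable copies and recorded lineage with branching siblings.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class ObjectVersion:
    """One version of a digital object, linked to the version it was derived from."""
    version_id: str
    checksum: str
    parent_id: Optional[str] = None   # None for the first deposit
    note: str = ""

@dataclass
class VersionGraph:
    """A minimal parent/child version graph, rather than side-by-side immutable copies."""
    versions: Dict[str, ObjectVersion] = field(default_factory=dict)

    def add(self, version: ObjectVersion) -> None:
        self.versions[version.version_id] = version

    def children(self, version_id: str) -> List[ObjectVersion]:
        """Branching: a parent may have several 'sibling' children."""
        return [v for v in self.versions.values() if v.parent_id == version_id]

    def lineage(self, version_id: str) -> List[str]:
        """Walk back to the original deposit (the 'common ancestor')."""
        chain: List[str] = []
        current: Optional[str] = version_id
        while current is not None:
            chain.append(current)
            current = self.versions[current].parent_id
        return chain

# Example: Dr. Damson's postgrad dataset, later reworked into two diverging datasets.
graph = VersionGraph()
graph.add(ObjectVersion("v1", "abc123", note="original postgrad deposit"))
graph.add(ObjectVersion("v2a", "def456", parent_id="v1", note="cleaned-up dataset"))
graph.add(ObjectVersion("v2b", "789fed", parent_id="v1", note="reworked for a new project"))
print(graph.lineage("v2a"))          # ['v2a', 'v1']
print(len(graph.children("v1")))     # 2 siblings from a common parent
```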

Further blurring of the object / metadata boundary

What’s more important, the object or the metadata? The ‘paper-paradigm’ has skewed thinking towards the former (the sacrosanct ‘digital object’, comparable to the ‘original bit of paper’), but after you’ve digitised your rare book collection, what are Humanities scholars going to text-mine? It won’t be images of pages – it’ll be the transcripts of those (i.e. the ‘descriptive metadata’)*. Also, when seminal papers about these text mining efforts are published, how is this history of the engagement with your collection going to be recorded? Using a series of PREMIS Events (that future scholars can mine in turn), perhaps?
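
As a rough illustration of what recording such an engagement could look like, the sketch below builds a PREMIS-style event as a simple Python dictionary. The keys loosely mirror PREMIS semantic units, but the values, vocabulary terms and identifiers are invented for the example; a real implementation would validate against the PREMIS schema and its controlled vocabularies.

```python
from datetime import datetime, timezone
from uuid import uuid4

def record_text_mining_event(object_id: str, agent: str, detail: str) -> dict:
    """Capture an engagement with a collection as a PREMIS-style event record.

    The keys loosely mirror PREMIS semantic units (eventIdentifier, eventType,
    eventDateTime, eventDetail, linking identifiers); the vocabulary term and
    identifiers used here are assumptions for illustration only.
    """
    return {
        "eventIdentifier": str(uuid4()),
        "eventType": "analysis",                      # assumed vocabulary term
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "eventDetail": detail,
        "linkingObjectIdentifier": object_id,
        "linkingAgentIdentifier": agent,
    }

event = record_text_mining_event(
    object_id="rare-books-transcripts-001",          # placeholder identifiers
    agent="humanities-text-mining-project",
    detail="Topic modelling run over page transcripts",
)
print(event)
```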

The above talk of text mining and contextual linking of secondary resources raises two more points…

* While I’m here, can I take issue with the term ‘descriptive metadata’? All metadata is descriptive. It’s tautological; like saying ‘uptight Englishman’. Can we think of a better name?

Ability to analyse metadata at scale

‘Delivery’ no longer just means ‘giving users a viewer to look at things one-by-one with’ – it now also means ‘letting people push their Natural Language or image processing algorithms to where the data sits, and then coping with vast streams of output data’.

Storage / retention informed by well-understood usage patterns

The fact that everything's digital, and hence easier to disseminate and link together than physical objects, also means we can better understand how people use our material. This doesn't just mean 'wiring things up to Google Analytics' – advances in bibliometrics that add social / mainstream media analysis, and so forth, to everyday citation counts present opportunities to judge the impact of our 'stuff' on the world like never before. Smart digital archives will inform their storage management and retention decisions with this sort of usage information, potentially in fully or semi-automated ways.
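
A hedged sketch of what a semi-automated, usage-informed storage decision might look like is shown below. The usage fields, weights and thresholds are entirely made up – the point is only the shape of the decision, and a human would still review any recommendation before content moved anywhere.

```python
from dataclasses import dataclass

@dataclass
class UsageStats:
    """Illustrative usage signals for a stored digital object (all fields hypothetical)."""
    views_last_year: int
    citations: int
    media_mentions: int

def recommend_storage_tier(stats: UsageStats) -> str:
    """Suggest a storage tier from usage signals.

    The score weights and thresholds are arbitrary placeholders, purely to show
    the shape of a semi-automated, usage-informed retention/storage decision.
    """
    score = stats.views_last_year + 10 * stats.citations + 5 * stats.media_mentions
    if score > 500:
        return "fast-access storage"
    if score > 50:
        return "standard storage"
    return "cold/offline storage (retain, rarely accessed)"

print(recommend_storage_tier(UsageStats(views_last_year=1200, citations=8, media_mentions=3)))
print(recommend_storage_tier(UsageStats(views_last_year=4, citations=0, media_mentions=0)))
```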

Ability to get data out, cleanly – all systems are only ever temporary!

Finally – it’s clear that there are no ‘long-term’ preservation system options. The system you procure today will merely be ‘custodian’ of your materials for the next ten or twenty years (if you’re lucky). This may mean moving heaps of content around in future, but perhaps it’s more pragmatic to think of future preservation systems as more like ‘lenses’ that are laid on top of more stable data stores to enable as-yet-undreamt-of functionality for future audiences?

(OK – that’s enough for now…)

Policy ramblings

For the second stage of the DPOC project Oxford and Cambridge have started looking at policy and strategy development. As part of the DPOC deliverables, the Policy and Planning Fellows will be collaborating with colleagues to produce a digital preservation policy and strategy for their local institutions. Edith (Policy and Planning Fellow at Oxford) blogs about what DPOC has been up to so far.


Last Friday I met with Somaya (Policy and Planning Fellow) and Sarah (Training and Outreach Fellow) at the British Library in London. We spent the day discussing review work which DPOC has done of digital preservation policies so far. The meeting also gave us a chance to outline an action plan for consulting stakeholders at CUL and Bodleian Libraries on future digital preservation policy development.

Step 1: Policy review work
Much work has already gone into researching digital preservation policy development [see for example the SCAPE project and OSUL's policy case study]. As considerable effort has been exerted in this area, we want to make sure we are not reinventing the wheel while developing our own digital preservation policies. We therefore started by reading as many digital preservation policies from other organisations as we could possibly get our hands on. (Once we ran out of policies in English, I started feeding promising-looking documents into Google Translate, with a mixed bag of results.) The policy review drew attention to aspects of policies which we felt were particularly successful and which could potentially be re-purposed for the local CUL and Bodleian Libraries contexts.

My colleague Sarah helped me with the initial policy review work. Between the two of us we read 48 policies dating from 2008 to 2017. However, determining which documents were actual policies was trickier than we had first anticipated. We found that documents named 'strategy' sometimes read as policy, and documents named 'policy' sometimes read as more low-level procedures. For this reason, we decided to add another 12 strategy documents which had strong elements of policy in them to the review. This brought us up to a round 60 documents in total.

So we began reading…. But we soon found that once you are on your 10th policy of the day, you start to get them muddled up. To better organise our review work, we decided to put them into a classification system developed by Kirsten Snawder (2011) and adapted by Madeline Sheldon (2013). Snawder and Sheldon identified nineteen common topics from digital preservation policies. The topics range from ‘access and use’ to ‘preservation planning’ [for the full list of topics, see Sheldon’s article on The Signal from 2013]. I was interested in seeing how many policies would make direct reference to the Open Archival Information System (OAIS) reference model, so I added this in as an additional topic to the original nineteen identified by Snawder and Sheldon.

Reviewing digital preservation policies written between 2008-2017
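
For anyone attempting a similar review, even a very simple tally of topic coverage can help keep 60 documents straight. The sketch below uses invented policy names and topic assignments purely to show the mechanics; the topic labels follow the style of the Snawder/Sheldon list discussed above, plus our extra OAIS flag.

```python
from collections import Counter

# Hypothetical coding of a few reviewed policies against selected topics
# (the documents and assignments below are made up for illustration).
policy_topics = {
    "Organisation A policy (2009)": {"access and use", "preservation planning", "OAIS reference"},
    "Organisation B policy (2013)": {"access and use", "roles and responsibilities"},
    "Organisation C strategy (2016)": {"preservation planning", "OAIS reference"},
}

topic_counts = Counter(topic for topics in policy_topics.values() for topic in topics)

print(f"Documents reviewed: {len(policy_topics)}")
for topic, count in topic_counts.most_common():
    print(f"{topic}: {count}")

oais_total = topic_counts["OAIS reference"]
print(f"Policies referencing OAIS directly: {oais_total} of {len(policy_topics)}")
```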

Step 2: Looking at findings
Interestingly, after we finished annotating the policy documents, we did not find a correlation between covering all of Snawder and Sheldon's nineteen topics and having what we perceived as an effective policy. Effective in this context was defined as the ability of the policy to clearly guide and inform preservation decisions within an organisation. In fact, the opposite was more common: we judged several policies which had good coverage of topics from the classification system to be too lengthy, unclear and sometimes inaccessible due to heavy use of digital preservation terminology.

Another interesting finding was that 33 out of 60 policies made direct reference to OAIS. In addition to these 33, several of the ones which did not make an overt reference to the model still used language and terminology derived from it.

So while we found that the taxonomy could not tell us which policy topics were absolutely essential in all circumstances, using it was a good way of arranging and documenting our thoughts.

Step 3: Thinking about guiding principles for policy writing
What this foray into digital preservation policies has shown us is that there is no 'one size fits all' approach or magic formula of topics which makes a policy successful. What works in the context of one institution will not work in another. What ultimately makes a successful policy also comes down to communication of the policy and organisational uptake. However, there are a number of high-level principles which the three of us all felt strongly about and which we would like to guide future digital preservation policy development at our local institutions.

Principle 1: Policy should be accessible to a broad audience. Contrary to findings from the policy review, we believe that digital preservation specific language (including OAIS) should be avoided at policy level if possible. While reviewing policy statements we regularly asked ourselves:

“Would my mother understand this?”

If the answer is yes, the statement gets to stay. If it is no, maybe consider re-writing it. (Of course, this does not apply if your mother works in digital preservation.)

Principle 2: Policy also needs to be high-level enough that it does not require constant re-writing in order to make minor procedural changes. In general, including individuals' names or prescribing specific file formats can make a policy go out of date quickly. It is easier to change these details if they are included in lower-level procedures and guidelines.

Principle 3: Digital preservation requires resources. Getting financial commitment at policy level to invest in staff is important. It takes time to build organisational expertise in digital preservation, but losing it can happen a lot quicker. Even if you choose to outsource several aspects of digital preservation, it is important that staff have skills which enable them to understand and critically assess the work of external digital preservation service providers.

What are your thoughts? Do you have other principles guiding digital preservation policy development in your organisations? Do you agree or disagree with our high-level principles?