The Ethics of Working in Digital Preservation

Since joining the DPOC project in 2016, I have been espousing the need for holistic approaches to digital preservation. This has very much been about how skills development, policy, strategy, workflows and much more need to be included as part of a digital preservation offering. Digital preservation is never just about the tech. There is a concern I must raise: how we play nice together.

Since first drafting this post in October 2017, there have been several events I would be remiss not to mention. Ethics and how we conduct ourselves in professional contexts have been brought into the current social consciousness by the #metoo movement and the recent matter regarding Chris Bourg’s keynote at the Code4Lib conference.

Working Together

We know digital preservation can’t be done alone, and I believe the digital preservation community is well on the way to accepting this. One single person cannot hold all the information about every type of file, standard, operating system, disk file system, policy, carrier, hardware, peripheral, protocol, copyright, legislation as well as undertake advocacy, suitably negotiate with donors etc.

Dream Team – Library of Congress Digital Preservation Outreach and Education Training Materials

For each digital preservation activity, we need a ‘dream team’. This is a term Emma Jolley (Curator of Digital Archives, National Library of Australia) incorporated into the 2015 Library of Congress Digital Preservation Outreach & Education (DPOE) Train the Trainer education programme I took part in. This understanding of the need for complementary skills, knowledge and approaches very much underpins the Polonsky Digital Preservation Project.

Step by Step, Hand in Hand

If I think back to my time working in digital preservation in the mid-2000s, it was a far more isolating experience than it is now. Remembering the challenges we were discussing back then, it doesn’t feel as if the field has progressed all that much. It may just be slow going. Or perhaps it’s fear of making a wrong decision?

As humans, we know we have the capacity to learn from mistakes. We’ve likely had someone tell us about the time they (temporarily or permanently) lost data. The short-term lifespan of media carriers, inter-dependencies between different components, changes to services where data may be stored ‘in the cloud’ and the limited availability of devices (hardware or software) to read and interpret the data mean that digital content is fragile (for many reasons, not only technical) and is continually at risk.

There are enough lessons of data loss out there in the wider world that it is imperative we acknowledge these situations and learn from them. Nor should we have to face these kinds of stressful situations alone; it should be done step-by-step, hand-in-hand, supporting each other.

Acknowledging Failure

Over recent years, the international arts and cultural sector has begun to share examples of failures. While it is easy to share successes, it’s far harder to openly share information about failures. Failure in current western society is definitely not a desirable outcome. Yet we learn from failure. As a response to ‘ideas’ festivals and TED talks, events such as Failure Lab have been gaining momentum.

The need to share (in considered ways) about failures in digital preservation is somewhat new, though the concept itself is not. (The now infamous story of how parts of Toy Story 2 were deleted has helped illustrate the need for regularly checking backup functions.) More recently, at PASIG 2017, one of the most memorable presentations of the whole conference was Eduardo Del Valle’s Sharing my loss to protect your data: A story of unexpected data loss and how to do real preservation. I believe I speak for many of the PASIG conference attendees when I state how valuable a presentation this was.

In May 2017, the Digital Preservation Coalition ran possibly the most useful event I attended in all of 2017: Digital Preservationists Anonymous (aka Fail Club). We were able to share our war stories within the safety and security of the Chatham House Rule and learn a lot from each other that will take us forward in our work at our respective institutions. Hearing another organisation that is further ahead tell us about the tricky things they’ve encountered helps us progress better and faster.

iPres 2017 and the Operational Pragmatism Panel

Yet there are other problematic issues within the field of digital preservation. It’s not always an easy field to work in; it doesn’t yet have the diversity it needs, nor necessarily respect the diversity of views already present.

Operational Pragmatism in Digital Preservation: Establishing context-aware minimum viable baselines was a panel session I facilitated at iPres 2017, held in September 2017 in Kyoto, Japan. The discussion was set out as a series of ‘provocations’ (developed collaboratively by the panellists) about different aspects of digital preservation. (Future blog posts are yet to be published about the topics and views presented during the panel discussion.) I had five experienced panellists representing a range of different countries they’ve worked in around the world (Canada, China, France, Kenya, the Netherlands, the UK and the USA) plus myself (originally from Australia). Another eight contributors (from Australia, Germany, New Zealand, the UK and the USA) also fed into forming the panel topic, panel member makeup or the provocations. Each panellist was allocated a couple of minutes to present their point of view in response to each provocation. Then the discussion was opened up to the wider audience. It was never going to be an easy panel. I was asking a lot of my panellists. They each had to respond to one challenging question after another, providing a simple answer to each question (one that could be used to inform decisions about what ‘bare minimum’ work could be done for each digital preservation scenario). This was no small feat.

Rather than the traditional panel presentation, where only a series of experts get to speak, it was intended as a more inclusive discussion. The discussion was widened to include the audience in good faith, so that audience members could share openly throughout, if they wished. However, it became apparent that there were some other dynamics at play.

One Person Alone is Never Enough

Since I first commenced working in digital preservation in 2005, I have witnessed the passion and commitment to viewpoints that individuals within this field hold. I expected a lively discussion and healthy debate, potentially with opposing views between the panellists (who had been selected to represent different GLAM sectors, organisation sizes, nations, cultures, backgrounds and approaches to digital preservation).

As I was facilitating the panellists for this demanding session, I had organised an audience facilitator (someone well-established within the digital preservation community). Unfortunately, due to circumstances out of our control, this person was unable to be present (and an experienced replacement was unable to be found at short notice). This situation left my panellists open to criticism. One panellist was on the receiving end of a disproportionate amount of scrutiny from the audience. Despite attempts, as a lone facilitator, I was unable to defuse the situation. After the panel session finished, several audience members remarked that they didn’t feel comfortable participating in the discussion.

Facilitating a safe environment for both panellists and for the wider audience to debate topics they are passionate about is vitally important, yet this failed to occur in this instance. As a result, the panel were unable to summarise and present conclusions about possible ‘minimum baselines’ for each of the provocations. It’s clear in this instance that a single facilitator was not enough.

Community Responsibility

In this respect, we have failed as a community. While we may have vastly differing viewpoints, it is essential we cultivate environments where people feel safe to express their views and have them received in a professional and respected manner. The digital preservation community is growing – in both size and diversity. We are aware we need to put in place, improve or refresh our technical infrastructures. Now is also the time to look at how we handle our social infrastructure. It is my opinion that there is a place for a wide range of individuals, with a vast variety of backgrounds and skills needed in the digital preservation field.

There are people who are already working in digital preservation and who have great skills. They might not all be software developers, but they know how to project manage, speak, write, problem-solve, and are subject matter experts in a wide range of areas. The value of diversity has been proven. If we only have coders, computer scientists or individuals from any one background working in the field of digital preservation, then surely, we will fail.

Moving Forward

In the hours and days following the panel, I reached out to my communities online for pointers to Codes of Ethics, Codes of Conduct and other articles discussing challenging situations in similar industries. Borrowing from other industries and adapting to fit the context at hand has always been important to me. I don’t want to reinvent the wheel and would prefer to learn from others’ experiences. The panel ‘provocations’ presented were not contentious, yet how the discussion evolved throughout the duration of the panel somewhat echoes other events that have occurred within the tech industry.

At the time of publishing this post, neither the digital preservation community nor iPres has a Code of Conduct or Code of Ethics. There have been mentions of the lack of an iPres Code of Conduct in previous years. For iPres 2018, developing a Code of Conduct has become a priority. However, it shouldn’t have taken us this long to put in place some frameworks of this type, given we all know we must work collaboratively if we are to succeed. Back in 1997, UNESCO suggested that if Audiovisual Archiving was a profession, it would also require a Code of Ethics (Audiovisual archives: a practical reader – section 4, pages 15-17).

Codes of Conduct and Codes of Ethics are a starting point. Several examples include:

There’s a longer list of Codes of Conduct and Codes of Ethics that has been compiled over the past six months since iPres 2017. Even the Loop electronic music makers summit (an initiative of the Ableton software company), which I attended last November in Berlin, had a thorough Code of Conduct in place.

Building Better Communities

Codes are not enough. This is about building better communities.

A 2016 article emerging from the tech community has a list of suggestions for facilitating the development of ‘plumbers’ (and therefore functional infrastructure) rather than ‘rock stars’, under the section titled: “How do we as a community prevent rock stars?”.

Building and maintaining infrastructure is typically not fun nor sexy – but this is what digital preservation demands. Without us working collaboratively and inclusively, we will not be able to acquire, preserve or provide access to the digital content we are the stewards of. This is because we won’t fully understand the contexts of the individuals producing the content, if we don’t have the same kind of diversity within our own field of digital preservation.

Diversity may not be easy, but neither is digital preservation. While it might not be rocket science per se, we’re accustomed to working on hard and complex things. Here are some suggestions to help us take the next step(s):

  • Organisers: encourage, model and – where necessary – enforce ‘good practice’ behaviour codes
  • Participants: recognise, appreciate and celebrate the privilege of being able to debate digital preservation as part of what we do. Allow and encourage minority, less confident and new voices to hold an equal place in our discussions
  • Everyone: recognise and work towards addressing our own unconscious biases and privileges

Like Kenney and McGovern’s Three-Legged Stool for Digital Preservation (a model our DPOC project is very much based on), where the organisational infrastructure, resources framework and technological infrastructure are of equal importance, recognising that the complexity of the digital preservation challenge is best addressed through multiple perspectives is essential. We must model and welcome the benefits of our diversity. Each of us brings something unique and every skill or bit of knowledge is valuable.

Using ePADD with Josh Schneider

Edith, Policy and Planning Fellow at Bodleian Libraries, writes about her favourite features in ePADD (an open source software for email archives) and about how the tool aligns with digital preservation workflows.


At iPres a few weeks ago I had the pleasure of attending an ePADD workshop run by Josh Schneider from Stanford University Libraries. The workshop was for me one of the major highlights of the conference, as I have been keen to try out ePADD since first hearing about it at DPC’s Email Preservation Day. I wrote a blog about the event back in July, and have now finally taken the time to review ePADD using my own email archive.

ePADD is primarily a tool for appraisal and delivery, rather than for digital preservation. However, as a potential component in ingest workflows to an institutional repository, ensuring that email content retains integrity during processing in ePADD is paramount. The creators behind ePADD are therefore thinking about how to enhance current features to make the tool fit better into digital preservation workflows. I will discuss these features later in the blog, but first I wanted to show some of the capabilities of ePADD. I can definitely recommend having a play with this tool yourself as it is very addictive!

ePADD: Appraisal module dashboard

Josh, our lovely workshop leader, recommends that new ePADD users go home and try it on their own email collections. As you know your own material fairly well, it is a good way of learning about both what ePADD does well and its limits. So I decided to feed my work emails from the past year into ePADD – and found some interesting trends about my own working patterns.

ePADD consists of four modules, although I will only be showing features from the first two in this blog:

Module 1: Appraisal (Module used by donors for annotation and sensitivity review of emails before delivering them to the archive)

Module 2: Processing (A module with some enhanced appraisal features used by archivists to find additional sensitive information which may have been missed in the first round of appraisal)

Module 3: Discovery (A module which provides users with limited key word searching for entities in the email archive)

Module 4: Delivery (This module provides more enhanced viewing of the content of the email archive – including a gallery for viewing images and other document attachments)

Note that ePADD only supports MBOX files, so if you are an Outlook user like myself you will need to first convert from PST to MBOX. After you have created an MBOX file, setting up ePADD is fairly simple and quick. Once the first ePADD module (“Appraisal”) was up and running, processing my 1,500 emails and 450 attachments took around four minutes. This time includes time for natural language processing. ePADD recognises and indexes various “entities” – including persons, places and events – and presents these in a digestible way.

ePADD: Appraisal module processing MBOX file
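
Before loading a freshly converted MBOX file into ePADD, it can be reassuring to check that the conversion from PST produced roughly the number of emails and attachments you expect. The sketch below is not part of ePADD and says nothing about how ePADD itself parses email; it only uses the `mailbox` module from Python's standard library, and the function name `summarise_mbox` is my own invention for illustration.

```python
import mailbox


def summarise_mbox(path):
    """Return (message_count, attachment_count) for the MBOX file at `path`.

    Hypothetical helper for a quick sanity check; not an ePADD function.
    """
    # create=False makes a wrong path raise an error instead of creating an empty file.
    box = mailbox.mbox(path, create=False)
    messages = 0
    attachments = 0
    try:
        for message in box:
            messages += 1
            # walk() visits every MIME part, including nested ones.
            for part in message.walk():
                if part.get_content_disposition() == "attachment":
                    attachments += 1
    finally:
        box.close()
    return messages, attachments
```

Calling `summarise_mbox("my-archive.mbox")` (the file name is a placeholder) returns the two counts as a tuple. Only parts explicitly marked as attachments are counted, so inline images will not show up in the second number.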

Looking at the entities recognised by ePADD, I was able to see who I have been speaking with/about during the past year. There were some not so surprising figures that popped up (such as my DPOC colleagues James Mooney and Dave Gerrard). However, curiously I seem to also have received a lot of messages about the “black spider” this year (turns out they were emails from the Libraries’ Dungeons and Dragons group).

ePADD entity type: Person (some details removed)

An example of why you need to look deeper at the results of natural language processing was evident when I looked under the “place entities” list in ePADD:

ePADD entity type: Place

San Francisco comes highest up on the list of mentioned places in my inbox. I was initially quite surprised by this result. Looking a bit closer, all 126 emails containing a mention of San Francisco turned out to be from “Slack”. Slack, an instant messaging service used by the DPOC team, has its headquarters in San Francisco. All email digests from Slack contain the head office address!

Another one of my favourite things about ePADD is its ability to track frequency of messages between email accounts. Below is a graph showing correspondence between myself and Sarah Mason (outreach and training fellow on the DPOC project). The graph shows that our peak period of emailing each other was during the PASIG conference, which DPOC hosted in Oxford at the start of September this year. It is easy to imagine how this feature could be useful to academics using email archives to research correspondence between particular individuals.

ePADD displaying correspondence frequency over time between two users
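
For anyone curious about what sits behind a graph like this, the idea can be approximated in a few lines: count, per month, the messages where one of the two people is the sender and the other is among the recipients. This is a minimal sketch under my own assumptions, not ePADD's actual implementation; the function name `monthly_correspondence` is hypothetical, and it only looks at the From, To, Cc and Date headers.

```python
from collections import Counter
from email.utils import getaddresses, parsedate_to_datetime


def monthly_correspondence(messages, address_a, address_b):
    """Count messages per month ("YYYY-MM") exchanged between two addresses.

    `messages` is any iterable of email.message.Message objects,
    for example a mailbox.mbox instance. Hypothetical helper, not ePADD code.
    """
    pair = {address_a.lower(), address_b.lower()}
    counts = Counter()
    for message in messages:
        senders = {addr.lower() for _, addr in getaddresses(message.get_all("From", []))}
        recipient_headers = message.get_all("To", []) + message.get_all("Cc", [])
        recipients = {addr.lower() for _, addr in getaddresses(recipient_headers)}
        # Keep the message only if one of the pair sent it and the other received it.
        if not any((pair - {sender}) <= recipients for sender in senders & pair):
            continue
        date_header = message.get("Date")
        if date_header is None:
            continue
        try:
            sent = parsedate_to_datetime(date_header)
        except (TypeError, ValueError):
            continue  # skip messages with an unparseable Date header
        counts[sent.strftime("%Y-%m")] += 1
    return counts
```

The result is a `Counter` keyed by month, which can be passed to any plotting tool. Messages without a usable Date header are skipped rather than guessed at, so the totals may be slightly lower than the number of emails in the archive.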

The last feature I wanted to talk about is “sensitivity review” in ePADD. Although I annotate personal data I receive, I thought that the one year mark of the DPOC project would also be a good time to run a second sensitivity review of my own email archive. Using ePADD’s “lexicon hits search” I was able to sift through a number of potentially sensitive emails. See image below for categories identified which cover everything from employment to health. These were all false positives in the end, but it is a feature I believe I will make use of again.

ePADD processing module: Lexicon hits for sensitive data

So now on to the Digital Preservation bit. There are currently three risks of using ePADD in terms of preservation which stand out to me.

1) For practical reasons, MBOX is currently the only email format option supported by ePADD. If MBOX is not the preferred preservation format of an archive, it may end up running multiple migrations between email formats, resulting in progressive loss of data.

2) There are no checksums being generated when you download content from an ePADD module in order to copy it onto the next one. This could be an issue as emails are copied multiple times without monitoring of the integrity of the email archive files occurring.

3) There is currently limited support for assigning multiple identifiers to archives in ePADD. This could potentially become an issue when trying to aggregate email archives from different institutions. Local identifiers could in this scenario clash and other additional unique identifiers would then also be required.
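
On the second risk, one possible stopgap is to generate checksums yourself before and after each copy. The sketch below is not an ePADD feature. It assumes that the content you move between modules is an ordinary folder on disk, and the function names (`sha256_of`, `build_manifest`, `find_mismatches`) are made up for illustration. It uses SHA-256 from Python's standard library.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file's path (relative to `directory`) to its checksum."""
    root = Path(directory)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_mismatches(manifest, directory):
    """Return relative paths from `manifest` that are missing or altered in `directory`."""
    current = build_manifest(directory)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

In practice you would call `build_manifest` on the folder exported from one module, copy the folder across, and then call `find_mismatches` with that manifest on the copy. An empty list means every file listed in the manifest arrived unchanged; files that appear only in the copy are not reported.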

Note however that these concerns are already on the ePADD roadmap, so they are likely to improve or even be solved within the next year.

To watch out for ePADD updates, or just have a play with your own email archive (it is loads of fun!), check out their:

Computers are the apogee of profligacy: a response to THE most important PASIG 2017 presentations

Following the PASIG conference, Cambridge Technical Fellow Dave Gerrard simply couldn’t wait to fire off his thoughts on the global context of digital preservation and how we need to better consider the world around us to work on a global solution and not just one that suits a capitalist agenda. We usually preface these blogs with “enjoy” but in this instance, please, find a quiet moment, make yourself comfortable, read on and contemplate the global issues passionately presented here.


I’m going to work on a more technical blog about PASIG later, but first I want to get this one off my chest. It’s about the two most important presentations: Angeline Takawira’s Digital preservation at the United Nations Mechanism for International Criminal Tribunals and Keep your eyes on the information, Patricia Sleeman’s discussion of preservation work at the UN Refugee Agency (UNHCR).

Angeline Takawira described, in a very precise and formal manner, how the current best practice in Digital Preservation is being meticulously applied to preserving information from UN war crimes tribunals in The Hague (covering the Balkan conflict) and Arusha, Tanzania (covering the Rwandan genocide). As befitted her work, it was striking how calm Angeline was; how well the facts were stuck to, despite the emotive context. Of course, this has to be the case for work underpinning legal processes: intrusion of emotion into the capture of facts could let those trying to avoid justice escape it.

And the importance of maintaining a dispassionate outlook was echoed in the title of the other talk. “Keep your eyes on the information” was what Patricia Sleeman was told when learning to work with the UNHCR, as to engage too emotionally with the refugee crisis could make vital work impossible to perform. However, Patricia provided some context, in part by playing Head Over Heels (Emi Mahmoud’s poem about the conflict and refugee crisis in Darfur), and by describing the brave, inspirational people she had met in Syria and Kurdistan. An emotionless response was impossible: the talk resulted in the conference’s longest and loudest applause.

Indeed, I think the audience was so stunned by Patricia’s words that questions were hard to formulate. However, my colleague Somaya at least asked the $64,000 one: how can we help? I’d like to tie this question back to one that Patricia raised in her talk, namely (and I paraphrase here): how do you justify expenditure on tasks like preservation when doing so takes food from the mouths of refugees?

So, now I’m less stunned, here’s my take: feeding refugees solves a symptom of the problem. Telling their stories helps to solve the problem, by making us engage our emotions, and think about how our lives are related to theirs, and about how the way we behave impacts upon them. And how can we help? Sure, we can help Patricia with her data management and preservation problems. But how can we really contribute to a solution? How can we stop refugee crises occurring in the first place?

We have a responsibility to recognise the connections between our own behaviour and the circumstances refugees find themselves in, and it all comes down, of course, to resources, and the profligate waste of them in the developed world. Indeed, Angeline and Patricia’s talks illustrated the borderline absurdity of a bunch of (mostly) privileged ‘Westerners’ / ‘Northerners’ (take your pick) talking about the ‘preservation’ of anything, when we’re products of a society that’s based upon throwing everything away.

And computers / all things ‘digital’ are at the apogee of this profligacy: Natasa Milic-Frayling highlighted this when she (diplomatically) referred to the way in which the ‘innovators’ hold all the cards, currently, in the relationship with ‘content producers’, and can hence render the technologies upon which we depend obsolete across ever-shorter cycles. Though, after Patricia’s talk, I’m inclined to frame this more in terms of ‘capitalist industrialists generating unnecessary markets at the expense of consumers’; particularly given that, while we were listening to Patricia, the latest iPhone was being launched in the US.

Though, of course, it’s not really the ‘poor consumers’ who genuinely suffer due to planned obsolescence… That would be the people in Africa and the Middle East whose countries are war zones due to grabs for oil or droughts caused by global warming. As the world’s most advanced tech companies, Apple, Google, Facebook, Amazon, Microsoft et al are the biggest players in a society that – at best indirectly, at worst carelessly – causes the suffering of the people Patricia and Angeline are helping and providing justice for. And, as someone typing a blog post using a MacBook Pro that doesn’t even let me add a new battery, I’m clearly part of the problem, not the solution.

So – in answer to Somaya’s question: how can we help? Well, for a start, we can stop fetishising the iPhone and start bigging up Fairphone and Phonebloks. However, keeping the focus on Digital Preservation, we’ve got to be really careful that our efforts aren’t used to support an IT industry that’s currently profligate way beyond moral acceptability. So rather than assuming (as I did above) that all the ‘best-practice’ of digital preservation flows from the ‘developed’ (ahem) world to the ‘developing’, we ought to seek some lessons in how to preserve technology from those who have fewer opportunities to waste it.

Somaya’s already on the case with her upcoming panel at iPres on the 28th September. Then we ought to continue down the road of holding PASIG in Mexico City next year by holding one in Africa as soon as possible. As long as, when we’re there, we make sure we shut up and listen.