The Ethics of Working in Digital Preservation

Since joining the DPOC project in 2016, I have been espousing the need for holistic approaches to digital preservation. This has very much been about how skills development, policy, strategy, workflows and much more need to be included as part of a digital preservation offering. Digital preservation is never just about the tech. There is a concern I must raise: how we play nice together.

Since first drafting this post in October 2017, there have been several events I would be remiss not to mention. Ethics and how we conduct ourselves in professional contexts have been brought into the current social consciousness by the #metoo movement and the recent matter regarding Chris Bourg’s keynote at the Code4Lib conference.

Working Together

We know digital preservation can’t be done alone, and I believe the digital preservation community is well on the way to accepting this. One single person cannot hold all the information about every type of file, standard, operating system, disk file system, policy, carrier, hardware, peripheral, protocol, copyright, legislation as well as undertake advocacy, suitably negotiate with donors etc.

Dream Team – Library of Congress Digital Preservation Outreach and Education Training Materials

For each digital preservation activity, we need a ‘dream team’. This is a term Emma Jolley (Curator of Digital Archives, National Library of Australia) incorporated into the 2015 Library of Congress Digital Preservation Outreach & Education (DPOE) Train the Trainer education programme I took part in. This understanding of the need for complementary skills, knowledge and approaches very much underpins the Polonsky Digital Preservation Project.

Step by Step, Hand in Hand

If I think back to my time working in digital preservation in the mid-2000s, it was a far more isolating experience than it is now. Remembering the challenges we were discussing back then, it doesn’t feel as if the field has progressed all that much. It may just be slow going. Or perhaps it’s fear of making a wrong decision?

As humans, we know we have the capacity to learn from mistakes. We’ve likely had someone tell us about the time they (temporarily or permanently) lost data. The short lifespan of media carriers, inter-dependencies between different components, changes to services where data may be stored ‘in the cloud’ and the limited availability of devices (hardware or software) to read and interpret the data mean that digital content is fragile (for many reasons, not only technical) and is continually at risk.

There are enough lessons of data loss out there in the wider world that it is imperative we acknowledge these situations and learn from them. Nor should we have to face these kinds of stressful situations alone; it should be done step-by-step, hand-in-hand, supporting each other.

Acknowledging Failure

Over recent years, the international arts and cultural sector has begun to share examples of failures. While it is easy to share successes, it’s far harder to openly share information about failures. Failure in current western society is definitely not a desirable outcome. Yet we learn from failure. As a response to ‘ideas’ festivals and TED talks, events such as Failure Lab have been gaining momentum.

Sharing failures in digital preservation (in considered ways) is somewhat new, but it is not an entirely new concept. (The now infamous story of how parts of Toy Story 2 were deleted has helped illustrate the need for regularly checking backup functions.) More recently, at PASIG 2017, one of the most memorable presentations of the whole conference was Eduardo Del Valle’s Sharing my loss to protect your data: A story of unexpected data loss and how to do real preservation. I believe I speak for many of the PASIG conference attendees when I state how valuable a presentation this was.

In May 2017, the Digital Preservation Coalition ran possibly the most useful event I attended in all of 2017: Digital Preservationists Anonymous (aka Fail Club). We were able to share our war stories within the safety and security of the Chatham House Rule and learn a lot from each other that will take us forward in our work at our respective institutions. Hearing another organisation that is further ahead tell us about the tricky things they’ve encountered helps us progress better, faster.

iPres 2017 and the Operational Pragmatism Panel

Yet there are other problematic issues within the field of digital preservation. It’s not always an easy field to work in; it doesn’t yet have the diversity it needs, nor necessarily respect the diversity of views already present.

Operational Pragmatism in Digital Preservation: Establishing context-aware minimum viable baselines was a panel session I facilitated at iPres 2017, held in September 2017 in Kyoto, Japan. The discussion was set out as a series of ‘provocations’ (developed collaboratively by the panellists) about different aspects of digital preservation. (Future blog posts are yet to be published about the topics and views presented during the panel discussion.) I had five experienced panellists representing a range of different countries they’ve worked in around the world (Canada, China, France, Kenya, the Netherlands, the UK and the USA) plus myself (originally from Australia). Another eight contributors (from Australia, Germany, New Zealand, the UK and the USA) also fed into forming the panel topic, panel member makeup or the provocations. Each panellist was allocated a couple of minutes to present their point of view in response to each provocation. Then the discussion was opened up to the wider audience. It was never going to be an easy panel. I was asking a lot of my panellists. They each had to respond to one challenging question after another, providing a simple answer to each question (that could be used to inform decisions about what the ‘bare minimum’ of work could be for each digital preservation scenario). This was no small feat.

Rather than the traditional panel presentation, where only a series of experts get to speak, it was intended as a more inclusive discussion. The discussion was widened to include the audience in good faith, so that audience members could share openly throughout, if they wished. However, it became apparent that there were some other dynamics at play.

One Person Alone is Never Enough

Since I first commenced working in digital preservation in 2005, I have witnessed the passion and commitment to viewpoints that individuals within this field hold. I expected a lively discussion and healthy debate, potentially with opposing views between the panellists (who had been selected to represent different GLAM sectors, organisation sizes, nations, cultures, backgrounds and approaches to digital preservation).

As I was facilitating the panellists for this demanding session, I had organised an audience facilitator (someone well-established within the digital preservation community). Unfortunately, due to circumstances out of our control, this person was unable to be present (and an experienced replacement was unable to be found at short notice). This situation left my panellists open to criticism. One panellist was on the receiving end of a disproportionate amount of scrutiny from the audience. Despite attempts, as a lone facilitator, I was unable to defuse the situation. After the panel session finished, several audience members remarked that they didn’t feel comfortable participating in the discussion.

Facilitating a safe environment for both panellists and for the wider audience to debate topics they are passionate about is vitally important, yet this failed to occur in this instance. As a result, the panel were unable to summarise and present conclusions about possible ‘minimum baselines’ for each of the provocations. It’s clear in this instance that a single facilitator was not enough.

Community Responsibility

In this respect, we have failed as a community. While we may have vastly differing viewpoints, it is essential we cultivate environments where people feel safe to express their views and have them received in a professional and respectful manner. The digital preservation community is growing – in both size and diversity. We are aware we need to put in place, improve or refresh our technical infrastructures. Now is also the time to look at how we handle our social infrastructure. It is my opinion that the digital preservation field needs, and has a place for, a wide range of individuals with a vast variety of backgrounds and skills.

There are people who are already working in digital preservation and who have great skills. They might not all be software developers, but they know how to project manage, speak, write, problem-solve, and are subject matter experts in a wide range of areas. The value of diversity has been proven. If we only have coders, computer scientists or individuals from any one background working in the field of digital preservation, then surely, we will fail.

Moving Forward

In the hours and days following the panel, I reached out to my communities online for pointers to Codes of Ethics, Codes of Conduct and other articles discussing challenging situations in similar industries. Borrowing from other industries and adapting to fit the context at hand has always been important to me. I don’t want to reinvent the wheel and would prefer to learn from others’ experiences. The panel ‘provocations’ presented were not contentious, yet how the discussion evolved throughout the duration of the panel somewhat echoes other events that have occurred within the tech industry.

At the time of publishing this post, neither the digital preservation community nor iPres has a Code of Conduct or Code of Ethics. There have been mentions of the lack of an iPres Code of Conduct in previous years. For iPres 2018, developing a Code of Conduct has become a priority. However, it shouldn’t have taken us this long to put in place some frameworks of this type, given we all know we must work collaboratively if we are to succeed. Back in 1997, UNESCO suggested that if Audiovisual Archiving was a profession, it would also require a Code of Ethics (Audiovisual archives: a practical reader – section 4, pages 15-17).

Codes of Conduct and Codes of Ethics are a starting point. Several examples include:

There’s a longer list of Codes of Conduct and Codes of Ethics that has been compiled over the past six months since iPres 2017. Even the Loop electronic music makers summit (an initiative of the Ableton software company), which I attended last November in Berlin, had a thorough Code of Conduct in place.

Building Better Communities

Codes are not enough. This is about building better communities.

A 2016 article emerging from the tech community has a list of suggestions for facilitating the development of ‘plumbers’ (and therefore functional infrastructure) rather than ‘rock stars’, under the section titled: “How do we as a community prevent rock stars?”.

Building and maintaining infrastructure is typically not fun nor sexy – but this is what digital preservation demands. Without us working collaboratively and inclusively, we will not be able to acquire, preserve or provide access to the digital content we are the stewards of. This is because we won’t fully understand the contexts of the individuals producing the content, if we don’t have the same kind of diversity within our own field of digital preservation.

Diversity may not be easy, but neither is digital preservation. While it might not be rocket science per se, we’re accustomed to working on hard and complex things. Here are some suggestions to help us take the next step(s):

  • Organisers: encourage, model and – where necessary – enforce ‘good practice’ behaviour codes
  • Participants: recognise, appreciate and celebrate the privilege of being able to debate digital preservation as part of what we do. Allow and encourage minority, less confident and new voices to hold an equal place in our discussions
  • Everyone: recognise and work towards addressing our own unconscious biases and privileges

Like Kenney and McGovern’s Three-Legged Stool for Digital Preservation (a model our DPOC project is very much based on), where the organisational infrastructure, resources framework and technological infrastructure are of equal importance, recognising that the complexity of the digital preservation challenge is best addressed through multiple perspectives is essential. We must model and welcome the benefits of our diversity. Each of us brings something unique and every skill or bit of knowledge is valuable.

Email preservation 2: it is hard, but why?

A post from Sarah (Oxford) with input from Somaya (Cambridge) about the 24 January 2018 DPC event on email archiving from the Task Force on Technical Approaches to Email Archives.

The discussion of the day centred on what the task force had learnt during its year of work: that personal and public stories are buried in email; that considerable amounts of email have been lost over previous decades; that we should be treating email as data (it allows us to understand other datasets); that current approaches to collecting and preserving email don’t work because they’re not scalable; and that artificial intelligence and machine learning, including natural language processing functions, need to be integrated to address email archives (this is already taking place in the legal professions with ‘predictive coding’ and clustering technologies).


Back in July, Edith attended the first DPC event on email preservation, presented by the Task Force on Technical Approaches to Email Archives. She blogged about it here. In January this year, Somaya and I attended the second event, hosted again by the DPC.

Under the framework of five working groups, this task force has spent 12 months (2017) focused on five separate areas of the final report, which is due out in around May this year:

  • The Why: Overview / Introduction
  • The When/Who/Where: Email Lifecycles Perspectives
  • The What: The Needs of Researchers
  • The How: Technical Approaches and Solutions
  • The Path Forward: Sustainability & Community Development

The approach being taken focuses on technology rather than policy. Membership of the task force includes the DPC, representatives from universities and national institutions from around the world and technology companies including Google and Microsoft.

Chris Prom (from the University of Illinois at Urbana-Champaign, who authored the 2011 DPC Technology Watch Report on Preserving Email) and Kate Murray (Library of Congress and contributor to FADGI) presented the work they have been doing; you can view their slides here. Until the final report is published, I have been reviewing the preliminary draft (of June 2017) and available documents to help develop my email preservation training course for Oxford staff in April.

So, when it comes to email preservation, most of the tools and discussions focus on processing email archives. Very little of the discussion has to do with the preservation of email archives over time. There’s a very good reason for this. Processing email archives is the bottleneck in the process, the point at which most institutions are still stuck. It is hard to make decisions around preservation when there is no means for collecting email archives or processing them in a timely manner.

There were many excellent questions and proposed solutions from the speakers at the January event. Below are some of the major points from the day that have informed my thinking of how to frame training on email preservation:

Why are email archives so hard to process?

  1. They are big. Few people cull their emails and over time they build up. Reply and ‘reply all’ functions expand email chains, and attachments are growing in size and diversity. It takes a donor a while to prepare their email archives, let alone the time an institution needs to transfer and process them.
  2. They are full of sensitive information, which is hard to find. Many open source technology assisted review (TAR) tools miss sensitive information. Software used for ‘predictive coding’ and machine learning for reviewing email archives is well out of budget for heritage institutions. Manual review is far too labour intensive.
  3. There is no one tool that can do it all. Email preservation requires ‘tool chaining’ in order to transfer, migrate and process email archives. There is a very wide variety of email software programs, which in turn create many different email file format types. Many of the tools used in email archive processing are not compatible with each of the different email file types; this requires multiple file format migrations to allow for processing. For a list of some of the currently available tools, see the Task Force’s list here.
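To give a feel for what one small link in such a tool chain might look like, here is a minimal sketch in Python. It assumes the email archive has already been migrated to the mbox format (other formats would need a different reader) and uses only the standard library `mailbox` module. The function name `survey_mbox` is hypothetical and is not taken from any of the Task Force's listed tools; it simply counts messages and attachment types so that an institution could gauge the size and variety of a transfer before processing.

```python
import mailbox
from collections import Counter


def survey_mbox(mbox_path):
    """Count messages and attachment content types in an mbox file.

    Hypothetical helper for illustration only; assumes the archive is
    already in mbox format.
    """
    # create=False: raise an error rather than create an empty file
    box = mailbox.mbox(mbox_path, create=False)
    message_count = 0
    attachment_types = Counter()
    try:
        for message in box:
            message_count += 1
            # walk() visits every MIME part, including nested ones
            for part in message.walk():
                # parts that carry a filename are treated as attachments
                if part.get_filename():
                    attachment_types[part.get_content_type()] += 1
    finally:
        box.close()
    return message_count, attachment_types
```

A survey like this does not review or migrate anything; it only reports what is there, which may help when deciding which other tools need to be chained in next.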

What are some of the solutions?

  1. Tool chaining will continue. It appears for now, tool chaining is here to stay, often mixing proprietary with open source tools to get workflows running smoothly. This means institutions will need to invest in establishing email processing workflows: the software, people who know about how to handle different email formats etc.
  2. What about researchers? Access to emails is tightly controlled due to sensitivity constraints, but is there space to get researchers to help with the review? If they use the collection for research, could they also be responsible for flagging anything deemed sensitive? How could this be done ethically?
  3. More automation. Better tool development to assist with TAR. Reviewing processes must become more automated if email archives are ever to be processed. The scale of work is increasing and traditional appraisal approaches (handling one document at a time) and record schedules are no longer suitable.
  4. Focus on bit-level preservation first. Processing of email archives can come later, but preserving them needs to start on transfer. (But we know users want access and our institutions want to provide this access to email archives.)
  5. Perfection is no longer possible. While archivists would like to be precise, in ‘scaling up’ email archive processing we need to think about it as ‘big data’ and take a ‘good enough’ approach.
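The 'bit-level preservation first' suggestion above can be illustrated with a small fixity check. The following is a minimal sketch in Python, assuming the transferred email files sit together in one directory; the function names (`sha256_of`, `build_manifest`, `verify_manifest`) are hypothetical and the choice of SHA-256 is an assumption, not something prescribed at the event. The idea is simply to record a checksum for every file at the point of transfer and re-check it later, long before any processing takes place.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(transfer_dir):
    """Map each file's relative path to its checksum at the time of transfer."""
    root = Path(transfer_dir)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(transfer_dir, manifest):
    """Return the relative paths that are missing or whose checksum changed."""
    root = Path(transfer_dir)
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

In practice the manifest would be stored alongside (and ideally separately from) the transfer, and `verify_manifest` run on a schedule; an empty result means every recorded file is still present and unchanged.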

Digital Preservation Roadshow – Part 2

Building on the success of CUL’s digital preservation roadshow kit, the Oxford fellows have begun assembling a local version. The kit is a mixture of samples of old hardware, storage technology, quiz activities, and general “digital preservation swag”.

Pens, pins, and a BBC Micro

We were able to trial it as part of a GLAM (Gardens, Libraries and Museums) showcase at the Weston Library this January. Among the showcase attendees’ favourite items were an early floppy disk camera (c.1998) and our BBC Micro Computer (1981).

Sony Digital Mavica (MVC-FD7) 

Technical Fellow James Mooney at the Oxford GLAM Showcase

Our floppy disk camera was among the first in the Mavica “FD” series from Sony. Sony produced 3.5″ floppy disk cameras from late 1997 until 2002 (when it moved on to Mavica for CD). The MVC-FD7 takes 8-bit images which can be easily transferred to a home computer. This is one of the reasons that the Mavica FD series was so popular – the FAT12 file system and widespread adoption of 3.5″ floppy disk drives in computers made transfer a simple and quick task.

It is easy to forget that the floppy disk camera is really the grandfather of the microSD card!

BBC Micro

The BBC Micro is well known by most British people who went to school in the 1980s and ’90s – but even today some UK classrooms will feature a BBC Micro for more nostalgic reasons. The BBC Microcomputer series was designed and built by Acorn for the BBC Computer Literacy Project. Most schools in the UK adopted the system, and for many children the BBC BASIC programming language was the first one they learnt.

There is to this day a cult following of BBC Micro educational games, such as Granny’s Garden (1983).


The kit will be displayed in different Oxford libraries throughout 2018 to promote the DPOC training programme and raise awareness of Bodleian Libraries’ new digital preservation policy.


Breaking through with Library Carpentry

Thursday 11th January saw the Cambridge University Library’s annual conference take place. This year, it was entitled ‘Breakthrough the Library’, and focused on cutting-edge innovation in libraries and archives. I can honestly say that this was the first ever conference I’ve been to where every single speaker I saw (including the ten or so who gave lightning talks) was absolutely excellent.

So it’s hard to pick the one that made the most impression. Of course, an honourable mention must go to the talk about Jasper the three legged cat, but if I had to plump for the one that was most pertinent to moving Digital Preservation forward, I’d have picked “Library Carpentry: software and data skills for librarian professionals”, from Dr James Baker of the University of Sussex.

I’d heard of the term ‘Library Carpentry’ (and the initiatives it stems from – Software Carpentry and Data Carpentry) and thus had an idea what the talk was about on the way in. Their web presence explains things far better than I can, too (see https://librarycarpentry.github.io/), so I’m going to skip the exposition and make a different point…

As a full-blown, time-served nerd who’s clearly been embittered by 20 years in the IT profession (though I’m pleased to report, not as much as most of my long-term friends and colleagues!), I went into the talk with a bit of a pessimistic outlook. This was because, in my experience, there are three stages one passes through when learning IT skills:

  • Stage 1: I know nothing. This computer is a bit weird and confuses me.
  • Stage 2: I know EVERYTHING. I can make this computer sing and dance, and now I have the power to conquer the world.
  • Stage 3: … er – hang on… The computer might not have been doing exactly what I thought it was, after all… Ooops! What did I just do?

Stage 1 is just something you get through (if you want – I have nothing but respect for happy Stage 1 dwellers, though). If so inclined, all it really takes is a bit of persistence and a dollop of enthusiasm to get through it. If you want to but think you might struggle, then have a go at this computer programming aptitude test from the University of Kent – you may be pleasantly surprised… In my own case, I got stuck there for quite a while until one day a whole pile of O Level algebra that was lurking in my brain suddenly rose out of the murk, and that was that.

Stage 2 people, on the other hand, tend to be really dangerous… I have personally worked with quite a few well-paid developers who are stuck in Stage 2, and they tend to be the ones who drop all the bombs on your system. So the faster you can get through to Stage 3, the better. This was at the root of my concern, as one of the ideas of Library Carpentry is to pick up skills quickly, and then pass them on. But I needn’t have worried because…

When I asked Dr Baker about this issue, he reassured me that ‘questioning whether the computer has done what you expected’ is a core learning point that is central to Library Carpentry, too. He also declared the following (which I’m going to steal): “I make a point of only ever working with people with Impostor Syndrome”.

Hence it really does look as if getting to Stage 3 without even going through Stage 2 at all is what Library Carpentry is all about. I believe moves are afoot to get some of this good stuff going at Cambridge… I watch with interest and might even be able to find the time to join in..? I bet it’ll be fun.

Towards a common understanding?

Cambridge Outreach and Training Fellow, Lee, describes the rationale behind trialling a recent workshop on archival science for developers, as well as reflecting on the workshop itself. Its aim was to give all those working in digital preservation within the organisation a better understanding of each other’s work, to improve co-operation for a sustainable digital preservation effort.


Quite often, there is a perceived language barrier due to the wide range of practitioners that work in digital preservation. We may be using the same words, but there’s not always a shared common understanding of what they mean. This became clear when I was sitting next to my colleague, a systems integration manager, at an Archivematica workshop in September. Whilst not a member of the core Cambridge DPOC team, our colleague is a key member of our extended digital preservation network at Cambridge University Library and is key to developing, understanding and retaining digital preservation knowledge in the institution.

For those from a recordkeeping background, the design principles behind the front end of Archivematica should be obvious, as it incorporates both traditional principles of archival practice and features of the OAIS model. However, coming from a systems integration point of view, my colleague needed me to translate words such as ‘accession’, ‘appraisal’ and ‘arrangement’, whose meanings many of us with an archival education take for granted.

I asked my colleague if an introductory workshop on archival science would be useful, and she said, “yes, please!” Thus, the workshop was born. Last week, a two and a half hour workshop was trialled for our developer and systems integration colleagues. The aim of the workshop was to enable them to understand what archivists are taught on postgraduate courses and how this teaching informs their practice. After gathering the attendees’ impressions of an archivist and the things that they do (see image), the workshop then practically explored how an archivist would acquire and describe a collection. The workshop was based on an imaginary company, complete with a history and description of the business units and examples of potential records they would deposit. There were practical exercises on making an accession record, appraising a collection, artificial arrangement and subsequent description through ISAD(G).

Sticky notes about archivists from a developer point of view.

Having then seen how an archivist would approach a collection, the workshop moved into explaining physical storage and preservation before moving onto digital preservation, specifically looking at OAIS and then examples of digital preservation software systems. One exercise was to get the attendees to use what they had learned in the workshop to see where archival ideas mapped onto the systems.

The workshop tried to demonstrate how archivists have approached digital preservation armed with the professional skills and knowledge that they have. The idea was to inform teams working with archivists on digital preservation of how archivists think, and of how and why some of the tools and products are designed the way they are. My hope was for ‘IT’ to understand the depth of knowledge that archivists have in order to help everyone work together on a collaborative digital preservation solution.

Feedback was positive and it will be run again in the New Year. Similarly, I’m hoping to devise a course from a developer perspective that will help archivists communicate more effectively with developers. Ultimately, both will be working from a better understanding of each other’s professional skill sets. Co-operation and collaboration on digital preservation projects will become much easier across disciplines and we’ll have a better informed (and relaxed) environment to share practices and thoughts.

Advocating for digital preservation

Bodleian Libraries and Cambridge University Library are entering into the last phase of the DPOC project, where they are starting to write up business cases for digital preservation. In preparation, the Fellows attended DPC’s “advocacy briefing day” in London.  Policy and Planning Fellow, Edith, blogs about some of the highlights and lessons from the day.


This week I had the pleasure of attending DPC’s advocacy training day. It was run by Catherine Heaney, the founder of DHR Communications, and a veteran when it comes to advocating for supporting digital heritage. Before the event I thought I had a clear idea of what advocacy means in broad terms. You invite yourself into formal meetings and try to deliver measured facts and figures which will be compelling to the people in front of you – right?

Well… not quite, it turns out. Many of these assumptions were turned on their head during this session. Here are my four favourite pieces of (sometimes surprising) advocacy advice from Catherine.

Tip 1: Advocacy requires tenaciousness

The scenario which was described above is what communications professionals might call “the speech” – but it is only one little part of effective advocacy. “The digital preservation speech” is important, but it is not necessarily where you will get the most buy-in for digital preservation. Research has shown that one-off communications like these are usually not effective.

In fact, all of those informal connections and conversations you have with colleagues also come under advocacy and may reap greater benefits due to their frequency. And if one of these colleagues is talented at influencing others, they can be invaluable in advocating for digital preservation when you are not there in person.

Lesson learnt: you need to keep communicating the message whenever and wherever you can if you want it to seep into people’s consciousness. Since digital preservation issues do not crop up that often in popular culture and the news, it is up to us to deliver, re-deliver… and then re-deliver the message if we want it to stick.

Tip 2: Do your background research

When you know that you will be interacting with colleagues and senior management, it is important to do your background research and find out what argument will most appeal to the person you are meeting. Having a bog-standard ‘speech’ about digital preservation which you pull out on all occasions is not the most effective approach. In order to make your case, the problem you are attempting to solve should also reflect the goals and challenges which the person you are advocating to is facing.

The aspects which appeal about digital preservation will be different depending on the role, concerns and responsibilities of the person you are advocating to. Are they concerned with:

  • Legal or reputational risk?
  • Financial costs and return on investment?
  • Being seen as someone at the forefront of the digital preservation field?
  • Creating reproducible research?
  • Collecting unique collections?
  • Or perhaps about the opportunity to collaborate cross-institutionally?

Tip 3: Ensure that you have material for a “stump speech” ready

Tailoring your message to the audience is important, and this will be easier if you have material ready at hand which you can pick and choose from. Catherine suggested preparing a folder of stories, case studies, data and facts about digital preservation which you can cut and paste from to suit the occasion.

What is interesting though is the order of that list of “things to collect”:

  1. Stories
  2. Case studies
  3. Data and facts

The ranking is intentional. We tend to think that statistics and raw data will convince people, as this appeals to their logic. In fact, your argument will be stronger if your pitch starts with a narrative (a story) about WHY we need digital preservation and case studies to illustrate your point. Catherine advises that it is then, once the audience is listening, that you bring out the data and facts. This approach is both more memorable and more effective in capturing your audience’s attention.

Tip 4: Personalise your follow-up

This connects to tip 2 – about knowing your audience. Catherine advised that, although it may feel strange at first, writing a personalised follow-up message is a very effective tool. When you do have the chance to present your case to an important group within your organisation, the follow-up message can further solidify that initial pitch (again – see tip 1 about repeated communication).

By taking notes about the concerns or points made during a meeting, you have the opportunity to write personalised messages which capture and refer back to the concerns raised by that particular person. The personalised message also has the additional benefit of opening up a channel for future communication.


This was just a small subsection of all the interesting things we talked about on the advocacy briefing day. For some more information have a look at the hashtag for the day #DPAdvocacy.

A portable digital preservation roadshow kit

As a part of the lead up to Digital Preservation Day, the Cambridge team held a series of roadshows with a pop-up exhibition to raise awareness of digital preservation within the wider University. They wanted to let people know that there was a team concentrating on this area. They also wanted to find out people’s concerns regarding the long-term continuity of the digital content that they create and use. Outreach and Training Fellow, Lee, writes about what is in the pop-up kit and how it can be used at your institution to generate awareness of digital preservation.


The exhibition kit

In the lead up to the exhibition we created a portable carry kit so that we could repeat the exhibition in various locations day after day.

To stimulate discussion as well as having an interactive experience, the first portable exhibition consisted of:

  • An A1 poster, printed on cloth for ease of carrying and to reduce wear and tear. Images are attributed as correctly as possible and in line with open and Creative Commons requirements.

Prototype exhibition poster.

  • A roll-up display banner with an image sourced from the Cambridge Digital Library (appropriately from the Book of Apocalypse), plus a bit of Photoshop work to make a corrupted version. I like to describe the image as the digital equivalent of mould affecting a precious manuscript. You can still see the image but it’s not quite right, and so work needs to be done to put it ‘right’.
  • A laptop with the URLs to various playable games on the Internet Archive, to make the point about emulation and how digital is different from traditional media. The games we used were:
  • A small collection of tangible technology from the past to the present. This was sourced from the Fellows’ collections of materials and included:
    • 8” floppy disk
    • 25” floppy disk
    • 5.25” floppy disk
    • 5.25” floppy disk drive
    • Compact Disc Recordable (CD-R)
    • Commercial double sided film on Digital Versatile Disk (DVD)
    • Digital Versatile Disk ReWritable (DVD-RW)
    • A Hard Disk Drive 250GB from a laptop
    • 2GB and 1GB Random Access Memory (RAM) chips
    • USB stick with the hard cases removed to show the small PCB and memory chip
    • An SD card enclosure
    • A 2GB micro SD card
    • A micro SD card USB enclosure
    • An iPod c. 2012
    • An acetate, c. 1990, with degradation (courtesy of JISC’s Dom Fripp) to make a visual point through an analogue item about the degradation and the fragile nature of materials we are working with.

A close up of the tech on display.

As a part of future work we’d like to develop this into a more generic display kit for those who do not have the time to create such materials, but have an opportunity to run displays. This is how the display looked when it was up and running in the University Library’s Entrance Hall.

Roadshow display set up in the Entrance Hall of the Cambridge University Library.

We also relied on the generosity of the hosting venues in welcoming us and giving us space to visit. It was important that we toured around the site to spread the message amongst the Cambridge University community, so we visited the following venues:

  • Alison Richard Building – 16th November
  • Gordon and Betty Moore Library – 17th November
  • Department of Engineering Library – 20th November
  • University Library Entrance Hall – 21st November
  • Churchill College – 22nd November
  • Faculty of English Social Space – 23rd November

The following is a summary of some of the views captured from the Post-It notes. As it’s not part of a proper study, we removed the views that repeated each other. The most popular answer for the “what digital materials should be saved” question was ‘all’ or ‘everything’. Most thought that the Library should be responsible for the preservation of all materials and the most common challenges were money, time, and reacting to change.

Summary of Post-It note capture.

There was a lot of work put into the creation of the pop-up exhibition and it was developed carefully so that it could be used beyond the life of the DPOC project. We have created a resource that can be used at a moment’s notice to begin the digital preservation conversation with a wider audience. We’d like to develop this kit a bit further so it can be personalised for your own outreach efforts.


If you would like to collaborate on this kit, please get in touch in the comments below or via the ‘contact us’ page.

Institutional risk and born-digital content: the shutdown of DCist #IDPD17

Another post for today’s International Digital Preservation Day 2017. Outreach and Training Fellow, Sarah, discusses just how real institutional risk is and how it can lead to a loss of born-digital archives, a risk that digital-only sites like DCist have recently proven. Read more about the Gothamist’s website shutdowns this November.


In today’s world, so much of what we create and share exists only in digital form. These digital-only creations are referred to as born-digital — they were created digitally and they often continue in that way. And so much of our born-digital content is shared online. We often take for granted content on the Internet, assuming it will always be there. But is it? Likely it will at least be captured by the Internet Archive’s Wayback Machine or a library web archiving equivalent. But is that actually enough? Does it capture a complete, usable record? What happens when a digital-only creation, like a magazine or newspaper, is shut down?

Institutional risk is real. In the commercial world of born-digital content that persists only in digital form, the risk of loss is high.

Unfortunately, there’s recently been a very good example of this kind of risk when the Gothamist shut down its digital-only content sites such as the DCist. This happened in early November this year.

The sites and all the associated content were completely removed from the Internet by the morning of 3 November. Gone. Taken down and replaced with a letter from billionaire CEO, Joe Ricketts, justifying the shutdown because, despite the sites’ enormous popularity and readership, the business just wasn’t “economically successful.”

Wayback Machine’s capture of the redirect page and Ricketts’ letter

The DCist site and all of its content were gone completely; readers instead were redirected to another page entirely to read Joe Ricketts’ letter. Someone had literally pulled the plug on the whole thing.

Internet Archive’s 3 November 2017 capture, showing a redirect from the DCist.com page. DCist was gone from the Internet.

The access to content was completely lost, save for what the Internet Archive captured and what content was saved by creators elsewhere. But access to the archives of 13 years of DCist content was taken from the Internet and its millions of readers. At that point all we had were some web captures, incomplete records of the content left to us.

The Internet Archive’s web captures for DCist.com over the past 13 years.

What would happen to the DCist’s archive now? All over Twitter people were being sent to the Internet Archive or told to check Google’s cache to download the lost content. But as Benjamin Freed pointed out in his recent Washingtonian article:

“Those were noble recommendations, but would have been incomplete. The Wayback Machine requires knowledge about URLs, and versions stored in Google’s memory banks do not last long enough. And, sure, many of the subjects DCist wrote about were covered by others, but not all of them, and certainly not with the attitude with which the site approached the world.”

As Freed reminds us, “A newspaper going out of business is tragic, but when it happens, we don’t torch the old issues or yank the microfilms from the local library.” In the world of born-digital content, simply unplugging the servers and leaving the digital archive to rot means that, at best, we may only have an incomplete record of the thousands of articles and other content of a community.
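Freed’s point that the Wayback Machine “requires knowledge about URLs” is easier to see with a small sketch. The snippet below only builds the two addresses you would need: a link to the capture nearest a given date, and a query to the Internet Archive’s availability endpoint. The function names are my own, for illustration, and the date is simply the day of the shutdown. In both cases the original address has to be supplied; there is no way in without it.

```python
from urllib.parse import urlencode


def wayback_capture_url(original_url: str, timestamp: str) -> str:
    """Build the address of the capture closest to `timestamp` (YYYYMMDDhhmmss).

    Hypothetical helper: the Wayback Machine redirects to the nearest
    capture it holds for the original URL given here.
    """
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


def wayback_availability_query(original_url: str, timestamp: str) -> str:
    """Build a query for the availability endpoint (hypothetical helper).

    The response, if fetched, is JSON describing the closest capture.
    """
    params = urlencode({"url": original_url, "timestamp": timestamp})
    return f"https://archive.org/wayback/available?{params}"


# Both helpers are useless unless you already know the original address.
capture = wayback_capture_url("http://dcist.com/", "20171103000000")
query = wayback_availability_query("dcist.com", "20171103")

print(capture)
print(query)
```

Nothing here fetches anything; it only shows that a reader who has forgotten, or never knew, the URL of a particular DCist article has nothing to put into either address.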

If large organisations are not immune to this kind of institutional risk, what about the small ones? The underfunded ones?

To be clear, I think web archiving is important and I have used it a number of times when a site is no longer available; it’s a valuable resource. But it only goes so far, and sometimes the record of a website is incomplete. So what else can we do? How can we keep the digital archive alive? Ricketts has since put the DCist site back up as an “archive”, but it’s more like a “digital graveyard” that he could pull the plug on again any time he wants. How do you preserve something so fragile, so at risk? The custodians of the digital content care little for it, so how will it survive for the future?

The good news is that the DCist archive may have another home, not just one that survives on the mercy of a CEO.

The born-digital archives of the DCist require more than just a functioning server over time to ensure access. Fortunately, there are places where digital preservation is happening to all kinds of born-digital collections and there are passionate people who are custodians of this content. These custodians care about keeping it accessible and understandable for future generations. That is something Joe Ricketts clearly does not do.


What are your thoughts on this type of institutional risk and its impacts on digital preservation? How can we preserve this type of content in the future? Is web archiving enough or do we need a multi-pronged approach? Share your thoughts below and on Twitter using the #IDPD17 hashtag.

 

International Digital Preservation Day 2017 #IDPD17

It is International Digital Preservation Day. Today, around the world we celebrate the field that is fighting against time and technology to make sure that our digital “things” survive. And in turn, we are trying to make time and technology work with us.


We’re the people that see a 5.25” floppy disk and think “I bet I can read that. I wonder what I’ll find?” and we’re already making a list of where we can find the hardware and software to read it. We’re already dating it, wondering what kind of files would be on it and what software created those files. Can we still find them? We’re willing to try, because every day that disk is ageing, and every day brings the possibility that by the time we get around to reading it, the data will be corrupted.

We’re the people fighting against the inevitable technological obsolescence, juggling media carriers, file formats, technological failures, software obsolescence and hardware degradation. It is like a carefully coordinated dance, where one wrong thing can end up in some sort of error. A file can’t open, or if I can open it what am I even staring at? We’re trying to save our digital world, before it degrades and corrupts.

It’s not always that dire, but it’s the knowledge that if something gets overlooked, at some point – often in the blink of an eye – something will be lost. Something will be damaged. It’s like playing a kind of Russian roulette, except that those of us who are custodians of unique digital collections can’t take those chances. We cannot lose our digital assets, our digital “things” that we collect on behalf of the public, or for compliance reasons, or because we are keeping a record of the now for the future. After all, we have stories to tell, histories to save – what is it that we want to leave for the future?

If we don’t consider preserving our digital “things” now, then we might not leave a story behind to tell.

For some reason, while this is an issue we all struggle with (raise your hand if you’ve lost a digital file in your life or if your computer/tablet/phone has crashed and you lost everything and didn’t have a backup) digital preservation is still something people don’t know about or just don’t talk about. Why is something that we are all struggling with ignored so much? Is it because we’re not speaking up enough? Is it because people just lose their stuff and move on, forgetting about it? When so much of our lives’ records are now only digital, how can we just forget what we lose? How can we not care?

The truth is we should. And we should all be looking to digital preservation in one form or another. From individuals to big business, digital preservation matters. It’s not just for the cultural heritage and higher education institutions to “do” or to “worry” about. It involves you too.

The good news is that the world is starting to catch on. They are starting to look to us, the digital preservation practitioners, to see what they should do. They are starting to worry, starting to see the cracks in the digital world. Nothing lasts forever and sometimes in the digital world, it can be gone in a second with just a flick of a switch. Maybe it lives on somewhere, on those motionless hard drives, but without active management and commitment, even those hard drives will fail you some day. The events around the Gothamist’s shutdown of its online news sites (inc. DCist and LAist) have highlighted this. The recent Slate article on streaming-only services has us worried about the preservation of TV and film content that is born digital and so centralised that it cannot rely on a LOCKSS-based approach (Lots of Copies Keeps Stuff Safe).

These are of course just some of the things we need to worry about. Just some of the things we’ll have to try to save. There’s still the other approximately 2.5 quintillion bytes (about 2.5 exabytes, or 2.5 billion gigabytes) of data being created around the world each day to worry about. We’re not going to keep it all, but we’re going to want to keep some of it. And that some of it is rapidly increasing.
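For anyone who wants to check those numbers, here is a minimal sketch of the conversion. It assumes decimal (SI) units, where a gigabyte is 10^9 bytes and an exabyte is 10^18 bytes; the variable names are mine, and the daily figure is the estimate quoted above rather than anything measured here.

```python
# Decimal (SI) units are assumed: 1 GB = 10**9 bytes, 1 EB = 10**18 bytes.
BYTES_PER_GB = 10**9
BYTES_PER_EB = 10**18

# 2.5 quintillion bytes per day, written as an exact integer.
daily_bytes = 25 * 10**17

daily_exabytes = daily_bytes / BYTES_PER_EB    # bytes -> exabytes
daily_gigabytes = daily_bytes // BYTES_PER_GB  # bytes -> gigabytes

print(f"{daily_exabytes} EB per day")
print(f"{daily_gigabytes:,} GB per day")
```

With binary units (a gibibyte being 2^30 bytes) the figures would come out somewhat smaller, which is one reason estimates like this should be read as orders of magnitude.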

So this International Digital Preservation Day, I encourage everyone to think about their digital lives, at home and at work, and think about what you need to do to make your digital “things” last. There is a field of experts in the world who are here to help. We are no further than a tweet away. We survive by collaborating and helping each other. And we’re here to help you save the bits.


Want to learn more?

Visit the Digital Preservation Coalition for advice, reports and further information: http://www.dpconline.org/ 

Speak to the digital preservation hive mind on Twitter using any of these hashtags: #digitalpreservation #digipres #digpres

For more International Digital Preservation Day activities, visit: http://www.dpconline.org/events/international-digital-preservation-day or check out the hashtag #IDPD17

The vision for a preservation repository

Over the last couple of months, work at Cambridge University Library has begun to look at what a potential digital preservation system will look like, considering technical infrastructure, the key stakeholders and the policies underpinning them. Technical Fellow, Dave, tells us more about the holistic vision…


This post discusses some of the work we’ve been doing to lay foundations beneath the requirements for a ‘preservation system’ here at Cambridge. In particular, we’re looking at the core vision for the system. It comes with the standard ‘work in progress’ caveats – do not be surprised if the actual vision varies slightly (or more) from what’s discussed here. A lot of the below comes from Mastering the Requirements Process by Suzanne and James Robertson.

Also – it’s important to note that what follows is based upon a holistic definition of ‘system’ – a definition that’s more about what people know and do, and less about Information Technology, bits of tin and wiring.

Why does a system change need a vision?

New systems represent changes to the status quo. The vision is like the Pole Star for such a change effort – it ensures that people have something fixed to move towards when they’re buried under minute details. When confusion reigns, you can point to the vision for the system to guide you back to sanity.

Plus, as with all digital efforts, none of this is real: there’s no definite, obvious end point to the change. So the vision will help us recognise when we’ve achieved what we set out to.

Establishing scope and context

Defining what the system change isn’t is a particularly good way of working out what it actually represents. This can be achieved by thinking about the systems around the area you’re changing and the information that’s going to flow in and out. This sort of thinking makes for good diagrams: one that shows how a preservation repository system might sit within the broader ecosystem of digitisation, research outputs / data, digital archives and digital published material is shown below.

System goals

Being able to concisely sum up the key goals of the system is another important part of the vision. This is a lot harder than it sounds and there’s something journalistic about it – what you leave out is definitely more important than what you keep in. Fortunately, the vision is about broad brush strokes, not detail, which helps at this stage.

I found some great inspiration in Sustainable Economics for a Digital Planet, which indicated goals such as: “the system should make the value of preserving digital resources clear”, “the system should clearly support stakeholders’ incentives to preserve digital resources” and “the functional aspects of the system should map onto clearly-defined preservation roles and responsibilities”.

Who are we implementing this for?

The final main part of the ‘vision’ puzzle is the stakeholders: who is going to benefit from a preservation system? Who might not benefit directly, but really cares that one exists?

Any significant project is likely to have a LOT of these, so the Robertsons suggest breaking the list down by proximity to the system (using Ian Alexander’s Onion Model), from the core team that uses the system, through the ‘operational work area’ (i.e. those with the need to actually use it) and out to interested parties within the host organisation, and then those in the wider world beyond. An initial attempt at thinking about our stakeholders this way is shown below.

One important thing that we realised was that it’s easy to confuse ‘closeness’ with ‘importance’: there are some very important stakeholders in the ‘wider world’ (e.g. Research Councils or historians) that need to be kept in the loop.

A proposed vision for our preservation repository

After iterating through all the above a couple of times, the current working vision (subject to change!) for a digital preservation repository at Cambridge University Library is as follows:

The repository is the place where the best possible copies of digital resources are stored, kept safe, and have their usefulness maintained. Any future initiatives that need the most perfect copy of those resources will be able to retrieve them from the repository, if authorised to do so. At any given time, it will be clear how the digital resources stored in the repository are being used, how the repository meets the preservation requirements of stakeholders, and who is responsible for the various aspects of maintaining the digital resources stored there.

Hopefully this will give us a clear concept to refer back to as we delve into more detail throughout the months and years to come…