A portable digital preservation roadshow kit

In the lead-up to Digital Preservation Day, the Cambridge team held a series of roadshows with a pop-up exhibition to raise awareness of digital preservation within the wider University. They wanted to let people know that there was a team concentrating on this area. They also wanted to find out people’s concerns regarding the long-term continuity of the digital content that they create and use. Outreach and Training Fellow, Lee, writes about what is in the pop-up kit and how it can be used at your institution to generate awareness of digital preservation.


The exhibition kit

In the lead-up to the exhibition we created a portable carry kit so that we could repeat the exhibition in various locations day after day.

To stimulate discussion and provide an interactive experience, the first portable exhibition consisted of:

  • An A1 poster, printed on cloth for ease of carrying and to reduce wear and tear. Images were attributed as correctly as possible and in line with open and Creative Commons requirements.

Prototype exhibition poster.

  • A roll-up display banner with an image sourced from the Cambridge Digital Library (appropriately from the Book of Apocalypse), plus a bit of Photoshop work to make a corrupted version. I like to describe the image as the digital equivalent of mould affecting a precious manuscript. You can still see the image but it’s not quite right, so work needs to be done to put it ‘right’.
  • A laptop with the URLs to various playable games on the Internet Archive, to make the point about emulation and how digital is different from traditional media. The games we used were:
  • A small collection of tangible technology from the past to the present. This was sourced from the Fellows’ collections of materials and included:
    • 8” floppy disk
    • 25” floppy disk
    • 5.25” floppy disk
    • 5.25” floppy disk drive
    • Compact Disc Recordable (CD-R)
    • Commercial double-sided film on Digital Versatile Disc (DVD)
    • Digital Versatile Disc ReWritable (DVD-RW)
    • A 250GB hard disk drive from a laptop
    • 2GB and 1GB Random Access Memory (RAM) chips
    • USB stick with the hard case removed to show the small PCB and memory chip
    • An SD card enclosure
    • A 2GB micro SD card
    • A micro SD card USB enclosure
    • An iPod c. 2012
    • An acetate, c. 1990, with degradation (courtesy of JISC’s Dom Fripp), to make a visual point through an analogue item about the degradation and fragile nature of the materials we are working with.

A close up of the tech on display.

As part of future work we’d like to develop this into a more generic display kit for those who do not have the time to create such materials, but have an opportunity to run displays. This is how the display looked when up and running in the University Library’s Entrance Hall.

Roadshow display set up in the Entrance Hall of the Cambridge University Library.

We also relied on the generosity of the hosting venues, which gave us the space to come and visit. It was important that we toured around the site to spread the message across the Cambridge University community, so we visited the following venues:

  • Alison Richard Building – 16th November
  • Gordon and Betty Moore Library – 17th November
  • Department of Engineering Library – 20th November
  • University Library Entrance Hall – 21st November
  • Churchill College – 22nd November
  • Faculty of English Social Space – 23rd November

The following is a summary of some of the views captured from the Post-It notes. As it’s not part of a proper study, we removed the views that repeated each other. The most popular answer for the “what digital materials should be saved” question was ‘all’ or ‘everything’. Most thought that the Library should be responsible for the preservation of all materials and the most common challenges were money, time, and reacting to change.

Summary of Post-It note capture.

A lot of work went into the creation of the pop-up exhibition and it was developed carefully so that it could be used beyond the life of the DPOC project. We have created a resource that can be used at a moment’s notice to begin the digital preservation conversation with a wider audience. We’d like to develop this kit a bit further so it can be personalised for your own outreach efforts.


If you would like to collaborate on this kit, please get in touch in the comments below or via the ‘contact us’ page.

Institutional risk and born-digital content: the shutdown of DCist #IDPD17

Another post for today’s International Digital Preservation Day 2017. Outreach and Training Fellow, Sarah, discusses just how real institutional risk is and how it can lead to a loss of born-digital archives, a risk that digital-only sites like DCist have recently proven. Read more about the Gothamist’s website shutdowns this November.


In today’s world, so much of what we create and share exists only in digital form. These digital-only creations are referred to as born-digital — they were created digitally and they often continue in that way. And so much of our born-digital content is shared online. We often take for granted content on the Internet, assuming it will always be there. But is it? Likely it will at least be captured by the Internet Archive’s Wayback Machine or a library web archiving equivalent. But is that actually enough? Does it capture a complete, usable record? What happens when a digital-only creation, like a magazine or newspaper, is shut down?

Institutional risk is real. In the commercial world of born-digital content that persists only in digital form, the risk of loss is high.

Unfortunately, there has recently been a very good example of this kind of risk: in early November this year, the Gothamist shut down its digital-only content sites, such as the DCist.

The sites and all the associated content were completely removed from the Internet by the morning of 3 November. Gone. Taken down and replaced with a letter from billionaire CEO, Joe Ricketts, justifying the shutdown because, despite its enormous popularity and readership, it just wasn’t “economically successful.”

Wayback Machine’s capture of the redirect page and Ricketts’ letter

The DCist site and all of its content were gone completely; readers were instead redirected to another page entirely to read Joe Ricketts’ letter. Someone had literally pulled the plug on the whole thing.

Internet Archive’s 3 November 2017 capture, showing a redirect from the DCist.com page. DCist was gone from the Internet.

The access to content was completely lost, save for what the Internet Archive captured and what content was saved by creators elsewhere. But access to the archives of 13 years of DCist content was taken from the Internet and its millions of readers. At that point all we had were some web captures, incomplete records of the content left to us.

The Internet Archive’s web captures for DCist.com over the past 13 years.

What would happen to the DCist’s archive now? All over Twitter people were being sent to the Internet Archive, or told to check Google’s cache, to download the lost content. But as Benjamin Freed pointed out in his recent Washingtonian article:

“Those were noble recommendations, but would have been incomplete. The Wayback Machine requires knowledge about URLs, and versions stored in Google’s memory banks do not last long enough. And, sure, many of the subjects DCist wrote about were covered by others, but not all of them, and certainly not with the attitude with which the site approached the world.”

As Freed reminds us, “A newspaper going out of business is tragic, but when it happens, we don’t torch the old issues or yank the microfilms from the local library.” In the world of born-digital content, simply unplugging the servers and leaving the digital archive to rot means that, at best, we may only have an incomplete record of the thousands of articles and other content of a community.
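Freed’s point about URLs can be made concrete. The sketch below is only an illustration: it assumes the commonly seen Wayback Machine address pattern of web.archive.org/web/ followed by a timestamp and the original URL, and the function name `wayback_url` is hypothetical. It shows that you must already know the original address before you can ask for a capture.

```python
def wayback_url(original_url: str, timestamp: str) -> str:
    """Build a Wayback Machine address for an already-known original URL.

    Assumption: captures are addressed as
    https://web.archive.org/web/<timestamp>/<original URL>,
    where the timestamp is digits in YYYYMMDD or YYYYMMDDhhmmss form.
    """
    if not timestamp.isdigit():
        raise ValueError("timestamp should contain digits only, e.g. 20171103")
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


# Without knowing the original article URL, there is nothing to look up.
print(wayback_url("http://dcist.com/", "20171103"))
```

An article whose address nobody remembers cannot be requested this way, which is exactly why web captures alone make for an incomplete archive.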

If large organisations are not immune to this kind of institutional risk, what about the small ones? The underfunded ones?

To be clear, I think web archiving is important and I have used it a number of times when a site is no longer available; it’s a valuable resource. But it only goes so far and sometimes the record of a website is incomplete. So what else can we do? How can we keep the digital archive alive? Ricketts has since put the DCist site back up as an “archive”, but it’s more like a “digital graveyard” that he could pull the plug on again any time he wants. How do you preserve something so fragile, so at risk? The custodians of the digital content care little for it, so how will it survive for the future?

The good news is that the DCist archive may have another home, not just one that survives on the mercy of a CEO.

The born-digital archives of the DCist require more than just a functioning server over time to ensure access. Fortunately, there are places where digital preservation is happening to all kinds of born-digital collections and there are passionate people who are custodians of this content. These custodians care about keeping it accessible and understandable for future generations. Something that Joe Ricketts clearly does not.


What are your thoughts on this type of institutional risk and its impacts on digital preservation? How can we preserve this type of content in the future? Is web archiving enough or do we need a multi-pronged approach? Share your thoughts below and on Twitter using the #IDPD17 hashtag.

 

International Digital Preservation Day 2017 #IDPD17

It is International Digital Preservation Day. Today, around the world we celebrate the field that is fighting against time and technology to make sure that our digital “things” survive. And in turn, we are trying to make time and technology work with us.


We’re the people that see a 5.25” floppy disk and think “I bet I can read that. I wonder what I’ll find?” and we’re already making a list of where we can find the hardware and software to read it. We’re already dating it, wondering what kind of files would be on it and what software created those files. Can we still find them? We’re willing to try, because every day that disk is ageing and every day there is the possibility that, when we get around to reading it, the data might be corrupted.

We’re the people fighting against the inevitable technological obsolescence, juggling media carriers, file formats, technological failures, software obsolescence and hardware degradation. It is like a carefully coordinated dance, where one wrong thing can end up in some sort of error. A file can’t open, or if I can open it what am I even staring at? We’re trying to save our digital world, before it degrades and corrupts.

It’s not always that dire, but it’s the knowledge that if something gets overlooked, at some point – often in the blink of an eye – something will be lost. Something will be damaged. It’s like playing a kind of Russian roulette, except that those of us who are custodians of unique digital collections can’t take those chances. We cannot lose our digital assets, our digital “things” that we collect on behalf of the public, or for compliance reasons, or because we are keeping a record of the now for the future. After all, we have stories to tell, histories to save – what is it that we want to leave for the future?

If we don’t consider preserving our digital “things” now, then we might not leave a story behind to tell.

For some reason, while this is an issue we all struggle with (raise your hand if you’ve lost a digital file in your life or if your computer/tablet/phone has crashed and you lost everything and didn’t have a backup) digital preservation is still something people don’t know about or just don’t talk about. Why is something that we are all struggling with ignored so much? Is it because we’re not speaking up enough? Is it because people just lose their stuff and move on, forgetting about it? When so much of our lives’ records are now only digital, how can we just forget what we lose? How can we not care?

The truth is we should. And we should all be looking to digital preservation in one form or another. From individuals to big business, digital preservation matters. It’s not just for the cultural heritage and higher education institutions to “do” or to “worry” about. It involves you too.

The good news is that the world is starting to catch on. They are starting to look to us, the digital preservation practitioners, to see what they should do. They are starting to worry, starting to see the cracks in the digital world. Nothing lasts forever and sometimes in the digital world, it can be gone in a second with just a flick of a switch. Maybe it lives on somewhere, on those motionless hard drives, but without active management and commitment, even those hard drives will fail you some day. The events around the Gothamist’s shutdown of its online news sites (inc. DCist and LAist) have highlighted this. The recent Slate article on streaming-only services has us worried about the preservation of TV and film content that is born digital and so centralised that it cannot rely on a LOCKSS-based approach (Lots of Copies Keeps Stuff Safe).

These are of course just some of the things we need to worry about. Just some of the things we’ll have to try to save. There’s still the other approximately 2.5 quintillion bytes (roughly 2.5 exabytes, or 2.5 billion gigabytes) of data being created around the world each day to worry about. We’re not going to keep it all, but we’re going to want to keep some of it. And that some of it is rapidly increasing.
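For anyone who wants to check those unit conversions, here is a minimal sketch. It assumes decimal (SI) units, where one gigabyte is 10^9 bytes and one exabyte is 10^18 bytes; the variable names are purely illustrative.

```python
# Decimal (SI) units are assumed: 1 GB = 10**9 bytes, 1 EB = 10**18 bytes.
BYTES_PER_GB = 10**9
BYTES_PER_EB = 10**18

# 2.5 quintillion bytes per day, the estimate quoted in the post.
daily_bytes = 25 * 10**17

daily_exabytes = daily_bytes / BYTES_PER_EB
daily_gigabytes = daily_bytes / BYTES_PER_GB

print(f"{daily_exabytes} exabytes per day")
print(f"{daily_gigabytes / 10**9} billion gigabytes per day")
```

Under binary units (gibibytes and exbibytes) the figures would come out somewhat smaller, but the order of magnitude is the same.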

So this International Digital Preservation Day, I encourage everyone to think about their digital lives, at home and at work, and think about what you need to do to make your digital “things” last. There is a field of experts in the world who are here to help. We are no further than a tweet away. We survive by collaborating and helping each other. And we’re here to help you save the bits.


Want to learn more?

Visit the Digital Preservation Coalition for advice, reports and further information: http://www.dpconline.org/ 

Speak to the digital preservation hive mind on Twitter using any of these hashtags: #digitalpreservation #digipres #digpres

For more International Digital Preservation Day activities, visit: http://www.dpconline.org/events/international-digital-preservation-day or check out the hashtag #IDPD17

The vision for a preservation repository

Over the last couple of months, work at Cambridge University Library has begun to look at what a potential digital preservation system will look like, considering technical infrastructure, the key stakeholders and the policies underpinning them. Technical Fellow, Dave, tells us more about the holistic vision…


This post discusses some of the work we’ve been doing to lay foundations beneath the requirements for a ‘preservation system’ here at Cambridge. In particular, we’re looking at the core vision for the system. It comes with the standard ‘work in progress’ caveats – do not be surprised if the actual vision varies slightly (or more) from what’s discussed here. A lot of the below comes from Mastering the Requirements Process by Suzanne and James Robertson.

Also – it’s important to note that what follows is based upon a holistic definition of ‘system’ – a definition that’s more about what people know and do, and less about Information Technology, bits of tin and wiring.

Why does a system change need a vision?

New systems represent changes to the existing status-quo. The vision is like the Pole Star for such a change effort – it ensures that people have something fixed to move towards when they’re buried under minute details. When confusion reigns, you can point to the vision for the system to guide you back to sanity.

Plus, as with all digital efforts, none of this is real: there’s no definite, obvious end point to the change. So the vision will help us recognise when we’ve achieved what we set out to.

Establishing scope and context

Defining what the system change isn’t is a particularly good way of working out what it actually represents. This can be achieved by thinking about the systems around the area you’re changing and the information that’s going to flow in and out. This sort of thinking makes for good diagrams: one that shows how a preservation repository system might sit within the broader ecosystem of digitisation, research outputs / data, digital archives and digital published material is shown below.

System goals

Being able to concisely sum up the key goals of the system is another important part of the vision. This is a lot harder than it sounds and there’s something journalistic about it – what you leave out is definitely more important than what you keep in. Fortunately, the vision is about broad brush strokes, not detail, which helps at this stage.

I found some great inspiration in Sustainable Economics for a Digital Planet, which indicated goals such as: “the system should make the value of preserving digital resources clear”, “the system should clearly support stakeholders’ incentives to preserve digital resources” and “the functional aspects of the system should map onto clearly-defined preservation roles and responsibilities”.

Who are we implementing this for?

The final main part of the ‘vision’ puzzle is the stakeholders: who is going to benefit from a preservation system? Who might not benefit directly, but really cares that one exists?

Any significant project is likely to have a LOT of these, so the Robertsons suggest breaking the list down by proximity to the system (using Ian Alexander’s Onion Model), from the core team that uses the system, through the ‘operational work area’ (i.e. those with the need to actually use it) and out to interested parties within the host organisation, and then those in the wider world beyond. An initial attempt at thinking about our stakeholders this way is shown below.

One important thing that we realised was that it’s easy to confuse ‘closeness’ with ‘importance’: there are some very important stakeholders in the ‘wider world’ (e.g. Research Councils or historians) that need to be kept in the loop.

A proposed vision for our preservation repository

After iterating through all the above a couple of times, the current working vision (subject to change!) for a digital preservation repository at Cambridge University Library is as follows:

The repository is the place where the best possible copies of digital resources are stored, kept safe, and have their usefulness maintained. Any future initiatives that need the most perfect copy of those resources will be able to retrieve them from the repository, if authorised to do so. At any given time, it will be clear how the digital resources stored in the repository are being used, how the repository meets the preservation requirements of stakeholders, and who is responsible for the various aspects of maintaining the digital resources stored there.

Hopefully this will give us a clear concept to refer back to as we delve into more detail throughout the months and years to come…