Gathering the numbers: a maturity and resourcing survey for digital preservation

The ability to compare ourselves to peer institutions is key when arguing the case for digital preservation within our own organisations. However, finding up-to-date and correct information is not always straightforward.

The Digital Preservation at Oxford and Cambridge (DPOC) project has joined forces with the Digital Preservation Coalition (DPC) to gather some of the basic numbers that can assist staff seeking to build a business case for digital preservation in their local institution.

We need your input to make this happen!

The DPOC and the DPC have developed a survey aimed at gathering basic data about maturity levels, staff resources, and the policy and strategy landscapes of institutions currently doing or considering digital preservation activities. (The survey intentionally does not include questions about the type or size of the data organisations are required to preserve.)

Completing the survey will only take 10-20 minutes of your time, and will help us better understand the current digital preservation landscape. The survey can be taken at: https://cambridge.eu.qualtrics.com/jfe/form/SV_brWr12R8hMwfIOh

The deadline for survey responses is Thursday 31 May 2018.

For those wanting to know upfront what questions are asked in the survey – here is the full set of Survey Questions (PDF). Please keep in mind the survey is interactive and you may not see all of the questions when filling this in online (as the questions only appear in relation to your previous responses). Responses must be submitted through the online survey.

Anonymised data gathered as part of this maturity and resourcing survey will be made available via this DPOC website.

For any questions about the survey and its content, please contact: digitalpreservation@lib.cam.ac.uk

Institutional risk and born-digital content: the shutdown of DCist #IDPD17

Another post for today’s International Digital Preservation Day 2017. Outreach and Training Fellow, Sarah, discusses just how real institutional risk is and how it can lead to a loss of born-digital archives, a risk that digital-only sites like DCist have recently proven. Read more about the Gothamist’s website shutdowns this November.


In today’s world, so much of what we create and share exists only in digital form. These digital-only creations are referred to as born-digital — they were created digitally and they often continue in that way. And so much of our born-digital content is shared online. We often take for granted content on the Internet, assuming it will always be there. But is it? Likely it will at least be captured by the Internet Archive’s Wayback Machine or a library web archiving equivalent. But is that actually enough? Does it capture a complete, usable record? What happens when a digital-only creation, like a magazine or newspaper, is shut down?

Institutional risk is real. In the commercial world of born-digital content that persists only in digital form, the risk of loss is high.

Unfortunately, there has recently been a very good example of this kind of risk: in early November this year, the Gothamist shut down its digital-only content sites, such as the DCist.

The sites and all the associated content were completely removed from the Internet by the morning of 3 November. Gone. Taken down and replaced with a letter from billionaire CEO Joe Ricketts, justifying the shutdown because, despite its enormous popularity and readership, it just wasn’t “economically successful.”

Wayback Machine’s capture of the redirect page and Ricketts’ letter

The DCist site and all of its content were gone completely; readers were instead redirected to another page entirely to read Joe Ricketts’ letter. Someone had literally pulled the plug on the whole thing.

Internet Archive’s 3 November 2017 capture, showing a redirect from the DCist.com page. DCist was gone from the Internet.

The access to content was completely lost, save for what the Internet Archive captured and what content was saved by creators elsewhere. But access to the archives of 13 years of DCist content was taken from the Internet and its millions of readers. At that point all we had were some web captures, incomplete records of the content left to us.

The Internet Archive’s web captures for DCist.com over the past 13 years.

What would happen to the DCist’s archive now? All over Twitter, people were being sent to the Internet Archive or told to check Google’s cache to download the lost content. But as Benjamin Freed pointed out in his recent Washingtonian article:

“Those were noble recommendations, but would have been incomplete. The Wayback Machine requires knowledge about URLs, and versions stored in Google’s memory banks do not last long enough. And, sure, many of the subjects DCist wrote about were covered by others, but not all of them, and certainly not with the attitude with which the site approached the world.”

As Freed reminds us, “A newspaper going out of business is tragic, but when it happens, we don’t torch the old issues or yank the microfilms from the local library.” In the world of born-digital content, simply unplugging the servers and leaving the digital archive to rot means that, at best, we may only have an incomplete record of the thousands of articles and other content of a community.

If large organisations are not immune to this kind of institutional risk, what about the small ones? The underfunded ones?

To be clear, I think web archiving is important and I have used it a number of times when a site is no longer available; it’s a valuable resource. But it only goes so far and sometimes the record of a website is incomplete. So what else can we do? How can we keep the digital archive alive? Ricketts has put the DCist site back up as an “archive”, but it’s more like a “digital graveyard” that he could pull the plug on again any time he wants. How do you preserve something so fragile, so at risk? The custodians of the digital content care little for it, so how will it survive for the future?

The good news is that the DCist archive may have another home, one that does not survive only at the mercy of a CEO.

The born-digital archives of the DCist require more than just a functioning server over time to ensure access. Fortunately, there are places where digital preservation is happening to all kinds of born-digital collections and there are passionate people who are custodians of this content. These custodians care about keeping it accessible and understandable for future generations. Something that Joe Ricketts clearly does not.


What are your thoughts on this type of institutional risk and its impacts on digital preservation? How can we preserve this type of content in the future? Is web archiving enough or do we need a multi-pronged approach? Share your thoughts below and on Twitter using the #IDPD17 hashtag.

 

DPASSH: Getting close to producers, consumers and digital preservation

Sarah shares her thoughts after attending the DPASSH (Digital Preservation in the Arts, Social Sciences and Humanities) Conference at the University of Sussex (14 – 15 June).


DPASSH is a conference that the Digital Repository of Ireland (DRI) puts on with a host organisation. This year, it was hosted by the Sussex Humanities Lab at the University of Sussex, Brighton. What is exciting about this digital preservation conference is that it brings together creators (producers) and users (consumers) with digital preservation experts. Most digital preservation conferences end up being a bit of an echo chamber, full of practitioners and vendors only. But what about the creators and the users? What knowledge can we share? What can we learn?

DPASSH is a small conference, but it was an opportunity to see what researchers are creating and how they are engaging with digital collections. For example, in her talk, Stefania Forlini discussed the perils of a content-centric digitisation process where unique print artefacts are all treated the same; the process flattens everything into identical objects even though they are very different. What about the materials and the physicality of the object? It has stories to tell as well.

To Forlini, books span several domains of sensory experience and our digitised collections should reflect that. With the Gibson Project, Forlini and project researchers are trying to find ways to bring some of those experiences back through the Speculative W@nderverse. They are currently experimenting with embossing different kinds of paper with a code that can be read by a computer. The computer can then bring up the science fiction pamphlets that are made of that specific material. Then a user can feel the physicality of the digitised item and then explore the text, themes and relationships to other items in the collection using generous interfaces. This combines a physical sensory experience with a digital experience.

For creators, the decision of what research to capture and preserve is sometimes difficult; often they lack the tools to capture the information. Other times, creators do not have the skills to perform proper archival selection. Athanasios Velios offered a tool for digital artists called Artivity. Artivity can capture the actions performed on a digital artwork in certain programs, like Photoshop or Illustrator. This allows the artist to record their creative process and gives future researchers the opportunity to study it. Steph Taylor from CoSector suggested in her talk that creators are archivists now, because they are constantly appraising their digital collections and making selection decisions. It is important that archivists and digital preservation practitioners empower creators to make good decisions around what should be kept for the long-term.

As a bonus to the conference, I was given the ‘Best Tweet’ award by the DPC and DPASSH. It was a nice way to round out two good, informative days. I plan to purchase many books with my gift voucher!

I certainly hope they hold the conference next year, as I think it is important for researchers in the humanities, arts and social sciences to engage with digital preservation experts, archivists and librarians. There is a lot to learn from each other. How often do we get our creators and users in one room with us digital preservation nerds?

(Mis)Adventures in guest blogging

Sarah shares her recent DPC guest blogging experience. The post is available to read at: http://www.dpconline.org/blog/beware-of-the-leopard-oxford-s-adventures-in-the-bottom-drawer 


As members of the Digital Preservation Coalition (DPC), we have the opportunity to contribute to their blog on issues in digital preservation. As the Outreach & Training Fellow at Oxford, that task falls upon me when it’s our turn to contribute.

You would think that because I contribute to this blog regularly, I’d be an old hand at blogging. It turns out that writer’s block can hit at precisely the worst possible time. But I forced out what I could and then turned to the other Fellows at Oxford for support. Edith and James both added their own work to the post.

With a final draft ready, the day approached when we could submit it to the blog. Even the technically-minded struggle with technology now and again. First, it was the challenge of uploading images: it only took about 2 or 3 tries, and then I deleted the evidence of my mistakes. Finally, I clicked ‘submit’ and waited for confirmation.

And I waited…

And got sent back to the homepage. Then I got a ‘failure notice’ email that said “I’m afraid I wasn’t able to deliver your message to the following addresses. This is a permanent error; I’ve given up. Sorry it didn’t work out.” What just happened? Did it work or not?

So I tried again….

And again…

And again. I think I submitted 6 more times before I emailed the DPC to ask what I had done wrong. I had done NOTHING wrong, except press ‘submit’ too much. There were as many copies waiting for approval as there were times when I had hit ‘submit’. There was no way to delete the evidence, so I couldn’t avoid that embarrassment.

Minus those technological snafus, everything worked and the DPOC team’s first guest blog post is live! You can read the post here for an Oxford DPOC project update.

Now that I’ve got my technological mistakes out of the way, I think I’m ready to continue contributing to the wider digital preservation community through guest blogging. We are a growing (but still relatively small) community and sharing our knowledge, ideas and experiences freely through blogs is important. We rely on each other to navigate the field where things can be complex and ever-changing. Journals and project websites date quickly, but community-driven and non-profit blogs remain a good source of relevant and immediate information. They are a valuable part of my digital preservation work and I am happy to be giving back.

 

A view from the basement – a visit to the DPC Glasgow

Last Monday, Sarah, Edith and Lee visited the Digital Preservation Coalition (DPC) at their DPC Glasgow Office on University Gardens. The aim of the visit was to understand how the DPC has lent and will lend support to the DPOC project. The DPOC team is very fortunate in having the DPC’s expertise, resources and services at their disposal as a supporting partner in the project and we were keen to find out more.

Plied with tea, coffee and Sharon McMeekin’s awesome lemon cake, William Kilbride gave us an overview of the DPC, explaining that they are a not-for-profit, membership-based organisation that used to mainly cater for the UK and Ireland. However, international agencies are now welcome (UN, NATO, ICC to name a few) and this has changed the nature of their program and the features that they offer (website, streaming, event recording). They are vendor neutral but do have a ‘Commercial Supporter’ community to help support events and raise funds for digital preservation work. They have six members of staff working from the DPC Glasgow and DPC York offices. They focus on four main areas:

  • Workforce Development, Training and Skills
  • Communication and Advocacy
  • Research and Practice
  • Partnerships and Sustainability

William explained the last three areas and Sharon gave us an overview of the work that she does for developing workforce skills and offering training events, especially the ‘Getting Started in Digital Preservation’ and ‘Making Progress’ workshops. The DPC also provide Leadership Scholarships to help develop knowledge and CPD in digital preservation, so please do apply for those if you are working somewhere that can spare your time out of the office but can’t fund you.

In terms of helping DPOC, the DPC can help with hosting events (such as PASIG 2017) and provide supporting training resources for our organisations. They can also help with procurement processes and auditing, and we can call on the wealth of advice offered by their six members of staff.

We left feeling that, despite working as a collaborative team with colleagues we can already bounce ideas off, we had a wider support network that we could call on to guide us and help us share our work more widely. From a skills and training perspective, the idea that they are happy to review, comment and suggest further avenues for the skills needs analysis toolkit, to ensure it will benefit the wider community, is of tremendous use. Yet this is just one example; help with procurement, policy development and auditing is also something they are willing to offer the project.

It is reassuring that the DPC are there and have plenty of experience to share in the digital preservation sphere. Tapping into networks, sharing knowledge and collaborating really is the best way to achieve a coherent, sustainable approach to digital preservation, and it helps those working in the field to focus on specific tasks rather than trying to ‘reinvent the wheel’ when somebody else has already spent time on it.