The vision for a preservation repository

Over the last couple of months, work at Cambridge University Library has begun to look at what a potential digital preservation system will look like, considering technical infrastructure, the key stakeholders and the policies underpinning them. Technical Fellow, Dave, tells us more about the holistic vision…


This post discusses some of the work we’ve been doing to lay foundations beneath the requirements for a ‘preservation system’ here at Cambridge. In particular, we’re looking at the core vision for the system. It comes with the standard ‘work in progress’ caveats – do not be surprised if the actual vision varies slightly (or more) from what’s discussed here. A lot of the below comes from Mastering the Requirements Process by Suzanne and James Robertson.

Also – it’s important to note that what follows is based upon a holistic definition of ‘system’ – a definition that’s more about what people know and do, and less about Information Technology, bits of tin and wiring.

Why does a system change need a vision?

New systems represent changes to the existing status-quo. The vision is like the Pole Star for such a change effort – it ensures that people have something fixed to move towards when they’re buried under minute details. When confusion reigns, you can point to the vision for the system to guide you back to sanity.

Plus, as with all digital efforts, none of this is real: there’s no definite, obvious end point to the change. So the vision will help us recognise when we’ve achieved what we set out to.

Establishing scope and context

Defining what the system change isn’t is a particularly good a way of working out what it actually represents. This can be achieved by thinking about the systems around the area you’re changing and the information that’s going to flow in and out. This sort of thinking makes for good diagrams: one that shows how a preservation repository system might sit within the broader ecosystem of digitisation, research outputs / data, digital archives and digital published material is shown below.

System goals

Being able to concisely sum-up the key goals of the system is another important part of the vision. This is a lot harder than it sounds and there’s something journalistic about it – what you leave out is definitely more important than what you keep in. Fortunately, the vision is about broad brush strokes, not detail, which helps at this stage.

I found some great inspiration in Sustainable Economics for a Digital Planet, which indicated goals such as: “the system should make the value of preserving digital resources clear”, “the system should clearly support stakeholders’ incentives to preserve digital resources” and “the functional aspects of the system should map onto clearly-defined preservation roles and responsibilities”.

Who are we implementing this for?

The final main part of the ‘vision’ puzzle is the stakeholders: who is going to benefit from a preservation system? Who might not benefit directly, but really cares that one exists?

Any significant project is likely to have a LOT of these, so the Robertsons suggest breaking the list down by proximity to the system (using Ian Alexander’s Onion Model), from the core team that uses the system, through the ‘operational work area’ (i.e. those with the need to actually use it) and out to interested parties within the host organisation, and then those in the wider world beyond. An initial attempt at thinking about our stakeholders this way is shown below.

One important thing that we realised was that it’s easy to confuse ‘closeness’ with ‘importance’: there are some very important stakeholders in the ‘wider world’ (e.g. Research Councils or historians) that need to be kept in the loop.

A proposed vision for our preservation repository

After iterating through all the above a couple of times, the current working vision (subject to change!) for a digital preservation repository at Cambridge University Library is as follows:

The repository is the place where the best possible copies of digital resources are stored, kept safe, and have their usefulness maintained. Any future initiatives that need the most perfect copy of those resources will be able to retrieve them from the repository, if authorised to do so. At any given time, it will be clear how the digital resources stored in the repository are being used, how the repository meets the preservation requirements of stakeholders, and who is responsible for the various aspects of maintaining the digital resources stored there.

Hopefully this will give us a clear concept to refer back to as we delve into more detail throughout the months and years to come…

C4RR – Containers for Reproducible Research Conference

James shares his thoughts after attending the C4RR Containers for Reproducible Research Conference at the University of Cambridge (27 – 28 June).


At the end of June both Dave and I, the Technical Fellows, attended the C4RR conference/workshop hosted by The Software Sustainability Institute in Cambridge. This event brought together researchers, developers and educators to explore best practices when using containers and the future of research software with containers.

Containers, specially Docker and Singularity are the ‘in’ thing at the moment and it was interesting to hear from a variety of research projects who are using them for reproducible research.

Containers are another form of server virtualisation but are lighter than a virtual machine. Containers and virtual machines have similar resource isolation and allocation benefits, but function differently because containers virtualize the operating system instead of hardware; containers are more portable and efficient.

 
Comparison of VM vs Container (Images from docker.com)

Researchers described how they were using Docker, one of the container implementations, to package the software used in their research so they could easily reproduce their computational environment across several different platforms (desktop, server and cluster). Others were using Singularity, another container technology, when implementing containers on a HPC (High-Performance Computing) Cluster due to restrictions of the Docker requirements for root access. It was clear from the talks, there is a rapid development of these technologies and ever increasing complexity of the computing environments involved, which does make me worry how these might be preserved.

Near the end of the second day, Dave and I gave a 20 minute presentation to encourage the audience to think more about preservation. As the audience were all evangelists for container technology it made sense to try to tap into them to promote building preservation into their projects.


Image By Raniere Silva

One aim was to get people to think about their research after the project was over. There is often a lack of motivation to think about how others might reproduce the work, whether that’s six months into the future let alone 15+ years from now.

Another area we briefly covered was relating to depositing research data. We use DROID to scan our repositories to identify file formats which relies on the technical registry of PRONOM. We put out a plea to the audience to ask for help with creating new file signatures for unknown file formats.

I had some great conversations with others over the two days and my main takeaway from the event was that we should look to attend more non-preservation specific conferences with a view to promote preservation in other computer-related areas of study.

Our slides from the event have been posted by The Software Sustainability Institute via Google.