Audiovisual creation and preservation

Following on from the well received Filling the digital preservation gap(s) post, Somaya has followed this up by reflecting on an in-house workshop she recently attended entitled, ‘Video Production: Shoot, Edit and Upload’, which has prompted these thoughts and some practical advice on analogue and digital audiovisual preservation.

My photographer colleague, Maciej, and I attended a video editing course at Cambridge University. I was there to learn about what video file formats staff at the University are creating and where these are being stored and made available, with a view to future preservation of this type of digital content. It is important we know what types of content the university is creating, so we know what we will have to preserve now and in the future.

While I have an audio background (having started out splicing reel-to-reel tapes), for the past 20 years I have predominantly worked in the digital domain. I am not an analogue audiovisual specialist, particularly not film and video. However, I have previously worked for an Australian national broadcaster (in the radio division) and the National Film and Sound Archive of Australia (developing a strategy for acquiring and preserving multi-platform content, such as Apps and interactive audiovisual works etc.)

AV Media

A range of analogue and digital carriers. Image credit: Somaya Langley

Since my arrival, both Cambridge University Library and Bodleian Libraries, Oxford have been very keen to discuss their audiovisual collections and I’m led to believe there may be some significant film collections held in Cambridge University Library (although I’ve yet to see them in person). As many people have been asking about audiovisual, I thought I would briefly share some information (from an Australasian perspective).

A ten-year deadline for audiovisual digitisation

In 2015, the National Film and Sound Archive of Australia launched a strategy paper called Deadline 2025: collections at risk which outlines why there is a ten-year deadline to digitise analogue (or digital tape-based) audiovisual material. This is due to the fragility of the carriers (the reels, tapes etc.), playback equipment having been discontinued – a considerable proportion of equipment purchased is secondhand and bought via eBay or similar services – as well as the specialist skills also disappearing. The knowledge of analogue audiovisual held by engineers of this era is considerable. These engineers have started to retire, and while there is some succession planning, there is not nearly enough to retain the in-depth, wide-ranging and highly technical skill-sets and knowledge of engineers trained last century.

Obsolete physical carriers

Why is it that audio and video content requires extra attention? A considerable amount of specialist knowledge is required to understand how carriers are best handled. In the same way that conservation staff know how to repair delicate, centuries-old paper or paintings, similar knowledge is required to handle audiovisual carriers such as magnetic tape (cassettes, reel-to-reel tapes) or optical media (CDs, DVDs etc.). Not knowing how to wind tapes, when a tape requires ‘baking’, or how to hold a CD correctly can result in damage to the carrier. Further information on handling carriers can be found here: If you’re struggling to identify an audiovisual or digital carrier, then Mediapedia (a resource initiated by Douglas Elford at the National Library of Australia) is a great starting point.

Earlier this year, along with former State Library of New South Wales colleagues in Sydney, Scott Wajon and Damien Cassidy, we produced an Obsolete Physical Carriers Report based on a survey of audiovisual and digital carriers held in nine Australian libraries for the National and State Libraries Australasia (NSLA). This outlined the scope of the problem of ‘at-risk’ content held on analogue and digital carriers (and that this content needs to be transferred within the next decade). Of note is the short lifespan of ‘burnt’ (as opposed to professionally mastered) CDs and DVDs.

Audio preservation standards

In 2004, the International Association of Sound and Audiovisual Archives (IASA) first published the audio preservation standard: Guidelines on the Production and Preservation of Digital Audio Objects. The guidelines set a quality standard for audio preservation. I have been lucky to have worked with the editor (Kevin Bradley from the National Library of Australia) and several of the main contributors (including Matthew Davies) in some of my previous roles.

Other standards publications IASA has produced can be found here:

Video preservation standards

Since approximately 2010, IASA has been working towards publishing a similar standard for video preservation. While this has yet to be released, it is likely to be soon (hopefully 2017?).

In lieu of a world-wide standard for video

As audiovisual institutions around the world digitise their film and video collections, they are developing their own internal guidelines and procedures regarding ‘preservation quality’ video. However, best practice has started to form, with many choosing to use:

  • Lossless Motion JPEG 2000, inside an MXF OP1a wrapper

There is also interest in another codec, which various audiovisual preservation specialists are discussing as a possible alternative video preservation standard:

  • Lossless FFV1 (FF Video Codec 1)

For content that has been captured at a lower quality in the first place (e.g. video created with consumer rather than professional equipment), another format various collecting institutions may consider is:

  • Uncompressed AVI

Why is video tricky?

For the most part, video is more complex than audio for several reasons including:

  • A video file format may not be what it seems – there is both a container (aka wrapper) and, held inside it, the encoded video (e.g. a QuickTime MOV file containing content encoded as H.264).
  • Video codecs can also produce files that are lossy (compressed with a loss of information) or lossless (compressed, but where data is not lost as part of the encoding process).

The tool, MediaInfo, can provide information about both the container and the encoded file for a wide range of file formats.
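To make the container-versus-codec distinction concrete, here is a minimal Python sketch that guesses the wrapper from a file’s opening bytes. It is not how MediaInfo works internally. The function names `sniff_container` and `sniff_file` are made up for illustration, and the signature checks are deliberately simplified: for example, older QuickTime files may lack the `ftyp` marker, and an MXF file may have other bytes before the key checked here.

```python
def sniff_container(header: bytes) -> str:
    """Guess the container (wrapper) from the first bytes of a file.

    Simplified, illustrative checks only; a real tool inspects far more.
    """
    # AVI is a RIFF file whose form type (bytes 8-11) is "AVI ".
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "AVI (RIFF)"
    # MP4 and most modern QuickTime MOV files carry an "ftyp" box at offset 4.
    if header[4:8] == b"ftyp":
        return "ISO base media / QuickTime (MP4, MOV)"
    # Matroska and WebM start with the EBML header ID.
    if header[:4] == b"\x1a\x45\xdf\xa3":
        return "Matroska / WebM (EBML)"
    # MXF partition packs start with a SMPTE universal label prefix.
    if header[:4] == b"\x06\x0e\x2b\x34":
        return "MXF"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 12 bytes of a file and guess its container."""
    with open(path, "rb") as handle:
        return sniff_container(handle.read(12))
```

Note that this only identifies the wrapper. Working out what is encoded inside it (H.264, FFV1, JPEG 2000 and so on) means parsing the container’s internal structure, which is the job a tool such as MediaInfo does.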

Of course, there are many things to consider and parameters to configure – hence needing film and video digitisation specialists and specialist equipment to produce preservation quality digitised video.

From the US, the Federal Agencies Digitization Guidelines Initiative (FADGI) is also a great resource for information about audiovisual digitisation.

Consumer-produced audiovisual content

While I would recommend that consumers capture and produce audiovisual content at as high a quality as their equipment allows (minimum of 24-bit, 48 kHz WAV files for audio and uncompressed AVI for video), I’m aware those using mobile devices aren’t necessarily going to do this. So, in addition to ensuring, where possible, that preservation quality audiovisual content is created now and in the future, we will also have to take into account significant content being created on non-professional consumer-grade equipment and the potentially proprietary file formats produced.

What can you do?

If you’re creating audio and/or video content:

  • set your device to the highest quality it will allow (however, you will need to take into account the amount of storage this will require)
  • try to avoid proprietary and less common file formats and CODECs
  • be aware that, especially for video content, your file is a little more complex than you might have expected: it’s a ‘file’ inside a ‘wrapper’, so it’s almost like two files, one inside the other…

How big?

Another consideration is the file size of digitised and born-digital film and video content, which has implications for how to ‘wrangle’ files as well as for the considerable storage needed … however this is best left for a future blog post.
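In the meantime, a rough back-of-the-envelope sketch in Python gives a feel for the numbers. The function names are made up for illustration, and the video figures assume standard-definition PAL (720×576 at 25 frames per second) stored uncompressed at 2 bytes per pixel (8-bit 4:2:2). These are assumptions chosen for the example, not a recommendation, and container overhead and the soundtrack are ignored.

```python
def audio_bytes_per_second(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Data rate of uncompressed PCM audio."""
    return sample_rate_hz * (bit_depth // 8) * channels


def video_bytes_per_second(width: int, height: int, bytes_per_pixel: int, fps: int) -> int:
    """Data rate of uncompressed video, picture only."""
    return width * height * bytes_per_pixel * fps


def gigabytes_per_hour(bytes_per_second: int) -> float:
    """Convert a data rate to decimal gigabytes (10**9 bytes) per hour."""
    return bytes_per_second * 3600 / 1e9


# 24-bit, 48 kHz stereo WAV: about 1 GB per hour.
audio_rate = audio_bytes_per_second(48_000, 24, 2)
print(f"audio: {gigabytes_per_hour(audio_rate):.2f} GB/hour")

# Assumed SD PAL, 8-bit 4:2:2 uncompressed: about 75 GB per hour.
video_rate = video_bytes_per_second(720, 576, 2, 25)
print(f"video: {gigabytes_per_hour(video_rate):.2f} GB/hour")
```

Even at standard definition, an hour of uncompressed video is tens of times larger than an hour of preservation-quality audio, and higher resolutions and bit depths increase this further.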

We will discuss more about born-digital audiovisual content and considerations as the DPOC project progresses.

The digital preservation gap(s)

Somaya’s engaging, reflective piece identifies gaps in the wider digital preservation field and provides insightful thoughts as to how the gaps can be narrowed or indeed closed.

I initially commenced this post as a response to the iPres 2016 conference and an undercurrent that caught my attention there – however, really it is a broader comment on the field of digital preservation itself. This post ties into some of my thoughts that have been brewing for several years about various gaps I’ve discovered in the digital preservation field. As part of the Polonsky Digital Preservation Project, I hope we will be able to do some of the groundwork to begin to address a number of these gaps.

So what are these gaps?

To me, there are many. And that’s not to say that there aren’t good people working very hard to address them – there are. (I should note that these people often do this work as part of their day jobs as well as evenings and weekends.)

Specifically, the gaps (at least the important ones I see) are:

  • Silo-ing of different areas of practice and knowledge (developers, archivists etc.)
  • Lack of understanding of working with born-digital materials at the coalface (including managing donor relationships)
  • Traditionally-trained archivists, curators and librarians wanting a ‘magic wand’ to deal with ‘all things digital’
  • Tools to undertake certain processes that do not currently exist (or do not exist for the technological platform or limitation archivists, curators, and librarians are having to work with)
  • Lack of existing knowledge of command line and/or coding skills in order to run the few available tools (skills that often traditionally-trained archivists, curators, and librarians don’t have under their belt)
  • Lack of knowledge of how to approach problem-solving

I’ve sat at the nexus between culture and technology for over two decades and these issues don’t just exist in the field of digital preservation. I’ve worked in festival and event production, radio broadcast and as an audiovisual tech assistant. I find similar issues in these fields too. (For example, the sound tech doesn’t understand the type of music the musician is creating and doesn’t mix it the right way, or the artist asks the technician to do something that isn’t technically possible.) In the digital curation and digital preservation contexts, effectively I’ve been a translator between creators (academics, artists, authors, producers etc.), those working at the coalface of collecting institutions (archivists, curators and librarians) and technologists.

To me, one of the gaps was brought to the fore and exacerbated during the workshop: OSS4Pres 2.0: Building Bridges and Filling Gaps which built on the iPres 2015 workshop “Using Open-Source Tools to Fulfill Digital Preservation Requirements”. Last year I’d contributed my ideas prior to the workshop, however I couldn’t be there in person. This year I very much wanted to be part of the conversation.

What struck me was the discussion still began with the notion that digital preservation commences at the point where files are in a stable state, such as in a digital preservation system (or digital asset management system). Appraisal and undertaking data transfers wasn’t considered at all, yet it is essential to capture metadata (including technical metadata) at this very early point. (Metadata captured at this early point may turn into preservation metadata in the long run.)

I presented a common real-world use case/user story in acquiring born-digital collections: A donor has more than one Mac computer, each running a different operating system. The archivist needs to acquire a small selection of the donor’s files. The archivist cannot install any software onto the donor’s computers or ask the donor to install any, and only the selected files must be collected – hence, none of the computers can be disk imaged.

The Mac-based tools that exist to do this type of acquisition rely on Java software. Contemporary Mac operating systems don’t come with Java installed by default. Many donors are not competent computer users. They haven’t installed this software as they have no knowledge of it or need for it, or simply wouldn’t know how to. I put this call out to the Digital Curation Google Groups list several months ago, before I joined the Polonsky Digital Preservation Project. (It followed on from work that my former colleagues at the National Library of Australia and I had undertaken to collect born-digital manuscript archives, having first run into this issue in 2012.) The response to my real-world use case at iPres was:

This final option is definitely not possible in many circumstances, including when collecting political archives from networked environments inside government buildings (another real-world use case I’ve had first-hand experience of). The view was that anything else isn’t possible or is much harder (yes, I’m aware). Nevertheless, this is the reality of acquiring born-digital content, particularly unpublished materials. It demands both ‘hard’ and ‘soft’ skills in equal parts.

The discussion at iPres 2016 brought me back to the times I’ve previously thought about how I could facilitate a way for former colleagues to spend “a day in someone else’s shoes”. It’s something I posed several times when working as a Producer at the Australian Broadcasting Corporation.

Archivists have an incredible sense of how to manage the relationship with a donor who is handing over their life’s work, ensuring the donor entrusts the organisation with the ongoing care of their materials. However, traditionally trained archivists, curators and librarians typically don’t have in-depth technical skillsets. Technologists often haven’t witnessed the process of liaising with donors first-hand. Perhaps those working in developer and technical roles, which typically sit further down the workflow for processing born-digital materials, need opportunities to observe the process of acquiring born-digital collections from donors. Might this give them an increased appreciation for the scenarios that archivists find themselves in (and must problem-solve their way out of)? Conversely, perhaps archivists, curators and librarians need to witness the process of developers creating software (especially the effort needed to create a small GUI-based tool for collecting born-digital materials from various Mac operating systems) or debugging code. Is this just a case of swapping seats for a day or a week? Sharing approaches to problem-solving definitely seems key.

Part of what we’re doing in the Polonsky Digital Preservation Project is starting to talk more holistically: rather than ‘digital preservation’, we’re using the term ‘digital stewardship’, so that the early steps of acquiring born-digital materials aren’t overlooked. As the Policy and Planning Fellow at Cambridge University Library, I’m aware I can effect change in a different way. Developing policy – including technical policies (for example, the National Library of New Zealand’s Preconditioning Policy, referenced here) – means I can draw on my first-hand experience of acquiring born-digital collections with a greater understanding of what it takes to do this type of work. For now, this is the approach I need to take and I’m looking forward to the changes I’ll be able to influence.

Comments on Somaya’s piece would be most welcome. There are plenty of grounds for discussion, and constructive feedback will only enhance the wider, collaborative approach to addressing the issue of preserving digital content.

The other place


The Cambridge team visited Oxford last week. Whilst there won’t be anything of a technical nature in this post, it is worth acknowledging that building and developing a core team for sustainable digital preservation is just as important a function as tools and technical infrastructure.

One of the first reactions we get when we talk about DPOC and its collaborative nature between Oxford and Cambridge is “how on earth do you get anything done?” given the perceived rivalry between the two institutions. “Surprisingly well, thank you” tends to be our answer. Sure, due to the collaborative (and human) nature of any project, there will be times when work doesn’t run parallel and we don’t immediately agree on an approach, but we’ve not let historical rivalry get in the way of working together.

To keep collaboration going, we usually meet on a Wednesday huddled around our respective laptops to participate in a ‘Team Skype’. As a change from this, the Cambridge people (Dave, Lee, Somaya, Suzanne, and Tuan) travelled over to see the Oxford people (Edith, Michael, and Sarah) for two days of valuable face to face meetings and informative talks. The Fellows travelled together; knowing we’d be driving through the rush hour on an east to west traverse, we left a bit earlier. What we hadn’t accounted for was a misbehaving satnav (see below), but it’s the little things like this that make teams bond too. We arrived half an hour before the start for an informal catch-up with Sarah, Edith, and Michael. Such time and interaction is very important to keep the team gelled together.

Satnav misbehaving, complicating what is usually a simple left turn at a roundabout. Image credit: Somaya Langley.

A team meeting in the Osney One Boardroom formally started the day at 11am. It continued as a working lunch as we had plenty to discuss! We then had a fascinating insight into the developers’ aspects of how materials are ingested into the ORA repository from Michael Davis, followed by an overview from Sarah Barkla on how theses are deposited and their surrounding rights issues. Breaking for a cup of tea and team photo, the team then had split sessions; Sarah and Lee reviewed skills survey work whilst Dave, Edith, and Somaya discussed rationale and approaches to collections auditing.

Thursday saw the continuation of working in smaller teams. Sarah, Lee, and Michael had meetings to discuss PASIG 2017 organisation details. Dave, Edith, and Somaya (later joined by Michael) discussed their joint work before having a talk from Amanda Flynn and David Tomkins on open access and research data management.

Lunchtime heralded the time to board Oxford University’s minibus service to the 14th century Vault Café, St Mary’s Church for a tasty lunch (communal eating and drinking is very important for team building). We then went to the Weston Library to discuss Dave’s digital preservation pattern language over cake in the Weston’s spectacular Blackwell Hall and then on up to the BEAM (Bodleian Electronic Archives and Manuscripts) Lab (a fantastically tidy room with a FRED and many computers) to see and hear Susan Thomas, Matthew Neely, and Rachel Gardner talk about, show, and discuss BEAM’s processes. From a recordkeeping point of view, it was both comforting and refreshing to see that despite working in the digital realm, the archival principles around selection, appraisal, and access rights issues remain constant.

View from the BEAM Lab

The end of the rainbow on the left as viewed from the BEAM lab in the Weston Library. Image credit: Somaya Langley.

The mix of full team sessions, breaking into specialisms, joining up again for talks, and informal talks over tea breaks and lunches was a successful blend of continually building team relationships and contributing to progress. Both teams came away from the two days with reinforced ideas, new ideas, and enhanced clarity of continuing work and aims to keep all aspects of the digital preservation programme on track.

We don’t (and can’t) work in a bubble when it comes to digital preservation and the more that we can share the various components that make up ‘digital preservation’ and collaborate, the better contribution the team can make towards interested communities.


The DPOC team assembled together on day one in Oxford.

A digital preservation pattern language

Technical Fellow, Dave, shares his final update from PASIG NYC in October. It includes his opinions on digital preservation terminology and his development of an interpretation model for mapping processes.

Another of the sessions at the PASIG NYC conference we attended concerned standardisation. It started with Avoiding the 927 Problem: Standards, Digital Preservation, and Communities of Practice by Artefactual Systems’ Dan Gillean, which explained the relationships between De Jure / De Facto, and Open / Proprietary standards, and which introduced the major Digital Preservation standards. Then later in the session, Sibyl Schaefer (@archivelle) from the UCSD Chronopolis Network presented Here we go again down this road: Certification and Recertification, which covered the ISO standardisation terminology (e.g. Certification vs Accreditation) and went deeper into the formal (De Jure) standards, in particular the Open Archival Information System (OAIS) reference model (ISO 14721) and the Audit and Certification of Trustworthy Digital Repositories (ISO 16363).

One aspect of Dan Gillean’s presentation that resonated with me was his discussion of the Communities of Practice that had emerged around the Digital Preservation standards. This reminded me of a software development concept called design patterns, which has its roots in (real) architecture, and in particular a book called A Pattern Language: towns, buildings, construction, by Christopher Alexander (et al). This proposes that planners and architects develop a ‘language’ of architecture so that they can learn from each other and contribute their ideas to a more harmonious, better-planned whole of well-designed cities, towns and countryside. The key concept they propose is that of the ‘pattern’:

The elements of this [architectural] language are entities called patterns. Each pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice (Alexander et al, 1977:x).

Each pattern has a common structure, including details of the problem it solves, the forces at work, the start and end states of related resources, and relationships to other patterns. (James Coplien has provided a short overview of a typical pattern structure.) The idea is to build up a playbook of (de facto) standard approaches to common problems, and the types of behaviour that might solve them, as a way of sharing and reusing knowledge.
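As a rough illustration of that common structure, a pattern could be captured as a simple record. This is only a sketch in Python: the `Pattern` class, the `related_patterns` helper and the two example entries are hypothetical, invented here to show the shape of a playbook, and are not taken from Alexander, Coplien or any existing digital preservation pattern catalogue.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Pattern:
    """One entry in a hypothetical digital preservation playbook."""
    name: str
    problem: str            # the recurring problem the pattern solves
    forces: List[str]       # the competing pressures at work
    start_state: str        # state of the related resources beforehand
    end_state: str          # state of the related resources afterwards
    related: List[str] = field(default_factory=list)  # names of other patterns


def related_patterns(playbook: Dict[str, Pattern], name: str) -> List[Pattern]:
    """Return the patterns the named pattern links to, skipping unknown names."""
    return [playbook[r] for r in playbook[name].related if r in playbook]


# Illustrative entries only.
playbook = {
    "Record a checksum on arrival": Pattern(
        name="Record a checksum on arrival",
        problem="There is no baseline to compare a file against later.",
        forces=["Transfers are rushed", "Donor machines cannot be modified"],
        start_state="File received, nothing recorded about it",
        end_state="File stored alongside a recorded checksum",
    ),
    "Check files are unchanged": Pattern(
        name="Check files are unchanged",
        problem="Stored files may be silently altered or corrupted.",
        forces=["Large volumes of data", "Limited staff time"],
        start_state="File in storage with a recorded checksum",
        end_state="File confirmed unchanged, or flagged for repair",
        related=["Record a checksum on arrival"],
    ),
}

for linked in related_patterns(playbook, "Check files are unchanged"):
    print(linked.name)
```

Keeping the links between patterns as plain names makes it easy to follow relationships from one level of detail to another, which is the property of Alexander’s language discussed below.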

I asked around at PASIG to see if anyone had created a reusable set of Digital Preservation Patterns (somebody please tell me if so, it’ll save me heaps of work!), but I drew a blank. So I grabbed the Alexander book (I work in a building containing 18 million books!), and also had a quick look online. The best online resource I found was – which contained lots of familiar names related to programming design patterns (e.g. Erich Gamma, Grady Booch, Martin Fowler, Ward Cunningham). But the original Alexander book also gave me an insight into patterns that I’d never heard of before, in particular the very straightforward way that its patterns related to each other from the general / high level (e.g. patterns about regional, city and town planning), via mid-level patterns (for neighbourhoods, streets and building design), to the extremely detailed (e.g. patterns for where to put beds, baths and kitchen equipment).

This helped me consider what I think are two issues with Digital Preservation: firstly, there’s a lot of jargon (e.g. ‘fixity’, ‘technical metadata’ or ‘file format migration’ – none of which are terms fit for normal conversation). Secondly, many of the Digital Preservation models mismatch concepts at different levels of abstraction and complexity: for example the OAIS places a discrete process labelled Data Management alongside another labelled Ingest, where Ingest is quite a specific, discrete step in the overall picture, but where there’s also a strong case for saying that the whole of Digital Preservation is ‘data management’, including Ingest itself.

Such issues of defining and labelling concepts are common in most computer-technology-related domains, of course, and they’re often harmful (contributing to the common story of failed IT projects and angry developers / customers etc). But the way in which A Pattern Language arranges its patterns at the same levels of abstraction and detail, and in doing so enables drilling-down through region / city / town / neighbourhood / street / building / room, provides an elegant example of how to avoid this trap.

Hence I’ve been working on a model of the Digital Preservation domain that has ‘elevator pitch’ and ‘plain English’ levels of detail before I get to the nitty-gritty of technical details. My intention is to group similarly-sized and equally-complex sets of Digital Preservation processes together in ways that help describe them in clear, jargon-free ways, hence forming a reusable set of patterns that help people work out how to implement Digital Preservation in their own organisational contexts. I will have an opportunity to share this model, and the patterns I derive from it, as it develops. Watch this space.

Alexander, C., Ishikawa, S., Silverstein, M., Jacobson, M., Fiksdahl-King, I. and Angel, S. (1977) A Pattern Language: towns, buildings, construction. 1st edn. New York: Oxford University Press.

Do you know of any work that’s been done to create a Digital Preservation Pattern Language? Would you like to contribute your ideas towards Dave’s idea of creating a playbook of Digital Preservation design patterns? Please let Dave know using the form below…

How do we solve the developer gap? The ever-present question for libraries and archives.

PASIG 2016 was held at MoMA in NYC on 26-28 October 2016. As with many digital preservation conferences, Twitter was ablaze with ideas and discussions, both from those in attendance and from those following Twitter feeds at their desks.

During Karen Cariani’s (Director WGBH Media Library and Archives) presentation, ‘The Complexity of Preserving Digital Media Files,’ there was a tweet highlighting a point from Cariani regarding the lack of developers in the library and archives sector:

This was something I have been pondering on for some time; it led me to retweet with this question:

And then I went back to the conference only to realise I had unleashed quite a strong debate among developers, IT staff, librarians and archivists working in a diverse range of institutions. Turns out, this is a question many people are asking as well. And the answer is probably not straightforward, or at least not answerable in 140 characters.

However, I was inundated with plenty of good ideas. Here are a few of the highlights:

And this is only a selection of the conversations from the Twitterverse. It shows that there are many ideas for potential solutions to the ‘developer gap’ in libraries and archives. However, there’s no one-size-fits-all solution for every institution. These ideas sound great, but do they work in practice?

Pay developers market rates. Or at least on par with IT staff.
This would ideally make our developer roles competitive, but as budgets continue to shrink in our sector, some institutions may find it hard to get senior management’s support for paying market rates. Paying on par with other IT staff seems a given, and I would be interested to see where this is not put into practice and why not.

Find burnt out IT staff and lure them over.
If we can sell a work-life balance and other benefits in our organisations, perhaps that would make up for different rates. Remuneration is not all about the salary, but about the overall benefits. And if your institution can offer them, should this be better highlighted up front in job advertisements? After all, it’s not always all about the money…we also have interesting puzzles to solve!

Improving higher education curriculums for library and archives programmes.
Should understanding the digital environment (such as Web 2.0 and the Internet) still be taught in 2016 or can we all agree that students should have these prerequisite skills? Can we include basic computing science and basic programming skills? These skills are reaching into broader fields than just computer science, so why have library and archives courses not bothered to catch up? Even if it doesn’t give a librarian/archivist all the skills to be a developer, it will help to bridge the communication gap between IT and librarians/archivists.

Preserve less?
Likely a very contentious solution, for a number of reasons. Having strong and clear collections development policies will outline the scope of collection, but sometimes institutions cannot simply say ‘no’. However, whenever we acquire a collection, considerations must be made. It’s not just about the cost of storage that matters in digital preservation, but the cost of care and management over time.

There is likely no one easy solution. It is likely a combination of many things and shifts that will take place over years—probably at a glacial pace. These questions and potential solutions should be considered, because our development needs aren’t going anywhere. I think it’s safe to say that digital is here to stay…

Have an idea how to fill the developer gap? Share below:

On the core concepts of digital preservation

Cambridge’s Technical Fellow, Dave Gerrard, shares his learning on digital preservation from PASIG 2016. As a newcomer to digital preservation, he is sharing his insights as he learns them.

As a relative newbie to Digital Preservation, attending PASIG 2016 was an important step towards getting a picture of the state of the art in digital preservation. One of the most important things for a technician to do when entering a new domain is to get a high-level view of the overall landscape, and build up an understanding of some of the overarching concepts, and last week’s PASIG conference provided a great opportunity to do this.

So this post is about some of those central overarching data preservation concepts, and how they might, or might not, map onto ‘real-world’ archives and archiving. I should also warn you that I’m going to be posing as many questions as answers here: it’s early days for our Polonsky project, after all, so we’re all still definitely in the ‘asking’ phase. (Feel free to answer, of course!) I’ll also be contrasting two particular presentations that were delivered at PASIG, which at first glance have little in common, but which I thought actually made the same point from completely different perspectives.

Perhaps the most obvious, key concept in digital preservation is ‘the archive’: a place where one deposits (or donates) things of value to be stored and preserved for the long term. This concept inevitably influences a lot of the theory and activity related to preserving digital resources, but is there really a direct mapping between how one would preserve ‘real’ objects, in a ‘bricks and mortar’ archive, and the digital domain? The answer appears to be ‘yes and no’: in certain areas (perhaps related to concepts such as acquiring resources and storing them, for example) it seems productive to think in broadly ‘real-world’ terms. Other ‘real-world’ concepts may be problematic when applied directly to digital preservation, however.

For example, my fellow Fellows will tell you that I take particular issue with the word ‘managing’: a term which in digital preservation seems to be used (at least by some people) to describe a particular small set of technical activities related to checking that digital files are still usable in the long-term. (‘Managing’ was used in this context in at least one PASIG presentation). One of the keys to working effectively with Information Systems is to get one’s terminology right, and in particular, to group together and talk about parts of a system that are on the same conceptual level. I.e. don’t muddle your levels of detail, particularly when modelling things. ‘Managing’ to me is a generic, high-level concept, which could mean anything from ‘making sure files are still usable’ to ‘ensuring public-facing staff answer the phone within five rings’ or even ‘making sure the staff kitchen is kept clean’. So I’m afraid that I think it’s an entirely inappropriate word to describe a very specific set of technical activities.

The trouble is, most of the other words we’ve considered for describing the process of ‘keeping files usable’ are similarly ‘higher-level’ concepts. One obvious one (‘preservation’) once again applies to much more of the overall process, and so do many of its synonyms (‘stewardship’, ‘keeping custody of’, etc.). So these are all good terms at that high level of abstraction, but they’re for describing the big picture, not the details. Another term that is more specific, ‘fixity checking’, is maybe a bit too much like jargon… (We’re still working on this: answers below please!) But the key point is: until one understands a concept well enough to be able to describe it in relatively simple terms that make sense and fit together logically, building an information system and marshalling the related technology is always going to be tough.
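
For anyone who hasn’t met ‘fixity checking’ before, the idea itself is simple enough to describe: record a checksum for a file when it is first stored, recompute it later, and compare the two. The sketch below is only an illustration of that idea, not a description of how any particular preservation system does it. The function names (sha256_of, fixity_ok) and the choice of SHA-256 are my own assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file as a hex string.

    The file is read in chunks so that large files do not need to
    fit in memory. (Hypothetical helper, named for illustration.)
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, recorded_checksum: str) -> bool:
    """Compare the file's current checksum with the one recorded earlier.

    True means the bytes are unchanged since the checksum was recorded;
    False means the file has been altered or corrupted in some way.
    """
    return sha256_of(path) == recorded_checksum.lower()
```

In use, the checksum would be recorded once when a file is deposited and then rechecked on a schedule. Note that this only shows the bytes are unchanged; it says nothing about whether the file can still be opened and understood, which is part of why finding the right word for the whole activity is hard.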

Perhaps the PASIG topic that most clearly highlighted the difference between ‘real-world’ archiving and digital preservation, however, was the discussion of the increased rate at which preserved digital resources can be ‘touched’ by outside forces. Obviously, nobody stores things in a ‘real-world’ archive in the expectation that they will never be looked at again (do they?), but in the digital realm, there are potentially many more opportunities for resources to be linked directly to the knowledge and information that builds upon them.

This is where the two contrasting presentations came in. The first was Scholarly workflow integration: The key to increasing reproducibility and preservation efficacy, by Jeffrey Spies (@JeffSpies) from the Center for Open Science. Jeffrey clarified exactly how digital preservation in a research data management context can highlight, explicitly, how a given piece of research builds upon what went before, by enabling direct linking to the publications, and (increasingly) to the raw data of peers working in the same field. Once digital research outputs and data are preserved, they are available to be linked to, reliably, in a manner that brings into play entirely new opportunities for archived research that never existed in the ‘real world’ of paper archives. Thus enabling the ‘discovery’ of preserved digital resources is not just about ensuring that resources are well-indexed and searchable, it’s about adding new layers of meaning and interpretation as future scholars use them in their own work. This in turn indicates how digital preservation is a function that is entirely integral to the (cyclical) research process – a situation which is well-illustrated in the 20th slide from Jeffrey’s presentation (if you download it – Figshare doesn’t seem to handle the animation in the slide too well – which sounds like a preservation issue in itself…).

By contrast, Symmetrical Archiving with Webrecorder, a talk by Dragan Espenschied (@despens), was at first glance completely unrelated to the topic of how preserved digital resources might have a greater chance of changing as time passes than their ‘real-world’ counterparts. Dragan was demonstrating the Webrecorder tool for capturing online works of art by recording visits to those works through a browser, and it was during the discussion afterwards that the question was asked: “how do you know that everything has been recorded ‘properly’ and nothing has been missed?”

For me, this question (and Dragan’s answer) struck at the very heart of the same issue. The answer was that each recording is a different object in itself, as the interpretation of the person recording the artwork is an integral part of the object. In fact, Dragan’s exact answer contained the phrase: “when an archivist adds an object to an archive, they create a new object”; the actual act of archiving changes an object’s meaning and significance (potentially subtly, though not always) to an extent that it is not the same object once it has been preserved. Furthermore, the object’s history and significance change once more with every visit to see it, and every time it is used as inspiration for a future piece of work.

Again – I’m a newbie, but I’m told by my fellow Fellows this situation is well understood in archiving and hence may be more of a revelation to me than most readers of this post. But what has changed is the way the digital realm gives us the opportunity not just to record how objects change as they’re used and referred to, but also a chance to make the connections to new knowledge gained from use of digital objects completely explicit and part of the object itself.

This highlights the final point I want to make about two of the overarching concepts of ‘real-world’ archiving and preservation which PASIG indicated might not map cleanly onto digital preservation. The first is the concept of ‘depositing’. According to Jeffrey Spies’s model, the ‘real-world’ research workflow of ‘plan the research, collect and analyse the data, publish findings, gain recognition / significance in the research domain, and then finally deposit evidence of this ground-breaking research in an archive’ simply no longer applies. In the new model, the initial ‘deposit’ is made at the point a key piece of data is first captured, or a key piece of analysis is created. Works in progress, early drafts, important communications, grey literature, as well as the final published output, are all candidates for preservation at the point they are first created by the researchers. Digital preservation happens seamlessly in the background. The states of the ‘preserved’ objects change throughout.

The second is the concept of ‘managing’ (urgh!), or otherwise ‘maintaining the status quo’ of an object into the long-term future. In the digital realm, there doesn’t need to be a ‘status quo’ – in fact there just isn’t one. We can record when people search for objects, when they find them, when they cite them. We can record when preserved data is validated by attempts to reproduce experiments or re-used entirely in different contexts. We can note when people have been inspired to create new artworks based upon our previous efforts, or have interpreted the work we have preserved from entirely new perspectives. This is genuine preservation: preservation that will help fit the knowledge we preserve today into the future picture. This opportunity would be much harder to realise when storing things in a ‘real-world’ archive, and we need to be careful to avoid thinking too much ‘in real terms’ if we are to make the most of it.

What do you think? Is it fruitful to try and map digital preservation onto real world concepts? Or does doing so put us at risk of missing core opportunities? Would moving too far away from ‘real-world’ archiving put us at risk of losing many important skills and ideas? Or does thinking about ‘the digital data archive’ in terms that are too like ‘the real world’ limit us from making important connections to our data in future?

Where does the best balance between ‘real-world’ concepts and digital preservation lie?