Visit to the National Archives: herons and brutalism

An update from Edith Halvarsson about the DPOC team’s trip to the National Archives last week. Prepare yourself for a discussion about digital preservation, PRONOM, dark archives, and wildlife!


Last Thursday DPOC visited the National Archives in London. David Clipsham kindly put much time into organising a day of presentations with the TNA’s developers, digitization experts and digital archivists. Thank you Diana, David & David, Ron, Ian & Ian, Anna and Alex for all your time and interesting thoughts!

After some confusion, we finally arrived at the picturesque Kew Gardens station. The area around Kew is very sleepy, and our first thought on arrival was “is this really the right place?” However, after a bit more circling around Kew, you realise you cannot miss it. The TNA is located in an imposing brutalist building, surrounded by beautiful nature and ponds built as flood protection for the nation’s collections. They even have a tame heron!

After we all made it on site, the day kicked off with an introduction from Diana Newton (Head of Digital Preservation). Diana told us enthusiastically about the history of the TNA and its Digital Records Infrastructure. It was really interesting to hear how much has changed in just six years since DRI was launched – both in terms of file format proliferation and an increase in FOI requests.

We then had a look at TNA’s ingest workflows into Preservica and storage model with Ian Hoyle (Senior Developer) and David Underdown (Senior Digital Archivist). It was particularly interesting to hear about the TNA’s decision to store all master file content on offline tape, in order to bring down the archive’s carbon footprint.

After lunch with Ron Davies (Senior Project Manager), Anna de Sousa and Ian Henderson spoke to us about their work digitizing audiovisual material and 2D images. Much of our discussion focused on standards and formats (particularly around A/V). Alex Green and David Clipsham then finished off the day talking about born-digital archive accession streams and PRONOM/DROID developments. This was the first time we had seen the clever way a file format identifier is created – there is much detective work required on David’s side. David also encouraged us, and anyone else who relies on DROID, to have a go and submit something to PRONOM – he even promised it’s fun! Why not read Jenny Mitcham’s and Andrea Byrne’s articles for some inspiration?

Thanks for a fantastic visit and some brilliant discussions on how digital preservation and digital collecting are done at the TNA!

Training begins: personal digital archiving

Outreach & Training Fellow, Sarah, has officially begun training and capacity building with a session on personal digital archiving at the Bodleian Libraries. Below, Sarah describes how the first session went and shares some personal digital archiving tips.


Early Tuesday morning and the Weston Library had just opened to readers. I got to town earlier than usual, stopping to get a Melbourne-style flat white at one of my favourite local cafes – to get me in the mood for public speaking. By 9am I was in the empty lecture theatre, fussing over cords, adjusting lighting and panicking over the fact that I struggled to log in to the laptop.

At 10am, twenty-one interested faces were seated with pens at the ready; there was nothing else to do but take a deep breath and begin.

In the 1.5 hour session, I covered the DPOC project, digital preservation and personal digital archiving. The main section of the training was learning about the personal digital archiving and preservation lifecycle, and the best practice steps to follow to save your digital stuff!

The steps of the Personal Digital Archiving & Preservation Lifecycle are intended to help with keeping your digital files organised, findable and accessible over time. It’s not prescriptive advice, but it is a good starting point for better habits in your personal and work lives. Below are tips for every stage of the lifecycle that will help build better habits and preserve your valuable digital files.

Keep Track and Manage:

  • Know where your digital files are and what digital files you have: make a list of all of the places you keep your digital files
  • Find out what is on your storage media – check the label, read the file and folder names, open the file to see the content
  • Most importantly: delete or dispose of things you no longer need.
    • This includes: things with no value, duplicates, blurry images, previous document versions (if not important) and so on.

Organise:

  • Use best practice for file naming:
    • No spaces: use underscores (_) and hyphens (-) instead
    • Put ‘Created Date’ in the file name using yyyymmdd format
    • Don’t use special characters <>,./:;'"\|[]()!@£$%^&*€#`~
    • Keep the name concise and descriptive
    • Use a version control system for drafts (e.g. yyyymmdd_documentname_v1.txt)
  • Use best practice for folder naming:
    • Concise and descriptive names
    • Use dates where possible (yyyy or yyyymmdd)
    • Keep file paths short and avoid a deep hierarchy
    • Choose structures that are logical to you and to others
  • To rename large groups of image files, consider using batch rename software
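For those comfortable with a little scripting, the naming tips above can also be automated. Below is a minimal, illustrative Python sketch (the folder and filenames are hypothetical, and batch rename utilities will do the same job without any coding) that replaces spaces with underscores, strips special characters and adds a yyyymmdd prefix:

```python
import os
import re
from datetime import date

def preservation_name(original_name: str, created: date) -> str:
    """Build a yyyymmdd_name filename following the naming tips above."""
    stem, ext = os.path.splitext(original_name)
    stem = stem.replace(" ", "_")               # no spaces
    stem = re.sub(r"[^A-Za-z0-9_-]", "", stem)  # drop special characters
    return f"{created:%Y%m%d}_{stem}{ext}"

def batch_rename(folder: str, created: date) -> None:
    """One-off pass renaming every file in `folder` to the yyyymmdd_name pattern."""
    for name in os.listdir(folder):
        new_name = preservation_name(name, created)
        if new_name != name:
            os.rename(os.path.join(folder, name), os.path.join(folder, new_name))
```

For example, a file called “holiday photo (1).jpg” created on 15 March 2017 would become “20170315_holiday_photo_1.jpg”.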

Describe:

  • Add important metadata directly into the body of a text document:
    • Creation date & version dates
    • Author(s)
    • Title
    • Access rights & version
    • A description of the purpose or context of the document
  • Create a README.txt file of metadata for document collections
    • Be sure to list the folder names and file names to preserve the link between the metadata and the text file
    • Include information about the context of the collection, dates, subjects and other relevant information
    • This is a quick method for creating metadata around digital image collections
  • Embed the metadata directly in the file
    • For image and video: be sure to add subjects, location and a description of the trip or event
  • Add tags to documents and images to aid discoverability
  • Consider saving the ‘Creation Date’ in the file name, in a free text field in the metadata, in the document header or in a README text file if it is important to you. In some cases transferring the file (copying to new media, uploading to cloud storage) will change the creation date and the original date will be lost. The same goes for saving as a different file type. Always test before transfer or ‘Save As’ actions, or record the ‘Creation Date’ elsewhere first.
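If you are happy to script it, creating a README.txt for a collection can be semi-automated. The following is only an illustrative Python sketch (the field names and layout are my own, not a standard): it records some context and lists every file path, preserving the link between the metadata and the files it describes:

```python
import os
from datetime import date

def write_readme(folder: str, description: str, author: str) -> str:
    """Write a simple README.txt describing a collection and listing its files."""
    lines = [
        f"Collection: {os.path.basename(os.path.abspath(folder))}",
        f"Author: {author}",
        f"Description: {description}",
        f"README created: {date.today():%Y-%m-%d}",
        "",
        "Contents:",
    ]
    for root, _dirs, files in os.walk(folder):
        for name in sorted(files):
            if name != "README.txt":
                # record relative paths so the metadata stays linked to each file
                lines.append("  " + os.path.relpath(os.path.join(root, name), folder))
    path = os.path.join(folder, "README.txt")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
    return path
```

The same structure works just as well typed by hand; the point is simply that the context and the file list live together in one plain text file.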

Store:

  • Keep two extra backups in two geographically different locations
  • Diversify your backup storage media to protect against potential hardware faults
  • Try to save files in formats better suited to long-term access (for advice on how to choose file formats, visit Stanford University Libraries)
  • Refresh your storage media every three to five years to protect against loss from hardware failure
  • Do annual spot checks, including checking all backups. This will help catch any loss, corruption or damaged backups. Also consider checking all of the different file types in your collection to ensure they are still accessible, especially if they are not saved in a recommended long-term file format.
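One practical way to run those annual spot checks is with checksums: record a checksum for every file today, then recompute and compare next year. Dedicated fixity-checking tools exist, but as a hedged illustration, a minimal version in Python might look like this:

```python
import hashlib
import os

def sha256_of(path: str) -> str:
    """Compute a SHA-256 checksum, reading the file in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def make_manifest(folder: str) -> dict:
    """Record a checksum for every file, to compare against at the next spot check."""
    manifest = {}
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            manifest[os.path.relpath(path, folder)] = sha256_of(path)
    return manifest

def spot_check(folder: str, manifest: dict) -> list:
    """Return the files whose checksums no longer match (missing or corrupted)."""
    problems = []
    for rel, expected in manifest.items():
        path = os.path.join(folder, rel)
        if not os.path.exists(path) or sha256_of(path) != expected:
            problems.append(rel)
    return problems
```

Run `make_manifest` once, keep the result with your backups, and an empty list from `spot_check` a year later means nothing has silently changed.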

Even I can admit I need better personal archiving habits. How many photographs are still on my SD cards, waiting for transfer, selection/deletion and renaming before saving in a few choice safe backup locations? The answer is: too many. 

Perhaps now that my first training session is over, I should start planning my personal side projects. I suspect clearing my backlog of SD cards is one of them.

Useful resources on personal digital archiving:

DPC Technology Watch Report, “Personal digital archiving” by Gabriela Redwine

DPC Case Note, “Personal digital preservation: Photographs and video”, by Richard Wright

Library of Congress “Personal Archiving” website, which includes guidance on preserving specific digital formats, videos and more


A view from the basement – a visit to the DPC in Glasgow

Last Monday, Sarah, Edith and Lee visited the Digital Preservation Coalition (DPC) at their DPC Glasgow office on University Gardens. The aim of the visit was to understand how the DPC has lent, and will continue to lend, support to the DPOC project. The DPOC team is very fortunate in having the DPC’s expertise, resources and services at their disposal as a supporting partner in the project and we were keen to find out more.

Plied with tea, coffee and Sharon McMeekin’s awesome lemon cake, William Kilbride gave us an overview of the DPC, explaining that they are a not-for-profit, membership-based organisation which used to cater mainly for the UK and Ireland. However, international agencies are now welcome (the UN, NATO and the ICC, to name a few) and this has changed the nature of their programme and the features that they offer (website, streaming, event recording). They are vendor-neutral but do have a ‘Commercial Supporter’ community to help support events and raise funds for digital preservation work. They have six members of staff working from the DPC Glasgow and DPC York offices. They focus on four main areas:

  • Workforce Development, Training and Skills
  • Communication and Advocacy
  • Research and Practice
  • Partnerships and Sustainability

William explained the last three areas and Sharon gave us an overview of the work that she does for developing workforce skills and offering training events, especially the ‘Getting Started in Digital Preservation’ and ‘Making Progress’ workshops. The DPC also provide Leadership Scholarships to help develop knowledge and CPD in digital preservation, so please do apply for those if you are working somewhere that can spare your time out of the office but can’t fund you.

In terms of helping DPOC, the DPC can help with hosting events (such as PASIG 2017) and provide supporting training resources for our organisations. They can also help with procurement processes and auditing, as well as calling on the wealth of advice gained from their six members of staff.

We left feeling that, despite working as a collaborative team with colleagues we can already bounce ideas off, we had a wider support network that we could call on to guide us and help us share our work more widely. From a skills and training perspective, the fact that they are happy to review, comment and suggest further avenues for the skills needs analysis toolkit, to ensure it will benefit the wider community, is of tremendous use. This is just one example: they are also willing to help the project with procurement, policy development and auditing.

It is reassuring that the DPC are there and have plenty of experience to share in the digital preservation sphere. Tapping into networks, sharing knowledge and collaborating really is the best way to help achieve a coherent, sustainable approach to digital preservation and helps those working in it to focus on specific tasks rather than try and ‘reinvent the wheel’ when somebody else has already spent time on it.

DPC Student Conference – What I Wish I Knew Before I Started

At the end of January, I went to the Chancellor’s Hall at the University of London’s Art Deco style Senate House. Near to the entrance of the Chancellor’s Hall was Room 101. Rumours circulated amongst the delegates keenly awaiting the start of the conference that the building and the room were the inspiration for George Orwell’s Nineteen Eighty-Four.

Instead of facing my deepest and darkest digital preservation fears in Senate House, I was keen to see and hear what the leading digital preservation trainers and invited speakers at different stages of their careers had to say. For the DPOC project, I wanted to see what types of information were included in introductory digital preservation training talks, to witness the styles of delivery, and to see what questions the floor would raise, in case there were any obvious gaps. For the day’s programme, presenters’ slides and Twitter Storify, may I recommend that you visit the DPC webpage for this event:

http://www.dpconline.org/events/past-events/wiwik-2017

The take-away lesson from the day is: just do something, don’t be afraid to start. Sharon McMeekin showed us how much the DPC can help (see their new website – it’s chock full of digital preservation goodness) and Steph Taylor from CoSense showed us that you can achieve a lot in digital preservation just by keeping an eye on emerging technologies, and that you will spend most of your time advocating that digital preservation is not just backing up. Steph also reinforced to the student delegation that you can approach members of the digital preservation community – they are all very friendly!

From the afternoon session, Dave Thompson reminded those assembled that we also need to think about the information age that we live in, how people use information, how they are their own gatekeepers to their digital records and how recordkeepers need to react to these changes, which will require a change in thinking from traditional recordkeeping theory and practice. As Adrian Brown put it, “digital archivists are archivists with superpowers”. One of those superpowers is the ability to adapt to your working context and the technological environment. Digital preservation is a constantly changing field, and the practitioner needs to be able to adapt to the environment around them in a chameleon-like manner to get their institution’s work preserved. Jennifer Febles reminded us that it is also OK to say “you don’t know” when training people: you can go away and learn, or even learn from other colleagues. As for the content of the day, there were no real gaps; the day’s programme was spot on as far as I could tell from the delegates.

Whilst reflecting on the event on the journey back on the train (and whilst simultaneously being packed into the stifling hot carriage like a sweaty sardine), the one thing that I really wanted to find out was what the backgrounds of the delegates were. More specifically, what ‘information schools’ they were attending, what courses they were undertaking, how much their modules concerned digital recordkeeping and their preservation, and, most importantly, what they are being taught in those modules.

My thoughts then drifted towards thinking of those who have been given the label of ‘digital preservation experts’. They have cut their digital preservation teeth after their formal qualifications and training in an ostensibly different subject. Through a judicious blending of discipline-specific learning and learning about related fields, they then apply this knowledge to their specific working context. Increasingly, in the digital world, those from a recordkeeping background need to embrace computer science skills and applications, especially those for whom coding and command line operation is not a skill they have been brought up with. We seem to be at a point where the leading digital preservation practitioners are plying their trade (as they should) and not teaching their trade in a formal education setup. A very select few are doing both, but if we pulled practitioners into formal digital preservation education programmes, would we then drain the discipline of innovative practice? Should digital preservation skills (which DigCurV has done well to define) be better suited to one big ‘on the job’ learning programme rather than more formal programmes? A mix of both would be my suggestion, but this discussion will never close.

Starting out in digital preservation may seem terribly daunting, with so much to learn and so much going on. I think that the ‘information schools’ can equip students with the early skills and knowledge, but from then on, the experience and skills are learned on the job. The thing that makes the digital preservation community stand out is that people are not afraid to share their knowledge and skills for the benefit of preserving cultural heritage for the future.

Polonsky Fellows visit Western Bank Library at Sheffield University

Overview of DPOC’s visit to the Western Bank Library at Sheffield University by James Mooney, Technical Fellow at Bodleian Libraries, Oxford.
___________________________________________________________________________

The Polonsky Fellows were invited to the Western Bank Library at Sheffield University to speak with Laura Peaurt and other members of the Library. The aim of the meeting was to discuss the experiences of using and implementing Ex Libris’ Rosetta product.

After arriving by train, it was just a quick tram ride to the Western Bank campus at Sheffield University. We then had the fun of using the paternoster lift in the Western Bank Library to arrive at our meeting – it’s great to see this technology has been preserved and is still in use.

Paternoster lifts still in use at the Western Library. Image Credit: James Mooney

We met with Laura Peaurt (Digital Preservation Manager), Chris Jones (Library Systems Manager) and Angus Taggart (Library Systems Manager – Research).

Andy Bussey, Head of Digital Services & Systems was kind enough to give us an hour of his time at the start of the meeting, allowing us to discuss parts of the procurement and implementation process.

When working out the requirements for the system, Sheffield was able to collaborate with the White Rose University Consortium (the Universities of Leeds, Sheffield and York) to work out an initial scope.

When reviewing the options, both open source and proprietary products were considered. For the Western Library and the University back in 2014, after a skills audit, the open source options had to be ruled out due to a lack of technical and development skills to customise or support them. I’m sure that if this were revisited today the outcome might well be different, as the team has grown and gained experience and expertise. Many organisations may find it easier to budget for a software package and support contract with a vendor than to pursue the creation of several new employment positions.

With that said, as part of the implementation of Rosetta, Laura’s role was created, as there was an obvious need for a Digital Preservation Manager. We then went on to discuss the timeframe of the project before moving on to the configuration of the product, with Laura providing a live demonstration while talking about the current setup, the scalability of the instances and the granularity of the sections within Rosetta.

During the demonstrations we discussed what content was held in Rosetta, how people had been trained with Rosetta and what feedback they had received so far. We reviewed the associated metadata which had been stored with the items that had been ingested and went over the options regarding integration with a Catalogue and/or Archival Management System.

After lunch we went on to discuss the workflows currently in use, with further demonstrations so we could see end-to-end examples, including what ingest rules and policies were in place, what tools were in use and what processes were carried out. We then looked at how problematic items are dealt with in the Technical Analysis Workbench, covering the common issues and how additional steps in the ingest process can minimise certain issues.

As part of reviewing the sections of Rosetta we also inspected Rosetta’s metadata model, the DNX (Digital Normalised XML), and discussed ingesting born-digital content and associated METS files.

Western Library. Image Credit: A J Buildings Library.

We visited Sheffield with many questions, and many of these were answered during the course of the day’s discussions, but as the day came to a close we had to wrap up the talks and head back to the train station. We all agreed it had been an invaluable meeting that sparked further areas of discussion. Having met face to face, and gained an understanding of the environment at Sheffield, future conversations will be that much easier.

Digital preservation is a mature concept, but we need to pitch it better

Cambridge Technical Fellow, Dave, presents his thoughts on the OAIS and his own elevator pitch about digital preservation from the Pericles/DPC Acting on Change conference in London, last week.


Some of the best discussions at the Pericles / DPC Acting on Change conference came during the morning panel sessions. In the first, provocatively titled “Beyond the OAIS”, Barbara Sierman, from The KB National Library of the Netherlands, admitted that the OAIS can be confusing for newcomers… and as a newcomer to digital preservation, I agree!

Fellow panellist Barbara Reed, from Recordkeeping Innovation, suggested the OAIS’s Administration function as a potentially-confusing area, and this too struck a chord. I’ve gained some systems analysis and modelling experience over the years, and my first thought looking at the OAIS was that the Admin function looked like a place where much of the hard-to-model, human stuff had been separated from the technical, tool-based parts. (I’ve seen this happen before in other domains…)

There’s actually a hint that this is happening in the standard’s diagram for the Admin function – it’s busier and more information-packed than the other function diagrams, which tends to be a sign that it’s a bit of a ‘bucket’ which needs more modelling. This led me to an immediate concern that Admin doesn’t sit easily within the overall standard, and I think Barbara Reed had picked up on this too, suggesting that two more focused documents – one ‘technical’, one ‘human’ – might make the standard easier to use.

Then Artefactual Systems’ Dan Gillean asked who we should be talking to about the OAIS outside of the community? Barbara Reed answered ‘Enterprise Architects’; and two of the things Enterprise Architects use in their work are domain models and pattern languages. I was glad Barbara made this point, because I had already come to a similar conclusion.

AV Preserve’s Kara Van Malssen replied ‘communications experts’ to Dan’s question, suggesting Marketing in particular, though perhaps skilled science communicators might be even better? (Both Cambridge and Oxford – among others – put a lot of effort into public engagement with research, and there is a healthy body of research literature about it).

And the importance of communication was further emphasised by Nancy McGovern (MIT Libraries) and Neil Beagrie (Charles Beagrie Ltd) during the second day’s panel session (Preparing for Change). Nancy used the phrase ‘Technical Author’ at one stage – and it occurred to me that such input might be a very quick win for the OAIS Reference Implementation? Meanwhile, Neil talked about needing a short, pithy statement that explains what we do to funders…

So here’s an attempt at an Elevator Pitch:

Digital Preservation means sourcing computer-based material that is worthy of preservation, getting that material under control, and then maintaining the usefulness of that material, forever.

This Elevator Pitch is part of the pattern language I’m working on with my fellow Polonsky Fellows, and (I hope, soon) the broader Digital Preservation community. (We’re still thinking about that last ‘forever’, but considering how old some of the things in our libraries are, ‘forever’ seems an easy way of thinking about it).

The key point that Nancy McGovern made, however, was that we’re ready to take Digital Preservation to a wider audience. I think she’s right. The OAIS is confusing – it’s a real head-scrambler for a newcomer like me – but it has reached a level of maturity: it’s clear how much deep thought and expertise underpins it. And, of course, the same goes for the technology it has influenced over the previous decades. This supports what Arkivum’s Matthew Addis said in the second day’s keynote – the digital preservation community is ready to take their ideas to the world: we perhaps just need to pitch them a little better?

Audiovisual creation and preservation

Following on from the well-received Filling the digital preservation gap(s) post, Somaya follows up by reflecting on an in-house workshop she recently attended, ‘Video Production: Shoot, Edit and Upload’, which prompted these thoughts and some practical advice on analogue and digital audiovisual preservation.


My photographer colleague, Maciej, and I attended a video editing course at Cambridge University. I was there to learn about what video file formats staff at the University are creating and where these are being stored and made available, with a view to future preservation of this type of digital content. It is important we know what types of content the university is creating, so we know what we will have to preserve now and in the future.

While I have an audio background (having started out splicing reel-to-reel tapes), for the past 20 years I have predominantly worked in the digital domain. I am not an analogue audiovisual specialist, particularly not film and video. However, I have previously worked for an Australian national broadcaster (in the radio division) and the National Film and Sound Archive of Australia (developing a strategy for acquiring and preserving multi-platform content, such as Apps and interactive audiovisual works etc.)

AV Media

A range of analogue and digital carriers. Image credit: Somaya Langley

Since my arrival, both Cambridge University Library and Bodleian Libraries, Oxford have been very keen to discuss their audiovisual collections, and I’m led to believe there may be some significant film collections held in Cambridge University Library (although I’ve yet to see them in person). As many people have been asking about audiovisual material, I thought I would briefly share some information (from an Australasian perspective).

A ten-year deadline for audiovisual digitisation

In 2015, the National Film and Sound Archive of Australia launched a strategy paper called Deadline 2025: collections at risk, which outlines why there is a ten-year deadline to digitise analogue (or digital tape-based) audiovisual material. This is due to the fragility of the carriers (the reels, tapes etc.), playback equipment having been discontinued – a considerable proportion of equipment purchased is secondhand, bought via eBay or similar services – as well as the specialist skills disappearing. The knowledge of analogue audiovisual held by engineers of this era is considerable. These engineers have started to retire, and while there is some succession planning, there is not nearly enough to retain the in-depth, wide-ranging and highly technical skill-sets and knowledge of engineers trained last century.

Obsolete physical carriers

Why is it that audio and video content requires extra attention? There is a considerable amount of specialist knowledge required to understand how carriers are best handled. In the same way that conservation staff know how to repair delicate, centuries-old paper or paintings, similar knowledge is required to handle audiovisual carriers such as magnetic tape (cassettes, reel-to-reel tapes) or optical media (CDs, DVDs etc.). Not knowing how to wind tapes, when a tape requires ‘baking’, or how to hold a CD can result in damage to the carrier. Further information on handling carriers can be found here: http://www.iasa-web.org/tc05/handling-storage-audio-video-carriers. If you’re struggling to identify an audiovisual or digital carrier, then Mediapedia (a resource initiated by Douglas Elford at the National Library of Australia) is a great starting point.

Earlier this year, along with former State Library of New South Wales colleagues in Sydney, Scott Wajon and Damien Cassidy, we produced an Obsolete Physical Carriers Report based on a survey of audiovisual and digital carriers held in nine Australian libraries for the National and State Libraries Australasia (NSLA). This outlined the scope of the problem of ‘at-risk’ content held on analogue and digital carriers (and that this content needs to be transferred within the next decade). Of note is the short lifespan of ‘burnt’ (as opposed to professionally mastered) CDs and DVDs.

Audio preservation standards

In 2004, the International Association of Sound and Audiovisual Archives (IASA) first published the audio preservation standard: Guidelines on the Production and Preservation of Digital Audio Objects. I have been lucky to have worked with the editor (Kevin Bradley from the National Library of Australia) and several of the main contributors (including Matthew Davies) in some of my previous roles. This publication sets the standard for audio preservation quality.

Other standards publications IASA has produced can be found here: http://www.iasa-web.org/iasa-publications

Video preservation standards

Since approximately 2010, IASA has been working towards publishing a similar standard for video preservation. While this has yet to be released, it is likely to be soon (hopefully 2017?).

In lieu of a world-wide standard for video

As audiovisual institutions around the world digitise their film and video collections, they are developing their own internal guidelines and procedures regarding ‘preservation quality’ video. However, best practice has started to form, with many choosing to use:

  • Lossless Motion JPEG 2000, inside an MXF OP1a wrapper

There is also interest in another CODEC as a possible video preservation standard, which is being discussed by various audiovisual preservation specialists as a possible alternative:

  • Lossless FFV1 (FF Video Codec 1)

For content that has been captured at a lower quality in the first place (e.g. video created with consumer rather than professional equipment), another format various collecting institutions may consider is:

  • Uncompressed AVI

Why is video tricky?

For the most part, video is more complex than audio for several reasons including:

  • A video file format may not be what it seems – there is both a container (aka wrapper) and, inside it, an encoded video stream (e.g. a QuickTime MOV file containing content encoded as H.264).
  • Video codecs can also produce files that are lossy (compressed with a loss of information) or lossless (compressed, but where data is not lost as part of the encoding process).

The tool, MediaInfo, can provide information about both the container and the encoded file for a wide range of file formats.
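To make the container-versus-codec distinction a little more concrete, here is a toy Python sketch that guesses only the container by looking at a file’s first few bytes (its ‘magic numbers’). It deliberately tells you nothing about the codec of the stream inside – which is exactly why a proper tool like MediaInfo is needed in practice. Treat this as an illustration, not a robust identifier:

```python
def identify_container(path: str) -> str:
    """Guess the *container* from the file's opening bytes.

    This says nothing about the codec inside the container,
    which is the point: the two are different things."""
    with open(path, "rb") as fh:
        header = fh.read(12)
    if header[4:8] == b"ftyp":                      # ISO BMFF: MP4/MOV family
        return "ISO Base Media (MP4/MOV family)"
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "AVI (RIFF)"
    if header[:4] == b"\x1a\x45\xdf\xa3":           # EBML magic number
        return "Matroska/WebM (EBML)"
    return "unknown"
```

Two MOV files identified identically by this sketch could still hold very different streams (one H.264, one ProRes, say), so format identification for preservation always needs to look at both layers.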

Of course, there are many things to consider and parameters to configure – hence needing film and video digitisation specialists and specialist equipment to produce preservation quality digitised video.

From the US, the Federal Agencies Digitization Guide Initiative (FADGI) are also a great resource for information about audiovisual digitisation.

Consumer-produced audiovisual content

While I would recommend that consumers capture and produce as high-quality audiovisual content as their equipment allows (minimum of 24bit, 48kHz WAV files for audio and uncompressed AVI for video), I’m aware those using mobile devices aren’t necessarily going to do this. So, in addition to ensuring, where possible, preservation quality audiovisual content is created now and in the future, we will also have to take into account significant content being created on non-professional consumer-grade equipment and the potential proprietary file formats produced.

What can you do?

If you’re creating audio and or video content:

  • set your device to the highest quality it will allow (though you will need to take into account the amount of storage this requires)
  • try to avoid proprietary and less common file formats and CODECs
  • be aware that, especially for video content, your file is a little more complex than you might have expected: it’s a ‘file’ inside a ‘wrapper’, so it’s almost like two files, one inside the other…

How big?

Another consideration is the file size of digitised and born-digital film and video content, which has implications for how to ‘wrangle’ files, as well as the considerable storage needed… however, this is best left for a future blog post.

We will discuss more about born-digital audiovisual content and considerations as the DPOC project progresses.

The digital preservation gap(s)

Somaya’s engaging, reflective piece identifies gaps in the wider digital preservation field and provides insightful thoughts as to how the gaps can be narrowed or indeed closed.


I initially commenced this post as a response to the iPres 2016 conference and an undercurrent that caught my attention there – however, it is really a broader comment on the field of digital preservation itself. This post ties into thoughts that have been brewing for several years about various gaps I’ve discovered in the digital preservation field. As part of the Polonsky Digital Preservation Project, I hope we will be able to do some of the groundwork to begin to address a number of these gaps.

So what are these gaps?

To me, there are many. And that’s not to say that there aren’t good people working very hard to address them – there are. (I should note that these people often do this work as part of their day jobs as well as evenings and weekends.)

Specifically, the gaps (at least the important ones I see) are:

  • Silo-ing of different areas of practice and knowledge (developers, archivists etc.)
  • Lack of understanding of working with born-digital materials at the coalface (including managing donor relationships)
  • Traditionally-trained archivists, curators and librarians wanting a ‘magic wand’ to deal with ‘all things digital’
  • The absence of tools to undertake certain processes (or of tools that work on the technological platforms, and within the limitations, that archivists, curators, and librarians have to work with)
  • Lack of the command-line and/or coding skills needed to run the few available tools (skills that traditionally-trained archivists, curators, and librarians often don’t have under their belt)
  • Lack of knowledge of how to approach problem-solving

I’ve sat at the nexus between culture and technology for over two decades, and these issues don’t just exist in the field of digital preservation. I’ve worked in festival and event production, radio broadcast and as an audiovisual tech assistant, and I find similar issues in these fields too. (For example, the sound tech doesn’t understand the type of music the musician is creating and doesn’t mix it the right way, or the artist asks the technician to do something that isn’t technically possible.) In the digital curation and digital preservation contexts, effectively I’ve been a translator between creators (academics, artists, authors, producers etc.), those working at the coalface of collecting institutions (archivists, curators and librarians) and technologists.

To me, one of the gaps was brought to the fore and exacerbated during the workshop: OSS4Pres 2.0: Building Bridges and Filling Gaps which built on the iPres 2015 workshop “Using Open-Source Tools to Fulfill Digital Preservation Requirements”. Last year I’d contributed my ideas prior to the workshop, however I couldn’t be there in person. This year I very much wanted to be part of the conversation.

What struck me was that the discussion still began with the notion that digital preservation commences at the point where files are in a stable state, such as in a digital preservation system (or digital asset management system). Appraisal and undertaking data transfers weren’t considered at all, yet it is essential to capture metadata (including technical metadata) at this very early point. (Metadata captured this early may turn into preservation metadata in the long run.)
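Even a minimal capture at the point of transfer can record a path, size, timestamp and checksum for each file. A sketch only (the function name and field names are my own; a real workflow would also record provenance, rights and format identification):

```python
import hashlib
import os
from datetime import datetime, timezone

def capture_transfer_metadata(directory):
    """Record basic technical metadata for every file under `directory`
    at the point of transfer. Captured this early, these values can
    become preservation metadata later on."""
    manifest = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            path = os.path.join(root, name)
            digest = hashlib.sha256()
            with open(path, "rb") as f:
                # Hash in chunks so large files don't exhaust memory
                for chunk in iter(lambda: f.read(65536), b""):
                    digest.update(chunk)
            stat = os.stat(path)
            manifest.append({
                "path": os.path.relpath(path, directory),
                "size_bytes": stat.st_size,
                "modified_utc": datetime.fromtimestamp(
                    stat.st_mtime, tz=timezone.utc).isoformat(),
                "sha256": digest.hexdigest(),
            })
    return manifest
```

The resulting manifest travels with the transfer, so that whatever happens downstream, there is a record of what was collected and what state it was in on day one.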

I presented a common real-world use case/user story in acquiring born-digital collections: a donor has more than one Mac computer, each running a different operating system. The archivist needs to acquire a small selection of the donor’s files. The archivist cannot install any software onto the donor’s computers or ask the donor to install any, and only selected files must be collected – hence, none of the computers can be disk imaged.

The Mac-based tools that exist to do this type of acquisition rely on Java software. Contemporary Mac operating systems don’t come with Java installed by default, and many donors are not competent computer users: they haven’t installed this software because they have no knowledge of it, no need for it, or literally wouldn’t know how to. I put this call out to the Digital Curation Google Groups list several months ago, before I joined the Polonsky Digital Preservation Project. (It followed on from work that my former colleagues at the National Library of Australia and I had undertaken to collect born-digital manuscript archives, having first run into this issue in 2012.) The response to my real-world use case at iPres was:

This final option is definitely not possible in many circumstances, including when collecting political archives from networked environments inside government buildings (another real-world use case I’ve had first-hand experience of). The view was that anything else isn’t possible or is much harder (yes, I’m aware). Nevertheless, this is the reality of acquiring born-digital content, particularly unpublished materials. It demands both ‘hard’ and ‘soft’ skills in equal parts.

The discussion at iPres 2016 brought me back to the times I’ve previously thought about how I could facilitate a way for former colleagues to spend “a day in someone else’s shoes”. It’s something I posed several times when working as a Producer at the Australian Broadcasting Corporation.

Archivists have an incredible sense of how to manage the relationship with a donor who is handing over their life’s work, ensuring the donor entrusts the organisation with the ongoing care of their materials. However, traditionally-trained archivists, curators and librarians typically don’t have in-depth technical skillsets, and technologists often haven’t witnessed the process of liaising with donors first-hand. Perhaps those working in developer and technical roles (typically further down the workflow for processing born-digital materials) need opportunities to observe the process of acquiring born-digital collections from donors. Might this give them an increased appreciation for the scenarios that archivists find themselves in (and must problem-solve their way out of)? Conversely, perhaps archivists, curators and librarians need to witness developers creating software (especially the effort needed to create a small GUI-based tool for collecting born-digital materials from various Mac operating systems) or debugging code. Is this just a case of swapping seats for a day or a week? Certainly, sharing approaches to problem-solving seems key.

Part of what we’re doing as part of the Polonsky Digital Preservation Project is to start to talk more holistically: rather than ‘digital preservation’, we’re talking about ‘digital stewardship’, so that the early steps of acquiring born-digital materials aren’t overlooked. As the Policy and Planning Fellow at Cambridge University Library, I’m aware I can effect change in a different way. Developing policy – including technical policies (for example, the National Library of New Zealand’s Preconditioning Policy, referenced here) – means I can draw on my first-hand experience of acquiring born-digital collections with a greater understanding of what it takes to do this type of work. For now, this is the approach I need to take and I’m looking forward to the changes I’ll be able to influence.


Comments on Somaya’s piece would be most welcome. There’s plenty of grounds for discussion and constructive feedback will only enhance the wider, collaborative approach to addressing the issue of preserving digital content.

The other place

The Cambridge team visited Oxford last week. Whilst there won’t be anything of a technical nature in this post, it is worth acknowledging that building and developing a core team for sustainable digital preservation is just as important a function as tools and technical infrastructure.


One of the first reactions we get when we talk about DPOC and its collaborative nature between Oxford and Cambridge is “how on earth do you get anything done?” given the perceived rivalry between the two institutions. “Surprisingly well, thank you” tends to be our answer. Sure, due to the collaborative (and human) nature of any project, there will be times when work doesn’t run parallel and we don’t immediately agree on an approach, but we’ve not let historical rivalry get in the way of working together.

To keep collaboration going, we usually meet on a Wednesday, huddled around our respective laptops, to participate in a ‘Team Skype’. As a change from this, the Cambridge people (Dave, Lee, Somaya, Suzanne, and Tuan) travelled over to see the Oxford people (Edith, Michael, and Sarah) for two days of valuable face-to-face meetings and informative talks. The Fellows travelled together; knowing we’d be driving through rush hour on an east-to-west traverse, we left a bit early. What we hadn’t accounted for was a misbehaving satnav (see below), but it’s the little things like this that make teams bond too. We arrived half an hour before the start for an informal catch-up with Sarah, Edith, and Michael. Such time and interaction is very important for keeping the team gelled together.

Satnav misbehaving, complicating what is usually a simple left turn at a roundabout. Image credit: Somaya Langley.

A team meeting in the Osney One Boardroom formally started the day at 11am. It continued as a working lunch as we had plenty to discuss! We then had a fascinating insight into the developers’ aspects of how materials are ingested into the ORA repository from Michael Davis, followed by an overview from Sarah Barkla on how theses are deposited and their surrounding rights issues. Breaking for a cup of tea and team photo, the team then had split sessions; Sarah and Lee reviewed skills survey work whilst Dave, Edith, and Somaya discussed rationale and approaches to collections auditing.

Thursday saw the continuation of working in smaller teams. Sarah, Lee, and Michael had meetings to discuss PASIG 2017 organisation details. Dave, Edith, and Somaya (later joined by Michael) discussed their joint work before having a talk from Amanda Flynn and David Tomkins on open access and research data management.

At lunchtime we boarded Oxford University’s minibus service to the 14th-century Vault Café at St Mary’s Church for a tasty lunch (communal eating and drinking is very important for team building). We then went to the Weston Library to discuss Dave’s digital preservation pattern language over cake in the Weston’s spectacular Blackwell Hall, and then on up to the BEAM (Bodleian Electronic Archives and Manuscripts) Lab (a fantastically tidy room with a FRED and many computers) to see and hear Susan Thomas, Matthew Neely, and Rachel Gardner talk about, show, and discuss BEAM’s processes. From a recordkeeping point of view, it was both comforting and refreshing to see that, despite working in the digital realm, the archival principles around selection, appraisal, and access rights remain constant.

View from the BEAM Lab

The end of the rainbow on the left as viewed from the BEAM lab in the Weston Library. Image credit: Somaya Langley.

The mix of full team sessions, breaking into specialisms, joining up again for talks, and informal talks over tea breaks and lunches was a successful blend of continually building team relationships and contributing to progress. Both teams came away from the two days with reinforced ideas, new ideas, and enhanced clarity of continuing work and aims to keep all aspects of the digital preservation programme on track.

We don’t (and can’t) work in a bubble when it comes to digital preservation and the more that we can share the various components that make up ‘digital preservation’ and collaborate, the better contribution the team can make towards interested communities.

The DPOC team assembled on day one in Oxford.

On the core concepts of digital preservation

Cambridge’s Technical Fellow, Dave Gerrard, shares his learning on digital preservation from the PASIG 2016. As a newcomer to digital preservation, he is sharing his insights as he learns them.


As a relative newbie to digital preservation, I found attending PASIG 2016 an important step towards getting a picture of the state of the art. One of the most important things for a technician entering a new domain is to get a high-level view of the overall landscape and build up an understanding of some of the overarching concepts, and last week’s conference provided a great opportunity to do just that.

So this post is about some of those central overarching data preservation concepts, and how they might, or might not, map onto ‘real-world’ archives and archiving. I should also warn you that I’m going to be posing as many questions as answers here: it’s early days for our Polonsky project, after all, so we’re all still definitely in the ‘asking’ phase. (Feel free to answer, of course!) I’ll also be contrasting two particular presentations that were delivered at PASIG, which at first glance have little in common, but which I thought actually made the same point from completely different perspectives.

Perhaps the most obvious, key concept in digital preservation is ‘the archive’: a place where one deposits (or donates) things of value to be stored and preserved for the long term. This concept inevitably influences a lot of the theory and activity related to preserving digital resources, but is there really a direct mapping between how one would preserve ‘real’ objects, in a ‘bricks and mortar’ archive, and the digital domain? The answer appears to be ‘yes and no’: in certain areas (perhaps related to concepts such as acquiring resources and storing them, for example) it seems productive to think in broadly ‘real-world’ terms. Other ‘real-world’ concepts may be problematic when applied directly to digital preservation, however.

For example, my fellow Fellows will tell you that I take particular issue with the word ‘managing’: a term which in digital preservation seems to be used (at least by some people) to describe a particular small set of technical activities related to checking that digital files are still usable in the long term. (‘Managing’ was used in this context in at least one PASIG presentation.) One of the keys to working effectively with information systems is to get one’s terminology right and, in particular, to group together and talk about parts of a system that are on the same conceptual level – that is, don’t muddle your levels of detail, particularly when modelling things. ‘Managing’, to me, is a generic, high-level concept, which could mean anything from ‘making sure files are still usable’ to ‘ensuring public-facing staff answer the phone within five rings’ or even ‘making sure the staff kitchen is kept clean’. So I’m afraid I think it’s an entirely inappropriate word to describe a very specific set of technical activities.

The trouble is, most of the other words we’ve considered for describing the process of ‘keeping files usable’ are similarly ‘higher-level’ concepts… One obvious one (preservation) once again applies to much more of the overall process, and so do many of its synonyms (‘stewardship’, ‘keeping custody of’, etc…) So these are all good terms at that high level of abstraction, but they’re for describing the big picture, not the details. Another term that is more specific, ‘fixity checking’, is maybe a bit too much like jargon…  (We’re still working on this: answers below please!) But the key point is: until one understands a concept well enough to be able to describe it in relatively simple terms, that make sense and fit together logically, building an information system and marshalling the related technology is always going to be tough.
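Whatever we end up calling it, the activity itself is simple to state: recompute a file’s checksum and compare it with the value recorded when the file entered the archive. A minimal sketch (the function name is my own, and a real system would of course schedule this across whole collections and log the outcomes):

```python
import hashlib

def verify_fixity(path, recorded_sha256):
    """Recompute a file's SHA-256 and compare it with the checksum
    recorded at ingest. A mismatch means the bits have changed and
    the file may no longer be usable as stored."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so large preservation masters fit in memory
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == recorded_sha256
```

Describing the check this concretely is, I think, exactly the kind of ‘relatively simple terms’ that make building the surrounding information system tractable.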

Perhaps the PASIG topic that most highlighted the difference between ‘real-world’ archiving and digital preservation, however, was the discussion regarding the increased rate at which preserved digital resources can be ‘touched’ by outside forces. Obviously, nobody stores things in a ‘real-world’ archive in the expectation that they will never be looked at again (do they?), but in the digital realm there are potentially many more opportunities for resources to be linked directly to the knowledge and information that builds upon them.

This is where the two contrasting presentations came in. The first was Scholarly workflow integration: The key to increasing reproducibility and preservation efficacy, by Jeffrey Spies (@JeffSpies) from the Center for Open Science. Jeffrey clarified exactly how digital preservation in a research data management context can highlight, explicitly, how a given piece of research builds upon what went before, by enabling direct linking to the publications, and (increasingly) to the raw data of peers working in the same field. Once digital research outputs and data are preserved, they are available to be linked to, reliably, in a manner that brings into play entirely new opportunities for archived research that never existed in the ‘real world’ of paper archives. Thus enabling the ‘discovery’ of preserved digital resources is not just about ensuring that resources are well-indexed and searchable, it’s about adding new layers of meaning and interpretation as future scholars use them in their own work. This in turn indicates how digital preservation is a function that is entirely integral to the (cyclical) research process – a situation which is well-illustrated in the 20th slide from Jeffrey’s presentation (if you download it – Figshare doesn’t seem to handle the animation in the slide too well – which sounds like a preservation issue in itself…).

By contrast, Symmetrical Archiving with Webrecorder, a talk by Dragan Espenschied (@despens), was at first glance completely unrelated to the topic of how preserved digital resources might have a greater chance of changing as time passes than their ‘real-world’ counterparts. Dragan was demonstrating the Webrecorder tool for capturing online works of art by recording visits to those works through a browser, and it was during the discussion afterwards that the question was asked: “how do you know that everything has been recorded ‘properly’ and nothing has been missed?”

For me, this question (and Dragan’s answer) struck at the very heart of the same issue. The answer was that each recording is a different object in itself, as the interpretation of the person recording the artwork is an integral part of the object. In fact, Dragan’s exact answer contained the phrase: “when an archivist adds an object to an archive, they create a new object”; the actual act of archiving changes an object’s meaning and significance (potentially subtly, though not always) to an extent that it is not the same object once it has been preserved. Furthermore, the object’s history and significance change once more with every visit to see it, and every time it is used as inspiration for a future piece of work.

Again – I’m a newbie, but I’m told by my fellow Fellows this situation is well understood in archiving and hence may be more of a revelation to me than most readers of this post. But what has changed is the way the digital realm gives us the opportunity not just to record how objects change as they’re used and referred to, but also a chance to make the connections to new knowledge gained from use of digital objects completely explicit and part of the object itself.

This highlights the final point I want to make, about two of the overarching concepts of ‘real-world’ archiving and preservation which PASIG indicated might not map cleanly onto digital preservation. The first is the concept of ‘depositing’. According to Jeffrey Spies’s model, the ‘real-world’ research workflow of ‘plan the research, collect and analyse the data, publish findings, gain recognition / significance in the research domain, and then finally deposit evidence of this ground-breaking research in an archive’ simply no longer applies. In the new model, the initial ‘deposit’ is made at the point a key piece of data is first captured, or a key piece of analysis is created. Works in progress, early drafts, important communications, grey literature, as well as the final published output, are all candidates for preservation at the point they are first created by the researchers. Digital preservation happens seamlessly in the background, and the states of the ‘preserved’ objects change throughout.

The second is the concept of ‘managing’ (urgh!), or otherwise ‘maintaining the status quo’ of an object into the long-term future. In the digital realm, there doesn’t need to be a ‘status quo’ – in fact there just isn’t one. We can record when people search for objects, when they find them, when they cite them. We can record when preserved data is validated by attempts to reproduce experiments or re-used entirely in different contexts. We can note when people have been inspired to create new artworks based upon our previous efforts, or have interpreted the work we have preserved from entirely new perspectives. This is genuine preservation: preservation that will help fit the knowledge we preserve today into the future picture. This opportunity would be much harder to realise when storing things in a ‘real-world’ archive, and we need to be careful to avoid thinking too much ‘in real terms’ if we are to make the most of it.

What do you think? Is it fruitful to try and map digital preservation onto real world concepts? Or does doing so put us at risk of missing core opportunities? Would moving too far away from ‘real-world’ archiving put us at risk of losing many important skills and ideas? Or does thinking about ‘the digital data archive’ in terms that are too like ‘the real world’ limit us from making important connections to our data in future?

Where does the best balance between ‘real-world’ concepts and digital preservation lie?