PASIG 2017: honest reflections from a trainee digital archivist

A guest blog post by Kelly, one of the Bodleian Libraries’ graduate digital archivist trainees, on what she learned as a volunteer and attendee of PASIG 2017 Oxford.


Amongst digital preservation professionals from almost every continent and 130 institutions, my five traineeship colleagues and I took our places among the lecture theatre seats, annexe demos and the awesome artefacts at the Museum of Natural History for PASIG 2017, Oxford. Just six months into our traineeship, it was a brilliant opportunity not only to apply some of our new knowledge to our work at Special Collections, Bodleian Libraries, but also to gain a really current and relevant insight into theories we have been studying as part of our distance-learning MSc in Digital Curation at Aberystwyth University. The first ‘Bootcamp’ day was exactly what I needed to throw myself in, and it really consolidated my confidence in my understanding of the shared language used amongst the profession (fixity checks, maturity models…as well as getting to grips with submission information packages, dissemination information packages and everything that occurs in between!).

My pen didn’t stop scribbling all three days, except maybe for tea breaks. That said, the demo presentations were also a great time for me and the other trainees to ask questions about the workflows and benefits of certain software such as LibNova, Preservica and ResourceSpace.

For want of a better word (and because it really is the truth), PASIG 2017 was genuinely inspiring, and some messages were delivered so powerfully that I hope to stay grounded in them for my entire career. Here is what I was taught:

The Community is invaluable. Many of the speakers were quick to assert that sharing practice amongst the digital preservation community is key. This is a value I was familiar with, yet witnessing it happen throughout the conference in such a sincere manner was something else entirely. I can assure you the gratitude and affirmation that followed Eduardo del Valle, University of the Balearic Islands, and his presentation “Sharing my loss to protect your data: A story of unexpected data loss and how to do real preservation” was as encouraging for someone new to the profession to witness as it was for all of the other, more experienced delegates present. As well as sharing practice, it was clear that the community needs to advocate on behalf of its members. This is time- and resource-consuming, but oh-so important.

Digital archives are preserving historical truths. Yes, the majority of the workflow is technological, but the objectives and functions are so much more than technology; to reduce digital preservation down to this is an oversimplification. It was so clear that the range of use cases presented at PASIG were all driven towards documenting social, political and historical information (and preserving that documentation) that will be of absolute necessity for society and infrastructure in future. Right now, for example, Angeline Takawira and her colleagues at the UN MICT are working on a digital preservation programme to ensure absolute accountability and usability of the records of the International Criminal Tribunals for both Rwanda and the former Yugoslavia. I have written a more specific post on Angeline’s presentation here.

Due to the nature of technology and the digital world, the goalposts will always be moving. For example, Somaya Langley’s talk on the future of digital preservation and the mysteries of extracting data from smart devices will soon become (and maybe already is) a reality for those working with accessions of archives or information management. We should, then, embrace change and embrace the unsure, and ultimately ‘get over the need for tidiness’, as pointed out by John Sheridan from The National Archives during his presentation “Creating and sustaining a disruptive digital archive”. This can feel counter-intuitive, but as the saying goes, one of the most dangerous phrases to use is ‘we’ve always done it that way’.

The value of digital material outlives the software, so enabling the prolonged use of software is a real and current issue. Admittedly, this was a factor I had genuinely not even considered before; in my mind I had linked obsolescence with hardware and hardware only. Dr. Natasa Milic-Frayling’s presentation “Aging of Digital: Managed Services for digital continuity” therefore shed much light on the changing computing ecosystem and the gradual aging of software. What I found especially interesting about the proposed software-continuity plan was its transparency: the client can ask to see the software at any time whilst it is being stabilised and maintained.

Thank you so much PASIG 2017 and everybody involved!

One last thing…in closing, Cliff Lynch, CNI, brought up that there was comparably less web archiving content this year. If anybody fancies taking a trainee to Mexico next year to do a (lightning) talk on Bodleian Libraries’ Web Archive, I am keen…

Computers are the apogee of profligacy: a response to THE most important PASIG 2017 presentations

Following the PASIG conference, Cambridge Technical Fellow Dave Gerrard simply couldn’t wait to fire off his thoughts on the global context of digital preservation and how we need to better consider the world around us, to work towards a global solution and not just one that suits a capitalist agenda. We usually preface these blogs with “enjoy”, but in this instance, please find a quiet moment, make yourself comfortable, read on and contemplate the global issues passionately presented here.


I’m going to work on a more technical blog about PASIG later, but first I want to get this one off my chest. It’s about the two most important presentations: Angeline Takawira’s Digital preservation at the United Nations Mechanism for International Criminal Tribunals and Keep your eyes on the information, Patricia Sleeman’s discussion of preservation work at the UN Refugee Agency (UNHCR).

Angeline Takawira described, in a very precise and formal manner, how the current best practice in Digital Preservation is being meticulously applied to preserving information from UN war crimes tribunals in The Hague (covering the Balkan conflict) and Arusha, Tanzania (covering the Rwandan genocide). As befitted her work, it was striking how calm Angeline was; how well the facts were stuck to, despite the emotive context. Of course, this has to be the case for work underpinning legal processes: intrusion of emotion into the capture of facts could let those trying to avoid justice escape it.

And the importance of maintaining a dispassionate outlook was echoed in the title of the other talk. “Keep your eyes on the information” was what Patricia Sleeman was told when learning to work with the UNHCR, as engaging too emotionally with the refugee crisis could make vital work impossible to perform. However, Patricia provided some context, in part by playing Head Over Heels (Emi Mahmoud’s poem about the conflict and refugee crisis in Darfur), and by describing the brave, inspirational people she had met in Syria and Kurdistan. An emotionless response was impossible: the talk resulted in the conference’s longest and loudest applause.

Indeed, I think the audience was so stunned by Patricia’s words that questions were hard to formulate. However, my colleague Somaya at least asked the $64,000 one: how can we help? I’d like to tie this question back to one that Patricia raised in her talk, namely (and I paraphrase here): how do you justify expenditure on tasks like preservation when doing so takes food from the mouths of refugees?

So, now that I’m less stunned, here’s my take: feeding refugees addresses a symptom of the problem. Telling their stories helps to solve the problem itself, by making us engage our emotions, and think about how our lives are related to theirs and how our behaviour impacts upon them. And how can we help? Sure, we can help Patricia with her data management and preservation problems. But how can we really contribute to a solution? How can we stop refugee crises occurring in the first place?

We have a responsibility to recognise the connections between our own behaviour and the circumstances refugees find themselves in, and it all comes down, of course, to resources, and the profligate waste of them in the developed world. Indeed, Angeline and Patricia’s talks illustrated the borderline absurdity of a bunch of (mostly) privileged ‘Westerners’ / ‘Northerners’ (take your pick) talking about the ‘preservation’ of anything, when we’re products of a society that’s based upon throwing everything away.

And computers / all things ‘digital’ are at the apogee of this profligacy: Natasa Milic-Frayling highlighted this when she (diplomatically) referred to the way in which the ‘innovators’ hold all the cards, currently, in the relationship with ‘content producers’, and can hence render the technologies upon which we depend obsolete across ever-shorter cycles. Though, after Patricia’s talk, I’m inclined to frame this more in terms of ‘capitalist industrialists generating unnecessary markets at the expense of consumers’; particularly given that, while we were listening to Patricia, the latest iPhone was being launched in the US.

Though, of course, it’s not really the ‘poor consumers’ who genuinely suffer due to planned obsolescence… that would be the people in Africa and the Middle East whose countries are war zones due to grabs for oil, or droughts caused by global warming. As the world’s most advanced tech companies, Apple, Google, Facebook, Amazon, Microsoft et al. are the biggest players in a society that – at best indirectly, at worst carelessly – causes the suffering of the people Patricia and Angeline are helping and providing justice for. And, as someone typing a blog post on a MacBook Pro that doesn’t even let me add a new battery, I’m clearly part of the problem, not the solution.

So – in answer to Somaya’s question: how can we help? Well, for a start, we can stop fetishising the iPhone and start bigging up Fairphone and Phonebloks. However, keeping the focus on Digital Preservation, we’ve got to be really careful that our efforts aren’t used to support an IT industry that’s currently profligate way beyond moral acceptability. So rather than assuming (as I did above) that all the ‘best-practice’ of digital preservation flows from the ‘developed’ (ahem) world to the ‘developing’, we ought to seek some lessons in how to preserve technology from those who have fewer opportunities to waste it.

Somaya’s already on the case with her upcoming panel at iPres on the 28th of September. Then we ought to continue down the road of holding PASIG in Mexico City next year by holding one in Africa as soon as possible. As long as, when we’re there, we make sure we shut up and listen.

PASIG 2017 Twitter round-up

After many months of planning it feels quite strange to us that PASIG 2017 is over. Hosting the PASIG conference in Oxford has been a valuable experience for the DPOC fellows and a great chance for Bodleian Libraries’ staff to meet with and listen to presentations by digital preservation experts from around the world.

In the end 244 conference delegates made their way to Oxford and the Museum of Natural History. The delegates came from 130 different institutions and every continent of the world was represented (…well, apart from Antarctica).

What was especially exciting, though, were all the new faces. In fact, two-thirds of the delegates this year had not been to a PASIG conference before! Is this perhaps a sign that interest in digital preservation is on the rise?

As always at PASIG, Twitter was ablaze with discussion, in spite of an at-times flaky Wi-Fi connection. Over three days, #PASIG17 was mentioned a whopping 5,300 times on Twitter and had a “reach” of 1.7 million. Well done everyone on some stellar outreach! The most active tweeting came from the UK, USA and Austria.

Twitter activity by country using #PASIG17 (Talkwalker statistics)

Although it is hard to choose favourites among all the Tweets, a few of the DPOC project’s personal highlights included:

Cambridge Fellow Lee Pretlove lists “digital preservation skills” and why we cannot be an expert in all areas. Tweet by Julian M. Morley

Bodleian Fellow James makes some insightful observations about the incompatibility between tar pits and digital preservation.

Cambridge Fellow Somaya Langley presents in the last PASIG session on the topic of “The Future of Digital Preservation”.  

What were some of your favourite talks and Twitter conversations? What would you like to see more of at PASIG 2018? #futurePASIG

Digital Preservation futurology

I fancy attempting futurology, so here’s a list of things I believe could happen to ‘digital preservation systems’ over the next decade. I’ve mostly pinched these ideas from folks like Dave Thompson, Neil Jefferies, and my fellow Fellows. But if you see one of your ideas, please claim it using the handy commenting mechanism. And because it’s futurology, it doesn’t have to be accurate, so kindly contradict me!

Ingest becomes a relationship, not a one-off event

Many of the core concepts underpinning how computers are perceived to work are crude, paper-based metaphors – e.g. ‘files’, ‘folders’, ‘desktops’, ‘wastebaskets’ etc – that don’t relate to what your computer’s actually doing. (The early players in office computing were typewriter and photocopier manufacturers, after all…) These metaphors have succeeded at getting everyone to use computers, but they’ve also suppressed various opportunities to work smarter, too.

The concept of ingesting (oxymoronic) ‘digital papers’ is obviously heavily influenced by this paper paradigm.  Maybe the ‘paper paradigm’ has misled the archival community about computers a bit, too, given that they were experts at handling ‘papers’ before computers arrived?

As an example of what I mean: in the olden days (25 whole years ago!), Professor Plum would amass piles of important papers until the day he retired / died, and then, and only then, could these personal papers be donated and archived. Computers, of course, make it possible for the Prof both to keep his ‘papers’ where he needs them, and donate them at the same time, but the ‘ingest event’ at the centre of current digital preservation systems still seems to be underpinned by a core concept of ‘piles of stuff needing to be dealt with as a one-off task’. In future, the ‘ingest’ of a ‘donation’ will actually become a regular, repeated set of occurrences based upon ongoing relationships between donors and collectors, forged initially when Profs are but lowly postgrads. Personal Digital Archiving and Research Data Management will become key, and ripping digital ephemera from dying hard disks will become less necessary as a result.

The above depends heavily upon…

Object versioning / dependency management

Of course, if Dr. Damson regularly donates materials from her postgrad days onwards, some of these may be updates to things donated previously. Some of them might have mutated so much since the original donation that they can be considered ‘child’ objects, which may have ‘siblings’ with ‘common ancestors’ already extant in the archive. Hence preservation systems need to manage multiple versions of ‘digital objects’, and the relationships between them.

Some of the preservation systems we’ve looked at claim to ‘do versioning’ but it’s a bit clunky – just side-by-side copies of immutable ‘digital objects’, not records of the changes from one version to the next, and with no concept of branching siblings from a common parent. Complex structures of interdependent objects are generally problematic for current systems. The wider computing world has been pushing at the limits of the ‘paper-paradigm’ immutable object for a while now (think Git, Blockchain, various version control and dependency management platforms, etc). Digital preservation systems will soon catch up.
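The contrast with the ‘paper-paradigm’ immutable object can be made concrete with a toy example. Below is a sketch (repository and file names are entirely hypothetical) of how a version-control tool like Git records the history between versions and branches ‘sibling’ objects from a common ancestor, rather than keeping side-by-side immutable copies:

```shell
# Toy example: Dr. Damson's donations tracked as versions and branches
# in git, instead of as separate immutable copies. All names are made up.
mkdir -p damson-archive && cd damson-archive && git init -q .
git config user.email "archivist@example.org"
git config user.name "Archivist"

# First donation: an object enters the archive
echo "draft thesis v1" > thesis.txt
git add thesis.txt && git commit -q -m "Donation 1: postgrad draft"

# Later donation: an update to the same object; the change is recoverable
echo "draft thesis v2" > thesis.txt
git commit -q -am "Donation 2: update to existing object"

# A 'child' object derived from the thesis, branched from the common ancestor
git checkout -q -b conference-paper
echo "paper derived from thesis" > paper.txt
git add paper.txt && git commit -q -m "Child object with common ancestor"

# The history, including the branch point, is queryable
git log --oneline --all
```

The point of the sketch is the model, not the tool: the archive holds the relationships and changes between versions, and a ‘sibling’ can always be traced back to its shared parent.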

Further blurring of the object / metadata boundary

What’s more important, the object or the metadata? The ‘paper-paradigm’ has skewed thinking towards the former (the sacrosanct ‘digital object’, comparable to the ‘original bit of paper’), but after you’ve digitised your rare book collection, what are Humanities scholars going to text-mine? It won’t be images of pages – it’ll be the transcripts of those (i.e. the ‘descriptive metadata’)*. Also, when seminal papers about these text mining efforts are published, how is this history of the engagement with your collection going to be recorded? Using a series of PREMIS Events (that future scholars can mine in turn), perhaps?

The above talk of text mining and contextual linking of secondary resources raises two more points…

* While I’m here, can I take issue with the term ‘descriptive metadata’? All metadata is descriptive. It’s tautological; like saying ‘uptight Englishman’. Can we think of a better name?

Ability to analyse metadata at scale

‘Delivery’ no longer just means ‘giving users a viewer to look at things one-by-one with’ – it now also means ‘letting people push their Natural Language or image processing algorithms to where the data sits, and then coping with vast streams of output data’.

Storage / retention informed by well-understood usage patterns

The fact that everything’s digital, and hence easier to disseminate and link together than physical objects, also means we can better understand how people use our material. This doesn’t just mean ‘wiring things up to Google Analytics’ – advances in bibliometrics that add social and mainstream media analysis, and so forth, to everyday citation counts present opportunities to judge the impact of our ‘stuff’ on the world like never before. Smart digital archives will inform their storage management and retention decisions with this sort of usage information, potentially in fully or semi-automated ways.

Ability to get data out, cleanly – all systems are only ever temporary!

Finally – it’s clear that there are no ‘long-term’ preservation system options. The system you procure today will merely be ‘custodian’ of your materials for the next ten or twenty years (if you’re lucky). This may mean moving heaps of content around in future, but perhaps it’s more pragmatic to think of future preservation systems as more like ‘lenses’ that are laid on top of more stable data stores to enable as-yet-undreamt-of functionality for future audiences?

(OK – that’s enough for now…)

Audiovisual creation and preservation: part 2

Paul Heslin, Digital Collection Infrastructure Support Officer/Film Preservation Officer at the National Film and Sound Archive of Australia (NFSA) has generously contributed the following blog post. Introduction by Cambridge Policy and Planning Fellow, Somaya.

Introduction

As Digital Preservation is such a wide-ranging field, people working in this field can’t be an absolute expert on absolutely everything. It’s important to have areas of expertise and to connect and collaborate with others who can share their knowledge and experience.

While I have a background in audio, broadcast radio, multimedia and some video editing, moving image preservation is not my area of speciality. It is for this reason I invited Paul Heslin to compose a follow-up to my Audiovisual creation and preservation blog post. Paul Heslin is a Digital Archivist at the NFSA, currently preoccupied with migrating the digital collection to a new generation of LTO tapes.

I am incredibly indebted to Paul and the input from his colleagues and managers (some of whom are also my former colleagues, from when I worked at the NFSA).


Background to moving image preservation

A core concern for all archives is the ongoing accessibility of their collections. In this regard film archives have traditionally been spoilt: a film print does not require any intermediate machinery for assessment, and conceptually a projector is not a complicated device (at least in regards to presenting the visual qualities of the film). Film material can be expected to last hundreds of years if kept in appropriate vault conditions; other moving image formats are not so lucky. Many flavours of videotape are predicted to be extinct within a decade, due to loss of machinery or expertise, and born-digital moving image items can arrive at the archive in any possible format. This situation necessitates digitisation and migration to formats which can be trusted to continue to be suitable. But not only suitable!

Optimistically, the digital preservation of these formats carries the promise of these items maintaining their integrity perpetually. Unlike analogue preservation, there is no assumption of degradation over time; however, there are other challenges to consider. The equipment requirements for playing back a digital audiovisual file can be complicated, especially as the vast majority of such files are compressed using encoding/decoding systems called codecs. There can be very interesting results when these systems go wrong!

Example of Bad Compression (in Paris). Copyright Paul Heslin

Codecs

Codecs can be used in an archival context for much the same reasons as in the commercial world: data storage is expensive, and money saved can certainly be spent elsewhere. However, a key difference is that archives require truly lossless compression. It is therefore important to distinguish between codecs which are mathematically lossless and those which are merely visually lossless. The latter claim to encode in a way which is visually indistinguishable from the original source file, but they still dispense with ‘superfluous’ data. This is not appropriate for archival usage, as the lost data cannot be recovered, and accumulated migrations will ultimately result in visual and aural imperfections.

Another issue for archivists is that many codecs are proprietary or commercially owned: Apple’s ProRes format is a good example. While it is ubiquitously used within the production industry, it is an especially troubling example given signs that Apple will not be providing support into the future, especially for non-Mac platforms. This is not a huge issue for production companies who will have moved on to new projects and codecs, but for archives collecting these materials this presents a real problem. For this reason there is interest in dependable open standards which exist outside the commercial sphere.

FFV1

One of the more interesting developments in this area has been the emergence of the FFV1 codec. FFV1 started life in the early 2000s as a lossless codec associated with the FFmpeg free software project and has since gained some traction as a potential audiovisual preservation codec for the future. The advantages of the codec are:

  • It is non-proprietary, unlike the many other popular codecs currently in use.
  • It makes use of truly lossless compression, so archives can store more material in less space without compromising quality.
  • FFV1 files are ALWAYS losslessly compressed, which avoids accidents that can result from using formats which can either encode losslessly or lossily (like the popular JPEG-2000 archival format).
  • It internally holds checksums for each frame, allowing archivists to check that everything is as it should be. Frame checksums are especially useful in pinpointing exactly where an error has occurred.
  • Benchmark tests indicate that conversion speeds are quicker than JPEG-2000. This makes a difference for archives dealing with large collections and limited computing resources.

The final, and possibly most exciting, attribute of FFV1 is that it is developing out of the needs of the archival community, rather than relying on specifications designed for industry use. Updates from the original developer, Michael Niedermayer, have introduced beneficial features for archival use, and so far the codec has been implemented in different capacities by The National Archives in the UK, the Austrian National Archives and the Irish Film Institute, as well as being featured in the FIAF Journal of Film Preservation.
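To make this concrete, here is a sketch of what an FFV1 migration might look like in practice using FFmpeg. It is illustrative only: the file names are placeholders, and ffmpeg must be installed for the script to run.

```shell
# Sketch: migrate a master file to FFV1 and verify the result is
# mathematically lossless. File names are placeholders.
cat > ffv1_migrate.sh <<'EOF'
#!/usr/bin/env bash
set -e
src="$1"                  # e.g. master.mov
dst="${src%.*}.mkv"

# Encode the video stream as FFV1 version 3 in a Matroska container.
# -slicecrc 1 enables the internal per-slice checksums; the audio
# stream is copied through untouched.
ffmpeg -i "$src" -c:v ffv1 -level 3 -slicecrc 1 -c:a copy "$dst"

# Lossless check: per-frame MD5s of the decoded video from the source
# and the migrated copy must be identical.
ffmpeg -i "$src" -map 0:v -f framemd5 original.framemd5
ffmpeg -i "$dst" -map 0:v -f framemd5 migrated.framemd5
diff original.framemd5 migrated.framemd5 && echo "Lossless migration confirmed"
EOF
```

The framemd5 comparison at the end is the crucial step: with a merely ‘visually lossless’ codec the per-frame checksums would differ, whereas a truly lossless migration reproduces every frame bit-for-bit.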

Validating half a million TIFF files. Part Two.

Back in May, I wrote a blog post about preparing the groundwork for the process of validating over 500,000 TIFF files which were created as part of a Polonsky Digitization Project which started in 2013. You can read Part One here on the blog.

Restoring the TIFF files from tape

Stack of backup tapes. Photo: Amazon

For the digitization workflow we used Goobi and within that process, the master TIFF files from the project were written to tape. In order to actually check these files, it was obvious we would need to restore all the content to spinning disk. I duly made a request to our system administration team and waited.

As I mentioned in Part One, we had set up a new virtualised server with access to a chunk of network storage. The Polonsky TIFF files were restored to this network storage; however, midway through the restoration from tape, the tape server’s operating system crashed…disaster.

After reviewing the failure, it appeared that a bug in the Red Hat operating system had caused the problem. The issue proved to be a good lesson: a tape backup copy is only useful if you can actually restore it!

Question for you. When was the last time you tried to restore a large quantity of data from tape?

After some head scratching, patching and a review of the related systems, a second attempt at restoring all the TIFF content from tape commenced and this time all went well and the files were restored to the network storage. Hurrah!

JHOVE to validate those TIFFs

I decided that for the initial validation of the TIFF files – checking the files were well-formed and valid – JHOVE would provide a good baseline report.

As I mentioned in another blog post, Customizable JHOVE TIFF output handler anyone?, JHOVE’s XML output is rather unwieldy, so I planned to transform the XML using xsltproc (a command line XSLT processor) with a custom XSLT stylesheet, allowing us to select any attributes of the file which we might want to report on later. This would then produce a simple CSV output.
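A minimal sketch of that per-file pipeline might look like the following. The stylesheet name (tiff-to-csv.xsl) is a placeholder for the custom XSLT, and jhove and xsltproc need to be on the PATH for the script to actually run.

```shell
# Sketch: one TIFF in, one CSV row out. Stylesheet name is a placeholder.
cat > jhove_to_csv.sh <<'EOF'
#!/usr/bin/env bash
set -e
tiff="$1"

# Run JHOVE with the TIFF module and XML output handler, then flatten
# the unwieldy XML into a single CSV row via the custom stylesheet.
jhove -m TIFF-hul -h XML "$tiff" | xsltproc tiff-to-csv.xsl -
EOF
chmod +x jhove_to_csv.sh
```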

On a side note, work on adding a CSV output handler to JHOVE is in progress! This would mean the above process would be much simpler and quicker.

Parallel processing for the win.

What’s better than one JHOVE process validating TIFF content? Two! (Well, actually for us, sixteen at once works out quite nicely.)

It was clear from some initial testing with a 10,000-file sample set that a single JHOVE process was going to take a long time to process 520,000+ images (around two and a half days!).

So I started to look for a simple way to run many JHOVE processes in parallel. Using GNU Parallel seemed like a good way to go.

I created a command-line BASH script which would take a list of directories to scan and then utilise GNU Parallel to fire off many JHOVE + XSLT processes, resulting in a CSV output with one line per TIFF file processed.
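The shape of that wrapper script was roughly as follows. This is a sketch rather than the production script: the stylesheet name and the choice of 16 jobs are placeholders, and jhove, xsltproc and GNU Parallel are all assumed to be installed.

```shell
# Sketch: directory list in, one CSV row per TIFF out. Stylesheet name
# and job count are placeholders.
cat > validate_tiffs.sh <<'EOF'
#!/usr/bin/env bash
set -e
dirlist="$1"        # text file listing one directory of TIFFs per line

# Find every TIFF under the listed directories, then fan the files out
# across 16 parallel JHOVE+XSLT jobs; each job prints one CSV line.
xargs -a "$dirlist" -I{} find {} -type f -name '*.tif*' |
  parallel -j 16 "jhove -m TIFF-hul -h XML {} | xsltproc tiff-to-csv.xsl -" \
  > results.csv
EOF
```

GNU Parallel handles the queueing: it keeps 16 jobs running at all times and serialises their output, so lines from different jobs never interleave mid-row in the CSV.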

As our validation server was virtualised, I could scale the memory and CPU cores in the machine to do some performance testing. Below is a chart showing the number of images the parallel processing system could handle per minute vs. the number of CPU cores enabled on the virtual server. (For all of the testing, the memory in the server remained at 4 GB.)

So with 16 CPU cores, the estimate was that it would take around 6-7 hours to process all the Polonsky TIFF content – a nice improvement on a single process.

At the start of this week, I ran a full production test, validating all 520,000+ TIFF files. Four and a half hours later, the process was complete and a 100 MB+ CSV file had been generated with 520,000+ rows of data. Success!

For Part Three of this story I will write up how I plan to visualise the CSV data in Qlik Sense and the further analysis of those few files which failed the initial validation.

Visit to the Parliamentary Archives: Training and business cases

Edith Halvarsson, Policy and Planning Fellow at Bodleian Libraries, writes about the DPOC project’s recent visit to the Parliamentary Archives.


This week the DPOC fellows visited the Parliamentary Archives in London. Thank you very much to Catherine Hardman (Head of Preservation and Access), Chris Fryer (Digital Archivist) and Grace Bell (Digital Preservation Trainee) for having us. Shamefully I have to admit that we have been very slow to make this trip; Chris first invited us to visit all the way back in September last year! However, our tardiness to make our way to Westminster was in the end aptly timed with the completion of year one of the DPOC project and planning for year 2.

Like CUL and Bodleian Libraries, the Parliamentary Archives began their own digital preservation project back in 2010. Their project has since transitioned into digital preservation in a more programmatic capacity, as of 2015. As CUL and Bodleian Libraries will begin drafting business cases for moving from project to programme in year 2, meeting with Chris and Catherine was a good opportunity to talk about how you start making that tricky transition.

Of course, every institution has its own drivers and risks which influence business cases for digital preservation, but there are certain things which will sound familiar to a lot of organisations. For example, what Parliamentary Archives have found over the past seven years, is that advocacy for digital collections and training staff in digital preservation skills is an ongoing activity. Implementing solutions is one thing, whereas maintaining them is another. This, in addition to staff who have received digital preservation training eventually moving on to new institutions, means that you constantly need to stay on top of advocacy and training. Making “the business case” is therefore not a one-off task.

Another central challenge in terms of building business cases, is how you frame digital preservation as a service rather than as “an added burden”. The idea of “seamless preservation” with no human intervention is a very appealing one to already burdened staff, but in reality workflows need to be supervised and maintained. To sell digital preservation, that extra work must therefore be perceived as something which adds value to collection material and the organisation. It is clear that physical preservation adds value to collections, but the argument for digital preservation can be a harder sell.

Catherine had, however, some encouraging comments on how we can attempt to turn advice about digital preservation into something which is perceived as value-adding. Being involved with and talking to staff early on in the design of new project proposals – rather than as an extra add-on after processes are already in place – is an example of this.

Image by James Mooney

All in all, it has been a valuable and encouraging visit to the Parliamentary Archives. The DPOC fellows look forward to keeping in touch – particularly to hear more about the great work the Parliamentary Archives have been doing to provide digital preservation training to staff!

What is holding us back from change?

There are worse spots for a meeting. Oxford. Photo by: S. Mason

Every 3 months the DPOC team gets together in person in either Oxford, Cambridge or London (there’s also been talk of holding a meeting at Bletchley Park sometime). As this is a collaborative effort, these meetings offer a rare opportunity to work face-to-face instead of via Skype, with its endless issues around screen sharing and poor connections. Good ideas come when we get to sit down together.

As our next joint board meeting is next week, it was important to look over the work of the past year and make sure we are happy with the plan for year two. Most importantly, we wanted to discuss the messages we need to give our institutions as we look towards the sustainability of our digital preservation activities. How do we ensure that the earlier work, and the work being done by us now, does not get repeated in 2-5 years’ time?

Silos in institutions

This is especially complicated when dealing with institutions like Oxford and Cambridge. We are big, old institutions with teams often working in silos. What does siloing affect? Well, everything. Communication, effort, research—it all suffers. Work done previously is done again. Over and over.

The same problems are being tackled within different silos; this is duplicated and wasted effort if they are not communicating their work to each other. This means that digital preservation efforts can be fractured and imbalanced if institutional collaboration is ignored. We have an opportunity and responsibility in this project to get people together and to get them to talk openly about the digital preservation problems they are each trying to tackle.

Managers need to lead the culture change in the institution

While not always the case, it is important that managers do not just sit back and say “you will never get this to work” or “it has always been this way.” We need them on our side; they are often the gatekeepers of silos. We have to bring them together in order to start opening the silos.

It is within their power to be agents of change; we have to empower them to believe in changing the habits of our institutions. If managers believe digital preservation is worth it, their teams will too.

This might be the ‘carrot and stick’ approach or the ‘carrot’ only, but whatever approach is used, there are a number of points we agreed needed to be made clear:

  • our digital collections are significant and we have made assurances about their preservation and long term access
  • our institutional reputation plays a role in the preservation of our digital assets
  • digital preservation is a moving target and we must be moving with it
  • digital preservation will not be “solved” through this project, but we can make a start; it is important that the work does not end there.

Roadmap to sustainable digital preservation

Backing up any messages is the need for a sustainable roadmap. If you want change to succeed and if you want digital preservation to be a core activity, then steps must be actionable and incremental. Find out where you are, where you want to go, and then outline the timeline of steps it will take to get there. Consider using maturity models to set goals for your roadmap, such as Kenney and McGovern’s, Brown’s or the NDSA model. Each is slightly different and some might be more suitable for your institution than others, so have a look at all of them.

It’s like climbing a mountain. I don’t look at the peak as I walk; it’s too far away and too unattainable. Instead, I look at my feet and the nearest landmark. Every landmark I pass is a milestone and I turn my attention to the next one. Sometimes I glance up at the peak, still in the distance—over time it starts to grow closer. And eventually, my landmark is the peak.

It’s only when I get to the top that I see all of the other mountains I also have to climb. And so I find my landmarks and continue on. Digital preservation is a bit like that.

What are your suggestions for breaking down the silos and getting fractured teams to work together? 

Operational Pragmatism in Digital Preservation: a discussion

From Somaya Langley, Policy and Planning Fellow at Cambridge: In September this year, six digital preservation specialists from around the world will be leading a panel and audience discussion. The panel is titled Operational Pragmatism in Digital Preservation: establishing context-aware minimum viable baselines. This will be held at the iPres International Digital Preservation Conference in Kyoto, Japan.


Panellists

Panellists include:

  • Dr. Anthea Seles – The National Archives, UK
  • Andrea K Byrne – Rensselaer Polytechnic Institute, USA
  • Dr. Dinesh Katre – Centre for Development of Advanced Computing (C-DAC), India
  • Dr. Jones Lukose Ongalo – International Criminal Court, The Netherlands
  • Bertrand Caron – Bibliothèque nationale de France
  • Somaya Langley – Cambridge University Library, UK

Panellists have been invited based on their knowledge of a breadth of digital creation, archiving and preservation contexts and practices including having worked in non-Western, non-institutional and underprivileged communities.

Operational Pragmatism

What does ‘operational pragmatism’ mean? For the past year or two I’ve been pondering ‘what corners can we cut’? For over a decade I have witnessed an increasing amount of work in the digital preservation space, yet I haven’t seen the increase in staffing and resources to handle this work. Meanwhile deadlines for transferring digital (and analogue audiovisual) content from carriers are just around the corner (e.g. Deadline 2025).

Panel Topic

Outside of the First World and national institutional/top-tier university context, individuals in the developing world struggle to access basic technology and resources to be able to undertake archiving and preservation of digital materials. Privileged First World institutions (who still struggle with deeply ingrained under-resourcing) are considering Trusted Digital Repository certification, while in the developing world meeting these standards is just not feasible. (Evidenced by work that has taken place in the POWRR project and Anthea Seles’ PhD thesis and more.)

How do we best prioritise our efforts so we can plan effectively (with the current resources we have)? How do we strategically develop these resources in methodical ways while ensuring the critical digital preservation work gets done before it is simply too late?

Approach

This panel discussion will take the form of a series of provocations addressing topics including: fixity, infrastructure and storage, preconditioning, pre-ingest processes, preservation metadata, scalability (including bi-directional scalability), technical policies, tool error reporting and workflows.

Each panellist will present their view on a different topic. Audience involvement in the discussion will be strongly encouraged.

Outcomes

The intended outcome is a series of agreed-upon ‘baselines’ tailored to different cultural, organisational and contextual situations, with the hope that these can be used for digital preservation planning and strategy development.

Further Information

The Panel Abstract is included below.

iPres Digital Preservation Conference program information can be found at: https://ipres2017.jp/program/.

We do hope you’ll be able to join us.


Panel Abstract

Undertaking active digital preservation, holistically and thoroughly, requires substantial infrastructure and resources. National archives and libraries across the Western world have established, or are working towards maturity in digital preservation (often underpinned by legislative requirements). On the other hand, smaller collectives and companies situated outside of memory institution contexts, as well as organisations in non-Western and developing countries, are struggling with the basics of managing their digital materials. This panel continues the debate within the digital preservation community, critiquing the development of digital preservation practices typically from within positions of privilege. Bringing together individuals from diverse backgrounds, the aim is to establish a variety of ‘bare minimum’ baselines for digital preservation efforts, while tailoring these to local contexts.

Six Priority Digital Preservation Demands

Somaya Langley, Cambridge Policy and Planning Fellow, talks about her top 6 demands for a digital preservation system.


Photo: Blazej Mikula, Cambridge University Library

As a former user of one digital preservation system (Ex Libris’ Rosetta), I have spent a few years frustrated by the gap between what activities need to be done as part of a digital stewardship end-to-end workflow – including packaging and ingesting ‘information objects’ (files and associated metadata) – and the maturity level of digital preservation systems.

Digital Preservation Systems Review

At Cambridge, we are looking at different digital preservation systems and what each one can offer. This has involved talking to both vendors and users of systems.

When I’m asked what my top current or future digital preservation system requirements are, it’s excruciatingly hard to limit myself to a handful of things. However, having previously been involved in a digital preservation system implementation project, there are some high-level takeaways from past experiences that remain with me.

Shortlist

Here’s the current list of my six top ‘digital preservation demands’ (aka user requirements):

Integration (with various other systems)

A digital preservation ‘system’ is only one cog within a much larger machine; one piece of a much larger puzzle. There is an entire ‘digital ecosystem’ that this ‘system’ should exist within, and end-to-end digital stewardship workflows are of primary importance. The right amount of metadata and/or files should flow from one system to another. We must also know where the ‘source of truth’ is for each bit.

Standards-based

This seems like a no-brainer. We work in Library Land. Libraries rely on standards. We also work with computers and other technologies that require standard ways of communicating (protocols and so on).

For files and metadata to flow from one system to another – whether via import, ingest, export, migration or an exit strategy from a system – we already spend a bunch of time creating mappings and crosswalks from one standard (or implementation of a standard) to another. If we don’t use (or fully implement) existing standards, this means we risk mangling data, context or meaning; potentially losing or not capturing parts of the data; or just wasting a whole lot of time.
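The mapping work described above can be sketched in a few lines. This is a minimal, hypothetical example – the field names are illustrative, not a complete Dublin Core-to-MODS crosswalk – but it shows the key design point: anything that falls outside the mapping should be flagged, not silently dropped, since that is exactly how data, context or meaning gets mangled or lost.

```python
# Hypothetical sketch of a metadata field crosswalk. The mapping below
# is illustrative only, not a full mapping between real standards.
DC_TO_MODS = {
    "title": "titleInfo/title",
    "creator": "name/namePart",
    "date": "originInfo/dateIssued",
}

def crosswalk(record, mapping):
    """Map source fields to target fields, collecting anything
    unmapped so it is flagged rather than silently dropped."""
    mapped, unmapped = {}, {}
    for field, value in record.items():
        if field in mapping:
            mapped[mapping[field]] = value
        else:
            unmapped[field] = value  # risk of losing context or meaning
    return mapped, unmapped

mapped, unmapped = crosswalk(
    {"title": "Annual Report", "creator": "J. Smith", "format": "PDF"},
    DC_TO_MODS,
)
```

Keeping the `unmapped` remainder visible is what turns a lossy export into an auditable one.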

Error Handling (automated, prioritised)

There’s more work to be done in managing digital materials than there are people to do it. Content creation is increasing at an exponential rate, but the number of staff (with the right skills) isn’t keeping pace. We have to be smart about how we work. This requires prioritisation.

We need to have smarter systems that help us. This includes helping to prioritise where we focus our effort. Digital preservation systems are increasingly incorporating new third-party tools. We need to know which tool reports each error and whether these errors are show-stoppers or not. (For example: is the content no longer renderable versus a small piece of non-critical descriptive metadata that is missing?) We have to accept that, for some errors, we will never get around to addressing them.
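The triage logic described above can be sketched simply. This is an assumption-laden illustration (the error codes and severity labels are invented, not taken from any real preservation system or vendor API), but it captures the point: renderability failures should surface before minor metadata gaps, and unrecognised errors should be queued for review rather than ignored.

```python
# Illustrative sketch only: error codes and severities are assumptions,
# not a real digital preservation system's error vocabulary.
SEVERITY = {
    "render_failure": "critical",        # content no longer renderable
    "checksum_mismatch": "critical",
    "missing_descriptive_field": "low",  # non-critical metadata gap
}

def triage(errors):
    """Sort tool-reported errors so show-stoppers come first; unknown
    codes are treated as needing review rather than being ignored."""
    rank = {"critical": 0, "review": 1, "low": 2}
    return sorted(errors, key=lambda e: rank[SEVERITY.get(e["code"], "review")])

queue = triage([
    {"tool": "validator", "code": "missing_descriptive_field"},
    {"tool": "renderer", "code": "render_failure"},
])
```

The low-severity tail of such a queue is where we accept that some errors may simply never be addressed.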

Reporting

We need to be able to report to different audiences. The different reporting classes include (but are not limited to):

  1. High-level reporting – annual reports, monthly reports, reports to managers, projections, costings etc.
  2. Collection and preservation management reporting – reporting on successes and failures, overall system stats, rolling checksum verification etc.
  3. Reporting for preservation planning purposes – based on preservation plans, we need to be able to identify subsections of our collection (configured around content types, context, file format and/or whatever other parameters we choose to use) and report on potential candidates that require some kind of preservation action.
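The rolling checksum verification mentioned in the second class can be sketched as below. This is a minimal illustration, assuming a stored manifest of SHA-256 values and a pluggable storage reader (both hypothetical); a real system would stream large files and spread verification over time.

```python
# Minimal sketch of checksum verification feeding a management report.
# The manifest and storage reader are hypothetical stand-ins.
import hashlib

def verify(manifest, read_bytes):
    """Compare stored SHA-256 values against freshly computed ones and
    collect successes and failures for collection-management reporting."""
    report = {"ok": [], "failed": []}
    for path, stored in manifest.items():
        actual = hashlib.sha256(read_bytes(path)).hexdigest()
        (report["ok"] if actual == stored else report["failed"]).append(path)
    return report

# Fake in-memory files standing in for the storage layer.
files = {"a.tif": b"image-bytes", "b.pdf": b"doc-bytes"}
manifest = {
    "a.tif": hashlib.sha256(b"image-bytes").hexdigest(),
    "b.pdf": hashlib.sha256(b"tampered").hexdigest(),  # simulated corruption
}
report = verify(manifest, lambda p: files[p])
```

The `failed` list is what turns a background fixity job into an actionable report for preservation management.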

Provenance

We need to record – via metadata – where a file has come from. This, for want of a better approach, is currently being handled by the digital preservation community through documenting changes as Provenance Notes. Digital materials acquired into our collections are not just the files, they’re also the metadata. (Hence why I refer to them as ‘information objects’.) When an ‘information object’ has been bundled and is ready to be ingested into a system, I think of it as becoming an ‘information package’.

There’s a lot of metadata (administrative, preservation, structural, technical) that appears along the path from an object’s creation until the point at which it becomes an ‘information package’. We need to ensure we’re capturing and retaining the important components of this metadata. Those components we deem essential must travel alongside their associated files into a preservation system. (Not all files will have any, or even the right, metadata embedded within the file itself.) Standardised ways of handling the information held in Provenance Notes (whether created ‘outside of the system’ or by the digital preservation system itself) and event information, so both can be interrogated and reported on, are crucial.
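One way to make provenance interrogable rather than free-text-only is to record each note as a structured event. The sketch below is loosely modelled on PREMIS-style event metadata, but the field names are my assumptions for illustration, not the standard’s actual element names.

```python
# Hedged sketch: a provenance note captured as a structured event so it
# can be queried and reported on. Field names are illustrative only.
from datetime import datetime, timezone

def provenance_event(object_id, event_type, agent, note):
    """Record who did what to which information object, and when."""
    return {
        "object_id": object_id,
        "event_type": event_type,  # e.g. "ingest", "migration"
        "agent": agent,            # person, department or system
        "note": note,              # the human-readable provenance note
        "datetime": datetime.now(timezone.utc).isoformat(),
    }

event = provenance_event(
    "uuid-1234", "migration", "preservation-system",
    "Migrated from WordPerfect 5.1 to PDF/A-2b",
)
```

A list of such events per object is trivially filterable by type, agent or date, which is exactly the kind of interrogation and reporting the free-text note alone cannot support.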

Managing Access Rights

Facilitating access is not black and white. Collections are not simply ‘open’ or ‘closed’. We have a myriad of ways that digital material is created and collected; we need to ensure we can provide access to this content in a variety of ways that support both the content and our users. This can include access from within an institution’s building, via a dedicated computer terminal, online access to anyone in the world, mediated remote access, access to only subsets of a collection, support for embargo periods, ensuring we respect cultural sensitivities or provide access to only the metadata (perhaps as large datasets) and more.

We must set a goal of working towards providing access to our users in the many different (and new-ish) ways they actually want to use our content.

It’s imperative to keep in mind the whole purpose of preserving digital materials is to be able to access them (in many varied ways). Provision of content ‘viewers’ and facilitating other modes of access (e.g. to large datasets of metadata) are essential.

Final note: I never said addressing these concerns was going to be easy. We need to factor each in and make iterative improvements, one step at a time.