PASIG 2017: honest reflections from a trainee digital archivist

A guest blog post by Kelly, one of the Bodleian Libraries’ graduate digital archivist trainees, on what she learned as a volunteer and attendee of PASIG 2017 Oxford.


Amongst the digital preservation professionals from almost every continent and 130 institutions, myself and my 5 traineeship colleagues were amongst the lecture theatre seats, annexe demos and the awesome artefacts at the Museum of Natural History for PASIG 2017, Oxford. It was a brilliant opportunity at just 6 months into our traineeship to not only apply some of our new knowledge to work at Special Collections, Bodleian Libraries, but we were also able to gain a really current and relevant insight to theories we have been studying as part of our long distance MSc in Digital Curation at Aberystwyth University. The first ‘Bootcamp’ day was exactly what I needed to throw myself in, and it really consolidated my confidence in my understanding of some aspects of the shared language that is used amongst the profession (fixity checks, maturity models…as well as getting to grips with submission information packages, dissemination information packages and everything that occurs in between!).

My pen didn’t stop scribbling all three days, except maybe for tea breaks. Saying that, the demo presentations were also a great time for myself and other trainees to ask questions specifically about workflows and benefits of certain software such as LibNova, Preservica and ResourceSpace.

For want of a better word (and because it really is the truth) PASIG 2017 was genuinely inspiring and there were messages delivered so powerfully I hope that I stay grounded in these for my entire career. Here is what I was taught:

The Community is invaluable. Many of the speakers were quick to assert that sharing practice amongst the digital preservation community is key. This is a value I was familiar with, yet witnessing it happening throughout the conference in such a sincere manner. I can assure you the gratitude and affirmation that followed Eduardo del Valle, University of the Balearic Islands and his presentation: “Sharing my loss to protect your data: A story of unexpected data loss and how to do real preservation” was as encouraging to witness as someone new to the profession as it was to all of the other experienced delegates present. As well as sharing practice, it was clear that the community need to be advocating on behalf of each other. It is time and resource consuming but oh-so important.

Digital archives are preserving historical truths. Yes, the majority of the workflow is technological but the objectives and functions are so much more than technology; to just reduce digital preservation down to this is an oversimplification. It was so clear that the range of use cases presented at PASIG were all driven towards documenting social, political, historical information (and preserving that documentation) that will be of absolute necessity for society and infrastructure in future. Right now, for example, Angeline Takewara and her colleagues at UN MICT are working on a digital preservation programme to ensure absolute accountability and usability of the records of the International Criminal Tribunals of both Rwanda and Yugoslavia. I have written a more specific post on Angeline’s presentation here.

Due to the nature of technology and the digital world, the goalposts will always be moving. For example, Somaya Langley’s talk on the future of digital preservation and the mysteries of extracting data from smart devices will soon become (and maybe already is) a reality for those working with accessions of archives or information management. We should, then, embrace change and embrace the unsure and ultimately ‘get over the need for tidiness’ as pointed out by John Sheridan from The National Archives during his presentation “Creating and sustaining a disruptive digital archive” . This is usually counter-intuitive, but as the saying goes, one of the most dangerous phrases to use is ‘we’ve always done it that way’.

The value of digital material outlives the software, so the enabling of prolonged use of software is a real and current issue. Admittedly, this was a factor I had genuinely not even considered before. In my brain I linked obsolescence with hardware and hardware only. Therefore,  Dr. Natasa Milic-Frayling’s presentation on “Aging of Digital: Managed Services for digital continuity” shed much light on the changing computing ecosystem and the gradual aging of software. What I found especially interesting about the proposed software-continuity plan was the transparency of it; the fact that the client can ask to see the software at any time whilst it is being stabilised and maintained.

Thank you so much PASIG 2017 and everybody involved!

One last thing…in closing, Cliff Lynch, CNI, bought up that there was comparably less Web Archiving content this year. If anybody fancies taking a trainee to Mexico next year to do a (lightning) talk on Bodleian Libraries’ Web Archive I am keen…

 

 

Computers are the apogee of profligacy: a response to THE most important PASIG 2017 presentations

Following the PASIG conference, Cambridge Technical Fellow Dave Gerrard couldn’t simply wait to fire off his thoughts on the global context of digital preservation and how we need to better consider the world around us to work on a global solution and not just one that suits capitalist agenda. We usually preface these blogs with “enjoy” but in this instance, please, find a quiet moment, make yourself comfortable, read on and contemplate the global issues presented passionately presented here.


I’m going to work on a more technical blog about PASIG later, but first I want to get this one off my chest. It’s about the two most important presentations: Angeline Takawira’s Digital preservation at the United Nations Mechanism for International Criminal Tribunals and Keep your eyes on the information, Patricia Sleeman’s discussion of preservation work at the UN Refugee Agency (UNHCR).

Angeline Takawira described, in a very precise and formal manner, how the current best practice in Digital Preservation is being meticulously applied to preserving information from UN war crimes tribunals in The Hague (covering the Balkan conflict) and Arusha, Tanzania (covering the Rwandan genocide). As befitted her work, it was striking how calm Angeline was; how well the facts were stuck to, despite the emotive context. Of course, this has to be the case for work underpinning legal processes: intrusion of emotion into the capture of facts could let those trying to avoid justice escape it.

And the importance of maintaining a dispassionate outlook was echoed in the title of the other talk. “Keep your eyes on the information” was what Patricia Sleeman was told when learning to work with the UNHCR, as to engage too emotionally with the refugee crisis could make vital work impossible to perform. However, Patricia provided some context, in part by playing Head Over Heels, (Emi Mahmoud’s poem about the conflict and refugee crisis in Darfur), and by describing the brave, inspirational people she had met in Syria and Kurdistan. An emotionless response was impossible: the talk resulted in the conference’s longest and loudest applause.

Indeed, I think the audience was so stunned by Patricia’s words that questions were hard to formulate. However, my colleague Somaya at least asked the $64,000 one: how can we help? I’d like to tie this question back to one that Patricia raised in her talk, namely (and I paraphrase here): how do you justify expenditure on tasks like preservation when doing so takes food from the mouths of refugees?

So, now I’m less stunned, here’s my take: feeding refugees solves a symptom of the problem. Telling their stories helps to solve the problem, by making us engage our emotions, and think about how our lives are related to theirs, and about how we behave impacts upon them. And how can we help? Sure, we can help Patricia with her data management and preservation problems. But how can we really contribute to a solution? How can we stop refugee crises occurring in the first place?

We have a responsibility to recognise the connections between our own behaviour and the circumstances refugees find themselves in, and it all comes down, of course, to resources, and the profligate waste of them in the developed world. Indeed, Angeline and Patricia’s talks illustrated the borderline absurdity of a bunch of (mostly) privileged ‘Westerners’ / ‘Northerners’ (take your pick) talking about the ‘preservation’ of anything, when we’re products of a society that’s based upon throwing everything away.

And computers / all things ‘digital’ are at the apogee of this profligacy: Natasa Milic-Frayling highlighted this when she (diplomatically) referred to the way in which the ‘innovators’ hold all the cards, currently, in the relationship with ‘content producers’, and can hence render the technologies upon which we depend obsolete across ever-shorter cycles. Though, after Patricia’s talk, I’m inclined to frame this more in terms of ‘capitalist industrialists generating unnecessary markets at the expense of consumers’; particularly given that, while we were listening to Patricia, the latest iPhone was being launched in the US.

Though, of course, it’s not really the ‘poor consumers’ who genuinely suffer due to planned obsolescence… That would be the people in Africa and the Middle East whose countries are war zones due to grabs for oil or droughts caused by global warming. As the world’s most advanced tech companies, Apple, Google, Facebook, Amazon, Microsoft et al are the biggest players in a society that – at best indirectly, at worst carelessly – causes the suffering of the people Patricia and Angeline are helping and providing justice for. And, as someone typing a blog post using a Macbook Pro that doesn’t even let me add a new battery – I’m clearly part of the problem, not the solution.

So – in answer to Somaya’s question: how can we help? Well, for a start, we can stop fetishising the iPhone and start bigging up Fairphone and Phonebloks. However, keeping the focus on Digital Preservation, we’ve got to be really careful that our efforts aren’t used to support an IT industry that’s currently profligate way beyond moral acceptability. So rather than assuming (as I did above) that all the ‘best-practice’ of digital preservation flows from the ‘developed’ (ahem) world to the ‘developing’, we ought to seek some lessons in how to preserve technology from those who have fewer opportunities to waste it.

Somaya’s already on the case with her upcoming panel at iPres on the 28th September: Then we ought to continue down the road of holding PASIG in Mexico City next year by holding one in Africa as soon as possible. As long as – when we’re there, we make sure we shut up and listen.

PASIG 2017 Twitter round-up

After many months of planning it feels quite strange to us that PASIG 2017 is over. Hosting the PASIG conference in Oxford has been a valuable experience for the DPOC fellows and a great chance for Bodleian Libraries’ staff to meet with and listen to presentations by digital preservation experts from around the world.

In the end 244 conference delegates made their way to Oxford and the Museum of Natural History. The delegates came from 130 different institutions and every continent of the world was represented (…well, apart from Antarctica).

What was especially exciting though were all the new faces. In fact 2/3 of the delegates this year had not been to a PASIG conference before! Is this perhaps a sign that interest in digital preservation is on the rise?

As always at PASIG, Twitter was ablaze with discussion in spite of an at times flaky Wifi connection. Over three days #PASIG17 was mentioned a whopping 5300 times on Twitter and had a “reach” of 1.7 million. Well done everyone on some stellar outreach! Most active Twittering came from the UK, USA and Austria.

Twitter activity by country using #PASIG17 (Talkwalker statistics)

Although it is hard to choose favourites among all the Tweets, a few of the DPOC project’s personal highlights included:

Cambridge Fellow Lee Pretlove lists “digital preservation skills” and why we cannot be an expert in all areas. Tweet by Julian M. Morley

Bodleian Fellow James makes some insightful observations about the incompatibility between tar pits and digital preservation.

Cambridge Fellow Somaya Langley presents in the last PASIG session on the topic of “The Future of Digital Preservation”.  

What were some of your favourite talks and Twitter conversations? What would you like to see more of at PASIG 2018? #futurePASIG

Digital Preservation futurology

I fancy attempting futurology, so here’s a list of things I believe could happen to ‘digital preservation systems’ over the next decade. I’ve mostly pinched these ideas from folks like Dave Thompson, Neil Jefferies, and my fellow Fellows. But if you see one of your ideas, please claim it using the handy commenting mechanism. And because it’s futurology, it doesn’t have to be accurate, so kindly contradict me!

Ingest becomes a relationship, not a one-off event

Many of the core concepts underpinning how computers are perceived to work are crude, paper-based metaphors – e.g. ‘files’, ‘folders’, ‘desktops’, ‘wastebaskets’ etc – that don’t relate to what your computer’s actually doing. (The early players in office computing were typewriter and photocopier manufacturers, after all…) These metaphors have succeeded at getting everyone to use computers, but they’ve also suppressed various opportunities to work smarter, too.

The concept of ingesting (oxymoronic) ‘digital papers’ is obviously heavily influenced by this paper paradigm.  Maybe the ‘paper paradigm’ has misled the archival community about computers a bit, too, given that they were experts at handling ‘papers’ before computers arrived?

As an example of what I mean: in the olden days (25 whole years ago!), Professor Plum would amass piles of important papers until the day he retired / died, and then, and only then, could these personal papers be donated and archived. Computers, of course, make it possible for the Prof both to keep his ‘papers’ where he needs them, and donate them at the same time, but the ‘ingest event’ at the centre of current digital preservation systems still seems to be underpinned by a core concept of ‘piles of stuff needing to be dealt with as a one-off task’. In future, the ‘ingest’ of a ‘donation’ will actually become a regular, repeated set of occurrences based upon ongoing relationships between donors and collectors, and forged initially when Profs are but lowly postgrads. Personal Digital Archiving and Research Data Management will become key; and ripping digital ephemera from dying hard disks will become less necessary as they become so.

The above depends heavily upon…

Object versioning / dependency management

Of course, if Dr. Damson regularly donates materials from her postgrad days onwards, some of these may be updates to things donated previously. Some of them might have mutated so much since the original donation that they can be considered ‘child’ objects, which may have ‘siblings’ with ‘common ancestors’ already extant in the archive. Hence preservation systems need to manage multiple versions of ‘digital objects’, and the relationships between them.

Some of the preservation systems we’ve looked at claim to ‘do versioning’ but it’s a bit clunky – just side-by-side copies of immutable ‘digital objects’, not records of the changes from one version to the next, and with no concept of branching siblings from a common parent. Complex structures of interdependent objects are generally problematic for current systems. The wider computing world has been pushing at the limits of the ‘paper-paradigm’ immutable object for a while now (think Git, Blockchain, various version control and dependency management platforms, etc). Digital preservation systems will soon catch up.

Further blurring of the object / metadata boundary

What’s more important, the object or the metadata? The ‘paper-paradigm’ has skewed thinking towards the former (the sacrosanct ‘digital object’, comparable to the ‘original bit of paper’), but after you’ve digitised your rare book collection, what are Humanities scholars going to text-mine? It won’t be images of pages – it’ll be the transcripts of those (i.e. the ‘descriptive metadata’)*. Also, when seminal papers about these text mining efforts are published, how is this history of the engagement with your collection going to be recorded? Using a series of PREMIS Events (that future scholars can mine in turn), perhaps?

The above talk of text mining and contextual linking of secondary resources raises two more points…

* While I’m here, can I take issue with the term ‘descriptive metadata’? All metadata is descriptive. It’s tautological; like saying ‘uptight Englishman’. Can we think of a better name?

Ability to analyse metadata at scale

‘Delivery’ no longer just means ‘giving users a viewer to look at things one-by-one with’ – it now also means ‘letting people push their Natural Language or image processing algorithms to where the data sits, and then coping with vast streams of output data’.

Storage / retention informed by well-understood usage patterns

The fact that everything’s digital, and hence easier to disseminate and link together than physical objects, also means better understanding how people use our material. This doesn’t just mean ‘wiring things up to Google Analytics’ – advances in bibliometrics that add social / mainstream media analysis, and so forth, to everyday citation counts present opportunities to judge the impact of our ‘stuff’ on the world like never before. Smart digital archives will inform their storage management and retention decisions with this sort of usage information, potentially in fully or semi-automated ways.

Ability to get data out, cleanly – all systems are only ever temporary!

Finally – it’s clear that there are no ‘long-term’ preservation system options. The system you procure today will merely be ‘custodian’ of your materials for the next ten or twenty years (if you’re lucky). This may mean moving heaps of content around in future, but perhaps it’s more pragmatic to think of future preservation systems as more like ‘lenses’ that are laid on top of more stable data stores to enable as-yet-undreamt-of functionality for future audiences?

(OK – that’s enough for now…)

DPOC: 1 year on

Oxford’s Outreach & Training Fellow, Sarah, reflects on how the first year of the DPOC project has gone and looks forward to the big year ahead.


A lot can happen in a year.

A project can finally get a name, a website can launch and a year of auditing can finally reach completion. It has been a long year of lessons and finding things for the Oxford DPOC team.

While project DR@CO and PADLOC never got off the ground, we got the DPOC Project. And with it has come a better understanding of our digital preservation practices at Bodleian Libraries. We’re starting year two with plenty of informed ideas that will lead to roadmaps for implementation and a business case to help continue to move Oxford forward with a digital preservation programme.

Auditing our collections

For the past year, Fellows have been auditing the many collections. The Policy and Planning Fellow spent nearly 6 months tracking down the digitized content of Bodleian Libraries across tape storage and many legacy websites. There was more to be found on hard drives under desks, on network drives and CDs. What Edith found was 20 years of digitized images at Bodleian Libraries. From that came a roadmap and recommendations to improve storage, access and workflows. Changes have already been made to the digitization workflow (we use jpylyzer now instead of jhove) and more changes are in progress.

James, the Technical Fellow at Oxford, has been looking at validating and characterising the TIFFs we have stored on tape, especially the half a million TIFFs from the Polonsky Foundation Digitization Project. There were not only some challenges to recovering the files from tape to disk for the characterisation and validating process, but there was issue with customising the output from JHOVE in XML. James did find a workaround to getting the outputs into a reporting tool for assessment in the end, but not without plenty of trial and error. However, we’re learning more about our digitized collections (and the preservation challenges facing them) and during year 2 we’ll be writing more about that as we continue to roadmap our future digital preservation work.

Auditing our skills

I spoke to a lot of staff and ran an online survey to understand the training needs of Bodleian Libraries. It is clear that we need to develop a strong awareness about digital preservation and its fundamental importance to the long-term accessibility of our digital collections. We also need to create a strong shared language in order to have these important discussions; this is important when we are coming together from several different disciplines, each with a different language. As a result, some training has begun in order to get staff thinking about the risks surrounding the digital content we use every day, in order to later translate it into our collections. The training and skills gaps identified from the surveys done in year 1 will continue to inform the training work coming in year 2.

 

What is planned for year 2?

Now that we have a clearer picture of where we are and what challenges are facing us, we’ve been putting together roadmaps and risk registers. This is allowing us to look at what implementation work we can do in the next year to set us up for the work of the next 3, 5, 10, and 15 years. There are technical implementations we have placed into a roadmap to address the major risks highlighted in our risk register. This work is hopefully going to include things like implementing PREMIS metadata and file format validation. This work will prepare us for future preservation planning.

We also have a training programme roadmap and implementation timeline. While not all of the training can be completed in year 2 of the DPOC project, a start can be made and materials prepared for a future training programme. This includes developing a training roadmap to support the technical implementations roadmap and the overall digital preservation roadmap.

There is also the first draft of our digital preservation policy to workshop with key stakeholders and develop into a final draft. There are roles and responsibilities to review and key stakeholders to work with if we want to make sustainable changes to our existing workflows.

Ultimately, what we are working towards is an organisational change. We want more people to think about digital preservation in their work. We are putting forward sustainable recommendations to help develop an ongoing digital preservation programme. There is still a lot a work ahead of us — well beyond the final year of this project — but we are hoping that what we have started will keep going even after the project reaches completion.