Breaking through with Library Carpentry

Thursday 11th January saw Cambridge University Library’s annual conference take place. This year it was entitled ‘Breakthrough the Library’ and focused on cutting-edge innovation in libraries and archives. I can honestly say that this was the first conference I’ve been to where every single speaker I saw (including the ten or so who gave lightning talks) was absolutely excellent.

So it’s hard to pick the one that made the most impression. Of course, an honourable mention must go to the talk about Jasper the three-legged cat, but if I had to plump for the one most pertinent to moving Digital Preservation forward, I’d pick “Library Carpentry: software and data skills for librarian professionals” from Dr James Baker of the University of Sussex.

I’d heard of the term ‘Library Carpentry’ (and the initiatives it stems from – Software Carpentry and Data Carpentry) and thus had an idea what the talk was about on the way in. Their web presence explains things far better than I can, too (see https://librarycarpentry.github.io/), so I’m going to skip the exposition and make a different point…

As a full-blown, time-served nerd who’s clearly been embittered by 20 years in the IT profession (though I’m pleased to report, not as much as most of my long-term friends and colleagues!), I went into the talk with a bit of a pessimistic outlook. This was because, in my experience, there are three stages one passes through when learning IT skills:

  • Stage 1: I know nothing. This computer is a bit weird and confuses me.
  • Stage 2: I know EVERYTHING. I can make this computer sing and dance, and now I have the power to conquer the world.
  • Stage 3: … er – hang on… The computer might not have been doing exactly what I thought it was, after all… Ooops! What did I just do?

Stage 1 is just something you get through, if you want to (I have nothing but respect for happy Stage 1 dwellers, though). If so inclined, all it really takes is a bit of persistence and a dollop of enthusiasm. If you want to move on but think you might struggle, then have a go at this computer programming aptitude test from the University of Kent – you may be pleasantly surprised… In my own case, I got stuck there for quite a while until one day a whole pile of O Level algebra that had been lurking in my brain suddenly rose out of the murk, and that was that.

Stage 2 people, on the other hand, tend to be really dangerous… I have personally worked with quite a few well-paid developers who are stuck in Stage 2, and they tend to be the ones who drop all the bombs on your system. So the faster you can get through to Stage 3, the better. This was at the root of my concern, as one of the ideas of Library Carpentry is to pick up skills quickly, and then pass them on. But I needn’t have worried because…

When I asked Dr Baker about this issue, he reassured me that ‘questioning whether the computer has done what you expected’ is a core learning point that is central to Library Carpentry, too. He also declared the following (which I’m going to steal): “I make a point of only ever working with people with Impostor Syndrome”.

Hence it really does look as if getting to Stage 3 without even going through Stage 2 at all is what Library Carpentry is all about. I believe moves are afoot to get some of this good stuff going at Cambridge… I watch with interest and might even be able to find the time to join in..? I bet it’ll be fun.

Using ePADD with Josh Schneider

Edith, Policy and Planning Fellow at Bodleian Libraries, writes about her favourite features in ePADD (an open-source tool for email archives) and about how the tool aligns with digital preservation workflows.


At iPres a few weeks ago I had the pleasure of attending an ePADD workshop run by Josh Schneider from Stanford University Libraries. The workshop was, for me, one of the major highlights of the conference, as I have been keen to try out ePADD since first hearing about it at the DPC’s Email Preservation Day. I wrote a blog post about that event back in July, and have now finally taken the time to review ePADD using my own email archive.

ePADD is primarily a tool for appraisal and delivery rather than digital preservation. However, as a potential component in ingest workflows to an institutional repository, ensuring that email content retains its integrity while being processed in ePADD is paramount. The creators of ePADD are therefore thinking about how to enhance current features to make the tool fit better into digital preservation workflows. I will discuss these features later in the blog, but first I want to show some of ePADD’s capabilities. I can definitely recommend having a play with this tool yourself, as it is very addictive!

ePADD: Appraisal module dashboard

Josh, our lovely workshop leader, recommends that new ePADD users go home and try it on their own email collections. As you know your own material fairly well, it is a good way of learning about both what ePADD does well and where its limits lie. So I decided to feed my work emails from the past year into ePADD – and found some interesting trends in my own working patterns.

ePADD consists of four modules, although I will only be showing features from the first two in this blog:

Module 1: Appraisal (Module used by donors for annotation and sensitivity review of emails before delivering them to the archive)

Module 2: Processing (A module with some enhanced appraisal features, used by archivists to find additional sensitive information which may have been missed in the first round of appraisal)

Module 3: Discovery (A module which provides users with limited keyword searching for entities in the email archive)

Module 4: Delivery (This module provides more enhanced viewing of the content of the email archive – including a gallery for viewing images and other document attachments)

Note that ePADD only supports MBOX files, so if you are an Outlook user like myself you will first need to convert from PST to MBOX. Once you have created an MBOX file, setting up ePADD is fairly simple and quick. With the first ePADD module (“Appraisal”) up and running, processing my 1,500 emails and 450 attachments took around four minutes, including the natural language processing. ePADD recognises and indexes various “entities” – including persons, places and events – and presents these in a digestible way.

ePADD: Appraisal module processing MBOX file
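
As an aside for fellow Outlook users: the PST-to-MBOX conversion mentioned above can be scripted. The sketch below is just one possible way of doing it, assuming the readpst tool from the libpst package is installed (the file names and the wrapper function are my own illustrations, not part of ePADD):

# Minimal sketch: convert an Outlook PST export to mbox files before loading
# them into ePADD. Assumes 'readpst' (from the libpst package) is on the PATH;
# by default it writes one mbox-format file per mail folder into the output
# directory. File names below are purely illustrative.
import subprocess
from pathlib import Path

def pst_to_mbox(pst_file: str, out_dir: str) -> None:
    """Convert a PST file to mbox files (one per mail folder) using readpst."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(["readpst", "-o", out_dir, pst_file], check=True)

if __name__ == "__main__":
    pst_to_mbox("my_work_email.pst", "mbox_output")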

Looking at the entities recognised by ePADD, I was able to see who I have been speaking with (and about) during the past year. Some not-so-surprising figures popped up (such as my DPOC colleagues James Mooney and Dave Gerrard). Curiously, however, I also seem to have received a lot of messages about the “black spider” this year (it turns out they were emails from the Libraries’ Dungeons and Dragons group).

ePADD entity type: Person (some details removed)

The need to look more closely at the results of natural language processing became evident when I examined the “place entities” list in ePADD:

ePADD entity type: Place

San Francisco comes top of the list of places mentioned in my inbox. I was initially quite surprised by this result. Looking a bit closer, all 126 emails containing a mention of San Francisco turned out to be from Slack, the instant messaging service used by the DPOC team, which has its headquarters in San Francisco. Every email digest from Slack contains the head office address!

Another one of my favourite things about ePADD is its ability to track frequency of messages between email accounts. Below is a graph showing correspondence between myself and Sarah Mason (outreach and training fellow on the DPOC project). The graph shows that our peak period of emailing each other was during the PASIG conference, which DPOC hosted in Oxford at the start of September this year. It is easy to imagine how this feature could be useful to academics using email archives to research correspondence between particular individuals.

ePADD displaying correspondence frequency over time between two users

The last feature I want to talk about is “sensitivity review” in ePADD. Although I annotate personal data as I receive it, I thought that the one-year mark of the DPOC project would be a good time to run a second sensitivity review of my own email archive. Using ePADD’s “lexicon hits search” I was able to sift through a number of potentially sensitive emails. See the image below for the categories identified, which cover everything from employment to health. These all turned out to be false positives, but it is a feature I believe I will make use of again.

ePADD processing module: Lexicon hits for sensitive data

So now on to the digital preservation bit. Three preservation risks of using ePADD currently stand out to me.

1) For practical reasons, MBOX is currently the only email format supported by ePADD. If MBOX is not the preferred preservation format of an archive, it may end up running multiple migrations between email formats, resulting in progressive loss of data.

2) No checksums are generated when you download content from one ePADD module in order to copy it into the next. This could be an issue, as emails are copied multiple times without any monitoring of the integrity of the email archive files (a simple fixity sketch follows this list).

3) There is currently limited support for assigning multiple identifiers to archives in ePADD. This could become an issue when trying to aggregate email archives from different institutions: local identifiers could clash, and additional unique identifiers would then also be required.
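
To illustrate the second risk (and one possible mitigation), here is a minimal fixity sketch of the sort of check I have in mind: generate a SHA-256 manifest before moving content out of one ePADD module, then verify it after copying it into the next. The folder paths and function names are purely illustrative and are not part of ePADD itself:

# Minimal fixity sketch (not part of ePADD): build a SHA-256 manifest of a
# folder before copying it, then compare against the copy afterwards.
import hashlib
from pathlib import Path

def make_manifest(folder: str) -> dict:
    """Return {relative path: SHA-256 digest} for every file under 'folder'."""
    manifest = {}
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path.relative_to(folder))] = digest
    return manifest

def copy_is_intact(source_folder: str, copy_folder: str) -> bool:
    """True if every file in the copy matches the source, byte for byte."""
    return make_manifest(source_folder) == make_manifest(copy_folder)

if __name__ == "__main__":
    # Illustrative paths only -- substitute wherever your ePADD exports live.
    print(copy_is_intact("appraisal-export", "processing-import"))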

Note, however, that these concerns are already on the ePADD roadmap, so they are likely to be addressed or even solved within the next year.

To watch out for ePADD updates, or just have a play with your own email archive (it is loads of fun!), check out their:

Putting ‘stuff’ in ‘context’: deep thoughts triggered by PASIG 2017

Cambridge Technical Fellow, Dave, delves a bit deeper into the PASIG 2017 talks that really got him thinking about digital preservation and its complexity.


After a year of studying digital preservation, my thoughts are starting to coalesce, and the presentations at PASIG 2017 certainly helped that. (I’ve already discussed what I thought were the most important talks, so the ones below are some that stimulated my thinking about preservation in particular…)

The one that best matched my current thoughts on digital preservation generally was John Sheridan’s Creating and sustaining a disruptive digital archive. It was similar to a previous blog post, and to chats with fellow Fellow Lee too (some of which he’s captured in a blog post for the Digital Preservation Coalition). That is: computing’s ‘paper paradigm’ makes little sense in relation to preservation, hierarchical, neat information structures don’t hold together as well digitally, we’re going to need to compute across the whole archive, and, well, ‘digital objects’ just aren’t really material ‘objects’, are they?

An issue with thinking about digital ‘stuff’ too much in terms of tangible objects is that opportunities related to the fact the ‘stuff’ is digital can be missed. Matt Zumwalt highlighted one such opportunity in Data together: Communities & institutions using decentralized technologies to make a better web when he introduced ‘content addressing’: using cryptographic hashing and Directed Acyclic Graphs (in this case, information networks that record content changing as time progresses) to manage many copies of ‘stuff’ robustly.

This addresses some of the complexities of preserving digital ‘stuff’, but perhaps thinking in terms of ‘copies’, and not ‘branches’ or ‘forks’, is an oversimplification? Precisely because digital ‘stuff’ is rarely static, all ‘copies’ have the potential to deviate from the ‘parent’ or ‘master’ copy. What’s the ‘version of true record’ in all this? Perhaps there isn’t one? Matt referred to ‘immutable data structures’, but the concept of ‘immutability’ only really holds if we think it’s possible for data ever to be completely separated from its informational context, because the information does change, constantly. (Hold that thought.)

Switching topics, fellow Polonsky Somaya often tries to warn me just how complicated working with technical metadata can get. Well, the pennies dropped further during Managing digital preservation metadata at Sound and Vision: A case on matching OAIS and PREMIS with the DPX file format from Annemieke De Jong and Josefien Schuurman. Space precludes going into the same level of detail they did regarding building a Preservation Metadata Dictionary (PMD) about just one, ‘relatively’ simple file format – but let’s say, well, it’s really complicated. (They’ve blogged about it and the whole PMD is online too). The conclusion: preserving files properly means drilling down deep into their formats, but it also got me thinking – shouldn’t the essence of a ‘preservation file format’ be its simplicity?

The need for greater simplicity in preservation was further emphasised by Mathieu Giannecchini’s The Eclair Archive cinema heritage use case: Rising to the challenges of complex formats at large scale. Again, space precludes me from getting into detail, but the key takeaway was that Mathieu has 2 million reels of film to preserve using the Digital Cinema Distribution Master (DCDM) format, and after lots of good work he’s optimised the process to preserve 8TB a day (with a target of 15TB). Now, we don’t know how much film is on each reel, but assuming a (likely over-) estimate of 10 minutes per reel, that’s roughly 180,000 films of 1 hour 50 minutes in length. Based on Mathieu’s own figures, it’s going to take many decades, perhaps even a few hundred years, to get through all 2 million reels… So further, major optimisations are required, and I suspect DCDM (a format with a 155-page spec, which relies on TIFF, a format with a 122-page spec) might be one of the bottlenecks.
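
For anyone who wants to check my back-of-envelope sums, here is the arithmetic as a small sketch. The 10 minutes per reel is the (likely over-) estimate from the paragraph above; the terabytes-per-reel figure is purely hypothetical, included only to show how the timescale stretches out:

# Back-of-envelope arithmetic for the Eclair figures quoted above.
REELS = 2_000_000
MINUTES_PER_REEL = 10          # likely an over-estimate (from the text)
FILM_LENGTH_MINUTES = 110      # 1 hour 50 minutes

films = REELS * MINUTES_PER_REEL / FILM_LENGTH_MINUTES
print(f"Roughly {films:,.0f} feature-length films")        # ~181,818

TB_PER_REEL = 1                # hypothetical illustration only
for tb_per_day in (8, 15):     # current rate and target rate
    years = (REELS * TB_PER_REEL) / tb_per_day / 365
    print(f"At {tb_per_day} TB/day: about {years:,.0f} years")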

Of course, the trade-off with simplifying formats is that data will likely be ‘decontextualised’, so there must be a robust method for linking data back to context… Thoughts on this were triggered by Developing and applying principles for discovery and access for the UK Data Service by Katherine McNeill from the UK Data Archive, as Katherine discussed production of a next-generation access system based on a linked-data model with which, theoretically, single cells’ worth of data could be retrieved from research datasets.

Again, space precludes entering into the whole debate around re-using data stripped of its original context… Mauthner and Parry illustrate the two contrary sides well, and furthermore argue that merely entertaining the possibility of decontextualising data indicates a certain ‘foundational’ way of thinking that might be invalid from the start. This is where I link to William Kilbride’s excellent DPC blog post from a few months ago.

William’s PASIG talk Sustainable digital futures was also one of two that got closer to what we know is the root of the preservation problem: economics. The other was Aging of digital: Managed services for digital continuity by Natasa Milic-Frayling, which flagged up the current “imbalance in control and empowerment” between tech providers and content producers / owners / curators, an imbalance that means tech firms can effectively doom our digital ‘stuff’ to obsolescence, and we have to suck it up.

I think this imbalance partly exists because there’s too much technical context attached to data, because it’s generally in the tech providers’ interests to bloat data formats to match the USPs of their software. So, is a pure ‘preservation format’ one in which the technical context of the data is generalised to the point where all that’s left is commonly-understood mathematics? Is that even possible? Do we really need 122-page specs to explain how raster image data is stored? (It’s just an N-dimensional array of pixel values, isn’t it?) I think perhaps we don’t need all the complexity – at the data storage level, at least. Though I’m only guessing at this stage: much more research required.
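
To make that parenthetical concrete, here is a toy sketch (mine, not from any of the talks) of raster data stripped back to the bare array, using numpy purely for illustration:

# A raster image reduced to its mathematical essentials: an N-dimensional
# array of pixel values. Real preservation formats need a little more than
# this (colour space, bit depth, provenance), but perhaps not 122 pages' worth.
import numpy as np

height, width, channels = 4, 6, 3                 # a tiny RGB image
image = np.zeros((height, width, channels), dtype=np.uint8)
image[1, 2] = [255, 0, 0]                         # one red pixel at row 1, column 2

print(image.shape)   # (4, 6, 3) -- just an array of numbers, nothing more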

PASIG 2017: honest reflections from a trainee digital archivist

A guest blog post by Kelly, one of the Bodleian Libraries’ graduate digital archivist trainees, on what she learned as a volunteer and attendee of PASIG 2017 Oxford.


Amongst the digital preservation professionals from almost every continent and 130 institutions, my five traineeship colleagues and I took our places in the lecture theatre seats, the annexe demos and the awesome artefacts at the Museum of Natural History for PASIG 2017, Oxford. Just six months into our traineeship, it was a brilliant opportunity not only to apply some of our new knowledge to work at Special Collections, Bodleian Libraries, but also to gain a really current and relevant insight into theories we have been studying as part of our distance-learning MSc in Digital Curation at Aberystwyth University. The first ‘Bootcamp’ day was exactly what I needed to throw myself in, and it really consolidated my confidence in my understanding of some aspects of the shared language used amongst the profession (fixity checks, maturity models… as well as getting to grips with submission information packages, dissemination information packages and everything that occurs in between!).

My pen didn’t stop scribbling for all three days, except maybe during tea breaks. That said, the demo presentations were also a great time for myself and the other trainees to ask questions specifically about workflows and the benefits of certain software, such as LibNova, Preservica and ResourceSpace.

For want of a better word (and because it really is the truth) PASIG 2017 was genuinely inspiring and there were messages delivered so powerfully I hope that I stay grounded in these for my entire career. Here is what I was taught:

The Community is invaluable. Many of the speakers were quick to assert that sharing practice amongst the digital preservation community is key. This is a value I was already familiar with, yet witnessing it happen throughout the conference in such a sincere manner was something else. I can assure you the gratitude and affirmation that followed Eduardo del Valle, University of the Balearic Islands, and his presentation “Sharing my loss to protect your data: A story of unexpected data loss and how to do real preservation” was as encouraging for someone new to the profession to witness as it clearly was for all of the more experienced delegates present. As well as sharing practice, it was clear that the community needs to advocate on behalf of each other. It is time- and resource-consuming, but oh-so important.

Digital archives are preserving historical truths. Yes, the majority of the workflow is technological, but the objectives and functions are so much more than technology; to reduce digital preservation down to just this is an oversimplification. It was clear that the range of use cases presented at PASIG were all driven towards documenting social, political and historical information (and preserving that documentation) that will be of absolute necessity for society and infrastructure in the future. Right now, for example, Angeline Takawira and her colleagues at the UN MICT are working on a digital preservation programme to ensure absolute accountability and usability of the records of the International Criminal Tribunals for both Rwanda and the former Yugoslavia. I have written a more specific post on Angeline’s presentation here.

Due to the nature of technology and the digital world, the goalposts will always be moving. For example, Somaya Langley’s talk on the future of digital preservation and the mysteries of extracting data from smart devices will soon become (and maybe already is) a reality for those working with accessions of archives or information management. We should, then, embrace change, embrace the unsure and ultimately ‘get over the need for tidiness’, as pointed out by John Sheridan from The National Archives during his presentation “Creating and sustaining a disruptive digital archive”. This is usually counter-intuitive, but as the saying goes, one of the most dangerous phrases to use is ‘we’ve always done it that way’.

The value of digital material outlives the software, so enabling the prolonged use of software is a real and current issue. Admittedly, this was a factor I had genuinely not considered before; in my head I had linked obsolescence with hardware and hardware only. Dr Natasa Milic-Frayling’s presentation on “Aging of Digital: Managed Services for digital continuity” therefore shed much light on the changing computing ecosystem and the gradual aging of software. What I found especially interesting about the proposed software-continuity plan was its transparency: the client can ask to see the software at any time whilst it is being stabilised and maintained.

Thank you so much PASIG 2017 and everybody involved!

One last thing… in closing, Cliff Lynch, CNI, brought up that there was comparatively less web archiving content this year. If anybody fancies taking a trainee to Mexico next year to do a (lightning) talk on Bodleian Libraries’ Web Archive, I am keen…

Computers are the apogee of profligacy: a response to THE most important PASIG 2017 presentations

Following the PASIG conference, Cambridge Technical Fellow Dave Gerrard couldn’t wait to fire off his thoughts on the global context of digital preservation and how we need to better consider the world around us, to work on a global solution and not just one that suits a capitalist agenda. We usually preface these blogs with “enjoy”, but in this instance, please find a quiet moment, make yourself comfortable, read on and contemplate the global issues passionately presented here.


I’m going to work on a more technical blog about PASIG later, but first I want to get this one off my chest. It’s about the two most important presentations: Angeline Takawira’s Digital preservation at the United Nations Mechanism for International Criminal Tribunals and Keep your eyes on the information, Patricia Sleeman’s discussion of preservation work at the UN Refugee Agency (UNHCR).

Angeline Takawira described, in a very precise and formal manner, how the current best practice in Digital Preservation is being meticulously applied to preserving information from UN war crimes tribunals in The Hague (covering the Balkan conflict) and Arusha, Tanzania (covering the Rwandan genocide). As befitted her work, it was striking how calm Angeline was; how well the facts were stuck to, despite the emotive context. Of course, this has to be the case for work underpinning legal processes: intrusion of emotion into the capture of facts could let those trying to avoid justice escape it.

And the importance of maintaining a dispassionate outlook was echoed in the title of the other talk. “Keep your eyes on the information” was what Patricia Sleeman was told when learning to work with the UNHCR, as to engage too emotionally with the refugee crisis could make vital work impossible to perform. However, Patricia provided some context, in part by playing Head Over Heels, (Emi Mahmoud’s poem about the conflict and refugee crisis in Darfur), and by describing the brave, inspirational people she had met in Syria and Kurdistan. An emotionless response was impossible: the talk resulted in the conference’s longest and loudest applause.

Indeed, I think the audience was so stunned by Patricia’s words that questions were hard to formulate. However, my colleague Somaya at least asked the $64,000 one: how can we help? I’d like to tie this question back to one that Patricia raised in her talk, namely (and I paraphrase here): how do you justify expenditure on tasks like preservation when doing so takes food from the mouths of refugees?

So, now I’m less stunned, here’s my take: feeding refugees treats a symptom of the problem. Telling their stories helps to solve the problem, by making us engage our emotions, think about how our lives are related to theirs, and consider how our behaviour impacts upon them. And how can we help? Sure, we can help Patricia with her data management and preservation problems. But how can we really contribute to a solution? How can we stop refugee crises occurring in the first place?

We have a responsibility to recognise the connections between our own behaviour and the circumstances refugees find themselves in, and it all comes down, of course, to resources, and the profligate waste of them in the developed world. Indeed, Angeline and Patricia’s talks illustrated the borderline absurdity of a bunch of (mostly) privileged ‘Westerners’ / ‘Northerners’ (take your pick) talking about the ‘preservation’ of anything, when we’re products of a society that’s based upon throwing everything away.

And computers / all things ‘digital’ are at the apogee of this profligacy: Natasa Milic-Frayling highlighted this when she (diplomatically) referred to the way in which the ‘innovators’ hold all the cards, currently, in the relationship with ‘content producers’, and can hence render the technologies upon which we depend obsolete across ever-shorter cycles. Though, after Patricia’s talk, I’m inclined to frame this more in terms of ‘capitalist industrialists generating unnecessary markets at the expense of consumers’; particularly given that, while we were listening to Patricia, the latest iPhone was being launched in the US.

Though, of course, it’s not really the ‘poor consumers’ who genuinely suffer due to planned obsolescence… That would be the people in Africa and the Middle East whose countries are war zones due to grabs for oil, or droughts caused by global warming. As the world’s most advanced tech companies, Apple, Google, Facebook, Amazon, Microsoft et al are the biggest players in a society that – at best indirectly, at worst carelessly – causes the suffering of the people Patricia and Angeline are helping and providing justice for. And, as someone typing a blog post on a MacBook Pro that doesn’t even let me add a new battery, I’m clearly part of the problem, not the solution.

So – in answer to Somaya’s question: how can we help? Well, for a start, we can stop fetishising the iPhone and start bigging up Fairphone and Phonebloks. However, keeping the focus on Digital Preservation, we’ve got to be really careful that our efforts aren’t used to support an IT industry that’s currently profligate way beyond moral acceptability. So rather than assuming (as I did above) that all the ‘best-practice’ of digital preservation flows from the ‘developed’ (ahem) world to the ‘developing’, we ought to seek some lessons in how to preserve technology from those who have fewer opportunities to waste it.

Somaya’s already on the case with her upcoming panel at iPres on the 28th of September. Then we ought to continue down the road of holding PASIG in Mexico City next year by holding one in Africa as soon as possible. As long as, when we’re there, we make sure we shut up and listen.

PASIG 2017 Twitter round-up

After many months of planning it feels quite strange to us that PASIG 2017 is over. Hosting the PASIG conference in Oxford has been a valuable experience for the DPOC fellows and a great chance for Bodleian Libraries’ staff to meet with and listen to presentations by digital preservation experts from around the world.

In the end 244 conference delegates made their way to Oxford and the Museum of Natural History. The delegates came from 130 different institutions and every continent of the world was represented (…well, apart from Antarctica).

What was especially exciting, though, were all the new faces. In fact, two-thirds of the delegates this year had not been to a PASIG conference before! Is this perhaps a sign that interest in digital preservation is on the rise?

As always at PASIG, Twitter was ablaze with discussion, in spite of an at-times flaky Wi-Fi connection. Over three days, #PASIG17 was mentioned a whopping 5,300 times on Twitter and had a “reach” of 1.7 million. Well done everyone on some stellar outreach! The most active tweeting came from the UK, USA and Austria.

Twitter activity by country using #PASIG17 (Talkwalker statistics)

Although it is hard to choose favourites among all the Tweets, a few of the DPOC project’s personal highlights included:

Cambridge Fellow Lee Pretlove lists “digital preservation skills” and why we cannot be an expert in all areas. Tweet by Julian M. Morley

Bodleian Fellow James makes some insightful observations about the incompatibility between tar pits and digital preservation.

Cambridge Fellow Somaya Langley presents in the last PASIG session on the topic of “The Future of Digital Preservation”.  

What were some of your favourite talks and Twitter conversations? What would you like to see more of at PASIG 2018? #futurePASIG

C4RR – Containers for Reproducible Research Conference

James shares his thoughts after attending the C4RR Containers for Reproducible Research Conference at the University of Cambridge (27 – 28 June).


At the end of June both Dave and I, the Technical Fellows, attended the C4RR conference/workshop hosted by The Software Sustainability Institute in Cambridge. This event brought together researchers, developers and educators to explore best practices when using containers and the future of research software with containers.

Containers, especially Docker and Singularity, are the ‘in’ thing at the moment, and it was interesting to hear from a variety of research projects that are using them for reproducible research.

Containers are another form of server virtualisation, but are lighter-weight than a virtual machine. Containers and virtual machines have similar resource isolation and allocation benefits, but function differently because containers virtualise the operating system instead of the hardware; containers are therefore more portable and efficient.

Comparison of VM vs Container (Images from docker.com)

Researchers described how they were using Docker, one of the container implementations, to package the software used in their research so they could easily reproduce their computational environment across several different platforms (desktop, server and cluster). Others were using Singularity, another container technology, when implementing containers on an HPC (High-Performance Computing) cluster, due to Docker’s requirement for root access. It was clear from the talks that these technologies are developing rapidly and that the computing environments involved are ever more complex, which does make me worry about how they might be preserved.

Near the end of the second day, Dave and I gave a 20 minute presentation to encourage the audience to think more about preservation. As the audience were all evangelists for container technology it made sense to try to tap into them to promote building preservation into their projects.


Image by Raniere Silva

One aim was to get people to think about their research after the project is over. There is often a lack of motivation to think about how others might reproduce the work, whether that’s six months in the future, let alone 15+ years from now.

Another area we briefly covered related to depositing research data. We use DROID to scan our repositories and identify file formats, which relies on the PRONOM technical registry. We put out a plea to the audience asking for help with creating new file signatures for unknown file formats.

I had some great conversations with others over the two days, and my main takeaway from the event was that we should look to attend more non-preservation-specific conferences with a view to promoting preservation in other computer-related areas of study.

Our slides from the event have been posted by The Software Sustainability Institute via Google.

DPASSH: Getting close to producers, consumers and digital preservation

Sarah shares her thoughts after attending the DPASSH (Digital Preservation in the Arts, Social Sciences and Humanities) Conference at the University of Sussex (14 – 15 June).


DPASSH is a conference that the Digital Repository of Ireland (DRI) puts on with a host organisation. This year it was hosted by the Sussex Humanities Lab at the University of Sussex, Brighton. What is exciting about this digital preservation conference is that it brings together creators (producers) and users (consumers) with digital preservation experts. Most digital preservation conferences end up being a bit of an echo chamber, full of practitioners and vendors only. But what about the creators and the users? What knowledge can we share? What can we learn?

DPASSH is a small conference, but it was an opportunity to see what researchers are creating and how they are engaging with digital collections. For example, in her talk Stefania Forlini discussed the perils of a content-centric digitisation process in which unique print artefacts are all treated the same; the process flattens everything into identical objects even though they are very different. What about the materials and the physicality of the object? They have stories to tell as well.

To Forlini, books span several domains of sensory experience and our digitised collections should reflect that. With the Gibson Project, Forlini and project researchers are trying to find ways to bring some of those experiences back through the Speculative W@nderverse. They are currently experimenting with embossing different kinds of paper with a code that can be read by a computer. The computer can then bring up the science fiction pamphlets that are made of that specific material. Then a user can feel the physicality of the digitised item and then explore the text, themes and relationships to other items in the collection using generous interfaces. This combines a physical sensory experience with a digital experience.

For creators, the decision of what research to capture and preserve is sometimes difficult; often they lack the tools to capture the information. At other times, creators do not have the skills to perform proper archival selection. Athanasios Velios offered a tool for digital artists called Artivity. Artivity can capture the actions performed on a digital artwork in certain programs, like Photoshop or Illustrator. This allows artists to record their creative process and gives future researchers the opportunity to study it. Steph Taylor from CoSector suggested in her talk that creators are archivists now, because they are constantly appraising their digital collections and making selection decisions. It is important that archivists and digital preservation practitioners empower creators to make good decisions about what should be kept for the long term.

As a bonus to the conference, I was awarded the ‘Best Tweet’ award by the DPC and DPASSH. It was a nice way to round out two good, informative days. I plan to purchase many books with my gift voucher!

I certainly hope they hold the conference next year, as I think it is important for researchers in the humanities, arts and social sciences to engage with digital preservation experts, archivists and librarians. There is a lot to learn from each other. How often do we get our creators and users in one room with us digital preservation nerds?

IDCC 2017 – data champions among us

Outreach and Training Fellow, Sarah, provides some insight into the themes of the recent IDCC conference in Edinburgh on 21 – 22 February. The DPOC team also presented their first poster, “Parallel Auditing of the University of Cambridge and the University of Oxford’s Institutional Repositories”, which is available on the ‘Resource’ page.


Storm Doris waited to hit until after the main International Digital Curation Conference (IDCC) had ended, allowing for two days of great speakers. The conference focused on research data management (RDM) and sharing data. In Kevin Ashley’s wrap-up, he touched on data champions and the possibilities of data sharing as two of the many emerging themes from IDCC.

Getting researchers to commit to good data practice and then publish data for reuse is not easy. Many talks focused on training and engaging researchers to improve their data management practice. Marta Teperek and Rosie Higman from Cambridge University Library (CUL) gave excellent talks on engaging their research community in RDM. Teperek found value in going out to the community with a bottom-up, research-led approach. It was time-intensive, but it allowed the RDM team at CUL to understand the problems Cambridge researchers face and address them. A top-down, policy-driven approach was also used, but it is the combination of the two that has been most effective for CUL.

Higman went on to speak about the data champions initiative. Data champions were recruited from students, post-doctoral researchers, administrators and lecturers. What they had in common was their willingness to advocate for good RDM practices. Each of the 41 data champions was responsible for at least one training session a year. While the data champions did not always do what the team expected, their advocacy for good RDM practice has been invaluable. Researchers need strong advocates in order to see the value in publishing their data – it is not just about complying with policy.

On day two, I heard from researcher and data champion Dr Niamh Moore from the University of Edinburgh. Moore finds that many researchers either think archiving their data is a waste of time or are concerned about the future use of their data. As a data champion, she believes that research data is worth sharing and thinks other researchers should be asking, ‘how can I make my data flourish?’. Moore uses Omeka to share the research data from her mid-90s project at the Clayoquot Sound peace camp, called Clayoquot Lives. For Moore, the benefits of sharing research data include:

  • using it as a teaching resource for undergraduates (getting them to play with data, which many do not have a chance to do);
  • public engagement impact (for Moore it was an opportunity to engage with the people previously interviewed at Clayoquot); and
  • new articles: creating new relationships and new research where she can reuse her own data in new ways or other academics can as well.

Opening up data and archiving it leads to new possibilities. The closing keynote on day one discussed the possibilities of using data to improve the visitor experience at the British Museum. Data scientist Alice Daish spoke of data as the unloved superhero. It can rescue organisations from questions and problems by providing answers, helping them make decisions, take action and even ask more questions. For example, Daish has been able to wrangle and utilise data at the British Museum to learn about the most popular collection items on display (the Rosetta Stone came first!).

And Daish, like Teperek and Higman, touched on outreach as the only way to advocate for data – creating good data, sharing it, and using it to its fullest potential. The DPOC team welcomes this advocacy, and we’d like to add to it and see that steps are also taken to preserve this data.

Also, it was great to talk about the work we have been doing and the next steps for the project – thanks to everyone who stopped by our poster!

Oxford Fellows (from left: Sarah, Edith, James) holding the DPOC poster in front of the appropriately named “Fellows Entrance” at the Royal College of Surgeons.

DPC Student Conference – What I Wish I Knew Before I Started

At the end of January, I went to the Chancellor’s Hall at the University of London’s Art Deco style Senate House. Near to the entrance of the Chancellor’s Hall was Room 101. Rumours circulated amongst the delegates keenly awaiting the start of the conference that the building and the room were the inspiration for George Orwell’s Nineteen Eighty-Four.

Instead of facing my deepest and darkest digital preservation fears in Senate House, I was keen to see and hear what the leading digital preservation trainers and invited speakers at different stages of their careers had to say. For the DPOC project, I wanted to see what types of information were included in introductory digital preservation training talks, to witness the styles of delivery, and to note what questions were raised from the floor, in order to spot any obvious gaps in the delivery. For the day’s programme, presenters’ slides and the Twitter Storify, may I recommend that you visit the DPC webpage for this event:

http://www.dpconline.org/events/past-events/wiwik-2017

The takeaway lesson from the day is: just do something; don’t be afraid to start. Sharon McMeekin showed us how much the DPC can help (see their new website – it’s chock-full of digital preservation goodness), and Steph Taylor from CoSector showed us that you can achieve a lot in digital preservation just by keeping an eye on emerging technologies, and that you spend most of your time advocating that digital preservation is not just backing up. Steph also reassured the student delegation that you can approach members of the digital preservation community; they are all very friendly!

From the afternoon session, Dave Thompson reminded those assembled that we also need to think about the information age we live in, how people use information, how they are their own gatekeepers to their digital records, and how recordkeepers need to react to these changes, which will require a shift from traditional recordkeeping theory and practice. As Adrian Brown put it, “digital archivists are archivists with superpowers”. One of those superpowers is the ability to adapt to your working context and technological environment. Digital preservation is a constantly changing field, and practitioners need to adapt to the environment around them in a chameleon-like manner to get their institution’s work preserved. Jennifer Febles reminded us that it is also OK to say “you don’t know” when training people; you can go away and learn, or even learn from other colleagues. As for the content of the day, there were no real gaps – the programme was spot on, as far as I could tell from the delegates.

Whilst reflecting on the event on the train journey back (and whilst simultaneously being packed into the stiflingly hot carriage like a sweaty sardine), the one thing that I really wanted to find out was what the backgrounds of the delegates were. More specifically, which ‘information schools’ they were attending, what courses they were undertaking, how much of their modules concerned digital recordkeeping and preservation, and, most importantly, what they are being taught in those modules.

My thoughts then drifted towards those who have been given the label of ‘digital preservation expert’. They have cut their digital preservation teeth after formal qualifications and training in an ostensibly different subject. Through a judicious blending of discipline-specific learning and learning about related fields, they then apply this to their specific working context. Increasingly, in the digital world, those from a recordkeeping background need to embrace computer science skills and applications, especially those for whom coding and command-line operation are not skills they were brought up with. We seem to be at a point where the leading digital preservation practitioners are plying their trade (as they should) and not teaching it in a formal education setup. A very select few are doing both, but if we pulled practitioners into formal digital preservation education programmes, would we drain the discipline of innovative practice? Would digital preservation skills (which DigCurV has done well to define) be better suited to one big ‘on the job’ learning programme rather than more formal programmes? A mix of both would be my suggestion, but this discussion will never close.

Starting out in digital preservation may seem terribly daunting, with so much to learn and so much going on. I think that the ‘information schools’ can equip students with the early skills and knowledge, but from then on the experience and skills are learned on the job. The thing that makes the digital preservation community stand out is that people are not afraid to share their knowledge and skills for the benefit of preserving cultural heritage for the future.