Digital Preservation at Oxford Open Days

Oxford Fellow Sarah describes the DPOC team’s pop-up exhibition “Saving Digital,” held at the Radcliffe Science Library during Oxford Open Days (#OxOpenDay). The post describes the equipment and games the team showcased over the two days and some of the goals they had in mind for this outreach work.


On 27 June and 28 June, Oxford ran Open Days for prospective students. The city was alive with open doors and plenty of activity. It was the perfect opportunity for us to take our roadshow kit out and meet prospective students with a pop-up exhibition called “Saving Digital”. The Radcliffe Science Library (RSL) on Parks Road kindly hosted the DPOC team and all of our obsolete media for two days in their lounge area.

The pop-up exhibition hosted at the RSL

We set up our table with a few goals in mind:

  • to educate prospective students about the rapid pace of technology and the concern about how we’re going to read digital data off obsolete media in the future (we educated a few parents as well!)
  • to speak with library and university staff about their digital dilemmas and what we on the digital preservation team could do about them
  • to raise awareness about the urgency and necessity of digital preservation in all of our lives and to inform more people about our project (#DPOC)

To achieve this, we first drew people in with two things: retro gaming and free stuff.

Last minute marketing to get people to the display. It worked!

Our two main games were the handheld game, Galaxy Invader 1000, and Frak! for the BBC Micro.

Frak! on the BBC Micro. The yellow handheld console to the right is Galaxy Invader 1000.

Galaxy Invader 1000 by CGL (1980) is a handheld game, which plays a version of Space Invaders. This game features a large multi-coloured display and 3 levels of skill. The whole game was designed to fit in 2 kilobytes of memory. 

Frak! (1984) was released for the BBC Micro under the Aardvark software label. It was praised for its excellent graphics and gameplay. In this side-scrolling game, you play a caveman named Trogg. The aim of the game is to cross a series of platforms while avoiding dangers, including various monsters named Poglet and Hooter. Trogg is armed with a yo-yo for defence.

Second, we gave them some digestible facts, both in poster form and by talking with them:

Saving Digital poster

Third, we filled the rest of the table with obsolete media and handheld devices from about the last forty years—just a small sample of what was available! This let them hold some of the media of the past and marvel at how little it could hold, yet how much it could do for its time. Then we asked how they would read the data off it today. That probably concerned the parents more than their kids, as several of them admitted to having important digital stuff still on VHS or miniDV tapes, or on 3.5-inch disks! It got everyone thinking at least.

A lot of obsolete media all in one place.

Lastly, we had an enthusiastic team wearing branded t-shirts made to emulate our most popular 1st generation badge: pink with a 3.5-inch disk in the middle. We gave away our last one during Open Days! But don’t worry, we have some great 2nd generation badges to collect now.

An enthusiastic team always helps. Especially if they are willing to demo the equipment.


A huge thank you to the RSL for hosting us for two days—we’ll be back on the 16th of July if you missed us and want to visit the exhibition! We’ll have a few extra retro games on hand and some more obsolete storage media!

Our poster on display in the RSL.

Update on the training programme pilot

Sarah, Oxford’s Outreach and Training Fellow, has been busy since the new year designing and running a digital preservation training programme pilot in Oxford. It consisted of one introductory course on digital preservation and six other workshops. Below is an update on what she did for the pilot and what she has learnt over the past few months.


It’s been a busy few months for me, so I have been quiet on the blog. Most of my time and creative energy has been spent working on this training programme pilot. In total, there were seven courses and over 15 hours of material. In the end, I trialled the courses on over 157 people from Bodleian Libraries and the various Oxford college libraries and archives. Many attendees were repeats, but some were not.

The trial gave me an opportunity to test out different ideas and various topics. Attendees were good at giving feedback, both during the course and after via an online survey. It’s provided me with further ideas and given me the chance to see what works or what doesn’t. I’ve been able to improve the experience each time, but there’s still more work to be done. However, I’ve already learned a lot about digital preservation and teaching.

Below are some of the most important lessons I’ve learned from the training programme pilot.

Time: You always need more

I found that I almost always ran out of time at the end of a course; it left no time for questions or to finish that last demo. Most of my courses could have benefited from either less content, shorter exercises, or simply being 30 minutes longer.

Based on feedback from attendees, I’ll be making adjustments to every course. Some will be longer. Some will have shorter exercises with more optional components and some will have slightly less content.

While you might budget 20 minutes for an activity, you will likely use 5-10 minutes more. But it varies every time with the attendees: some might have a lot of questions, while others will be quieter. It’s almost better to overestimate the time and end early than to rush to cover everything. People need a chance to process the information you give them.

Facilitation: You can’t go it alone

In only one of my courses did I have to facilitate alone. I was run off my feet for the 2 hours because it was just me answering questions during exercises for 15 attendees. It doesn’t sound like a lot, but I had a hoarse voice by the end from speaking for almost 2 hours!

Always get help with facilitation—especially for workshops. Someone to help:

  • answer questions during exercises,
  • get some of the group idea exercises/conversations started,
  • make extra photocopies or print outs, and
  • load programs and files onto computers—and then help delete them after.

It is possible to run training courses alone, but having an extra person makes things run more smoothly and saves a lot of time. Edith and James have been invaluable support!

Demos: Worth it, but things often go wrong

Demos were vital for illustrating concepts, but they were also sometimes clunky and time-consuming to manage. I wrote up demo sheets to help. The demos relied on software or the Internet—both of which can and will go wrong. Patience is key; so is accepting that sometimes things will not go right. Processes might take a long time to run, or the course might conclude before the demo is over.

The more you practice on the computer you will be using, the more likely things will go right. But that’s not always an option. If it isn’t, always have a backup plan. Or just apologise, explain what should have happened, and move on. Attendees are generally forgiving, and sometimes it can be turned into a really good teaching moment.

Exercises: Optional is the way to go

Unless you put out a questionnaire beforehand, it is incredibly hard to judge the skill level of your attendees. It’s best to prepare for all levels: start each exercise slowly and have plenty of optional work built in for people who work faster.

In most of my courses I was too ambitious for the time allowed. I wanted them to learn and try everything. Sometimes I wasn’t asking the right questions on the exercises either. Testing exercises and timing people is the only way to tailor them. Now that I have run the workshops and seen the exercises in action, I have a clearer picture of what I want people to learn and accomplish—now I just have to make the changes.

Future plans

There were courses I would love to run in the future (like data visualisation and digital forensics) but did not have the time to develop. I’d like to place them on a roadmap for future training, as well as reach out more to the Oxford colleges, museums and other departments. I would also like to tailor the introductory course a bit more for different audiences.

I’d like to get involved with developing courses like Digital Preservation Carpentry that the University of Melbourne is working on. The hands-on workshops excited and challenged me the most. Not only did others learn a lot, but so did I. I would like to build on that.

At the end of this pilot, I have seven courses that I will finalise and make available under a Creative Commons licence. What I learned when developing these courses is that there aren’t many good templates available on the Internet to use as a starting point—you have to ask around for people willing to share.

So, I am hoping to take the work that I’ve done and share it with the digital preservation community. I hope the courses will be useful resources that can be reused and repurposed. At the very least, I hope they can serve as a starting point or inspiration (basic speaker’s notes included).

These will be available via the DPOC website sometime this summer, once I have been able to make the changes necessary to the slides and exercises—along with course guidance material. It has been a rewarding experience (as well as an exhausting one); I look forward to developing and delivering more digital preservation training in the future.

Digital preservation with limited resources

What should my digital preservation strategy be if I do not have access to repository software or a DAMS (digital asset management system)?

At Oxford, we recently received this question from a group of information professionals working for smaller archives. This will be a familiar scenario for many – purchasing and running repository software will require a regular dedicated budget, which many archives in the UK do not currently have available to them.

So what intermediate solutions could an archive put in place to reduce its chances of losing digital collection content in the meantime? This blog post summarises some key points from our meeting with the archivists, which we hope may be useful for other organisations asking the same question.


Protect yourself against human error

CC-BY KateMangoStar, Freepik

Human error is one of the major risks to digital content. It is not uncommon for users to inadvertently drag files/folders or delete content by mistake. It is therefore important to have strict user restrictions in place which limit who can delete, move, and edit digital collections. For this purpose you need to ensure that you have defined an “archives directory” which is separate from any “working directories” where users can still edit and actively work with content.

If you have IT support available to you, then speak to them about setting up new user restrictions.


Monitor fixity

CC-BY Dooder, Freepik

However, even with strong user restrictions in place, human error can occur. In addition to enforcing stronger user restrictions in the “archives directory”, tools like Fixity from AVP can be used to spot if content has been moved between folders, deleted, or edited. By running regular Fixity reports an archivist can spot any suspicious looking changes.

We are aware that time constraints are a major factor which inhibits staff from adding additional tasks to their workload, but luckily Fixity can be set to run automatically on a weekly basis, providing users with an email report at the end of the week.
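Fixity itself is a ready-made tool, but the underlying idea, a checksum manifest compared between runs, can be sketched in a few lines of Python. The function names and manifest layout below are illustrative, not Fixity's actual implementation:

```python
import hashlib
from pathlib import Path

def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading in chunks
    so that large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(archive_dir):
    """Map every file's path (relative to the archive root) to its checksum."""
    root = Path(archive_dir)
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def compare_manifests(old, new):
    """Report files that appeared, vanished, or changed since the last run."""
    return {
        "added": sorted(set(new) - set(old)),
        "missing": sorted(set(old) - set(new)),
        "changed": sorted(k for k in set(old) & set(new) if old[k] != new[k]),
    }
```

A scheduled weekly task could rebuild the manifest, compare it against the stored copy, and email the report, which is essentially what Fixity automates for you.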


Understand how your organisation does back-ups

CC-BY Shayne_ch13, Freepik

A common IT retention period for back-ups of desktop computers is 14 days. The two week period enables disaster recovery of working environments, to ensure that business can continue as usual. However, a 14 day back-up is not the same as preservation storage and it is not a suitable solution for archival collections.

In this scenario, where content is stored on a file system with no versioning, the archivist has only 14 days to spot any issues and retrieve an older back-up before it is too late. So please don’t go on holiday or get ill for long! Even with tools like Fixity, fourteen days is an unrealistic turn-around time (if the issue is spotted at all in the first place).

If possible, try to make the case to your organisation that you require more varied types of back-ups for the “archival directory”. These should include back-ups which are retained for at least a year. Using a mix of tape storage and/or cloud service providers can be a less expensive way of storing additional back-ups which do not require ongoing access. It is an investment worth making.

As a note of warning though – you are still only dealing in back-ups. This is not archival storage. If there are issues with multiple back-ups (due to, for example, transfer or hardware errors) you can still lose content. The longer term goal, once better back-ups are in place, should be to monitor the fixity of multiple copies of content from the “archival directory”. (For more information about the difference between back-ups used for regular IT purposes and storage for digital preservation, see the DPC Handbook.)


Check that your back-ups work
Once you have got additional copies of your collection content, remember to check that you can retrieve them again from storage.

Many organisations have been in the position where they think they have backed up their content, only to find out that their back-ups were not created properly when they need them. By testing retrieval you can protect your collections against this particular risk.
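As a sketch of what such a retrieval test might look like (the function names here are hypothetical, and in practice the "backup" side might be a mounted tape or cloud copy rather than a local folder), you could periodically pull a random sample of files back and confirm their checksums match the archive copies:

```python
import hashlib
import random
from pathlib import Path

def checksum(path):
    """SHA-256 of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def spot_check_backup(archive_dir, backup_dir, sample_size=5, seed=None):
    """Retrieve a random sample of files from the backup and confirm each
    one matches its counterpart in the archive. Returns a list of
    (relative_path, ok) pairs; any False means the backup copy is
    missing or does not match."""
    root = Path(archive_dir)
    files = sorted(p.relative_to(root) for p in root.rglob("*") if p.is_file())
    rng = random.Random(seed)
    sample = rng.sample(files, min(sample_size, len(files)))
    results = []
    for rel in sample:
        backup_copy = Path(backup_dir) / rel
        ok = backup_copy.is_file() and checksum(backup_copy) == checksum(root / rel)
        results.append((str(rel), ok))
    return results
```

Even a small monthly sample like this catches the worst case, where the backup job has silently stopped producing usable copies.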


But… what do I do if my organisation does not do back-ups at all?
Although the 14 day back-up retention is common in many businesses, it is far from the reality which certain types of archives operate within. A small community organisation may for example do all its business on a laptop or workstation which is shared by all staff (including the archive).

This is a dangerous position to be in, as hardware failure can cause immediate and total loss. There is no magic bullet for solving this issue, but some of the advice which Sarah (Training and Outreach Fellow at Bodleian Libraries) has provided in her Personal Digital Archiving Course could apply.

Considerations from Sarah’s course include:

  • Create back-ups on additional removable hard drive(s) and store them in a different geographical location from the main laptop/workstation
  • Make use of free cloud storage limits (do check the licenses though to see what you are agreeing to – it’s not where you would want to put your HR records!)
  • Again – remember to check your back-ups!
  • For digitized images and video, consider using the Internet Archive’s Gallery as an additional copy (note that this is open to the public, and requires assigning a CC-BY license). (If you like the work that the Internet Archive does, you can donate to them here.)
  • Apply batch-renaming tools to file names to ensure that they contain understandable metadata in case they are separated from their original folders

(Email us if you would like to get a copy of Sarah’s lecture slides with more information)
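The batch-renaming suggestion from the list above can be sketched as a small script. The naming scheme shown here (a collection name and date prefixed to the original file name) is just one possible convention, and the dry-run default lets you check the plan before committing:

```python
import re
from pathlib import Path

def descriptive_name(path, collection, date):
    """Build a new file name that embeds collection and date metadata,
    e.g. 'scan01.tif' becomes 'parishrecords_1998-05-12_scan01.tif'."""
    safe = re.sub(r"[^A-Za-z0-9]+", "", collection).lower()
    return f"{safe}_{date}_{path.name}"

def batch_rename(folder, collection, date, dry_run=True):
    """Rename every file in a folder. With dry_run=True, only return the
    planned (old, new) name pairs so the mapping can be reviewed first."""
    plan = []
    for p in sorted(Path(folder).iterdir()):
        if p.is_file():
            new_name = descriptive_name(p, collection, date)
            plan.append((p.name, new_name))
            if not dry_run:
                p.rename(p.with_name(new_name))
    return plan
```

With metadata embedded in the file name itself, a scanned image that gets separated from its folder still carries enough context to be traced back to its collection.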


Document all of the above

CC-BY jcomp, Freepik

Make sure to write down all the decisions you have made regarding back-ups, monitoring, and other activities. This allows for succession planning and ensures that you have a paper trail in place.


Stronger in numbers

CC-BY, Kjpargeter, Freepik

Licenses, contracts and ongoing management are expensive. Another avenue to consider is looking to peer organisations to share some of these costs. This could include entering into joint contracts with tape storage providers, or consortium models for using repository software. An example of an initiative which has done this is the NEA (Network Electronic Archive) group, which has run an established repository supporting 28 small Danish archives for over ten years.


Summary:
These are some of the considerations which may lower the risk of losing digital collections. Do you have any other ideas (or practical experience) of managing and preserving digital collections with limited resources, and without using a repository or DAMS system?

Closing the digitization gap

MS. Canon. Misc. 378, fol. 136r

Bodleian Digital Library’s Digitization Assistant, Tim, guest blogs about the treasures he finds while migrating and preparing complete, high-fidelity digitised items for Digital Bodleian. The Oxford DPOC Fellows feel lucky to sit across the office from the team that manages Digital Bodleian and so many of our amazing digitized collections.


We might spend most of our time on an industrial estate here at BDLSS, but we still get to do a bit of treasure-hunting now and then. Our kind involves fewer forgotten ruins and charming wood-panelled reading rooms than we might like, admittedly – it’s more an affair of rickety MySQL databases and arcane PHP scripts. But the rewards can be great. Recent rummages have turned up a Renaissance masterpiece, a metaphysical manuscript, and the legacy of a Polish queen.

Back in October, Emma wrote about our efforts to identify digital images held by the Bodleian which would make good candidates for Digital Bodleian, but for one reason or another haven’t yet made it onto the site. Since that post was published, we have been making good progress migrating images from our legacy websites, including the Oxford Digital Library and – coming soon to Digital Bodleian – our Luna collection of digitized slides. Many of the remaining undigitized images in our archive are unsuitable for the site, as they don’t constitute full image sets: we’re trying to keep Digital Bodleian a reserve for complete, high-fidelity digitized items, rather than a dumping-ground for fragmentary facsimiles. But among the millions of images are a few sets of fully-photographed books and manuscripts still waiting to be showcased to the public on our digital platform.


A recent Digital Bodleian addition: the Notitia Dignitatum, a hugely important Renaissance copy of a late-Roman administrative text (MS. Canon. Misc. 378).

Identifying these full-colour, complete image sets isn’t as easy as we’d like, thanks to some slightly creaky legacy databases and the sheer volume of material versus limited staff time. An approach mentioned by Emma has, however, yielded some successes. Taking suggestions from our curators – and, more recently, our Twitter followers – we’ve been able to draw up a digitization wishlist, which also serves as a list of targets for when we go ferreting around in the archive. Most haven’t been fully photographed, but we’ve turned up a clutch of exciting items from these efforts.

Finding the images is only half the hunt, though. To present the digital facsimiles usefully, we need to give them some descriptive metadata. Digital Bodleian isn’t intended to be a catalogue, but we like to provide some information about an item where we have it, and make our digitized collections discoverable, as well as giving context for non-experts. But as with finding images, locating useful metadata isn’t always simple.

Most of the items on Digital Bodleian sit within the Bodleian’s Special Collections. Each object is unique, requiring the careful attention of an expert to be properly catalogued. For this reason, modern cataloguing efforts focus on subsets of the collections. For items not covered by these, often the only published descriptions (if any) are in 19th-century surveys – which can be excellent, but can also be terse or no longer up to date. Other descriptions and scholarly analyses are spread around a variety of published and unpublished material, some of it available in a digital form, most of it not. This all presents a challenge when it comes to finding information to go along with items on Digital Bodleian: much as we’d like to be, Emma and I aren’t yet experts on the entirety of all the periods, areas and traditions represented in the Bodleian’s holdings.


Another item pulled from the Bodleian’s image archive: a finely decorated 16th-century Book of Hours (MS. Douce 112).

Happily, our colleagues responsible for curating these collections are engaged in constant, dogged efforts to make descriptions more accessible. Especially useful to those of us unable to pop into the Weston to rifle through printed finding aids are a set of TEI-based electronic catalogues*, developed in conjunction with BDLSS. These aim to provide systematically-structured digital catalogue entries for a variety of Western and Oriental Special Collections. They’re fantastic resources, but they represent ongoing cataloguing campaigns, rather than finished products. Nor do they cover all the Special Collections.

Our most valuable resource therefore remains the ever-patient curators themselves. They kindly help us track down information about the items we’re putting on Digital Bodleian from a sometimes-daunting array of potential sources, put us in touch with other experts where required, and are always ready to answer our questions when we need something clarified. This has been enormously helpful in providing descriptions for our new additions to the site.

With this assistance, and the help of our colleagues in the Imaging Studio, who provide similar expertise in tracking down the images, and try hard to squeeze in time to photograph items from the aforementioned wishlist, we’ve managed to get 25 new treasures onto Digital Bodleian since Emma’s post, on top of all the ongoing new photography and migration projects. This totals around 9,300 images altogether, and we have more items on the way (due soon are a couple of Mesoamerican codices and an Old Sundanese text printed on palm leaves from Java). Slowly, we’re closing the gap.

A selection of recent items we’ve dug up from our archives:

MS. Ashmole 304
MS. Ashmole 399
MS. Auct. D. inf. 2. 11
MS. Canon. Bibl. Lat. 61
MS. Canon. Misc. 213
MS. Canon. Misc. 378
MS. Douce 112
MS. Douce 134
MS. Douce 40
MS. Holkham misc. 49
MS. Lat. liturg. e. 17
MS. Lat. liturg. f. 2
MS. Laud Misc. 108
MS. Tanner 307

 

*Currently live are catalogues of medieval manuscripts, Hebrew manuscripts, Genizah fragments, and union catalogues of Islamicate manuscripts and Shan Buddhist manuscripts in the United Kingdom. Catalogues of Georgian and Armenian manuscripts, to an older TEI standard, are still online and are currently undergoing conversion work. Similar, non-TEI-based resources for incunables and some of our Chinese special collections are also available.

Project update

A project update from Edith Halvarsson, Policy and Planning Fellow at Bodleian Libraries. 


Ms Arm.e.1, Folio 23v

Bodleian Libraries’ new digital preservation policy is now available to view on our website, after having been approved by Bodleian Libraries’ Round Table earlier this year.

The policy articulates Bodleian Libraries’ approach and commitment to digital preservation:

“Bodleian Libraries preserves its digital collections with the same level of commitment as it has preserved its physical collections over many centuries. Digital preservation is recognized as a core organizational function which is essential to Bodleian Libraries’ ability to support current and future research, teaching, and learning activities.”

 

Click here to read more of Bodleian Libraries’ policies and reports.

In other related news, we are currently in the process of ratifying a GLAM (Gardens, Libraries and Museums) digital preservation strategy, which is due for release after the summer. Our new digitization policy is also in the pipeline and will be made publicly available. Follow the DPOC blog for future updates.

Gathering the numbers: a maturity and resourcing survey for digital preservation

The ability to compare ourselves to peer institutions is key when arguing the case for digital preservation within our own organisations. However, finding up-to-date and correct information is not always straightforward.

The Digital Preservation at Oxford and Cambridge (DPOC) project has joined forces with the Digital Preservation Coalition (DPC) to gather some of the basic numbers that can assist staff in seeking to build a business case for digital preservation in their local institution.

We need your input to make this happen!

The DPOC and the DPC have developed a survey aimed at gathering basic data about maturity levels, staff resources, and the policy and strategy landscapes of institutions currently doing or considering digital preservation activities. (The survey intentionally does not include questions about the type or size of the data organisations are required to preserve.)

Completing the survey will only take 10-20 minutes of your time, and will help us better understand the current digital preservation landscape. The survey can be taken at: https://cambridge.eu.qualtrics.com/jfe/form/SV_brWr12R8hMwfIOh

Deadline for survey responses is: Thursday 31 May 2018.

For those wanting to know upfront what questions are asked in the survey – here is the full set of Survey Questions (PDF). Please keep in mind the survey is interactive and you may not see all of the questions when filling this in online (as the questions only appear in relation to your previous responses). Responses must be submitted through the online survey.

Anonymised data gathered as part of this maturity and resourcing survey will be made available via this DPOC website.

For any questions about the survey and its content, please contact: digitalpreservation@lib.cam.ac.uk

The Ethics of Working in Digital Preservation

Since joining the DPOC project in 2016, I have been espousing the need for holistic approaches to digital preservation. This has very much been about how skills development, policy, strategy, workflows and much more need to be included as part of a digital preservation offering. Digital preservation is never just about the tech. There is a concern I must raise: how we play nice together.

Since first drafting this post in October 2017, there have been several events I would be remiss not to mention. Ethics and how we conduct ourselves in professional contexts have been brought into the current social consciousness by the #metoo movement and the recent matter regarding Chris Bourg’s keynote at the Code4Lib conference.

Working Together

We know digital preservation can’t be done alone, and I believe the digital preservation community is well on the way to accepting this. A single person cannot hold all the information about every type of file, standard, operating system, disk file system, policy, carrier, hardware, peripheral, protocol, copyright and legislation, as well as undertake advocacy, negotiate suitably with donors, and so on.

Dream Team – Library of Congress Digital Preservation Outreach and Education Training Materials

For each digital preservation activity, we need a ‘dream team’. This is a term Emma Jolley (Curator of Digital Archives, National Library of Australia) incorporated into the 2015 Library of Congress Digital Preservation Outreach & Education (DPOE) Train the Trainer education programme I took part in. This understanding of the needs of complementary skills, knowledge and approaches very much underpins the Polonsky Digital Preservation Project.

Step by Step, Hand in Hand

If I think back to my time working in digital preservation in the mid-2000s, it was a far more isolating experience than it is now. Remembering the challenges we were discussing back then, it doesn’t feel as if the field has progressed all that much. It may just be slow going. Or perhaps it’s fear of making a wrong decision?

As humans, we know we have the capacity to learn from mistakes. We’ve likely had someone tell us about the time they (temporarily or permanently) lost data. The short-term lifespan of media carriers, inter-dependencies between different components, changes to services where data may be stored ‘in the cloud’ and the limited availability of devices (hardware or software) to read and interpret the data means that digital content is fragile (for many reasons, not only technical) and is continually at risk.

There are enough lessons of data loss out there in the wider world that it is imperative we acknowledge these situations and learn from them. Nor should we have to face these kinds of stressful situations alone; it should be done step-by-step, hand-in-hand, supporting each other.

Acknowledging Failure

Over recent years, the international arts and cultural sector has begun to share examples of failures. While it is easy to share successes, it’s far harder to openly share information about failures. Failure in current western society is definitely not a desirable outcome. Yet we learn from failure. As a response to ‘ideas’ festivals and TED talks, events such as Failure Lab have been gaining momentum.

The need to share (in considered ways) about failures in digital preservation is somewhat new, though it’s not an entirely new concept. (The now-infamous story of how parts of Toy Story 2 were deleted has helped illustrate the need to regularly check backup functions.) More recently, at PASIG 2017, one of the most memorable presentations of the whole conference was Eduardo Del Valle’s “Sharing my loss to protect your data: A story of unexpected data loss and how to do real preservation”. I believe I speak for many of the PASIG conference attendees when I say how valuable a presentation this was.

In May 2017, the Digital Preservation Coalition ran possibly the most useful event I attended in all of 2017: Digital Preservationists Anonymous (aka Fail Club). We were able to share our war stories within the safety and security of the Chatham House Rule, and we learned a lot from each other that we can take forward in our work at our respective institutions. Hearing an organisation that is further ahead tell us about the tricky things they’ve encountered helps us progress better and faster.

iPres 2017 and the Operational Pragmatism Panel

Yet there are other problematic issues within the field of digital preservation. It’s not always an easy field to work in; it doesn’t yet have the diversity it needs, nor necessarily respect the diversity of views already present.

Operational Pragmatism in Digital Preservation: Establishing context-aware minimum viable baselines was a panel session I facilitated at iPres 2017, held in September 2017 in Kyoto, Japan. The discussion was set out as a series of ‘provocations’ (developed collaboratively by the panellists) about different aspects of digital preservation. (Future blog posts about the topics and views presented during the panel discussion are yet to be published.) I had five experienced panellists representing a range of countries they’ve worked in around the world (Canada, China, France, Kenya, the Netherlands, the UK and the USA), plus myself (originally from Australia). Another eight contributors (from Australia, Germany, New Zealand, the UK and the USA) also fed into forming the panel topic, the panel’s makeup and the provocations. Each panellist was allocated a couple of minutes to present their point of view in response to each provocation, and then the discussion was opened up to the wider audience. It was never going to be an easy panel; I was asking a lot of my panellists. They were each having to respond to one challenging question after another, providing a simple answer to each (one that could be used to inform decisions about the ‘bare minimum’ of work that could be done for each digital preservation scenario). This was no small feat.

Rather than the traditional panel presentation, where only a series of experts get to speak, it was intended as a more inclusive discussion. The discussion was widened to include the audience in good faith, so that audience members could share openly throughout, if they wished. However, it became apparent that there were some other dynamics at play.

One Person Alone is Never Enough

Since I first commenced working in digital preservation in 2005, I have witnessed the passion and commitment to viewpoints that individuals within this field hold. I expected a lively discussion and healthy debate, potentially with opposing views between the panellists (who had been selected to represent different GLAM sectors, organisation sizes, nations, cultures, backgrounds and approaches to digital preservation).

As I was facilitating the panellists for this demanding session, I had organised an audience facilitator (someone well-established within the digital preservation community). Unfortunately, due to circumstances out of our control, this person was unable to be present (and an experienced replacement was unable to be found at short notice). This situation left my panellists open to criticism. One panellist was on the receiving end of a disproportionate amount of scrutiny from the audience. Despite attempts, as a lone facilitator, I was unable to defuse the situation. After the panel session finished, several audience members remarked that they didn’t feel comfortable participating in the discussion.

Facilitating a safe environment for both panellists and the wider audience to debate topics they are passionate about is vitally important, yet this failed to occur here. As a result, the panel were unable to summarise and present conclusions about possible ‘minimum baselines’ for each of the provocations. It’s clear that, in this instance, a single facilitator was not enough.

Community Responsibility

In this respect, we have failed as a community. While we may have vastly differing viewpoints, it is essential we cultivate environments where people feel safe to express their views and have them received in a professional and respectful manner. The digital preservation community is growing – in both size and diversity. We are aware we need to put in place, improve or refresh our technical infrastructures. Now is also the time to look at how we handle our social infrastructure. It is my opinion that there is a place in the digital preservation field for a wide range of individuals with a vast variety of backgrounds and skills.

There are people already working in digital preservation who have great skills. They might not all be software developers, but they know how to project manage, speak, write and problem-solve, and they are subject matter experts in a wide range of areas. The value of diversity has been proven. If we only have coders, computer scientists or individuals from any one background working in the field of digital preservation, then surely we will fail.

Moving Forward

In the hours and days following the panel, I reached out to my communities online for pointers to Codes of Ethics, Codes of Conduct and other articles discussing challenging situations in similar industries. Borrowing from other industries and adapting to fit the context at hand has always been important to me. I don’t want to reinvent the wheel and would prefer to learn from others’ experiences. The panel ‘provocations’ presented were not contentious, yet how the discussion evolved over the course of the panel somewhat echoed other events that have occurred within the tech industry.

At the time of publishing this post, neither the digital preservation community nor iPres has a Code of Conduct or Code of Ethics. There have been mentions of the lack of an iPres Code of Conduct in previous years. For iPres 2018, developing a Code of Conduct has become a priority. However, it shouldn’t have taken us this long to put in place some frameworks of this type, given we all know we must work collaboratively if we are to succeed. Back in 1997, UNESCO suggested that if Audiovisual Archiving was a profession, it would also require a Code of Ethics (Audiovisual archives: a practical reader – section 4, pages 15-17).

Codes of Conduct and Codes of Ethics are a starting point, and several examples already exist.

There’s a longer list of Codes of Conduct and Codes of Ethics that has been compiled over the past six months since iPres 2017. Even the Loop electronic music makers’ summit (an initiative of the Ableton software company), which I attended last November in Berlin, had a thorough Code of Conduct in place.

Building Better Communities

Codes are not enough. This is about building better communities.

A 2016 article emerging from the tech community has a list of suggestions for facilitating the development of ‘plumbers’ (and therefore functional infrastructure) rather than ‘rock stars’, under the section titled: “How do we as a community prevent rock stars?”.

Building and maintaining infrastructure is typically neither fun nor sexy – but this is what digital preservation demands. Without working collaboratively and inclusively, we will not be able to acquire, preserve or provide access to the digital content we steward, because without the same kind of diversity within our own field we won’t fully understand the contexts of the individuals producing that content.

Diversity may not be easy, but neither is digital preservation. While it might not be rocket science per se, we’re accustomed to working on hard and complex things. Here are some suggestions to help us take the next step(s):

  • Organisers: encourage, model and – where necessary – enforce ‘good practice’ behaviours codes
  • Participants: recognise, appreciate and celebrate the privilege of being able to debate digital preservation as part of what we do. Allow and encourage minority, less confident and new voices to hold an equal place in our discussions
  • Everyone: recognise and work towards addressing our own unconscious biases and privileges

Like Kenney and McGovern’s Three-Legged Stool for Digital Preservation (a model our DPOC project is very much based on), in which organisational infrastructure, the resources framework and technological infrastructure are of equal importance, it is essential to recognise that the complexity of the digital preservation challenge is best addressed through multiple perspectives. We must model and welcome the benefits of our diversity. Each of us brings something unique, and every skill or bit of knowledge is valuable.

Email preservation 2: it is hard, but why?

A post from Sarah (Oxford) with input from Somaya (Cambridge) about the 24 January 2018 DPC event on email archiving from the Task Force on Technical Approaches to Email Archives.

The discussion of the day circulated around what the task force had learnt during its year of work: that personal and public stories are buried in email; that considerable amounts of email have been lost over previous decades; that we should be treating email as data (it allows us to understand other datasets); that current approaches to collecting and preserving email don’t work because they’re not scalable; and that artificial intelligence and machine learning, including natural language processing, need to be integrated into how we address email archives (this is already taking place in the legal profession with ‘predictive coding’ and clustering technologies).


Back in July, Edith attended the first DPC event on email preservation, presented by the Task Force on Technical Approaches to Email Archives. She blogged about it here. In January this year, Somaya and I attended the second event, hosted again by the DPC.

Under the framework of five working groups, this task force has spent 12 months (2017) focused on five separate areas of the final report, which is due out in around May this year:

  • The Why: Overview / Introduction
  • The When/Who/Where: Email Lifecycles Perspectives
  • The What: The Needs of Researchers
  • The How: Technical Approaches and Solutions
  • The Path Forward: Sustainability & Community Development

The approach being taken is technical rather than policy-focused. Membership of the task force includes the DPC, representatives from universities and national institutions from around the world, and technology companies including Google and Microsoft.

Chris Prom (University of Illinois Urbana-Champaign, author of the 2011 DPC Technology Watch Report on Preserving Email) and Kate Murray (Library of Congress and contributor to FADGI) presented on the work they have been doing; you can view their slides here. Until the final report is published, I have been reviewing the preliminary draft (of June 2017) and the available documents to help develop my email preservation training course for Oxford staff in April.

So, when it comes to email preservation, most of the tools and discussions focus on processing email archives; very little of the discussion deals with preserving email archives over time. There’s a very good reason for this: processing email archives is the bottleneck, the point at which most institutions are still stuck. It is hard to make decisions about preservation when there is no means of collecting email archives or processing them in a timely manner.

There were many excellent questions and proposed solutions from the speakers at the January event. Below are some of the major points from the day that have informed my thinking of how to frame training on email preservation:

Why are email archives so hard to process?

  1. They are big. Few people cull their emails, and over time they build up. Reply and ‘reply all’ functions expand email chains, and attachments are growing in size and diversity. It takes a donor a while to prepare their email archives, let alone for an institution to transfer and process them.
  2. They are full of sensitive information, which is hard to find. Many open source technology assisted review (TAR) tools miss sensitive information. The software used for ‘predictive coding’ and machine learning review of email archives is well out of budget for heritage institutions, and manual review is far too labour intensive.
  3. There is no one tool that can do it all. Email preservation requires ‘tool chaining’ in order to transfer, migrate and process email archives. There is a very wide variety of email software programs, which in turn create many different email file formats. Many of the tools used in email archive processing are not compatible with each of these file types, which requires multiple file format migrations to allow for processing. For a list of some of the currently available tools, see the Task Force’s list here.
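
To make the ‘tool chaining’ point concrete, here is a minimal, hypothetical Python sketch (the function name `summarise_mbox` is my own) using the standard-library `mailbox` module to list basic metadata for each message in an mbox file. This is just one small link in a real chain: email archives also arrive as PST, EML, Maildir and other formats that would need migrating to a readable format first.

```python
import mailbox

def summarise_mbox(path):
    """Return basic metadata for each message in an mbox file.

    mbox is only one of many formats an archive might receive;
    proprietary formats (e.g. PST) usually need migrating to
    mbox or EML by another tool in the chain before this step.
    """
    summaries = []
    for msg in mailbox.mbox(path):
        summaries.append({
            "from": msg.get("From", ""),
            "subject": msg.get("Subject", ""),
            "date": msg.get("Date", ""),
            # attachments drive both archive size and review burden
            "attachments": sum(
                1 for part in msg.walk()
                if part.get_content_disposition() == "attachment"
            ),
        })
    return summaries
```

Even a listing this simple hints at the scale problem: every message must be opened and walked before any appraisal or sensitivity review can begin.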

What are some of the solutions?

  1. Tool chaining will continue. It appears that, for now, tool chaining is here to stay, often mixing proprietary and open source tools to get workflows running smoothly. This means institutions will need to invest in establishing email processing workflows: the software, the people who know how to handle different email formats, and so on.
  2. What about researchers? Access to emails is tightly controlled due to sensitivity restrictions, but is there space to get researchers to help with the review? If they use the collection for research, could they also be responsible for flagging anything deemed sensitive? How could this be done ethically?
  3. More automation. Better tools are needed to assist with TAR. Reviewing processes must become more automated if email archives are ever to be processed. The scale of the work is increasing, and traditional appraisal approaches (handling one document at a time) and records schedules are no longer suitable.
  4. Focus on bit-level preservation first. Processing of email archives can come later, but preserving them needs to start at transfer. (But we know users want access, and our institutions want to provide access to email archives.)
  5. Perfection is no longer possible. While archivists would like to be precise, in ‘scaling up’ email archive processing we need to think about it as ‘big data’ and take a ‘good enough’ approach.
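
As a sketch of what ‘bit-level preservation first’ can mean in practice – not any particular institution’s workflow, and the function name `fixity_manifest` is my own – the following Python uses only the standard library to build a checksum manifest at the point of transfer. Re-running it later and comparing manifests detects bit-level change or loss, even while the archive sits unprocessed.

```python
import hashlib
from pathlib import Path

def fixity_manifest(directory, algorithm="sha256"):
    """Record a checksum for every file under `directory`.

    Comparing two manifests made at different times reveals
    bit-level change or loss, long before the archive is
    appraised, reviewed or made accessible.
    """
    manifest = {}
    root = Path(directory)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.new(algorithm)
            with open(path, "rb") as f:
                # read in chunks so large attachments don't exhaust memory
                for chunk in iter(lambda: f.read(65536), b""):
                    digest.update(chunk)
            manifest[str(path.relative_to(root))] = digest.hexdigest()
    return manifest
```

The design choice here mirrors the ‘good enough’ point above: fixity checking is cheap and scalable, so it can be applied to everything on day one, while the expensive review work proceeds at whatever pace resources allow.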

Digital Preservation Roadshow – Part 2

Building on the success of CUL’s digital preservation roadshow kit, the Oxford fellows have begun assembling a local version. The kit is a mixture of samples of old hardware, storage technology, quiz activities, and general “digital preservation swag”.

Pens, pins, and a BBC Micro

We were able to give it a trial run as part of a GLAM (Gardens, Libraries and Museums) showcase at the Weston Library this January. Among the showcase attendees’ favourite items were an early floppy disk camera (c.1998) and our BBC Micro computer (1981).

Sony Digital Mavica (MVC-FD7) 

Technical Fellow James Mooney at the Oxford GLAM Showcase

Our floppy disk camera was among the first in the Mavica “FD” series from Sony. Sony produced 3.5” floppy disk cameras from late 1997 until 2002 (when it moved on to Mavica for CD). The MVC-FD7 takes 8-bit images which can easily be transferred to a home computer. This is one of the reasons the Mavica FD series was so popular: the FAT12 file system and the widespread adoption of 3.5″ floppy disk drives in computers made transfer a simple and quick task.

It is easy to forget that the floppy disk camera is really the grandfather of the microSD card!


BBC Micro

The BBC Micro is well known to most British people who went to school in the 1980s and ’90s – and even today, some UK classrooms feature a BBC Micro for more nostalgic reasons. The BBC Microcomputer series was designed and built by Acorn for the BBC Computer Literacy Project. Most schools in the UK adopted the system, and for many children BBC BASIC was the first programming language they learnt.

There is to this day a cult following of BBC Micro educational games, such as Granny’s Garden (1983).


The kit will be displayed in different Oxford libraries throughout 2018 to promote the DPOC training programme and raise awareness of Bodleian Libraries’ new digital preservation policy.


Breaking through with Library Carpentry

Thursday 11th January saw the Cambridge University Library’s annual conference take place. This year, it was entitled ‘Breakthrough the Library’, and focused on cutting-edge innovation in libraries and archives. I can honestly say that this was the first ever conference I’ve been to where every single speaker I saw (including the ten or so who gave lightning talks) was absolutely excellent.

So it’s hard to pick the one that made the most impression. Of course, an honourable mention must go to the talk about Jasper the three-legged cat, but if I had to plump for the one most pertinent to moving Digital Preservation forward, I’d pick “Library Carpentry: software and data skills for library professionals”, from Dr James Baker of the University of Sussex.

I’d heard of the term ‘Library Carpentry’ (and the initiatives it stems from – Software Carpentry and Data Carpentry) and thus had an idea what the talk was about on the way in. Their web presence explains things far better than I can, too (see https://librarycarpentry.github.io/), so I’m going to skip the exposition and make a different point…

As a full-blown, time-served nerd who’s clearly been embittered by 20 years in the IT profession (though I’m pleased to report, not as much as most of my long-term friends and colleagues!), I went into the talk with a bit of a pessimistic outlook. This was because, in my experience, there are three stages one passes through when learning IT skills:

  • Stage 1: I know nothing. This computer is a bit weird and confuses me.
  • Stage 2: I know EVERYTHING. I can make this computer sing and dance, and now I have the power to conquer the world.
  • Stage 3: … er – hang on… The computer might not have been doing exactly what I thought it was, after all… Ooops! What did I just do?

Stage 1 is just something you get through (if you want – I have nothing but respect for happy Stage 1 dwellers, though). If so inclined, all it really takes is a bit of persistence and a dollop of enthusiasm to get through it. If you want to but think you might struggle, then have a go at this computer programming aptitude test from the University of Kent – you may be pleasantly surprised… In my own case, I got stuck there for quite a while until one day a whole pile of O Level algebra that was lurking in my brain suddenly rose out of the murk, and that was that.

Stage 2 people, on the other hand, tend to be really dangerous… I have personally worked with quite a few well-paid developers who are stuck in Stage 2, and they tend to be the ones who drop all the bombs on your system. So the faster you can get through to Stage 3, the better. This was at the root of my concern, as one of the ideas of Library Carpentry is to pick up skills quick, and then pass them on. But I needn’t have worried because…

When I asked Dr Baker about this issue, he reassured me that ‘questioning whether the computer has done what you expected’ is a core learning point that is central to Library Carpentry, too. He also declared the following (which I’m going to steal): “I make a point of only ever working with people with Impostor Syndrome”.

Hence it really does look as if getting to Stage 3 without even going through Stage 2 at all is what Library Carpentry is all about. I believe moves are afoot to get some of this good stuff going at Cambridge… I watch with interest and might even be able to find the time to join in..? I bet it’ll be fun.