Self-archiving the DPOC research outputs

The Digital Preservation at Oxford and Cambridge project ended on the 31st of December 2018. Although follow-on digital preservation projects are continuing at both organisations, the initial DPOC project itself has been wrapped up. This also means that activity on the www.dpoc.ac.uk blog and our Twitter hashtag (#dp0c) is being wound down.

To give the outputs from the DPOC project a good chance of remaining accessible in the future, we have been planning our ‘project funeral’ over the past few months. Keep on reading to find out how we archived the DPOC project’s research outputs and how you can access them in the future.

This blog has two sections:

  • Section 1: Archiving of external project outputs
  • Section 2: Archiving of internal project outputs

SECTION 1: EXTERNAL PROJECT OUTPUTS

Making use of our institutional repositories

The DPOC blog, a WordPress site maintained by Bodleian Libraries’ Systems and Services (BDLSS), has been used to disseminate external project outputs over the past 2.5 years. While the WordPress platform is among the less complex applications for BDLSS to maintain, it is still an application-based platform that requires ongoing maintenance, and this maintenance may alter the functionality, look and feel of the DPOC blog over time. It cannot be guaranteed that files uploaded to the blog will remain accessible and persistently citable. This is a known issue for research websites (even digital preservation ones!). For this reason, any externally facing project outputs have instead been deposited with our institutional repositories ORA (Oxford) and Apollo (Cambridge). The repositories, rather than the DPOC blog, are the natural homes for the project’s outputs.

The deposits to ORA and Apollo include datasets, reports, abstracts, chapters and posters created by the DPOC Fellows. A full list of externally available outputs can be found on our resource page, or by searching for the keyword “DPOC” on ORA and Apollo.

Image Caption: Public data sets, journals, and other research outputs from the DPOC project can be accessed through Apollo and ORA

 

Archiving our social media

One of the deposited datasets covers our social media activities. The social media dataset contains exports of all WordPress blog posts, social media statistics, and Twitter data.

A full list of Tweets which have used the #dp0c tag between August 2016 and February 2019 can be downloaded by external users from ORA. Due to Twitter’s Terms of Service, only Tweet identifiers are available as part of the public dataset. However, full Tweets generated by the project team have also been retained under embargo for internal staff use only.
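As an illustration of what an identifier-only export looks like, here is a minimal Python sketch that reduces full Tweet records to their identifiers. It is not the script used to build the ORA dataset; the function name, the `id` field name and the sample records are all assumptions made for the example.

```python
def tweet_ids_only(tweets):
    """Return just the identifier of each Tweet record, as strings.

    Assumes each record is a dict with an "id" key (a hypothetical layout).
    """
    return [str(tweet["id"]) for tweet in tweets]


# Made-up sample records, not real project data.
sample_tweets = [
    {"id": 1001, "text": "example text one", "user": "example_user_a"},
    {"id": 1002, "text": "example text two", "user": "example_user_b"},
]

# Only the identifiers go into the public list; the full records stay internal.
public_ids = tweet_ids_only(sample_tweets)
```

Storing the identifiers as strings avoids problems in tools that cannot handle very large integers.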

As part of wrapping up the DPOC project, the blog will also be amended to reflect that it is no longer actively updated. However, as we want to keep a record of the original look of the site before these edits, Bodleian Libraries’ Electronic Manuscripts and Archives are currently crawling the site. To view an archived version of dpoc.ac.uk please visit Bodleian Libraries’ archive.it page.

 


SECTION 2: INTERNAL PROJECT DOCUMENTATION

Appraising internal project documentation

Over the past 2.5 years the DPOC project has created a large body of internal documentation as an outcome of its research activities. We wanted to choose wisely what documentation to keep and what documentation to dispose of, so that other library staff can easily navigate and make use of the project outputs.

The communication plan which was created at the start of the project was valuable in the appraisal process, helping us both locate and make decisions about what content to keep. Our communication plan listed:

  1. How project decisions would be recorded
  2. How different communication platforms and project management tools (such as SharePoint, Asana and Slack) would be used and backed up
  3. And which standards for file naming and versioning the Fellows would use

 

Accessing internal project documentation

In October-December both organisations appraised the content which was on the joint DPOC SharePoint site, and moved material of enduring value into local SharePoint instances for each institution. This way the documentation could be made available to other library staff rather than DPOC project members only.

We had largely followed the file naming standards outlined in the communication plan, but work was still required to manually clean up some file names. Additional contextualising descriptions were added to make content more easily understandable by staff who have not previously come across the project.

Image Caption: SharePoint

Oxford also used its departmental Confluence page which integrates with the SharePoint instance. Code written during the project is managed in GitLab.

Image Caption: Confluence


SUCCESSION PLANNING

Oxford: Although some of the DPOC Fellows are continuing work on other digital preservation related projects at Bodleian Libraries, ownership of documents, repository datasets and the WordPress website was formalised and assigned to the Head of Digital Collections and Preservation. This role (or the successor of this role) will make curatorial and preservation decisions about any DPOC project outputs managed by Bodleian Libraries.

Cambridge: Preservation activities will continue at CUL following on from the DPOC project in 2019. Questions regarding DPOC datasets and internal documentation hosted at CUL should be addressed to digitialpreservation[AT]lib.cam[DOT]ac.uk


SUMMARY

  • For a list of publicly available project outputs, please visit the resource page or search for the keyword “DPOC” on ora.ox.ac.uk and repository.cam.ac.uk
  • An archived version of dpoc.ac.uk is available through Bodleian Libraries’ modern archives. Alternatively, the UK Web Archive and the Internet Archive also store crawled versions of the site.
  • If you are a CUL member of staff looking for internal project documentation, please contact digitialpreservation[AT]lib.cam[DOT]ac.uk
  • If you are a Bodleian Libraries member of staff looking for internal project documentation, please contact digitalpreservation[AT]bodleian.ox[DOT]ac.uk

Devising Your Digital Preservation Policy: Learnings from the DPOC project

On December 4th the DPOC Policy and Planning Fellows ran a joint workshop in London presenting learnings and experiences of policy writing at CUL and Bodleian Libraries. Supporting the event were also Kirsty Lingstadt (Head of the Digital Library at the University of Edinburgh) and Jenny Mitcham (Head of Good Practice at the Digital Preservation Coalition). Kirsty and Jenny talked about their experience of policy writing in other organisational settings, illustrating how policy writing must be tailored to fit specific institutional contexts but that the broad principles remain the same.

In total 30 attendees took part in the workshop, which mixed presentations with round table discussions. To make the event as interactive as possible Mentimeter was used to poll attendees on their own experiences of policy writing. Although the survey only represents a small selection of organisations in the process of writing digital preservation policy, the Fellows wanted to share some of the results in the hope that they will facilitate further discussion. Feel free to use the comments section below to let the project team know if the results from the poll seem familiar (or perhaps unfamiliar).


Question: Do you know who to consult on a digital preservation policy (in your organisation)?

Most workshop participants knew who they needed to consult on digital preservation in their organisation and also had a good working relationship with them. This is the first step when starting a new policy – knowing your organisational culture and context.

Being new to their organisations, the DPOC Fellows spent a lot of time early on in the project reaching out to staff across the libraries. If you are also new to your institution, getting to know those who have been there a long time is an important starting point to understanding what type of policy will suit your organisation’s culture before you begin any writing.

Question: What barriers can you see to developing a digital preservation policy (in your organisation)?

‘Time’ was identified by participants as by far the largest barrier to writing new digital preservation policy. And it is true that policy development does take a lot of time if you want the resulting document to be more than ‘just a paper’ which is filed away at the end of the process.

To get staff on board with new policy, allocating resources for policy consultation is therefore crucial, although the effort involved is not always appreciated by senior management. For example, it took the Fellows between one and two years to develop a new digital preservation policy for their organisations, illustrating why it is important to give staff sufficient time to write policy. While policy consultation took a long time, the DPOC Fellows felt that this was a worthwhile investment for their organisations, as time spent consulting on policy was also a great outreach and learning opportunity for the organisations as a whole.

Question: Does your organisation have a policy template?

Most participants did not have an organisation-wide policy template. However, templates are part of policy best practice. A policy template is a skeleton document which outlines high level sections and headlines which should be included in every organisational policy regardless of topic – from an HR policy to a digital preservation policy, they should all follow the same structure. The purpose of having these standardised headlines is to ensure that staff can easily digest and recognise any policy at a quick glance. Templates can also enforce good document management practices.

If you are interested in finding out more, a high level policy template which was developed for the DPOC project can be requested through the DPOC blog contact form or by emailing the Digital Preservation Coalition.

Question: Where are institutional policies published (in your organisation)?

Once the policy is signed off, it is time to publicise it more widely. Among the workshop participants the most common places to publish policies were either on an institutional website or intranet (although there are other options listed in the word cloud).

As a word of caution, make sure that your organisation is consistent in where it publishes policies and ensure that documents are versioned. The international digital preservation policy review which the Fellows undertook in 2016 (analysing 50 different policies) found that most digital preservation policies do not use any document versioning. No versioning, in combination with the proliferation of different policy publication routes in an organisation, will soon become a real issue when staff try to locate up to date documents. (Again, if your organisation has a good policy template in place you can better enforce versioning!)

One option which was listed several times in the word cloud is to publish policy in an institutional repository; this is primarily useful if you do not have a reliable records management system in your organisation. Using a repository means that you can assign a DOI to the policy for persistent referencing, with the added benefit that the repository copy becomes the clear canonical copy of the policy.

Question: How long will it take to…?

Participants were asked how long (using multiples of months) they think it would take their organisations to:

  • Draft a policy
  • Have it approved
  • Begin implementation of the policy
  • See real impact and benefits in the organisation

As seen from the chart, the drafting of a policy document is only one small aspect of policy and planning work. This is important to remember if you want to avoid your policy becoming just another ‘piece of paper’ that is filed away and not looked at again after it’s been written. Advocacy, communication and implementation plans continue for years to come after the original document has been drafted.


Where next…

To find out more about policy writing during the DPOC project have a look at this recent blog post from CUL’s Policy and Planning Fellow Somaya Langley and at the workshop presentation slides available through the DPC. The Fellows are also happy to take questions through the blog and encourage use of the comments section.

Memory Makers: Digital preservation skills and how to get them

The Memory Makers Conference was hosted at Amsterdam Museum in the Netherlands on 29th-30th November. Bodleian Libraries’ Policy and Planning Fellow, Edith Halvarsson, attended.


The Memory Makers conference in Amsterdam brought together training providers from the private, higher education and continuing education sectors to discuss digital preservation skills, how to get them (and how to retain them).

In my experience, research on skills development is often underrepresented at digital preservation conferences, and when such talks are included the attendance tends to be lower than for technology based strands. However, taking a 1.5 day deep dive into this topic is one of the most interesting and thought-provoking activities I’ve done this year and I am happy that NDE and DPC decided to highlight this area by giving it its own conference. So in this blog I wanted to summarise some of the thoughts that have stayed with me since coming back from Amsterdam.

The expectation gap

‘The expectation gap’ is something which we have discussed in a roundabout way among the Fellows over the past years, but it was a presentation by Dr Sarah Higgins which really put words onto this phenomenon for me. The notion of an ‘expectation gap’ also nicely frames why we need to think seriously about lifelong learning and competency frameworks.

Sarah has been teaching Information Management to Masters Students at Aberystwyth University (Wales) for almost a decade and has been observing both the development of the programme and the career trajectories of students graduating into the field. In this time there’s been a growing gap between what employers expect of students in terms of digital preservation skills and what certified MA programmes can offer.

The bodies which certify Information Management courses in the UK (CILIP and ARA) still only require minimal digital skills as part of their competency frameworks. This has made it challenging to argue for new and mandatory digital preservation related modules on UK MA programmes. MA programmes have definitely shifted to begin meeting the digital preservation challenge, but they are still at an early stage.

So while UK Information Management courses continue to frame a lot of teaching around physical collections, the expectations of digital skills from organisations hiring recent graduates from these programmes have skyrocketed. This has made the gap between reality and fantasy even larger. There has been a growing trend for organisations to hire new graduates and expect them to be the magic bullet; the readymade lone experts in all areas of digital preservation who do not require any further development or support ever again. Many of Sarah’s graduates who began working on digital preservation/curation/archiving projects after graduation were essentially ‘set up to fail’ – not a nice or fair place to be in your first job.

Dr Natalie Harrower: https://twitter.com/natalieharrower/status/1068124988358709254

Developing skills frameworks

To meet the challenge of unclear competency expectations, Sharon McMeekin (Head of Training and Skills at DPC) called for continued development of skills frameworks such as DigCurV. While DigCurV has been immensely valuable (we have for example drawn on it continuously in the DPOC project), the digital preservation field has matured a lot over the past couple of years and new learnings could now be incorporated into the model. A useful new addition to DigCurV, Sharon argued, would be to create more practitioner levels which reflect the expected skills progression over 1-10 years for new graduates entering the field.

If such frameworks were taken on by certifying bodies, they could potentially both temper unrealistic job descriptions and help staff argue for professional development opportunities.

Lifelong learning

In her talk, Sarah strongly argued that we should expect recent Information Management graduates to also require more workplace based training after graduation. A two-year MA programme is not the endpoint for learning, especially in a quickly moving and developing field. This means that ongoing learning opportunities must also be considered by hiring organisations.

It was refreshing to hear from the British Library, who strongly subscribe to this idea. The British Library team teach introductory courses on digital preservation and run drop-in lab sessions for all library staff on a yearly basis.

Micky Lindlar: https://twitter.com/MickyLindlar/status/1068155027108306944

But the digital preservation team also engages with a wide range of training opportunities that are perhaps not considered traditional Information Management skills. Maureen Pennock (Head of Digital Preservation at the BL) argued that skills for digital preservation are not necessarily unique to the field, and can be acquired in places which you may not initially have considered. Such skills include project management, social media management, presentation delivery, and statistical analysis. It should be noted, though, that Maureen also strongly stated that no one person should be expected to be an expert in all these areas at the same time.

Learning collaboratively

Another set of presentations which I really enjoyed was focused on “collaborative learning”. Puck Huijtsing (Netwerk Oorlogsbronnen) challenged why we are so attached to lecture style learning which we are familiar with from school and higher education. She argued that collaborative learning has been shown to be a successful model when training people to take on a new craft (and she believes that digital preservation is a craft). Puck went on to elaborate on Amsterdam’s strong history of craft guilds and how these taught and shared new skills, arguing that it could potentially be a more accessible and sustainable model for workplace based training.

A number of successful training models presented by the Netherlands Institute for Sound and Vision then illustrated how collaborative hands-on workshops can be delivered in practice. In one workshop series delivered by the institute, participants were asked to undertake small projects which focused on discrete digital collection material which they had a pre-existing relationship with. The institute’s research indicates that this model is successful in aiding retention and uptake of digital preservation and archiving skills. These are workshops which we are also keen to test out at Bodleian Libraries next year to see if they are received well by staff.

Summary

It is clear from the Memory Makers conference that there are a lot of people out there who care about learning and professional development in the digital preservation field. This blog only summarises a small section of all the excellent work that was presented over 1.5 days, and I would encourage others to look at presentation slides and the Twitter hashtag for the event (#MemoryMakers18) if this is a topic which interests you as well.

Reflections on the International Conference on Digital Preservation (iPres) 2018

The iPres conference celebrated its fifteenth birthday in 2018. Bodleian Libraries’ Policy and Planning Fellow, Edith, discusses her take on this year’s conference theme.  


In 2003 a small international meeting, hosted by the Chinese Academy of Sciences, prompted the creation of what is today iPres (the International Conference on Digital Preservation). The conference has since grown massively; this year almost 500 delegates attended. To celebrate its fifteenth birthday, iPres 2018 had a self-reflecting theme, considering how the theory of digital preservation has today matured into a community of practice.

In the three years that I’ve worked in the digital preservation field, I have often felt that I have the same conversations on repeat. Which is not to say that I do not love having them! However, the opportunity to reflect on significant developments in digital preservation since 2003 is comforting and shows how these conversations eventually do have lasting impact. Knowing how far the community has come in the past fifteen years opens up my imagination around where digital preservation might be by 2033. And despite current world challenges I am very optimistic!


So what did iPres 2018 have to say about developments since 2003?

1) We now have a joint vocabulary

Barbara Sierman, of the Koninklijke Bibliotheek, commented that a development which is particularly striking to her is that digital preservation today has a shared vocabulary. In the early 2000s even defining the issues around preservation was a barrier when speaking to colleagues. The fact that we now have a shared vocabulary, comments Sierman, means that practitioners are able to present their research and practices at conferences such as iPres.

This is something hugely valuable and does show that digital preservation is emerging as a distinct discipline. Importantly, having established a vocabulary and theories also enables the digital preservation community to challenge and test these very notions and use them as a reference point for new ones.

Twitter – @euanc – https://twitter.com/euanc/status/1044941732155215873


2) More people see the value of digital preservation

“The ability to authenticate and validate turns out to be a superpower in an era where data and truth has become a key economic product.”

This was a comment from William Kilbride (Digital Preservation Coalition) on growing interest in the field. I agree that public awareness of digital collecting and digital preservation is something which appears to have changed rapidly in the last year or so. I think there is a growing consciousness that the internet is not permanent and that your digital life has value. My personal observation has been that recent events (such as Cambridge Analytica as well as the stricter General Data Protection Regulation in the EU) have prompted more people to see their social media and other data as something they can make decisions about. This is for example the first year when friends have started asking me how to extract and preserve their social media!


3) Digital preservation is becoming more Business-as-Usual (but we are not completely there yet)

Twitter-@karirene69, https://twitter.com/karirene69/status/1045014419045064704

In the panel Taking Stock after 15 Years Maureen Pennock, of the British Library, reflected on the role of research in developing digital preservation as a field. Many of the research projects undertaken in the late 1990s to 2000s profoundly shaped the field and without them we would today not have sustainable digital collecting programmes in place in some organisations.

Having the space to undertake innovative research will always be important to ensure that digital preservation can address emerging challenges. It is also highly encouraging that BAU digital preservation programmes are now becoming more common and that organisations are collecting at large and automated scales. However, Pennock warns that there is a difference between research and practice and that the latter needs to function outside the remit of discrete research project funding. This is still an ongoing challenge to BAU practices for digital preservation.


And what about the future?

It is always hard to predict which topics are “fads” and which ones make a more lasting impact. However, a hot topic this year (which divided opinions) was whether or not digital preservation should develop into a separate profession with its own code of ethics. The development of digital preservation as a profession could be an important advocacy tool. Conversely, it also runs the risk of isolating digital preservation activities by framing them as something separate from other professions such as archivists, records managers and librarians.

Twitter – @mopennock – https://twitter.com/mopennock/status/1044944038170972161

Now that we have the vocabularies, theories, practices, and attention of the media (as outlined above) – should we instead be making a more concerted effort to integrate with library, archives and other research conferences? This will no doubt be a continued area of discussion for iPres 2019 and beyond!

Digital preservation with limited resources

What should my digital preservation strategy be, if I do not have access to repository software or a DAMS system?

At Oxford, we recently received this question from a group of information professionals working for smaller archives. This will be a familiar scenario for many – purchasing and running repository software will require a regular dedicated budget, which many archives in the UK do not currently have available to them.

So what intermediate solutions could an archive put in place to better its chances of not losing digital collection content until such a time? This blog summarises some key points from meeting with the archivists, and we hope that these may be useful for other organisations who are asking the same question.


Protect yourself against human error

CC-BY KateMangoStar, Freepik

Human error is one of the major risks to digital content. It is not uncommon that users will inadvertently drag files/folders or delete content by mistake. It is therefore important to have strict user restrictions in place which limit who can delete, move, and edit digital collections. For this purpose you need to ensure that you have defined an “archives directory” which is separate from any “working directories” where users can still edit and actively work with content.

If you have IT support available to you, then speak to them about setting up new user restrictions.
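If you do not have IT support, a very basic safeguard can be scripted. The Python sketch below removes write permission from every file under an “archives directory”. It is an illustration only: the function name is made up for this example, and file permissions are no substitute for properly managed user accounts.

```python
import os
import stat
from pathlib import Path

# Permission bits that allow the owner, group or others to write to a file.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_read_only(archive_dir):
    """Remove write permission from every file under archive_dir.

    Returns the list of files whose permissions were updated.
    """
    changed = []
    for path in sorted(Path(archive_dir).rglob("*")):
        if path.is_file():
            current_mode = path.stat().st_mode
            os.chmod(path, current_mode & ~WRITE_BITS)
            changed.append(path)
    return changed
```

Note that on Linux and macOS this only guards against accidental edits. Whether a file can be deleted or moved depends on the permissions of the folder that contains it, so folder-level restrictions are still needed.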


Monitor fixity

CC-BY Dooder, Freepik

However, even with strong user restrictions in place, human error can occur. In addition to enforcing stronger user restrictions in the “archives directory”, tools like Fixity from AVP can be used to spot if content has been moved between folders, deleted, or edited. By running regular Fixity reports an archivist can spot any suspicious looking changes.

We are aware that time constraints are a major factor which inhibits staff from adding additional tasks to their workload, but luckily Fixity can be set to run automatically on a weekly basis, providing users with an email report at the end of the week.
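To show the idea behind fixity monitoring, here is a minimal Python sketch that records a checksum for every file in a directory and compares two such records taken at different times. This is not how the Fixity tool from AVP is implemented; the function names and the choice of SHA-256 are assumptions made for this illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(old, new):
    """Report files that are missing, added or changed between two manifests."""
    return {
        "missing": sorted(set(old) - set(new)),
        "added": sorted(set(new) - set(old)),
        "changed": sorted(
            name for name in set(old) & set(new) if old[name] != new[name]
        ),
    }
```

A moved file shows up as one “missing” entry and one “added” entry with the same checksum, which is a useful clue when reviewing a report.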


Understand how your organisation does back-ups

CC-BY Shayne_ch13, Freepik

A common IT retention period for back-ups of desktop computers is 14 days. The two week period enables disaster recovery of working environments, to ensure that business can continue as usual. However, a 14 day back-up is not the same as preservation storage and it is not a suitable solution for archival collections.

In this scenario, where content is stored on a file system with no versioning, the archivist only has 14 days to spot any issues and retrieve an older back-up before it is too late. So please don’t go on holiday or get ill for long! Even with tools like Fixity, fourteen days is an unrealistic turn-around time (if the issue is spotted at all in the first place).

If possible, try and make the case to your organisation that you require more varied types of back-ups for the “archives directory”. These should include back-ups which are retained for at least a year. Using a mix of tape storage and/or cloud service providers can be a less expensive way of storing additional back-ups which do not require ongoing access. It is an investment which is worth making.

As a note of warning though – you are still only dealing in back-ups. This is not archival storage. If there are issues with multiple back-ups (due to for example transfer or hardware errors) you can still lose content. The longer term goal, once better back-ups are in place, should be to monitor the fixity of multiple copies of content from the “archives directory”. (For more information about the difference between back-ups used for regular IT purposes and storage for digital preservation see the DPC Handbook.)


Check that your back-ups work

Once you have got additional copies of your collection content, remember to check that you can retrieve them again from storage.

Many organisations have been in the position where they think they have backed up their content – only to find out that their back-ups have not been created properly when they need them. By testing retrieval you can protect your collections against this particular risk.
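One way to test retrieval is to restore a back-up to a temporary location and compare it against the current content. The Python sketch below does this with the standard library; the function name and the two-folder layout are assumptions for this example, and it only makes sense for content that should not have changed since the back-up was taken.

```python
import filecmp
from pathlib import Path


def verify_restore(original_dir, restored_dir):
    """List files that are missing from, or differ in, a restored back-up.

    Paths in the result are relative to original_dir.
    """
    original_dir = Path(original_dir)
    restored_dir = Path(restored_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        restored = restored_dir / relative
        # shallow=False compares file contents rather than just size and date.
        if not restored.is_file() or not filecmp.cmp(source, restored, shallow=False):
            problems.append(relative.as_posix())
    return problems
```

An empty list means that every file in the original folder was found in the restored copy with identical content.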


But… what do I do if my organisation does not do back-ups at all?

Although the 14 day back-up retention is common in many businesses, it is far from the reality which certain types of archives operate within. A small community organisation may for example do all its business on a laptop or workstation which is shared by all staff (including the archive).

This is a dangerous position to be in, as hardware failure can cause immediate and total loss. There is not a magic bullet for solving this issue, but some of the advice which Sarah (Training and Outreach Fellow at Bodleian Libraries) has provided in her Personal Digital Archiving Course could apply.

Considerations from Sarah’s course include:

  • Create back-ups on additional removable hard drive(s) and store them in a different geographical location from the main laptop/workstation
  • Make use of free cloud storage limits (do check the licenses though to see what you are agreeing to – it’s not where you would want to put your HR records!)
  • Again – remember to check your back-ups!
  • For digitized images and video, consider using the Internet Archive’s Gallery as an additional copy (note that this is open to the public, and requires assigning a CC-BY license). (If you like the work that the Internet Archive does – you can donate to them here.)
  • Apply batch-renaming tools to file names to ensure that they contain understandable metadata in case they are separated from their original folders

(Email us if you would like to get a copy of Sarah’s lecture slides with more information)
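As an illustration of the batch-renaming idea, the Python sketch below builds file names that carry their folder context, so that a file still makes sense if it is separated from its original folders. The naming pattern and function names are assumptions made for this example and are not taken from Sarah’s course.

```python
import re
from pathlib import Path


def descriptive_name(path, root):
    """Build a file name that includes the names of its parent folders.

    For example 'Minutes 2017/scan 01.TIF' becomes 'minutes-2017_scan-01.tif'.
    """
    relative = Path(path).relative_to(root)
    parts = list(relative.parts[:-1]) + [relative.stem]
    cleaned = [
        re.sub(r"[^A-Za-z0-9]+", "-", part).strip("-").lower() for part in parts
    ]
    return "_".join(part for part in cleaned if part) + relative.suffix.lower()


def plan_renames(root):
    """Return a mapping of current path to proposed path, without renaming anything."""
    root = Path(root)
    return {
        path: path.with_name(descriptive_name(path, root))
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
```

The plan is deliberately a dry run: review the proposed names first, and test on a copy of your files before renaming anything for real.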


Document all of the above

CC-BY jcomp, Freepik

Make sure to write down all the decisions you have made regarding back-ups, monitoring, and other activities. This allows for succession planning and ensures that you have a paper trail in place.


Stronger in numbers

CC-BY, Kjpargeter, Freepik

Licenses, contracts and ongoing management are expensive. Another avenue to consider is looking to peer organisations to lower some of these costs. This could include entering into joint contracts with tape storage providers, or consortium models for using repository software. An example of an initiative which has done this is the NEA (Network Electronic Archive) group which has been an established repository for over ten years supporting 28 small Danish archives.


Summary:

These are some of the considerations which may lower the risk of losing digital collections. Do you have any other ideas (or practical experience) of managing and preserving digital collections with limited resources, and without using a repository or DAMS system?

Project update

A project update from Edith Halvarsson, Policy and Planning Fellow at Bodleian Libraries. 


Ms Arm.e.1, Folio 23v

Bodleian Libraries’ new digital preservation policy is now available to view on our website, after having been approved by Bodleian Libraries’ Round Table earlier this year.

The policy articulates Bodleian Libraries’ approach and commitment to digital preservation:

“Bodleian Libraries preserves its digital collections with the same level of commitment as it has preserved its physical collections over many centuries. Digital preservation is recognized as a core organizational function which is essential to Bodleian Libraries’ ability to support current and future research, teaching, and learning activities.”

 

Click here to read more of Bodleian Libraries’ policies and reports.

In other related news, we are currently in the process of ratifying a GLAM (Gardens, Libraries and Museums) digital preservation strategy which is due for release after the summer. Our new digitization policy is also in the pipeline and will be made publicly available. Follow the DPOC blog for future updates.

Digital Preservation Roadshow – Part 2

Building on the success of CUL’s digital preservation roadshow kit, the Oxford fellows have begun assembling a local version. The kit is a mixture of samples of old hardware, storage technology, quiz activities, and general “digital preservation swag”.

Pens, pins, and a BBC Micro

We were able to trial it as part of a GLAM (Gardens, Libraries and Museums) showcase at the Weston Library this January. Among the showcase attendees’ favourite items were an early floppy disk camera (c.1998) and our BBC Micro Computer (1981).

Sony Digital Mavica (MVC-FD7) 

Technical Fellow James Mooney at the Oxford GLAM Showcase

Our floppy disk camera was among the first in the Mavica “FD” series from Sony. Sony produced 3.5″ floppy disk cameras from late 1997 until 2002 (when it moved on to Mavica for CD). The MVC-FD7 takes 8-bit images which can be easily transferred to a home computer. This is one of the reasons that the Mavica FD series was so popular – the FAT12 file system and widespread adoption of 3.5″ floppy disk drives in computers made transfer a simple and quick task.

It is easy to forget that the floppy disk camera is really the grandfather of the microSD card!

 

 

BBC Micro

The BBC Micro is well known by most British people who went to school in the 1980s and ’90s – but even today some UK classrooms will feature a BBC Micro for more nostalgic reasons. The BBC Microcomputer series was designed and built by Acorn for the BBC Computer Literacy Project. Most schools in the UK adopted the system, and for many children the BBC BASIC programming language was the first one they learnt.

There is to this day a cult following of BBC Micro educational games, such as Granny’s Garden (1983).


The kit will be displayed in different Oxford libraries throughout 2018 to promote the DPOC training programme and raise awareness of Bodleian Libraries’ new digital preservation policy.


Advocating for digital preservation

Bodleian Libraries and Cambridge University Library are entering into the last phase of the DPOC project, where they are starting to write up business cases for digital preservation. In preparation, the Fellows attended DPC’s “advocacy briefing day” in London.  Policy and Planning Fellow, Edith, blogs about some of the highlights and lessons from the day.


This week I had the pleasure of attending DPC’s advocacy training day. It was run by Catherine Heaney, the founder of DHR Communications, and a veteran when it comes to advocating for supporting digital heritage. Before the event I thought I had a clear idea of what advocacy means in broad terms. You invite yourself into formal meetings and try to deliver measured facts and figures which will be compelling to the people in front of you – right?

Well… not quite, it turns out. Many of these assumptions were turned on their head during this session. Here are my four favourite pieces of (sometimes surprising) advocacy advice from Catherine.

Tip 1: Advocacy requires tenaciousness

The scenario which was described above is what communications professionals might call “the speech” – but it is only one little part of effective advocacy. “The digital preservation speech” is important, but it is not necessarily where you will get the most buy-in for digital preservation. Research has shown that one-off communications like these are usually not effective.

In fact, all of those informal connections and conversations you have with colleagues also come under advocacy and may reap greater benefits due to their frequency. And if one of these colleagues is talented at influencing others, they can be invaluable in advocating for digital preservation when you are not there in person.

Lesson learnt: you need to keep communicating the message whenever and wherever you can if you want it to seep into people’s consciousness. Since digital preservation issues do not crop up that often in popular culture and the news, it is up to us to deliver, re-deliver… and then re-deliver the message if we want it to stick.

Tip 2: Do your background research

When you know that you will be interacting with colleagues and senior management, it is important to do your background research and find out what argument will most appeal to the person you are meeting. Having a bog-standard ‘speech’ about digital preservation which you pull out at all occasions is not the most effective approach. In order to make your case, the problem you are attempting to solve should also reflect the goals and the challenges which the person you are advocating to is facing.

The aspects which appeal about digital preservation will be different depending on the role, concerns and responsibilities of the person you are advocating to. Are they concerned with:

  • Legal or reputational risk?
  • Financial costs and return on investment?
  • Being seen as someone at the forefront of the digital preservation field?
  • Creating reproducible research?
  • Collecting unique collections?
  • Or perhaps the opportunity to collaborate cross-institutionally?

Tip 3: Ensure that you have material for a “stump speech” ready

Tailoring your message to the audience is important, and this will be easier if you have material ready at hand which you can pick and choose from. Catherine suggested preparing a folder of stories, case studies, data and facts about digital preservation which you can cut and paste from to suit the occasion.

What is interesting though is the order of that list of “things to collect”:

  1. Stories
  2. Case studies
  3. Data and facts

The ranking is intentional. We tend to think that statistics and raw data will convince people, as this appeals to their logic. In fact, your argument will be stronger if your pitch starts with a narrative (a story) about WHY we need digital preservation, followed by case studies to illustrate your point. Catherine advises that it is only then, once the audience is listening, that you bring out the data and facts. This approach is both more memorable and more effective in capturing your audience’s attention.

Tip 4: Personalise your follow up

This connects to tip 2 – about knowing your audience. Catherine advised that, although it may feel strange at first, writing a personalised follow up message is a very effective tool. When you do have the chance to present your case to an important group within your organisation, the follow up message can further solidify that initial pitch (again – see tip 1 about repeated communication).

By taking notes about the concerns or points that have been made during a meeting, you have the opportunity to write personalised messages which capture and refer back to the concerns raised by that particular person. The personalised message also has the additional benefit of opening up a channel for future communication.


This was just a small subsection of all the interesting things we talked about on the advocacy briefing day. For some more information have a look at the hashtag for the day #DPAdvocacy.

Using ePADD with Josh Schneider

Edith, Policy and Planning Fellow at Bodleian Libraries, writes about her favourite features in ePADD (open source software for email archives) and about how the tool aligns with digital preservation workflows.


At iPres a few weeks ago I had the pleasure of attending an ePADD workshop run by Josh Schneider from Stanford University Libraries. The workshop was for me one of the major highlights of the conference, as I have been keen to try out ePADD since first hearing about it at DPC’s Email Preservation Day. I wrote a blog about the event back in July, and have now finally taken the time to review ePADD using my own email archive.

ePADD is primarily a tool for appraisal and delivery rather than for digital preservation. However, as it is a potential component in ingest workflows to an institutional repository, ensuring that email content retains its integrity during processing in ePADD is paramount. The creators behind ePADD are therefore thinking about how to enhance current features to make the tool fit better into digital preservation workflows. I will discuss these features later in the blog, but first I wanted to show some of the capabilities of ePADD. I can definitely recommend having a play with this tool yourself as it is very addictive!

ePADD: Appraisal module dashboard

Josh, our lovely workshop leader, recommends that new ePADD users go home and try it on their own email collections. As you know your own material fairly well it is a good way of learning about both what ePADD does well and its limits. So I decided to feed in my work emails from the past year into ePADD – and found some interesting trends about my own working patterns.

ePADD consists of four modules, although I will only be showing features from the first two in this blog:

Module 1: Appraisal (Module used by donors for annotation and sensitivity review of emails before delivering them to the archive)

Module 2: Processing (A module with some enhanced appraisal features used by archivists to find additional sensitive information which may have been missed in the first round of appraisal)

Module 3: Discovery (A module which provides users with limited keyword searching for entities in the email archive)

Module 4: Delivery (This module provides more enhanced viewing of the content of the email archive – including a gallery for viewing images and other document attachments)

Note that ePADD only supports MBOX files, so if you are an Outlook user like myself you will need to first convert from PST to MBOX. After you have created an MBOX file, setting up ePADD is fairly simple and quick. Once the first ePADD module (“Appraisal”) was up and running, processing my 1,500 emails and 450 attachments took around four minutes. This time includes time for natural language processing. ePADD recognises and indexes various “entities” – including persons, places and events – and presents these in a digestible way.

ePADD: Appraisal module processing MBOX file
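
Before feeding a converted MBOX file into ePADD, it can be useful to confirm that it contains roughly the number of emails and attachments you expect. The sketch below is one possible way of doing this with Python's standard `mailbox` module. It is independent of ePADD and does not reflect how ePADD itself parses email. The function name `summarise_mbox` is my own, and the PST to MBOX conversion is assumed to have already happened with a separate tool.

```python
import mailbox


def summarise_mbox(path: str) -> dict:
    """Count messages and attachment parts in an MBOX file.

    A part is counted as an attachment when its Content-Disposition
    header is "attachment"; inline images are therefore not counted.
    """
    # create=False makes a mistyped path raise an error instead of
    # silently creating a new, empty mailbox file.
    box = mailbox.mbox(path, create=False)
    messages = 0
    attachments = 0
    try:
        for message in box:
            messages += 1
            for part in message.walk():
                if part.get_content_disposition() == "attachment":
                    attachments += 1
    finally:
        box.close()
    return {"messages": messages, "attachments": attachments}
```

If the totals differ a lot from what your email client reports, that is a hint that something was lost or duplicated during conversion and is worth checking before appraisal begins.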

Looking at the entities recognised by ePADD, I was able to see who I have been speaking with/about during the past year. There were some not so surprising figures that popped up (such as my DPOC colleagues James Mooney and Dave Gerrard). However, curiously I seem to also have received a lot of messages about the “black spider” this year (turns out they were emails from the Libraries’ Dungeons and Dragons group).

ePADD entity type: Person (some details removed)

An example of why you need to look deeper at the results of natural language processing was evident when I looked under the “place entities” list in ePADD:

ePADD entity type: Place

San Francisco comes highest up on the list of mentioned places in my inbox. I was initially quite surprised by this result. Looking a bit closer, all 126 emails containing a mention of San Francisco turned out to be from “Slack”. Slack is an instant messaging service used by the DPOC team, which has its headquarters in San Francisco. All email digests from Slack contain the head office address!

Another one of my favourite things about ePADD is its ability to track frequency of messages between email accounts. Below is a graph showing correspondence between myself and Sarah Mason (outreach and training fellow on the DPOC project). The graph shows that our peak period of emailing each other was during the PASIG conference, which DPOC hosted in Oxford at the start of September this year. It is easy to imagine how this feature could be useful to academics using email archives to research correspondence between particular individuals.

ePADD displaying correspondence frequency over time between two users
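
For readers curious about what sits behind a correspondence graph like this one, the idea can be approximated outside ePADD by counting messages per month between two addresses. The following is only a sketch using Python's standard library. It is not ePADD's implementation, the function name `monthly_counts` is hypothetical, and it only looks at the From, To and Cc headers.

```python
import mailbox
from collections import Counter
from email.utils import getaddresses, parsedate_to_datetime


def _addresses(message, *header_names):
    """Collect lower-cased email addresses from the named headers."""
    values = []
    for name in header_names:
        values.extend(str(v) for v in message.get_all(name, []))
    return {addr.lower() for _, addr in getaddresses(values) if addr}


def monthly_counts(mbox_path: str, address_a: str, address_b: str) -> dict:
    """Count messages sent between two addresses, keyed by "YYYY-MM"."""
    a, b = address_a.lower(), address_b.lower()
    counts = Counter()
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            senders = _addresses(message, "From")
            recipients = _addresses(message, "To", "Cc")
            between = (a in senders and b in recipients) or (
                b in senders and a in recipients
            )
            if not between:
                continue
            try:
                sent = parsedate_to_datetime(message["Date"])
            except (TypeError, ValueError):
                continue  # skip messages with a missing or unparseable date
            counts[sent.strftime("%Y-%m")] += 1
    finally:
        box.close()
    return dict(sorted(counts.items()))
```

Plotting the resulting dictionary as a bar chart gives a picture similar in spirit to the graph above, with peaks around busy periods such as a conference.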

The last feature I wanted to talk about is “sensitivity review” in ePADD. Although I annotate personal data I receive, I thought that the one year mark of the DPOC project would also be a good time to run a second sensitivity review of my own email archive. Using ePADD’s “lexicon hits search” I was able to sift through a number of potentially sensitive emails. See image below for categories identified which cover everything from employment to health. These were all false positives in the end, but it is a feature I believe I will make use of again.

ePADD processing module: Lexicon hits for sensitive data

So now on to the Digital Preservation bit. There are currently three risks of using ePADD in terms of preservation which stand out to me.

1) For practical reasons, MBOX is currently the only email format option supported by ePADD. If MBOX is not the preferred preservation format of an archive, it may end up running multiple migrations between email formats, resulting in progressive loss of data.

2) There are no checksums being generated when you download content from an ePADD module in order to copy it onto the next one. This could be an issue as emails are copied multiple times without any monitoring of the integrity of the email archive files.

3) There is currently limited support for assigning multiple identifiers to archives in ePADD. This could potentially become an issue when trying to aggregate email archives from different institutions. Local identifiers could in this scenario clash, and additional unique identifiers would then also be required.

Note however that these concerns are already on the ePADD roadmap, so they are likely to improve or even be solved within the next year.
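
Until checksum support arrives in the tool itself, the second risk can be managed by hand: generate a checksum manifest for a module's exported folder before copying it, generate another for the copy, and compare the two. Below is a minimal sketch of that idea in Python. The function names are hypothetical, SHA-256 is simply my choice of algorithm, and nothing here is part of ePADD.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large email archives do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to folder) to its SHA-256 checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def compare_manifests(source: dict[str, str], copy: dict[str, str]) -> dict[str, list[str]]:
    """Report files that are missing, unexpected or changed in the copy."""
    return {
        "missing": sorted(set(source) - set(copy)),
        "unexpected": sorted(set(copy) - set(source)),
        "changed": sorted(
            name for name in source.keys() & copy.keys() if source[name] != copy[name]
        ),
    }
```

An empty result in all three categories means the copy matches the original byte for byte. Saving the manifest next to the archive also gives you a baseline for later integrity checks.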

To watch out for ePADD updates, or just have a play with your own email archive (it is loads of fun!), check out their:

PASIG 2017 Twitter round-up

After many months of planning it feels quite strange to us that PASIG 2017 is over. Hosting the PASIG conference in Oxford has been a valuable experience for the DPOC fellows and a great chance for Bodleian Libraries’ staff to meet with and listen to presentations by digital preservation experts from around the world.

In the end 244 conference delegates made their way to Oxford and the Museum of Natural History. The delegates came from 130 different institutions and every continent of the world was represented (…well, apart from Antarctica).

What was especially exciting though were all the new faces. In fact 2/3 of the delegates this year had not been to a PASIG conference before! Is this perhaps a sign that interest in digital preservation is on the rise?

As always at PASIG, Twitter was ablaze with discussion in spite of an at times flaky Wi-Fi connection. Over three days #PASIG17 was mentioned a whopping 5300 times on Twitter and had a “reach” of 1.7 million. Well done everyone on some stellar outreach! The most active tweeting came from the UK, USA and Austria.

Twitter activity by country using #PASIG17 (Talkwalker statistics)

Although it is hard to choose favourites among all the Tweets, a few of the DPOC project’s personal highlights included:

Cambridge Fellow Lee Pretlove lists “digital preservation skills” and why we cannot be an expert in all areas. Tweet by Julian M. Morley

Bodleian Fellow James makes some insightful observations about the incompatibility between tar pits and digital preservation.

Cambridge Fellow Somaya Langley presents in the last PASIG session on the topic of “The Future of Digital Preservation”.  

What were some of your favourite talks and Twitter conversations? What would you like to see more of at PASIG 2018? #futurePASIG