Designing digital preservation training – it’s more than just talking

Sarah, Oxford’s Outreach and Training Fellow, writes about the ‘training cycle’ and concludes that delivering useful training is more than just talking at learners.


We have all been there before: trying to keep our eyes open as someone drones on in the front of the room, while the PowerPoint slides seem to contain a novella that hurts your eyes to squint to read. That’s not how training is supposed to go.

Rather, engaging your learners in a variety of activities will help them retain knowledge. And in a field like digital preservation, the more hands-on the training, the better. So often we talk about concepts or technical tools, but we very rarely provide examples, demonstrate them, or (better yet) have staff experiment with them.

And training is just one small part of the training process. I’ve learned there are many steps involved in developing a course that will be of use to staff. Most of your time will not be spent in the training room.

Identifying Learners’ Needs

Often easier said than done. It’s better to prepare for all types of learners and pitch the material to a wide audience. With hands-on tasks, it’s possible to have additional work prepared for advanced learners, so they don’t get bored while other learners are still working through the task.

Part of the DPOC project has been about finding the gaps in digital preservation skills and knowledge, so that our training programmes can better meet staff’s needs. What I am learning is that I need to cast my net wide to reach everyone!

Planning and Preparation

The hard bit. Start with what your outcomes are going to be and try not to put too many into a session. It’s too easy to be extra ambitious. Once you have them, then you pick your activities, gather your materials (create that PowerPoint) and practise! Never underestimate the value of practising your session on your peers beforehand.

Teaching and Learning

The main event. It’s important to be confident, open and friendly as a trainer. I admit, I stand in the bathroom and do a “Power Pose” for a few minutes to psyche myself up. You are allowed nerves as a trainer! It’s important to be flexible during the course.

Assessment

Because training isn’t just about Teaching and Learning. That only accounts for 1/5th of the training cycle. Assessment is another 1/5th and if that’s going to happen during the course, then it needs to be planned. Using a variety of the activities listed below will help with that. Be aware though: activities almost always take longer than you plan!

Activities to facilitate learning:

  • questioning
  • group activities, such as case studies, card sorting, mind mapping, etc.
  • hands-on tasks with software
  • group discussions
  • quizzes and games
  • modelling and demonstrations followed by an opportunity to practise the skill

Evaluation

Your evaluation is crucial to this. Make notes after your session on what you liked and what you need to fix. Peer evaluation is also important and sending out surveys immediately after will help with response rates. However, if you can do a paper evaluation at the end of the course, your response rates will be higher. Use that feedback to improve the course, tweak activities and content, so that you can start all over again.

Save Comic Sans

Happy April Fools’ Day! This was the joke post put out by the DPOC team. Though none of the following post is true (Comic Sans is going nowhere so far as we know), it is important to think about the preservation of font files. Ever notice that if a certain font file is not installed on your computer, certain files can look completely different? Suddenly specialised font files become an important part of the digital file (maintaining its original look and feel) and preserving them becomes important. Just something to think about.


Save Comic Sans!

We were deeply saddened by today’s news that Microsoft Office products will in the future stop supporting the iconic Comic Sans font. The decision comes as a direct reaction to the slow decline in popularity and uptake from the Microsoft user community. The font became a staple in the mid-1990s, but has seen a backlash, particularly from the media industry, over the last few years. Repeated ridicule from leading public relations agencies and graphic designers has inevitably led to the drastic response from Microsoft.

‘Ban Comic Sans’, a fanatic society of typographic purists, have after an extensive smear campaign fought over a 15-year period finally won their case. “Clearly, Comic Sans as a voice conveys silliness, childish naiveté, irreverence, and is far too casual[…]”, they comment gleefully following the news from Microsoft Head Office.

(Above: Propaganda spread by the “group” Ban Comic Sans http://bancomicsans.com/propaganda/)

As preservation professionals and historians, we feel that it is our duty to speak up for all the other lovers of the font. Fans who have for years been shamed into silence by the widespread acceptance of these fanatical views. The digital preservation of Comic Sans is not only about safeguarding 20 years of cultural history, but it is also about doing the right thing for our children and grandchildren. As a small tribute, and as a show of our appreciation, www.dpoc.ac.uk will from now on only blog in Comic Sans. We refuse to say RIP to the font – we say it is time to fight the good fight.

If you have an anecdote about a time you enjoyed Comic Sans – please comment below and show your support. Perhaps we can make a difference together.

Visit to the National Archives: herons and brutalism

An update from Edith Halvarsson about the DPOC team’s trip to visit the National Archives last week. Prepare yourself for a discussion about digital preservation, PRONOM, dark archives, and wildlife!


Last Thursday DPOC visited the National Archives in London. David Clipsham kindly put much time into organising a day of presentations with the TNA’s developers, digitization experts and digital archivists. Thank you Diana, David & David, Ron, Ian & Ian, Anna and Alex for all your time and interesting thoughts!

After some confusion, we finally arrived at the picturesque Kew Gardens station. The area around Kew is very sleepy, and our first thought on arrival was “is this really the right place?” However, after a bit more circling around Kew, you definitely cannot miss it. The TNA is located in an imposing brutalist building, surrounded by beautiful nature and ponds built as flood protection for the nation’s collections. They even have a tame heron!

After we all made it on site, the day kicked off with an introduction from Diana Newton (Head of Digital Preservation). Diana told us enthusiastically about the history of the TNA and its Digital Records Infrastructure. It was really interesting to hear how much has changed in just six years since DRI was launched – both in terms of file format proliferation and an increase in FOI requests.

We then had a look at TNA’s ingest workflows into Preservica and storage model with Ian Hoyle (Senior Developer) and David Underdown (Senior Digital Archivist). It was particularly interesting to hear about the TNA’s decision to store all master file content on offline tape, in order to bring down the archive’s carbon footprint.

After lunch with Ron Davies (Senior Project Manager), Anna de Sousa and Ian Henderson spoke to us about their work digitizing audiovisual material and 2D images. Much of our discussion focused on standards and formats (particularly around A/V). Alex Green and David Clipsham then finished off the day talking about born-digital archive accession streams and PRONOM/DROID developments. This was the first time we had seen the clever way a file format identifier is created – there is much detective work required on David’s side. David also encouraged us and anyone else who relies on DROID to have a go and submit something to PRONOM – he even promised it’s fun! Why not read Jenny Mitcham’s and Andrea Byrne’s articles for some inspiration?

Thanks for a fantastic visit and some brilliant discussions on how digital preservation work and digital collecting is done at the TNA!

Training begins: personal digital archiving

Outreach & Training Fellow, Sarah, has officially begun training and capacity building with a session on personal digital archiving at the Bodleian Libraries. Below, Sarah describes how the first session went and shares some personal digital archiving tips.


Early Tuesday morning and the Weston Library had just opened to readers. I got to town earlier than usual, stopping to get a Melbourne-style flat white at one of my favourite local cafes – to get me in the mood for public speaking. By 9am I was in the empty lecture theatre, fussing over cords, adjusting lighting and panicking over the fact that I struggled to log in to the laptop.

At 10am, twenty-one interested faces were seated with pens at the ready; there was nothing else to do but take a deep breath and begin.

In the 1.5 hour session, I covered the DPOC project, digital preservation and personal digital archiving. The main section of the training was learning about personal digital archiving, preservation lifecycle and the best practice steps to follow to save your digital stuff!

The steps of the Personal Digital Archiving & Preservation Lifecycle are intended to help with keeping your digital files organised, findable and accessible over time. It’s not prescriptive advice, but it is a good starting point for better habits in your personal and work lives. Below are tips for every stage of the lifecycle that will help build better habits and preserve your valuable digital files.

Keep Track and Manage:

  • Know where your digital files are and what digital files you have: make a list of all of the places you keep your digital files
  • find out what is on your storage media – check the label, read the file and folder names, open the file to see the content
  • Most importantly: delete or dispose of things you no longer need.
    • This includes: things with no value, duplicates, blurry images, previous document versions (if not important) and so on.

Organise:

  • Use best practice for file naming:
    • No spaces; use underscores _ and hyphens - instead
    • Put ‘Created Date’ in the file name using yyyymmdd format
    • Don’t use special characters <>,./:;'”\|[]()!@£$%^&*€#`~
    • Keep the name concise and descriptive
    • Use a version control system for drafts (e.g. yyyymmdd_documentname_v1.txt)
  • Use best practice for folder naming:
    • Concise and descriptive names
    • Use dates where possible (yyyy or yyyymmdd)
    • keep file paths short and avoid a deep hierarchy
    • Choose structures that are logical to you and to others
  • To rename large groups of image files, consider using batch rename software
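
The file naming tips above can be sketched in a few lines of Python. This is only an illustration: the helper name tidy_name and its parameters are made up for this example, and it assumes you already know the date you want to record in the name.

```python
import re
from datetime import date

# Anything that is not a letter, digit, underscore or hyphen is dropped.
_UNSAFE = re.compile(r"[^A-Za-z0-9_-]+")


def tidy_name(title: str, created: date, version: int, extension: str) -> str:
    """Hypothetical helper: build a name such as 20170321_my_notes_v1.txt."""
    # Replace runs of spaces with single underscores.
    stem = "_".join(title.strip().split())
    # Remove special characters, then tidy up any doubled underscores.
    stem = _UNSAFE.sub("", stem)
    stem = re.sub(r"_+", "_", stem).strip("_")
    # Date first (yyyymmdd), then the name, then the version number.
    return f"{created:%Y%m%d}_{stem}_v{version}.{extension.lstrip('.')}"


print(tidy_name("Holiday photos: Rome & Paris!", date(2017, 3, 21), 1, ".txt"))
# prints 20170321_Holiday_photos_Rome_Paris_v1.txt
```

It does not shorten long titles for you, so keeping the name concise and descriptive is still a job for a human.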

Describe:

  • Add important metadata directly into the body of a text document
    • creation date & version dates
    • author(s)
    • title
    • access rights & version
    • a description about the purpose or context of the document
  • Create a README.txt file of metadata for document collections
    • Be sure to list the folder names and file names to preserve the link between the metadata and the text file
    • include information about the context of the collection, dates, subjects and relevant information
    • this is a quick method for creating metadata around digital image collections
  • Embed the metadata directly in the file
  • for image and video: be sure to add subjects, location and a description of the trip or event
  • Add tags to documents and images to aid discoverability
  • Consider saving the ‘Creation Date’ in the file name, a free text field in the metadata, in the document header or in a README text file if it is important to you. In some cases transferring the file (copying to new media, uploading to cloud storage) will change the creation date and the original date will be lost. The same goes for saving as a different file type. Always test before transfer or ‘Save As’ actions or record the ‘Creation Date’ elsewhere.
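
For anyone comfortable with a little scripting, a README.txt like the one described above could be generated rather than typed by hand. The sketch below is one possible approach, not a standard: the function name write_readme and the field labels (Collection, Context, Dates, Subjects) are assumptions made for this example.

```python
from pathlib import Path


def write_readme(collection: Path, context: str, dates: str, subjects: list[str]) -> Path:
    """Hypothetical helper: write README.txt listing every file in a folder."""
    lines = [
        f"Collection: {collection.name}",
        f"Context: {context}",
        f"Dates: {dates}",
        f"Subjects: {', '.join(subjects)}",
        "",
        "Files:",
    ]
    # List folder and file names so the metadata stays linked to the files.
    for path in sorted(collection.rglob("*")):
        if path.is_file() and path.name != "README.txt":
            lines.append(f"  {path.relative_to(collection).as_posix()}")
    readme = collection / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

Calling write_readme(Path("2017_rome"), "Family trip", "2017", ["Rome", "holiday"]) on an existing folder of that (made-up) name would write the file next to the images it describes. Because it is plain text, it can be opened and corrected by hand at any time.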

Store:

  • Keep two extra backups in two geographically different locations
  • Diversify your backup storage media to protect against potential hardware faults
  • Try to save files in formats better suited to long-term access (for advice on how to choose file formats, visit Stanford University Libraries)
  • Refresh your storage media every three to five years to protect against loss from hardware failure
  • do annual spot checks, including checking all backups. This will help check for any loss, corruption or damaged backups. Also consider checking all of the different file types in your collection, to ensure they are still accessible, especially if not saved in a recommended long-term file format.
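
One way to make the annual spot check less manual is to compare checksums between the original files and a backup copy. Checksums were not part of the session itself; this is an added sketch, assuming Python is available, and the function names (sha256_of, make_manifest, compare) are invented for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(folder: Path) -> dict[str, str]:
    """Map each file's relative path to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def compare(original: dict[str, str], backup: dict[str, str]) -> dict[str, list[str]]:
    """Report files that are missing from, changed in, or extra in the backup."""
    return {
        "missing": sorted(set(original) - set(backup)),
        "changed": sorted(
            name for name in original
            if name in backup and original[name] != backup[name]
        ),
        "extra": sorted(set(backup) - set(original)),
    }
```

Running compare(make_manifest(originals), make_manifest(backup)) on two folders gives three lists to look at. A matching checksum only shows that a copy is unchanged; it does not show that the file still opens, so it is still worth opening a sample of each file type.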

Even I can admit I need better personal archiving habits. How many photographs are still on my SD cards, waiting for transfer, selection/deletion and renaming before saving in a few choice safe backup locations? The answer is: too many. 

Perhaps now that my first training session is over, I should start planning my personal side projects. I suspect clearing my backlog of SD cards is one of them.

Useful resources on personal digital archiving:

DPC Technology Watch Report, “Personal digital archiving” by Gabriela Redwine

DPC Case Note, “Personal digital preservation: Photographs and video”, by Richard Wright

Library of Congress “Personal Archiving” website, which includes guidance on preserving specific digital formats, videos and more

 

A view from the basement – a visit to the DPC Glasgow

Last Monday, Sarah, Edith and Lee visited the Digital Preservation Coalition (DPC) at their DPC Glasgow Office on University Gardens. The aim of the visit was to understand how the DPC has and will lend support to the DPOC project. The DPOC team is very fortunate in having the DPC’s expertise, resources and services at their disposal as a supporting partner in the project and we were keen to find out more.

Plied with tea, coffee and Sharon McMeekin’s awesome lemon cake, we were given an overview of the DPC by William Kilbride, who explained that they are a not-for-profit, membership-based organisation that used to cater mainly for the UK and Ireland. However, international agencies are now welcome (UN, NATO, ICC to name a few) and this has changed the nature of their programme and the features that they offer (website, streaming, event recording). They are vendor neutral but do have a ‘Commercial Supporter’ community to help support events and raise funds for digital preservation work. They have six members of staff working from the DPC Glasgow and DPC York offices. They focus on four main areas:

  • Workforce Development, Training and Skills
  • Communication and Advocacy
  • Research and Practice
  • Partnerships and Sustainability

William explained the last three areas and Sharon gave us an overview of the work that she does for developing workforce skills and offering training events, especially the ‘Getting Started in Digital Preservation’ and ‘Making Progress’ workshops. The DPC also provide Leadership Scholarships to help develop knowledge and CPD in digital preservation, so please do apply for those if you are working somewhere that can spare your time out of the office but can’t fund you.

In terms of helping DPOC, the DPC can help with hosting events (such as PASIG 2017) and provide supporting training resources for our organisations. They can also help with procurement processes and auditing, and we can call on the wealth of advice gained from their six members of staff.

We left feeling that, despite working as a collaborative team with colleagues we can already bounce ideas off, we had a wider support network that we could call on to guide us and help us share our work more widely. From a skills and training perspective, the idea that they are happy to review, comment on and suggest further avenues for the skills needs analysis toolkit, to ensure it will benefit the wider community, is of tremendous use. Yet this is just one example; help with procurement, policy development and auditing is also something they are willing to offer the project.

It is reassuring that the DPC are there and have plenty of experience to share in the digital preservation sphere. Tapping into networks, sharing knowledge and collaborating really is the best way to help achieve a coherent, sustainable approach to digital preservation and helps those working in it to focus on specific tasks rather than try and ‘reinvent the wheel’ when somebody else has already spent time on it.

IDCC 2017 – data champions among us

Outreach and Training Fellow, Sarah, provides some insight into some of the themes from the recent IDCC conference in Edinburgh on 21–22 February. The DPOC team also presented their first poster, “Parallel Auditing of the University of Cambridge and the University of Oxford’s Institutional Repositories,” which is available on the ‘Resource’ page.


Storm Doris waited to hit until after the main International Digital Curation Conference (IDCC) had ended, allowing for two days of great speakers. The conference focused on research data management (RDM) and sharing data. In Kevin Ashley’s wrap-up, he touched on data champions and the possibilities of data sharing as two of the many emerging themes from IDCC.

Getting researchers to commit to good data practice and then publish data for reuse is not easy. Many talks focused around training and engagement of researchers to improve their data management practice. Marta Teperek and Rosie Higman from Cambridge University Library (CUL) gave excellent talks on engaging their research community in RDM. Teperek found value in going to the community in a bottom-up, research led approach. It was time-intensive, but allowed the RDM team at CUL to understand the problems Cambridge researchers faced and address them. A top-down, policy driven approach was also used, but it has been a combination of the two that has been the most effective for CUL.

Higman went on to speak about the data champions initiative. Data champions were recruited from students, post-doctoral researchers, administrators and lecturers. What they had in common was their willingness to advocate for good RDM practices. Each of the 41 data champions was responsible for at least one training session per year. While the data champions did not always do what the team expected, their advocacy for good RDM practice has been invaluable. Researchers need strong advocates to see the value in publishing their data – it is not just about complying with policy.

On day two, I heard from researcher and data champion Dr. Niamh Moore from the University of Edinburgh. Moore finds that many researchers either think archiving their data is a waste of time or are concerned about the future use of their data. As a data champion, she believes that research data is worth sharing and thinks other researchers should be asking, ‘how can I make my data flourish?’. Moore uses Omeka to share her research data from her mid-90s project at the Clayoquot Sound peace camp called Clayoquot Lives. For Moore, benefits to sharing research data include:

  • using it as a teaching resource for undergraduates (getting them to play with data, which many do not have a chance to do);
  • public engagement impact (for Moore it was an opportunity to engage with the people previously interviewed at Clayoquot); and
  • new articles: creating new relationships and new research where she can reuse her own data in new ways or other academics can as well.

Opening up data and archiving leads to new possibilities. The closing keynote on day one discussed the possibilities of using data to improve the visitor experience for people at the British Museum. Data Scientist, Alice Daish, spoke of data as the unloved superhero. It can rescue organisations from questions and problems by providing answers, helping organisations make decisions, take actions and even provide more questions. For example, Daish has been able to wrangle and utilise data at the British Museum to learn about the most popular collection items on display (the Rosetta Stone came first!).

And Daish, like Teperek and Higman, touched on outreach as the only way to advocate for data – creating good data, sharing it, and using it to its fullest potential. And for the DPOC team, we welcome this advocacy; and we’d like to add to it and see that steps are also made to preserve this data.

Also, it was great to talk about the work we have been doing and the next steps for the project – thanks to everyone who stopped by our poster!

Oxford Fellows (From left: Sarah, Edith, James) holding the DPOC poster out front of the appropriately named “Fellows Entrance” at the Royal College of Surgeons.

DPC Student Conference – What I Wish I Knew Before I Started

At the end of January, I went to the Chancellor’s Hall at the University of London’s Art Deco style Senate House. Near to the entrance of the Chancellor’s Hall was Room 101. Rumours circulated amongst the delegates keenly awaiting the start of the conference that the building and the room were the inspiration for George Orwell’s Nineteen Eighty-Four.

Instead of facing my deepest and darkest digital preservation fears in Senate House, I was keen to see and hear what the leading digital preservation trainers and invited speakers at different stages of their careers had to say. For the DPOC project, I wanted to see what types of information were included in introductory digital preservation training talks, to witness the styles of delivery and what types of questions the floor would raise to see if there were any obvious gaps in the delivery. For the day’s programme, presenters’ slides and Twitter Storify, may I recommend that you visit the DPC webpage for this event:

http://www.dpconline.org/events/past-events/wiwik-2017

The takeaway lesson from the day is: just do something, and don’t be afraid to start. Sharon McMeekin showed us how much the DPC can help (see their new website, it’s chock full of digital preservation goodness) and Steph Taylor from CoSense showed us that you can achieve a lot in digital preservation just through keeping an eye on emerging technologies, and that you spend most of your time advocating that digital preservation is not just backing up. Steph also reinforced to the student delegation that you can approach members of the digital preservation community; they are all very friendly!

From the afternoon session, Dave Thompson reminded those assembled that we also need to think about the information age that we live in, how people use information, how they are their own gatekeepers to their digital records and how recordkeepers need to react to these changes, which will require a change in thinking from traditional recordkeeping theory and practice. As Adrian Brown put it, “digital archivists are archivists with superpowers”. One of those superpowers is the ability to adapt to your working context and the technological environment. Digital preservation is a constantly changing field and the practitioner needs to be able to adapt and change to the environment around them in a chameleon-like manner to get their institution’s work preserved. Jennifer Febles reminded us that it is also OK to say that you “don’t know” when training people; you can go away and learn, or even learn from other colleagues. As for the content of the day, there were no real gaps; the day’s programme was spot on, as far as I could tell from the delegates.

Whilst reflecting on the event on the journey back on the train (and whilst simultaneously being packed into the stifling hot carriage like a sweaty sardine), the one thing that I really wanted to find out was what the backgrounds of the delegates were. More specifically, what ‘information schools’ they were attending, what courses they were undertaking, how much their modules concerned digital recordkeeping and their preservation, and, most importantly, what they are being taught in those modules.

My thoughts then drifted towards thinking of those who have been given the label of ‘digital preservation experts’. They have cut their digital preservation teeth after their formal qualifications and training in an ostensibly different subject. Through a judicious application and blending of discipline-specific learning and learning about related fields, they then apply this learning to their specific working context. Increasingly, in the digital world, those from a recordkeeping background need to embrace computer science skills and applications, especially those for whom coding and command line operation are not skills they have been brought up with. We seem to be at a point where the leading digital preservation practitioners are plying their trade (as they should) and not teaching their trade in a formal education setup. A very select few are doing both but if we pulled practitioners into formal digital preservation education programmes, would we then drain the discipline of innovative practice? Should digital preservation skills (which DigCurV has done well to define) be better suited to one big ‘on the job’ learning programme rather than more formal programmes? A mix of both would be my suggestion but this discussion will never close.

Starting out in digital preservation may seem terribly daunting, with so much to learn as there is so much going on. I think that the ‘information schools’ can equip students with the early skills and knowledge but from then on, the experience and skills are learned on the job. The thing that makes the digital preservation community stand out is that people are not afraid to share their knowledge and skills for the benefit of preserving cultural heritage for the future.

The things we find…

Sarah shares some finds from Edith’s Digitized image survey of the Bodleian Libraries’ many digitization projects and initiatives over the years.


We’ve been digitizing our collections for a long time. And that means we have a lot of things, in a lot of places. Part of the Policy & Planning Fellow’s task is to find them, count them, and make sure we’re looking after them. That includes making decisions to combat the obsolescence of the hardware they are stored on, the software they rely on (this includes the website that has been designed to display them), and the files themselves so they do not become victim to bit rot.

At Oxford, Edith has been hard at work searching, counting, emailing, navigating countless servers and tape managers, and writing up the image survey report. But while she has been hard at work, she has been sharing some of her best finds with the team and I thought it was time we share them with you.

Below are some interesting finds from Edith’s image survey work. Some of them are real gems:

What? a large and apparently hungry dragon from Oracula, folio 021v (Shelfmark: Barocci 170) Found? On the ODL (Oxford Digital Library) site here.

What? Toby the Sapient Pig. Found? On the Bodleian Treasures website. Currently on display in the Treasures gallery at the Weston library and open to the public. The digital version is available 24/7.

What? A very popular and beautiful early manuscript: an illustrated guide to Oxford University and its colleges, prepared for Queen Elizabeth I in 1566. This page is of the Bodleian Libraries’ Divinity School. Found? On the ODL (Oxford Digital Library) site here.

What? Corbyn in the early years (POSTER 1987-23). Found? Part of the CPA Poster Collection here.

What? And this brilliant general election poster (POSTER 1963-04). Found? Part of the CPA Poster Collection here.

What? Cosmographia, 1482, a map of the known World (Auct. P 1.4). Found? In Medieval and Renaissance Manuscripts here.

What? Gospels, folio 28v (Auct. D. 2.16). Found? Medieval and Renaissance Manuscripts here.

These are just a few of the wonderful and weird finds in our rich and diverse collections. One thing is certain: digitized collections provide hours of discovery to anyone with a computer and Internet access. It is one of the most exciting things about digitization – access to almost anyone, anywhere.

Of course providing access means preserving the digital images. Knowing what we have and where we have it is one step to ensuring that they will be preserved for future access and discovery of the beautiful, the weird, and the wonderful.

Polonsky Fellows visit Western Bank Library at Sheffield University

Overview of DPOC’s visit to the Western Bank Library at Sheffield University by James Mooney, Technical Fellow at Bodleian Libraries, Oxford.
___________________________________________________________________________

The Polonsky Fellows were invited to the Western Bank Library at Sheffield University to speak with Laura Peaurt and other members of the Library. The aim of the meeting was to discuss the experiences of using and implementing Ex Libris’ Rosetta product.

After arriving by train, it was just a quick tram ride to the Western Bank campus at Sheffield University. Then we had the fun of using the paternoster lift in the Western Bank Library to arrive at our meeting; it’s great to see this technology has been preserved and is still in use.

Paternoster lifts still in use at the Western Library. Image Credit: James Mooney

We met with Laura Peaurt (Digital Preservation Manager), Chris Jones (Library Systems Manager) and Angus Taggart (Library Systems Manager – Research).

Andy Bussey, Head of Digital Services & Systems was kind enough to give us an hour of his time at the start of the meeting, allowing us to discuss parts of the procurement and implementation process.

When working out the requirements for the system, Sheffield was able to collaborate with the White Rose University Consortium (the Universities of Leeds, Sheffield and York) to work out an initial scope.

When reviewing the options, both open source and proprietary products were considered. For the Western Library and the University back in 2014, after a skills audit, the open source options had to be ruled out due to a lack of technical and developmental skills to customise or support them. I’m sure that if this were revisited today the outcome might well be different, as the team has grown and gained experience and expertise. Many organisations may find it easier to budget for a software package and support contract with a vendor than to pursue the creation of several new employment positions.

With that said, as part of the implementation of Rosetta, Laura’s role was created, as there was an obvious need for a Digital Preservation Manager. We then went on to discuss the timeframe of the project and moved on to the configuration of the product, with Laura providing a live demonstration whilst talking about the current setup, the scalability of the instances and the granularity of the sections within Rosetta.

During the demonstrations we discussed what content was held in Rosetta, how people had been trained with Rosetta and what feedback they had received so far. We reviewed the associated metadata which had been stored with the items that had been ingested and went over the options regarding integration with a Catalogue and/or Archival Management System.

After lunch we went on to discuss the workflows currently being used, with further demonstrations so we could see end-to-end examples, including what ingest rules and policies were in place along with what tools were in use and what processes were carried out. We then looked at how problematic items were dealt with in the Technical Analysis Workbench, covering the common issues and how additional steps in the ingest process can minimise certain issues.

As part of reviewing the sections of Rosetta we also inspected Rosetta’s metadata model, the DNX (Digital Normalised XML), and discussed ingesting born-digital content and associated METS files.

Western Library. Image Credit: A J Buildings Library.

We visited Sheffield with many questions, and many of these were answered during the course of the day’s discussions. As the day came to a close, however, we had to wrap up the talks and head back to the train station. We all agreed it had been an invaluable meeting that sparked further areas of discussion. Having met face to face, and with an understanding of the environment at Sheffield, we expect future conversations to be that much easier.

DPOC visits the Wellcome Library in London

A brief summary by Edith Halvarsson, Policy and Planning Fellow at the Bodleian Libraries, of the DPOC project’s recent visit to the Wellcome Library.
___________________________________________________________________________

Last Friday the Polonsky Fellows had the pleasure of spending a day with Rioghnach Ahern and David Thompson at the Wellcome Library. With a collection of over 28.6 million digitised images, the Wellcome is a great source of knowledge and experience in working with digitisation at a large scale. Themes of the day centred around pragmatic choices, achieving consistency across time and scale, and horizon scanning for emerging trends.

The morning started with an induction from Christy Henshaw, the Wellcome’s Digital Production Manager. We discussed digitisation collection development and JPEG 2000 profiles, but also future directions for the library’s digitised collection. One point which particularly stood out to me was the change in user requirements around delivery of digitised collections. The Wellcome has found that researchers are increasingly requesting delivery of material for “use as data”. (As a side note: this is something which the Bodleian Libraries have previously explored in their Blockbooks project, which used facial recognition algorithms traditionally associated with security systems to trace the provenance of dispersed manuscripts.) As the possibilities for large scale analysis using these types of algorithms multiply, the Wellcome is considering how delivery will need to change to accommodate new scholarly research methods.

Brain teaser: Spot the Odd One Out (or is it a trick question?). Image credit: Somaya Langley

Following Christy’s talk we were given a tour of the digitisation studios by Laurie Auchterlonie. Laurie was in the process of digitising recipe books for the Wellcome Library’s Recipe Book Project. He told us about some less appetising recipes from the collection (such as three-headed pig soup, and puppy dishes) and about the practical issues of photographing content in a studio located on top of one of the busiest underground lines in London!

After lunch with David and Rioghnach at the staff café, we spent the rest of the afternoon looking at Goobi plug-ins, Preservica and the Wellcome’s hybrid-cloud storage model. Although the day’s focus was digitisation, metadata was a recurring topic in several of the presentations. Descriptive metadata is particularly challenging to manage as it tends to be a work in progress: it is always possible to improve and correct. This creates a tension between curators and cataloguers doing their work, and the inclination to store metadata together with digital objects in preservation systems to avoid orphaning files. Wellcome’s solution has been to articulate their three core cataloguing systems as the canonical bibliographic source, while allowing potentially out-of-date metadata to travel with objects in both Goobi and Preservica for in-house use only. As long as there is clarity around which is the canonical metadata record, these inconsistencies are not problematic to the library. ‘You would be surprised how many institutions have not made a decision around which their definitive bibliographic record is’, says David.

Presentation on the Wellcome Library’s digitisation infrastructure. Image credit: Somaya Langley

The last hour was spent pondering the future of digital preservation and I found the conversations very inspiring and uplifting. As we work with the long-term in mind, it is invaluable to have these chances to get out of our local context and discuss wider trends with other professionals. Themes included: digital preservation as part of archival masters courses, cloud storage and virtualisation, and the move from repository software to dispersed micro-services.

The fellows’ field trip to the Wellcome is one of a number of visits that DPOC will make during 2017 to talk to institutions around the UK about their work on digital preservation. Watch www.dpoc.ac.uk for more updates.