Policy ramblings

For the second stage of the DPOC project Oxford and Cambridge have started looking at policy and strategy development. As part of the DPOC deliverables, the Policy and Planning Fellows will be collaborating with colleagues to produce a digital preservation policy and strategy for their local institutions. Edith (Policy and Planning Fellow at Oxford) blogs about what DPOC has been up to so far.


Last Friday I met with Somaya (Policy and Planning Fellow) and Sarah (Training and Outreach Fellow) at the British Library in London. We spent the day discussing the review work DPOC has done so far on digital preservation policies. The meeting also gave us a chance to outline an action plan for consulting stakeholders at CUL and Bodleian Libraries on future digital preservation policy development.

Step 1: Policy review work
Much work has already gone into researching digital preservation policy development [see for example the SCAPE project and OSUL’s policy case study]. As considerable effort has already been put into this area, we want to make sure we are not reinventing the wheel while developing our own digital preservation policies. We therefore started by reading as many digital preservation policies from other organisations as we could possibly get our hands on. (Once we ran out of policies in English, I started feeding promising-looking documents into Google Translate, with mixed results.) The policy review drew attention to aspects of policies which we felt were particularly successful and which could potentially be re-purposed for the local CUL and Bodleian Libraries contexts.

My colleague Sarah helped me with the initial policy review work. Between the two of us we read 48 policies dating from 2008 to 2017. However, determining which documents were actual policies was trickier than we had first anticipated. We found that documents named ‘strategy’ sometimes read as policy, and documents named ‘policy’ sometimes read more like low-level procedures. For this reason, we decided to add another 12 strategy documents with strong elements of policy in them to the review. This brought us up to a round 60 documents in total.

So we began reading… But we soon found that once you are on your 10th policy of the day, you start to get them muddled up. To better organise our review work, we decided to classify them using a system developed by Kirsten Snawder (2011) and adapted by Madeline Sheldon (2013). Snawder and Sheldon identified nineteen common topics from digital preservation policies, ranging from ‘access and use’ to ‘preservation planning’ [for the full list of topics, see Sheldon’s article on The Signal from 2013]. I was interested in seeing how many policies would make direct reference to the Open Archival Information System (OAIS) reference model, so I added this as an additional topic to the original nineteen identified by Snawder and Sheldon.

Reviewing digital preservation policies written between 2008 and 2017

Step 2: Looking at findings
Interestingly, after we finished annotating the policy documents, we did not find a correlation between covering all of Snawder and Sheldon’s nineteen topics and having what we perceived as an effective policy. Effective in this context was defined as the ability of the policy to clearly guide and inform preservation decisions within an organisation. In fact, the opposite was more common: we judged several policies with good coverage of the topics in the classification system to be too lengthy, unclear, and sometimes inaccessible due to heavy use of digital preservation terminology.

In terms of OAIS, another interesting finding was that 33 out of 60 policies made direct reference to the OAIS. In addition to these 33, several of the ones which did not make an overt reference to the model still used language and terminology derived from it.

So while we found that the taxonomy could not tell us which policy topics were absolutely essential in all circumstances, using it was a good way of arranging and documenting our thoughts.

Step 3: Thinking about guiding principles for policy writing
What this foray into digital preservation policies has shown us is that there is no ‘one size fits all’ approach or magic formula of topics which makes a policy successful. What works in the context of one institution will not work in another. What ultimately makes a successful policy also comes down to communication of the policy and organisational uptake. However, there are a number of high-level principles which the three of us all felt strongly about and which we would like to guide future digital preservation policy development at our local institutions.

Principle 1: Policy should be accessible to a broad audience. Contrary to findings from the policy review, we believe that digital preservation specific language (including OAIS) should be avoided at policy level if possible. While reviewing policy statements we regularly asked ourselves:

“Would my mother understand this?”

If the answer is yes, the statement gets to stay. If it is no, maybe consider re-writing it. (Of course, this does not apply if your mother works in digital preservation.)

Principle 2: Policy also needs to be high-level enough that it does not require constant re-writing in order to make minor procedural changes. In general, including individuals’ names or prescribing specific file formats can make a policy go out of date quickly. It is easier to change these if they are included in lower level procedures and guidelines.

Principle 3: Digital preservation requires resources. Getting financial commitment to invest in staff at policy level is important. It takes time to build organisational expertise in digital preservation, but losing it can happen a lot quicker. Even if you choose to outsource several aspects of digital preservation, it is important that staff have skills which enable them to understand and critically assess the work of external digital preservation service providers.

What are your thoughts? Do you have other principles guiding digital preservation policy development in your organisations? Do you agree or disagree with our high-level principles?

Preserving research – update from the Cambridge Technical Fellow

Cambridge’s Technical Fellow, Dave, discusses some of the challenges and questions around preserving ‘research output’ at Cambridge University Library.


One of the types of content we’ve been analysing as part of our initial content survey has been labelled ‘research output’. We knew this was a catch-all term, but (according to the categories in Cambridge’s Apollo Repository), ‘research output’ potentially covers: “Articles, Audio Files, Books or Book Chapters, Chemical Structures, Conference Objects, Datasets, Images, Learning Objects, Manuscripts, Maps, Preprints, Presentations, Reports, Software, Theses, Videos, Web Pages, and Working Papers”. Oh – and of course, “Other”. Quite a bundle of complexity to hide behind one simple ‘research output’ label.

One of the categories in particular, ‘Dataset’, zooms the fractal of complexity in one step further. So far, we’ve only spoken in-depth to a small set of scientists (though our participation on Cambridge’s Research Data Management Project Group means we have a great network of people to call on). However, both meetings we’ve had indicate that ‘Datasets’ are a whole new Pandora’s box of complicated management, storage and preservation challenges.

However – if we pull back from the complexity a little, things start to clarify. One of the scientists we spoke to (Ben Steventon at the Steventon Group) presented a very clear picture of how his research ‘tiered’ the data his team produced, from 2-4 terabyte outputs from a Light Sheet Microscope (at the Cambridge Advanced Imaging Centre), via two intermediate layers of compression and modelling, to ‘delivery’ files only megabytes in size. One aspect of the challenge of preserving such research, then, would seem to be tiering preservation storage media to match the research design.

(I believe our colleagues at the JISC, who Cambridge are working with on the Research Data Management Shared Service Pilot Project, may be way ahead of us on this…)

Of course, tiering storage is only one part of the preservation problem for research data: the same issues of acquisition and retention that have always been part of archiving still apply… But that’s perhaps where the ‘delivery’ layer of the Steventon Group’s research design starts to play a role. In 50 or 100 years’ time, which sets of the research data might people still be interested in? It’s obviously very hard to tell, but perhaps it’s more likely to be the research that underpins the key model: the major finding?

Reaction to the ‘delivered research’ (which included papers, presentations and perhaps three or four more from the list above) plays a big role here. Will we keep all 4TBs from every Light Sheet session ever conducted, for the entirety of a five or ten-year project? Unlikely, I’d say. But could we store (somewhere cold, slow and cheap) the 4TBs from the experiment that confirmed the major finding?

That sounds a bit more within the realms of possibility, mostly because it feels as if there might be a chance that someone might want to work with it again in 50 years’ time. One aspect of modern-day research that makes me feel this might be true is the complexity of the dependencies between pieces of modern science, and the software it uses in particular (Blender, for example, or Fiji). One could be pessimistic here and paint a negative scenario: what if a major bug is found in one of those apps that calls into question the science ‘above it in the chain’? But there’s an optimistic view, here, too… What if someone comes up with an entirely new, more effective analysis method that replaces something current science depends on? Might there not be value in pulling the data from old experiments ‘out of the archive’ and re-running them with the new kit? What would we find?

We’ll be able to address some of these questions in a bit more detail later in the project. However, one of the more obvious things talking to scientists has revealed is that many of them seem to have large collections of images that need careful management. That seems quite relevant to some of the more ‘close to home’ issues we’re looking at right now in The Library.

When was that?: Maintaining or changing ‘created’ and ‘last modified’ dates

Sarah has recently been testing scenarios to investigate the question of changes in file ‘date created’ and ‘last modified’ metadata. When building training, it’s always best to test out your advice before giving it, and below are the results of Sarah’s research, with helpful screenshots.


Before doing some training that involved teaching better recordkeeping habits to staff, I ran some tests to be sure that I was giving the right advice when it came to created and last modified dates. I am often told by people in the field that these dates are always subject to change—but are they really? I knew I would tell staff to put created dates in file names or in document headers in order to retain that valuable information, but could the file maintain the correct embedded date anyway? I set out to test a number of scenarios on both my Mac OS X laptop and Windows desktop.
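If you want to script a similar before-and-after comparison rather than eyeballing file properties, something along the lines of the sketch below should work. It is only a rough illustration (the paths are placeholders, not my actual test files): on Windows, os.stat() reports the creation time as st_ctime, while on Mac OS X it is exposed as st_birthtime instead.

```python
# Hedged sketch: compare a file's created and last modified dates before and
# after a transfer. Paths are placeholders; point them at your own test files.
import os
import sys
from datetime import datetime

def file_dates(path):
    st = os.stat(path)
    # On Mac OS X the creation time is st_birthtime; on Windows st_ctime
    # holds the creation time.
    created = st.st_birthtime if sys.platform == "darwin" else st.st_ctime
    return {"created": datetime.fromtimestamp(created),
            "modified": datetime.fromtimestamp(st.st_mtime)}

before = file_dates("original/report.docx")       # placeholder path
after = file_dates("transferred/report.docx")     # placeholder path
for key in ("created", "modified"):
    status = "unchanged" if before[key] == after[key] else "CHANGED"
    print(f"{key}: {before[key]} -> {after[key]} ({status})")
```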

Scenario 1: Downloading from cloud storage (Google Drive)

This was an ALL DATES change for both Mac OS X and Windows.

Scenario 2: Uploading to cloud storage (Google Drive)

Once again this was an ALL DATES change for both systems.

Note: I trialled this a second time with the Google Drive for PC application and in OS X and found that created and last modified dates do not change when the file is uploaded to or downloaded from the Google Drive folder on the PC. However, when viewing the file in Google Drive via the website, the created date shows as different (the date/time of upload), though the ‘file info’ will confirm the date has not changed. Just to complicate things.

Scenario 3: Transfer from a USB

Mac OS X had no change to the dates. Windows showed an altered created date, but maintained the original last modified date.

Scenario 4: Transfer to a USB

Once again there was no change to the dates in Mac OS X. Windows showed an altered created date, but maintained the original last modified date.

Note: I looked into scenarios 3 and 4 for Windows a bit further and saw that Robocopy, a command prompt utility, will copy directories across while maintaining those date attributes (a scripted sketch is shown below). I copied a ‘TEST’ folder containing the file from the Windows computer to the USB, and back again. It did what was promised and there were no changes to either of the dates in the file. It is a bit annoying that an extra step is required (one that many people would find technically challenging and therefore avoid).
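For reference, this is roughly what the Robocopy step looks like if you drive it from a script. This is a sketch only: the paths are placeholders and the flags should be checked against your own version of Windows.

```python
# Hedged sketch (Windows only): copy a folder to a USB stick with Robocopy while
# keeping timestamps. /E copies subdirectories, /COPY:DAT copies data, attributes
# and timestamps, /DCOPY:T copies directory timestamps. Paths are placeholders.
import subprocess

result = subprocess.run(
    ["robocopy", r"C:\Users\me\TEST", r"E:\TEST", "/E", "/COPY:DAT", "/DCOPY:T"],
    capture_output=True, text=True,
)
# Robocopy return codes below 8 indicate success (e.g. 1 means files were copied).
print("OK" if result.returncode < 8 else "FAILED", result.returncode)
```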

Scenario 5: Moving between folders

No change on either system. This was a relief for me considering how often I move files around my directories.

Conclusions

When in doubt (and you should always be in doubt), test the scenario. Even when I tested these scenarios three or four times, I did not always get the same result. That alone should make one cautious. I still stick to putting the created date in the file name and in the document itself (where possible), but it doesn’t mean I always receive documents that way.

Creating a zip of files/folders before transfer is one method of preserving dates, but I had some weird issues trying to unzip the file in cloud storage: it took a few tries before the dates remained preserved. It is also possible to use Quickhash to transfer files unchanged (and it generates a checksum).

I ignored the last accessed date during testing, because it was too easy to accidentally double-click a file and change it (as you can see happened to my Windows 7 test version).

Has anyone tested any other scenarios to assess when file dates are altered? Does anyone have methods for transferring files without causing any change to dates?

An approach to selecting case studies

Cambridge Policy & Planning Fellow, Somaya, writes about a case study approach developed by the Cambridge DPOC Fellows for CUL. Somaya’s first blog post about the case studies looks at the selection methodology the Cambridge DPOC fellows used to choose their final case studies.


Physical format digital carriers. Photo: Somaya Langley

Background & approach

Cambridge University Library (CUL) has moved to a ‘case study’ approach to the project. The case studies will provide an evidence-based foundation for writing a policy and strategy, developing a training programme and writing technical requirements within the time constraints of the project. The case studies we choose for the DPOC project will enable us to test hands-on day-to-day tasks necessary for working with digital collection materials at CUL. They also need to be representative of our existing collections and future acquisitions, our Collection Development Policy Framework and Strategic Plan, and our current and future audiences, while considering the ‘preservation risk’ of the materials.

Classes of material

Based on the digital collections surveying work I’ve been doing, our digital collections fall into seven different ‘classes’:

  1. Unpublished born-digital materials – personal and corporate papers, digital archives of significant individuals or institutions
  2. Born-digital university archives – selected records of the University of Cambridge
  3. Research outputs – research data and publications (including compliance)
  4. Published born-digital materials – physical format carriers (optical media), eBooks, web archives, archival and access copies of electronic subscription services, etc.
  5. Digitised image materials – 2D photography (and 3D imaging)
  6. Digital (and analogue) audiovisual materials – moving image (film and video) and sound recordings
  7. In-house created content – photography and videography of events, lectures, photos of conservation treatments, etc.

Proposed case studies

Approximately 40 potential case studies suggested by CUL and Affiliated Library staff were considered. These proposed case studies were selected from digital materials in our existing collections, current acquisition offers, and requests for assistance with digital collection materials from across Cambridge University. Each proposed case study would allow us to trial different tools (and digital preservation systems), approaches and workflow stages, and would represent different ‘classes’ of material.

Digital lifecycle stages

The selected stages are based on a draft Digital Stewardship End-to-End Workflow I am developing. The workflow includes approximately a dozen different stages. It is based on the Digital Curation Centre’s Curation Lifecycle Model, and is also aligned with the Digital POWRR (Preserving Digital Objects with Restricted Resources) Tool Evaluation Grid.

There are also additional essential concerns, including:

  • data security
  • integration (with CUL systems and processes)
  • preservation risk
  • removal and/or deletion
  • reporting
  • resources and resourcing
  • system configuration

Selected stages for Cambridge’s case studies

Dave, Lee and I discussed the stages and cut them down to the bare minimum required to test out various tasks as part of the case studies. These stages include:

  1. Appraise and Select
  2. Acquire / Transfer
  3. Pre-Ingest (including Preconditioning and Quality Assurance)
  4. Ingest (including Generate Submission Information Package)
  5. Preservation Actions (sub-component of Preserve)
  6. Access and Delivery
  7. Integration (with Library systems and processes) and Reporting

Case study selection

In order to produce a shortlist, I needed to work out the parameter best suited to ranking the proposed case studies from a digital preservation perspective. The initial parameter we decided on was complexity. Did the proposed case study provide enough technical challenges to fully test out what we needed to research?

We also took into account a Streams Matrix (still in development) that outlines the different tasks undertaken at each of the selected digital lifecycle stages. This would ensure different variations of activities were factored in at each stage.

Once the case studies were in ranked order, we revisited and reviewed them, taking into account additional parameters. The additional parameters included:

  • Frequency and/or volume – how much of this type of material do we have/are we likely to acquire (i.e. is this a type of task that would need to be carried out often)?
  • Significance – how significant is the collection in question?
  • Urgency – does this case study fit within strategic priorities such as the current Cambridge University Library Strategic Plan and Collection Development Policy Framework etc.?
  • Uniqueness – is the case study unique and would it be of interest to our users (e.g. the digital preservation field, Cambridge University researchers)?
  • Value to our users and/or stakeholders – is this of value to our current and future users, researchers and/or stakeholders?

This produced a shortlist of eight case studies. We concluded that each presented different long-term digital preservation issues and carried a considerable degree of ‘preservation risk’.

Conclusion

This was a challenging and time-consuming approach; however, it ensured fairness in the selection process. The case studies will give us tangible evidence on which to ground the work of the rest of the project. The Cambridge University Library Polonsky Digital Preservation Project Board have agreed that we will undertake three case studies: a digitisation case study, a born-digital case study and one more, the details of which are still being discussed. Stay tuned for more updates.

Customizable JHOVE TIFF output handler anyone?

Technical Fellow, James, talks about the challenges with putting JHOVE’s full XML output into a reporting tool and how he found a work around. We would love feedback about how you use JHOVE’s TIFF output. What workarounds have you tried to extract the data for use in reporting tools and what do you think about having a customizable TIFF output handler for JHOVE? 


As mentioned in my last blog post, I’ve been looking to validate a reasonably large collection of TIFF master image files from a digitization project. On a side note from that, I would like to talk about the output from JHOVE’s TIFF module.

The JHOVE TIFF module allows you to specify an output handler producing either text, an XML audit, or full XML output.

Text provides a straightforward line-by-line breakdown of the various characteristics and properties of each TIFF processed. But because it is not a structured document, processing the output when many files are characterized is not ideal.

The XML audit output provides a very minimal XML document which simply reports whether the TIFF files were valid and well-formed or not; this is great for a quick check, but lacks some other metadata properties that I was looking for.

The full XML output provides the same information as the text output format, but with the advantage of being a structured document. However, I’ve found some of the additional metadata structuring in the full XML rather cumbersome to process with further reporting tools.

As a result, I’ve been struggling a bit to extract all of the properties I would like from the full XML output into a reporting tool. I then started to wonder about having a more customizable output handler which would simply report the properties I required in a neat and easier-to-parse XML format.

I had looked at using an XSLT transformation on the XML output but, as mentioned, I found it rather complicated to extract some of the metadata property values I wanted due to the excessive nesting of these and the property naming structure. I think I need to brush up on my XSLT skills perhaps?

In the short term, I’ve converted the XML output to a CSV file, using a little freeware program called XML2CSV from A7Soft. Using the tool, I selected the various fields required (filename, last modified date, size, compression scheme, status, TIFF version, image width & height, etc) for my reporting. Then, the conversion program extracted the selected values, which provided a far simpler and smaller document to process in the reporting tool.
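If you would rather script the extraction than use a separate conversion tool, something like the sketch below would do a similar job. Be aware that the namespace and element names are assumptions based on the JHOVE full XML output I have seen; they may differ between JHOVE versions, so check them against your own files.

```python
# Rough sketch: pull a few top-level repInfo properties out of JHOVE's full XML
# output and write them to CSV. The namespace and element names below are
# assumptions and may need adjusting for your JHOVE version.
import csv
import xml.etree.ElementTree as ET

NS = {"j": "http://hul.harvard.edu/ois/xml/ns/jhove"}  # assumed namespace
FIELDS = ["lastModified", "size", "format", "version", "status"]

def jhove_xml_to_csv(xml_path, csv_path):
    tree = ET.parse(xml_path)
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["uri"] + FIELDS)
        for rep in tree.getroot().findall("j:repInfo", NS):
            row = [rep.get("uri")]
            for field in FIELDS:
                el = rep.find("j:" + field, NS)
                row.append(el.text if el is not None else "")
            writer.writerow(row)

jhove_xml_to_csv("jhove_output.xml", "jhove_report.csv")  # placeholder filenames
```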

I would be interested to know what others have done when confronted with the XML output and wonder if there is any mileage in a more customizable output handler for the TIFF module…

 

Update 31st May 2017

Thanks to Ross Spencer, Martin Hoppenheit and others from Twitter. I’ve now created a basic JHOVE XML to CSV XSLT stylesheet. Draft version on my GitHub should anyone want to do something similar.

Skills interviewing using the DPOC skills interview toolkit

Cambridge Outreach & Training Fellow, Lee, shares his experiences in skills auditing.


As I am nearing the end of my fourteenth transcription and am three months into the skills interview process, now is a good time to pause and reflect. This post will look at the experience of the interview process using the DPOC digital preservation skills toolkit. This toolkit is currently under development; we are learning and improving it as we trial it at Cambridge and Oxford.

Step 1: Identify your potential participants

To understand colleagues’ use of technology and training needs, a series of interviews were arranged. We agreed that a maximum sample of 25 participants would give us plenty (perhaps too much?) of material to work with. Before invitations were sent out, a list was made of potential participants. In building the list, a set of criteria ensured that a broad range of colleagues was captured. The criteria consisted of:

  • in what department or library do they work?
  • is there a particular bias of colleagues from a certain department or library and can this be redressed?
  • what do they do?
  • is there a suitable practitioner to manager ratio?

The criteria rely on you having a good grasp of your institution, its organisation and the people within it. If you are unsure, start asking managers and colleagues who do know your institution very well—you will learn a lot! It is also worth having a longer list than your intended maximum in case you do not get responses, or people are not available or do not wish to participate.

Step 2: Inviting your potential participants

Prior to sending out invitations, the intended participants’ managers were consulted to see if they would agree to their staff time being used in this way. This was also a good opportunity to continue raising awareness of the project, as well as getting buy-in to the interview process.

The interviews were arranged in blocks of five to make planning around other work easier.

Step 3: Interviewing

The DPOC semi-structured skills interview questions were put to the test at this step. Having developed the questions beforehand ensured I covered the necessary digital preservation skills during the interview.

Here are some tips I gained from the interview process which helped to get some great responses.

  • Offer refreshments before the interview. Advise beforehand that a generous box of chocolate biscuits will be available throughout proceedings. This also gives you an excellent chance to talk informally to your subject and put them at ease, especially if they appear nervous.
  • If using recording equipment, make sure it is working. There’s nothing worse than thinking you have fifty minutes of interview gold only to find that you’ve not pressed record or the device has run out of power. Take a second device, or if you don’t want the technological hassle, use pen(cil) and paper.
  • Start with colleagues that you know quite well. This will help you understand the flow of the questions better and they will not shy away from honest feedback.
  • Always have printed copies of interview questions. Technology almost always fails you.

My next post will be about transcribing and analysing interviews.

Over 20 years of digitization at the Bodleian Libraries

Policy and Planning Fellow Edith writes an update on some of her findings from the DPOC project’s survey of digitized images at the Bodleian Libraries.


Between August and December 2016 I collated information about Bodleian Libraries’ digitized collections. As an early adopter of digitization technology, Bodleian Libraries have made digital surrogates of their collections available online since the early 1990s. A particular favourite of mine, and a landmark among the Bodleian Libraries’ early digital projects, is the Toyota Transport Digitization Project (1996). [Still up and running here]

At the time of the Toyota Project, digitization was still highly specialised, and the Bodleian Libraries opted to outsource the digital part to Laser Bureau London. Laser Bureau ‘digitilised’ 35mm image negatives supplied by Bodleian Libraries’ imaging studio and sent the files over on a big bundle of CDs: 1,244 images in all, which was a massive achievement at the time. It is staggering to think that we could now produce the same many times over in just a day!

Since the Toyota project’s completion twenty years ago, Bodleian Libraries have continued large-scale digitization activities in-house via their commercial digitization studio, outsourced to third-party suppliers, and in project partnerships. With generous funding from the Polonsky Foundation, the Bodleian Libraries are now set to add over half a million image surrogates of Special Collection manuscripts to their image portal – Digital.Bodleian.

What happens to 20 years’ worth of digitized material? Since 1996 both Bodleian Libraries and digitization standards have changed massively. Early challenges around storage alone have meant that content inevitably has been squirreled away in odd locations and created to the varied standards of the time. Profiling our old digitized collections is the first step to figuring out how these can be brought into line with current practice and be made more visible to library users.

“So what is the extent of your content?”, librarians from other organisations have asked me several times over the past few months. In the hope that it will be useful for other organisations trying to profile their legacy digitized collections, I thought I would present some figures here on the DPOC blog.

When tallying up our survey data, I came to a total of approximately 134 million master images, primarily in TIFF and JP2 format. In very early digitization projects, however, the idea of ‘master files’ was not yet developed, and master and access files will, in these cases, often be one and the same.

The largest proportion of content, some 127,000,000 compressed JP2s, was created as part of the Google Books project up to 2009 and is available via Search Oxford Libraries Online. These add up to 45 TB of data. The library further holds three archives totalling 5.8 million digitized images (99.4 TB), primarily created in TIFF by the Bodleian Libraries’ in-house digitization studio. These figures do not include back-ups – with which we start getting into quite big numbers.

Of the remaining 7 million digitized images which are not from the Google Books project, 2,395,000 are currently made available on a Bodleian Libraries website. In total the survey examined content from 40 website applications and 24 exhibition pages. Of the images made available online at the time of the survey, 44% were hosted on Digital.Bodleian, 4% on ODL Greenstone and 1% on Luna. The latter two are currently in the process of being moved onto Digital.Bodleian. At least 6% of content from the sample was duplicated across multiple website applications and is a candidate for deduplication. Another interesting fact from the survey is that JPEG, JP2 (transformed to JPEG on delivery) and GIF are by far the most common access/derivative formats on Bodleian Libraries’ website applications.

The final digitized image survey report has now been reviewed by the Digital Preservation Coalition and is being looked at internally. Stay tuned to hear more in future blog posts!

Validating half a million TIFF files. Part One.

Oxford Technical Fellow, James, reports on the validation work he is doing with JHOVE and DPF Manager in Part One of this blog series on validation tools for auditing the Polonsky Digitization Project’s TIFF files.


In 2013, the Bodleian Libraries of the University of Oxford and the Biblioteca Apostolica Vaticana (Vatican Library) joined efforts in a landmark digitization project. The aim was to open up their repositories of ancient texts, including Hebrew manuscripts, Greek manuscripts, and incunabula (15th-century printed books), and the goal was to digitize over one and a half million pages. All of this was made possible by funding from the Polonsky Foundation.

As part of our own Polonsky funded project, we have been preparing the ground to validate over half a million TIFF files which have been created from digitization work here at Oxford.

Many in the digital preservation field have already written articles and blogs on the tools available for validating TIFF files. Yvonne Tunnat (from ZBW Leibniz Information Centre for Economics) wrote a blog for the Open Preservation Foundation regarding the tools. I also had the pleasure of hearing Yvonne and Michelle Lindlar (from TIB Leibniz Information Centre for Science and Technology) discuss JHOVE in more detail at the IDCC 2017 conference in their talk, How Valid Is Your Validation? A Closer Look Behind The Curtain Of JHOVE.

The go-to validator for TIFF files?

Preparation for validation

In order to validate the master TIFF files, we first needed to retrieve them from our tape storage system; fortunately around two-thirds of the images had already been restored to spinning disk storage as part of another internal project. When the master TIFF files were written to tape, MD5 hashes of the files were included, so as part of this validation work we will confirm the fixity of all the files. Our network storage system had plenty of room to accommodate all the required files, so we began auditing what still needed to be recovered.
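The fixity check itself is conceptually simple. Below is a minimal sketch of the sort of thing involved; the manifest filename and its “hash, then path” layout are assumptions for illustration rather than a description of our actual tape manifests.

```python
# Hedged sketch: recompute the MD5 of each file listed in a manifest of
# "<md5>  <path>" lines and flag any mismatches. Manifest name and format
# are placeholders.
import hashlib

def md5_of(path, chunk_size=1024 * 1024):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

with open("md5_manifest.txt") as manifest:  # placeholder manifest
    for line in manifest:
        expected, path = line.strip().split(maxsplit=1)
        actual = md5_of(path)
        if actual != expected:
            print(f"FIXITY FAILURE: {path} expected {expected} got {actual}")
```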

Whilst the auditing and retrieval was progressing, I set about validating a sample set of master TIFF files using both JHOVE and DPF Manager, to get an estimate of the time it would take to process the approximately 50 TB of files. I was also interested in comparing the results of both tools when faced with invalid or corrupted sample sets of files.

We set up a new virtual machine server in order to carry out the validation workload; this allowed us to scale the machine’s performance as required. Both validation tools were going to be run on a Red Hat Linux environment, and both would be run from the command line.
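For anyone curious what running these batches looks like in practice, here is a rough sketch of how one might chunk the files and invoke JHOVE from a script. The paths and batch size are placeholders, and the -m/-h/-o flags should be checked against your installed JHOVE version.

```python
# Hedged sketch: run JHOVE's TIFF module over batches of TIFF files and write
# one full-XML report per batch. Paths, batch size and flags are assumptions;
# verify the command-line options against your JHOVE installation.
import subprocess
from pathlib import Path

TIFF_DIR = Path("/storage/polonsky/masters")  # placeholder path
BATCH_SIZE = 500

tiffs = sorted(TIFF_DIR.rglob("*.tif"))
for i in range(0, len(tiffs), BATCH_SIZE):
    batch = [str(p) for p in tiffs[i:i + BATCH_SIZE]]
    out = f"jhove_batch_{i // BATCH_SIZE:04d}.xml"
    subprocess.run(
        ["jhove", "-m", "TIFF-hul", "-h", "XML", "-o", out] + batch,
        check=True,
    )
```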

It quickly became clear that JHOVE was going to validate the TIFF files a lot quicker than DPF Manager. If DPF Manager is being used as part of one of your workflows, you may not have noticed any real-time penalty when processing small numbers of files; however, with a large batch, the time difference between the two tools was noticeable.

Potential alternative for TIFF validation?

During the testing I noticed there were several issues with DPF Manager, including not being able to specify the number of threads the process could use, which I suspect resulted in the poor initial performance. I dutifully reported the bug on the DPF community GitHub and was pleased to see an almost instant response stating that it would be resolved in the next monthly release. I do love open source projects, and I think this highlights the importance of those using the tools taking responsibility for improving them. Without community engagement, these projects are liable to run out of steam and slowly die.

I’m going to reserve judgement on the tools until the next release of DPF Manager. We will then also be in a position to report back on our findings from this validation case study. So check back with our blog for Part Two.

I would be interested to hear from anyone else who might have been faced with validating large batches of files. What tools are you using? What challenges have you faced? Do let me know!

Designing digital preservation training – it’s more than just talking

Sarah, Oxford’s Outreach and Training Fellow, writes about the ‘training cycle’ and concludes that delivering useful training is more than just talking at learners.


We have all been there before: trying to keep our eyes open as someone drones on at the front of the room, while the PowerPoint slides seem to contain a novella you have to squint to read. That’s not how training is supposed to go.

Rather, engaging your learners in a variety of activities will help them retain knowledge. And in a field like digital preservation, the more hands-on the training, the better. So often we talk about concepts or technical tools, but we very rarely provide examples, demonstrate them, or (better yet) have staff experiment with them.

And delivering training is just one small part of the training process. I’ve learned there are many steps involved in developing a course that will be of use to staff. Most of your time will not be spent in the training room.

Identifying Learners’ Needs

Often easier said than done. It’s better to prepare for all types of learners and pitch the material to a wide audience. With hands-on tasks, it’s possible to have additional work prepared for advanced learners, so they don’t get bored while other learners are still working through the task.

Part of the DPOC project has been about finding the gaps in digital preservation skills and knowledge, so that our training programmes can better meet staff’s needs. What I am learning is that I need to cast my net wide to reach everyone!

Planning and Preparation

The hard bit. Start with what your outcomes are going to be and try not to put too many into a session. It’s too easy to be extra ambitious. Once you have them, then you pick your activities, gather your materials (create that PowerPoint) and practise! Never underestimate the value of practising your session on your peers beforehand.

Teaching and Learning

The main event. It’s important to be confident, open and friendly as a trainer. I admit, I stand in the bathroom and do a “Power Pose” for a few minutes to psyche myself up. You are allowed nerves as a trainer! It’s also important to be flexible during the course.

Assessment

Because training isn’t just about Teaching and Learning. That only accounts for 1/5th of the training cycle. Assessment is another 1/5th and if that’s going to happen during the course, then it needs to be planned. Using a variety of the activities mentioned above will help with that. Be aware though: activities almost always take longer than you plan! 

Activities to facilitate learning:

  • questioning
  • group activities such as case studies, card sorting, mind mapping, etc.
  • hands-on tasks with software
  • group discussions
  • quizzes and games
  • modelling and demonstrations followed by an opportunity to practise the skill

Evaluation

Your evaluation is crucial to this. Make notes after your session on what you liked and what you need to fix. Peer evaluation is also important and sending out surveys immediately after will help with response rates. However, if you can do a paper evaluation at the end of the course, your response rates will be higher. Use that feedback to improve the course, tweak activities and content, so that you can start all over again.

(Mis)Adventures in guest blogging

Sarah shares her recent DPC guest blogging experience. The post is available to read at: http://www.dpconline.org/blog/beware-of-the-leopard-oxford-s-adventures-in-the-bottom-drawer 


As members of the Digital Preservation Coalition (DPC), we have the opportunity to contribute to their blog on issues in digital preservation. As the Outreach & Training Fellow at Oxford, that task falls upon me when it’s our turn to contribute.

You would think that because I contribute to this blog regularly, I’d be an old hand at blogging. It turns out that writer’s block can hit at precisely the worst possible time. But I forced out what I could and then turned to the other Fellows at Oxford for support. Edith and James both added their own work to the post.

With a final draft ready, the day approached when we could submit it to the blog. Even the technically-minded struggle with technology now and again. First there was the challenge of uploading images—it only took about 2 or 3 tries, and then I deleted the evidence of my mistakes. Finally, I clicked ‘submit’ and waited for confirmation.

And I waited…

And got sent back to the homepage. Then I got a ‘failure notice’ email that said “I’m afraid I wasn’t able to deliver your message to the following addresses. This is a permanent error; I’ve given up. Sorry it didn’t work out.” What just happened? Did it work or not?

So I tried again….

And again…

And again. I think I submitted 6 more times before I emailed the DPC to ask what I had done wrong. I had done NOTHING wrong, except press ‘submit’ too many times. There were as many copies waiting for approval as there were times I had hit ‘submit’. There was no way to delete the evidence, so I couldn’t avoid that embarrassment.

Minus those technological snafus, everything worked and the DPOC team’s first guest blog post is live! You can read the post here for an Oxford DPOC project update.

Now that I’ve got my technological mistakes out of the way, I think I’m ready to continue contributing to the wider digital preservation community through guest blogging. We are a growing (but still relatively small) community, and sharing our knowledge, ideas and experiences freely through blogs is important. We rely on each other to navigate a field where things can be complex and ever-changing. Journals and project websites date quickly, but community-driven and non-profit blogs remain a good source of relevant and immediate information. They are a valuable part of my digital preservation work and I am happy to be giving back.