Data reproducibility, provenance capture and preservation

An update from the Cambridge Fellows about their visit to the Cambridge Computer Laboratory to learn about the team’s research on provenance metadata.


In amongst preparing reports for the powers that be and arranging vendor meetings, Dave and Lee took a trip over to the William Gates Building, which houses the University of Cambridge’s Computer Laboratory. The purpose of the visit was to find out about the Digital Technology Group’s projects from one of their Senior Research Associates, Dr. Ripduman Sohan.

The particular project under discussion was FRESCO, which stands for Fabric For Reproducible Computing. You can find out more about the strands of this project here: https://www.cl.cam.ac.uk/research/dtg/fresco. The link to the poster is especially useful: it clearly and succinctly captures the key points of the meeting far better than my meeting notes.

FRESCO Poster. Image credit: Cambridge Computer Laboratory.

The discussion on provenance was of interest to me, coming from a recordkeeping background and hearing it discussed in computer science terms. What Rip was talking about and what archivists do really weren’t a million miles apart; it is just that the provenance capture on the data happens in nanoseconds, on mind-blowing amounts of data.

Rip’s approach, to my ears at least, was refreshing. He believes that computer scientists should start to listen to, move across into and understand ‘other’ domains like the humanities, so that they can use their computer science skills to help both future research and the practitioners working with humanities information and data. Computer science should be ‘computing for the future of the planet’, not a subject that imposes itself on other disciplines and creates a binary choice of the CompSci way or the highway.
