Here are a few ideas that were discussed during the two chats:
Backlogs, workflows, delivery mechanisms, lack of known standards, appraisal and familiarity with software were major barriers to providing access.
Participants were eager to learn more about new tools, existing functioning systems, providing access to restricted material and complicated objects, which institutions are already providing access to data, what researchers want/need, and whether any user testing has been done.
Access is being prioritized by user demand, donor concerns, fragile formats and a general mandate that born-digital records are not preserved unless access is provided.
Very little user testing has been done.
A variety of archivists, IT staff and services librarians are needed to provide access.
You can search #bdaccess on Twitter to see how the conversation evolves or view the complete conversation from these chats on Storify.
Daniel Johnson is the digital preservation librarian at the University of Iowa, exploring, adapting, and implementing digital preservation policies and strategies for the long-term protection and access to digital materials.
Seth Anderson is the project manager of the MoMA Electronic Records Archive initiative, overseeing the implementation of policy, procedures, and tools for the management and preservation of the Museum of Modern Art’s born-digital records.
Welcome to the Canadian edition of International Perspectives on Digital Preservation. My name is Alexandra Jokinen. I am the new(ish) Digital Archives Intern at Dalhousie University in Halifax, Nova Scotia. I work closely with the Digital Archivist, Creighton Barrett, to aid in the development of policies and procedures for some key aspects of the University Libraries’ digital archives program—acquisitions, appraisal, arrangement, description, and preservation.
One of the ways in which we are beginning to tackle this very large, very complex (but exciting!) endeavour is to execute digital preservation on a small scale, focusing on the processing of digital objects within a single collection, and then using those experiences to create documentation and workflows for different aspects of the digital archives program.
The collection chosen to be our guinea pig was a recent donation of work from esteemed Canadian ecologist and environmental scientist, Bill Freedman, who taught and conducted research at Dalhousie from 1979 to 2015. The fonds is a hybrid of analogue and digital materials dating from 1988 to 2015. Digital media carriers include: 1 computer system unit, 5 laptops, 2 external hard drives, 7 USB flash drives, 5 zip disks, 57 CDs, 6 DVDs, 67 5.25 inch floppy disks and 228 3.5 inch floppy disks. This is more digital material than the archives is likely to acquire in future accessions, but the Freedman collection acted as a good test case because it provided us with a comprehensive variety of digital formats to work with.
Our first area of focus was appraisal. For the analogue material in the collection, this process was pretty straightforward: conduct macro-appraisal and functional analysis by physically reviewing material. However, as could be expected, appraisal of the digital material was much more difficult to complete. The archives recently purchased a forensic recovery of evidence device (FRED) but does not yet have all the necessary software and hardware to read the legacy formats in the collection (such as the floppy disks and zip disks), so we started by investigating the external hard drives and USB flash drives. After examining their content, we were able to get an accurate sense of the information they contained, the organizational structure of the files, and the types of formats created by Freedman. Although we were not able to examine files on the legacy media, we felt that we had enough context to perform appraisal, determine selection criteria and formulate an arrangement structure for the collection.
The next step of the project will be to physically organize the material. This will involve separating, photographing and reboxing the digital media carriers and updating a new registry of digital media that was created during a recent digital archives collection assessment modelled after OCLC’s 2012 “You’ve Got to Walk Before You Can Run” research report. Then, we will need to process the digital media, which will entail creating disk images with our FRED machine and using forensic tools to analyze the data. Hopefully, this will allow us to apply the selection criteria used on the analogue records to the digital records and weed out what we do not want to retain. During this process, we will be creating procedure documentation on accessioning digital media as well as updating the archives’ accessioning manual.
The project’s final steps will be to take the born-digital content we have collected and ingest it using Archivematica to create Archival Information Packages for storage and preservation, with access provided via the Archives Catalogue and Online Collections.
So there you have it! We have a long way to go in terms of digital preservation here at Dalhousie (and we are just getting started!), but hopefully our work over the next several months will ensure that solid policies and procedures are in place for maintaining a trustworthy digital preservation system in the future.
Alexandra Jokinen has a Master’s Degree in Film and Photography Preservation and Collections Management from Ryerson University in Toronto. Previously, she has worked as an Archivist at the Liaison of Independent Filmmakers of Toronto and completed a professional practice project at TIFF Film Reference Library and Special Collections.
The University of Illinois at Urbana-Champaign’s (Illinois) library-based Research Data Service (RDS) will be launching an institutional data repository, the Illinois Data Bank (IDB), in May 2016. The IDB will provide University of Illinois researchers with a repository for research data that will facilitate data sharing and ensure reliable stewardship of published data. The IDB is a web application that transfers deposited datasets into Medusa, the University Library’s digital preservation service for the long-term retention and accessibility of its digital collections. Content is ingested into Medusa via the IDB’s unmediated self-deposit process.
As we conceived of and developed our dataset curation workflow for digital datasets ingested in the IDB, we turned to archivists in the University Archives to gain an understanding of their approach to processing digital materials. [Note: I am not specifying whether data deposited in the IDB is “born digital” or “digitized” because, from an implementation perspective, both types of material can be deposited via the self-deposit system in the IDB. We are not currently offering research data digitization services in the RDS.] There were a few reasons for consulting with the archivists: 1) archivists have deep, real-world curation expertise, and we anticipate that many of the challenges we face with data will have solutions whose foundations were developed by archivists; 2) if, through discussing processes, we found areas where the RDS and Archives have converging preservation or curation needs, we could communicate these to the Preservation Services Unit, which develops and manages Medusa; and 3) I’m an archivist by training and I jump on any opportunity to talk with archivists about archives!
Even though the RDS and the University Archives share a central goal–to preserve and make accessible the digital objects that we steward–we learned that there are some operational and policy differences between our approaches to digital stewardship that necessitate points of variance in our processing/curation workflow:
Appraisal and Selection
In my view, appraisal and selection are fundamental to archival practice. The archives field has developed a rich theoretical foundation when it comes to appraisal and selection, and without these functions the archives endeavor would be wholly unsustainable. Appraisal and selection ideally tend to occur in the very early stages of the archival processing workflow. The IDB curation workflow will differ significantly–by and large, appraisal and selection procedures will not take place until at least five years after a dataset is published in the IDB–making our appraisal process more akin to that of an archives that chooses to appraise records after accessioning or even during the processing of materials for long-term storage. Our different approaches to appraisal and selection speak to the different functions the RDS and the University Archives fulfill within the Library and the University.
The University Archives is mandated to preserve University records in perpetuity by the General Rules of the University and the Illinois State Records Act. The RDS’s initiating goal, in contrast, is to provide a mechanism for Illinois researchers to be compliant with funder and/or journal requirements to make results of research publicly available. Here, there is no mandate for the IDB to accept solely what data is deemed to have “enduring value” and, in fact, the research data curation field is so new that we do not yet have a community-endorsed sense of what “enduring value” means for research data. Standards regarding the enduring value of research data may evolve over the long-term in response to discipline-specific circumstances.
To support researchers’ needs and/or desires to share their data in a simple and straightforward way, the IDB ingest process is largely unmediated. Depositing privileges are open to all campus affiliates who have the appropriate University log-in credentials (e.g., faculty, graduate students, and staff), and deposited files are ingested into Medusa immediately upon deposit. RDS curators will do a cursory check of deposits, as doing so remains scalable (see workflow chart below), and the IDB reserves the right to suppress access to deposits for a “compelling reason” (e.g., failure to meet criteria for depositing as outlined in the IDB Accession Policy, violations of publisher policy, etc.). Aside from cases that we assume will be rare, the files as deposited into the IDB, unappraised, are the files that are preserved and made accessible in the IDB.
A striking policy difference between the RDS and the University Archives is that the RDS makes a commitment to preserving and facilitating access to datasets for a minimum of five years after the date of publication in the Illinois Data Bank.
The University Archives, of course, makes a long-term commitment to preserving and making accessible records of the University. I have to say, when I learned that the five-year minimum commitment was the plan for the IDB, I was shocked and a bit dismayed! But after reflecting on the fact that files deposited in the IDB undergo no formal appraisal process at ingest, the concept began to feel more comfortable and reasonable. At a time when terabytes of data are created, oftentimes for single projects, and budgets are a universal concern, there are logistical storage issues to contend with. Now, I fully believe that for us to ensure that we are able to 1) meet current, short-term data sharing needs on our campus and 2) fulfill our commitment to stewarding research data in an effective and scalable manner over time, we have to make a circumspect minimum commitment and establish policies and procedures that enable us to assess the long-term viability of a dataset deposited into the IDB after five years.
The RDS has collaborated with archives and preservation experts at Illinois and, basing our work in archival appraisal theory, has developed guidelines and processes for reviewing published datasets after their five-year commitment ends to determine whether to retain, deaccession, or dedicate more stewardship resources to datasets. Enacting a systematic approach to appraising the long-term value of research data will enable us to allot resources to datasets in a way that is proportional to the datasets’ value to research communities and their preservation viability.
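To make the timing of that review concrete, here is a minimal sketch in Python of how one might work out when a dataset's five-year minimum commitment ends. It is not how the IDB or Medusa actually does this: the function names, the dataset identifiers, and the rule for datasets published on 29 February are all assumptions made for illustration.

```python
from datetime import date

# From the post: the commitment runs for a minimum of five years after publication.
MINIMUM_COMMITMENT_YEARS = 5


def review_due(published: date, years: int = MINIMUM_COMMITMENT_YEARS) -> date:
    """Return the date on which the minimum commitment ends (hypothetical helper)."""
    try:
        return published.replace(year=published.year + years)
    except ValueError:
        # Assumption: a 29 February publication date falls back to 28 February
        # when the target year is not a leap year.
        return published.replace(year=published.year + years, day=28)


def due_for_review(datasets: dict[str, date], today: date) -> list[str]:
    """Return the identifiers of datasets whose commitment has ended by `today`."""
    return sorted(
        name for name, published in datasets.items() if review_due(published) <= today
    )


if __name__ == "__main__":
    # Made-up identifiers and publication dates, for illustration only.
    published_on = {"dataset-a": date(2016, 5, 16), "dataset-b": date(2018, 1, 1)}
    print(due_for_review(published_on, today=date(2021, 6, 1)))
```

A list like this would only tell curators which datasets are ready to be looked at. The decision itself (retain, deaccession, or dedicate more resources) stays with the people applying the review guidelines.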
To show that we’re not all that different after all, I’ll briefly mention a few areas where the University Archives and the RDS are taking similar approaches or facing similar challenges:
We are both taking an MPLP-style approach to file conversion. In order to get preservation control of digital content, at minimum, checksums are established for all accessioned files. As a general rule, if the file can be opened using modern technology, file conversion will not be pursued as an immediate preservation action. Establishing strategies and policies for managing a variety of file formats at scale is an area that will be evolving at Illinois through collaboration of the University Archives, the RDS, and the Preservation Services Unit.
Accruals present metadata challenges. How do we establish clear accrual relationships in our metadata when a dataset or a records series is updated annually? Are there ways to automate processes to support management of accruals?
Both units do as much as they can to get contextual information about the material being accessioned from the creator, and metadata is enhanced where possible throughout curation/processing.
The University Archives and the RDS control materials in aggregation, with the University Archives managing at the archival collection level and the RDS managing digital objects at the dataset level.
More? Certainly! For both the research data curation community and the archives community, continually adopting pragmatic strategies to manage the information created by humans (and machines!) is paramount, and we will continue to learn from one another.
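As one illustration of the shared practice mentioned above, establishing checksums for all accessioned files, here is a minimal sketch in Python. It assumes SHA-256 and a simple in-memory manifest; the post does not say which algorithm or tooling the University Archives, the RDS, or Medusa actually use, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (relative POSIX-style path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List manifest entries whose file is missing or no longer matches."""
    current = build_manifest(root)
    return [name for name, digest in manifest.items() if current.get(name) != digest]
```

Building the manifest at accession and re-running `changed_files` later is one simple way to notice that a file has been altered or lost, without converting or even opening it in its native software.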
The following represents our planned functional workflow for handling dataset deposits in the Illinois Data Bank:
To learn more about the IDB policies and procedures discussed in this post, keep an eye on the Illinois Data Bank website after it launches next month. Of particular interest on the Policies page will be the Accession Policy and the Preservation Review, Retention, Deaccession, Revision, and Withdrawal Procedure document.
Bethany Anderson and Chris Prom of the University of Illinois Archives
The rest of the Research Data Preservation Review Policy/Procedures team: Bethany Anderson, Susan Braxton, Heidi Imker, and Kyle Rimkus
The rest of the RDS team: Qian Zhang, Elizabeth Wickes, Colleen Fallaw, and Heidi Imker
Elise Dunham is a Data Curation Specialist for the Research Data Service at the University of Illinois at Urbana-Champaign. She holds an MLS from the Simmons College Graduate School of Library and Information Science where she specialized in archives and metadata. She contributes to the development of the Illinois Data Bank in areas of metadata management, repository policy, and workflow development. Currently she co-chairs the Research Data Alliance Archives and Records Professionals for Research Data Interest Group and is leading the DACS workshop revision working group of the Society of American Archivists Technical Subcommittee for Describing Archives: A Content Standard.
At SAA 2015, Courtney Mumma (formerly of Artefactual Systems) and I participated in a panel discussion at the Electronic Records Section meeting on “implementing digital preservation tools and systems,” with a focus on “the lessons learned through the planning, development, testing, and production of digital preservation applications.”
The University of Michigan’s Bentley Historical Library is in the midst of a two-year project (2014-2016) funded by the Andrew W. Mellon Foundation to integrate ArchivesSpace, Archivematica, and DSpace in an end-to-end digital archives workflow (for more information on the project itself, see our blog).
Artefactual Systems is responsible for the development work on the project, which has involved adding new functionality to the Archivematica digital preservation system to permit the appraisal and arrangement of digital archives as well as the integration of ArchivesSpace functionality within Archivematica so that users can create and edit archival description in addition to associating digital objects with that information.