by Erin Faulder
Development of the Digital Processing Framework began after the second annual Born Digital Archiving eXchange unconference at Stanford University in 2016. There, a group of nine archivists saw a need for standardization, best practices, or general guidelines for processing digital archival materials. What came out of this initial conversation was the Digital Processing Framework (https://hdl.handle.net/1813/57659) developed by a team of 10 digital archives practitioners: Erin Faulder, Laura Uglean Jackson, Susanne Annand, Sally DeBauche, Martin Gengenbach, Karla Irwin, Julie Musson, Shira Peltzman, Kate Tasker, and Dorothy Waugh.
An initial draft of the Digital Processing Framework was presented at the Society of American Archivists’ Annual meeting in 2017. The team received feedback from over one hundred participants who assessed whether the draft was understandable and usable. Based on that feedback, the team refined the framework into a series of 23 activities, each composed of a range of assessment, arrangement, description, and preservation tasks involved in processing digital content. For example, the activity Survey the collection includes tasks like Determine total extent of digital material and Determine estimated date range.
The Digital Processing Framework’s target audience is folks who process born digital content in an archival setting and are looking for guidance in creating processing guidelines and making level-of-effort decisions for collections. The framework does not include recommendations for archivists looking for specific tools to help them process born digital material. We draw on language from the OAIS reference model, so users are expected to have some familiarity with digital preservation, as well as with the management of digital collections and with processing analog material.
Processing born-digital materials is often non-linear, requires technical tools that are selected based on unique institutional contexts, and blends terminology and theories from archival and digital preservation literature. Because of these characteristics, the team first defined 23 activities involved in digital processing that could be generalized across institutions, tools, and localized terminology. These activities may be strung together in a workflow that makes sense for your particular institution. They are:
- Survey the collection
- Create processing plan
- Establish physical control over removeable media
- Create checksums for transfer, preservation, and access copies
- Determine level of description
- Identify restricted material based on copyright/donor agreement
- Gather metadata for description
- Add description about electronic material to finding aid
- Record technical metadata
- Create SIP
- Run virus scan
- Organize electronic files according to intellectual arrangement
- Address presence of duplicate content
- Perform file format analysis
- Identify deleted/temporary/system files
- Manage personally identifiable information (PII) risk
- Normalize files
- Create AIP
- Create DIP for access
- Publish finding aid
- Publish catalog record
- Delete work copies of files
Within each activity are a number of associated tasks. For example, tasks identified as part of the Establish physical control over removable media activity include, among others, assigning a unique identifier to each piece of digital media and creating suitable housing for digital media. Taking inspiration from MPLP and extensible processing methods, the framework assigns these associated tasks to one of three processing tiers. These tiers include: Baseline, which we recommend as the minimum level of processing for born digital content; Moderate, which includes tasks that may be done on collections or parts of collections that are considered as having higher value, risk, or access needs; and Intensive, which includes tasks that should only be done to collections that have exceptional warrant. In assigning tasks to these tiers, practitioners balance the minimum work needed to adequately preserve the content against the volume of work that could happen for nuanced user access. When reading the framework, know that if a task is recommended at the Baseline tier, then it should also be done as part of any higher tier’s work.
We designed this framework to be a step towards a shared vocabulary of what happens as part of digital processing and a recommendation of practice, not a mandate. We encourage archivists to explore the framework and use it however it fits in their institution. This may mean re-defining what tasks fall into which tier(s), adding or removing activities and tasks, or stringing tasks into a defined workflow based on tier or common practice. Further, we encourage the professional community to build upon it in practical and creative ways.
Erin Faulder is the Digital Archivist at Cornell University Library’s Division of Rare and Manuscript Collections. She provides oversight and management of the division’s digital collections. She develops and documents workflows for accessioning, arranging and describing, and providing access to born-digital archival collections. She oversees the digitization of analog collection material. In collaboration with colleagues, Erin develops and refines the digital preservation and access ecosystem at Cornell University Library.