SAA 2019 recap | Session 204: Demystifying the Digital: Providing User Access to Born-Digital Records in Varying Contexts

by Steven Gentry


Introduction

Session 204 addressed how three dissimilar institutions—North Carolina State University (NCSU), the Wisconsin Historical Society (WHS), and the Canadian Centre for Architecture (CCA)—are connecting their patrons with born-digital archival content. The panelists consisted of Emily Higgs (NCSU Libraries Fellow, North Carolina State University), Hannah Wang (Electronic Records & Digital Preservation Archivist, Wisconsin Historical Society), and Stefana Breitwieser (Digital Archivist, Canadian Centre for Architecture). In addition, Kelly Stewart (Director of Archival and Digital Preservation Services, Artefactual Systems) briefly spoke about the development of SCOPE, the tool featured in Breitwieser’s presentation.

Note: The content of this recap has been paraphrased from the panelists’ presentations and all quoted content is drawn directly from the panelists’ presentations.

Session summary

Emily Higgs’s presentation focused on the different ways that NCSU’s Special Collections Research Center (SCRC) staff enhance access to their born-digital archives. After a brief overview of NCSU’s collections, Higgs first described their lightweight workflow for connecting researchers with requested digital content, a process that involves SCRC staff accessing an administrator account on a reading room MacBook; transferring copies of requested content to a read-only folder shared with a researcher account; and limiting the computer’s overall capabilities, such as restricting its internet access and ports (the latter is accomplished via Endpoint Protector Basic). Should a patron want copies of the material, they simply drag and drop those resources into another folder for SCRC staff to review.

Higgs then described an experimental Named Entity Recognition (NER) workflow that uses spaCy to help archivists better describe files in NCSU’s finding aids. The workflow employs a Jupyter notebook (see her GitHub repository for more information) to automate the following process:

  • “Define directory [to be analyzed by spaCy].”
  • “Walk directory…[to retrieve] text files [such as PDFs].”
  • “Extract text (textract).”
  • “Process and NER (spaCy).”
  • “Data cleaning.”
  • “Ranked output of entities (csv) [which is based on the number of times a particular name appears in the files].”

Once the process is completed, the most frequent 5-10 names are placed in an ArchivesSpace scope and content note. Higgs concluded by emphasizing this workflow’s overall ease of use and noting that—in the future—staff will integrate application programming interfaces (APIs) to enhance the workflow’s efficiency.
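The steps quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from Higgs’s notebook: the function names (`naive_person_names`, `rank_entities`, `write_ranked_csv`) are hypothetical, the sketch reads plain-text files only rather than extracting text from PDFs with textract, and a simple regular expression stands in for the spaCy NER step so that the example runs without any model installed.

```python
import csv
import re
from collections import Counter
from pathlib import Path


def naive_person_names(text):
    """Stand-in for the NER step (NOT spaCy): finds runs of two or three
    capitalized words separated by single spaces."""
    return re.findall(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+){1,2}\b", text)


def rank_entities(directory, extract_entities=naive_person_names, suffixes=(".txt",)):
    """Walk a directory, extract entities from each matching file, and return
    (entity, count) pairs ordered from most to least frequent."""
    counts = Counter()
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            for name in extract_entities(text):
                # Data cleaning: collapse stray whitespace inside a name.
                cleaned = " ".join(name.split())
                if cleaned:
                    counts[cleaned] += 1
    return counts.most_common()


def write_ranked_csv(ranked, csv_path):
    """Write the ranked entities to a CSV file with a header row."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["entity", "count"])
        writer.writerows(ranked)
```

To approximate the workflow described in the session, the `extract_entities` argument could be replaced with a function that runs a loaded spaCy pipeline over the text and returns the text of each entity labeled `PERSON`; the first five to ten rows of the resulting CSV would then be candidates for the scope and content note.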

Next to speak was Hannah Wang, who addressed how the Wisconsin Historical Society (WHS) has made its born-digital state government records more accessible. Wang began her presentation by discussing the Wisconsin State Preservation of Electronic Records (WiSPER) Project and its two goals:

  • “Ingest a minimum of 75 GB of scheduled [and processed] electronic records from state agencies.”
  • “Develop public access interface.” 

She then explained why WHS selected Preservica:

  • WHS’s lack of significant IT support meant an easily implementable tool was preferred over open-source and/or homegrown solutions.
  • Preservica allowed WHS to simultaneously preserve and provide (tiered) access to digital records.
  • Preservica has a public-facing WordPress site, which fulfilled the second WiSPER grant objective.

Wang then addressed how WHS staff appropriately restricted access to digital records by placing records into one of three groupings:

  • “Content that has a legal restriction.”
  • “Content that requires registration and onsite viewing [such as email addresses].”
  • “Open, unrestricted content.” 

WHS staff achieved this goal by employing different methods to locate and restrict digital records:

  • For identification: 
    • Reviewing “[record] retention schedules…[and consulting with] agency [staff who would notify WHS personnel of sensitive content].” 
    • Using resources like bulk_extractor.
    • Reading records if necessary.
  • For restricting records:
    • Employing scripts—such as batch scripts—to transfer and restrict individual files and whole SIPs.
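As a rough illustration of how the identification step might feed the three groupings, the sketch below assigns a record to a tier. It is not WHS’s script and does not call bulk_extractor or Preservica: the `AccessTier` enum, the `assign_tier` function, and the `legally_restricted` flag (standing in for a decision reached through retention schedules or agency consultation) are all hypothetical, and a simple regular expression stands in for the kind of email-address detection a forensic tool would perform.

```python
import re
from enum import Enum


class AccessTier(Enum):
    """The three groupings described in the presentation."""
    LEGAL_RESTRICTION = "content that has a legal restriction"
    ONSITE_REGISTRATION = "content that requires registration and onsite viewing"
    OPEN = "open, unrestricted content"


# Simplified email pattern; an assumption made for illustration only.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def assign_tier(text, legally_restricted=False):
    """Return the access tier for one record's extracted text."""
    if legally_restricted:
        # Decided outside this sketch, e.g. from a retention schedule.
        return AccessTier.LEGAL_RESTRICTION
    if EMAIL_PATTERN.search(text):
        # Sensitive but not legally restricted: register and view onsite.
        return AccessTier.ONSITE_REGISTRATION
    return AccessTier.OPEN
```

In practice a reviewer would still read flagged records where necessary, as the list above notes; a pattern match only narrows down what needs a closer look.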

Wang demonstrated how WHS makes its restricted content accessible via Preservica:

  • “Content that has a legal restriction”: Only higher levels of description can be searched by external researchers, although patrons have information concerning how to access this content.
  • “Content that requires registration and onsite viewing”: Individual files can be located by external researchers, although researchers still need to visit the WHS to view materials. Again, information concerning how to access this content is provided.

Wang concluded her presentation by describing efforts to link materials in Preservica with other descriptive resources, such as WHS’s MARC records; expressing hope that WHS will integrate Preservica with their new ArchivesSpace instance; and discussing the usability testing that resulted in several upgrades to the WHS Electronic Records Portal prior to its release.

The penultimate speaker was Stefana Breitwieser, who spoke about SCOPE and its features. Breitwieser first discussed the “Archaeology of the Digital” project, through which the CCA acquired the bulk of its digital content: more than “660,000 files (3.5 TB).” To improve access to these resources, Breitwieser stressed that two problems had to be addressed:

  • “[A] long access workflow [that involved twelve steps].”
  • “Low discoverability.” Breitwieser noted that the issues with the CCA’s existing access tool included its inability to search across collections and its failure to make use of metadata in Archivematica.

CCA staff ultimately decided on working alongside Artefactual Systems to build SCOPE, “an access interface for DIPs from Archivematica.” The goals of this project included:

  • “Direct user access to access copies of digital archives from [the] reading room.”
  • “Minimal reference intervention [by CCA staff].”
  • “Maximum discoverability using [granular] Archivematica-generated metadata.”
  • “Item-level searching with filtering and facetting.” 

To illustrate SCOPE’s capabilities, Breitwieser demonstrated the tool and its features (e.g. its ability to download DIPs) for the audience. During the presentation, she emphasized that although incredibly useful, SCOPE will ultimately supplement—rather than replace—the CCA’s finding aids. 

Breitwieser concluded by describing the CCA’s reading room, which includes computers that offer a variety of useful software (e.g. computer-aided design, or CAD, software) and, like NCSU’s workstation, only limited technical capabilities, and by highlighting the CCA’s much simpler 5-step access workflow.

The final speaker, Kelly Stewart, spoke of SCOPE’s development process. Heavily emphasized during this presentation was Artefactual’s use of CCA user stories to develop “feature files” (“logic-based, structured descriptions” of these user stories) that Artefactual staff used to build SCOPE. Stewart noted that, once SCOPE was built, “user acceptance testing” occurred repeatedly until the tool was deemed ready. Stewart concluded her presentation with the hope that other archivists will implement and improve upon SCOPE.


Steven Gentry currently serves as a Project Archivist at the Bentley Historical Library. His responsibilities include assisting with accessioning efforts, processing complex collections, and building various finding aids. He previously worked at St. Mary’s College of Maryland, Tufts University, and the University of Baltimore.