What's Your Set-Up? Born-Digital Processing at UNC Chapel Hill

by Jessica Venlet


At UNC-Chapel Hill Libraries’ Wilson Special Collections Library, our workflow and technology set-up for born-digital processing has evolved over many years and under the direction of a few different archivists. This post provides a look at what technology was here when I started work in 2016 and how we’ve built on those resources in the last three years. Our set-up for processing and appraisal centers on getting collections ready for ingest to the Libraries’ Digital Collections Repository where other file-level preservation actions occur. 

What We Had 

I arrived at UNC in 2016 and was happy to find an excellent stock of hardware and two processing computers. Thank you to my predecessors! 

The computers available for processing were an iMac (10.11.6, 8 GB RAM, 500 GB storage) and a Lenovo PC (Windows 7, 8 GB RAM, 465 GB storage, 2.4 GHz processor). These computers were not used for storing collection material. Collections were temporarily stored and processed on a server before ingest to the repository. While I’m not sure how these machines were selected, I was glad to have dedicated born-digital processing workstations.

In addition to the computers, we had a variety of other devices including:

  • One Device Side Data FC0525 5.25” floppy controller and a 5.25” disk drive
  • One Tableau USB write blocker
  • One Tableau SATA/IDE write blocker
  • Several USB connectable 3.5” floppy drives 
  • Two memory card readers (SanDisk 12 in 1 and Delkin)
  • Several zip disk drives (which turned out to be broken)
  • External CD/DVD player 
  • Three external hard drives and several USB drives
  • Camera for photographing storage devices
  • A variety of other cords and adapters, most of which are used infrequently. Some examples are extra SATA/IDE adapters and adapter kits, Molex power adapters and power cords, and a USB adapter kit.

The primary programs in use at the time were FTK Imager, Exact Audio Copy, and Bagger. 

What We Have Now

Since 2016, our workflow has evolved to include more appraisal and technical review before ingest. As a result, our software set-up expanded to support actions like virus scanning and file format identification. While it was great to have two dedicated workstations, our computers definitely needed an upgrade, so we worked on securing replacements.

The iMac was replaced with a Mac Mini (10.14.6, 16 GB RAM, 251 GB flash storage). Our PC was upgraded to a Lenovo P330 tower (Windows 10, 16 GB RAM, 476 GB storage). The Mini was a special request, but the PC request fit into a normal upgrade cycle. We continue to temporarily store collections on a server for processing before ingest.

Our peripheral devices remain largely the same as above, but we have added new (functional) zip drives and another Tableau USB write blocker used for appraisal outside of the processing space (e.g. offsite for a donor visit). We also purchased a KryoFlux, which can be used for imaging floppies. While not strictly required for processing, the KryoFlux may be useful to have if you encounter frequent issues accessing floppies. To learn more about the KryoFlux, check out the excellent Archivist’s Guide to the KryoFlux resource.

The software and tools that we’ve used have changed more often than our hardware set-up. Since about May 2018, we’ve settled on a pretty stable selection of software to get things done. Our commonly used tools are Bagger, Brunnhilde (and the dependencies that go with it, like Siegfried and ClamAV), bulk_extractor, Exact Audio Copy, ffmpeg, IsoBuster, LibreOffice, Quick View Plus, rsync, text editors (TextWrangler or BBEdit), and VLC Media Player.
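
To make that list a bit more concrete, below is a minimal Python sketch showing how two of those tools, Siegfried and ClamAV, can be chained over a working copy of files during technical review. This is an illustration under assumptions, not our actual workflow (in practice Brunnhilde wraps both tools for us), and the mount point is hypothetical.

    import json
    import subprocess
    from pathlib import Path

    # Hypothetical mount point for the working copy of an accession.
    source = Path("/mnt/processing/accession_001")

    # File format identification with Siegfried; "-json" yields machine-readable output.
    sf = subprocess.run(["sf", "-json", str(source)],
                        capture_output=True, text=True, check=True)
    report = json.loads(sf.stdout)
    print(f"Siegfried identified {len(report['files'])} files")

    # Virus scan with ClamAV; "-r" recurses, "--infected" lists only flagged files.
    # No check=True here: clamscan exits 1 when it finds infected files.
    scan = subprocess.run(["clamscan", "-r", "--infected", str(source)],
                          capture_output=True, text=True)
    print(scan.stdout)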

Recommended Extras

  • USB hub. Having extra USB ports has proven useful. 
  • A basic repair toolkit. This isn’t something we use often, but we have had a few older external hard drives come through that we needed to remove from an enclosure to connect to the write blocker. 
  • Training Collection Materials. One of the things I recommend most for a digital archives set-up is a designated set of storage devices and files that are for training and testing only. This way you have some material ready to go for testing new tools or training colleagues. Our training and testing collection includes a few 3.5” and 5.25” floppies, optical discs, and a USB drive that is loaded with files (including files with information that will get caught by our PII scanning tools). Many of the storage devices were deaccessioned and destined for recycling.

So, that’s how our set-up has changed over the last several years. As we continue to understand our needs for born-digital processing and as born-digital collections grow, we’ll continue to improve our hardware and software set-up.


Jessica Venlet works as the Assistant University Archivist for Digital Records & Records Management at the UNC-Chapel Hill Libraries’ Wilson Special Collections Library. In this role, Jessica is responsible for a variety of things related to both records management and digital preservation. In particular, she leads the processing and management of born-digital special collections. She earned a Master of Science in Information degree from the University of Michigan.

Welcome to the newest series on bloggERS, “What’s Your Set-Up?”

By Emily Higgs


Welcome to the newest series on bloggERS, “What’s Your Set-Up?” In the coming weeks, bloggERS will feature posts from digital archives professionals exploring the question: what equipment do you need to get your job done?

This series was born from personal need: as the first Digital Archivist at my institution, one of my responsibilities has been setting up a workstation to ingest and process our born-digital collections. It’s easy to be overwhelmed by the range of hardware and software needed, the variety of options for different equipment types, and where to obtain everything. In my context, some groundwork had already been done by forward-thinking former employees, who set up a computer with the BitCurator environment and also purchased a WiebeTech USB WriteBlocker. While this was a good first step for a born-digital workstation, we had much further to go.

The first question I asked was: what do I need to buy?

My initial list of equipment was pretty easy to compile: 3.5” floppy drive, 5.25” floppy drive, optical drive, memory card reader, and so on. Then it started to get more complicated: 

  • Do I need to purchase disk controllers now or should I wait until I’m more familiar with the collections and know what I need? 
  • How much will a KryoFlux cost us over time vs. hiring an outside vendor to read our difficult floppies? 
  • Is it feasible to share one workstation among multiple departments? Should some of this equipment be shared consortially, like much of our collections structure? 
  • What brands and models of all this stuff are appropriate for our use case? What is quality and what is not?

The second question was: where do I buy all this stuff? This question contained myriad sub-questions: 

  • How do I balance quality and cost? 
  • Can I buy this equipment from Amazon? Should I buy equipment from Amazon? 
  • Will our budget structure allow for me to use vendors like eBay? 
  • Which sellers on eBay can I trust to send us legacy equipment that’s in working condition?

As with most of my work, I have taken an iterative approach to this process. The majority of our unprocessed born-digital materials were stored on CDs and 3.5” floppy disks, so those were the focus of our first round of purchasing a few weeks ago. In addition to the basic USB blocker and BitCurator machine we already had, we now have a Dell External USB CD drive, a Tendak USB 3.5” floppy drive, and an Aluratek multimedia card reader to read the most common media in our unprocessed collections. We chose the Tendak drive mainly because of its price point, but it has not been the most reliable hardware and we will likely try something else in the future. As I’ve gone through old boxes from archivists past, I have found additional readers such as an Iomega Jaz drive, which I’m very glad we have; there are a number of Jaz disks in our unprocessed collections as well.

As I went about this process, I started by emailing many of my peers in the field to solicit their opinions and learn more about the equipment at their institutions. The range of responses I got was extremely helpful for my decision-making process. The team at bloggERS wanted to share that knowledge with the rest of our readership, helping them learn from their peers at a variety of institutions. We hope you glean some useful information from this series, and we look forward to your comments and discussions on this important topic.


Emily Higgs is the Digital Archivist for the Swarthmore College Peace Collection and Friends Historical Library. Before moving to Swarthmore, she was a North Carolina State University Libraries Fellow. She is also the Assistant Team Leader for the SAA ERS section blog.

Recap: BitCurator Users Forum, October 24-25, 2019

The fifth annual BitCurator Users Forum was held at Yale University on October 24-25, bringing library, archives, and museum practitioners together to learn and discuss many aspects of digital forensics work. Over two days of workshops, lightning talks, and panels, the Forum covered a range of topics around acquisition, processing, and access for born digital materials. In addition to traditional panels and conference sessions, attendees also participated in hands-on workshops on digital forensics techniques and tools, including the BitCurator environment.

Throughout the workshops, sessions, and discussions, one of the most dominant themes to emerge was the question of how archivists and institutions should address the environmental unsustainability of digital preservation. Attendees were quick to highlight recent work in this area, including the article Toward Environmentally Sustainable Digital Preservation by Keith L. Pendergrass, Walker Sampson, Tim Walsh, and Laura Alagna, among others. The prevalence of this topic at the Forum, as well as at other conferences and in our professional literature, points to the urgency that archivists feel toward ensuring that we are able to continue to preserve our digital holdings while minimizing negative environmental impact as much as possible.

The role of appraisal in relation to the environmental sustainability of digital preservation specifically was a major focus of the Forum. One attendee remarked that the “low cost of storage has outpaced the ability to appraise content,” summing up the situation that many institutions find themselves in, where the ever-decreasing cost of digital storage, anxiety about discarding potentially valuable collection material, and a lack of time and guidance on appraisal of digital materials have resulted in the ballooning of their digital holdings.

Participants challenged the notion that “keeping everything forever” should be our default preservation strategy. One common thread to emerge was the need to be more thoughtful about what we choose to retain and to develop and share appraisal criteria for born digital materials to help us make those decisions.

Also related to concerns about the environmental impact of digital preservation, presenters posed questions about how much data and related metadata for digital collections should be captured in the first place. Kelsey O’Connell, digital archivist at Northwestern University, proposed defining levels of digital forensics rather than applying the same workflow to every collection. Taking this type of approach to acquisition and metadata creation for born digital collection materials could help institutions minimize the storage of unnecessary collection data.

The BitCurator Users Forum provides an excellent opportunity for library and archives practitioners to learn new skills and discuss the many challenges and opportunities in the field of digital archiving. This year’s Forum was no exception and I have no doubt that it will continue to serve as a valuable resource for experienced practitioners as well as those just starting out.


Sally DeBauche is a Digital Archivist at Stanford University and the ePADD Project Manager.

DLFF’d Behind?

This year’s Digital Library Federation Forum (DLFF or #DLF2019 or #DLFforum if you’re social) was held October 14-16 in Tampa, FL. As usual, many of the sessions were directly relevant to the Electronic Records Section membership; also as usual, the Forum was heavily tweeted, giving a lot of us who couldn’t be there a mix of vicarious engagement and serious conference envy.

Thankfully, the DLF(F) ethos of collaboration makes it a little easier for everyone who couldn’t be there: OSF repositories for the DLF Forum and DigiPres meetings host (most of) the presentation slides for the 2019 meetings, organizers set up shared notes documents for the sessions, and each session had its own hashtag to help corral the discussion, annotations, and meta-commentary we’ve come to expect from libraries/archives/allied trades Twitter.

As most anyone who’s attended the DLF Forum will tell you, every time slot has something great in it, and there’s no substitute for being there. For the next best thing, we’re happy to present below a few sessions that caught our interest: the session description and available materials, shared notes, and, of course, the Twitter feed. Enjoy, and FOMO no more!

SAA 2019 Recap | Email Archiving: Strategies, Tools, Techniques Workshop

Email Archiving: Strategies, Tools, Techniques was a one-day workshop held on August 1, 2019. Chris Prom (University of Illinois) and Tricia Patterson (Harvard University) taught the workshop, which gave a broad overview of the opportunities and challenges of email archiving and some tools that can be used to make this daunting task easier.

As a processing archivist, email sits squarely within the electronic records processing workflow I’m helping develop: I took this class to build my digital archiving skills and to learn about techniques for managing email archives. Attending this class while my department is developing a digital archiving workflow helped me think ahead about technical limitations, ethical considerations, storage, and access issues related to email.

For me, the class was a good introduction to the opportunities and challenges of preserving this ephemeral and widespread communication. The class was divided into three sections: Assessing Needs and Challenges, Understanding Tools and Techniques, and Implementing Workflows. These sections were based on the Lifecycle of Email Model from The Future of Email Archives CLIR Report.

During the first portion of the class, we discussed the types of communication that occur through email, and the functions that fall under the creation and use as well as the appraisal and selection categories of the email lifecycle. This section featured an interesting group activity asking us to list all of the email accounts we had used in our lifetime, the type of correspondence that occurred on each platform, an estimated size of the collection, and its scope and contents. This exercise helped illustrate how large, multifaceted, and varied even a single email collection can be, and I found it effective for thinking about the complexities of archiving email.

In the second section, Prom and Patterson walked the class through seven tools for capturing and processing emails. The instructors gave a brief description of each tool’s functions and where they fit in the lifecycle model before giving a demo. Unfortunately, the demo portion was the weakest part of this workshop for me: instead of a live demonstration, the instructors used screenshots and a video recording. It was difficult to read the screenshots, and the slides containing them did not have any explanatory text, so unless you took good notes, it would be difficult to understand how these tools work after the class was over. If SAA offers this class again, I would suggest the instructors do a live demo and provide more notes on how the tools work so that we can use the class materials as a resource when we are doing this work at our own institutions.

The group activity for this class was to export a small portion of our own email and use one of the tools discussed in class to begin processing. During this activity, we discovered that Yahoo makes it difficult or impossible to export email. I think this activity would have been more effective if we had been told before the class began to download our own emails, and how to do so. Most of the time allotted for this activity was spent figuring out how to download our emails and waiting for them to download, so we never got the chance to use the programs we discussed.

Overall, I thought the class provided a good introduction to the complexities of preserving email and introduced open-source and hosted tools that help with different parts of the email lifecycle. I would recommend this class to people who are exploring how to archive email and what would work for their institution.

Kahlee Leingang is a Processing Archivist at Iowa State University, where she works on creating guidelines and workflows for processing, preservation, and access of born-digital records as well as processing collections in the backlog.

Call for Contributions: What’s Your Set-Up?

bloggERS!, the blog of the Electronic Records Section of SAA, is accepting proposals for blog posts on the theme “What’s Your Set-Up?” These posts will address the question: what equipment do you need to get your job done in digital archives? We’re particularly interested in posts that consist of a detailed description of hardware, software, and other equipment used in your institution’s digital archives workflow (computers, readers, drives, etc.), as well as more general posts about equipment needs in digital archives.

See our call for posts below and email any proposals to ers.mailer.blog@gmail.com.

We look forward to hearing from all of you.

—The bloggERS! editorial subcommittee

Call for Posts

When starting a digital archives program from scratch, archivists can be easily overwhelmed by the range of hardware and software needed to effectively manage and preserve digital media, the variety of options for different equipment types, and where to obtain everything needed. As our practice evolves, so does the required equipment, and archivists are constantly replacing and improving our equipment according to our needs and resources. 

This series hopes to help break down barriers by allowing archivists to learn from their peers at a variety of institutions. We want to hear about the specific equipment you use in your day-to-day workflows, addressing questions such as: what do your workstations consist of? How many do you have? What readers and drives work reliably for your workflows? How did you obtain them? What doesn’t work? What is on your wish list for equipment acquisition?

We welcome posts from staff at institutions with all levels of budgetary resources. 

Other potential topics and themes for posts:

  • Creating a low-cost digital archives workstation
  • Stories of assembling workstations iteratively
  • Strategies for obtaining the necessary equipment, and preferred vendors
  • Working with IT to establish and support digital archives hardware and software
  • Stories of success or failure with advanced equipment such as the FRED Forensic Workstation or the KryoFlux

Writing for bloggERS! “What’s Your Set-Up?” Series

  • We encourage visual representations: Posts can include or largely consist of comics, flowcharts, a series of memes, etc!
  • Written content should be roughly 600-800 words in length
  • Write posts for a wide audience: anyone who stewards, studies, or has an interest in digital archives and electronic records, both within and beyond SAA
  • Align with other editorial guidelines as outlined in the bloggERS! guidelines for writers.

Please let us know if you are interested in contributing by sending an email to ers.mailer.blog@gmail.com!

Data As Collections

By Nathan Gerth

Over the past several years there has been a growing conversation about “collections as data” in the archival field. Elizabeth Russey Roke underscored the growing impact of this movement in her recent post on the Collections As Data: Always Already Computational final report. Much like her, I have seen this computational approach to the data in our collections manifest itself at my home institution, with our decision to start providing researchers with aggregate data harvested from our born-digital collections.

Data as Collections

At the same time, in my role as a digital archivist working with congressional papers, I have seen a growing body of what I call “data as collections.” I am using the term data in this case specifically in reference to exports from relational database systems in collections. More akin to research datasets than standard born-digital acquisitions, these exports amplify the privacy and technical challenges associated with typical digital collections. However, they also embody some of the more appealing possibilities for the computational research highlighted by the “collections as data” initiative, given their structured nature and millions of data points.   

Curating and supplying access to a particular type of data export has become an acute problem in the field of congressional papers. As documented in a white paper by a Congressional Papers Section Task Force in 2017, members of the U.S. House of Representatives and U.S. Senate have widely adopted proprietary Constituent Management Systems (CMS) or Constituent Services Systems (CSS) to manage constituent communications. The exports from these systems document the core interactions between the American people and their representatives in Congress. Unfortunately, these data exports have remained largely inaccessible to archivists and researchers alike.

The question of curating, preserving, and supplying access to the exports from these systems has galvanized the work of several task forces in the archival community. In recent years, congressional papers archivists have collaborated to document the problem in the white paper referenced above and to support the development of a tool to access these exports. The latter effort, spearheaded by West Virginia University Libraries, earned a Lyrasis Catalyst Fund grant in 2018 to assess the development possibilities for an open-source platform developed at WVU to open and search these data exports. You can see a screenshot of the application in action below.

[Screenshot: a data table viewed in the West Virginia University Libraries CSS Data Tool]

The project funded by the grant, America Contacts Congress, has now issued its final report, and the members of the task force that served as its advisory board are transitioning to the next stage of the project. Here is where things stand:

What We Now Know

We now know much more about the key research audiences for this data and the archival needs associated with the tool. Researchers expressed solid enthusiasm for gaining access to the data, especially computationally minded quantitative scholars. For those of us involved in testing data in the tool, the project gave us a moment to become much more familiar with our data. I, for my part, also know a great deal more about the 16 million records in the relational data tables we received from the office of Senator Harry Reid, in addition to the 3 million attachments referenced by those tables. Without the ability to search and view the data in the tool, the tables and attachments from the Reid collection would have existed as little more than binary files.
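
As a hypothetical illustration of why searchability matters here (and emphatically not a description of the WVU tool): once a table from such an export is available as a delimited file, even a few lines of Python and SQLite can start answering basic appraisal questions. All file, table, and column names below are invented.

    import csv
    import sqlite3

    # Hypothetical: one table from a CSS/CMS export, saved as CSV.
    conn = sqlite3.connect("css_export.db")
    with open("correspondence.csv", newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        columns = ", ".join(f'"{name}"' for name in header)
        marks = ", ".join("?" for _ in header)
        conn.execute(f"CREATE TABLE correspondence ({columns})")
        conn.executemany(f"INSERT INTO correspondence VALUES ({marks})", reader)
    conn.commit()

    # Basic appraisal queries: how many records, and what date range do they span?
    # ("date_in" is an invented column name.)
    print(conn.execute("SELECT COUNT(*) FROM correspondence").fetchone())
    print(conn.execute('SELECT MIN("date_in"), MAX("date_in") FROM correspondence').fetchone())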

Unresolved Questions

While members of the grant’s advisory board know much more about how the tool might be used in the sphere of congressional papers, we would like to learn more about other cases of “data in collections” in the archival field. Who beyond congressional papers archivists are grappling with supplying access to and preserving relational databases? We know, for example, that many state and local governments are using the same Constituent Relationship Management systems, such as iConstituent and Intranet Quorum, deployed in congressional offices. Do our needs overlap with those of other archivists, and could this tool serve a broader community? While the volume of CSS data exports in congressional collections is significant, the direction we take with tool development and partnerships to supply access to the data will hinge on finding a broader audience of archivists facing similar challenges.

If any of the questions above apply to you, consider contacting the members of the America Contacts Congress project’s advisory board. We would love to hear from you and discuss how the outcomes of the grant might apply to a broader array of data exports in archival collections. Who knows, we might even help you test the tool on your own data exports! For more information about the project, visit our webpage.


Nathan Gerth is the Head of Digital Services and Digital Archivist at the University of Nevada, Reno Libraries. Splitting his time between application support and digital preservation, he is the primary custodian of the electronic records from the papers of Senator Harry Reid. Outside of the University, he is an active participant in the congressional papers community, serving as the incoming chair of the Congressional Papers Section and as a member of the Association of Centers for the Study of Congress CSS Data Task Force.

SAA 2019 recap | Session 504: Building Community History Web Archives: Lessons Learned from the Community Webs Program

by Steven Gentry


Introduction

Session 504 focused on the Community Webs program and the experiences of archivists who worked at either the Schomburg Center for Research in Black Culture or the Grand Rapids Public Library. The panelists consisted of Sylvie Rollason-Cass (Web Archivist, Internet Archive), Makiba Foster (Manager, African American Research Library and Cultural Center, formerly the Assistant Chief Librarian, the Schomburg Center for Research in Black Culture), and Julie Tabberer (Head of Grand Rapids History & Special Collections).

Note: The content of this recap has been paraphrased from the panelists’ presentations and all quoted content is drawn directly from the panelists’ presentations.

Session summary

Sylvie Rollason-Cass began with an overview of web archiving and web archives, including:

  • The definition of web archiving.
  • The major components of web archives, including relevant capture tools (e.g. web crawlers, such as Wget or Heritrix) and playback software (e.g. Webrecorder Player).
  • The ARC and WARC web archive file formats. 
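
For readers who have not handled WARCs directly, here is a minimal sketch (not part of the session) using the open-source warcio Python library to list the URLs captured in a WARC file; the filename is hypothetical.

    from warcio.archiveiterator import ArchiveIterator  # pip install warcio

    # Iterate over the records in a (hypothetical) WARC file and list captured URLs.
    with open("example.warc.gz", "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type == "response":  # skip request and metadata records
                print(record.rec_headers.get_header("WARC-Target-URI"))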

Rollason-Cass then noted both the necessity of web archiving—especially due to the web’s ephemeral nature—and that many organizations archiving web content are higher education institutions. The Community Webs program was therefore designed to get more public library institutions involved in web archiving, which is critical given that these institutions often collect unique local and/or regional material.

After a brief description of the issues facing public libraries and public library archives—such as a lack of relevant case studies—Rollason-Cass provided information about the institutions that joined the program, the resources provided by the Internet Archive as part of the program (e.g. a multi-year subscription to Archive-It), and the project’s results, including:

  • The creation of more than 200 diverse web archives (see the Remembering 1 October web archive for one example).
  • Institutions’ creation of collection development policies pertaining specifically to web archives, in addition to other local resources.
  • The production of an online course entitled “Web Archiving for Public Libraries.” 
  • The creation of the Community Webs program website.

Rollason-Cass concluded by noting that although some issues—such as resource limitations—may continue to limit public libraries’ involvement in web archiving, the Community Webs program has greatly increased the ability for other institutions to confidently archive web content. 

Makiba Foster then addressed her experiences as a Community Webs program member. After a brief description of the Schomburg Center, its mission, and its unique place where “collections, community, and current events converge,” Foster highlighted the specific reasons for becoming more engaged with web archiving:

  • Like many other institutions, the Schomburg Center has long collected clippings files—and web archiving would allow this practice to continue.
  • Materials that document the experiences of the black community are prominent on the World Wide Web.
  • Marginalized community members often publish content on the Web.

Foster then described the #HashtagSyllabusMovement collection, a web archive of educational material “related to publicly produced and crowd-sourced content highlighting race, police violence, and other social justice issues within the Black community.” Foster knew this content could be lost, so—even before participating in the Community Webs program—she began collecting URLs. Upon joining the Community Webs program, Foster used Archive-It to archive various relevant materials (e.g. Google docs, blog posts, etc.) dated from 2014 to the present. Although some content was lost, the #HashtagSyllabusMovement collection both continues to grow—especially if, as Foster hopes, it begins to include international educational content—and shows the value of web archiving.

In her conclusion, Foster addressed various successes, challenges, and future endeavors:

  • Challenges:
    • Learning web archiving technology and having confidence in one’s decisions.
    • Curating content for the Center’s five divisions.
    • “Getting institutional support.”
  • Future Directions:
    • A new digital archivist will work with each division to collect and advocate for web archives.
    • Considering how to both do outreach for and catalog web archives.
    • Ideally, working alongside community groups to help them implement web archiving practices.

The final speaker, Julie Tabberer, addressed the value of public libraries’ involvement in web archives. After a brief overview of the Grand Rapids Public Library, the necessity of archives, and the importance of public libraries’ unique collecting efforts, Tabberer posited the following question: “Does it matter if public libraries are doing web archiving?” 

To test her hypothesis that “public libraries document mostly community web content [unlike academic archives],” Tabberer analyzed the seed URLs of fifty academic and public libraries to answer two specific questions:

  • “Is the institution crawling their own website?”
  • “What type of content [e.g. domain types] is being crawled [by each institution]?”

After acknowledging some caveats with her sampling and analysis—such as the fact that data analysis is still ongoing and that only Archive-It websites were examined—Tabberer showed audience members several graphics revealing that academic libraries (1) typically crawled their own websites more than public libraries did and (2) captured more academic websites than public libraries.

Tabberer then concluded with several questions and arguments for the audience to consider:

  • In addition to encouraging more public libraries to archive web content—especially given their values of access and social justice—what other information institutions are underrepresented in this community?
  • Are librarians and archivists really collecting content that represents the community?
  • Even though resource limitations are problematic, academic institutions must expand their web archiving efforts.

Steven Gentry currently serves as a Project Archivist at the Bentley Historical Library. His responsibilities include assisting with accessioning efforts, processing complex collections, and building various finding aids. He previously worked at St. Mary’s College of Maryland, Tufts University, and the University of Baltimore.

SAA 2019 recap | Session 204: Demystifying the Digital: Providing User Access to Born-Digital Records in Varying Contexts

by Steven Gentry


Introduction

Session 204 addressed how three dissimilar institutions—North Carolina State University (NCSU), the Wisconsin Historical Society (WHS), and the Canadian Centre for Architecture (CCA)—are connecting their patrons with born-digital archival content. The panelists consisted of Emily Higgs (NCSU Libraries Fellow, North Carolina State University), Hannah Wang (Electronic Records & Digital Preservation Archivist, Wisconsin Historical Society), and Stefana Breitwieser (Digital Archivist, Canadian Centre for Architecture). In addition, Kelly Stewart (Director of Archival and Digital Preservation Services, Artefactual Systems) briefly spoke about the development of SCOPE, the tool featured in Breitwieser’s presentation.

Note: The content of this recap has been paraphrased from the panelists’ presentations and all quoted content is drawn directly from the panelists’ presentations.

Session summary

Emily Higgs’s presentation focused on the different ways that NCSU’s Special Collections Research Center (SCRC) staff enhance access to their born-digital archives. After a brief overview of NCSU’s collections, Higgs first described their lightweight workflow for connecting researchers with requested digital content, a process that involves SCRC staff accessing an administrator account on a reading room MacBook; transferring copies of requested content to a read-only folder shared with a researcher account; and limiting the computer’s overall capabilities, such as restricting its internet access and ports (the latter is accomplished via Endpoint Protector Basic). Should a patron want copies of the material, they simply drag and drop those resources into another folder for SCRC staff to review.

Higgs then described an experimental Named Entity Recognition (NER) workflow that employs spaCy and allows archivists to better describe files in NCSU’s finding aids. The workflow employs a Jupyter notebook (see her Github repository for more information) to automate the following process (a condensed sketch follows the list):

  • “Define directory [to be analyzed by spaCy].”
  • “Walk directory…[to retrieve] text files [such as PDFs].”
  • “Extract text (textract).”
  • “Process and NER (spaCy).”
  • “Data cleaning.”
  • “Ranked output of entities (csv) [which is based on the number of times a particular name appears in the files].”
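
As a rough, condensed illustration of the walk, extract, and NER steps above (not Higgs’s actual notebook, which lives in her GitHub repository), a sketch along these lines is possible; the directory path and file-type filter are assumptions.

    from collections import Counter
    from pathlib import Path

    import spacy     # pip install spacy; then: python -m spacy download en_core_web_sm
    import textract  # pip install textract; extracts text from PDFs, Word files, etc.

    nlp = spacy.load("en_core_web_sm")
    counts = Counter()

    # Walk a (hypothetical) directory, extract text, and tally person names.
    for path in Path("/path/to/collection").rglob("*"):
        if not path.is_file() or path.suffix.lower() not in {".pdf", ".docx", ".txt"}:
            continue
        try:
            text = textract.process(str(path)).decode("utf-8", errors="ignore")
        except Exception:
            continue  # unreadable or unsupported file; skip it
        counts.update(ent.text for ent in nlp(text).ents if ent.label_ == "PERSON")

    # Ranked output of entities, as in the workflow's final step.
    for name, n in counts.most_common(10):
        print(n, name)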

Once the process is completed, the most frequent 5-10 names are placed in an ArchivesSpace scope and content note. Higgs concluded by emphasizing this workflow’s overall ease of use and noting that—in the future—staff will integrate application programming interfaces (APIs) to enhance the workflow’s efficiency.

Next to speak was Hannah Wang, who addressed how the Wisconsin Historical Society (WHS) has made its born-digital state government records more accessible. Wang began her presentation by discussing the Wisconsin State Preservation of Electronic Records (WiSPER) Project and its two goals:

  • “Ingest a minimum of 75 GB of scheduled [and processed] electronic records from state agencies.”
  • “Develop public access interface.” 

She then explained the reasons behind Preservica’s selection:

  • WHS’s lack of significant IT support meant an easily implementable tool was preferred over open-source and/or homegrown solutions.
  • Preservica allowed WHS to simultaneously preserve and provide (tiered) access to digital records.
  • Preservica has a public-facing WordPress site, which fulfilled the second WiSPER grant objective.

Wang then addressed how WHS staff appropriately restricted access to digital records by placing records into one of three groupings:

  • “Content that has a legal restriction.”
  • “Content that requires registration and onsite viewing [such as email addresses].”
  • “Open, unrestricted content.” 

WHS staff achieved this by employing different methods to locate and restrict digital records:

  • For identification: 
    • Reviewing “[record] retention schedules…[and consulting with] agency [staff who would notify WHS personnel of sensitive content].” 
    • Using resources like bulk_extractor (see the sketch after this list).
    • Reading records if necessary.
  • For restricting records:
    • Employing scripts—such as batch scripts—to transfer and restrict individual files and whole SIPs.
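
As a hedged sketch of the bulk_extractor step mentioned above (the disk image name and paths are hypothetical), a minimal run and review might look like this:

    import subprocess
    from pathlib import Path

    # Hypothetical disk image; bulk_extractor requires that the output
    # directory not already exist, and writes "feature files" (email.txt,
    # accts.txt, etc.) into it.
    image = "accession_042.E01"
    outdir = Path("be_output")
    subprocess.run(["bulk_extractor", "-o", str(outdir), image], check=True)

    # Skim the email feature file for addresses that might trigger the
    # "registration and onsite viewing" restriction. Feature files are
    # tab-separated (offset, feature, context); header lines start with "#".
    features = outdir / "email.txt"
    if features.exists():
        for line in features.read_text(errors="ignore").splitlines():
            if line and not line.startswith("#"):
                print(line.split("\t")[1])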

Wang demonstrated how WHS makes its restricted content accessible via Preservica:

  • “Content that has a legal restriction”: Only higher levels of description can be searched by external researchers, although patrons have information concerning how to access this content.
  • “Content that requires registration and onsite viewing”: Individual files can be located by external researchers, although researchers still need to visit the WHS to view materials. Again, information concerning how to access this content is provided.

Wang concluded her presentation by describing efforts to link materials in Preservica with other descriptive resources, such as WHS’s MARC records; expressing hope that WHS will integrate Preservica with their new ArchivesSpace instance; and discussing the usability testing that resulted in several upgrades to the WHS Electronic Records Portal prior to its release.

The penultimate speaker was Stefana Breitwieser, who spoke about SCOPE and its features. Breitwieser first discussed the “Archaeology of the Digital” project and how—through this project—the CCA acquired the bulk of its digital content, more than “660,000 files (3.5 TB).” In order to better enhance access to these resources, Breitwieser stressed that two problems had to be addressed:

  • “[A] long access workflow [that involved twelve steps].”
  • “Low discoverability.” Breitwieser stressed that issues with their existing access tool included its inability to search across collections and its non-use of the metadata generated in Archivematica.

CCA staff ultimately decided on working alongside Artefactual Systems to build SCOPE, “an access interface for DIPs from Archivematica.” The goals of this project included:

  • “Direct user access to access copies of digital archives from [the] reading room.”
  • “Minimal reference intervention [by CCA staff].”
  • “Maximum discoverability using [granular] Archivematica-generated metadata.”
  • “Item-level searching with filtering and facetting.” 

To illustrate SCOPE’s capabilities, Breitwieser demonstrated the tool and its features (e.g. its ability to download DIPs) for the audience. During the presentation, she emphasized that although incredibly useful, SCOPE will ultimately supplement—rather than replace—the CCA’s finding aids. 

Breitwieser concluded by describing the CCA’s reading room—which includes computers that possess a variety of useful software (e.g. computer-aided design, or CAD, software) and, like NCSU’s workstation, only limited technical capabilities—and highlighting CCA’s much simpler 5-step access workflow.

The final speaker, Kelly Stewart, spoke about SCOPE’s development process. Heavily emphasized during this presentation was Artefactual’s use of CCA user stories to develop “feature files”—or “logic-based, structured descriptions” of these user stories—that Artefactual staff used to build SCOPE. Stewart noted that after the build was complete, “user acceptance testing” occurred repeatedly until SCOPE was deemed ready. Stewart concluded her presentation with the hope that other archivists will implement and improve upon SCOPE.


Steven Gentry currently serves as a Project Archivist at the Bentley Historical Library. His responsibilities include assisting with accessioning efforts, processing complex collections, and building various finding aids. He previously worked at St. Mary’s College of Maryland, Tufts University, and the University of Baltimore.

Meet our newest ERS steering committee members!

All this week we’ll be featuring introductions to our newest ERS steering committee members! Today, meet Elizabeth Carron, one of our new steering committee members.

Tell us a little bit about yourself.

“I graduated from the University of Massachusetts Amherst with a background in Early Modern Literature and French Studies and finished my master’s at Simmons College in 2014. I didn’t take the archives track – rather, I was more focused on subject librarianship and digital scholarship. I made amazing connections in the Five College area as a student and as a librarian; in 2014, shortly after graduating, I was offered a project at the Smith College Archives – and I’ve been in archives and archives management ever since! After my project at Smith ended, I moved to Ann Arbor to be a project archivist in collections development at the Bentley Historical Library. Eventually, the position of Archivist for Records Management was created and I transitioned into that role. It’s been my responsibility to develop the program by establishing and communicating best RIM practices to the University community and to push forward acquisition procedures that will support description, arrangement, and access further down the road.” 

What made you decide you wanted to become an archivist?

“Honestly, no one thing. I studied Early Modern language and literature and got involved with several digital humanities projects, which in turn led to a deeper exposure to libraries and archives. From there, I explored graduate programs while working for a cultural heritage org on the admin side and just felt a click with archival programs. Being an archivist means I get to learn about a variety of topics, to meet new people and communities. It also means I get a hand in history-making. Whether I’m collecting or advocating for resources and partnerships, preservation is a profound responsibility.”

What is one thing you’d like to see the Electronic Records Section accomplish during your time as vice-chair?

“I do a lot of acquiring of electronic/digital records and not so much processing; I’d like to explore this process of acquisition and perhaps work on perspectives to assist with understanding e-records/e-record concerns in this process.”

What three people, alive or dead, would you invite to dinner?

“George Sand and Dolly Parton to keep things lively; and my grampa, who was an amazing cook with a never-ending cache of dad jokes.”