Archival Collections as Data for Digital Scholarship

By Laurie Allen and Stewart Varner

Archives and special collections have a long history of experimenting with and embracing digital tools, so it is not surprising that they have been natural partners for digital scholarship librarians. In this blog post, we want to share two experiences we’ve had where digital scholarship and the archives came together.

Laurie Allen, Director for Digital Scholarship, University of Pennsylvania:

The Cope Evans project was an early collaboration between the Digital Scholarship group, Special Collections, and a group of students at the Haverford College Libraries. Over the years, a series of gifts had made it possible for Haverford to digitize and richly describe the Cope Evans Family Papers, which include correspondence and other documents from a connected group of Philadelphia Quaker families. While the CONTENTdm system used by the library allowed users to search the digitized items, it did not take full advantage of the available metadata, including geospatial metadata. In the summer of 2014, the library employed a group of students to treat the metadata records and associated images as a dataset. Over the following two summers, two groups of Haverford undergraduates explored the data exported from the Cope Collections to create maps, network analyses, and other visualizations and analyses of the collection. Their exploration of the data led them directly back to the original materials, and the resulting work reflected a broader and deeper engagement with those materials.

This experiment in using our collections as data for student work led the Haverford Libraries to keep approaching the data and metadata of its Quaker collections in data-rich ways. The Quakers and Mental Health and Beyond Penn’s Treaty sites, both created since then, carry this work forward at Haverford.

Stewart Varner, Managing Director of Price Lab for Digital Humanities, University of Pennsylvania:

When I was the Digital Scholarship Librarian at the University of North Carolina, I worked on a project called DocSouth Data, which was designed to facilitate innovative research methods on Documenting the American South, one of the library’s most popular online collections. Documenting the American South comprises eighteen thematic collections of digitized material. DocSouth Data takes four of the most text-heavy collections, including the heavily used North American Slave Narratives, and makes them available as both plain-text (.txt) and XML (.xml) files. With these files, scholars can start looking for patterns using simple tools like Voyant and can easily experiment with text analysis methods such as topic modeling and sentiment analysis.
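
As a rough illustration of what plain-text files make possible, the Python sketch below counts the most frequent words across a set of texts, one of the simplest ways to start looking for patterns before trying methods like topic modeling. It assumes the .txt files have been unzipped into a local folder; the folder layout, the tiny stopword list, and the function names are hypothetical and are not part of DocSouth Data itself.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical, deliberately tiny stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "i", "in", "it", "my", "of", "the", "to", "was"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of letters (apostrophes split words)."""
    return re.findall(r"[a-z]+", text.lower())


def top_terms(texts: list[str], n: int = 10) -> list[tuple[str, int]]:
    """Count non-stopword tokens across all texts and return the n most common."""
    counts = Counter()
    for text in texts:
        counts.update(tok for tok in tokenize(text) if tok not in STOPWORDS)
    return counts.most_common(n)


def load_corpus(folder: str) -> list[str]:
    """Read every .txt file under `folder` (assumed: an unzipped download)."""
    # errors="replace" keeps the sketch running if a file is not valid UTF-8.
    return [
        path.read_text(encoding="utf-8", errors="replace")
        for path in sorted(Path(folder).rglob("*.txt"))
    ]


if __name__ == "__main__":
    # Toy strings for illustration only, not text from the collections.
    sample = ["The road was long.", "The road home was longer."]
    print(top_terms(sample, 3))
```

Calling `top_terms(load_corpus("docsouth_data"), 20)` would list the twenty most common non-stopword terms in an unzipped folder; "docsouth_data" is a placeholder name. Raw counts like these are only a starting point, and they tend to send researchers back to the texts themselves to see how a frequent word is actually used.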

DocSouth Data was an exciting partnership between me, the Library and Information Technology team, and archivists in UNC’s Special Collections. The original idea came from Nick Graham, who at the time was the Program Coordinator for the North Carolina Digital Heritage Center (and is currently the University Archivist at UNC). I worked closely with Library and Information Technology, who created the plain-text files, organized them into a clear folder structure, and made them available as .zip files on the library’s website. Once DocSouth Data was live, I hosted workshops at UNC and elsewhere that gave faculty, students, and librarians the chance to explore new ways to study the collections.

Since these two projects began, both Laurie and Stewart have joined the project team for the IMLS-funded Collections as Data project. The Haverford Libraries contributed two facets to that project.

Philly Born Digital Access Bootcamp

by Faith Charlton

As Princeton University Library’s Manuscripts Division processing team continues to make progress in managing its born-digital materials, much of its recent focus has been on providing access to this content (otherwise, why preserve it?). The Born Digital Access Bootcamp held in Philadelphia this past summer was therefore very well timed. Among other takeaways, it was helpful and reassuring to learn how other institutions are grappling with whether to provide or restrict access, and how their approaches compare with what Princeton is currently doing.

The bootcamp, led by Alison Clemens from Yale and Greg Wiedeman from SUNY Albany, was well organized and very informative. I really appreciate how community-driven and participatory this initiative is, down to the community notes prepared by one of its organizers, Rachel Appel, who was in attendance. I also appreciated that the content took a holistic and comprehensive approach to access. It reinforced that the ability to provide access to born-digital materials starts at the point of record creation, and that once access methods are implemented, institutions should gauge their effectiveness through frequent user testing.

One point that Alison and Greg emphasized stood out to me in particular: discovering born-digital content is often almost as difficult as delivering it. This was demonstrated during the user testing portion of the bootcamp, where attendees had the opportunity to interact with several discovery platforms that describe and/or provide access to digital records. The testing showed that the remaining barriers to locating and accessing digital content are still fairly significant.

The issues surrounding discovery and delivery are ones that archivists at Princeton are working to manage and improve. For example, I’m part of two working groups that are tackling these issues from different angles: the Description and Access to Born Digital Archival Collections working group and the User Experience working group. The latter has begun both formal and informal user testing of our finding aids site, paying particular attention to how easily users can locate and access digital content.

I had the opportunity to contribute one of Princeton’s finding aids as a use case for the user testing portion of the bootcamp, and I received helpful feedback, both positive and negative, from attendees about the description and delivery methods on our site. Although users can access the digital records from this collection, there is an obstacle to actually viewing the files: to open the mbox file of emails, a user would have to download a program like Thunderbird, a fact that is not evident from the site.
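
The mbox format is plain text: messages are stored one after another in a single file, each beginning with a "From " line, so an email client is not the only way to inspect one. As a minimal sketch, and not a description of Princeton’s setup, Python’s standard-library `mailbox` module can list the date, sender, and subject of each message; the function names and file paths here are hypothetical.

```python
import mailbox
from email.header import decode_header, make_header


def decode(value) -> str:
    """Turn a raw header value (possibly MIME-encoded) into readable text."""
    if value is None:  # header missing from this message
        return ""
    return str(make_header(decode_header(value)))


def summarize_mbox(path: str) -> list[tuple[str, str, str]]:
    """Return (date, sender, subject) for every message in an mbox file."""
    # create=False raises NoSuchMailboxError for a missing file instead of creating one.
    box = mailbox.mbox(path, create=False)
    try:
        return [
            (decode(msg["Date"]), decode(msg["From"]), decode(msg["Subject"]))
            for msg in box
        ]
    finally:
        box.close()


if __name__ == "__main__":
    import sys

    for date, sender, subject in summarize_mbox(sys.argv[1]):
        print(f"{date} | {sender} | {subject}")
```

Saved as, say, `list_mbox.py`, it could be run as `python list_mbox.py messages.mbox` (both names are placeholders). A listing like this only shows that the file is readable; restrictions and mediated access still determine who may see the contents.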

Technical Services archivists at Princeton are also collaborating with colleagues in Public Services and Systems to determine how best to provide various methods of access to our born-digital records. Because much of the content in Manuscripts Division collections is currently restricted due to copyright, privacy, and donor concerns, we’re trying to determine how we can provide mediated access to content both on and off site. I was somewhat relieved to learn that, like Princeton, many institutions represented at the bootcamp still rely on non-networked “Frankenstein” computers in the reading room as their only means of providing access, aside from making content openly available online. We hope to offer better forms of mediated access in the near future: Princeton intends to implement a pilot version of networked access in the reading room for various forms of digital content, including text, image, and AV files. The next step could be a “virtual reading room” where users can access content after authenticating. As these initiatives are realized, we’ll continue to conduct user testing to make sure that what we provide is actually useful to patrons. Princeton staff look forward to continuing to participate in the initiatives of the Born Digital Access group as a way to both learn from and share our experiences with this community.

Faith Charlton is Lead Processing Archivist for Manuscripts Division Collections at Princeton University Library. She is a certified archivist and holds an MLIS from Drexel University, an MA in History from Villanova University, and a BA in History from The College of New Jersey.