Archives and special collections have a long history of experimenting with and embracing digital tools, so it is not surprising that they have been natural partners for digital scholarship librarians. In this blog post, we want to share a couple of experiences we’ve had where digital scholarship and the archives came together.
Laurie Allen, Director for Digital Scholarship, University of Pennsylvania:
The Cope Evans project was an early collaboration between the Digital Scholarship group, Special Collections, and a group of students at the Haverford College Libraries. Over the years, a series of gifts had made it possible for Haverford to digitize and richly describe the Cope Evans Family Papers, which include correspondence and other documents from a connected group of Philadelphia Quaker families. While the ContentDM system used by the library allowed for searching through the digitized items, it did not take full advantage of the available metadata, including geospatial metadata. In the summer of 2014, the library employed a group of students to make use of the metadata records and associated images as a dataset. Over the following two summers, two groups of Haverford undergraduates explored the exported data from the Cope Collections to create maps, network analyses, and other visualizations and analyses of the collection. Of course, their exploration of the data led them directly back to the original materials and the resulting work represented a broader and deeper connection to the materials.
This experimentation with using our collections as data for student work led the Haverford Libraries to continue approaching the data and metadata of Quaker collections in data-rich ways. The Quakers and Mental Health site and the Beyond Penn’s Treaty site, both created since then, carry this work forward at Haverford.
Stewart Varner, Managing Director of Price Lab for Digital Humanities, University of Pennsylvania:
When I was the Digital Scholarship Librarian at the University of North Carolina, I worked on a project called DocSouth Data, which was designed to facilitate innovative research methods on Documenting the American South, one of the library’s most popular online collections. Documenting the American South is composed of eighteen thematic collections of digitized material. DocSouth Data takes four of the most text-heavy collections, including the heavily used North American Slave Narratives, and makes them available as .txt files as well as .xml files. With these files, scholars can start looking for patterns using simple tools like Voyant and easily experiment with text analysis methods like topic modeling and sentiment analysis.
DocSouth Data was an exciting partnership between myself, the Library and Information Technology team, and archivists in UNC’s Special Collections. The original idea came from Nick Graham who, at the time, was the Program Coordinator for the North Carolina Digital Heritage Center (and is currently the University Archivist at UNC). I worked closely with Library and Information Technology who created the plain text files, organized them into a clear folder structure and made them available as .zip files on the library’s website. Once DocSouth Data was live, I hosted workshops at UNC and elsewhere that gave faculty, students, and librarians the chance to explore new ways to study the collections.
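To give a sense of the kind of first-pass exploration that plain text files make possible, here is a minimal Python sketch that counts word frequencies across a folder of .txt files. It is only an illustration, not part of DocSouth Data itself: the function names, the folder path, and the very crude tokenizing rule are all assumptions made for this example.

```python
import re
from collections import Counter
from pathlib import Path


def word_counts(folder):
    """Count lowercase word tokens across every .txt file under `folder`."""
    counts = Counter()
    for path in sorted(Path(folder).rglob("*.txt")):
        # errors="replace" keeps the loop going if a file is not valid UTF-8.
        text = path.read_text(encoding="utf-8", errors="replace")
        # Crude tokenizer (an assumption): runs of ASCII letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def top_words(folder, n=10, stopwords=frozenset()):
    """Return the n most frequent words that are not in `stopwords`."""
    counts = word_counts(folder)
    kept = Counter({w: c for w, c in counts.items() if w not in stopwords})
    return kept.most_common(n)
```

Calling something like `top_words("unzipped_collection", n=20, stopwords={"the", "and", "of"})` on an unzipped collection folder (a hypothetical path) would return the twenty most common remaining words, which is roughly the starting point a tool like Voyant offers before moving on to heavier methods.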
Since these two projects started, both Laurie and Stewart have joined the project team for the IMLS-funded Collections as Data project. The Haverford Libraries contributed two facets to that project.
By Shira Peltzman, Charlie Macquarie, Annalise Berdini, and Kate Tasker
Recently, four digital archivists from across the University of California (UC) system, Shira Peltzman (UCLA), Annalise Berdini (UC San Diego), Kate Tasker (UC Berkeley), and Charlie Macquarie (UC San Francisco), collaborated to develop and release a community-driven UC-wide descriptive standard for born-digital archival material. Born of discussions that began in the UC Common Knowledge Group (CKG) for born-digital content, the result is a set of guidelines for creating and updating finding aids to include born-digital archival material. The guidelines came about because the authors recognized a gap in existing guidelines and standards (e.g., DACS and ISAD(G)), and saw an opportunity to come together to standardize what were sometimes disparate descriptive practices in this developing area. Should, for example, the extent metadata element refer to storage capacity, number of files, or number of media objects, and should it count processed or unprocessed material? We were all using this element slightly differently, and having a difficult time finding existing comprehensive guidance.
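The ambiguity around extent is easy to see once you try to compute it. The Python sketch below is purely illustrative and is not taken from the guidelines: it walks a directory of processed files and reports two of the candidate measures, number of files and total bytes, along with a breakdown by file extension. The function names and the wording of the output statement are assumptions made for this example.

```python
from collections import Counter
from pathlib import Path


def summarize_extent(root):
    """Return file count, total bytes, and per-extension counts under `root`."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    # Files with no extension are grouped under the label "(none)".
    by_ext = Counter(p.suffix.lower() or "(none)" for p in files)
    return {"files": len(files), "bytes": total_bytes, "by_extension": dict(by_ext)}


def format_extent(summary):
    """Render a summary as a human-readable statement (decimal megabytes)."""
    megabytes = summary["bytes"] / 1_000_000
    return f"{summary['files']} digital files ({megabytes:.2f} MB)"
```

Even this small example forces choices that a shared standard has to settle: decimal or binary units, whether to count files before or after processing, and whether the storage capacity of the original media belongs in the statement at all.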
Methodology & Process
To create these guidelines, the first step for us was to separate out theory from practice. On the practical side, this meant looking at a range of finding aids from institutions around the world so that we could get a better sense of how much consensus there was in the digital archives community. Tori Maches, Scott Reed, and Patricia Ciccone, Digital Archives Program Scholars in UCLA’s Center for Primary Research and Training, assisted with this work by compiling a lengthy list of finding aids from around the world that described born-digital material. A key finding from this exercise was that practically every institution had its own unique approach. We concluded that born-digital description was being treated as a somewhat boutique procedure across the board, and that this was impacting the accessibility and usability of the material being described.
On the theoretical side, our next task was to look at all the existing descriptive standards and content models out there that touched on this subject. The major takeaway here was pretty much what we expected it would be, which is to say that these standards all had significant gaps when it came to born-digital.
After determining which fields we’d need to address and creating a basic document outline, we began scheduling weekly or bi-weekly conference calls to discuss the document and the work that each of us had completed in between calls. Starting in February 2017, we worked individually on assigned sections and used Google Docs to share questions, comments, and suggested edits between meetings. We completed the first draft in May 2017 and submitted the document for review to the UC Born-Digital Content CKG. Members had a month to submit feedback and suggest changes or additions. Following their review, we sent the document out to the UC system, asking our fellow CKG members to alert collection management or processing members at their institutions in particular. We allowed another month for this round of review, and after numerous edits and additions, including the addition of a controlled vocabulary and full sample finding aid, we had a document ready to present to the UC Heads of Special Collections for approval. Approval was obtained in October 2017, designating the guidelines as UC-official and ready to be implemented across all UC libraries.
Contents of UC Guidelines
The UC Guidelines for Born-Digital Archival Description present recommendations for describing born-digital content in an archival finding aid, using 12 standard elements such as Scope and Content, Processing Information, and Organization and Arrangement. The document offers guidance on determining an appropriate level and method of description of born-digital components, establishes a minimum standard requirement for finding aids in the UC system, and includes a metadata fields crosswalk, a sample finding aid, and links to additional resources. It also contains a comprehensive controlled vocabulary for born-digital source media and other born-digital terms, developed by Courtney Dean, Margaret Hughes, Kelly Kress, and Shira Peltzman at UCLA.
We’re excited to see that these guidelines are already helping to grow and sustain digital archives programs at each of our institutions. The task of analyzing each of the descriptive elements prompted critical thinking and discussion among multiple staff members, and investigating these questions has helped us clarify procedures and provide practical solutions. With the backing of the UC Common Knowledge Group and the Heads of Special Collections, the guidelines can also be used as an authoritative resource by individuals or units who need to advocate for new digital processing workflows.
We hope that the UC Guidelines for Born-Digital Archival Description will serve not only as a practical tool for UC archivists, but also as a useful illustration of UC-wide practices and as a set of instructions which can be easily adapted and adopted by our professional community.
Where you can access the Guidelines
The UC Guidelines for Born-Digital Archival Description can be found on GitHub. While the formal comment period has ended, we welcome feedback, suggestions, and questions. Please take a look and let us know what you think.
Shira Peltzman is the Digital Archivist for the University of California, Los Angeles (UCLA), Library, where she leads the development of a sustainable preservation program for born-digital archival material. Shira received her master’s degree in Moving Image Archiving and Preservation from New York University’s Tisch School of the Arts, and was a member of the inaugural cohort of the National Digital Stewardship Residency in New York (NDSR-NY).
Charlie Macquarie is the Digital Archivist at the University of California, San Francisco (UCSF), Library, Archives and Special Collections department, where he oversees the implementation of the digital-archives program. Additionally, he is a Librarian in Residence and Library Research Fellow at the Prelinger Library in San Francisco, where he is interested in creative communities and alternative [digital] library practices that might be built on the library platform.
Annalise Berdini is the Digital Archivist for the University of California, San Diego, where she is responsible for the development and implementation of Geisel Library’s born-digital archives program, and for the management and preservation of the library’s web archives collections.
Kate Tasker works with born-digital collections and information management systems to enable and enhance research access at The Bancroft Library, UC Berkeley. She holds an MLIS from San Jose State University and is a member of SAA, the Academy of Certified Archivists, and the Society of California Archivists.
As Princeton University Library’s Manuscripts Division processing team continues to move forward in terms of managing its born-digital materials, much of its focus as of late has been on providing access to this content (else, why preserve it?). So, the timing of the Born Digital Access bootcamp that was held in Philadelphia this past summer was very opportune. Among other takeaways, it was helpful and comforting to learn how other institutions are grappling with the issue of providing or restricting access in relation to what Princeton is currently doing.
The bootcamp, led by Alison Clemens from Yale and Greg Weideman from SUNY Albany, was well-organized and very informative, and I really appreciate how community-driven and participatory this initiative is, down to the community notes prepared by one of its organizers, Rachel Appel, who was in attendance. I also appreciated that the content provided a holistic and comprehensive approach to access, including reinforcement of the fact that the ability to provide access to born-digital materials starts at the point of record creation, and that once implemented, the effectiveness of the means by which institutions are providing access should be determined through frequent user testing.
One point in particular that Alison and Greg emphasized that stood out to me is how the discovery of born-digital content is often almost as difficult as the delivery of that content. This was exemplified during the user testing portion of the bootcamp where attendees had the opportunity to interact with several discovery platforms that describe and/or provide access to digital records. The testing demonstrated that the barriers that remain in terms of locating and accessing digital content are still fairly significant.
The issues surrounding discovery and delivery are something that archivists at Princeton are trying to manage and improve upon. For example, I’m part of two working groups that are tackling these issues from different angles: the Description and Access to Born Digital Archival Collections and the User Experience working groups. The latter has started to embark on both formal and informal user testing of our finding aids site. One aspect that we’re paying particular attention to is the ease with which users can locate and access digital content. I had the opportunity to contribute one of Princeton’s finding aids as a use case for the user testing portion of the workshop, and received helpful feedback, both positive and negative, from bootcamp attendees about the description and delivery methods found on our site. Although one can access the digital records from this collection, there are some impediments to actually viewing the files; namely, one would have to download a program like Thunderbird in order to view the mbox file of emails, a fact that’s not evident to the user.
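For readers curious about what is inside an mbox file, an email client is not the only way in. As a rough illustration (and not something Princeton currently offers users), the Python sketch below uses the standard library's `mailbox` module to list the date, sender, and subject of each message. The function name and file path are assumptions made for this example, and headers are returned as stored, so some may still appear in their encoded form.

```python
import mailbox


def list_messages(mbox_path):
    """Return (date, sender, subject) header tuples for each message in an mbox file."""
    # create=False raises an error instead of silently creating an empty mailbox
    # when the path is wrong.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [
            (msg.get("Date", ""), msg.get("From", ""), msg.get("Subject", ""))
            for msg in box
        ]
    finally:
        box.close()
```

Running `list_messages("correspondence.mbox")` (a hypothetical file name) would give a simple index of the messages, the kind of listing that could help an archivist describe the file or help a researcher decide whether it is worth opening in a full mail client.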
Technical Services archivists at Princeton are also collaborating with colleagues in Public Services and Systems to determine how we might best provide various methods of access to our born-digital records. Because much of the content in Manuscripts Division collections is (at the moment) restricted due to issues related to copyright, privacy, and donor concerns, we’re trying to determine how we can provide mediated access to content both on-site and off-site. I was somewhat relieved to learn that, like Princeton, many institutions represented at the bootcamp are still relying on non-networked “Frankenstein” computers in the reading room as the only other means of providing access aside from having content openly available online. Hopefully Princeton will be able to provide better forms of mediated access in the near future, as we intend to implement a pilot version of networked access in the reading room for various forms of digital content, including text, image, and AV files. The next step could be to implement a “virtual reading room” where users can access content via authentication. As these initiatives are realized, we’ll continue to conduct user testing to make sure that what we’re providing is actually useful to patrons. Princeton staff look forward to continuing to participate in the initiatives of the Born Digital Access group as a way to both learn from and share our experiences with this community.
Faith Charlton is Lead Processing Archivist for Manuscripts Division Collections at Princeton University Library. She is a certified archivist and holds an MLIS from Drexel University, an MA in History from Villanova University, and a BA in History from The College of New Jersey.