by Marian Clarke and Sally DeBauche
The 92nd Street Y is a community and cultural center located in New York City. Founded in 1874, 92NY is home to the Unterberg Poetry Center, a preeminent place for writers and poets to share their work with the public. Like many organizations, the 92nd Street Y adopted email in the late 1990s, and it has been the primary tool of communication for at least the last 15 years, all but replacing letters, faxes, and interoffice memos. A collaboration between the Poetry Center and the digital archives team at Stanford University Libraries’ Department of Special Collections, this project will apply the pioneering ePADD email curation and access software to the assessment and preservation of the email archive, with the aim of developing a model of processing and accessibility that other cultural centers might learn from and adopt. Our team began transferring email in September of 2022, and we expect to complete the project by December of 2023. This project has been made possible through the generous funding of the Email Archives: Building Capacity and Community regrant program, administered by the University of Illinois at Urbana-Champaign and funded by the Andrew W. Mellon Foundation. This post outlines some of the challenges we encountered in the early stages of the project as we transferred email files for processing with ePADD, and it details the tools, methods, and strategies that helped make this vital first step successful.
The Poetry Center’s email archive records the day-to-day activities of one of the world’s most prestigious literary organizations and contains nearly three million messages from the accounts of its directors dating back to the late 1990s. The nature of these activities (the ongoing curation of readings, conversations, performances, workshops, seminars, the “Discovery” Poetry Contest for emerging writers, and literacy outreach) has resulted in an email archive featuring correspondence with thousands of literary artists across their careers.
Our project team is made up of Marian Clarke, a Project Archivist for 92NY; Sally DeBauche, a Digital Archivist at Stanford Libraries; Bernard Schwartz, Director of 92NY’s Unterberg Poetry Center; and Glynn Edwards, Assistant Director for Special Collections at Stanford Libraries. Because we are located in New York City, the Bay Area, and Panama, we have coordinated the entire project via Zoom. Working as a geographically dispersed team created some unique challenges, chiefly in transferring the email files from 92NY to Stanford Libraries and finally on to Marian.
The Poetry Center’s email archive comprises 78 .pst files totaling 371 gigabytes of data. The largest email files from 92NY’s 21 years of email belonged to Bernard Schwartz, the longest-serving (and current!) director of the Poetry Center. The 92NY IT staff uploaded those files from the Poetry Center’s local server to a cloud storage service so that the Stanford team could download them and, using the Emailchemy email conversion tool, convert them to .mbox, ePADD’s target file format for ingest. The transfer of the email files from Stanford to Marian proved to be more challenging than anticipated and required some creative solutions. Initially, the Stanford team mailed an encrypted hard drive with the email files to Marian. When the hard drive arrived, the folders were empty; it seemed that something had gone wrong when the files were copied to the hard drive, and the problem wasn’t caught.
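One general way to catch an incomplete copy before a drive is shipped is to compare checksums of the source files against the copy. The sketch below is not something our team ran during the project; it is a minimal illustration, and the function names (`sha256_of`, `build_manifest`, `compare_manifests`) are hypothetical names chosen for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so multi-gigabyte files are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(source: dict[str, str], copy: dict[str, str]) -> dict[str, list[str]]:
    """Report files that are missing from the copy, altered, or unexpected."""
    shared = source.keys() & copy.keys()
    return {
        "missing": sorted(set(source) - set(copy)),
        "changed": sorted(name for name in shared if source[name] != copy[name]),
        "extra": sorted(set(copy) - set(source)),
    }
```

Running `build_manifest` on the source folder and again on the drive, then passing both results to `compare_manifests`, would list every file under "missing" if the folders on the drive were empty.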
The easiest alternative was to upload the files to Google Drive so that Marian could download them, but before taking this route we needed to make sure that we were protecting the data in the process, because email collections commonly contain sensitive and legally protected information. We tested several encryption tools, including the macOS Disk Utility, which worked until we switched from using Marian’s personal Mac to a PC for the project (a decision detailed in the next paragraph). We also tried VeraCrypt, an open source utility for encryption, but ran into complications when attempting to decrypt the files. We ultimately used 7-Zip, which is not marketed as an encryption tool but does give the user the option to encrypt an archive when creating it. This tool proved to be the least complicated to use, and compressing the files also made uploading them to and downloading them from Google Drive faster.
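For readers who prefer to script this step, the sketch below shows one way to create an encrypted 7-Zip archive from Python. It is an illustration rather than a record of how we created our archives, and it assumes the `7z` command-line program is installed and on the system path (the executable name and location vary by platform). The function names and example paths are hypothetical.

```python
import subprocess
from pathlib import Path


def build_7z_command(source: Path, archive: Path) -> list[str]:
    """Build a 7z command that creates an encrypted .7z archive.

    a       : add files to an archive
    -t7z    : use the 7z archive format
    -mhe=on : also encrypt the file names stored in the archive
    -p      : encrypt with a password; given without a value, 7z is
              expected to prompt for it, keeping it out of the command line
    """
    return ["7z", "a", "-t7z", "-mhe=on", "-p", str(archive), str(source)]


def encrypt_folder(source: Path, archive: Path) -> None:
    """Run 7z and raise an error if it exits with a failure code."""
    subprocess.run(build_7z_command(source, archive), check=True)
```

Calling `encrypt_folder(Path("mbox_files"), Path("transfer.7z"))` would produce a single password-protected file ready to upload.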
Files in hand, Marian turned to importing them into ePADD to begin processing them. Originally, we had planned for Marian to use her personal MacBook Air for the project; however, it quickly became clear that it was not up to the task. The MacBook, purchased in 2017, had 8 GB of RAM and 120 GB of storage. Ingesting these files into ePADD was extremely slow, sometimes taking 5-6 hours at a time and often ending with a series of error messages. The MacBook simply did not have enough RAM to process this amount of data, and its hard drive did not have the capacity to store the resulting ePADD collection files alongside the original files. The solution was to buy a new Dell laptop with 16 GB of RAM and 1 TB of storage, dedicated entirely to the project. With the new laptop, Marian successfully imported the email files into ePADD and began processing the collection.
Throughout this process, we have had some gentle reminders that all projects of this nature bring unanticipated technological issues and frustration. With patience, curiosity, research, and a willingness to experiment with new tools and methods, however, such projects are not only possible but also offer new models for preserving our collections and making them accessible to users.
Marian Clarke is the project archivist working on the email collections of the 92nd Street Y’s Unterberg Poetry Center. She was previously a digital archivist at the Frick Art Reference Library Archives and an audiovisual archivist at LaGuardia and Wagner Archives, CUNY. She holds an MA in media studies from the University of Texas and an MLIS from Pratt Institute.
Sally DeBauche is a Digital Archivist in the Department of Special Collections at Stanford University Libraries. She is responsible for creating policy and workflows related to born-digital archiving and for processing born-digital collections, with a particular focus on email. She also project managed the development of the ePADD software from 2020 to 2021 and consulted on the most recent cycle of development led by the University of Manchester and Harvard University. Sally received a BA in History from the University of Wisconsin-Madison and an MSIS from the University of Texas at Austin.