Archiving Email: Electronic Records and Records Management Sections Joint Meeting Recap

By Alice Sara Prael

This is the second post in the bloggERS series on Archiving Digital Communication.

Email has become a major challenge for archivists working to preserve and provide access to correspondence. There are technical challenges that differ between platforms, as well as intellectual challenges in describing and appraising massive, disorganized inboxes.

At this year’s Annual Meeting of the Society of American Archivists, the Electronic Records Section and the Records Management Section joined forces to present a panel on the breakthroughs and challenges of managing email in the archives.

Sarah Demb, Senior Records Manager, kicked off the panel by discussing the Harvard University Archives’ approach to collecting email from internal and donated records. Since their records retention schedule is not format-specific, it doesn’t separate email from other types of correspondence. The electronic format affects the required metadata and the acquisition tools and methods, but not the appraisal decisions, which are driven entirely by content. When a collection is acquired, administrative records are often mixed with faculty archives, which poses a major challenge for appraisal of correspondence. This is true for both paper and email correspondence, but a digital environment lends itself to mixing administrative and faculty records much more easily. Another major challenge in acquiring these internal records is that the emails are often attached to business systems in the form of notifications and reporting features. These system-specific emails overlap significantly and cause duplication when the same system reports exist in one or many inboxes.

Since internal records at Harvard University Archives are closed for 50 years, and personal information is closed for 80 years, Demb is less concerned with an accidental disclosure of private information to a researcher and more concerned with making the right appraisal decisions during acquisition. Email is acquired by the archive at the end of a faculty member’s career rather than through regular, smaller acquisitions, which often leaves the archivist with one large, unwieldy inbox. Although donors are encouraged to weed their own inboxes prior to acquisition, this rarely happens. The main strategy that Demb supports is to encourage best practices through training and to offer guidance whenever possible.

The next presenter was Chris Prom, Assistant University Archivist at the University of Illinois at Urbana-Champaign. He discussed the work of the Andrew W. Mellon Foundation and Digital Preservation Coalition Task Force on Technical Approaches to Email Archives. This task force includes 12 members representing the U.K. and U.S., as well as numerous “Friends of the Task Force” who provide additional support. The task force recently published a draft report, which is available online for comment through August 31st. Don’t worry if you won’t have time to comment in the next two days, because the report will go out for a second round of comments in September. The task force is taking cues from other fields that do similar work with email, such as the legal and forensic fields, which use email as evidence. Having corporate representation from Google and Microsoft has been valuable because they are already acting upon suggestions from the task force to make email in their systems easier to preserve.

One major aspect of the task force’s work is addressing interoperability. Getting data out of one platform and into a form usable by different tools has been an ongoing challenge for archivists managing email. There are many useful tools available, but chaining them together into a holistic workflow is problematic. Prom suggested that one potential solution to the ‘one big inbox’ problem is to capture email via API at regular intervals rather than waiting for an entire career’s worth of email to accumulate.
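
As a rough illustration of interval-based capture, the sketch below appends everything received since the last capture date to a local MBOX file. The panel did not name a specific API; IMAP is assumed here purely for illustration, and the function names `since_criterion` and `harvest_since`, along with all of their parameters, are hypothetical.

```python
import imaplib
import mailbox
from datetime import date

# IMAP dates use English month abbreviations regardless of system locale.
_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def since_criterion(last_capture: date) -> str:
    """Build an IMAP SEARCH criterion for messages dated on or after last_capture."""
    month = _MONTHS[last_capture.month - 1]
    return f"SINCE {last_capture.day:02d}-{month}-{last_capture.year}"


def harvest_since(host, user, password, last_capture, mbox_path, folder="INBOX"):
    """Append messages received since last_capture to a local MBOX file.

    Assumes the mail server offers IMAP over SSL (an assumption for this sketch).
    """
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        conn.select(folder, readonly=True)  # read-only: never alter the source account
        _, data = conn.search(None, since_criterion(last_capture))
        box = mailbox.mbox(mbox_path)
        try:
            for num in data[0].split():
                _, parts = conn.fetch(num, "(RFC822)")
                box.add(parts[0][1])  # raw bytes of the full message
            box.flush()
        finally:
            box.close()
    finally:
        conn.logout()
```

Running something like this on a schedule, and recording the date of each run, would yield a series of small accessions instead of one career-sized inbox. Deduplicating messages that straddle two runs is left out of the sketch.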

Camille Tyndall Watson, Digital Services Section Manager at the State Archives of North Carolina, completed the panel by discussing the Transforming Online Mail with Embedded Semantics (TOMES) project. This grant-funded project focuses on appraisal by implementing the capstone approach, which identifies certain email accounts as having enduring value rather than appraising individual emails. The project includes partners from Kansas, Utah, and North Carolina, but the hope is that this model could be duplicated in other states.

The first challenge was to choose the public officials whose accounts are considered part of the ‘capstone’ based on their position in the organizational chart. The project also crosswalked job descriptions to functional retention schedules. By working with the IT department, the team members are automating as much of the workflow as possible. This included assigning position numbers to ‘archival email accounts’ in order to track positions rather than individuals, since tracking individuals is difficult in an organization with significant turnover, such as a government department. This nearly constant turnover requires ongoing outreach to answer questions like “what is a record” and “why does the archive need your email?” The project is also researching natural language processing to automate and simplify the arrangement and description of email collections.

The main takeaway from this panel is that email matters. There are many challenges, but the work is necessary because email, much like paper correspondence, has cultural and historical value beyond the transactional value it serves in our everyday lives.


Alice Sara Prael is the Digital Accessioning Archivist for Yale Special Collections at Beinecke Rare Book & Manuscript Library. She works with born-digital archival material through a centralized accessioning service.

Adventures in Email Wrangling: TAMU-CC’s ePADD Story

By Alston Cobourn

This is the first post in the bloggERS series on Archiving Digital Communication.

Getting Started

Soon after I arrived at Texas A&M University-Corpus Christi in January 2017 as the university’s first Processing and Digital Assets Archivist, two high-level longtime employees retired or switched positions. Therefore, I fast-tracked an effort to begin collecting selected email records because these employees undoubtedly had some correspondence of long-term significance, which was also governed by the Texas A&M System’s records retention schedules.

I began by testing ePADD, software used to conduct various archival processes on email, on small date ranges of my work email account.  I ultimately decided to begin using it on selected campus email because I found it relatively easy to use, it includes some helpful appraisal tools, and it provides an interface for patrons to view and select records of which they want a copy. Since the emails themselves live as MBOX files in folders outside of the software, and are viewable with a text editor, I felt comfortable that using ePADD meant not risking the loss of important records. I installed ePADD on my laptop with the thought that traveling to the employees would make the process of transferring their email easier and encourage cooperation.
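
Because MBOX is a plain-text format, the harvested files can also be inspected outside ePADD. Here is a minimal sketch using Python’s standard `mailbox` module. It is not part of ePADD, and the function name `summarize_mbox` is hypothetical.

```python
import mailbox


def summarize_mbox(path):
    """Return (sender, subject) pairs for every message in an MBOX file."""
    # create=False raises an error for a missing file instead of creating an empty one
    box = mailbox.mbox(path, create=False)
    try:
        return [(str(msg.get("From", "")), str(msg.get("Subject", "")))
                for msg in box]
    finally:
        box.close()
```

A quick listing like this is one way to confirm that a harvest captured the expected folders and date range before any processing begins.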

Transferring the email

In June 2017, I used ePADD Version 3.1 to collect the email of the two employees.  My department head shared general information and arranged an appointment with the employees’ current administrative assistant or interim replacement as applicable. She also made a request to campus IT that they keep the account of the retired employee open.  IT granted the interim replacement access to the account.

I then traveled to the employees’ offices where they entered the appropriate credentials for the university email account into ePADD, identified which folders were most likely to contain records of long-term historical value, and verified the date range I needed to capture.  Then we waited.

In one instance, I had to leave my laptop running in the person’s office overnight because I needed to maintain a consistent internet connection during ePADD’s approximately eight hours of harvesting and the office was off-campus.  I had not remembered to bring a power cord, but thankfully my laptop was fully charged.

Successes

Our main success—we were actually able to collect some records!  Obvious, yes, but I state it because it was the first time TAMU-CC has ever collected this record format and the email of the departed employee was almost deactivated before we sent our preservation request to IT. Second, my department head and I have started conversations with important players on campus about the ethical and legal reasons why the archives needs to review email before disposal.

Challenges

In both cases, the employee had deleted a significant number of emails before we were able to capture their account and had used their work account for personal email.  These behaviors confirmed what we all already knew–employees are largely unaware that their email is an official record. Therefore, we plan to increase efforts to educate faculty and staff about this fact, their responsibilities, and best practices for organizing their email.  The external conversations we have had so far are an important start.

ePADD enabled me to combat the personal email complication by systematically deleting all emails from specific individual senders in batch. I took this approach for family members, listservs, and notifications from various personal accounts.
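
The same batch-by-sender logic can be illustrated outside ePADD. The sketch below is not ePADD’s implementation. It assumes an MBOX file and a hypothetical set of sender addresses to exclude, and it writes the kept messages to a new file instead of altering the original.

```python
import mailbox
from email.utils import parseaddr


def copy_without_senders(src_path, dest_path, excluded):
    """Copy messages from src_path to dest_path, skipping excluded senders.

    Returns the number of messages skipped.
    """
    excluded = {addr.lower() for addr in excluded}
    src = mailbox.mbox(src_path, create=False)
    dest = mailbox.mbox(dest_path)
    skipped = 0
    try:
        for msg in src:
            # parseaddr splits "Name <addr>" into (name, addr); compare on addr only
            sender = parseaddr(str(msg.get("From", "")))[1].lower()
            if sender in excluded:
                skipped += 1
            else:
                dest.add(msg)
        dest.flush()
    finally:
        dest.close()
        src.close()
    return skipped
```

Writing to a separate file keeps the original harvest intact, so a mistaken exclusion can be undone by rerunning with a corrected list.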

The feature that recognizes sensitive information worked well in identifying messages that contained social security numbers. However, it did not flag messages that contained phone numbers, which we also consider sensitive personal information. Additionally, in-message redaction is not possible in 3.1.

For messages I have marked as restricted, I have also added an annotation that specifies the reason for the restriction. This will enable me to manage those emails at a more granular level. This approach was a modification of a suggestion by fellow archivists at Duke University.

Conclusion

Currently, the email is living on a networked drive while we establish an Amazon S3 account and an Archivematica instance. We plan to provide access to email in our reading room via the ePADD delivery module and publicize this access via finding aids. Overall, ePADD is a positive step forward for TAMU-CC.

Note from the Author:

Since writing this post, I have learned that it is possible in ePADD to use regular expressions to further aid in identifying potentially sensitive materials.  By default the program uses regular expressions to find social security numbers, but it can be configured to find other personal information such as credit card numbers and phone numbers.  Further guidance is provided in the Reviewing Regular Expressions section of the ePADD User Guide.
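
As an illustration of the kind of patterns involved, the sketch below flags text containing strings shaped like U.S. social security numbers or phone numbers. These expressions are simplified assumptions written for this example, not the patterns ePADD ships with, and they will produce both false positives and misses.

```python
import re

# Simplified, illustrative patterns (assumptions, not ePADD's own expressions).
PATTERNS = {
    # Three digits, two digits, four digits, separated by hyphens
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # (361) 555-0100, 361-555-0100, 361.555.0100, or 361 555 0100
    "phone": re.compile(
        r"(?<!\d)(?:\(\d{3}\)\s*|\d{3}[-.\s])\d{3}[-.\s]\d{4}(?!\d)"
    ),
}


def flag_sensitive(text):
    """Return the sorted names of every pattern that matches the text."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(text))
```

Any pattern of this kind would need testing against real messages before use, since formats for phone numbers in particular vary widely.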

 

Alston Cobourn is the Processing and Digital Assets Archivist at Texas A&M University-Corpus Christi, where she leads the library’s digital preservation efforts. Previously she was the Digital Scholarship Librarian at Washington and Lee University. She holds a BA and MLS with an Archives and Records Management concentration from UNC-Chapel Hill.