By Alston Cobourn
This is the first post in the bloggERS series on Archiving Digital Communication.
Soon after I arrived at Texas A&M University-Corpus Christi in January 2017 as the university’s first Processing and Digital Assets Archivist, two high-level longtime employees retired or switched positions. Therefore, I fast-tracked an effort to begin collecting selected email records because these employees undoubtedly had some correspondence of long-term significance, which was also governed by the Texas A&M System’s records retention schedules.
I began by testing ePADD, software used to conduct various archival processes on email, on small date ranges of my work email account. I ultimately decided to begin using it on selected campus email because I found it relatively easy to use, it includes some helpful appraisal tools, and it provides an interface for patrons to view records and select the ones they want copies of. Since the emails themselves live as MBOX files in folders outside of the software, and are viewable with a text editor, I felt confident that using ePADD would not put important records at risk. I installed ePADD on my laptop with the thought that traveling to the employees would make the process of transferring their email easier and encourage cooperation.
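Because the harvested messages are plain MBOX files, they can also be inspected independently of ePADD. The following is a minimal sketch using Python's standard-library `mailbox` module to count messages per sender address. It is not part of ePADD or of the workflow described in this post; the function name `count_senders` is hypothetical, and the path you pass in is assumed to point at a copy of an MBOX file.

```python
import mailbox
from collections import Counter
from email.utils import parseaddr


def count_senders(mbox_path):
    """Return a Counter of lowercased sender addresses in an MBOX file."""
    counts = Counter()
    # create=False raises an error instead of silently creating an empty file
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            # parseaddr splits 'Name <addr>' into (name, addr)
            _, address = parseaddr(str(message.get("From", "")))
            counts[address.lower()] += 1
    finally:
        box.close()
    return counts
```

Calling `count_senders` on a copy of a harvested file and looking at `most_common()` on the result gives a quick sense of who the most frequent correspondents are before any appraisal decisions are made.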
Transferring the email
In June 2017, I used ePADD Version 3.1 to collect the email of the two employees. My department head shared general information and arranged an appointment with the employees’ current administrative assistant or interim replacement as applicable. My department head also asked campus IT to keep the retired employee’s account open, and IT granted the interim replacement access to that account.
I then traveled to the employees’ offices where they entered the appropriate credentials for the university email account into ePADD, identified which folders were most likely to contain records of long-term historical value, and verified the date range I needed to capture. Then we waited.
In one instance, I had to leave my laptop running in the person’s office overnight because I needed to maintain a consistent internet connection during ePADD’s approximately eight hours of harvesting and the office was off-campus. I had not remembered to bring a power cord, but thankfully my laptop was fully charged.
Our main success is that we were actually able to collect some records. That may seem obvious, but I state it because this was the first time TAMU-CC had collected this record format, and the departed employee’s email account was almost deactivated before we sent our preservation request to IT. A second success is that my department head and I have started conversations with important players on campus about the ethical and legal reasons why the archives needs to review email before disposal.
In both cases, the employee had deleted a significant number of emails before we were able to capture their account and had used their work account for personal email. These behaviors confirmed what we all already knew: employees are largely unaware that their email is an official record. Therefore, we plan to increase efforts to educate faculty and staff about this fact, their responsibilities, and best practices for organizing their email. The external conversations we have had so far are an important start.
ePADD enabled me to combat the personal email complication by systematically deleting all emails from specific individual senders in batch. I took this approach for family members, listservs, and notifications from various personal accounts.
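The idea behind batch removal by sender can be illustrated outside of ePADD as well. The sketch below is not ePADD's implementation; it is a hedged example that uses Python's standard-library `mailbox` module to write a filtered copy of an MBOX file, leaving the original untouched. The function name `copy_without_senders` and its parameters are hypothetical.

```python
import mailbox
from email.utils import parseaddr


def copy_without_senders(source_path, dest_path, blocked_senders):
    """Copy an MBOX file, skipping messages from the given sender addresses.

    Returns a (kept, removed) tuple of message counts.
    """
    blocked = {sender.lower() for sender in blocked_senders}
    source = mailbox.mbox(source_path, create=False)
    # Assumes dest_path does not exist yet; an existing file would be appended to
    dest = mailbox.mbox(dest_path)
    kept = removed = 0
    try:
        for message in source:
            _, address = parseaddr(str(message.get("From", "")))
            if address.lower() in blocked:
                removed += 1  # skip personal, listserv, or notification mail
            else:
                dest.add(message)
                kept += 1
        dest.flush()  # write pending messages to disk
    finally:
        dest.close()
        source.close()
    return kept, removed
```

Writing to a new file rather than deleting in place keeps the harvested original available in case a sender was blocked by mistake.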
The feature that recognizes sensitive information worked well in identifying messages that contained Social Security numbers. However, it did not flag messages that contained phone numbers, which we also consider sensitive personal information. Additionally, in-message redaction is not possible in version 3.1.
For messages I have marked as restricted, I have chosen to add an annotation as well that specifies the reason for the restriction. This will enable me to manage those emails at a more granular level. This approach was a modification of a suggestion by fellow archivists at Duke University.
Currently, the email is stored on a networked drive while we establish an Amazon S3 account and an Archivematica instance. We plan to provide access to email in our reading room via the ePADD delivery module and publicize this access via finding aids. Overall, ePADD is a positive step forward for TAMU-CC.
Note from the Author:
Since writing this post, I have learned that it is possible in ePADD to use regular expressions to further aid in identifying potentially sensitive materials. By default the program uses regular expressions to find social security numbers, but it can be configured to find other personal information such as credit card numbers and phone numbers. Further guidance is provided in the Reviewing Regular Expressions section of the ePADD User Guide.
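To give a sense of what such patterns look like, here is a minimal sketch in Python. These expressions are illustrative assumptions, not ePADD's built-in defaults, and ePADD's own configuration syntax may differ; consult the User Guide for the actual format. The names `SSN_PATTERN`, `PHONE_PATTERN`, and `find_sensitive` are hypothetical, and the phone pattern only covers common ten-digit US formats.

```python
import re

# Social Security numbers written as 123-45-6789
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# US phone numbers such as (361) 555-0100, 361-555-0100, or 361.555.0100
PHONE_PATTERN = re.compile(
    r"(?<!\d)(?:\(\d{3}\)\s*|\d{3}[-.\s])\d{3}[-.\s]\d{4}(?!\d)"
)


def find_sensitive(text):
    """Return the SSN-like and phone-like strings found in a message body."""
    return {
        "ssn": SSN_PATTERN.findall(text),
        "phone": PHONE_PATTERN.findall(text),
    }
```

Patterns like these will produce false positives and miss unusual formats, so flagged messages still need human review.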
Alston Cobourn is the Processing and Digital Assets Archivist at Texas A&M University-Corpus Christi where she leads the library’s digital preservation efforts. Previously she was the Digital Scholarship Librarian at Washington and Lee University. She holds a BA and MLS with an Archives and Records Management concentration from UNC-Chapel Hill.