Politics, Transparency, and Email: Lessons Learned from Trying to Preserve the Historical Record

By Angela White

This post is the ninth in our Spring 2016 series on processing digital materials.

———

My first chance to process an email collection came when a small nonprofit organization[1] in the mid-Atlantic selected my institution as the home for its records. The organization was closing its doors after several decades of advocacy around government transparency. My contact, Fergus[2], made clear from the beginning that he wanted us to preserve the organization’s email as part of the project. I explained the features of ePADD, emphasizing the filtering mechanisms and the ability to isolate items that contained sensitive Personally Identifiable Information (PII). Based on Fergus’s enthusiasm, I naively assumed that the employees’ commitment to transparency extended to their own inboxes.

When Fergus announced our intentions to current and former employees, the protests began pouring in. There were several reasons for concern: many employees used their work email addresses for personal correspondence, the accounts contained information from a number of confidential mailing lists, and there were conversations with politically active people who had expectations of confidentiality. At this point, I also learned that most employees of the organization no longer had access to their accounts and were unable to clean up sensitive information.

I knew ePADD could make short work of the sensitive PII and mailing lists. However, the private conversations were a big part of the appeal—I couldn’t promise to filter those, but I did offer to restrict the accounts for a period of time and emphasized that access would be onsite only. Later I suggested that transfers could be opt-in, but the damage had already been done. The last straw came when federal government staff got wind of the plan and began voicing their concerns. We had to cancel the project in the face of overwhelming opposition and continued on with the rest of the collection.

There are a number of lessons to take away from this email debacle: do not assume that the organization’s representative is aware of the potential problems with email; make sure that all affected employees have the opportunity to pull out anything personal; and speak face-to-face with members of the organization whenever possible, preferably with a demonstration of ePADD. As a result of our experience, I’m developing a set of questions to guide initial conversations about email:

  1. Does your organization have any official policies related to use of its email accounts? Is email expected to be part of the public record? How are employees notified of this policy and when?
  2. What is the email culture at your organization? Do employees routinely use work email for personal reasons?
  3. What kind of work-related email exchanges take place on a daily, weekly, or monthly basis? Are any of these of a sensitive political nature? Will any of the work-related content need to be restricted? For how long?
  4. Are the accounts of former employees retained? For how long? How long do they retain access to the account after leaving the organization?

Taking email records from individuals who continue to work in the field requires a sensitive touch. I’ll be better prepared next time to deal with the very real difficulties of convincing people to pry open their inboxes. Despite the technical challenges of digital preservation, I’ve discovered that acquisition is sometimes the hardest part of the process.

[1] The organization has been anonymized to prevent further consternation for former employees.

[2] Name changed to protect the harried.

———

Angela White is the Philanthropic Studies Archivist at IUPUI in Indianapolis. She collects the records of nonprofit organizations and fundraisers to support the work of the Lilly Family School of Philanthropy. She is currently in conversations with a number of individuals about accessioning their email records.

Electronic Records and E-Recs Related Pop-Ups at SAA!

For anyone who would like to see some electronic records pop-up sessions at SAA this year, voting is currently open. Each member can vote for up to five sessions. The ERS Steering Committee has reviewed the list of proposed sessions, and we think that these pop-up sessions address issues of importance to ERS members (and anyone interested in electronic records). Although this is not a formal endorsement, we would like to spotlight these eight proposals:

  1. Archival Records in the Age of Big Data (Richard Marciano and Bill Underwood)
  2. Archives and Digital Inequality (Myles Crowley, Samantha Winn, and Katharina Hering)
  3. Audiovisual Digital Preservation and Access: The Archive of Public Broadcasting (Karen Mariani)
  4. Developing Descriptive Metadata Best Practices for Archived Websites (Jackie Dooley, Allison Jai O’Dell, and Penny Baker)
  5. Fancy Awesome EAD Exports from ArchivesSpace (Mark Custer and Melissa Wisner)
  6. Improving Finding Aid Visibility: What Are Y’all Doing? (Amelia W. Holmes and Eileen Heeran Dewitya)
  7. Practical Options for Incoming Digital Content (Jody DeRidder and Alissa Helms)
  8. The Bits in the Field: A Survey of Digital Forensics Work (Melanie Wisner)

Voting closes Monday, June 20th, so if you haven’t voted there is still time! Vote here.

 

Mecha-Archivists Revisited: An Interview with Trevor Owens and Emily Reynolds

BloggERS! recently reached out to Trevor Owens and Emily Reynolds from IMLS to discuss the role of archives’ digital processing activities in the context of the IMLS national digital platform funding priority.

This interview was conducted asynchronously in text.

———

BloggERS!: Tell us about the national digital platform funding priority. Why was it introduced, and what are its primary objectives and deliverables?

Trevor: The national digital platform is a framework for approaching all the digital tools, services, infrastructure, and human effort libraries and archives use to meet the needs of their users across the country. In this respect, the platform isn’t an individual thing. It isn’t a piece of software or a website. Instead, it’s the combination of software applications, social and technical infrastructure, and staff expertise that provide library and archive content and services to all users in the US. For more on the concept, see this article Maura Marx and I wrote for American Libraries magazine.

The National Digital Platform concept was developed and refined through two IMLS Focus convenings (reports from the 2014 and 2015 convenings are available online). Those convenings brought together a diverse array of experts and stakeholders from across library and archives contexts who urged IMLS (and other funders) to focus more on making investments in digital tools and services that could have catalytic national impacts. The results of those meetings have informed the development of a specific national digital platform portfolio in the National Leadership Grants for Libraries Program (NLG) and the Laura Bush 21st Century Librarian Program (LB21).

Trevor Owens, Senior Program Officer at IMLS, opening the “Defining and Funding the National Digital Platform” panel, April 2015

BloggERS!: What do you think have been the biggest impacts of the national digital platform so far, especially for archivists working with digital materials?

Trevor: It is still rather early to see the range of impacts and outcomes the first projects in this portfolio are having. The initial four projects funded from the first cycle of grants last year still haven’t been running for a year, the second cycle of grants from last year has only been going for about six months, and the first cycle of grants from this year has just been awarded. With that said, I would suggest that the national digital platform as a framework has already made a significant shift in terms of the projects we are funding. In comparison to many previous projects, these efforts have stronger and clearer plans and approaches to building communities of practice and coalitions to work together to tackle challenges. In that vein, I’m excited about the prospects of librarians and archivists increasingly seeing their work as having local and direct impacts in their institutions and also as part of national and international networks of teams building, refining, documenting and improving the field through their involvement in the tools that enable our work. If you take any of those individual projects, like ePADD or Hydra-in-a-Box, you already see a flurry of activity and engagement with a lot of different stakeholders as part of these projects.

Emily: In addition to the tools and services that we’ve funded through NLG, we’ve also seen several LB21 projects have an impact in terms of creating training opportunities for librarians and archivists working with digital materials. The LB21 program has a history of funding projects related to digital skills, even before we conceptualized it as part of the national digital platform. The National Digital Stewardship Residency (NDSR) program is one example; at this point 35 residents have participated in the program in DC, Boston, and New York, and we’ve funded several additional cohorts. Those projects have had real impacts on the careers of the residents themselves (full disclosure: I’m an alumna of the program), as well as the institutions they worked in.

BloggERS!: Where are the current gaps in our national digital capacity with regard to the processing of digital materials?

Emily: One of the interesting things about our jobs as Program Officers is that we aren’t necessarily the ones answering that question. We rely on applicants, peer reviewers, and professionals working in the field to let us know what the pain points are, and where additional capacity is needed. Conferences and blogs (like this one!) are incredibly useful for us to see what topics are of interest in the community, so that we can look at those needs in the context of IMLS’ broader strategic goals and grant programs.

To me, our capacity to manage born-digital and digitized audiovisual materials at scale seems like one of the most critical gaps. IMLS has funded a fair amount of work in this area, from oral history projects like OHMS and Oral History in the Digital Age, to computational approaches to providing access to audio from WGBH and Pop Up Archive, to tools like Avalon Media System. Even with all of this great work, it still remains challenging to adequately manage this complex content and provide meaningful access to it.

Trevor: Completely agree with Emily’s first point on this. The national digital platform is, in many ways, a challenge to the field and to applicants to make the case for what those gaps are and to establish and launch the coalitions that are going to fill them. With that noted, alongside Emily’s points about AV materials, I would also add that it seems like things are really starting to converge and coalesce around the potential for open source tools to support emulation and virtualization as modes of access and preservation. I’ll talk a bit more about some of the projects pointing in this direction in a bit.

BloggERS!: How will the national digital platform help to support the wide variety of different tools and technologies used to acquire, process, and preserve digital materials in archives, libraries, and museums?

Emily: Overall, I think there are a few key themes in our approach to the national digital platform. We’re looking for tools and services that can be implemented and used by many different institutions, across the spectrum in terms of size and resources. I think your use of the phrase “wide variety” in the question also points to another important consideration for us: interoperability. So many institutions are using bespoke approaches to the same problems, with slight variations in tools and methods. Creating linkages between tools and building communities of practice will lower barriers to entry and raise baseline capacity.

Trevor: I would also add that, conceptually, all of those tools and technologies that libraries, archives and museums are using with digital content are already part of the national digital platform. I’m not just trying to be cute or clever in saying that, either. When we step back and look at all of those tools and services that exist now and the skills and knowledge it takes to make them work, we start to see all those places where we need to inject resources to improve the platform. So along with Emily’s great points on interoperability and connecting tools and services, I would also again stress that a huge part of this is about skilling up the library and archives workforce to be able to use the range of tools, many of which only work at the command line, that we can piece together to do this work.

BloggERS!: Which current projects being funded through the national digital platform priority are you most excited about?

Trevor: I’m excited about all of them! Seriously, our review process is very competitive and anything that makes it through is really exciting work. With that said, there is a good bit of work that we aren’t able to fund that I would also be excited about. I am happy to share some examples of projects that are particularly relevant to archivists.

Several projects are already making important inroads in this area. I’ll mention a few and then I am sure Emily has some in her portfolio that she can share. For each of these projects anyone can read through their proposal narratives online. So I will be brief about them and link out to where you can find the proposals.

The Software Preservation Network (LG-73-15-0133-15) is holding a national forum alongside this year’s SAA conference to work toward establishing a network of archives working together to develop a strategy for using historical software to provide access to and to process digital archival material. In a related effort, through A Re-enactment Tool for Collections of Digital Artifacts (LG-70-16-0079-16), Rhizome, in partnership with Yale University and the University of Freiburg, is working to enhance a set of open source software tools connecting archives of digital artifacts and emulation frameworks. Together these projects are positioned to both help refine the toolset for this kind of work and to build and establish the community of practice and networks necessary to support archivists doing this work.

Through Systems Interoperability and Collaborative Development for Web Archiving (LG-71-15-0174-15), the Internet Archive, with the University of North Texas, Rutgers University, and Stanford University Library, is working to improve systems interoperability and to model enhanced access to, and research use of, web archives. This applied research project is well positioned to refine ways for institutions to interact with Archive-It. Given that over 400 institutions are using Archive-It, improvements to this system will be very useful to many institutions in the field.

The last project I’ll mention, Improving Access to Time-Based Media through Crowdsourcing and Machine Learning (LG-71-15-0208-15) is a neat example of how an applied research project can significantly impact the field. In this project, the archives at WGBH and the Pop-Up Archive are exploring approaches for metadata creation by leveraging scalable computation and engaging the public to improve access through crowdsourcing games for time-based media. This includes an interesting mixture of speech-to-text and audio analysis tools and open source web-based tools to improve transcripts by engaging the public in a crowdsourced, participatory cataloging project. So the process has potential to inform future work, as well as test the tools that are created as part of the project. Lastly, by working with a massive archive of public broadcasting AV content, the project partners are also going to create and distribute a public dataset of audiovisual metadata for use by other projects.

Emily: Like Trevor said, it’s really hard for us to pick only a few projects to highlight, since we’re so excited about all of the work we’re funding. The development of ePADD (LG-70-15-0242-15) is one great example of IMLS-funded work that will help support digital processing activities. I’m happy to see that a few different people have mentioned it already in this blog series, because it really is an exciting advance in archivists’ capacity to manage email archives.

We also have funded a few interesting national forum projects recently. National forum grants support a meeting or series of meetings, bringing together stakeholders and experts in a topic. Those relationships and networks can persist long past the end of the grant. I’m really looking forward to seeing the results of On the Record, All the Time (RE-43-16-0053-16), a national forum grant to UCLA. They’re addressing the management of digital audiovisual evidence used by law enforcement, and I think the project has the potential to bring about some really interesting partnerships and relationships with other sectors. Like I mentioned earlier, audiovisual content is a huge challenge, and this is an interesting subset of it.

Another exciting national forum grant was awarded to the Amistad Research Center for a project called Diversifying the Digital Historical Record (LG-73-16-0003-16). This project will include a series of meetings with participants including community archives practitioners, scholars, community members, and digital collections experts. It’s an incredibly important issue and the range of partners on the project is amazing.

BloggERS!: Trevor, in a 2014 blogpost, Mecha-Archivists: Envisioning the Role of Software in the Future of Archives, you highlighted the potential value of computational techniques, such as topic modeling and named entity recognition, to help “extend and amplify the seasoned judgement, ethics, wisdom, and expertise [of archivists]” to support making materials available to the user. What progress have you seen in this space since 2014, and how would you rate the development of and training around these tools as a priority within IMLS?

Trevor: I see a lot of the ideas I explored in that Radcliffe Workshop on Technology and Archival Processing as fitting very well with the idea of the National Digital Platform. The key concept in that talk is that we need to approach the work of cultural heritage institutions as complex systems which deploy enabling technologies that support, amplify and extend the abilities of archivists, librarians and curators to do their work. I realize that’s a mouthful. So I can talk through some examples.

All too often, I have seen folks approach some computational tool and say, “Oh, we could use this to automate classification or description” or a variety of other activities. This sets the bar way too high for the machines. It also is part of longstanding, problematic and flawed notions about expertise, efficiency and labor that devalue what it means to be a professional and an expert. The judgement of professionals and experts is really hard to beat, and it isn’t something we should be trying to beat. Instead of erasing or ignoring all of the accumulated wisdom and expertise of professional librarians, archivists and curators, we should be working to build from and amplify it.

In my mind, the solution is rather simple. Instead of trying to replace the work of experts, it is much better to think through how we can enable and extend that judgement through tools. The example I used in the Radcliffe talk involved topic modeling, but I think the same process can and should work for things like natural language processing tools, named entity extraction tools, or for that matter tools and services that automate deriving data about audio files or image files.
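To make the idea concrete, here is a minimal sketch, added as an illustration and not drawn from the Radcliffe talk or from any IMLS-funded tool. It assumes a deliberately naive stand-in for named entity extraction (runs of capitalized words matched by a regular expression), and the names `suggest_entities`, `apply_review`, and `Suggestion` are hypothetical. The point is the shape of the workflow: the tool only proposes candidates, and nothing is accepted until a person reviews it.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: a naive placeholder for a real entity extraction tool.
# It treats any run of two or more capitalized words as a candidate name.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")


@dataclass
class Suggestion:
    text: str
    count: int
    accepted: Optional[bool] = None  # stays None until a person reviews it


def suggest_entities(documents):
    """Return candidate names, most frequent first, for human review."""
    counts = {}
    for doc in documents:
        for match in CANDIDATE.findall(doc):
            counts[match] = counts.get(match, 0) + 1
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [Suggestion(text, n) for text, n in ordered]


def apply_review(suggestions, decisions):
    """Record the reviewer's decisions and return only the accepted names."""
    for s in suggestions:
        if s.text in decisions:
            s.accepted = decisions[s.text]
    return [s.text for s in suggestions if s.accepted]
```

In use, an archivist would look over the output of `suggest_entities`, pass a dictionary of accept or reject decisions to `apply_review`, and only the accepted names would flow into description. Candidates that have not been reviewed are never promoted automatically.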

I see all of this fitting into the national digital platform in a few clear ways. First off, the platform is defined not as a set of tools and services, but as the combined effects of those tools and services and the professionals that animate and operate them. In that vein, the platform is as much about empowering, training and supporting professionals to do the work as it is about giving them the tools to enable them to do the work.

BloggERS!: How can our readers learn more about the national digital platform?

Emily: There is a national digital platform page on the IMLS website, where you can see links to related blog posts, press releases, and publications. That page also links to information about the national digital platform convenings we held in 2014 and 2015. When we post Notices of Funding Opportunity for NLG and LB21, those documents will also have specific information about the funding categories and topics of interest for that specific program.

As part of ongoing efforts towards increased transparency, we’ve begun to publish several documents from successful grant applications online. Trevor and I recently did a series of blog posts highlighting recent awards; each of the projects mentioned in these posts has a link to view some of their proposal documents.

Of course, we also strongly encourage potential applicants and others interested in the national digital platform to contact us! We’re happy to talk through project ideas and provide any additional information about IMLS programs.

———

Emily Reynolds is a Program Officer in the Office of Library Services at the Institute of Museum and Library Services. She manages a portfolio of grants within IMLS’ national digital platform priority area, primarily focusing on projects related to the education and training of librarians and archivists. Prior to joining IMLS, Emily was a National Digital Stewardship Resident at the World Bank Group Archives. Emily has a master’s degree in information science from the University of Michigan School of Information, and was the recipient of a 2014 National Digital Stewardship Alliance Innovation Award.

Trevor Owens serves as the Senior Program Officer responsible for the development of the national digital platform portfolio for the Office of Library Services in the Institute of Museum and Library Services. He steers an overall strategy encompassing research, grant making, and policy agendas, as well as communications initiatives, in support of the development of national digital services and resources in libraries. From 2010 to 2015, Trevor served as a Digital Archivist with the National Digital Information Infrastructure and Preservation Program (NDIIPP) in the Office of Strategic Initiatives at the Library of Congress. Before that, he was the community manager for the Zotero project at the Center for History and New Media. Trevor has a doctorate in social science research methods and educational technology from the Graduate School of Education at George Mason University, a bachelor’s degree in the history of science from the University of Wisconsin, and a master’s degree in American history with an emphasis on digital history from George Mason University. He teaches graduate seminars on digital curation, digital preservation and digital history for the University of Maryland’s iSchool and American University’s history department. In 2014 the Society of American Archivists gave him the Archival Innovator Award, an award granted annually to recognize the archivist, repository, or organization that best exemplifies the “ability to think outside the professional norm.”