Meet our newest ERS steering committee members!

All this week we’ll be featuring introductions to our newest ERS steering committee members! Today, meet Elizabeth Carron, one of our new steering committee members.

Tell us a little bit about yourself.

“I graduated from the University of Massachusetts Amherst with a background in Early Modern Literature and French Studies and finished my master’s at Simmons College in 2014. I didn’t take the archives track – rather, I was more focused on subject librarianship and digital scholarship. I made amazing connections in the Five College area as a student and as a librarian; in 2014, shortly after graduating, I was offered a project at the Smith College Archives – and I’ve been in archives and archives management ever since! After my project at Smith ended, I moved to Ann Arbor to be a project archivist in collections development at the Bentley Historical Library. Eventually, the position of Archivist for Records Management was created and I transitioned into that role. It’s been my responsibility to develop the program by establishing and communicating best RIM practices to the University community and to push forward acquisition procedures that will support description, arrangement, and access further down the road.” 

What made you decide you wanted to become an archivist?

“Honestly, no one thing. I studied Early Modern language and literature and got involved with several digital humanities projects, which in turn led to a deeper exposure to libraries and archives. From there, I explored graduate programs while working for a cultural heritage org on the admin side and just felt a click with archival programs. Being an archivist means I get to learn about a variety of topics, to meet new people and communities. It also means I get a hand in history-making. Whether I’m collecting or advocating for resources and partnerships, preservation is a profound responsibility.”

What is one thing you’d like to see the Electronic Records Section accomplish during your time as vice-chair?

“I do a lot of acquiring of electronic/digital records and not so much processing; I’d like to explore this process of acquisition and perhaps work on perspectives to assist with understanding e-records/e-record concerns in this process.”

What three people, alive or dead, would you invite to dinner?

“George Sand and Dolly Parton to keep things lively; and my grampa, who was an amazing cook with a never-ending cache of dad jokes.”

Meet our newest ERS steering committee members!

All this week we’ll be featuring introductions to our newest ERS steering committee members! Today, meet Andrea Belair, one of our new steering committee members.

“My name is Andrea Belair. I am from rural western Massachusetts, and I earned my BA from Marlboro College in Vermont, where my focus was Literature and Creative Writing. After taking time off to travel and work in various jobs, I decided to pursue librarianship and went on to earn my MLIS from Rutgers University in 2012. I am currently the Librarian for Archives and Special Collections at Union College in Schenectady, NY, where I started in July 2018. Before this current role, I was the Archivist for the Office of the President at Yale University for 5.5 years. I have a broad set of duties here at Union College, since we have numerous collections that include rare books and archival collections, but I have been actively involved in records policy and retention for the campus.”

What made you decide you wanted to become an archivist?

“After a part-time job shelving in the stacks of a large university, I decided to pursue a graduate degree to try for a career as an academic librarian. An archivist was always an ideal position that seemed fascinating but perhaps too much of a dream job, so I acquired many broad skills with archival experience “just in case.” I did ultimately land a job as an assistant archivist, and now I am living the dream.”

What is one thing you’d like to see the Electronic Records Section accomplish during your time as vice-chair?

“I always like to bring the importance of records management to light in the profession, since this subject can be an excellent basis to streamline the rest of the workflows and processes within an archive. Records management is often undervalued and under-rated, or it just seems pretty uninteresting, and archivists do not always take time to understand it fully, which can lead to issues down the road. Perhaps some emphasis on records management and records retention would be interesting to explore.”

What three people, alive or dead, would you invite to dinner?

“The Ghost of Christmas Past, the Ghost of Christmas Present, and the Ghost of Christmas Future.”

Meet our newest ERS steering committee members!

All this week we’ll be featuring introductions to our newest ERS steering committee members! Today, meet Annalise Berdini, our new vice-chair and chair-elect.

Annalise Berdini is the Digital Archivist for University Archives at Princeton University. She is responsible for leading the ongoing development of the University Archives digital curation program. As part of this role, she accessions and processes born-digital collections, offers digital preservation consultation and education to students and staff, and collaborates with Public Services to improve born-digital access practices. She also manages the web archives program, processes analog collections, and provides reference services. She was previously the Digital Archivist for Special Collections and Archives at UC San Diego, where she instituted a brand new digital curation program and co-authored the UC Guidelines for Born-Digital Archival Description.

What made you decide you wanted to become an archivist?

“Honestly, I sort of fell into it. I had just started looking into library school and started researching my options after about 6 years of post-undergrad job hopping, and the program I was most interested in had an archives concentration. I remembered a really great archivist that I encountered during some research I did during undergrad, and started asking questions of archivists about the field and what they did. Mostly, the response I got was that there weren’t many jobs! But the archivists I spoke to were also so passionate about the work they did, and talked about all the ways they felt it was important — and they were so happy to answer my questions and offer help and advice — to connect me with more people in the field. That may actually be the main reason I chose it. Up to that point, my experience in my other career(s) had been that people were generally reluctant to offer help or support. That was never my experience with archivists. Once I started classes and some initial processing work, I knew it was where I wanted to be. Constantly changing work, lively academic discourse, exciting new opportunities in applying technology and leveraging data — it’s exactly the kind of job I hoped I’d find. I’m doing work I never thought I would do, and I get to work with such incredible people who challenge me to do more and better every day.”

What is one thing you’d like to see the Electronic Records Section accomplish during your time as vice-chair?

“I’m really excited about the ongoing work the section is already doing to centralize and make easily discoverable favorite resources for practitioners. I’d also like to see the membership get involved in partnering with other sections to talk about the ways/offer guidance on how electronic records can make more discoverable resources/voices traditionally left out of the archives.”

What three people, alive or dead, would you invite to dinner?

“Janelle Monae, Neil Gaiman, and Carrie Fisher.”

Recap: Emulation in the Archives Workshop – UVA, July 18, 2019

By Brenna Edwards

The Emulation in the Archives workshop took place at the University of Virginia (UVA) July 18, 2019, as part of Software Preservation Network’s Fostering a Community of Practice grant cohort. This one-day workshop explored various aspects of emulation in archives, from the legal challenges to access, and included an overview of what UVA is currently doing in this area. The workshop featured talks from people across departments at UVA, as well as people from the Library of Congress. In addition to the talks, there was also a chance to sign up for wireframe testing for UVA’s current access methods for emulated material in their collections. This process was optional, but people could also sign up for distance testing after the workshop if they preferred. 

The day was split into four different parts: an introduction to software preservation and emulation, including legal information; an overview of UVA’s current work in emulation; a look into the metadata for emulations and video game preservation; and considerations for access and user experience. Breaking up the day into these chunks defined a flow for the day, walking through the steps and considerations needed to emulate software and born digital materials. It also helped contain these topics, though of course certain themes and aspects kept appearing throughout the day in other presentations. 

The first portion of the day covered an introduction to software preservation and emulation, and the legal landscape. After explaining more of what Software Preservation Network’s Fostering a Community of Practice grant is, Lauren Work provided some definitions of emulation, software, and curatorial for use throughout the day. 

  • Emulation: digital technique that allows new computers to run legacy systems so older software appears the way it was originally designed
  • Software: source code, executables, applications, and other related components that form a set of instructions for computers
  • Curatorial: responsibility and practice of appraising, acquiring, describing

Work then talked more about the Peter Sheeran papers, a collection from an architectural firm based in Charlottesville and the main collection for this project. As a hybrid collection, it included Computer Aided Design (CAD) files and Building Information Modeling (BIM) software, which posed the question of what to do with them. The answer? Emulation! Since CAD/BIM files are very dependent on which versions of the software and files are being used, UVA first did an inventory of what they had, down to license keys and how compatible each item is with other software. To do this, they used the FCOP Collections Inventory Exercise to guide them through what they needed to consider. They also looked at the troubleshooting and legal issues they might run into. This led nicely into the next presentation, on the legal landscape for software preservation, by Brandon Butler of UVA. Butler talked about copyright and The Copyright Permissions Culture in Software Preservation and Its Implications for the Cultural Record, a report done by ARL, as well as the idea of fair use, which is often an underutilized solution. He also talked about digital rights management, and how groups like SPN are bringing people together to ask questions that haven’t been asked before and working to secure the exemptions, granted every three years, that permit cracking digital locks. Overall, he said that you should be good legally, but that you should do your research just to be on the safe side.

This was followed by an overview of what UVA is currently doing. After reiterating “Access is everything” to the room, Michael Durbin demonstrated the current working pieces of their emulation system using Archivematica, Apollo, and a Curio custom display interface. He also demonstrated some of the EaaSI platform (which has a sandbox now available!), showing VectorWorks files and how they might be used. Durbin then explained how UVA, in their transition to ArchivesSpace, plans to use the Digital Object function to link to the external emulation, as well as display the metadata that goes along with it. UVA is also considering the description that can’t yet be stored in any of UVA’s systems and how they might incorporate WikiData in the future. Next, Lauren Work and Elizabeth Wilkinson talked about the curation workflows for software at UVA, which include a revamped Deed of Gift, as well as additional checklists and questionnaires. Their main advice was to talk with donors early, early, early to get all the information you can and to work with the donor on preservation and access decisions, though they also acknowledged that this is not always possible. Work and Wilkinson are still working on integrating these steps into the curation workflow at UVA, but also plan to start working more on their appraisal and processing workflows. Have thoughts on the checklist and questionnaire? Feel free to comment on their documents and make suggestions!

After lunch, we got more into the technical side of things and talked about metadata! Elizabeth Wilkinson and Jeremy Bartczak presented on how UVA is handling archival metadata for software, including questions of how much information is enough and whether ArchivesSpace could accommodate this amount of description. While heavily influenced by the University of California Guidelines for Born-Digital Archival Description, they also consulted the Software Preservation Network Emulation as a Service Infrastructure Metadata Model. The result? UVA Archival Description Strategies for Emulated Software, which presents two different approaches to describing software, and UVA MARC Field Look-up for Software Description in ArchivesSpace, which has suggestions on where to put the description in ArchivesSpace. To find out information about the software, they suggested using Google, WorldCat, and Wikidata (for which Yale has created a guide).

The second portion of this block was about description and preservation of video games, presented by Laura Drake Davis and David Gibson of the Library of Congress. The LOC has been collecting video games since they were introduced, with the first being Pac-Man. The copyright registry requires a description of the item and some sort of visual documentation or representation of game play (a video, source code, etc.). The LOC keeps the original packaging for the game if possible, and they also collect strategy guides and periodicals related to video games. They also take source code; the first and last 25 pages of source code must be printed out and sent as documentation. Right now, they are reworking their workflows for processing, cataloging, and describing video games, working on relationships with game developers and distributors and with the LC General Counsel Office to assess risks associated with providing access to actual games, and looking into ways to emulate the games themselves.

The final part of the day was all about access and user experience. First, Lauren Work and Elizabeth Wilkinson talked about how UVA is considering user access to emulated environments. As of now, they plan to offer reading room access only, taking into consideration the staff training and the computer station requirements this involves. They are also considering what is important about access via emulated environments, a topic discussed at the Architecture, Design, and Engineering Summit at the Library of Congress in 2017. Currently, they are doing wireframe testing with ArchivesSpace to see how users navigate through it, as well as what types of information researchers need, such as troubleshooting tips, links to related collections, instructions or a note about what to expect within the emulated environment, and how to cite the emulation.

The final talk of the day was by Julia Kim of the Library of Congress. Kim talked about her study on user experience with born digital materials at NYU from 2014 to 2015, and compared it to Tim Walsh’s 2017 survey on the same topic at the Canadian Centre for Architecture. Kim found that the line between researcher responsibilities and digital archivist responsibilities is very fine, that users got frustrated with the slowness of the emulations, and that there is a learning curve. Overall, Kim found that it’s only somewhat worth it to do emulations, but thinks the EaaSI project will help with this, as will a lot of outreach and education on what these materials are and how to use them effectively.

Overall, I found the workshop to be highly informative and I feel more confident considering emulations for future projects. I feel the use of shared community notes helped everyone ask for clarification without disrupting the presenters and allowed questions to be typed out and asked at the end. It’s also been helpful to look back on these notes, as slides and links to resources have been added by both presenters and attendees. It’s nice that there is a cohort of people out there working on this and willing to share resources and talk as needed! If you’d like to learn more about the workshop, you can visit their website here, and if you’d like to see the community notes and presentations, you can click here, with the Twitter stream here.


Brenna Edwards is currently Project Digital Archivist at the Stuart A. Rose Library at Emory University, Atlanta, GA. Her main responsibility is imaging and processing born digital materials, while also researching the best tools and practices to make them available. 

ml4arc – Machine Learning, Deep Learning, and Natural Language Processing Applications in Archives

by Emily Higgs


On Friday, July 26, 2019, academics and practitioners met at Wilson Library at UNC Chapel Hill for “ml4arc – Machine Learning, Deep Learning, and Natural Language Processing Applications in Archives.” This meeting featured expert panels and participant-driven discussions about how we can use natural language processing – using software to understand text and its meaning – and machine learning – a branch of artificial intelligence that learns to infer patterns from data – in the archives.

The meeting was hosted by the RATOM Project (Review, Appraisal, and Triage of Mail). The RATOM project is a partnership between the State Archives of North Carolina and the School of Information and Library Science at UNC Chapel Hill. RATOM will extend the email processing capabilities currently present in the TOMES software and BitCurator environment, developing additional modules for identifying and extracting the contents of email-containing formats, NLP tasks, and machine learning approaches. RATOM and the ml4arc meeting are generously supported by the Andrew W. Mellon Foundation.

Presentations at ml4arc were split between successful applications of machine learning and problems that could potentially be addressed by machine learning in the future. In his talk, Mike Shallcross from Indiana University identified archival workflow pain points that provide opportunities for machine learning. In particular, he sees the potential for machine learning to address issues of authenticity and integrity in digital archives, PII and risk mitigation, aggregate description, and how all these processes are (or are not) scalable and sustainable. Many of the presentations addressed these key areas and how natural language processing and machine learning can aid archivists and records managers. Additionally, attendees got to see presentations and demonstrations of tools for email such as RATOM, TOMES, and ePADD. Euan Cochrane also gave a talk about the EaaSI sandbox and discussed potential relationships between software preservation and machine learning.

The meeting agenda had a strong focus on using machine learning in email archives; collecting and processing email is a heavy burden in many archives, and one that stands to benefit greatly from machine learning tools. For example, Joanne Kaczmarek from the University of Illinois presented a project processing capstone email accounts using an e-discovery and predictive coding software called Ringtail. In partnership with the Illinois State Archives, Kaczmarek used Ringtail to identify groups of “archival” and “non-archival” emails from 62 capstone accounts, and to further break down the “archival” category into “restricted” and “public.” After 3-4 weeks of tagging training data with this software, the team was able to reduce the volume of emails by 45% by excluding “non-archival” messages, and to identify 1.8 million emails that met the criteria to be made available to the public. Done manually, this tagging process could easily have taken over 13 years of staff time.
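For readers curious what this kind of two-stage triage looks like in code, here is a minimal Python sketch. It is not Ringtail and not the Illinois team's method: it swaps in a tiny naive Bayes classifier written from scratch, and every training message, label, and function name below is invented for illustration. The only thing it borrows from the project is the shape of the workflow: people tag a small sample, one model separates “archival” from “non-archival,” and a second model sorts the archival messages into “restricted” and “public.”

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of tagged messages
        self.vocab = set()

    def train(self, examples):
        # examples: iterable of (message text, human-assigned label)
        for text, label in examples:
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-tagged training data (invented for this sketch).
archival_model = NaiveBayes()
archival_model.train([
    ("board meeting minutes budget policy decision", "archival"),
    ("policy memo on tuition budget approval", "archival"),
    ("personnel review salary decision for dean", "archival"),
    ("lunch order pizza friday", "non-archival"),
    ("newsletter sale discount unsubscribe", "non-archival"),
    ("parking reminder lot closed friday", "non-archival"),
])

access_model = NaiveBayes()
access_model.train([
    ("personnel review salary decision for dean", "restricted"),
    ("student disciplinary record confidential", "restricted"),
    ("board meeting minutes budget policy decision", "public"),
    ("policy memo on tuition budget approval", "public"),
])


def triage(message):
    """Stage 1 drops non-archival mail; stage 2 labels the rest by access level."""
    if archival_model.predict(message) == "non-archival":
        return "non-archival"
    return access_model.predict(message)


for msg in ["pizza lunch friday",
            "budget policy minutes from board meeting",
            "confidential salary review for dean"]:
    print(msg, "->", triage(msg))
```

Splitting the work into two models mirrors the description above: the first pass shrinks the pile, and only what survives is sorted by access level. A real project would need far more tagged data and human review of the results.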

After the ml4arc meeting, I am excited to see the evolution of these projects and how natural language processing and machine learning can help us with our responsibilities as archivists and records managers. From entity extraction to PII identification, there are myriad possibilities for these technologies to help speed up our processes and overcome challenges.


Emily Higgs is the Digital Archivist for the Swarthmore College Peace Collection and Friends Historical Library. Before moving to Swarthmore, she was a North Carolina State University Libraries Fellow. She is also the Assistant Team Leader for the SAA ERS section blog.