An Interview with Caitlin Birch – Digital Collections and Oral History Archivist at the Rauner Special Collections Library, Dartmouth

Interview conducted with Caitlin Birch by Juli Folk in March 2019.

This is the third post in the Conversations series.

Meet Caitlin Birch

Caitlin Birch is the Digital Collections and Oral History Archivist for the Rauner Special Collections Library at Dartmouth College in Hanover, New Hampshire. She sat down with Juli Folk, a graduate student at the University of Maryland-College Park iSchool who is pursuing an archives-focused MLIS and a certificate in Museum Scholarship and Material Culture. Caitlin’s descriptions of her career path, her roles and achievements, and her insights into the challenges she faces helped frame a discussion of the skill sets that are useful for working with born-digital archival records every day.

Caitlin’s Career Path

As an undergraduate, Caitlin majored in English, concentrating in journalism, with minors in history and Irish studies. After a few years working as a reporter and editor, she began to consider a different career path, looking for fields that emphasize constant learning, storytelling, and contributions to the historical record. In time, she decided on a dual degree (MA/MSLIS) in history and archives management from Simmons College (now Simmons University). In graduate school, her studies combined historical methods and original research with archival theory and practice.

When asked about the path to her current position, Caitlin responded, “To the extent that my program allowed, I tried to take courses with a digital focus whenever I could. I also completed two internships and worked in several paraprofessional positions, which were really invaluable to preparing me for professional work in the field. I finished my degrees in December 2013 and landed my job at Dartmouth a few months later.” She now works as the Digital Collections and Oral History Archivist for Rauner Special Collections Library, which houses Dartmouth College’s rare books, manuscripts, and archives within the larger academic research library.

Favorite Aspects of Being an Archivist

For Caitlin, the best aspects of being an archivist are working at the intersection of history and technology; teaching and interacting with people every day; and having new opportunities to create, innovate, and learn. Her position spans both oral history and born-digital records, and on any given day she may be juggling tasks like teaching students oral history methodology, implementing a digital repository, building Dartmouth’s web archiving program, managing staff, sharing reference desk duty, and keeping up with the profession through her involvement with the Society of American Archivists (SAA) and the New England Archivists Executive Board. “I like that no two days are the same,” she shared, adding, “I like that my work can have a positive impact on others.”

Challenges of Being an Archivist

Caitlin pointed out that aspects of the profession change and evolve at a pace that can make it difficult to keep up, especially when job- or project-related tasks demand so much attention. She also noted other challenges: “More and more we’re grappling with issues like the ethical implications of digital archives and the environmental impact of digital preservation.” That said, she finds that “the biggest challenge is also the biggest opportunity: most of what I do hasn’t been done before at Dartmouth. I’m the first digital archivist to be hired at my institution, so everything—infrastructure, policies, workflows, etc.—has been/is being built from the ground up. It’s exciting and often very daunting, especially because this corner of the archives field is dynamic.”

Advice for Students and Young Professionals

As a result, Caitlin emphasized the importance of experimentation and failure. “Traditional archival practice is well-defined and there are standards to guide it, but digital archives present all kinds of unique challenges that didn’t exist until very recently. Out of necessity, you have to innovate and try new things and learn from failure in order to get anywhere.” For this reason, she recommended building a good professional network and finding time to keep up with the professional literature. “It’s really key to cultivate a community of practice with colleagues at other institutions.”

When asked whether she sets aside specific time for these tasks or finds that networking and research are natural outputs of her daily work, Caitlin said that networking comes more easily because of her involvement with professional organizations. Finding time for professional literature and research proved more difficult, however, and Caitlin brought this concern to her manager. In response, he encouraged her to block one to two hours on her calendar at the same time every week to catch up on reading and professional news. She remains grateful for that support: “I would hope that every manager in this profession encourages time for regular professional development. It may seem like it’s taking time away from job responsibilities, but in actuality it’s helping you to build the skills and knowledge you need for future innovation.”



Juli Folk is finishing the MLIS program at the University of Maryland-College Park iSchool, specializing in Archives and Digital Curation. A former corporate editor and project manager, Juli is using her graduate work to supplement her passions for writing, art, and technology with formal archival training and to refocus her career on cultural heritage institutions.

An Interview with Erica Titkemeyer – Project Director and AV Conservator at the Southern Folklife Collection, UNC

Interview conducted with Erica Titkemeyer by Morgan McKeehan in March 2019.

This is the second post in a new series of conversations between emerging professionals and archivists actively working with digital materials.


Erica is the Project Director and AV Conservator at the Southern Folklife Collection, in Wilson Special Collections Library at the University of North Carolina at Chapel Hill’s University Libraries.

Tell us a little bit about the path that brought you to your current position.

As an undergrad, I majored in Cinema and Photography, which initially put me in contact with many of the analog-based obsolete formats our team at UNC works to digitize now. It was also during this time that I saw how varied new proprietary born-digital formats could be, depending on camera types, capture settings, and editing environments, and how these files could be just as problematic as film and magnetic-based formats when trying to access content over time. Whether projects originated on something like DVCAM or P2 cards, codec and file format compatibility issues were a daily occurrence in classes. After undergrad, I went through NYU’s Moving Image Archiving and Preservation program, where courses in digital preservation helped instill a lot of the foundational knowledge I use today.

After grad school, I spent 9 months in the inaugural National Digital Stewardship Residency cohort in Washington, D.C., where I worked at Smithsonian Institution Archives to explore digital preservation needs and challenges of digital media art.

My current position is primarily concerned with the timely digitization and preservation of obsolete analog audiovisual formats and with providing access to them, but our digital tape-based collections are growing, and there are many born-digital accessions containing a myriad of audio and video file formats that we need to make decisions about now to ensure they’re around for the long term.

What type of institution do you currently work at and where is it located?

I work within Wilson Special Collections Library at the University of North Carolina at Chapel Hill’s University Libraries. I am situated in the Southern Folklife Collection, which holds the majority of audiovisual recordings in Wilson Library; however, my team has expanded its work to cover all audiovisual recordings in the building as part of a new Andrew W. Mellon Foundation grant, Extending the Reach of Southern Audiovisual Sources: Expansion.

What do you love most about working with AV archival materials?

I’ve always been excited to learn about moving image and sound technologies and how they fit into historical contexts. Even if I know nothing about a collection except for the format, there’s enough there to understand the time and circumstances the recordings were created in. This is just as much the case for born-digital audiovisual files as it is for analog. We’ve seen file formats, codecs, and recording equipment fall by the wayside, and so they exist as markers of a particular time.

What’s the biggest challenge affecting your work (and/or the field at large)?

Current and future digital video capabilities can provide a lot of options for documentarians and filmmakers, which is great news for them, but it also means there’s going to be a flood of new file formats with encodings and specifications we have not dealt with, many of which will already be difficult to access by the time they reach our library because of planned obsolescence. We’ve already started to see these collections come in, and for various reasons it’s impossible to normalize everything to our audiovisual target preservation specifications while still retaining quality. Fortunately, a lot of folks are thinking about this and are building some precedent for making decisions about these files. Julia Kim at the Library of Congress, Rebecca Fraimow at WGBH, and I have also done a couple of panel talks on this and recently put out an article through Code4Lib on this topic (https://journal.code4lib.org/articles/14244).

What advice would you give yourself as a student or professional first delving into digital archives work?

Everything can seem very overwhelming. There are a lot of directions to take in audiovisual preservation and archiving, and digital archiving and preservation is the shiny new frontier, but there’s a lot to gain by starting with what you know and taking it from there. I think the knowledge and expertise I’ve built around analog preservation risks inevitably help me tackle some of the more challenging aspects of born-digital audiovisual preservation.


Morgan McKeehan is the Digital Collections Specialist in the Repository Services Department, within the University Libraries at the University of North Carolina at Chapel Hill. In this role, she provides support for the management of and access to digitized and born-digital materials from across the Libraries’ special collections units.

An Interview with Amy Berish – Assistant Archivist at the Rockefeller Archive Center

by Georgia Westbrook

This is the first post in a new series of conversations between emerging professionals and archivists actively working with digital materials.

Amy Berish is an Assistant Archivist at the Rockefeller Archive Center in Sleepy Hollow, New York. There, she is a member of the Processing Team, working on processing collections that cover a wide range of philanthropic history and a variety of materials. A recent graduate of the University of Pittsburgh Master of Library and Information Science program, Amy has generously shared her path and experiences with bloggERS!

Amy began working at her local library when she was 14 and went on to major in library and information science as an undergraduate. Throughout college and graduate school, she worked at the university library, took various internships, and worked for school credit at the preservation lab, all in an effort to find her place in the library and archives world.

In her current role at the Rockefeller Archive Center, she works as part of a larger staff to process incoming collections in both paper and digital formats. The Rockefeller Archive Center collects materials related not only to the Rockefeller family but also to several other large philanthropic organizations, including the Ford Foundation, the Near East Foundation, the Commonwealth Fund, the Rockefeller Brothers Fund, the Henry Luce Foundation, and the W. T. Grant Foundation, among others. While she shied away from digital formats and coding during college, she has had the opportunity to pursue that work in her current role and has embraced the challenges that come with it.

“I feel like digital work is the biggest challenge right now, in both the work I am doing and the work of the broader archival profession,” she said. “Learning to navigate the technical skills required to do some of the work we are doing can be especially daunting. Having a positive attitude about change and a willingness to learn is often easier said than done – but I also think these two factors could help make this type of work seem more doable.”

Amy has found support in her teams at the Rockefeller Archive Center and in the archives community in and around New York City. For example, Digital Team members at the Rockefeller Archive Center reminded her that if she wanted to experiment with a new way of scripting, it would be OK to break things in the code, because they would be able to fix it. She has also found support in online forums, which have connected her with others doing related work across the country.

Beyond scripting, part of her position requires her to deal with formats that might be obsolete or nearly so and to face policy questions regarding proprietary information and copyright. As with coding, however, Amy has used her enthusiasm for learning new skills as an asset in facing these challenges.

“I love learning new things, and as a processing archivist, it’s part of my job to continue to learn more about various topics through each collection I process,” Amy said. “I also get the opportunity to learn through some of the digital projects I am working on. I have learned to automate processes by writing scripts. I have also had a lot of experience lately working with legacy digital media – from optical disks and floppies to zip disks and Bernoulli disks – it has been a challenge trying to get 10-year-old media to function properly!”

As a new professional, Amy was quick to mention some of the challenges that archivists can face at the beginning of their careers. Still, she said, a pat on the back for each small step you take is well deserved. She cited one of her graduate school professors, who encouraged her to cultivate an “ethos of fearlessness” when facing technology; she said the phrase has become a mantra in her current position. Because that, Amy acknowledged, is easier said than done, especially while you’re still in school, she offered three more pieces of advice for others just starting out in digital archives work: take the opportunities you’re given, always be ready to learn, and don’t be afraid of digital work.


Georgia Westbrook is an MSLIS student at Syracuse University. She’s interested in visual resources, oral histories, digital publishing, and open access. Connect with her on LinkedIn or on her website.

Using R to Migrate Box and Folder Lists into EAD

by Andy Meyer

Introduction

This post is a case study of how I used the statistical programming language R to help export, transform, and load data from legacy finding aids into ArchivesSpace. I’m sharing this workflow in the hope that other institutions might find the approach helpful and that it could be generalized to other issues facing archives.

I chose R because it is free and open source and because I had some prior experience using it. R has a large, active user community and many packages that extend its basic functions, including libraries that can read Microsoft Word tables and read and write XML. All of the code for this project is posted on GitHub.

This script grew out of a specific task: I inherited hundreds of finding aids with minimal collection-level information and very long, detailed box and folder lists. All of them were Microsoft Word documents with the box and folder list formatted as a table. We had recently adopted ArchivesSpace as our archival content management system, so the challenge was to reformat this data and upload it into ArchivesSpace. I considered manual approaches but eventually opted to write code to automate the work. The code is organized into three sections: data export, data transformation and cleaning, and creation of an EAD file to load into ArchivesSpace.

Data Export

After installing the appropriate libraries, the first step of the process was to extract the data from the Microsoft Word tables. Given the nature of our finding aids, I focused on extracting only the box and folder list; collection-level information would be added manually later in the process.

This process was surprisingly straightforward: I created a variable containing the path to a Word document and used the “docx_extract_tbl” function from the docxtractr package to extract the contents of its table into an R data.frame. Because our finding aids were sometimes inconsistent, I occasionally had to tweak the data to rearrange columns or add missing values. The result of this step is a table with four columns: folder title, date, box number, and folder number.
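The post does this step in R with docxtractr. For readers working outside R, here is a rough Python analog that uses only the standard library; it is not the author’s code. It relies on the fact that a .docx file is a ZIP archive whose body text lives in word/document.xml, and it assumes (hypothetically) that the box and folder list is the first table in the document, that the table has a header row, and that its columns run title, date, box, folder. The names extract_table and COLUMNS are illustrative.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used for elements in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Hypothetical column order for the box and folder table.
COLUMNS = ("title", "date", "box", "folder")


def extract_table(docx_file, table_number=0, header=True):
    """Return one Word table as a list of dicts keyed by COLUMNS.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    table = list(root.iter(W + "tbl"))[table_number]
    rows = []
    for tr in table.iter(W + "tr"):
        # A cell's text can be split across several <w:t> runs; join them.
        cells = ["".join(t.text or "" for t in tc.iter(W + "t")).strip()
                 for tc in tr.iter(W + "tc")]
        rows.append(cells)
    if header:
        rows = rows[1:]  # drop the header row
    # Pad short rows with empty strings so every record has all four fields;
    # any cells beyond the fourth are dropped by zip().
    return [dict(zip(COLUMNS, cells + [""] * (len(COLUMNS) - len(cells))))
            for cells in rows]
```

Calling extract_table("finding_aid.docx") would return one dictionary per folder. Padding short rows is a crude stand-in for the manual “add missing values” tweaks; nested tables and merged cells are not handled.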

This data export process is remarkably flexible. Using other R functions and libraries, I have extended it to export data from CSV files and Excel spreadsheets. In theory, it could be extended further to take in other kinds of data, including collection-level descriptions and digital objects, from a wider variety of sources. Other tools can also do this work (Yale’s Excel to EAD process and Harvard’s Aspace Import Excel plugin), but I found this process easier for my institution’s needs.
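A similar sketch for the CSV case, again in Python rather than the R used in the post: the header spellings in HEADER_ALIASES are hypothetical examples of how legacy spreadsheets might label the same four fields, and read_box_folder_csv is an illustrative name. It maps the headers it recognizes onto the title, date, box, and folder fields and ignores everything else.

```python
import csv
import io

# Hypothetical header spellings, all mapped to the same four fields that
# the Word-table extraction produces.
HEADER_ALIASES = {
    "folder title": "title", "title": "title",
    "date": "date", "dates": "date",
    "box": "box", "box number": "box",
    "folder": "folder", "folder number": "folder",
}


def read_box_folder_csv(text):
    """Normalize CSV text into dicts with title/date/box/folder keys."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {"title": "", "date": "", "box": "", "folder": ""}
        for header, value in record.items():
            # Unrecognized (or missing) headers are skipped.
            key = HEADER_ALIASES.get((header or "").strip().lower())
            if key:
                row[key] = (value or "").strip()
        rows.append(row)
    return rows
```

To read a file, open it with newline="" (as the csv module recommends) and pass its contents to read_box_folder_csv.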

Data Transformation and Cleaning

Once I extracted the data from the Microsoft Word document, I did some minimal data cleanup, a sampling of which included:

  1. Extracting a date range for the collection. Again, past practice focused on creating folder-level descriptions and nearly all of our finding aids lacked collection-level information. From the box/folder list, I tried to extract a date range for the entire collection. This process was messy but worked a fair amount of the time. In cases when the data were not standardized, I defined this information manually.
  2. Standardizing “No Date” text. Over the course of this project, I discovered the following terms for folders that didn’t have dates: “n.d.”, “N.D.”, “no date”, “N/A”, “NDG”, “Various”, “N. D.”, “” (an empty field), “??”, “n. d.”, “n. d. ” (with a trailing space), “No date”, “-”, “N.A.”, “ND”, “NO DATE”, and “Unknown”. For all of these, I updated the date field to “Undated” to standardize this field.
  3. Spelling out abbreviations. Occasionally, I used regular expressions to spell out words in the title field. These could be standard terms, such as “Corresp” to “Correspondence,” or local terms, such as “NPU” to “North Park University.”
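The three cleaning steps above might look something like this in Python (the post does them in R). The no-date variants are the ones listed in step 2, compared after trimming whitespace and lowercasing. The two abbreviation rules reuse the examples from step 3, but the regular expressions themselves and all function names are assumptions. collection_date_range simply takes the earliest and latest four-digit years it can find and returns None when there are none, which is the cue to set the range by hand.

```python
import re

# The "no date" variants listed in step 2, trimmed and lowercased so that
# "N.D.", "n.d." and "n. d. " each collapse onto one entry.
NO_DATE_VARIANTS = {"n.d.", "no date", "n/a", "ndg", "various", "n. d.", "",
                    "??", "-", "n.a.", "nd", "unknown"}

# Hypothetical abbreviation rules built from the two examples in step 3.
# \b keeps "Corresp" from matching inside the already-expanded "Correspondence".
ABBREVIATIONS = {
    r"\bCorresp\b\.?": "Correspondence",
    r"\bNPU\b": "North Park University",
}


def standardize_date(value):
    """Return "Undated" for any known no-date variant, else the trimmed value."""
    cleaned = (value or "").strip()
    return "Undated" if cleaned.lower() in NO_DATE_VARIANTS else cleaned


def expand_abbreviations(title):
    """Spell out each abbreviation in ABBREVIATIONS within a folder title."""
    for pattern, replacement in ABBREVIATIONS.items():
        title = re.sub(pattern, replacement, title)
    return title


def collection_date_range(dates):
    """Return "earliest-latest" from the four-digit years found, or None."""
    years = [int(year)
             for date in dates
             for year in re.findall(r"\b(1[6-9]\d{2}|20\d{2})\b", date or "")]
    if not years:
        return None  # no usable years: set the range manually
    return f"{min(years)}-{max(years)}"
```

As in the post, the date-range heuristic is messy: it ignores forms like “1950s” and cannot tell a typo from a real outlier, so its output still needs a human check.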

R is a powerful tool and provides many options for data cleaning. We did fairly minimal cleaning, but this approach could be extended to make major transformations to the data.

Create EAD to Load into ArchivesSpace

Lastly, with the data cleaned, I could restructure the data into an XML file. Because the goal of this project was to import into ArchivesSpace, I created an extremely basic EAD file meant mainly to enter the box and folder information into ArchivesSpace; collection-level information would be added manually within ArchivesSpace. In order to get the cleaned data to import, I first needed to define a few collection-level elements including the collection title, collection ID, and date range for the collection. I also took this as an opportunity to apply a standard conditions governing access note for all collections.

Next, I used the XML package in R to create the minimally required nodes and attributes. For this section, I relied on examples from the book XML and Web Technologies for Data Sciences with R by Deborah Nolan and Duncan Temple Lang. I built the basic EAD structure in R using the “newXMLNode” function from the XML package. This section of code is very minimal, and I would welcome suggestions from the broader community about how to improve it. I then defined functions that make the title, date, box, and folder nodes and applied them to the data exported and transformed in the earlier steps. Finally, the script saves everything as an XML file, which I then uploaded into ArchivesSpace.
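A comparable sketch using Python’s xml.etree.ElementTree in place of R’s XML package: it builds a bare-bones EAD 2002 document with a collection-level title, ID, date range, and a single conditions governing access note (accessrestrict), followed by one file-level component per box/folder row. The element selection and the names build_ead and add_child are assumptions about what a minimal import file needs, not the author’s exact output, and the result has not been checked against ArchivesSpace’s EAD importer, which may require additional elements (for example, an extent statement).

```python
import xml.etree.ElementTree as ET

EAD_NS = "urn:isbn:1-931666-22-9"  # EAD 2002 namespace


def add_child(parent, tag, text=None, **attrs):
    """Append a child element with optional text and attributes."""
    node = ET.SubElement(parent, tag, attrs)
    if text:
        node.text = text
    return node


def build_ead(title, collection_id, date_range, access_note, rows):
    """Build a bare-bones EAD string from cleaned box/folder rows."""
    ead = ET.Element("ead", {"xmlns": EAD_NS})
    header = add_child(ead, "eadheader")
    add_child(header, "eadid", collection_id)
    titlestmt = add_child(add_child(header, "filedesc"), "titlestmt")
    add_child(titlestmt, "titleproper", title)

    archdesc = add_child(ead, "archdesc", level="collection")
    did = add_child(archdesc, "did")
    add_child(did, "unittitle", title)
    add_child(did, "unitid", collection_id)
    add_child(did, "unitdate", date_range)
    # One standard conditions governing access note for every collection.
    add_child(add_child(archdesc, "accessrestrict"), "p", access_note)

    # One file-level component per row, with box and folder containers.
    dsc = add_child(archdesc, "dsc")
    for row in rows:
        c_did = add_child(add_child(dsc, "c", level="file"), "did")
        add_child(c_did, "unittitle", row["title"])
        add_child(c_did, "unitdate", row["date"])
        add_child(c_did, "container", row["box"], type="box")
        add_child(c_did, "container", row["folder"], type="folder")
    return ET.tostring(ead, encoding="unicode")
```

To save the result, write the returned string to a UTF-8 file (for example, collection.xml); collection-level description would then be filled in within ArchivesSpace, as the post describes.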

Conclusion

Although this script was designed to solve a very specific problem (extracting box and folder information from a Microsoft Word table and importing it into ArchivesSpace), I think this approach could have wide and varied uses. The import step can accept loosely formatted data in several formats, including Microsoft Word, plain text, CSV, and Excel, and reformat the underlying data into a standard table. R offers an extremely robust set of packages to update, clean, and reformat this data. Finally, you can define the export step to write the data into a suitable file format. Because every step is scripted, it is easy to preserve your original data source and to document every transformation you perform.


Andy Meyer is the director (and lone arranger) of the F.M. Johnson Archives and Special Collections at North Park University. He is interested in archival content management systems, digital preservation, and creative ways to engage communities with archival materials.