by Kelly Chatain
This is the fourth post in the BloggERS Embedded Series.
As any archivist will tell you, the closer you can work with creators of digital content, the better. I work for the Institute for Social Research (ISR) at the University of Michigan. To be more precise, I am a part of the Survey Research Center (SRC), one of five centers that make up the Institute and the largest academic social science research center in the United States. But really, I was hired by the Survey Research Operations (SRO) group, the operational arm of SRC, which conducts surveys all over the world and collects vast and varied amounts of data. In short, I am very close to the content creators. They move fast, they produce an extraordinary amount of content, and they needed help.
Being an ‘embedded’ archivist in this context is not just about the end of the line; it’s about understanding and supporting the entire lifecycle. It’s archives, records management, knowledge management, and more, all rolled into one big job description. I’m a functional group of one interacting with every other functional group within SRO to help manage research records in an increasingly fragmented and prolific digital world. I help to build good practices, relationships, and infrastructure among ourselves and other institutions working towards common scientific goals.
Lofty. Let’s break it down a bit.
Find it, back it up, secure it
When I arrived in 2012, SRO had a physical archive of master study files that had been tended to by survey research staff over the years. These records provide important reference points for sampling and contacting respondents, designing questionnaires, training interviewers, monitoring data collection activities, coding data, and more. After the advent of the digital age, a few building moves, and some server upgrades, they also had an extensive shared drive network and an untold number of removable media containing the history of more recent SRO work. My first task was to centralize the older network files, locate and back up the removable media, and make sure sensitive data was out of reach. TreeSize Professional is a great tool for this type of work because it creates detailed reports and clear visualizations of disk space usage. This process also produced SRO’s first retention schedule and an updated collection policy for the archive.
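For readers without a dedicated tool, the basic idea of a disk usage report can be sketched in a few lines of Python. This is only an illustration of the kind of summary described above, not how TreeSize Professional works and not what SRO ran; the function names `folder_sizes` and `report` are made up for this example.

```python
import os
from pathlib import Path


def folder_sizes(root):
    """Return {top-level subfolder name: total bytes} for everything under root.

    Files sitting directly in root are grouped under ".".
    """
    root = Path(root)
    totals = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = Path(dirpath).relative_to(root)
        top = rel.parts[0] if rel.parts else "."
        for name in filenames:
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # unreadable or vanished file: skip it
            totals[top] = totals.get(top, 0) + size
    return totals


def report(totals):
    """Format the totals as text lines, largest folder first."""
    ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [f"{size / 1_048_576:10.1f} MiB  {name}" for name, size in ordered]
```

Calling `print("\n".join(report(folder_sizes(some_path))))` on a shared drive folder gives a quick first look at where the bulk of the files live, which is often enough to decide where to start.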
Despite its academic home, SRO operates more like a business. It serves University of Michigan researchers as well as external researchers (national and international), meeting the unique requirements of increasingly complex studies. It maintains a national field staff of interviewers as well as a centralized telephone call center. The University of Michigan moved to Google Apps for Education (now G Suite) shortly after I arrived, which brought new challenges, particularly in security and organization. G Suite is not the only documentation environment in which SRO operates, but training in the Googleverse coincided nicely with establishing guidance on best practices for email, file management, and organization in general. For instance, we try to label important emails by project (increasingly, decisions are documented only in email) so that they can be archived with the other documentation at the end of the study, either via IMAP to Thunderbird and export to PDF, or via Google export to .mbox and then into Thunderbird. Google Drive files are downloaded to our main projects file server in .zip format at the end of the study.
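Once email has been exported to .mbox, a quick check that the project labels survived the export can be scripted. The sketch below uses Python’s standard `mailbox` module and assumes the export carries Gmail labels in an `X-Gmail-Labels` header as a comma-separated list; that header, and the helper name `count_by_label`, are assumptions for illustration and not part of SRO’s documented workflow.

```python
import mailbox
from collections import Counter


def count_by_label(mbox_path):
    """Count messages per label in an .mbox file.

    Assumes labels are stored comma-separated in an X-Gmail-Labels header.
    A label that itself contains a comma would be split incorrectly.
    """
    counts = Counter()
    # create=False: raise an error instead of silently creating an empty file
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            raw = str(message.get("X-Gmail-Labels", ""))
            labels = [part.strip() for part in raw.split(",") if part.strip()]
            for label in labels or ["(unlabeled)"]:
                counts[label] += 1
    finally:
        box.close()
    return counts
```

A message with several labels is counted once under each of them, so the totals can add up to more than the number of messages in the file.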
Metadata, metadata, metadata
A marvelous video on YouTube perfectly captures the struggle of data sharing and reuse when documentation isn’t available. The survey data that SRO collects is delivered to the principal investigator, but SRO also collects data about the survey process for our own analysis and design purposes, and that data needs documentation too. Think study-level descriptions, methodologies, statistics, and more. I’m still working on finding that delicate balance of collecting enough metadata to facilitate discovery and understanding while not putting undue burden on study staff. The answer (in progress) is a SQL database that will extract targeted structured data from as many of our administrative and survey systems as possible, which can then be augmented with manually entered descriptive metadata as needed. In addition, I’m looking to the Data Documentation Initiative, a robust metadata standard for documenting a wide variety of data types and formats, to promote sharing and reuse in the future.
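To make the idea of combining extracted and manually entered metadata concrete, here is a minimal sketch using Python’s built-in `sqlite3` module. It is not the SRO database, which is still in progress; every table, column, and function name here (`extracted`, `described`, `catalog`, `build_catalog`) is hypothetical, and the rows in the usage note are made-up placeholders.

```python
import sqlite3

# Hypothetical schema: one table filled automatically from other systems,
# one table filled in by hand, and a view that presents them together.
SCHEMA = """
CREATE TABLE extracted (
    study_id    TEXT PRIMARY KEY,
    title       TEXT,
    mode        TEXT,
    field_start TEXT
);
CREATE TABLE described (
    study_id TEXT PRIMARY KEY REFERENCES extracted(study_id),
    abstract TEXT,
    keywords TEXT
);
CREATE VIEW catalog AS
    SELECT e.study_id, e.title, e.mode, e.field_start,
           d.abstract, d.keywords
    FROM extracted AS e
    LEFT JOIN described AS d ON d.study_id = e.study_id;
"""


def build_catalog(extracted_rows, described_rows):
    """Load both kinds of rows into an in-memory database and return it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO extracted VALUES (?, ?, ?, ?)", extracted_rows)
    conn.executemany("INSERT INTO described VALUES (?, ?, ?)", described_rows)
    conn.commit()
    return conn
```

For example, `build_catalog([("S1", "Placeholder study", "phone", "2015-01-01")], [])` yields a `catalog` row for S1 with empty descriptive fields. The LEFT JOIN is the design choice that matters: a study appears in the catalog as soon as it is extracted, and the manual description can be added later without holding anything up.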
The original plan for digital preservation was to implement and maintain our own repository using an existing open-source or proprietary system. Then I found my new family in the International Association for Social Science Information Services & Technology (IASSIST) and realized I don’t have to do this alone. In fact, just across the hall from SRO is the Inter-University Consortium for Political and Social Research (ICPSR), who recently launched a new platform called Archonnex for their data archive(s). Out of the box, Archonnex already delivers much of the basic functionality SRO is looking for, including support for the ever-evolving preservation needs of digital content, but it can also be customized to serve the particular needs of a university, journal, research center, or individual department like SRO.
The embedded archivist combines a big-picture perspective with the specific daily challenges of managing records, in ways that not many positions allow. And you never know what you might be working on next…