Bits and Baby Steps: RAO Engages with Electronic Collection Material

By Stacey Lavender and Rachael Dreyer

This post is the fourteenth in a bloggERS series about access to born-digital materials.

Officially formed and charged in October 2014, the Society of American Archivists Reference, Access, and Outreach (RAO) Section’s Access to Electronic Records Working Group aimed to evaluate current practices and approaches to providing researchers with access to born-digital and electronic material. There were four initial parts of the working group’s charge:

  1. Conduct initial research to determine on which key focus areas related to reference, access, outreach, and preservation work the working group shall focus its efforts.
  2. Compile a bibliography of key resources, including publications, presentations, and workshops, which explore how archival institutions provide access to born-digital and electronic records. Other organizations active with electronic records will also be included in this resource list.
  3. Conduct a survey of the archival profession regarding current practices and attitudes towards providing access to born-digital and electronic records.
  4. Compile and analyze the survey data in order to identify challenges and opportunities which RAO can address.

The fourth part of the charge is currently underway, and while the data analysis isn’t complete, some big-picture trends have emerged.

While just about every respondent indicated that their institution was providing some access to electronic content (both digitized and born-digital), 89.5% of respondents indicated that at least some of their electronic materials are currently inaccessible to patrons.

A significant portion of this inaccessibility can be attributed to the same reasons that most institutions have some inaccessible analog materials. 18% of respondents reported a lack of time and staff resources as the cause of their electronic backlog, and 16% cited donor and/or legal restrictions as a contributing factor. However, the most common response by far (62%) came from those having trouble providing access to specific formats of materials. This problem of formats (dealing with obsolete media, obsolete hardware, and the threat of media degradation) was perhaps the strongest trend revealed in the survey: a prevalent and ongoing obstacle to providing access to electronic records.

Another trend that the survey highlighted was the simple fact that respondents are on the lookout for resources and education opportunities related to access to electronic records, and they’re open to using many different options.

Large percentages of respondents indicated an interest in participating in workshops (48.8%), viewing web resources (40.7%), consulting standards/guidelines (34.9%), and receiving professional assistance from archivists or IT professionals (43%). So it’s clear that the interest in educational opportunities and resources is there; we just need to figure out how best to meet that need.

It is also worth noting that the desire to develop partnerships with the IT professionals in our institutions was something that came up more than once in the survey.

In addition to the 43% of respondents mentioned above who were interested in professional assistance from IT professionals, about 84% cited lack of IT support as an obstacle of some concern when it came to providing access to their electronic materials.

We’ll delve even further into these trends (and some others!) in our survey report, which we plan to have out in the next couple of months. Overall it was very heartening to see that for the most part we’re dealing with similar problems, which means we can tackle them together!

Since so many of the concerns around access to born-digital materials focus on the technological constraints and requirements, end-user access has been relegated to a lower rung on the ladder. But here’s the thing: to get to the higher rungs on the ladder, you have to have a stable base with those lower rungs! So, increasingly, the focus has shifted to the end-users’ needs, as well as the need for appropriate levels of arrangement and description. Public-services archivists are keenly aware of the back-end processes—good arrangement and description is essential to assist researchers in navigating those records.

The working group hopes to help RAO archivists, as well as anyone else in the profession, to take concrete steps toward providing research access to born-digital and electronic records at their institutions. If you have ideas or projects that you would like to see the RAO working group take on, we would love to hear from you!

Rachael Dreyer is currently Head of Research Services for Special Collections at the Pennsylvania State University. She was formerly a reference archivist at the American Heritage Center at the University of Wyoming. She’s very interested in balancing the needs of researchers with the technological “challenges” that born-digital collections present. She is Co-Chair of the RAO Access to Electronic Records Working Group.

Stacey Lavender recently completed a two-year stint as the Houston Arts and History Archives Fellow at the University of Houston. She’s most interested in working with born-digital materials, finding new and technologically innovative ways to provide access, and participating in public outreach initiatives to promote collections. She is Co-Chair of the RAO Access to Electronic Records Working Group.

Born-Digital and in the Virtual Reading Room

By Christine Kim

This post is the thirteenth in a bloggERS series about access to born-digital materials.

At the University of California, Irvine (UCI), we wanted to provide access to the rapidly growing volume of born-digital records but couldn’t really justify the time it would take to look through every single item to evaluate its relevance. Providing access to researchers while simultaneously evaluating the digital content is a huge challenge as we try to reconcile collecting and preserving with promoting access to information.

What did we come up with? The Virtual Reading Room (VRR).

In order for me to properly introduce the VRR and explain how it truly fits into our growing suite of digital resources, we need to back up a few paces. Let’s start from our digital collection page, UCIspace @ the Libraries, which we like to call UCIspace for short. UCIspace is where our digital collections currently live (though we have been working on a big move with the California Digital Library to transition to Calisphere) and includes a variety of materials, such as digitally reformatted collections (images, audio-visual files, oral histories, PDF documents, etc.), as well as born-digital collections. UCIspace is powered by DSpace, a highly customizable open source software package that preserves and enables open access to all sorts of digital files and is administered by the super amazing and incredibly talented IT folks at the UCI Libraries.

We currently have 13 collections on UCIspace, of which six collections include born-digital materials. The six collections may include both digitally reformatted as well as born-digital items, which all co-habitate in peace. They are all now digital and treated with equal amounts of care.

These six collections include born-digital materials. Total views from date of creation to February 1, 2016.


Okay, so we have digitally reformatted and born-digital materials available through UCIspace. Then what’s the VRR all about? Well, our VRR is a virtual space that resides within UCIspace in order to provide an extra layer of security for certain items or sub-collections.

We were acquiring volumes and terabytes of hard drives and digital files and raw footage, and our collecting pace was not going to slow down to let us catch up with identifying its digital content. We learned from our physical backlog that if we waited, this material would never be available for access. But we also knew that we couldn’t just make this stuff available. Could we apply MPLP practice to born-digital materials?

That’s where the VRR comes in. We thought, hey, what if we have people agree to certain terms and conditions, and then put these items behind a login so that folks can have access once they agree to some conditions? We like to dream big. And so with that thought, we investigated with our IT unit to see what the possibilities and limitations were. As it turns out, they like to turn dreams into reality.

Any item within the VRR has a certain level of privacy–you can’t just go into the collection page and see the image. The item is “locked” and resides behind a login screen.

Items from the Mark Poster born-digital files, 1985-2009. Top item is unrestricted. Bottom item indicates it is accessible through the Virtual Reading Room.

Selecting an item within the VRR will prompt a login screen. In order to gain access, both remote researchers and those using reading room public workstations must request access via the VRR Registration Form.

Application for the Virtual Reading Room in UCISpace @ the Libraries.

Submitting this form provides the user with a login and password, but the most important factor is that the researcher agrees to certain terms and conditions as part of requesting access to VRR materials. These terms are indicated within the “Rules of Use.”

A check-box counts as a digital signature!

The “Rules of Use” lists a handful of conditions of use, including the statement: “All digital content in UCIspace @ the Libraries is made publicly available for use in research, teaching, and private study.” The complete “Rules of Use” document is available for all to read.

What type of collections are available in the VRR? For now, we have two collections with items in the VRR, the Mark Poster Born Digital files and the Richard Rorty Born Digital Files. One key aspect is that the entirety of the collection does not have to be in the VRR–DSpace allows us to create a collection, and manage sub-collections so that only selected portions of the collections are behind a login screen.

What are the results of the VRR? Well, here are some use statistics.

From date of creation to February 1, 2016.

How did we make this happen? Having awesome teammates definitely helps. But along those lines, it also helps to have a clear understanding of how a great team operates: communicating, dreaming big, and sharing a mutual goal of providing excellent public services.

Please reach out to Christine Kim (christik [at] uci [dot] edu) if you have any questions about the Virtual Reading Room, UCIspace, or anything else about UCI Libraries Special Collections & Archives.

Christine Kim is the Public Services Assistant at the UC Irvine Libraries, Special Collections & Archives. She is responsible for connecting researchers with archival and special collections materials, and delights in sharing the resources uniquely available at UC Irvine. She holds an MLIS from San Jose State University and a BA in both History and Film & Media Studies from UC Irvine.

Ensuring Born-Digital Access at the Seeley G. Mudd Manuscript Library

By Rossy Mendez

This post is the twelfth in a bloggERS series about access to born-digital materials.

By exploring the contents of a drive, archivists can obtain information about the contents of folders, the size of files, and details of creation. They can determine what hierarchies are in place by observing a file’s structure and nesting. Despite the richness of metadata available for digital records, archival description remains one of the biggest challenges of processing born-digital collections.
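As a rough illustration of the kind of information a drive survey can surface, the sketch below walks a folder tree and reports the number of folders and files, the total size in bytes, and the depth of nesting. It is only a sketch under stated assumptions: the function and field names (`summarize_tree`, `DriveSummary`) are hypothetical, and it does not represent any tool or workflow used at the Mudd Manuscript Library.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass
class DriveSummary:
    """Hypothetical container for the figures gathered from one folder tree."""
    folders: int      # number of folders below the root
    files: int        # number of regular files
    total_bytes: int  # sum of all file sizes, in bytes
    max_depth: int    # deepest level of folder nesting below the root


def summarize_tree(root):
    """Walk `root` and count folders, files, bytes, and nesting depth."""
    root = Path(root)
    folders = files = total_bytes = max_depth = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Depth of the current folder relative to the root (root itself is 0).
        depth = len(Path(dirpath).relative_to(root).parts)
        max_depth = max(max_depth, depth)
        folders += len(dirnames)
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_file():  # skips broken links and special files
                files += 1
                total_bytes += path.stat().st_size
    return DriveSummary(folders, files, total_bytes, max_depth)
```

Calling `summarize_tree` on a mounted copy of a transfer yields counts that could feed a statement of extent expressed in folders and files rather than bytes alone.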

One of the ways that the Seeley G. Mudd Manuscript Library of Princeton University has maximized EAD description has been to explore the definitions of EAD elements. Before I address how the Mudd Manuscript Library is doing this, I would like to provide a bit of background on the library.

The Seeley G. Mudd Manuscript Library is part of the Rare Books and Special Collections Division at Princeton University. The Library houses and provides access to the university archives and public policy collections. In addition to 30,000+ linear feet of physical records, there are several collections of born-digital records that include student newsletters, the records of a former dean of the graduate school, and more recently, student activism.

Nearly all the staff at Mudd participate in the reference rotation. The benefit to this all-hands-on-deck approach is that the technical services team gains greater insight into how patrons use the collections and finding aids.

At Mudd, patrons can access born-digital collections remotely through the finding aids website. Individuals can filter for digital material by selecting the “Available online” option from the faceted search menu or by accessing individual folders or items within a collection. By clicking “View Content,” patrons are linked directly to PDFs, images, and even videos. For restricted records, patrons with the appropriate credentials can use their username and password to access files.

Screenshot of Finding Aid for Tiger Hockey Email Newsletters, Princeton University Archives

Archivists at Mudd believe that description is an important part of providing access.

Three principles should drive the creation of born-digital description:

  • A user should be able to know what and how much born-digital content exists.
  • A user should be able to know where the digital content lives within the finding aid and have easy access to that content.
  • Moreover, a user should be able to deduce the context of record creation.

One of the most significant changes Mudd archivists made to local EAD description was to the <extent> element. Initially, archivists conceived of the <extent> field as the physical space files occupied in a drive: records were indicated in measures of files and bytes. However, a significant number of patrons do not understand this information. Replacing bytes with “Digital folders” and “Digital files” as a unit of measurement allowed patrons to learn about hierarchies and the arrangement of collections. Furthermore, the inclusion of the word “digital” provided a further indication of the nature of the material.

In addition to <extent>, the <unittitle> element plays an important role in differentiating between digitized and born-digital content. Since the access path is the same for all content, the word “digital” in the title statements at the series and subseries level provides a quick way to tell which sections have born-digital content.

Lastly, the <phystech> element ensures that patrons have as much information about the creation of the digital record as possible (including, for example, the type of computer used) and can address potential compatibility issues.
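To make the three elements concrete, here is a minimal sketch that assembles a component description with Python's standard `xml.etree.ElementTree` module. The nesting (`<extent>` inside `<physdesc>` inside `<did>`, with `<phystech>` as a sibling of `<did>`) is assumed from the EAD 2002 tag library. The function name, title, counts, and technical note are hypothetical placeholders, not Mudd's actual encoding.

```python
import xml.etree.ElementTree as ET


def describe_component(title, folders, files, tech_note):
    """Build an illustrative EAD-style <c> element for born-digital content."""
    c = ET.Element("c", level="series")
    did = ET.SubElement(c, "did")
    # The word "digital" in the title flags born-digital content at a glance.
    ET.SubElement(did, "unittitle").text = title
    physdesc = ET.SubElement(did, "physdesc")
    # Extent is given in folders and files rather than bytes.
    ET.SubElement(physdesc, "extent").text = f"{folders} Digital folders"
    ET.SubElement(physdesc, "extent").text = f"{files} Digital files"
    # Technical context of creation, e.g. the type of computer used.
    phystech = ET.SubElement(c, "phystech")
    ET.SubElement(phystech, "p").text = tech_note
    return c


def to_xml(component):
    """Serialize the element to a text string."""
    return ET.tostring(component, encoding="unicode")
```

For example, `to_xml(describe_component("Digital newsletters", 3, 42, "Created on a Windows PC."))` returns a single-line XML string carrying both extent statements and the technical note.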

As archivists we should aim to provide complete access to our records. Good archival description relieves patrons of the burden of investigating technical aspects and allows them to focus on what is most important: discovering information and sharing it with others.


Rossy Mendez was formerly a Public Services Project Archivist at the Seeley G. Mudd Manuscript Library. Currently, she is a Project Archivist at the Solomon R. Guggenheim Museum, where she is working on processing the museum’s audio-visual collections and collaborating in the creation of a museum-wide metadata schema and the implementation of ArchivesSpace.

A Multi-Faceted Challenge: Breakouts on Access at the BitCurator User Forum

By Matthew Farrell

This post is the eleventh in a bloggERS series about access to born-digital materials.

On January 15, 2016, the BitCurator Consortium (BCC) held its second annual User Forum at the Louis Round Wilson Library at UNC-Chapel Hill. Representatives from BCC member institutions joined non-member archivists and librarians engaged in digital archival work for a day of discussion regarding born-digital archives and the application of digital forensics to archival practice. Panel descriptions and links to public group notes can be found on the BCC website. A longer recap and reflection of the day is forthcoming on the BCC website.

Participants in the User Forum came from varied contexts–some hold digital archivist job titles, others are researchers active in the development of digital forensics tools, and still others are professionals from curatorial and administrative backgrounds. A major goal of the program committee was to make the User Forum engaging for participants from any background. A breakout session titled “Where Should Access Happen?” leveraged the variety in participants particularly well. The participants broke into four groups, each discussing access to born-digital materials through a different lens: (1) the environment and/or tools that should be available to a researcher in the reading room, (2) what sort of staff resources should be considered, (3) where and how should access be facilitated, and (4) how much and what kinds of metadata should be made available to users prior to and during a research visit.

The group notes are here for posterity. What follows are a few themes that were discussed commonly across multiple groups.

The group discussing the environment and tools that could be available to researchers discussed the amount of control exerted over both collection materials and the access environment. For reading room access, the default discussed tends to be strict controls over materials due to the nature of the content. Strict control as default is shown in another group’s discussion of each participant’s current reading room environment: representatives of Duke University, Penn State University, North Carolina State University, Washington, and the University of Virginia all reported the use of off-network access terminals with controls limiting application access and restricting copying and/or printing. There appears to be an all-or-nothing approach to access. If a set of materials has some necessity for access controls (restricted or sensitive info, rights issues, etc.), access is only available in a heavily mediated environment. On the flip side are those materials that have no restrictions at all, which can be (and sometimes are) exposed online.

Further, determining what sort of environment and what toolset to provide researchers in a reading room is hindered by the small number of researchers currently requesting electronic collection materials. The low number of requests is almost certainly affected by the obscurity of those materials in finding aids and catalog records. Without a history of requests for access, it is impossible to determine with any certainty what researchers want to do with collection materials in a reading room environment. Three groups discussed exposing metadata about digital objects via finding aids as essential to offering access in any meaningful way. The group discussing metadata and processing touched on the balance between processing a set of digital objects at a minimum level to make them available and offering the maximum possible amount of metadata about those materials. Discussion generally accepted that over-processing of materials could happen. Though it is unclear what “over-processing” means for a given collection, processing archivists assuming what researchers will want to see and do with materials is a potential indicator.

Over-processing materials, coupled with heavy restrictions to access environments, leaves institutions at risk of providing researchers with glorified e-readers. Such access does little to support digital humanities research. The group discussing staffing for access talked about an example of Emory using Voyant and a poet’s collection of electronic text in a series of instruction sessions (though the project was not officially part of the program at the User Forum, I’m pretty sure Emory archivists would be happy to provide more information). Another group discussed basic tools that could be useful for many types of materials, including versatile viewing applications such as Quickview Plus as well as hex editors. Providing a hex editor to researchers was supported by two independent anecdotes. More advanced tools will likely vary depending on collections, though allowing a researcher to run file identification software or have access to the metadata output of such software is potentially of use. For these scenarios, should repositories be responsible for vetting additional tools and applications? One participant suggested researchers submit code or tools to run across a data set in advance of their visit for inspection by repository staff.
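For readers unfamiliar with what file identification involves, the following is a toy sketch of signature-based identification, paired with the sort of byte view a hex editor would show. It is an illustration only: the four-entry signature table and the function names are assumptions made for this example, and real identification tools draw on far larger format registries.

```python
# Leading "magic" bytes for a few well-known formats (illustrative subset).
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"8BPS", "Adobe Photoshop document"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify(header):
    """Return a format label based on the first bytes of a file."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def identify_file(path):
    """Read the first 16 bytes of a file and identify them."""
    with open(path, "rb") as f:
        return identify(f.read(16))


def hex_preview(header, width=16):
    """Render the leading bytes as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in header[:width])
```

A researcher-submitted script of roughly this size is the kind of thing repository staff could plausibly inspect before a visit, which is the scenario the participant's suggestion describes.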

While this is only a small window on the User Forum, I think there are at least a couple of areas discussed above that warrant further immediate work. I’m curious to get a picture of what metadata, exactly, BCC member institutions are exposing in their finding aids, and whether there is room for creating a common set of guidelines for including forensic metadata in descriptive systems. Hopefully more information will come soon on that front. There is a wealth of engaging thoughts in the notes for this session as well as the rest of the User Forum’s group notes, and the BCC website. Any questions about the BitCurator Consortium can be sent to me or Sam Meister.

Matthew Farrell is digital archivist at the David M. Rubenstein Rare Book & Manuscript Library at Duke University and member of the program committee for the 2016 BitCurator User Forum.

Making the Case: Advocacy for Born-Digital Access

By Sam Meister, with help from the Advocacy for Access Hackfest Team

This post is the tenth in a bloggERS series about access to born-digital materials.

"Archery," courtesy of Bryn Mawr College Special Collections.

The Advocacy for Access Hackfest Team members used the preliminary report data from the Born-Digital Access in Archival Repositories study and their own professional experiences in order to develop a project proposal that would provide direct and actionable strategies for making the case to provide born-digital access.

The Team used the preliminary report to find the top three gaps in the categories of Business Analysis, Resource Allocation, and Advocacy from the survey and interview data:

A. Need for additional time and people dedicated to born-digital work
B. Need for more IT resources, staffing, and technical expertise
C. Need for tactics for convincing administrators and funders to allocate greater resources to born-digital archives programs, including long-term sustainability funding

Sparked by these themes, our discussion emphasized the strong interconnection among points A, B, and C. Our group concluded that if one is not able to gain stakeholder buy-in from the administrative or institutional level (point C), points A and B would be very difficult to achieve. The Team concluded that digital stewards need a range of tactics in order to provide administrators with salient information to make the case.

We agreed that a toolkit for an advocacy program, developed through additional survey data and user-submitted strategies, would provide:

  • An environmental scan of tactics, both within the field and outside of it, to advocate for access needs, which would cover:
    • Successes and failures
    • Starting the conversation
    • Example elevator speeches
    • Data predictions for the future
  • Discrete information about resources, which would include data about:
    • Storage setup and costs
    • Tools (e.g. workstation) setup and costs
    • Personnel
    • Platforms
  • The ability to match what an institution needs based on location, size, etc.
  • User stories (a direct tie-in to the Understanding Users Hackfest Team!)
  • Benefits of access, including:
    • Donor relations
    • Staying relevant as digital stewards
  • Literature about advocacy

The toolkit would be an online, searchable database that accepts user submissions, has space for online discussions, and is iteratively updated. The project team would include six people to divide and conquer survey design, analyze data, set up the database, seek funding resources, and administer the database from prototype to production. You can review the full proposal here.

We welcome any feedback, comments, or questions in order to provide a comprehensive project scope.

Sam Meister is the Preservation Communities Manager, working with the MetaArchive Cooperative and BitCurator Consortium communities. Previously, he worked as Digital Archivist and Assistant Professor at the University of Montana. Sam holds a Master of Library and Information Science degree from San Jose State University and a B.A. in Visual Arts from the University of California San Diego. Sam is also an Instructor in the Library of Congress Digital Preservation Education and Outreach Program. He can be reached at sam [at] educopia [dot] org.

Researcher Interactions with Born-Digital: Out of the Frying Pan and into the Reading Room

By Julia Kim

This post is the ninth in a bloggERS series about access to born-digital materials.

At SAA 2015, I co-presented on the panel “Out of the Frying Pan and into the Reading Room: Approaches to Serving Electronic Records.” My talk focused on a study of researcher interactions with born-digital collections at NYU’s Fales Library and Special Collections, including unprocessed directories of files, emulated works, and migrated software. Given the enormous resources required for born-digital access, our preliminary study sought to understand how archival researchers received this work. This is something that is relatively unexplored in our field thus far.

NYU began processing the Jeremy Blake Papers and the Exit Art collection in fall 2014 as a way to test and model “access-driven” born-digital workflows. Throughout the year, we focused on ensuring researchers would be able to access collections before the year’s end.

Exit Art
The archives of Exit Art, a non-profit cultural center in Manhattan, were donated to NYU’s Fales Library and Special Collections in 2012 and included analog content, time-based materials, and the organization’s 2TB server. Within the 2TB server, we narrowed the focus for our study to a directory called “Alternative Histories,” which was the name of a major 2010 survey exhibition organized by Exit Art. This small sliver of born-digital files measured 47.6 GB, 18,642 files, and 1,132 folders. Contents included administrative files and correspondence, photographs, promotional materials, and published works. Due to time constraints, the Alternative Histories files were not arranged.

Exit Art finding aid

Jeremy Blake
Jeremy Aaron Blake (1971-2007) was an American digital artist known for his digital c-prints and for the looped animated sequences he called “time-based paintings.” The Jeremy Blake Papers included approximately 400 pieces of media on optical media, digital linear tape, and Jaz drives, as well as external hard drives and copied files. The majority of his working files are in Adobe Photoshop formats, created from the late 1990s to the mid-2000s, and the archive affords a detailed analysis of his working process (more background information here).

Researcher Interviews
Five seasoned Fales Library and Special Collections archival researchers participated in evaluating new forms of digital access through an on-site visit to New York University’s Digital Forensics Lab for one to three hours. They were invited to participate due to their extensive familiarity with archival research.

While a few of them had some familiarity with the artistic content of the collections, their disciplinary focuses included contemporary art history, 18th-century literature, digital humanities, platform studies, and computer science.

Each researcher agreed to be audio- and video-recorded and to the use of Think Out Loud Protocol (TOP), in which they verbalized anything they saw, did, or noticed. Afterward, the interviews were transcribed and the recordings were deleted to ensure anonymity.

While we allotted an hour, most researchers stayed longer to explore and discuss. After an initial overview, they were shown the existing Alternative Histories portion of the Exit Art finding aid. They were introduced to QuickView Plus software and the Alternative Histories files on our designated locked-down researcher laptop. After 30 minutes, the researchers switched to another laptop that was set up with an emulation to access the Blake collection.

Jeremy Blake emulation

Researchers were encouraged to explore the files using the contemporary Windows PC and contemporary Photoshop program, and the software program Forensic Toolkit (FTK), to make comparisons. FTK included a preliminary “bookmarked” arrangement of the imaged Blake files. Researchers were also able to view the same files on older computers for comparison.

Jeremy Blake FTK bookmarks

Below are a few interesting researcher responses from the interviews:

“This seems to principally be a record of how an institution planned events.”

In exploring “Alternative Histories,” some noted that future interest in the records could be in the realm of organizational management and administrative studies, rather than in the artistic content of the exhibition files. An arrangement, then, might lose this record of how a small arts organization planned and executed a major exhibition. The directory organization, the many drafts of letters, the records of the organization’s own archival research, for example, might be obscured by processing.

The possible value of the “original order” of unprocessed administrative files found in Exit Art was appreciated deeply by the researcher quoted above, in spite of some confusion about how to navigate the unprocessed organizational directories and files. Our evaluation of this researcher’s experience lends credence to the idea of multiple, interactive, and flexible arrangements to support multiple understandings of the collection material. These options are not only increasingly possible, but expeditious and in keeping with patron expectations of random access, keyword-driven searches.

“Given the choice between emulated access and contemporary access, I would use contemporary unless I was doing a book-length study.”

Most of the researchers emphasized that, if the choice were between waiting through a significant processing delay for partially processed files or emulations and getting unprocessed, relatively inauthentic files sooner, they would take the latter. Access by any means and ease of access were stressed by the majority. All found the emulation’s authentically slow processing speed and instability impediment enough to prefer contemporary computing system access. With the exception of one researcher with prior archival processing experience, authenticity was not a concern. While all researchers appreciated the emulation, hearing that they ultimately did not prefer using it was a surprise to me.

“This type of art, born-digital, is not taught. If it had been available, I might have changed my course of study.”

This researcher was thrilled at the opportunity emulation represented to access born-digital, technology-dependent artwork. She reflected that had she had the opportunity, she might have studied more art “after the 1990s.” Her enthusiasm was both a validation and a call for further work with researchers to study these newly available collections. Because we publicized this user study online in blogs, more researchers have had the opportunity to come in and study these collections, despite the collections’ “imperfect” and “unfinished” status.

While not conclusive or generalizable by any means, these interviews were necessary first steps to begin understanding how these new forms of access were interpreted and received. More information on this research is coming soon!

Many thanks to Lisa Darms and Donald Mennerich. Their support and encouragement were essential to this project.

Julia Kim is the Digital Assets Manager with the American Folklife Center at the Library of Congress. Previously, she was a National Digital Stewardship Member at New York University where she worked on the project detailed in this post. She received her B.A. from Columbia University and her M.A. in Moving Image Archiving and Preservation from New York University. She can be contacted at juliakim [at] loc [dot] gov and her handle is @ jy_kim29.

When It Comes to Born-Digital, How Well Do We Know Our Users?

By Wendy Hagenmaier, with insights and inspirations from the Understanding Users for Access Hackfest Team

This post is the eighth in a bloggERS series about access to born-digital materials.

What do we know about the needs, motivations, and experiences of users of born-digital archival materials?

“Computing laboratories,” courtesy of Georgia Tech Archives via DPLA

The archival profession has been processing and preserving born-digital collections for years, but as we begin to engage in more conversations about designing solutions for providing access to those materials, it seems we need to ask ourselves how much we actually know about what our users want. Or perhaps, how much we don’t know. In an era of user-centered design, what do archivists and their IT allies still need to learn about users of born-digital materials in order to engineer intuitive mechanisms for providing access to those collections? And how can we go about learning those unknowns? Can we gather data that might help anticipate future access needs or encourage more people to discover and reuse our born-digital materials?

These are the complex and captivating questions the members of the Understanding Users for Access Hackfest Team have been tackling since the 2015 SAA Annual Meeting. The Team’s mission was to develop a proposal for a long-term collaborative project that would empower archivists to better understand users of born-digital materials, and would thereby help to address current obstacles to born-digital access. Archivists from around the country self-selected to form the Team, and Elizabeth Keathley, owner of Atlanta Metadata Authority, volunteered to serve as our Leader. As a member of the research group behind the Born-Digital Access in Archival Repositories study, I served as the Team’s Researcher. We started by exploring data and themes related to understanding users, gathered through the Born-Digital Access in Archival Repositories study. For example, here are two anonymized quotes we examined from the study:

“We’ve done a number of usability studies with our finding aids. But I don’t […] know that anybody’s done this yet with born-digital. And part of the barrier might be that not a lot of places are making it available. Or we’ve just not seen the demand for it. I know of other institutions where they already have reading room access to born-digital materials, but nobody’s asking for it, you know. [Our software developer was] very reluctant to do any sort of real development without knowing what it was that users wanted. So […] we definitely see that it’s a need.”

“I would like to know what people are actually using, or interested in using, how […] they know our material exists, and what’s driving them to make their requests in the first place? […] If they are working on an annotated copy of a literary work, knowing that we actually do maybe need to provide them access to as close to original Word documents as possible for literary manuscripts, versus whether they just need access to a fixed form of the document that we could just port to PDF […]. So knowing both their research question and also the larger project they are working on would certainly be helpful.”

Rachael Dreyer, Head of Research Services for Special Collections at the Pennsylvania State University, and Katie Pierce Meyer, Humanities Librarian for Architecture & Planning at the University of Texas at Austin, documented the Team’s discussion as we identified our research questions and debated various strategies for our project proposal.

In the months following SAA, the Team worked together to articulate and iterate over a proposal for a “Community-Wide Mixed-Methods Needs Assessment of Users of Born-Digital Archives.” We invite you to read and comment on our proposal, and to join us in implementing the ideas it outlines.

As described in the proposal, the roughly three-year-long Needs Assessment would include:

  • the development of mixed-methods research instruments to explore the needs, backgrounds, experiences, and skill sets of users of born-digital archives;
  • a community-wide data gathering effort;
  • a rigorous analysis of the data;
  • and the publication of findings and next steps for improving born-digital access.

The insights gained through the Needs Assessment would empower practitioners to design interfaces for access that are tailored to user needs, to communicate user needs more effectively to software developers, and to provide improved access services to users. The study would also provide opportunities for busy practitioners to participate in important research in a manageable but meaningful way, and for the community to work together to ensure that archives remain agile, relevant, and ready to meet the needs of 21st-century users.

Products of the Needs Assessment might include, but not be limited to, the following:

  • A website that would act as a hub for information about the project as well as results gathered from it. The website would provide access to project documentation and links to the research instruments, scrubbed data set, and reports (hosted in an open access repository).
  • A web form to serve as the interview script and notes document (hosted by an IRB-certified service, such as a university’s instance of Qualtrics).
  • An online survey (also hosted by an IRB-certified service).
    • Interview and survey questions would be broad enough to apply to a wide variety of institutional contexts yet specific enough to gather useful data to describe user needs and experiences. They could be used before, during, or after accessing born-digital materials–this would be determined by the research team during the first six months of the project.
  • Instructions and training materials for using the interview and survey instruments.
  • Open access (possibly peer-reviewed) published analysis of the results of the study, along with an anonymized version of the data set. The analysis could also include user personas and user stories created from the generalized data from the study.

The project team could include:

  • Project Leads: An IRB-certified research team, composed of a diverse group of approximately 10 individuals with varying expertise and from several different institutions (archivists, software developers, user experience designers, PhD students, etc.).
  • Project Participants: Practitioners (ideally, at least 25) who opt in to use the research instruments in their local institutions and contribute data to the project.

Additional details about proposed timeline, budget, dissemination and preservation plans, and sustainability strategy are available (and ready for comment!) in the proposal.

The Team hopes that results from the Needs Assessment would suggest next steps for improving born-digital access and could be cited in the future when archivists apply for further grant funding to expand access to born-digital materials. The research instruments could be revised for a second round of the study, which could be completed in approximately ten years, when many more archives will be providing access to born-digital materials. Results from the first and second rounds could be compared to track born-digital access over time.

Want to help? Let’s do this. We are seeking volunteers for a team that can make this Needs Assessment a reality. Please feel welcome to comment on the proposal and contact me if you’re interested in getting involved (wendy [dot] hagenmaier [at] library [dot] gatech [dot] edu).

Wendy Hagenmaier is the Digital Collections Archivist at the Georgia Tech Archives, where she develops policies and workflows for digital processing, preservation, and access. She received her M.S.I.S. with a focus on digital archives from the University of Texas at Austin. She is Vice President of the Society of Georgia Archivists, Chair of the SAA Issues and Advocacy Roundtable, and steering committee member for the SAA Electronic Records Section and Architectural Records Roundtable.

Processing in Tears Tiers: Applying a Flexible Approach to Born-Digital Materials

By Dorothy Waugh

This post is the seventh in a bloggERS series about access to born-digital materials.

As a digital archivist, I am always looking for ways to streamline my processing workflows because, when faced with a multi-gigabyte hard drive or pile of aging 3.5” floppy disks, time is rarely on my side. The diverse nature of our collections, however, can be a hindrance when trying to build workflows based on efficiency; in spite of our best efforts, there are frequently cases where a collection’s particular characteristics demand greater levels of resources and time. During the past year, archivists at the Stuart A. Rose Manuscript, Archives, and Rare Book Library have developed a tiered approach to processing and access designed to flex in response to a collection’s limitations without obscuring its salient characteristics.

First, unprocessed collections are evaluated based on four criteria:

  1. Quality of data, defined by us as the totality, scope, and viability of the acquired material. Our focus here is on the completeness of the acquisition (an entire hard drive, for example, provides more content and context than a directory of files), the number of years spanned, and the extent to which data can be rendered using modern software.
  2. Authenticity of data, which refers to our ability to establish that it was indeed the donor that created, used, or managed the digital content.
  3. The number of donor restrictions and any concern regarding intellectual property rights.
  4. The extent to which we anticipate particularly high levels of use. Crucially, we want to avoid postponing questions about access until processing is complete, instead evaluating expected use up front so that access will inform processing from the very start.

Based on this evaluation, we assign collections to one of three processing tiers:

Tier One: Low complexity, tool-driven
Collections assigned to this tier will primarily include familiar, homogeneous file formats and will have few donor restrictions or intellectual property rights concerns. The limited complexity of the data will allow for largely tool-driven processing.

Tier Two: Average complexity, combination of tool-driven and manual approaches
Collections assigned to this tier will primarily include familiar file formats, although there may be instances of more challenging file formats. Some donor restrictions may apply.

Tier Three: High complexity, high manual effort
Collections assigned to this tier will include a large number of heterogeneous and challenging file formats. There may be a high level of donor restrictions. The scope of the collection and anticipated high levels of use may demand a more involved approach to arrangement, description, and access.

This diagram illustrates the application of this approach to born-digital materials acquired as part of Lucille Clifton’s papers:

Information in the three left-hand columns provides a broad assessment of the collection’s born-digital material based on our four criteria. This assessment is then used to determine that material should be processed as a tier three collection, meaning that it is a complex collection and will likely require high manual effort during processing.

At the same time, our assessment guides decision-making about access. We’ve very loosely labelled the three levels at which we make collections available to researchers as standard, emulation only, and optimal—although it is important to note that we define these levels based not only on what we are currently able to do but also on what we hope to be able to implement in the future. As a result, consideration of these levels is as much about processing in such a way that does not preclude improved future models of access as it is about providing access right now. So, while optimal is perhaps more “optimal” as it currently stands, these are collections that we have deemed good candidates for a more advanced access point once we have the necessary resources in place. We determined, based on our assessment, that the Lucille Clifton collection warranted an “optimal” approach to access.

Rather than automating our decision making, the initial assessment and subsequent assignment of a processing tier and access level provides a structured vocabulary by which recurring project considerations can be discussed, and a comprehensive rubric by which new projects can be prioritized and planned. Once identified, these tiers influence decision making at almost every stage of our born-digital workflows, including how processed collections are made available to our researchers. As we continue to apply this approach, we hope too to better track our work in order to more accurately allocate the time and resources required to process a collection at each tier.

Note: This blog post borrows in part from a forthcoming article, “Flexible Processing and Diverse Collections: A Tiered Approach to Delivering Born Digital Archives,” written in collaboration by Dorothy Waugh, Elizabeth Roke, and Erika Farr, to be published in the journal Archives and Records. The article will offer additional information on how the tiered approach described in this post has been applied in practice at Emory.

Dorothy Waugh is Digital Archivist at the Stuart A. Rose Manuscript, Archives, and Rare Book Library at Emory University, where she is responsible for the management of born-digital manuscript and archival material.

Access to Born-Digital Archives at the Russell Library

By Adriane Hanson

This post is the sixth in a bloggERS series about access to born-digital materials.

Institutional Context
The Richard B. Russell Library for Political Research and Studies at the University of Georgia has been providing access to digital archives for about a year and a half. We needed something that was free, web-based (for broader access), and integrated with our existing workflows for paper (to keep it simpler). We ended up using our finding aids for description, our existing circulation system (Aeon) to track requests, and Google Drive to provide access to files.

Finding Aids
Researchers learn about digital files through our finding aids. If there is a series with related papers, we list the digital files at the end of that series. We are trying to balance the need to keep things together intellectually (rather than having a separate “electronic records” series) without taking on the labor-intensive work of integrating the folder list for papers and for digital files.

Container list from the Davison Papers, showing description of digital files.

Digital files are described in the aggregate, not at the item level. For instance, when describing files from a server, only the first 1-2 levels of folders are included in the finding aid. In addition to being a time-saving measure, this makes the finding aid more usable. Researchers get an overview of what the collection contains rather than an overwhelmingly long list of filenames. When folder titles are insufficient for description, we will add a scope/content note for the folder and/or link to a directory print of the contents of the folder. For examples, see the Eric Johnson Papers or the Eleanor Smith Papers.
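
As a rough illustration of aggregate description, the short Python sketch below lists only the top levels of folders beneath a transfer directory, which is the kind of overview a finding aid might present. It is not the Russell Library's actual tooling; the function name top_level_folders and the max_depth parameter are hypothetical and chosen only for this example.

```python
from pathlib import Path


def top_level_folders(root, max_depth=2):
    """List folders under root, going no deeper than max_depth levels.

    Hypothetical helper for illustration: files are ignored, and folder
    paths are returned relative to root, in sorted depth-first order.
    """
    root = Path(root)
    found = []

    def visit(folder, depth):
        # Stop descending once the requested depth has been listed.
        if depth > max_depth:
            return
        for child in sorted(folder.iterdir()):
            if child.is_dir():
                found.append(child.relative_to(root).as_posix())
                visit(child, depth + 1)

    visit(root, 1)
    return found
```

Calling top_level_folders on a directory of access copies with max_depth=2 would return the first two levels of folder titles, which could then be reviewed and edited by an archivist before being added to a container list.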

The Workflow
The process for providing access to digital files is summarized below and described in more detail in our access policy. This policy will soon be updated to reflect changes in how we use Google Drive.

  1. The researcher requests digital files from the finding aid, just like they do for paper.
  2. The request is routed to a queue in Aeon that I monitor daily.
  3. After some communication with the researcher, I upload a copy of the files to a Google Drive account and share them with the researcher.
  4. The request is changed to “checked out” in Aeon.
  5. The researcher has two weeks to view the files. After that, I delete the files from Drive and mark the request finished in Aeon.
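
The numbered steps above amount to a small request lifecycle, which can be sketched in Python as follows. This is only a model of the states and the two-week window described in the steps; it does not call Aeon or Google Drive, and the class name AccessRequest, the "requested" and "finished" labels, and the choice to close a request on the fourteenth day are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# The workflow gives researchers two weeks to view shared files.
VIEWING_PERIOD = timedelta(weeks=2)


@dataclass
class AccessRequest:
    """Hypothetical record of one researcher's request for digital files."""
    researcher: str
    collection: str
    status: str = "requested"
    shared_on: Optional[date] = None

    def share_files(self, today):
        """Steps 3-4: files are shared and the request is checked out."""
        if self.status != "requested":
            raise ValueError(f"cannot share files for a {self.status} request")
        self.shared_on = today
        self.status = "checked out"

    def expires_on(self):
        """Date on which the two-week viewing period ends."""
        return self.shared_on + VIEWING_PERIOD

    def close_if_expired(self, today):
        """Step 5: once the viewing period has ended, finish the request.

        Assumption: the request closes on the expiry date itself. Deleting
        the shared files is left to the archivist and is not modeled here.
        """
        if self.status == "checked out" and today >= self.expires_on():
            self.status = "finished"
            return True
        return False
```

A daily check could loop over open requests and call close_if_expired with the current date, flagging the ones whose shared files are due to be removed.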

Google Drive as a Virtual Reading Room
In the first iteration of this process, we used Google Drive like a virtual reading room. Permissions were set to view-only and files could not be downloaded or printed. We started with this strategy to address concerns at our library about properly protecting copyright. It worked well for researchers who needed basic access to files but limited the functionality of some file formats (e.g., spreadsheets were frozen as tables) and did not allow researchers to save search results or integrate what they were finding with copies obtained from other institutions.

Photographs from Davison Papers shared with a patron.

Google Drive as a Delivery System
This year, we developed a policy to allow digital cameras in our reading room. During those conversations, we decided that providing copies of born-digital archival materials for personal research use would be permissible under the same fair-use provision of copyright law that allows the cameras. So now, the researcher signs a form agreeing to abide by copyright law and our policies, and then I provide full access to the files via Google Drive, including allowing downloads.

We are happy with this process, at least for now. Ultimately, I would like a system that can pull from our access copies storage automatically and offer researchers tools for viewing and analysis. But while we are working on that, this workflow lets us provide reasonable access to everything in our holdings.

Adriane Hanson is Digital Curation and Processing Archivist at the Richard B. Russell Library for Political Research and Studies at the University of Georgia, a position she has held for 3 years. She can be reached at ahanson [at] uga [dot] edu.

Agile for Access: Iterative Approaches to Solving Born-Digital Access

By Jessica Meyerson

This post is the fifth in a bloggERS series about access to born-digital materials.

At the 2015 SAA conference in Cleveland, the Agile for Access Hackfest Team focused on creating a collaborative project that introduces agile development principles as a strategy for overcoming obstacles to born-digital access.

To start the discussion, the Born-Digital Access Research Team provided a baseline understanding of agile and its growth in popularity throughout the 1990s. The Manifesto for Agile Software Development, written and published in 2001, emphasizes “individuals and interactions,” “working software” (working solutions), “customer collaboration,” and “responding to change.” In its most abstract and broadly applicable form, agile shares many of the basic tenets of design thinking or design research: empowerment, collaboration, rapid/frequent iterations, and continual planning, in place of one monolithic plan executed from start to finish.

Our Hackfest Team consisted of a mix of archivists with different levels of experience and exposure to agile development principles, so one focus of our discussion was how to communicate what agile is and how to apply it in an archival setting. Erin Faulder (Archivist for Digital Collection at Tufts University) volunteered to serve the group as our fearless Hackfest Team Leader, a role responsible for leading the discussion during the in-session activity and working with research team members to complete the proposal during Phase II. Sarah Bost (Student Success Archivist at the University of Arkansas) and Amy Wickner (Digital Projects Graduate Assistant at the University of Maryland) graciously volunteered to take notes and record observations, which were later compiled into the first draft of the proposal.

The Agile for Access Hackfest Team collaboration resulted in a project proposal entitled “Why Agile Works in Archives.” The purpose of this project will be to provide a set of resources for archivists to learn about and implement agile in their own institutions, emphasizing rapid iteration to improve digital access solutions and embracing “good enough” over “perfect.” In order to make this toolkit useful for archivists, this project would highlight real world agile case studies and best practices for working with born-digital archival materials, and include the following deliverables:

  • Agile toolkit:
    • Tool for determining whether agile is a good fit for your project–this could take the form of a checklist for project assessment
    • Use cases and case studies covering a range of professional settings, from large government and/or educational institutions to lone arrangers working without the support of information technology professionals
    • Agile quick-start guide, covering fundamental concepts, guidelines, and FAQs
    • Foundational readings
  • Platform for sharing experiences with implementing agile:
    • Reports on outcomes of agile projects in institutions
    • Remixes of the toolkit for particular audiences or contexts

You’re invited to view and comment on the full project proposal here.

Reflecting on my own participation in Phases I and II of the Born-Digital Access Hackfest, I felt that even though it was challenging to balance Phase II participation against other professional commitments, the Hackfest model proved to be an effective way to incubate collaboration–providing a well-defined structure in which Hackfest team members could explore strategies and exchange ideas.

As Daniel Johnson wrote in his Archivist Bootcamp for Access post, “There is still a lot of work to do.”

We are looking for a project team to develop this Agile for Access proposal. This project team will be responsible for developing/designing the agile toolkit; identifying possible hosts/distribution platforms; documenting audience use cases that may correspond to toolkit modules (agile for administration, agile for processing archivists, etc.); designing a project sustainability plan; locating funding sources; and promoting the project. At this time, we are seeking volunteers for the project team, as well as feedback on all aspects of the proposal. We are in the beginning stages of this project and want it to accurately assess the needs of the community to provide access to born-digital materials. Please feel welcome to send comments, ideas, and questions to the Agile for Access Hackfest Team Leader, Erin Faulder (erin.faulder [at] tufts [dot] edu), and Researcher, Jessica Meyerson (j.meyerson [at] austin [dot] utexas [dot] edu).

Many thanks to Agile for Access Hackfest Team member Martin Gengenbach for his contributions to this post.

Jessica Meyerson is the Digital Archivist at the Dolph Briscoe Center for American History at the University of Texas at Austin, focused on research-in-practice and building community infrastructure to support long-term access of digital material on and off campus. Meyerson currently serves as steering committee member for Texas Archival Resources Online and co-investigator on the IMLS-funded Software Preservation Network project.