Bits and Baby Steps: RAO Engages with Electronic Collection Material

By Stacey Lavender and Rachael Dreyer

This post is the fourteenth in a bloggERS series about access to born-digital materials.

Officially formed and charged in October 2014, the Society of American Archivists Reference, Access, and Outreach (RAO) Section’s Access to Electronic Records Working Group aimed to evaluate current practices and approaches to providing researchers with access to born-digital and electronic material. There were four initial parts of the working group’s charge:

  1. Conduct initial research to determine on which key focus areas related to reference, access, outreach, and preservation work the working group shall focus its efforts.
  2. Compile a bibliography of key resources, including publications, presentations, and workshops, which explore how archival institutions provide access to born-digital and electronic records. Other organizations active with electronic records will also be included in this resource list.
  3. Conduct a survey of the archival profession regarding current practices and attitudes towards providing access to born-digital and electronic records.
  4. Compile and analyze the survey data in order to identify challenges and opportunities which RAO can address.

The fourth part of the charge is currently underway, and while the data analysis isn’t complete, some big-picture trends have emerged.

While just about every respondent indicated that their institution was providing some access to electronic content (both digitized and born-digital), 89.5% of respondents indicated that at least some of their electronic materials are currently inaccessible to patrons.

A significant portion of this inaccessibility can be attributed to the same reasons that most institutions have some inaccessible analog materials. 18% of respondents reported a lack of time and staff resources as the cause of their electronic backlog, and 16% cited donor and/or legal restrictions as a contributing factor. However, the most common response by far (62%) came from those having trouble providing access to specific formats of materials. This problem of formats (dealing with obsolete media, obsolete hardware, and the threat of media degradation), a prevalent and ongoing obstacle to providing access to electronic records, was perhaps the strongest trend revealed in the survey.

Another trend that the survey highlighted was the simple fact that respondents are on the lookout for resources and education opportunities related to access to electronic records, and they’re open to using many different options.

Large percentages of respondents indicated an interest in participating in workshops (48.8%), viewing web resources (40.7%), consulting standards/guidelines (34.9%), and receiving professional assistance from archivists or IT professionals (43%). So it’s clear that the interest in educational opportunities and resources is there; we just need to figure out how best to meet that need.

It is also worth noting that the desire to develop partnerships with the IT professionals in our institutions was something that came up more than once in the survey.

In addition to the 43% of respondents mentioned above who were interested in professional assistance from IT professionals, about 84% cited lack of IT support as an obstacle of some concern when it came to providing access to their electronic materials.

We’ll delve even further into these trends (and some others!) in our survey report, which we plan to have out in the next couple of months. Overall it was very heartening to see that for the most part we’re dealing with similar problems, which means we can tackle them together!

Since so many of the concerns around access to born-digital materials focus on the technological constraints and requirements, end-user access has been relegated to a lower rung on the ladder. But here’s the thing: to get to the higher rungs on the ladder, you have to have a stable base with those lower rungs! So, increasingly, the focus has shifted to the end-users’ needs, as well as the need for appropriate levels of arrangement and description. Public-services archivists are keenly aware of the back-end processes—good arrangement and description is essential to assist researchers in navigating those records.

The working group hopes to help RAO archivists, as well as anyone else in the profession, to take concrete steps toward providing research access to born-digital and electronic records at their institutions. If you have ideas or projects that you would like to see the RAO working group take on, we would love to hear from you!

Rachael Dreyer is currently Head of Research Services for Special Collections at the Pennsylvania State University. She was formerly a reference archivist at the American Heritage Center at the University of Wyoming. She’s very interested in balancing the needs of researchers with the technological “challenges” that born-digital collections present. She is Co-Chair of the RAO Access to Electronic Records Working Group.

Stacey Lavender recently completed a two-year stint as the Houston Arts and History Archives Fellow at the University of Houston. She’s most interested in working with born-digital materials, finding new and technologically innovative ways to provide access, and participating in public outreach initiatives to promote collections. She is Co-Chair of the RAO Access to Electronic Records Working Group.

Born-Digital and in the Virtual Reading Room

By Christine Kim

This post is the thirteenth in a bloggERS series about access to born-digital materials.

At the University of California, Irvine (UCI), we wanted to provide access to the rapidly growing volume of born-digital records but couldn’t really justify the time it would take to look through every single item to evaluate its relevance. This balance of providing access to researchers while simultaneously evaluating the digital content is a huge challenge we’re faced with as we reconcile collecting and preserving with promoting access to information.

What did we come up with? The Virtual Reading Room (VRR).

In order for me to properly introduce the VRR and explain how it truly fits into our growing suite of digital resources, we need to back up a few paces. Let’s start from our digital collection page, UCIspace @ the Libraries, which we like to call UCIspace for short. UCIspace is where our digital collections currently live (though we have been working on a big move with the California Digital Library to transition to Calisphere) and includes a variety of materials, such as digitally reformatted collections (images, audio-visual files, oral histories, PDF documents, etc.), as well as born-digital collections. UCIspace is powered by DSpace, a highly customizable open source software package that preserves and enables open access to all sorts of digital files and is administered by the super amazing and incredibly talented IT folks at the UCI Libraries.

We currently have 13 collections on UCIspace, of which six collections include born-digital materials. The six collections may include both digitally reformatted as well as born-digital items, which all co-habitate in peace. They are all now digital and treated with equal amounts of care.

These six collections include born-digital materials. Total views from date of creation to February 1, 2016.

 

Okay, so we have digitally reformatted and born-digital materials available through UCIspace. Then what’s the VRR all about? Well, our VRR is a virtual space that resides within UCIspace in order to provide an extra layer of security for certain items or sub-collections.

We were acquiring volumes and terabytes of hard drives and digital files and raw footage, and our collecting pace was not going to slow down to let us catch up with identifying its digital content. We learned from our physical backlog that if we waited, this material would never be available for access. But we also knew that we couldn’t just make this stuff available. Could we apply MPLP practice to born-digital materials?

That’s where the VRR comes in. We thought, hey, what if we have people agree to certain terms and conditions, and then put these items behind a login so that folks can have access once they agree to some conditions? We like to dream big. And so with that thought, we investigated with our IT unit to see what the possibilities and limitations were. As it turns out, they like to turn dreams into reality.

Any item within the VRR has a certain level of privacy–you can’t just go into the collection page and see the image. The item is “locked” and resides behind a login screen.

Items from the Mark Poster born-digital files, 1985-2009. Top item is unrestricted. Bottom item indicates it is accessible through the Virtual Reading Room.

Selecting an item within the VRR will prompt a login screen. In order to gain access, both remote researchers and those using reading room public workstations must request access via the VRR Registration Form.

Application for the Virtual Reading Room in UCISpace @ the Libraries.

Submitting this form provides the user with a login and password, but the most important factor is that the researcher agrees to certain terms and conditions as part of requesting access to VRR materials. These terms are indicated within the “Rules of Use.”

A check-box counts as a digital signature!

The “Rules of Use” lists a handful of conditions of use, including the statement: “All digital content in UCIspace @ the Libraries is made publicly available for use in research, teaching, and private study.” The complete “Rules of Use” document is available for all to read.
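For readers who think in code, the access rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not how DSpace or UCIspace actually implements it; the `Item` and `Researcher` classes, their fields, and the `can_view` function are all hypothetical names made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """A digital object in a collection (hypothetical model)."""
    title: str
    in_vrr: bool  # True if the item sits behind the VRR login


@dataclass
class Researcher:
    """A person requesting access (hypothetical model)."""
    username: str
    logged_in: bool = False
    agreed_to_rules: bool = False  # checked the "Rules of Use" box


def can_view(item, researcher=None):
    """Unrestricted items are open to all; VRR items need a login
    from someone who has agreed to the Rules of Use."""
    if not item.in_vrr:
        return True
    return (
        researcher is not None
        and researcher.logged_in
        and researcher.agreed_to_rules
    )


open_item = Item("Unrestricted file", in_vrr=False)
locked_item = Item("VRR file", in_vrr=True)
registered = Researcher("example_user", logged_in=True, agreed_to_rules=True)

print(can_view(open_item))                # anyone can see this
print(can_view(locked_item))              # anonymous visitor is stopped
print(can_view(locked_item, registered))  # registered researcher gets in
```

The point of the sketch is that agreement to the terms, not identity vetting, is the gate: the same check applies to remote researchers and to reading room workstations.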

What type of collections are available in the VRR? For now, we have two collections with items in the VRR, the Mark Poster Born Digital files and the Richard Rorty Born Digital Files. One key aspect is that the entirety of the collection does not have to be in the VRR–DSpace allows us to create a collection, and manage sub-collections so that only selected portions of the collections are behind a login screen.

What are the results of the VRR? Well, here are some use statistics.

From date of creation to February 1, 2016.

How did we make this happen? Having awesome teammates definitely helps. But along those lines, it also helps to have a clear understanding of how a great team operates. It comes down to communication, dreaming big, and having a mutual goal of providing excellent public services.

Please reach out to Christine Kim (christik [at] uci [dot] edu) if you have any questions about the Virtual Reading Room, UCIspace, or anything else about UCI Libraries Special Collections & Archives.

Christine Kim is the Public Services Assistant at the UC Irvine Libraries, Special Collections & Archives. She is responsible for connecting researchers with archival and special collections materials, and delights in sharing the resources uniquely available at UC Irvine. She holds an MLIS from San Jose State University and a BA in both History and Film & Media Studies from UC Irvine.

Ensuring Born-Digital Access at the Seeley G. Mudd Manuscript Library

By Rossy Mendez

This post is the twelfth in a bloggERS series about access to born-digital materials.

By exploring the contents of a drive, archivists can obtain information about the contents of folders, the size of files, and details of creation. They can determine what hierarchies are in place by observing a file’s structure and nesting. Despite the richness of metadata available for digital records, archival description remains one of the biggest challenges of processing born-digital collections.

One of the ways that the Seeley G. Mudd Manuscript Library of Princeton University has maximized EAD description has been to explore the definitions of EAD elements. Before I address how the Mudd Manuscript Library is doing this, I would like to provide a bit of background on the library.

The Seeley G. Mudd Manuscript Library is part of the Rare Books and Special Collections Division at Princeton University. The Library houses and provides access to the university archives and public policy collections. In addition to 30,000+ linear feet of physical records, there are several collections of born-digital records that include student newsletters, the records of a former dean of the graduate school, and more recently, student activism.

Nearly all the staff at Mudd participate in the reference rotation. The benefit to this all-hands-on-deck approach is that the technical services team gains greater insight into how patrons use the collections and finding aids.

At Mudd, patrons can access born-digital collections remotely through the finding aids website. Individuals can filter for digital material by selecting the “Available online” option from the faceted search menu or by accessing individual folders or items within a collection. By clicking “View Content,” patrons are linked directly to PDFs, images, and even videos. For restricted records, patrons with the appropriate credentials can use their username and password to access files.

Screenshot of Finding Aid for Tiger Hockey Email Newsletters, Princeton University Archives

Archivists at Mudd believe that description is an important part of providing access.

Three principles should drive the creation of born-digital description:

  • A user should be able to know what and how much born-digital content exists.
  • A user should be able to know where the digital content lives within the finding aid and have easy access to that content.
  • Moreover, a user should be able to deduce the context of record creation.

One of the most significant changes Mudd archivists made to local EAD description was to the <extent> element. Initially, archivists conceived of the <extent> field as the physical space files occupied in a drive: records were indicated in measures of files and bytes. However, a significant number of patrons do not understand this information. Replacing bytes with “Digital folders” and “Digital files” as a unit of measurement allowed patrons to learn about hierarchies and the arrangement of collections. Furthermore, the inclusion of the word “digital” provided a further indication of the nature of the material.
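To make the shift from bytes to folders and files concrete, here is a minimal Python sketch that walks a directory and phrases the result as a patron-facing extent statement. It is an illustration only, not the tool Mudd archivists use; the function names `describe_extent` and `_plural` are hypothetical, and the exact wording of the statement is an assumption based on the units named above.

```python
import os


def _plural(count, noun):
    """Return e.g. '1 digital file' or '2 digital files'."""
    return f"{count} {noun}" if count == 1 else f"{count} {noun}s"


def describe_extent(root):
    """Count the folders and files beneath root.

    Returns a patron-facing extent statement plus the total size in
    bytes, which staff may still want to record internally.
    """
    folder_count = 0
    file_count = 0
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        folder_count += len(dirnames)  # subfolders found at this level
        file_count += len(filenames)
        for name in filenames:
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
    statement = (
        f"{_plural(folder_count, 'digital folder')}, "
        f"{_plural(file_count, 'digital file')}"
    )
    return statement, total_bytes
```

Calling `describe_extent` on a transferred directory returns both values, so the byte count can stay in internal records while the folder-and-file statement goes into the finding aid.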

In addition to <extent>, the <unittitle> element plays an important role in differentiating between digitized and born-digital content. Since the access path is the same for all content, the word “digital” in the title statements at the series and subseries level provides a quick way to tell which sections have born-digital content.

Lastly, the <phystech> element ensures that patrons have as much information about the creation of the digital record as possible (including, for example, the type of computer used) and can address potential compatibility issues.
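As a rough illustration of how the three elements fit together, the following Python sketch builds a small EAD-style component using the standard library's `xml.etree.ElementTree`. The element nesting follows EAD 2002 (`<extent>` inside `<physdesc>` inside `<did>`, with `<phystech>` as a sibling of `<did>`), but the title, extent, and technical note are invented sample values, and `build_component` is a hypothetical helper rather than anything from Mudd's actual workflow.

```python
import xml.etree.ElementTree as ET


def build_component(title, extent, phystech_note):
    """Build a series-level <c> with unittitle, extent, and phystech."""
    c = ET.Element("c", {"level": "series"})
    did = ET.SubElement(c, "did")
    ET.SubElement(did, "unittitle").text = title
    physdesc = ET.SubElement(did, "physdesc")
    ET.SubElement(physdesc, "extent").text = extent
    phystech = ET.SubElement(c, "phystech")
    ET.SubElement(phystech, "p").text = phystech_note
    return c


# All values below are made-up examples, not real collection data.
component = build_component(
    "Digital newsletters",
    "3 digital folders, 42 digital files",
    "Files were created on a desktop computer and saved as PDFs.",
)

ET.indent(component)  # pretty-printing helper; requires Python 3.9+
print(ET.tostring(component, encoding="unicode"))
```

Putting “digital” in both the title and the extent means a patron scanning the finding aid sees the nature of the material twice before ever clicking through.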

As archivists we should aim to provide complete access to our records. Good archival description relieves patrons of the burden of investigating technical aspects and allows them to focus on what is most important: discovering information and sharing it with others.

Rossy Mendez was formerly a Public Services Project Archivist at the Seeley G. Mudd Manuscript Library. Currently, she is a Project Archivist at the Solomon R. Guggenheim Museum, where she is working on processing the museum’s audio-visual collections and collaborating in the creation of a museum-wide metadata schema and the implementation of ArchivesSpace.

A Multi-Faceted Challenge: Breakouts on Access at the BitCurator User Forum

By Matthew Farrell

This post is the eleventh in a bloggERS series about access to born-digital materials.

On January 15, 2016, the BitCurator Consortium (BCC) held its second annual User Forum at the Louis Round Wilson Library at UNC-Chapel Hill. Representatives from BCC member institutions joined non-member archivists and librarians engaged in digital archival work for a day of discussion regarding born-digital archives and the application of digital forensics to archival practice. Panel descriptions and links to public group notes can be found on the BCC website. A longer recap and reflection of the day is forthcoming on the BCC website.

Participants in the User Forum came from varied contexts–some hold digital archivist job titles, others are researchers active in the development of digital forensics tools, and still others are professionals from curatorial and administrative backgrounds. A major goal of the program committee was to make the User Forum engaging for participants from any background. A breakout session titled “Where Should Access Happen?” leveraged the variety in participants particularly well. The participants broke into four groups, each discussing access to born-digital materials through a different lens: (1) the environment and/or tools that should be available to a researcher in the reading room, (2) what sort of staff resources should be considered, (3) where and how should access be facilitated, and (4) how much and what kinds of metadata should be made available to users prior to and during a research visit.

The group notes are here for posterity. What follows are a few themes that were discussed commonly across multiple groups.

The group discussing the environment and tools that could be available to researchers discussed the amount of control exerted over both collection materials and the access environment. For reading room access, the default discussed tends to be strict controls over materials due to the nature of the content. Strict control as default is shown in another group’s discussion of each participant’s current reading room environment: representatives of Duke University, Penn State University, North Carolina State University, Washington, and the University of Virginia all reported the use of off-network access terminals with controls limiting application access and restricting copying and/or printing. There appears to be an all-or-nothing approach to access. If a set of materials has some necessity for access controls (restricted or sensitive info, rights issues, etc.), access is only available in a heavily mediated environment. On the flip side are those materials that have no restrictions at all, which can be (and sometimes are) exposed online.

Further, determining what sort of environment and what toolset to provide researchers in a reading room is hindered by the small number of researchers currently requesting electronic collection materials. The low number of requests is almost certainly affected by the obscurity of those materials in finding aids and catalog records. Without a history of requests for access, it is impossible to determine with any certainty what researchers want to do with collection materials in a reading room environment. Three groups discussed exposing metadata about digital objects via finding aids as central to offering access in any meaningful way. The group discussing metadata and processing touched on the balance between processing a set of digital objects at a minimum level to make them available and offering the maximum possible amount of metadata about those materials. Discussion generally accepted that over-processing of materials could happen. Though it is unclear what “over-processing” means for a given collection, processing archivists assuming what researchers will want to see and do with materials is a potential indicator.

Over-processing materials, coupled with heavy restrictions to access environments, leaves institutions at risk of providing researchers with glorified e-readers. Such access does little to support digital humanities research. The group discussing staffing for access talked about an example of Emory using Voyant and a poet’s collection of electronic text in a series of instruction sessions (though the project was not officially part of the program at the User Forum, I’m pretty sure Emory archivists would be happy to provide more information). Another group discussed basic tools that could be useful for many types of materials, including versatile viewing applications such as Quickview Plus as well as hex editors. Providing a hex editor to researchers was supported by two independent anecdotes. More advanced tools will likely vary depending on collections, though allowing a researcher to run file identification software or have access to the metadata output of such software is potentially of use. For these scenarios, should repositories be responsible for vetting additional tools and applications? One participant suggested researchers submit code or tools to run across a data set in advance of their visit for inspection by repository staff.

While this is only a small window on the User Forum, I think there are at least a couple of areas discussed above that warrant further immediate work. I’m curious to get a picture of what metadata, exactly, BCC member institutions are exposing in their finding aids, and whether there is room for creating a common set of guidelines for including forensic metadata in descriptive systems. Hopefully more information will come soon on that front. There is a wealth of engaging thoughts in the notes for this session as well as the rest of the User Forum’s group notes, and the BCC website. Any questions about the BitCurator Consortium can be sent to me or Sam Meister.

Matthew Farrell is digital archivist at the David M. Rubenstein Rare Book & Manuscript Library at Duke University and member of the program committee for the 2016 BitCurator User Forum.

Making the Case: Advocacy for Born-Digital Access

By Sam Meister, with help from the Advocacy for Access Hackfest Team

This post is the tenth in a bloggERS series about access to born-digital materials.

"Archery," courtesy of Bryn Mawr College Special Collections.

The Advocacy for Access Hackfest Team members used the preliminary report data from the Born-Digital Access in Archival Repositories study and their own professional experiences in order to develop a project proposal that would provide direct and actionable strategies for making the case to provide born-digital access.

The Team used the preliminary report to find the top three gaps in the categories of Business Analysis, Resource Allocation, and Advocacy from the survey and interview data:

A. Need for additional time and people dedicated to born-digital work
B. Need for more IT resources, staffing, and technical expertise
C. Need for tactics for convincing administrators and funders to allocate greater resources to born-digital archives programs, including long-term sustainability funding

Sparked by these themes, our discussion emphasized the strong interconnection among points A, B, and C. Our group concluded that if one is not able to gain stakeholder buy-in from the administrative or institutional level (point C), points A and B would be very difficult to achieve. The Team concluded that digital stewards need a range of tactics in order to provide administrators with salient information to make the case.

We agreed that a toolkit for an advocacy program, developed through additional survey data and user-submitted strategies, would provide:

  • An environmental scan of tactics, both within the field and outside of it, to advocate for access needs, which would cover:
    • Successes and failures
    • Starting the conversation
    • Example elevator speeches
    • Data predictions for the future
  • Discrete information about resources, which would include data about:
    • Storage setup and costs
    • Tools (e.g. workstation) setup and costs
    • Personnel
    • Platforms
  • The ability to match what an institution needs based on location, size, etc.
  • User stories (a direct tie-in to the Understanding Users Hackfest Team!)
  • Benefits of access, including:
    • Donor relations
    • Staying relevant as digital stewards
  • Literature about advocacy

The toolkit would be an online, searchable database that accepts user submissions, has space for online discussions, and is iteratively updated. The project team would include six people to divide and conquer survey design, analyze data, set up the database, seek funding resources, and administer the database from prototype to production. You can review the full proposal here.

We welcome any feedback, comments, or questions in order to provide a comprehensive project scope.

Sam Meister is the Preservation Communities Manager, working with the MetaArchive Cooperative and BitCurator Consortium communities. Previously, he worked as Digital Archivist and Assistant Professor at the University of Montana. Sam holds a Master of Library and Information Science degree from San Jose State University and a B.A. in Visual Arts from the University of California San Diego. Sam is also an Instructor in the Library of Congress Digital Preservation Education and Outreach Program. He can be reached at sam [at] educopia [dot] org.