Integrating Environmental Sustainability into Policies and Workflows

by Keith Pendergrass

This is the first post in the BloggERS Another Kind of Glacier series.


Background and Challenges

My efforts to integrate environmental sustainability and digital preservation in my organization—Baker Library Special Collections at Harvard Business School—began several years ago when we were discussing the long-term preservation of forensic disk images in our collections. We came to the conclusion that keeping forensic images instead of (or in addition to) the final preservation file set can have ethical, privacy, and environmental issues. We decided that we would preserve forensic images only in use cases where there was a strong need to do so, such as a legal mandate in our records management program. I talked about our process and results at the BitCurator Users Forum 2017.

From this presentation grew a collaboration with three colleagues who heard me speak that day: Walker Sampson, Tessa Walsh, and Laura Alagna. Together, we reframed my initial inquiry to focus on environmental sustainability and enlarged the scope to include all digital preservation practices and the standards that guide them. The result was our recent article and workshop protocol.

During this time, I began aligning our digital archives work at Baker Library with this research as well as our organization-wide sustainability goals. My early efforts mainly took the form of the stopgap measures that we suggest in our article: turning off machines when not in use; scheduling tasks for off-peak network and electricity grid periods; and purchasing renewable energy certificates that promote additionality, which is done for us by Harvard University as part of its sustainability goals. As these were either unilateral decisions or were being done for me, they were straightforward and quick to implement.

Making more significant environmental gains along the lines of the paradigm shift we propose in our article, however, requires greater change. This, in turn, requires more buy-in and collaboration within and across departments, which often slows the process. In the face of immediate needs and other constraints, it can be easy for decision makers to justify deprioritizing the work required to integrate environmental sustainability into standard practices. Given the urgency of the climate and other environmental crises, this can be quite frustrating. However, with repeated effort and clear reasoning, you can make progress on these larger sustainability changes. I found that success most often followed continual reiteration of why I wanted to change a policy, procedure, or standard practice, with a focus on how the changes would better align our work and department with organizational sustainability goals. Another key argument was showing how our efforts toward environmental sustainability would also result in financial and staffing sustainability.

Below, I share examples of the work we have done at Baker Library Special Collections to include environmental sustainability in some of our policies and workflows. While the details may be specific to our context, the principles are widely applicable: integrate sustainability into your policies so that you have a strong foundation for including environmental concerns in your decision making; and start your efforts with appraisal as it can have the most impact for the time that you put in.

Policies

The first policy in which we integrated environmental sustainability was our technology change management policy, which controls our decision making around the hardware and software we use in our digital archives workflows. The first item we added to the policy was that we must dispose of all hardware following environmental standards for electronic waste and, for items other than hard drives, that we must donate them for reuse whenever possible. The second item involved more collaboration with our IT department, which controls computer refresh cycles, so that we could move away from the standard five-year replacement timeframe for desktop computers. The workstations that we use to capture, appraise, and process digital materials are designed for long service lives, heavy and sustained workloads, and easy component change out. We made our case to IT—as noted above, this was an instance where the complementarity of environmental and financial sustainability was key—and received an exemption for our workstations, which we wrote into our policy to ensure that it becomes standard practice.

We can now keep the workstations as long as they remain serviceable and work with IT to swap out components as they fail or need upgrading. For example, we replaced our current workstations’ six-year-old spinning disk drives with solid state drives when we updated from Windows 7 to Windows 10, improving performance while maintaining compliance with IT’s security requirements. Making changes like this allows us to move from the standard five-year to an expected ten-year service life for these workstations (they are currently at 7.5 years). While the policy change and subsequent maintenance actions are small, they add up over time to provide substantial reductions in the full life-cycle environmental and financial costs of our hardware.

We also integrated environmental sustainability into our new acquisition policy. The policy outlines the conditions and terms of several areas that affect the acquisition of materials in any format: appraisal, agreements, transfer, accessioning, and documentation. For appraisal, we document the value and costs of a potential acquisition, but previously had been fairly narrow in our definition of costs. With the new policy, we broadened the costs that were in scope for our acquisition decisions and as part of this included environmental costs. While only a minor point in the policy, it allows us to determine environmental costs in our archival and technical appraisals, and then take those costs into account when making an acquisition decision. Our next step is to figure out how best to measure or estimate environmental impacts for consistency across potential acquisitions. I am hopeful that explicitly integrating environmental sustainability into our first decision point—whether to acquire a collection—will make it easier to include sustainability in other decision points throughout the collection’s life cycle.

Workflows

In a parallel track, we have been integrating environmental sustainability into our workflows, focusing on the appraisal of born-digital and audiovisual materials. This is a direct result of the research article noted above, in which we argue that focusing on selective appraisal can be the most consequential action because it affects the quantity of digital materials that an organization stewards for the remainder of those materials’ life cycle and provides an opportunity to assign levels of preservation commitment. While conducting in-depth appraisal prior to physical or digital transfer is ideal, it is not always practical, so we altered our workflows to increase the opportunities for appraisal after transfer.

For born-digital materials, we added an appraisal point during the initial collection inventory, screening out storage media whose contents are wholly outside of our collecting policy. We then decide on a capture method based on the type of media: we create disk images of smaller-capacity media but often package the contents of larger-capacity media using the BagIt specification (unless we have a use case that requires a forensic image) to reduce the storage capacity needed for the collection and to avoid the ethical and privacy issues previously mentioned. When we do not have control of the storage media (network attached storage, cloud storage, etc.), we make every attempt to engage with donors and departments to conduct in-depth appraisal prior to capture, streamlining the remaining appraisal decision points.
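To make the packaging step concrete, here is a minimal sketch of what a BagIt bag looks like on disk: a data/ payload directory, a bagit.txt declaration, and a checksum manifest. It is an illustration only, not the tooling used at Baker Library; the function name make_minimal_bag and the example file names are hypothetical, and a production workflow would more likely use an established BagIt library and add optional tag files such as bag-info.txt.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def make_minimal_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Copy source_dir into bag_dir/data and write the required BagIt tag files.

    Hypothetical helper for illustration; bag_dir/data must not already exist.
    """
    payload_dir = bag_dir / "data"
    shutil.copytree(source_dir, payload_dir)  # also creates bag_dir if needed

    # Bag declaration required by the BagIt specification (RFC 8493).
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Payload manifest: one "<checksum>  <path relative to the bag>" line per file.
    # read_bytes() loads each file fully; stream in chunks for large payloads.
    lines = []
    for path in sorted(p for p in payload_dir.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(bag_dir).as_posix()}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "captured_media"  # stand-in for appraised content
        source.mkdir()
        (source / "report.txt").write_text("example payload\n", encoding="utf-8")
        bag = make_minimal_bag(source, Path(tmp) / "example_bag")
        print((bag / "manifest-sha256.txt").read_text(encoding="utf-8"))
```

Because a bag holds only the selected files plus small tag files, its size tracks the appraised content rather than the full capacity of the source media, which is the storage saving described above.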

After capture, we conduct another round of appraisal now that we can more easily view and analyze the digital materials across the collection. This tends to be a higher-level appraisal during which we make decisions about entire disk images or BagIt bags, or large groupings within them. Finally (for now), we conduct our most granular and selective appraisal during archival processing, when processing archivists, curators, and I work together to determine what materials should be part of the collection’s preservation file set. As our digital archives program is still young, we have not yet explored re-appraisal at further points of the life cycle such as access, file migration, or storage refresh.

For audiovisual materials, we follow an approach similar to the one we use for born-digital materials. We set up an audiovisual viewing station with equipment for reviewing audiocassettes, microcassettes, VHS and multiple Beta-formatted video tapes, multiple film formats, and optical discs. We first appraise the media items based on labels and collection context, and with the viewing station can now make a more informed appraisal decision before prioritizing for digitization. After digitization, we appraise again, making decisions on retention, levels of preservation commitment, and access methods.

While implementing multiple points of selective appraisal throughout workflows is more labor intensive than simply conducting an initial appraisal, several arguments moved us to take this approach: it is a one-time labor cost that helps us reduce ongoing storage and maintenance costs; it allows us to target our resources to those materials that have the most value for our community; it decreases the burden of reappraisal and other information maintenance work that we are placing on future archivists; and, not least, it reduces the ongoing environmental impact of our work.


Keith Pendergrass is the digital archivist for Baker Library Special Collections at Harvard Business School, where he develops and oversees workflows for born-digital materials. His research and general interests include integration of sustainability principles into digital archives standard practice, systems thinking, energy efficiency, and clean energy and transportation. He holds an MSLIS from Simmons College and a BA from Amherst College.

Digital Object Modeling

Submitted by Erin Faulder

The Division of Rare and Manuscript Collections (RMC) at Cornell University Library (CUL) was a leader in early digitization endeavors. However, infrastructure to support coordination between archival description and digital material has not kept pace. In 2019, RMC implemented ArchivesSpace and I turned my attention to developing practice to connect archival description and digital object management.

CUL has distributed systems for displaying and preserving digitized content, and RMC has historically refrained from describing and linking to digitized content within EAD. As a result, I’ve taken this opportunity to thoughtfully engage the array of systems that we use in order to model digital objects in ASpace to best take advantage of future technological developments.

I could find almost no information about how other institutions represent their digital content in ASpace. Perhaps other institutions had <dao> elements from EAD that were imported into ASpace or other data structured from legacy systems, and have not critically evaluated, documented, and shared their practice. Further, the ASpace documentation itself makes no recommendations about how to represent digital content in the digital object module, and it’s unclear how widely or consistently the community is using this functionality. 

Given the distributed systems at CUL that store RMC’s digital content, ASpace is the system of record for archival description and basic descriptive information for digital content. It should be the hub that connects physical material to digital surrogates in both delivery environments and preservation systems. To appropriately evaluate the possible representations, I set several goals for our model. The model must support our ability to:

  • batch-create digital objects in ASpace based on systems and rules. No human data entry of digital objects should be required. 
  • represent both digitized and born-digital content, with a clear indication of which is which.
  • bulk update URLs as access systems change. (Preservation systems have permanent identifiers that require less metadata maintenance.)
  • maintain and represent machine-actionable contextual relationships between
    • physical items and digital surrogates;
    • archival collections and digital material that lives in systems that are largely unaware of archival arrangement and description;
    • a preservation object in one system and its delivery object(s) in another system.
  • enable users, curators, and archivists to answer:
    • Is this thing born digital? 
    • Has this thing been digitized and where is the surrogate?
    • Where do I go to find the version (Preservation vs. Delivery) I want?
    • Where is all of the digital material for this collection?
    • How much of a collection has been digitized?
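As a rough illustration of the bulk-update goal, the sketch below rewrites the base URL on delivery records while leaving preservation records untouched. The DigitalObject fields, the rebase_delivery_urls function, and the example.org URLs are all hypothetical; this is not the ArchivesSpace data model or API, only a plain-Python picture of the rule.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DigitalObject:
    """Hypothetical, simplified record; not the ArchivesSpace schema."""
    identifier: str
    title: str
    role: str  # "Preservation" or "Delivery"
    url: str


def rebase_delivery_urls(objects, old_base, new_base):
    """Return new records with old_base swapped for new_base on delivery URLs only."""
    updated = []
    for obj in objects:
        if obj.role == "Delivery" and obj.url.startswith(old_base):
            # Keep the path after the old base and attach it to the new base.
            obj = replace(obj, url=new_base + obj.url[len(old_base):])
        updated.append(obj)
    return updated


if __name__ == "__main__":
    records = [
        DigitalObject("d-001", "Letter", "Delivery", "https://old.example.org/items/1"),
        DigitalObject("p-001", "Letter", "Preservation", "https://old.example.org/items/1"),
    ]
    for record in rebase_delivery_urls(
        records, "https://old.example.org/", "https://new.example.org/"
    ):
        print(record.role, record.url)
```

Filtering on an explicit role value rather than parsing identifiers is what keeps an update like this a one-pass rule instead of a manual review.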

ASpace is not the system of record for technical, administrative (other than collection-level), or detailed descriptive metadata about our digital objects. Nor does ASpace need to understand how objects are further modeled within delivery or preservation systems. The systems that store the material handle those functions. Setting clear functional boundaries was essential to determining which option would meet my established needs as I balanced flexibility for unimagined future needs and current limited resources to create the digital object records at a large scale.

Given this set of requirements, I drafted four possible modeling scenarios that are represented visually, along with a metadata profile for the digital objects:

I then talked through several real-world examples of digitized material (e.g., A/V, single-page image/text, multi-page image/text) for each of these scenarios with CUL colleagues from metadata and digital lifecycle services. Their fresh, non-archivist questions helped clarify my thinking.

  • Scenario 1:
    • Pros:
      • Simple structure.
    • Cons:
      • RMC’s local ID (used to identify media objects in a human-readable form) only exists on the archival object in the component ID field.
      • Preservation and delivery objects only recognize a relationship with each other through the linked archival object. This is a potential break point if the links aren’t established or maintained accurately.
      • Difficult to identify in a machine-actionable way which object is preservation and which is delivery other than a note, or by parsing the identifier.
  • Scenario 2:
    • Pros:
      • Preservation and delivery objects linked through a single object, making the relationship between preservation and delivery object clear.
      • Only one “digital object” represents a range of possible iterations, making search results for digital objects easier to interpret.
      • Local ID easily attached to the digital object.
    • Cons:
      • No place to store the delivery system identifier if using the file version URI for the URL.
      • Difficult to identify in a machine-actionable way which object is preservation and which is delivery other than a note, or by parsing the URI structure.
      • Challenging to ensure that the identifier is unique across ASpace given legacy practices of assigning local identifiers.
  • Scenario 3:
    • Pros:
      • Preservation and delivery versions as digital object components linked through a single object make the relationship between preservation and delivery object clear.
      • Only one “digital object” represents a range of possible iterations, making search results for digital objects easier to interpret.
      • Local ID easily attached to the digital object.
    • Cons:
      • Creating a human-meaningful label or title for a digital object component is time-consuming.
      • Challenging to ensure identifiers are unique across ASpace given legacy practices of assigning local identifiers.
  • Scenario 4:
    • Pros:
      • High level of granularity in parsing data to objects, potentially providing extensible functionality in the future.
    • Cons:
      • Difficult to identify in a machine-actionable way which object is preservation and which is delivery other than a note, or by parsing the identifier.
      • Time-consuming to create a human-meaningful label or title for the digital object component, particularly for born-digital material.
      • Complex hierarchy that may be more trouble to navigate in an automated fashion with no significant benefit.

Following several conversations exploring the pros, cons, and non-archival interpretations of these representations, I ultimately decided to use scenario 1. It seemed to be the simplest model for batch-creating digital objects, it was the most intuitive to technologists once explained, and it bends the ASpace fields away from their presumed use the least.

I made two changes to the scenario to address feedback from CUL staff. First, there will be no file-level information in the preservation package objects, since that is managed well in the preservation system already and there’s no direct linking into that system. Identifiers stored in ASpace could allow us to add the information later if we find a need for it. Second, to support machine-actionable identification of whether an object is a preservation or delivery object, I added a user-defined controlled vocabulary field with the values “Preservation” and “Delivery.” Additionally, to help users in the ASpace interface tell records apart when the digital objects’ titles are identical, I’ll append either [Preservation] or [Delivery] to the title.
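The adjusted model can be pictured with a small sketch: each archival object gets a pair of digital object records, each carrying a controlled type value and a bracketed title suffix. The record fields, the build_pair function, and the identifiers below are hypothetical stand-ins for illustration, not ASpace field names or the scripts used at CUL.

```python
from dataclasses import dataclass

# The two values of the user-defined controlled vocabulary described above.
OBJECT_TYPES = ("Preservation", "Delivery")


@dataclass(frozen=True)
class DigitalObjectRecord:
    """Hypothetical, simplified record; field names are illustrative only."""
    identifier: str
    title: str
    object_type: str
    archival_object_id: str


def build_pair(archival_object_id, title, preservation_id, delivery_id):
    """Create the preservation and delivery records linked to one archival object."""
    identifiers = (preservation_id, delivery_id)
    return [
        DigitalObjectRecord(
            identifier=identifier,
            title=f"{title} [{object_type}]",  # suffix tells identical titles apart
            object_type=object_type,
            archival_object_id=archival_object_id,
        )
        for object_type, identifier in zip(OBJECT_TYPES, identifiers)
    ]


if __name__ == "__main__":
    for record in build_pair("ao-42", "Field notebook", "pres-0001", "deliv-0001"):
        print(record.identifier, "|", record.title, "|", record.object_type)
```

Generating both records from one rule is what makes batch creation possible without manual data entry: the only inputs are the archival object, its title, and the two system identifiers.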

The primary limitation of this model is that there is no way to directly define a relationship between the delivery object and preservation object. If the link between digital object(s) and archival object is broken or incorrect, there will be limited options for restoring contextual understanding of content. This lack of direct referencing means that when a patron requests a high-resolution version of an object they found online, an archivist must search for the delivery identifier in ASpace, find the digital object representing the delivery object, navigate to the linked archival object, and then to the linked preservation object in order to request retrieval from preservation storage. This is a clunky go-up-to-go-down mechanism that I hope to find a solution for eventually.
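The go-up-to-go-down lookup can be sketched as two passes over the records: find the delivery object, step up to its archival object, then step back down to the preservation sibling. The dictionary keys, the function name, and the sample identifiers are assumptions made for illustration; they do not reflect how ASpace stores or exposes these links.

```python
def find_preservation_object(delivery_id, records):
    """Return the preservation record that shares an archival object with delivery_id.

    records is a list of dicts with hypothetical keys "id", "type", and
    "archival_object". Returns None if either link in the chain is missing.
    """
    # Step 1: locate the delivery object the patron found online.
    delivery = next(
        (r for r in records if r["id"] == delivery_id and r["type"] == "Delivery"),
        None,
    )
    if delivery is None:
        return None

    # Step 2: go up to the linked archival object, then down to its preservation object.
    parent = delivery["archival_object"]
    return next(
        (
            r
            for r in records
            if r["archival_object"] == parent and r["type"] == "Preservation"
        ),
        None,
    )


if __name__ == "__main__":
    sample = [
        {"id": "deliv-0001", "type": "Delivery", "archival_object": "ao-42"},
        {"id": "pres-0001", "type": "Preservation", "archival_object": "ao-42"},
    ]
    print(find_preservation_object("deliv-0001", sample))
```

The sketch also shows the fragility noted above: if the archival object link on either record is wrong or missing, the function has nothing else to match on and returns None.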

Choosing scenario 1 also means enforcing that digital objects are packaged and managed at the level of archival description. We’ve been moving in this direction for a while, but description for existing digitized material described at a level lower than existing archival description must be added to ASpace in order to add and link the digital objects. But that is another blog post entirely.

Erin Faulder, Assistant Director for Digital Strategies for Division of Rare and Manuscript Collections

Recap: Islandora/Fedora Camp, Arizona State University, February 24-26, 2020

At the end of February, I was thrilled to be able to travel to Tempe, Arizona to attend the Islandora and Fedora Camp hosted by Arizona State University. At the Tri-College Consortium, we’re currently working on a migration from ContentDM and DSpace to Islandora 7, with a planned additional migration to Islandora 8 in 1-2 years. With the current state of the world and limited access to on-site resources, understanding and improving our digital collections platforms has become more important than ever.

Notably, this was the first Islandora/Fedora camp that presented a single combined track for both developers and collection managers. Personally, I felt that this new format was a major strength of the camp; it was valuable to be able to interface with developers and committers of the Islandora software as well as colleagues from other implementing institutions who manage digital collections. It was also great to hear stories about how people got involved as Islandora committers, which provided some inspiration for viable paths to contributing to the community, and about successes and failures from other users’ migrations and installations.

Camp sessions were split between educational overviews, presentations from users, and hands-on tutorials. Tutorials included basic content management in Drupal 8, core functions of Fedora, and bulk ingest processes, among others. Tutorial-givers included Melissa Anez (Islandora Foundation) and David Wilcox (Lyrasis), Bethany Seeger (Johns Hopkins), Daniel Lamb (Islandora Foundation), and Seth Shaw (UNLV). 

Punctuating our full days of learning were discussions amongst implementers from many different types of institutions. My sense from the camp’s attendees was that the dominant concern for implementers is migrating from Islandora 7 to Islandora 8. While a number of institutions have forged ahead with this migration, many are waiting and watching for the tools and documentation to smooth out the process.

Another topic of conversation warranting further reflection is how institutions are integrating Islandora and Fedora into larger digital preservation strategies and practices. I learned from Islandora staff that there used to be a working group for digital preservation, but this has mostly fallen by the wayside. If you’re interested in starting that back up, feel free to contact the Islandora staff to learn more about the process!


Emily Higgs is the Digital Archivist for the Friends Historical Library at Swarthmore College. She is the Assistant Team Leader for bloggERS, the blog for SAA’s Electronic Records Section.

Dispatches from a Distance: My Mom, Project Archivist

This is the seventh of our Dispatches from a Distance, a series of short posts intended as a forum for those of us facing disruption in our professional lives, whether that’s working from home or something else, to stay engaged with the community. There is no specific topic or theme for submissions; rather, this is a space to share your thoughts on current projects or ideas which, on any other day, you might have discussed with your deskmate or a co-worker during lunch. These don’t have to be directly in response to the Covid-19 outbreak (although they can be). Dispatches should be between 200 and 500 words and can be submitted here.


Katrina Wood

As a part-time government employee, I was left wondering if I could continue historical research and archives inventory tasks once a stay-at-home order was a reality. When that came to pass in North Carolina, I joined the ranks of cultural heritage specialists who were sheltering in place to avoid the spread of COVID-19.

Meanwhile in Georgia, my Mom slowly altered her public and other activities. She told me that she had to go out to buy a television since one of her aging devices had lost picture by late March. Then she limited herself to going to the plant nursery and the grocery store. My mother is indeed among the vulnerable population, but she hasn’t spent a moment longing for the life she’s had to put aside. Instead, she continued to clean out the attic.

Both of my parents had long careers in academia, and like me, they felt the need to keep several years’ worth of papers, binders, notes, and memorabilia. I’m marveling at my Mom’s archivist tendencies—she set up a sorting station in the garage and has inspired my Dad to work through decades of his own papers. I’ve received texts with pictures that harken back to memories made before I was born. The most intriguing attic project in my view is the collection of Time magazines, however. 

Gwen Wood has pared down the ‘attic archives’ in stages, seeming to have begun with my school papers. I’m not sure I could speak to a processing schedule, but she has since employed a neighborhood teen to help comb through the magazines. Tokumo normally assists Gwen with yard work and like me, he is of a quiet sort. I asked if I could write about their (still ongoing) experience since one of the issues they’ve set aside is an issue from April 25, 1983—a somber Senator Claude Pepper graces the cover.

I worked on a collection at the Claude Pepper Library and Museum at Florida State University and will likely reach out to Special Collections to see if they would like the issue. Further connections to existing collections might reveal themselves, yet I’m all too aware of the scenario in which donors are overconfident that their back issues will fulfill each prong of an archive’s collection policy. The Time issues appear to begin in 1968, and neither Gwen nor Tokumo is sure when the collection ends. I heard about this sorting in early April, about the time that my Mom switched to solo yard work and attic archives with Tokumo when he wasn’t going to school online.

Included above is a photo of the space, cheerfully adorned with a string of lights and a series of boxes and cartons. I’m aware that this archive, too, may be shuttered as we all enter the scorching summer months. Until then, my parents’ attic issues of Time and Newsweek will see their first finding aids and the light of day after a prolonged retirement.