Dispatches from a Distance: Work/Work Balance

by Marcella Huggard

This post is part of Dispatches from a Distance, a series of short posts to provide a forum for those of us facing disruption in our professional lives, whether that’s working from home or something else, to stay engaged with the community. Now that so many of us are returning to full- or part-time on-site work, we’d like to extend this series to include reflections on reopening, returning to work, and other anxieties facing the profession due to COVID-19. There is no specific topic or theme for submissions–rather, this is a space to share your thoughts on current projects or ideas you’d like to share with other readers of the Electronic Records Section blog. Dispatches should be between 200-500 words and can be submitted here.


My special collections and archives library has started the reopening process, in preparation for the fall semester. We’re not open to the public yet but expect we will be in a limited fashion for the fall, and in the meantime those of us on staff in processing and conservation are coming into the building regularly to get back to working with the collections.

Transitioning to working strictly from home was one set of processes—physical, emotional, and mental. Transitioning to a hybrid situation is another set of processes. My staff are working approximately 50% in the office, 50% at home.  This means getting back to processing projects they haven’t really looked at since March, and it means continuing data cleanup projects they started in March, or starting new data cleanup projects from home. It means possibly inconsistent schedules depending on when the building is open (for some, this is good—variety is the spice of life!—for others, routine is essential and this is a disruption). It means adjusting to long stretches wearing a mask and getting sweaty extra quickly when schlepping boxes or archival supplies around. It means still not seeing some co-workers in person as we continue to work split shifts to lower the numbers of people in our building.

I’m taking our university administration’s direction to work from home as much as possible seriously, and I find that a lot of my regular work can be done remotely. Reviewing finding aids? Check. Ongoing data cleanup projects? Check. Research involving materials I’ve already retrieved from other archives and from electronically available resources? Check. Meetings with colleagues to plan projects and determine what we’ll do this fall?  Check. Professional reading, conferences, and workshops? Check. Data entry for processing projects? Check. This means extra disruption, though—“I’ll be able to get a full 4-hour shift in processing collections tomorrow afternoon,” I think happily to myself, until somebody schedules a meeting smack in the middle of what would have been that shift, and I’m adjusting yet again.

The guiding principles for this pandemic have been adaptability and flexibility, and I don’t see that changing anytime soon.

Estimating Energy Use for Digital Preservation, Part II

by Bethany Scott

This post is part of our BloggERS Another Kind of Glacier series. Part I was posted last week.


Conclusions

While the findings of the carbon footprint analysis are predicated on our institutional context and practices, and therefore may be difficult to directly extrapolate to other organizations’ preservation programs, there are several actionable steps and recommendations that sustainability-minded digital preservationists can implement right away. Getting in touch with any campus sustainability officers and investigating environmental sustainability efforts currently underway can provide enlightening information – for instance, you may discover that a portion of the campus energy grid is already renewable-powered, or that your institution is purchasing renewable energy credits (RECs). In my case, I was previously not aware that UH’s Office of Sustainability has published an improvement plan outlining its sustainability goals, including a 10% total campus waste reduction, a 15% campus water use reduction, and a 35% reduction in energy expenditures for campus buildings – all of which will require institutional support from the highest level of UH administration as well as partners among students, faculty, and staff across campus. I am proud to consider myself a partner in UH campus sustainability and look forward to promoting awareness of and advocating for our sustainability goals in the future.

As Keith Pendergrass highlighted in the first post of this series, there are other methods by which digital preservation practitioners can reduce their power draw and carbon footprint, thereby increasing the sustainability of their digital preservation programs – from turning off machines when not in use or scheduling resource-intensive tasks for off-peak times, to making broader policy changes that incorporate sustainability principles and practices.

At UHL, one such policy change I would like to implement is a tiered approach to file format selection, through which we match the file formats and resolution of files created to the scale and scope of the project, the informational and research value of the content, the discovery and access needs of end users, and so on. Existing digital preservation policy documentation outlines file formats and specifications for preservation-quality archival masters for images, audio, and video files that are created through our digitization unit. However, as UHL conducts a greater number of mass digitization projects – and accumulates an ever larger number of high-resolution archival master files – greater flexibility is needed. By choosing to create lower-resolution files for some projects, we would reduce the total storage for our digital collections, thereby reducing our carbon footprint.

For instance, we may choose to retain large, high-resolution archival TIFFs for each page image of a medieval manuscript book, because researchers study minute details in the paper quality, ink and decoration, and the scribe’s lettering and handwriting. By contrast, a digitized UH thesis or dissertation from the mid-20th century could be stored long-term as one relatively small PDF, since the informational value of its contents (and not its physical characteristics) is what we are really trying to preserve. Similarly, we are currently discussing the workflow implications of providing an entire archival folder as a single PDF in our access system. Although the initial goal of this initiative was to make a larger amount of archival material quickly available online for patrons, the much smaller amount of storage needed to store one PDF vs. dozens or hundreds of high-res TIFF masters would also have a positive impact on the sustainability of the digital preservation and access systems.

UHL’s digital preservation policy also includes requirements for monthly fixity checking of a random sample of preservation packages stored in Archivematica, with a full fixity check of all packages to be conducted every three years during an audit of the overall digital preservation program. Frequent fixity checking is computationally intensive, though, and adds to the total energy expenditure of an institution’s digital preservation program. But in UHL’s local storage infrastructure, storage units run on the ZFS filesystem, which includes self-healing features such as internal checksum checks each time a read/write action is performed. This storage infrastructure was put in place in 2019, but we have not yet updated our policies and procedures for fixity checking to reflect the improved baseline durability of assets in storage.
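To make the sampling idea concrete, here is a minimal Python sketch of a fixity check run against a random sample of stored files. It is a generic illustration and not Archivematica's own mechanism: the manifest structure (a dict mapping file paths to expected SHA-256 digests) and the function names `sha256_of` and `check_random_sample` are hypothetical.

```python
import hashlib
import random
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_random_sample(manifest, sample_size, seed=None):
    """Re-hash a random sample of files and return the paths that fail.

    manifest: hypothetical dict of {file path: expected SHA-256 hex digest}.
    A file fails if it is missing or its current digest differs from the manifest.
    """
    rng = random.Random(seed)
    paths = sorted(manifest)
    chosen = rng.sample(paths, min(sample_size, len(paths)))
    failures = []
    for path in chosen:
        if not Path(path).is_file() or sha256_of(path) != manifest[path]:
            failures.append(path)
    return failures
```

Because every sampled file has to be read in full and hashed, the energy cost of a check like this grows with both the sample size and the size of the files, which is why the sampling rate and schedule matter for sustainability.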

Best practices calling for frequent fixity checks were developed decades ago – but modern technology like ZFS may be able to passively address our need for file integrity and durability in a less resource-intensive way. Through considered analysis matching the frequency of fixity checking to the features of our storage infrastructure, we may come to the conclusion that less frequent hands-on fixity checks, on a smaller random sample of packages, are sufficient moving forward. Since this is a new area of inquiry for me, I would love to hear thoughts from other digital preservationists about the pros and cons to such an approach – is fixity checking really the end-all, or could we use additional technological elements as part of a broader file integrity strategy over time?

Future work

I eagerly anticipate refining this electricity consumption research with exact figures and values (rather than estimates) when we are able to more consistently return to campus. We would like to investigate overhead costs such as lighting and HVAC in UHL’s server room, and we plan to grab point-in-time values physically from the power distribution units in the racks. Also, there may be additional power statistics that our Sys Admin can capture from the VMware hosts – which would allow us to begin on this portion of the research remotely in the interim. Furthermore, I plan to explore additional factors to provide a broader understanding of the impact of UHL’s energy consumption for digital systems and initiatives. By gaining more details on our total storage capacity, percentage of storage utilization, and GHG emissions per TB, we will be able to communicate about our carbon footprint in a way that will allow other libraries and archives to compare or estimate the environmental impact of their digital programs as well.

I would also like to investigate whether changes in preservation processes, such as the reduced hands-on fixity strategy outlined above, can have a positive impact on our energy expenditure – and whether this strategy can still provide a high level of integrity and durability for our digital assets over time. Finally, as a longer-term initiative I would like to take a deeper look at sustainability factors beyond energy expenditure, such as current practices for recycling e-waste on campus or a possible future life-cycle assessment for our hardware infrastructure. Through these efforts, I hope to help improve the long-term sustainability of UHL’s digital initiatives, and to aid other digital preservationists to undertake similar assessments of their programs and institutions as well.


Bethany Scott is Digital Projects Coordinator at the University of Houston Libraries, where she is a contributor to the development of the BCDAMS ecosystem incorporating Archivematica, ArchivesSpace, Hyrax, and Avalon. As a representative of UH Special Collections, she contributes knowledge on digital preservation, born-digital archives, and archival description to the BCDAMS team.

Estimating Energy Use for Digital Preservation, Part I

by Bethany Scott

This post is part of our BloggERS Another Kind of Glacier series. Part II will be posted next week.


Although the University of Houston Libraries (UHL) has taken steps over the last several years to initiate and grow an effective digital preservation program, until recently we had not yet considered the long-term sustainability of our digital preservation program from an environmental standpoint. As the leader of UHL’s digital preservation program, I aimed to address this disconnect by gathering information on the technology infrastructure used for digital preservation activities and its energy expenditures in collaboration with colleagues from UHL Library Technology Services and the UH Office of Sustainability. I also reviewed and evaluated the requirements of UHL’s digital preservation policy to identify areas where the overall sustainability of the program may be improved in the future by modifying current practices.

Inventory of equipment

I am fortunate to have a close collaborator in UHL’s Systems Administrator, who was instrumental in the process of implementing the technical/software elements of our digital preservation program over the past few years. He provided a detailed overview of our hardware and software infrastructure, both for long-term storage locations and for processing and workflows.

UHL’s digital access and preservation environment is almost 100% virtualized, with all of the major servers and systems for digital preservation – notably, the Archivematica processing location and storage service – running as virtual machines (VMs). The virtual environment runs on VMware ESXi and consists of five physical host servers that are part of a VMware vSAN cluster, which aggregates the disks across all five host servers into a single storage datastore.

VMs where Archivematica’s OS and application data reside may have their virtual disk data spread across multiple hosts at any given time. Therefore, exact resource use for digital preservation processes running via Archivematica is difficult to distinguish or pinpoint from other VM systems and processes, including UHL’s digital access systems. After discussing possible approaches for calculating the energy usage, we decided to take a generalized or blanket approach and include all five hosts. This calculation thus represents the energy expenditure for not only the digital preservation system and storage, but also for the A/V Repository and Digital Collections access systems. At UHL, digital access and preservation are strongly linked components of a single large ecosystem, so the decision to look at the overall energy expenditure makes sense from an ecosystem perspective.

In addition to the VM infrastructure described above, all user and project data is housed in the UHL storage environment. The storage environment includes both local shared network drive storage for digitized and born-digital assets in production, and additional shares that are not accessible to content producers or other end users, where data is processed and stored to be later served up by the preservation and access systems. Specifically, with the Archivematica workflow, preservation assets are processed through a series of automated preservation actions including virus scanning, file format characterization, fixity checking, and so on, and are then transferred and ingested to secure preservation storage.

UHL’s storage environment consists of two servers: a production unit and a replication unit. Archivematica’s processing shares are not replicated, but the end storage share is replicated. Again, for purposes of simplification, we generalized that both of these resources are being used as part of the digital preservation program when analyzing power use. Finally, within UHL’s server room there is a pair of redundant network switches that tie all the virtual and storage components together.

The specific hardware components that make up the digital access and preservation infrastructure described above include:

  • One (1) production storage unit: iXsystems TrueNAS M40 HA (Intel Xeon Silver 4114 CPU @ 2.2 GHz and 128 GB RAM)
  • One (1) replication storage unit: iXsystems FreeNAS IXC-4224 P-IXN (Intel Xeon CPU E5-2630 v4 @ 2.2 GHz and 128 GB RAM)
  • Two (2) disk expansion shelves: iXsystems ES60
  • Five (5) VMware ESXi hosts: Dell PowerEdge R630 (Intel Xeon CPU E5-2640 v4 @ 2.4 GHz and 192 GB RAM)
  • Two (2) network switches: HPE Aruba 3810M 16SFP+ 2-slot

Electricity usage

Each of the hardware components listed above has two power supplies. However, the power draw is not always running at the maximum available for those power supplies and is dependent on current workloads, how many disks are in the units, and so on. Therefore, the power being drawn can be quantified but will vary over time.

With the unexpected closure of the campus due to COVID-19, I conducted this analysis remotely with the help of the UH campus Sustainability Coordinator. We compared the estimated maximum power draw based on the technical specifications for the hardware components, the draw when idle, and several partial power draw scenarios, with the understanding that the actual numbers will likely fall somewhere in this range.

Estimated power use and greenhouse gas emissions

Load    Daily Usage Total (Watts)    Annual Total (kWh)    Annual GHG (lbs)
Max     9,094                        79,663.44             124,175.71
95%     8,639.3                      75,680.268            117,966.92
90%     8,184.6                      71,697.096            111,758.14
85%     7,729.9                      67,713.924            105,549.35
80%     7,275.2                      63,730.752            99,340.565
Idle    5,365.46                     47,001.43             73,263.666

The estimated maximum annual greenhouse gas emissions derived from power use for the digital access and preservation hardware is over 124,000 pounds, or approximately 56.3 metric tons. To put this in perspective, it’s equivalent to the GHG emissions from nearly 140,000 miles driven by an average passenger vehicle, and to the carbon dioxide emissions from 62,063 pounds of coal burned or 130 barrels of oil consumed. While I hope to refine this analysis further in the future, for now these figures can serve as an entry point to discussions on the importance of environmental sustainability actions – and our plans to reduce our consumption – with Libraries administration, colleagues in the Office of Sustainability, and other campus leaders.
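For readers who want to run a similar estimate for their own hardware, the arithmetic behind the table can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the exact method used for the analysis: it assumes the hardware runs continuously (8,760 hours per year), and because the post does not state an emissions factor, the one used here is back-calculated from the table's Max row. The function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # assumes the hardware runs continuously all year

# Assumption: the post does not state its emissions factor. This value is
# back-calculated from the table's "Max" row (annual lbs / annual kWh).
LBS_GHG_PER_KWH = 124_175.71 / 79_663.44

LBS_PER_METRIC_TON = 2204.62  # standard unit conversion

MAX_WATTS = 9_094  # estimated maximum draw from the table


def annual_kwh(watts):
    """Convert a constant power draw in watts to kilowatt-hours per year."""
    return watts * HOURS_PER_YEAR / 1000


def annual_ghg_lbs(watts):
    """Estimate annual greenhouse gas emissions in pounds for a constant draw."""
    return annual_kwh(watts) * LBS_GHG_PER_KWH


if __name__ == "__main__":
    for share in (1.00, 0.95, 0.90, 0.85, 0.80):
        watts = MAX_WATTS * share
        print(f"{share:.0%}: {watts:,.1f} W, {annual_kwh(watts):,.2f} kWh, "
              f"{annual_ghg_lbs(watts):,.2f} lbs")
    tons = annual_ghg_lbs(MAX_WATTS) / LBS_PER_METRIC_TON
    print(f"Max scenario: about {tons:.1f} metric tons")
```

To adapt this to another institution, replace `MAX_WATTS` with your own hardware's draw and `LBS_GHG_PER_KWH` with the emissions factor for your local electricity grid.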

Part II, including conclusions and future work, will be posted next week.


Bethany Scott is Digital Projects Coordinator at the University of Houston Libraries, where she is a contributor to the development of the BCDAMS ecosystem incorporating Archivematica, ArchivesSpace, Hyrax, and Avalon. As a representative of UH Special Collections, she contributes knowledge on digital preservation, born-digital archives, and archival description to the BCDAMS team.

Call for bloggERS: Blog Posts on the BitCurator Users Forum

With only a few short weeks to go before the virtual 2020 BitCurator Users Forum (October 13-16), bloggERS is seeking attendees who are interested in writing a re-cap or a blog post covering a particular session, theme, or topic relevant to SAA Electronic Records Section members. The program for the Forum is available here.

Please let us know if you are interested in contributing by sending an email to ers.mailer.blog@gmail.com! You can also let us know if you’re interested in writing a general re-cap or if you’d like to cover something more specific.

Writing for bloggERS!

  • We encourage visual representations: Posts can include or largely consist of comics, flowcharts, a series of memes, etc!
  • Written content should be roughly 600-800 words in length
  • Write posts for a wide audience: anyone who stewards, studies, or has an interest in digital archives and electronic records, both within and beyond SAA
  • Align with other editorial guidelines as outlined in the bloggERS! guidelines for writers.

Call for Submissions: Dispatches from a Distance, Returning & Reopening

Earlier this year, many of us were asked to work from home and distance ourselves from colleagues and friends due to the global spread of COVID-19. Some of us are still in this position of working remotely, some of us have returned to our places of work, and some of us are now somewhere in-between or mixing multiple modes of work.

As some small step in lessening the isolation between us, BloggERS! began publishing a series called “Dispatches from a Distance” to provide a forum for those of us facing disruption in our professional lives, whether that’s working from home or something else, to stay engaged with the community. Now that so many of us are returning to full- or part-time on-site work, we’d like to extend this series to include reflections on reopening, returning to work, and other anxieties facing the profession due to COVID-19. There is no specific topic or theme for submissions–rather, this is a space to share your thoughts on current projects or ideas you’d like to share with other readers of the Electronic Records Section blog.

Dispatches should be between 200-500 words and can be submitted here. Posts should adhere to the SAA code of ethics for archivists.

We look forward to hearing from all of you!

–The BloggERS! Editorial Subcommittee

An intern’s experience: Preserving Jok Church’s Beakman

by Matt McShane

I seem to have a real penchant for completing my schoolings in the middle of “once-in-a-lifetime” economic crises: first the 2008 housing recession, and now a global pandemic. And while the current situation has somewhat altered the final semester of my MLIS program—not to mention many others’ situations much more intensely—I was still able to have a very engaging and rewarding practicum experience at The Ohio State University Libraries, working on an incredible digital collection accessioned by the Billy Ireland Cartoon Library and Museum. 

U Can with Beakman and Jax might be best known to a lot of people as the predecessor of the short-lived Saturday morning live action science show Beakman’s World, but the comic is arguably more successful than the television show it produced. With an international readership and a run that lasted more than 25 years, it was a success story that entertained and educated readers over generations. It was also the first syndicated newspaper comic to be entirely digitally drawn and distributed. Jok Church, the creator and author, used Adobe Illustrator throughout the run of the comic, and saved and migrated the files in various stages of creation through multiple hard drives. With the exception of a few gaps, the entire run was saved on Jok’s hard drive at the time of his death in April 2016. These are the files we received at University Libraries. 

Richard Bolingbroke, a friend of Jok’s and executor of his estate, donated the collection to the Billy Ireland. He also provided us with an in-progress biography of Jok, which gave insight into who he was as a person beyond his work with Beakman and Jax, as well as a condensed history of the publication. This will be useful as the Billy Ireland creates author metadata and information for the collection.

Richard provided us access to direct copies of twenty-four folders via Dropbox, containing nearly 10,000 files, which we downloaded to our local processing server. Each folder contained a year’s worth of comics, from 1993 to 2016, though the years 1995 and 1996 were empty due to a hard drive failure Jok had experienced. We’re still in the process of possibly hunting down any existing backups from these years. In the existing folders, though, we found not only the many years’ worth of terrific comic content, but also a glimpse into Jok’s creative and organizational process. An initial DROID scan of the contents found over 2,000 duplicate files scattered throughout. After speaking with Richard about this, we determined it to be a mistaken copy/paste issue. Rather than manipulate the existing archival collection, we decided to create a distribution collection better organized for user access to the works, with the intention of maintaining archival integrity of the donated collection. 
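For readers without a tool like DROID at hand, the basic idea behind spotting duplicates can be sketched in Python: hash every file and group paths that share a digest. This is an illustrative sketch, not a description of how DROID works internally or of the workflow we used; the function name `find_duplicates` is hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root):
    """Group files under root by SHA-256 digest and return groups with 2+ files.

    Reads each file fully into memory, which is acceptable for a sketch but
    would need chunked reading for very large files.
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}
```

A report like this only identifies byte-identical copies; deciding which copy belongs in a distribution collection remains a human judgment.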

Before either of those goals could be reached, though, our second primary issue was that of file extensions. We found nearly 1,300 files without extensions in the collection, which we determined to be due to older Mac OS’s use of files without extensions appended. Adobe Illustrator produces both .ai and .eps file types. There are other file types among the collection, but these are the primary types for each work. It was impossible to determine which files were .ai versus .eps at a batch level, so the EXIF metadata of all files without extensions were manually examined to determine their proper extension. Using Bulk Rename Utility, we were able to semi-batch the extension appending, but it still required a fair amount of manual labor due to the intermingled nature of the different file types within subfolders. 

Even though create dates within EXIF metadata were unreliable because of different versions of Illustrator being used to access files throughout the years, Jok named and organized his files by publication date, which gave us reliable organization metadata for our distribution file. His file and folder organization did shift throughout the years—understandable over two decades and who knows how many machines. This required a fair bit of manual labor in creating and organizing the distribution collection in a standardized file name and subfolder format. The comic was published weekly, albeit with some breaks. Typically there are four different versions: portrait versus landscape and black and white versus color of the finished product. A year\month\date folder tree was created based on how the largest portion of Jok’s files were organized. Once that was completed, we shifted focus to Ohio State’s Accessibility standards, and investigated a batch workflow to convert the comic files to PDF/A. Unfortunately, we could not achieve PDF/A compliance due to the nature of the original files; additionally, the “batch” processing includes a significant human interaction.

Further complicating matters, while we were discovering this, the COVID-19 global pandemic hit Ohio. In response, Ohio State directed all non-essential personnel to move to tele-work, which cut off my access to the server behind the University’s firewall for the remainder of my internship. As a result, we had to put the completion of this project on indefinite hold. Despite these extreme circumstances preventing me from seeing the collection all the way through to public hands, I was able to leave it in an organized state, ready for file conversion and metadata creation. 

I learned a lot by being able to handle the collection from the beginning, untouched. One of the biggest takeaways was the importance of gathering information about the collection and its creator up front. Creating a manifest of the objects within the collection is a clear necessity to knowing how the collection should be preserved, and how it should be accessed, but also allowed us to see gaps in the collection, such as the significant number of duplicates and files without extensions. Having this knowledge up front allowed us to better plan our approach to the collection. I have actually suggested increasing students’ exposure to “messy” digital objects collections to my program’s faculty based on my experience with this project. 

The other key takeaway I discovered was that sometimes it might be best to dirty your hands, and perform tasks manually. Digital preservation can have a lot of automated shortcuts compared to processing its traditional analog cousins, but not everything can or should be done through batch processes. While it may be technically possible to program a process, it may not really be the best use of time or effort. Part of workflow development is recognizing when the effort of creating an automated solution outweighs the time and effort to manually perform the task. It may have been possible to code a script to identify and append the file extensions for our objects missing them, but the effort and time to learn, write, and troubleshoot that likely would have been greater than the somewhat tedious work of doing it by hand in this instance. Alternatively, it might be worth looking into automated scripting if this were a significantly larger collection of mislabeled or disorganized objects. Having a good understanding of cost and benefit is important when approaching a problem that can have multiple solutions.
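For a larger collection, the kind of script considered above might start from something like the following Python sketch, which proposes an extension based on the first bytes of each extensionless file and reports the proposed renames without changing anything. It is a heuristic and was not tested against this collection. It assumes that files beginning with a PDF header are PDF-compatible Illustrator files, that PostScript headers declaring EPSF are EPS files, and that other PostScript headers are older Illustrator files; none of this was verified here. The function names are hypothetical.

```python
from pathlib import Path


def guess_extension(header):
    """Guess '.ai' or '.eps' from a file's leading bytes, or None if unknown.

    Heuristic assumptions (not verified against the collection):
    - DOS EPS binary header (C5 D0 D3 C6)    -> .eps
    - '%PDF-' header                         -> .ai (PDF-compatible Illustrator)
    - '%!PS-Adobe' header mentioning 'EPSF'  -> .eps
    - other '%!PS-Adobe' headers             -> .ai (older Illustrator)
    """
    lines = header.splitlines()  # handles \r, \n, and \r\n line endings
    first_line = lines[0] if lines else b""
    if header.startswith(b"\xc5\xd0\xd3\xc6"):
        return ".eps"
    if first_line.startswith(b"%PDF-"):
        return ".ai"
    if first_line.startswith(b"%!PS-Adobe"):
        return ".eps" if b"EPSF" in first_line else ".ai"
    return None


def propose_renames(root):
    """Return (current path, proposed path) pairs for extensionless files.

    This is a dry run: nothing on disk is renamed.
    """
    proposals = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix == "":
            with open(path, "rb") as f:
                header = f.read(256)
            ext = guess_extension(header)
            if ext:
                proposals.append((path, path.with_name(path.name + ext)))
    return proposals
```

Keeping the script to a dry run means a person can review the proposals before anything is renamed, and files the heuristic cannot classify are simply left for manual inspection.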

My time on-site with The Ohio State University Libraries was a bit shorter than I had intended, but it still provided me with a great experience and helped to solidify my love for the digital preservation process and work. The fact that U Can with Beakman and Jax is the first digitally created syndicated newspaper comic makes the whole experience that much more apt and impactful. Even though some aspects of work are in limbo at the moment, I am confident that this terrific collection of Jok’s work will be available for the public to enjoy and learn from. Even if I am not able to fully carry the work over the finish line, I am thankful for the opportunity to work on it as much as I did. 


Matt McShane, a recent MLIS graduate from Kent State University, is currently focused on landing a role with a cultural heritage institution where he can work hands-on with digital collections, digital preservation, and influence broader preservation policy.

Dispatches from a Distance: New Avenues for Empathy

This is the eighth of our Dispatches from a Distance, a series of short posts intended as a forum for those of us facing disruption in our professional lives, whether that’s working from home or something else, to stay engaged with the community. There is no specific topic or theme for submissions–rather, this is a space to share your thoughts on current projects or ideas which, on any other day, you might have discussed with your deskmate or a co-worker during lunch. These don’t have to be directly in response to the Covid-19 outbreak (although they can be). Dispatches should be between 200-500 words and can be submitted here.


Jamie Patrick-Burns

I have been working from home since mid-March, when the State Archives of North Carolina transitioned to remote work. While I certainly miss the in-person contact, conversations with my office-mate or colleagues just down the hall, I’m finding new routes to connect. In fact, I’ve been pleasantly surprised by how teleworking has made me feel more connected to the State Archives’ users. As Digital Archivist, I don’t usually have a lot of direct contact with the general public. I do engage with the public through social media, documentation, and periodic shifts at the reference desk, but my customer service is largely geared toward my coworkers in the archives and in state agencies. However, I’ve been thinking about our patrons in a new way due to two different experiences during the COVID-19 pandemic. 

First, working remotely has given me a new perspective and more empathy for our patrons as they navigate our online presence. Small shifts in my own context, such as the room I’m occupying or the computer and browser I’m using, are enough to throw me off my well-worn paths of virtual travel. I find myself Googling just a little bit more or searching for information online that I may have previously found in hard copy or an internal document. I’m revisiting a bit of what things feel like for the uninitiated, a perspective I haven’t had since I joined the State Archives of North Carolina in 2017. I’m thinking about our social media content as ever more central to our engagement. I’ve been getting more emails from members of the public who come across my email address, which means more people are indeed turning to our online presence while we’re closed to in-person visitors. I hope that this empathy for remote users carries through to my patron and staff interactions.

Second, the concern and empathy shown by the entire Department of Natural and Cultural Resources through the project Your Story is North Carolina’s Story has been very meaningful to me. The initiative to collect personal materials documenting the COVID-19 pandemic for North Carolinians has required collaboration, creativity, and hard work in our division and I’m inspired by how my colleagues have risen to the challenge. While my role of facilitating the transfer of digital records is only one piece of the puzzle, I am proud to be part of this project. I hope the public feels the deep care, empathy, and human connection that I also feel from this initiative. 

Even as I’m more physically distant from my professional connections than ever before, I’m feeling connected to my colleagues and our patrons in new ways. Whatever re-emergence from this difficult time looks like, and whenever it comes, I know I have gained some new perspectives and modes of caring that I plan to carry with me. 

Integrating Environmental Sustainability into Policies and Workflows

by Keith Pendergrass

This is the first post in the BloggERS Another Kind of Glacier series.


Background and Challenges

My efforts to integrate environmental sustainability and digital preservation in my organization—Baker Library Special Collections at Harvard Business School—began several years ago when we were discussing the long-term preservation of forensic disk images in our collections. We came to the conclusion that keeping forensic images instead of (or in addition to) the final preservation file set can have ethical, privacy, and environmental issues. We decided that we would preserve forensic images only in use cases where there was a strong need to do so, such as a legal mandate in our records management program. I talked about our process and results at the BitCurator Users Forum 2017.

From this presentation grew a collaboration with three colleagues who heard me speak that day: Walker Sampson, Tessa Walsh, and Laura Alagna. Together, we reframed my initial inquiry to focus on environmental sustainability and enlarged the scope to include all digital preservation practices and the standards that guide them. The result was our recent article and workshop protocol.

During this time, I began aligning our digital archives work at Baker Library with this research as well as our organization-wide sustainability goals. My early efforts mainly took the form of the stopgap measures that we suggest in our article: turning off machines when not in use; scheduling tasks for off-peak network and electricity grid periods; and purchasing renewable energy certificates that promote additionality, which is done for us by Harvard University as part of its sustainability goals. As these were either unilateral decisions or were being done for me, they were straightforward and quick to implement.

Making more significant environmental gains along the lines of the paradigm shift we propose in our article, however, requires greater change. This, in turn, requires more buy-in and collaboration within and across departments, which often slows the process. In the face of immediate needs and other constraints, it can be easy for decision makers to justify deprioritizing the work required to integrate environmental sustainability into standard practices. With the urgency of the climate and other environmental crises, this can be quite frustrating. However, with repeated effort and clear reasoning, you can make progress on these larger sustainability changes. I found success most often followed continual reiteration of why I wanted to change policy, procedure, or standard practice, with a focus on how the changes would better align our work and department with organizational sustainability goals. Another key argument was showing how our efforts for environmental sustainability would also result in financial and staffing sustainability.

Below, I share examples of the work we have done at Baker Library Special Collections to include environmental sustainability in some of our policies and workflows. While the details may be specific to our context, the principles are widely applicable: integrate sustainability into your policies so that you have a strong foundation for including environmental concerns in your decision making; and start your efforts with appraisal as it can have the most impact for the time that you put in.

Policies

The first policy in which we integrated environmental sustainability was our technology change management policy, which controls our decision making around the hardware and software we use in our digital archives workflows. The first item we added to the policy was that we must dispose of all hardware following environmental standards for electronic waste and, for items other than hard drives, that we must donate them for reuse whenever possible. The second item involved more collaboration with our IT department, which controls computer refresh cycles, so that we could move away from the standard five-year replacement timeframe for desktop computers. The workstations that we use to capture, appraise, and process digital materials are designed for long service lives, heavy and sustained workloads, and easy component change out. We made our case to IT—as noted above, this was an instance where the complementarity of environmental and financial sustainability was key—and received an exemption for our workstations, which we wrote into our policy to ensure that it becomes standard practice.

We can now keep the workstations as long as they remain serviceable and work with IT to swap out components as they fail or need upgrading. For example, we replaced our current workstations’ six-year-old spinning disk drives with solid state drives when we updated from Windows 7 to Windows 10, improving performance while maintaining compliance with IT’s security requirements. Making changes like this allows us to move from the standard five-year to an expected ten-year service life for these workstations (they are currently at 7.5 years). While the policy change and subsequent maintenance actions are small, they add up over time to provide substantial reductions in the full life-cycle environmental and financial costs of our hardware.

We also integrated environmental sustainability into our new acquisition policy. The policy outlines the conditions and terms of several areas that affect the acquisition of materials in any format: appraisal, agreements, transfer, accessioning, and documentation. For appraisal, we document the value and costs of a potential acquisition, but previously had been fairly narrow in our definition of costs. With the new policy, we broadened the costs that were in scope for our acquisition decisions and as part of this included environmental costs. While only a minor point in the policy, it allows us to determine environmental costs in our archival and technical appraisals, and then take those costs into account when making an acquisition decision. Our next step is to figure out how best to measure or estimate environmental impacts for consistency across potential acquisitions. I am hopeful that explicitly integrating environmental sustainability into our first decision point—whether to acquire a collection—will make it easier to include sustainability in other decision points throughout the collection’s life cycle.

Workflows

In a parallel track, we have been integrating environmental sustainability into our workflows, focusing on the appraisal of born-digital and audiovisual materials. This is a direct result of the research article noted above, in which we argue that focusing on selective appraisal can be the most consequential action because it affects the quantity of digital materials that an organization stewards for the remainder of those materials’ life cycle and provides an opportunity to assign levels of preservation commitment. While conducting in-depth appraisal prior to physical or digital transfer is ideal, it is not always practical, so we altered our workflows to increase the opportunities for appraisal after transfer.

For born-digital materials, we added an appraisal point during the initial collection inventory, screening out storage media whose contents are wholly outside of our collecting policy. We then decide on a capture method based on the type of media: we create disk images of smaller-capacity media but often package the contents of larger-capacity media using the BagIt specification (unless we have a use case that requires a forensic image) to reduce the storage capacity needed for the collection and to avoid the ethical and privacy issues previously mentioned. When we do not have control of the storage media (network attached storage, cloud storage, etc.), we make every attempt to engage with donors and departments to conduct in-depth appraisal prior to capture, streamlining the remaining appraisal decision points.
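To make the packaging option concrete, here is a minimal sketch of what a bag looks like on disk under the BagIt specification (RFC 8493): a payload under data/, a bagit.txt declaration, and a checksum manifest. It is not the tooling used at Baker Library, which this post does not describe; the function name and paths are hypothetical, and a production workflow would more likely rely on an established BagIt library and add bag metadata.

```python
import hashlib
import shutil
from pathlib import Path


def make_minimal_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Copy source_dir into a new BagIt-style bag at bag_dir (illustrative only)."""
    data_dir = bag_dir / "data"
    # The payload lives under data/; copytree creates bag_dir and data/ for us.
    shutil.copytree(source_dir, data_dir)

    # Bag declaration required by the BagIt specification.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Payload manifest: one "<checksum>  <path relative to bag root>" line per file.
    # Files are read whole for brevity; large media would be hashed in chunks.
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(bag_dir).as_posix()}")
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(lines) + "\n", encoding="utf-8"
    )
    return bag_dir
```

Because only the files themselves are copied, a bag of this kind holds just the allocated content of the media rather than a sector-by-sector image, which is where the storage savings mentioned above come from.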

After capture, we conduct another round of appraisal now that we can more easily view and analyze the digital materials across the collection. This tends to be a higher-level appraisal during which we make decisions about entire disk images or BagIt bags, or large groupings within them. Finally (for now), we conduct our most granular and selective appraisal during archival processing when processing archivists, curators, and I work together to determine what materials should be part of the collection’s preservation file set. As our digital archives program is still young, we have not yet explored re-appraisal at further points of the life cycle such as access, file migration, or storage refresh.

For audiovisual materials, we follow an approach similar to the one we use for born-digital materials. We set up an audiovisual viewing station with equipment for reviewing audiocassettes, microcassettes, VHS and multiple Beta-formatted video tapes, multiple film formats, and optical discs. We first appraise the media items based on labels and collection context, and with the viewing station can now make a more informed appraisal decision before prioritizing for digitization. After digitization, we appraise again, making decisions on retention, levels of preservation commitment, and access methods.

While implementing multiple points of selective appraisal throughout workflows is more labor intensive than simply conducting an initial appraisal, several arguments moved us to take this approach: it is a one-time labor cost that helps us reduce on-going storage and maintenance costs; it allows us to target our resources to those materials that have the most value for our community; it decreases the burden of reappraisal and other information maintenance work that we are placing on future archivists; and, not least, it reduces the on-going environmental impact of our work.


Keith Pendergrass is the digital archivist for Baker Library Special Collections at Harvard Business School, where he develops and oversees workflows for born-digital materials. His research and general interests include integration of sustainability principles into digital archives standard practice, systems thinking, energy efficiency, and clean energy and transportation. He holds an MSLIS from Simmons College and a BA from Amherst College.

Digital Object Modeling

Submitted by Erin Faulder

The Division of Rare and Manuscript Collections (RMC) at Cornell University Library (CUL) was a leader in early digitization endeavors. However, infrastructure to support coordination between archival description and digital material has not kept pace. In 2019, RMC implemented ArchivesSpace and I turned my attention to developing practice to connect archival description and digital object management.

CUL has distributed systems for displaying and preserving digitized content, and RMC has historically refrained from describing and linking to digitized content within EAD. As a result, I’ve taken this opportunity to thoughtfully engage the array of systems that we use in order to model digital objects in ASpace to best take advantage of future technological developments.

I could find almost no information about how other institutions represent their digital content in ASpace. Perhaps other institutions had <dao> elements from EAD that were imported into ASpace or other data structured from legacy systems, and have not critically evaluated, documented, and shared their practice. Further, the ASpace documentation itself makes no recommendations about how to represent digital content in the digital object module, and it’s unclear how widely or consistently the community is using this functionality. 

Given the distributed systems at CUL that store RMC’s digital content, ASpace is the system of record for archival description and basic descriptive information for digital content. It should be the hub that connects physical material to digital surrogates in both delivery environments and preservation systems. To appropriately evaluate the possible representations, I set several goals for our model. The model must support our ability to:

  • batch-create digital objects in ASpace based on systems and rules. No human data entry of digital objects should be required. 
  • represent both digitized and born-digital content, with a clear indication of which is which.
  • bulk update URLs as access systems change. (Preservation systems have permanent identifiers that require less metadata maintenance.)
  • maintain and represent machine-actionable contextual relationships between
    • physical items and digital surrogates;
    • archival collections and digital material that lives in systems that are largely unaware of archival arrangement and description;
    • preservation object in one system and delivery object(s) in another system.
  • enable users, curators, and archivists to answer:
    • Is this thing born digital? 
    • Has this thing been digitized and where is the surrogate?
    • Where do I go to find the version (Preservation vs. Delivery) I want?
    • Where is all of the digital material for this collection?
    • How much of a collection has been digitized?

ASpace is not the system of record for technical, administrative (other than collection-level), or detailed descriptive metadata about our digital objects. Nor does ASpace need to understand how objects are further modeled within delivery or preservation systems. The systems that store the material handle those functions. Setting clear functional boundaries was essential to determining which option would meet my established needs as I balanced flexibility for unimagined future needs and current limited resources to create the digital object records at a large scale.

Given this set of requirements, I drafted four possible modeling scenarios that are represented visually, along with a metadata profile for the digital objects.

I then talked through several real-world examples of digitized material (e.g., A/V, single-page image/text, multi-page image/text) for each of these scenarios with CUL colleagues from metadata and digital lifecycle services. Their fresh, non-archivist questions helped clarify my thinking.

  • Scenario 1:
    • Pros:
      • Simple structure.
    • Cons:
      • RMC’s local ID (used to identify media objects in a human-readable form) only exists on the archival object in the component ID field.
      • Preservation and delivery objects only recognize a relationship with each other through the linked archival object. This is a potential break point if the links aren’t established or maintained accurately.
      • Difficult to identify in a machine-actionable way which object is preservation and which is delivery other than through a note or by parsing the identifier.
  • Scenario 2:
    • Pros:
      • Preservation and delivery objects linked through a single object, making the relationship between preservation and delivery object clear.
      • Only one “digital object” represents a range of possible iterations, making search results for digital objects easier to interpret.
      • Local ID easily attached to the digital object.
    • Cons:
      • No place to store the delivery system identifier if using the file version URI for the URL.
      • Difficult to identify in a machine-actionable way which object is preservation and which is delivery other than through a note or by parsing the URI structure.
      • Challenging to ensure that the identifier is unique across ASpace given legacy practices of assigning local identifiers.
  • Scenario 3:
    • Pros:
      • Preservation and delivery versions as digital object components linked through a single object make the relationship between preservation and delivery object clear.
      • Only one “digital object” represents a range of possible iterations, making search results for digital objects easier to interpret.
      • Local ID easily attached to the digital object.
    • Cons:
      • Creating a human-meaningful label or title for a digital object component is time consuming.
      • Challenging to ensure identifiers are unique across ASpace given legacy practices of assigning local identifiers.
  • Scenario 4:
    • Pros:
      • High level of granularity in parsing data to objects, potentially providing extensible functionality in the future.
    • Cons:
      • Difficult to identify in a machine-actionable way which object is preservation and which is delivery other than through a note or by parsing the identifier.
      • Time consuming to create a human-meaningful label or title for the digital object component, particularly for born-digital material.
      • Complex hierarchy that may be more trouble to navigate in an automated fashion with no significant benefit.

Following several conversations exploring the pros, cons, and non-archival interpretations of these representations, I ultimately decided to use scenario 1. It seemed to be the simplest model for batch-creating digital objects, it was the most intuitive to technologists once explained, and it bends the ASpace fields away from their presumed use the least.

I made two changes to the scenario to address some of the feedback raised by CUL staff. First, there will be no file-level information in the preservation package objects since that is managed well in the preservation system already and there’s no direct linking into that system. Identifiers stored in ASpace could allow us to add the information later if we find a need for it. Second, to make it possible to identify in a machine-actionable way whether an object is a preservation or delivery object, I added a user-defined controlled vocabulary field with the values “Preservation” and “Delivery.” Additionally, to help users in the ASpace interface identify which record is which when the digital objects’ titles are identical, I’ll append either [Preservation] or [Delivery] to the title.
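As a rough illustration of the two adjustments, the sketch below builds one record per object type with the bracketed title suffix and a controlled value. The field names follow my understanding of the ArchivesSpace digital object JSON model and should be checked against your own instance; placing the value in a user-defined field called enum_1 is an assumption, and the function name and identifiers are hypothetical.

```python
from typing import Dict, List

OBJECT_TYPES = ("Preservation", "Delivery")


def build_digital_object_pair(title: str, identifiers: Dict[str, str]) -> List[dict]:
    """Return one ASpace-style digital object record per object type."""
    records = []
    for object_type in OBJECT_TYPES:
        records.append({
            "jsonmodel_type": "digital_object",
            # Bracketed suffix tells otherwise identical titles apart in the interface.
            "title": f"{title} [{object_type}]",
            "digital_object_id": identifiers[object_type],
            # Assumption: the controlled value is stored in a user-defined enum field.
            "user_defined": {"enum_1": object_type},
            # No file-level information is recorded here, matching the first change.
        })
    return records


pair = build_digital_object_pair(
    "Sample letter", {"Preservation": "pres-001", "Delivery": "deliv-001"}
)
```

Generating both records from the same title and rule set is what keeps human data entry out of the process; only the identifiers differ between the two systems.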

The primary limitation of this model is that there is no way to directly define a relationship between the delivery object and preservation object. If the link between digital object(s) and archival object is broken or incorrect, there will be limited options for restoring contextual understanding of content. This lack of direct referencing means that when a patron requests a high-resolution version of an object they found online, an archivist must search for the delivery identifier in ASpace, find the digital object representing the delivery object, navigate to the linked archival object, and then to the linked preservation object in order to request retrieval from preservation storage. This is a clunky go-up-to-go-down mechanism that I hope to find a solution for eventually.
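The go-up-to-go-down lookup can be sketched with plain dictionaries standing in for ASpace records. Everything here is hypothetical (the keys, identifiers, and function name), and a real lookup would go through the ASpace interface or API rather than in-memory data, but the three steps are the same.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory stand-ins for ASpace records.
# Each archival object lists the digital objects linked to it.
ARCHIVAL_OBJECTS: Dict[str, List[str]] = {
    "ao-1": ["do-deliv-1", "do-pres-1"],
}
DIGITAL_OBJECTS: Dict[str, dict] = {
    "do-deliv-1": {"identifier": "deliv-001", "type": "Delivery"},
    "do-pres-1": {"identifier": "pres-001", "type": "Preservation"},
}


def find_preservation_identifier(delivery_identifier: str) -> Optional[str]:
    # Step 1: find the digital object that represents the delivery object.
    delivery_key = next(
        (key for key, obj in DIGITAL_OBJECTS.items()
         if obj["type"] == "Delivery" and obj["identifier"] == delivery_identifier),
        None,
    )
    if delivery_key is None:
        return None
    # Step 2: go up to the archival object that links to it.
    for linked in ARCHIVAL_OBJECTS.values():
        if delivery_key in linked:
            # Step 3: go back down to the sibling preservation object.
            for key in linked:
                if DIGITAL_OBJECTS[key]["type"] == "Preservation":
                    return DIGITAL_OBJECTS[key]["identifier"]
    return None
```

If the archival object's links are missing or wrong, step 2 finds nothing and the function returns None, which is the break point described above.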

Choosing scenario 1 also means enforcing that digital objects are packaged and managed at the level of archival description. We’ve been moving in this direction for a while, but where existing digitized material is described at a level lower than the existing archival description, that description must be added to ASpace before we can add and link the digital objects. But that is another blog post entirely.

Erin Faulder, Assistant Director for Digital Strategies for Division of Rare and Manuscript Collections

Recap: Islandora/Fedora Camp, Arizona State University, February 24-26, 2020

At the end of February, I was thrilled to be able to travel to Tempe, Arizona, to attend the Islandora and Fedora Camp hosted by Arizona State University. At the Tri-College Consortium, we’re currently working on a migration from CONTENTdm and DSpace to Islandora 7, with a planned additional migration to Islandora 8 in 1-2 years. With the current state of the world and limited access to on-site resources, understanding and improving our digital collections platforms has become more important than ever.

Notably, this was the first Islandora/Fedora camp that presented a single combined track for both developers and collection managers. Personally, I felt that this new format was a major strength of the camp; it was valuable to be able to interface with developers and committers of the Islandora software as well as colleagues from other implementing institutions who manage digital collections. It was also great to hear stories about how people got involved as Islandora committers, which suggested some viable paths to contributing to the community, and to hear about the successes and failures of other users’ migrations and installations.

Camp sessions were split between educational overviews, presentations from users, and hands-on tutorials. Tutorials included basic content management in Drupal 8, core functions of Fedora, and bulk ingest processes, among others. Tutorial leaders included Melissa Anez (Islandora Foundation), David Wilcox (Lyrasis), Bethany Seeger (Johns Hopkins), Daniel Lamb (Islandora Foundation), and Seth Shaw (UNLV).

Punctuating our full days of learning were discussions among implementers from many different types of institutions. My sense from the camp’s attendees was that the dominant concern for implementers is migrating from Islandora 7 to Islandora 8. While a number of institutions have forged ahead with this migration, many institutions are waiting and watching for the tools and documentation to smooth out the process.

Another topic of conversation warranting further reflection is how institutions are integrating Islandora and Fedora into larger digital preservation strategies and practices. I learned from Islandora staff that there used to be a working group for digital preservation, but this has mostly fallen by the wayside. If you’re interested in starting that back up, feel free to contact the Islandora staff to learn more about the process!


Emily Higgs is the Digital Archivist for the Friends Historical Library at Swarthmore College. She is the Assistant Team Leader for bloggERS, the blog for SAA’s Electronic Records Section.