This is the eighth of our Dispatches from a Distance, a series of short posts intended as a forum for those of us facing disruption in our professional lives, whether that’s working from home or something else, to stay engaged with the community. There is no specific topic or theme for submissions; rather, this is a space to share your thoughts on current projects or ideas which, on any other day, you might have discussed with your deskmate or a co-worker during lunch. These don’t have to be directly in response to the COVID-19 outbreak (although they can be). Dispatches should be between 200 and 500 words and can be submitted here.
I have been working from home since mid-March, when the State Archives of North Carolina transitioned to remote work. While I certainly miss the in-person contact, like conversations with my office-mate or colleagues just down the hall, I’m finding new routes to connect. In fact, I’ve been pleasantly surprised by how teleworking has made me feel more connected to the State Archives’ users. As Digital Archivist, I don’t usually have a lot of direct contact with the general public. I do engage with the public through social media, documentation, and periodic shifts at the reference desk, but my customer service is largely geared toward my coworkers in the archives and in state agencies. However, two different experiences during the COVID-19 pandemic have me thinking about our patrons in a new way.
First, working remotely has given me a new perspective and more empathy for our patrons as they navigate our online presence. Small shifts in my own context, such as the room I’m occupying or the computer and browser I’m using, are enough to throw me off my well-worn paths of virtual travel. I find myself Googling just a little bit more or searching for information online that I may have previously found in hard copy or an internal document. I’m revisiting a bit of what things feel like for the uninitiated, a perspective I haven’t had since I joined the State Archives of North Carolina in 2017. I’m thinking about our social media content as ever more central to our engagement. I’ve been getting more emails from members of the public who come across my email address, which means more people are indeed turning to our online presence while we’re closed to in-person visitors. I hope that this empathy for remote users carries through to my patron and staff interactions.
Second, the concern and empathy shown by the entire Department of Natural and Cultural Resources through the project Your Story is North Carolina’s Story has been very meaningful to me. The initiative to collect North Carolinians’ personal materials documenting the COVID-19 pandemic has required collaboration, creativity, and hard work in our division, and I’m inspired by how my colleagues have risen to the challenge. While my role of facilitating the transfer of digital records is only one piece of the puzzle, I am proud to be part of this project. I hope the public feels the deep care, empathy, and human connection that I also feel from this initiative.
Even as I’m more physically distant from my professional connections than ever before, I’m feeling connected to my colleagues and our patrons in new ways. Whatever re-emergence from this difficult time looks like, and whenever it comes, I know I have gained new perspectives and modes of caring that I plan to carry with me.
My efforts to integrate environmental sustainability and digital preservation in my organization—Baker Library Special Collections at Harvard Business School—began several years ago when we were discussing the long-term preservation of forensic disk images in our collections. We came to the conclusion that keeping forensic images instead of (or in addition to) the final preservation file set can raise ethical, privacy, and environmental issues. We decided that we would preserve forensic images only in use cases where there was a strong need to do so, such as a legal mandate in our records management program. I talked about our process and results at the BitCurator Users Forum 2017.
From this presentation grew a collaboration with three colleagues who heard me speak that day: Walker Sampson, Tessa Walsh, and Laura Alagna. Together, we reframed my initial inquiry to focus on environmental sustainability and enlarged the scope to include all digital preservation practices and the standards that guide them. The result was our recent article and workshop protocol.
During this time, I began aligning our digital archives work at Baker Library with this research as well as our organization-wide sustainability goals. My early efforts mainly took the form of the stopgap measures that we suggest in our article: turning off machines when not in use; scheduling tasks for off-peak network and electricity grid periods; and purchasing renewable energy certificates that promote additionality, which is done for us by Harvard University as part of its sustainability goals. As these were either unilateral decisions or were being done for me, they were straightforward and quick to implement.
To make more significant environmental gains along the lines of the paradigm shift we propose in our article, however, requires greater change. This, in turn, requires more buy-in and collaboration within and across departments, which often slows the process. In the face of immediate needs and other constraints, it can be easy for decision makers to justify deprioritizing the work required to integrate environmental sustainability into standard practices. With the urgency of the climate and other environmental crises, this can be quite frustrating. However, with repeated effort and clear reasoning, you can make progress on these larger sustainability changes. I found success most often followed continual reiteration of why I wanted to change policy, procedure, or standard practice, with a focus on how the changes would better align our work and department with organizational sustainability goals. Another key argument was showing how our efforts for environmental sustainability would also result in financial and staffing sustainability.
Below, I share examples of the work we have done at Baker Library Special Collections to include environmental sustainability in some of our policies and workflows. While the details may be specific to our context, the principles are widely applicable: integrate sustainability into your policies so that you have a strong foundation for including environmental concerns in your decision making; and start your efforts with appraisal as it can have the most impact for the time that you put in.
The first policy in which we integrated environmental sustainability was our technology change management policy, which controls our decision making around the hardware and software we use in our digital archives workflows. The first item we added to the policy was that we must dispose of all hardware following environmental standards for electronic waste and, for items other than hard drives, that we must donate them for reuse whenever possible. The second item involved more collaboration with our IT department, which controls computer refresh cycles, so that we could move away from the standard five-year replacement timeframe for desktop computers. The workstations that we use to capture, appraise, and process digital materials are designed for long service lives, heavy and sustained workloads, and easy component swap-out. We made our case to IT—as noted above, this was an instance where the complementarity of environmental and financial sustainability was key—and received an exemption for our workstations, which we wrote into our policy to ensure that it becomes standard practice.
We can now keep the workstations as long as they remain serviceable and work with IT to swap out components as they fail or need upgrading. For example, we replaced our current workstations’ six-year-old spinning disk drives with solid state drives when we updated from Windows 7 to Windows 10, improving performance while maintaining compliance with IT’s security requirements. Making changes like this allows us to move from the standard five-year to an expected ten-year service life for these workstations (they are currently at 7.5 years). While the policy change and subsequent maintenance actions are small, they add up over time to provide substantial reductions in the full life-cycle environmental and financial costs of our hardware.
We also integrated environmental sustainability into our new acquisition policy. The policy outlines the conditions and terms of several areas that affect the acquisition of materials in any format: appraisal, agreements, transfer, accessioning, and documentation. For appraisal, we document the value and costs of a potential acquisition, but previously had been fairly narrow in our definition of costs. With the new policy, we broadened the costs that were in scope for our acquisition decisions and as part of this included environmental costs. While only a minor point in the policy, it allows us to determine environmental costs in our archival and technical appraisals, and then take those costs into account when making an acquisition decision. Our next step is to figure out how best to measure or estimate environmental impacts for consistency across potential acquisitions. I am hopeful that explicitly integrating environmental sustainability into our first decision point—whether to acquire a collection—will make it easier to include sustainability in other decision points throughout the collection’s life cycle.
In a parallel track, we have been integrating environmental sustainability into our workflows, focusing on the appraisal of born-digital and audiovisual materials. This is a direct result of the research article noted above, in which we argue that focusing on selective appraisal can be the most consequential action because it affects the quantity of digital materials that an organization stewards for the remainder of those materials’ life cycle and provides an opportunity to assign levels of preservation commitment. While conducting in-depth appraisal prior to physical or digital transfer is ideal, it is not always practical, so we altered our workflows to increase the opportunities for appraisal after transfer.
For born-digital materials, we added an appraisal point during the initial collection inventory, screening out storage media whose contents are wholly outside of our collecting policy. We then decide on a capture method based on the type of media: we create disk images of smaller-capacity media but often package the contents of larger-capacity media using the BagIt specification (unless we have a use case that requires a forensic image) to reduce the storage capacity needed for the collection and to avoid the ethical and privacy issues previously mentioned. When we do not have control of the storage media—for network-attached storage, cloud storage, etc.—we make every attempt to engage with donors and departments to conduct in-depth appraisal prior to capture, streamlining the remaining appraisal decision points.
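To make the packaging step concrete, here is a minimal sketch of the BagIt layout defined in RFC 8493, using only the Python standard library. The directory paths are hypothetical, and this is not how Baker Library necessarily implements it; in practice a tool such as the Library of Congress's bagit-python library handles this (and also writes bag-info.txt and validates bags).

```python
import hashlib
import shutil
from pathlib import Path

def make_bag(source_dir: str, bag_dir: str) -> None:
    """Package a directory's contents into a minimal BagIt bag:
    a data/ payload, a bagit.txt declaration, and a SHA-256 manifest."""
    src, bag = Path(source_dir), Path(bag_dir)
    payload = bag / "data"
    payload.mkdir(parents=True)

    # Copy every payload file into data/ and checksum it as we go.
    manifest_lines = []
    for f in sorted(src.rglob("*")):
        if f.is_file():
            rel = f.relative_to(src)
            dest = payload / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, dest)
            digest = hashlib.sha256(dest.read_bytes()).hexdigest()
            manifest_lines.append(f"{digest}  data/{rel.as_posix()}")

    # Required tag files per the BagIt specification (RFC 8493).
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n"
    )
    (bag / "manifest-sha256.txt").write_text("\n".join(manifest_lines) + "\n")
```

Because the manifest fixes a checksum for every payload file, a bag created at capture time can be re-validated at each later appraisal point, which is part of what makes bagging attractive as a lighter-weight alternative to forensic imaging.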
After capture, we conduct another round of appraisal now that we can more easily view and analyze the digital materials across the collection. This tends to be a higher-level appraisal during which we make decisions about entire disk images or BagIt bags, or large groupings within them. Finally (for now), we conduct our most granular and selective appraisal during archival processing, when processing archivists, curators, and I work together to determine what materials should be part of the collection’s preservation file set. As our digital archives program is still young, we have not yet explored re-appraisal at further points of the life cycle such as access, file migration, or storage refresh.
For audiovisual materials, we follow an approach similar to the one we use for born-digital materials. We set up an audiovisual viewing station with equipment for reviewing audiocassettes, microcassettes, VHS and multiple Beta-format video tapes, multiple film formats, and optical discs. We first appraise the media items based on labels and collection context; with the viewing station, we can now make a more informed appraisal decision before prioritizing items for digitization. After digitization, we appraise again, making decisions on retention, levels of preservation commitment, and access methods.
While implementing multiple points of selective appraisal throughout workflows is more labor intensive than simply conducting an initial appraisal, several arguments moved us to take this approach: it is a one-time labor cost that helps us reduce ongoing storage and maintenance costs; it allows us to target our resources to those materials that have the most value for our community; it decreases the burden of reappraisal and other information maintenance work that we are placing on future archivists; and, not least, it reduces the ongoing environmental impact of our work.
Keith Pendergrass is the digital archivist for Baker Library Special Collections at Harvard Business School, where he develops and oversees workflows for born-digital materials. His research and general interests include integration of sustainability principles into digital archives standard practice, systems thinking, energy efficiency, and clean energy and transportation. He holds an MSLIS from Simmons College and a BA from Amherst College.
The Division of Rare and Manuscript Collections (RMC) at Cornell University Library (CUL) was a leader in early digitization endeavors. However, infrastructure to support coordination between archival description and digital material has not kept pace. In 2019, RMC implemented ArchivesSpace and I turned my attention to developing practice to connect archival description and digital object management.
CUL has distributed systems for displaying and preserving digitized content, and RMC has historically refrained from describing and linking to digitized content within EAD. As a result, I’ve taken this opportunity to thoughtfully engage with the array of systems we use, modeling digital objects in ASpace so that we can best take advantage of future technological developments.
I could find almost no information about how other institutions represent their digital content in ASpace. Perhaps other institutions had <dao> elements imported from EAD into ASpace, or other structured data from legacy systems, and have not critically evaluated, documented, and shared their practice. Further, the ASpace documentation itself makes no recommendations about how to represent digital content in the digital object module, and it’s unclear how widely or consistently the community is using this functionality.
Given the distributed systems at CUL that store RMC’s digital content, ASpace is the system of record for archival description and basic descriptive information for digital content. It should be the hub that connects physical material to digital surrogates in both delivery environments and preservation systems. To appropriately evaluate the possible representations, I set several goals for our model. The model must support our ability to:
- batch-create digital objects in ASpace based on systems and rules. No human data entry of digital objects should be required.
- represent both digitized and born-digital content, with clear indications of which is which.
- bulk update URLs as access systems change. (Preservation systems have permanent identifiers that require less metadata maintenance.)
- maintain and represent machine-actionable contextual relationships between:
  - physical items and digital surrogates;
  - archival collections and digital material that lives in systems that are largely unaware of archival arrangement and description;
  - a preservation object in one system and delivery object(s) in another system.
- enable users, curators, and archivists to answer:
  - Is this thing born digital?
  - Has this thing been digitized, and where is the surrogate?
  - Where do I go to find the version (preservation vs. delivery) I want?
  - Where is all of the digital material for this collection?
  - How much of a collection has been digitized?
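The bulk-URL-update goal above can be sketched as a simple transformation over digital object records before re-posting them through the ASpace API. The record shape below mirrors ArchivesSpace's digital_object JSON, but the helper function, base URLs, and identifiers are hypothetical illustrations, not RMC's actual implementation.

```python
def migrate_file_version_uris(records, old_base, new_base):
    """Rewrite file_version URIs that point at a retired access system.

    `records` are dicts shaped like ArchivesSpace digital_object JSON.
    Returns only the records that changed, since only those need to be
    re-posted to the ASpace API.
    """
    updated = []
    for record in records:
        changed = False
        for fv in record.get("file_versions", []):
            uri = fv.get("file_uri", "")
            if uri.startswith(old_base):
                fv["file_uri"] = new_base + uri[len(old_base):]
                changed = True
        if changed:
            updated.append(record)
    return updated

# Example with hypothetical delivery URLs:
records = [{"file_versions": [
    {"file_uri": "https://old-access.library.example/item/123"}]}]
changed = migrate_file_version_uris(
    records,
    "https://old-access.library.example/",
    "https://access.library.example/",
)
```

Because delivery URLs live only in file_version records while preservation packages use permanent identifiers, a migration like this touches the delivery objects alone, which is exactly why the goal distinguishes the two.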
ASpace is not the system of record for technical, administrative (other than collection-level), or detailed descriptive metadata about our digital objects. Nor does ASpace need to understand how objects are further modeled within delivery or preservation systems. The systems that store the material handle those functions. Setting clear functional boundaries was essential to determining which option would meet my established needs as I balanced flexibility for unimagined future needs against currently limited resources for creating digital object records at scale.
Given this set of requirements, I drafted four possible modeling scenarios, which I represented visually along with a metadata profile for the digital objects.
I then talked through several real-world examples of digitized material (e.g., A/V, single-page image/text, multi-page image/text) for each of these scenarios with CUL colleagues from metadata and digital lifecycle services. Their fresh, non-archivist questions helped clarify my thinking.
RMC’s local ID (used to identify media objects in a human-readable form) only exists on the archival object, in the component ID field.
Preservation and delivery objects only recognize a relationship with each other through the linked archival object. This is a potential break point if the links aren’t established or maintained accurately.
Difficult to identify in a machine-actionable way which object is preservation and which is delivery, other than through a note or by parsing the identifier.
Preservation and delivery objects are linked through a single object, making the relationship between the preservation and delivery objects clear.
Only one “digital object” represents a range of possible iterations, making search results for digital objects easier to interpret.
Local ID easily attached to the digital object.
No place to store the delivery system identifier if using the file version URI for the URL.
Difficult to identify in a machine-actionable way which object is preservation and which is delivery, other than through a note or by parsing the URI structure.
Challenging to ensure that the identifier is unique across ASpace, given legacy practices of assigning local identifiers.
Preservation and delivery versions as digital object components linked through a single object make the relationship between the preservation and delivery objects clear.
Only one “digital object” represents a range of possible iterations, making search results for digital objects easier to interpret.
Local ID easily attached to the digital object.
Creating a human-meaningful label or title for a digital object component is time-consuming.
Challenging to ensure identifiers are unique across ASpace, given legacy practices of assigning local identifiers.
High level of granularity in parsing data to objects, potentially providing extensible functionality in the future.
Difficult to identify in a machine-actionable way which object is preservation and which is delivery, other than through a note or by parsing the identifier.
Time-consuming to create a human-meaningful label or title for the digital object component, particularly for born-digital material.
Complex hierarchy that may be more trouble to navigate in an automated fashion, with no significant benefit.
Following several conversations exploring the pros, cons, and non-archival interpretations of these representations, I ultimately decided to use scenario 1. It represented the digital objects in the way that was simplest for batch creation, it was the most intuitive once explained to technologists, and it diverges the least from the presumed use of the ASpace fields.
I made two changes to the scenario to address feedback raised by CUL staff. First, there will be no file-level information in the preservation package objects, since that is already managed well in the preservation system and there is no direct linking into that system. The identifiers stored in ASpace would allow us to add the information later if we find a need for it. Second, to identify whether an object is a preservation or delivery object, I added a user-defined controlled vocabulary field with the values “Preservation” and “Delivery,” enabling machine-actionable identification of object type. Additionally, to help users in the ASpace interface tell the records apart when the digital objects’ titles are identical, I’ll append either [Preservation] or [Delivery] to the title.
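A sketch of how these two adjustments might look in a routine that batch-creates the digital object records: the field names follow the ASpace digital_object JSONModel, but which user-defined field carries the controlled vocabulary (enum_1 here), as well as the sample titles, identifiers, and URLs, are assumptions for illustration only.

```python
def build_digital_object(title, identifier, uri, object_type):
    """Build an ArchivesSpace digital_object record per scenario 1.

    object_type is "Preservation" or "Delivery"; the title is suffixed
    so the paired records are distinguishable in the staff interface.
    """
    assert object_type in ("Preservation", "Delivery")
    return {
        "jsonmodel_type": "digital_object",
        "title": f"{title} [{object_type}]",
        "digital_object_id": identifier,
        "file_versions": [
            {"jsonmodel_type": "file_version", "file_uri": uri}
        ],
        # Hypothetical user-defined field carrying the controlled value
        # for machine-actionable identification of object type.
        "user_defined": {
            "jsonmodel_type": "user_defined",
            "enum_1": object_type.lower(),
        },
    }

# A preservation/delivery pair for one archival object; both records
# would then be POSTed to /repositories/:repo_id/digital_objects and
# linked to the same archival object as instances.
pair = [
    build_digital_object("Oral history interview", "cul-p-0001",
                         "https://preservation.example/pkg/0001",
                         "Preservation"),
    build_digital_object("Oral history interview", "cul-d-0001",
                         "https://delivery.example/item/0001",
                         "Delivery"),
]
```

Keeping both the title suffix and the controlled-vocabulary value means humans scanning search results and scripts traversing the API can each distinguish the two objects without parsing identifiers.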
The primary limitation of this model is that there is no way to directly define a relationship between the delivery object and the preservation object. If the link between digital object(s) and archival object is broken or incorrect, there will be limited options for restoring contextual understanding of the content. This lack of direct referencing means that when a patron requests a high-resolution version of an object they found online, an archivist must search for the delivery identifier in ASpace, find the digital object representing the delivery object, navigate to the linked archival object, and then to the linked preservation object in order to request retrieval from preservation storage. This is a clunky go-up-to-go-down mechanism that I hope to find a solution for eventually.
Choosing scenario 1 also means enforcing that digital objects are packaged and managed at the level of archival description. We’ve been moving in this direction for a while, but for existing digitized material described at a level below the existing archival description, that lower-level description must be added to ASpace before we can add and link the digital objects. But that is another blog post entirely.
Erin Faulder, Assistant Director for Digital Strategies for the Division of Rare and Manuscript Collections
At the end of February, I was thrilled to be able to travel to Tempe, Arizona, to attend the Islandora and Fedora Camp hosted by Arizona State University. At the Tri-College Consortium, we’re currently working on a migration from CONTENTdm and DSpace to Islandora 7, with a planned additional migration to Islandora 8 in one to two years. With the current state of the world and limited access to on-site resources, understanding and improving our digital collections platforms has become more important than ever.
Notably, this was the first Islandora/Fedora camp to present a single combined track for both developers and collection managers. Personally, I felt that this new format was a major strength of the camp; it was valuable to interface with developers and committers of the Islandora software, as well as with colleagues from other implementing institutions who manage digital collections. It was also great to hear stories about how people got involved as Islandora committers, which provided some inspiration for viable paths to contributing to the community, and to hear about successes and failures from other users’ migrations and installations.
Camp sessions were split between educational overviews, presentations from users, and hands-on tutorials. Tutorials included basic content management in Drupal 8, core functions of Fedora, and bulk ingest processes, among others. Tutorial-givers included Melissa Anez (Islandora Foundation) and David Wilcox (Lyrasis), Bethany Seeger (Johns Hopkins), Daniel Lamb (Islandora Foundation), and Seth Shaw (UNLV).
Punctuating our full days of learning were discussions among implementers from many different types of institutions. Among the general attendees of the camp, the dominant concern for implementers seemed to be migrating from Islandora 7 to Islandora 8. While a number of institutions have forged ahead with this migration, many are waiting and watching for the tools and documentation to smooth out the process.
Another topic of conversation warranting further reflection is how institutions are integrating Islandora and Fedora into larger digital preservation strategies and practices. I learned from Islandora staff that there used to be a working group for digital preservation, but this has mostly fallen by the wayside. If you’re interested in starting that back up, feel free to contact the Islandora staff to learn more about the process!
Emily Higgs is the Digital Archivist for the Friends Historical Library at Swarthmore College. She is the Assistant Team Leader for bloggERS, the blog for SAA’s Electronic Records Section.
This is the seventh of our Dispatches from a Distance, a series of short posts intended as a forum for those of us facing disruption in our professional lives, whether that’s working from home or something else, to stay engaged with the community. There is no specific topic or theme for submissions; rather, this is a space to share your thoughts on current projects or ideas which, on any other day, you might have discussed with your deskmate or a co-worker during lunch. These don’t have to be directly in response to the COVID-19 outbreak (although they can be). Dispatches should be between 200 and 500 words and can be submitted here.
As a part-time government employee, I was left wondering if I could continue historical research and archives inventory tasks once a stay-at-home order was a reality. When that came to pass in North Carolina, I joined the ranks of cultural heritage specialists who were sheltering in place to avoid the spread of COVID-19.
Meanwhile, in Georgia, my Mom slowly scaled back her public and other activities. She told me that she had to go out to buy a television, since one of her aging devices had lost its picture by late March. Then she limited herself to going to the plant nursery and the grocery store. My mother is indeed among the vulnerable population, but she hasn’t spent a moment longing for the life she’s had to put aside. Instead, she continued to clean out the attic.
Both of my parents had long careers in academia, and like me, they felt the need to keep several years’ worth of papers, binders, notes, and memorabilia. I’m marveling at my Mom’s archivist tendencies: she set up a sorting station in the garage and has inspired my Dad to work through decades of his own papers. I’ve received texts with pictures that harken back to memories made before I was born. In my view, however, the most intriguing attic project is the collection of Time magazines.
Gwen Wood has pared down the ‘attic archives’ in stages, seeming to have begun with my school papers. I’m not sure I could speak to a processing schedule, but she has since employed a neighborhood teen to help comb through the magazines. Tokumo normally assists Gwen with yard work and, like me, he is of a quiet sort. I asked if I could write about their (still ongoing) experience, since one of the issues they’ve set aside is from April 25, 1983; a somber Senator Claude Pepper graces the cover.
I worked on a collection at the Claude Pepper Library and Museum at Florida State University and will likely reach out to Special Collections to see if they would like the issue. Further connections to existing collections might reveal themselves, yet I’m all too aware of the scenario in which donors are overconfident that their back issues will fulfill each prong of an archive’s collection policy. The Time issues appear to begin in 1968 and neither Gwen nor Tokumo are sure when the collection ends. I heard about this sorting in early April, about the time that my Mom switched to solo yard work and attic archives with Tokumo when he wasn’t going to school online.
Included above is a photo of the space, cheerfully adorned with a string of lights and a series of boxes and cartons. I’m aware that this archive, too, may be shuttered as we all enter the scorching summer months. Until then, my parents’ attic issues of Time and Newsweek will see their first finding aids, and the light of day, after a prolonged retirement.
This is the sixth of our Dispatches from a Distance, a series of short posts intended as a forum for those of us facing disruption in our professional lives, whether that’s working from home or something else, to stay engaged with the community. There is no specific topic or theme for submissions; rather, this is a space to share your thoughts on current projects or ideas which, on any other day, you might have discussed with your deskmate or a co-worker during lunch. These don’t have to be directly in response to the COVID-19 outbreak (although they can be). Dispatches should be between 200 and 500 words and can be submitted here.
This is hard to write about because my journey starts remotely. I ended two jobs and started a new job from home. I didn’t have to transition to working from home and I wasn’t furloughed (thankfully). I don’t know what “normal” is because I haven’t experienced it yet.
At the beginning of March, I accepted an offer to be the librarian/archivist at a small academic library. After years of grad school and hundreds of job rejections, I finally got an offer. And it came right as Governor Cuomo put New York on “pause,” which meant my transition from two part-time library jobs to one academic library/archives position happened completely remotely.
I said goodbye to my old colleagues through emails and texts, and said hello to new ones through Zoom chats. As awkward and disappointing as it was to do normal life events remotely (including my 30th birthday), I am incredibly fortunate to have been able to transition so smoothly. The library director at my new job got me set up with a laptop and a couple of small collections I could work on at home.
This first impression of the library director was encouraging. They gave me the tools and support to feel connected while distant, be productive with limited resources, and be professional while wearing sweatpants. What made these actions impressive was that they were done during a pandemic. It would have been easy for the library to ask me to move my start date or even revoke the offer, but this simple act of doing the right thing gave me the impression that I was important and that the archives are important.
The last time I set foot in the library was during my interview three months ago. Honestly, I don’t remember much except the overwhelming nerves that come with any interview and the rush of adrenaline afterward. While the library director has discussed the layout of the library a few times, I still don’t know where the important places are: the archives, my office, the bathroom, or even where the library sits on campus.
Not only am I transitioning remotely from part-time jobs to a full-time position, but I’m also transitioning from graduate student/paraprofessional to professional. That transition is already packed with overwhelming emotions, and compounded by working from home, it is even more difficult. Imposter syndrome hit me hard last week, along with another surreal emotion: a sense of temporariness. It’s difficult to explain and, honestly, I’m not sure I can.
Working from home doesn’t feel productive at all, and starting a new job from home feels like swimming in open water, where each task pulls you up and down like a wave. While everyone else cannot wait to be back in the office, I cannot wait to be in the office. I cannot wait for this awkward mindset of temporariness to be gone. And I cannot wait to master those waves.
bloggERS! is always proud to see our community of archivists learn new skills and progress to new roles that further not just their own careers, but also advance the systems and infrastructures that make up the digital preservation and electronic resources environment. Back in February, Artefactual announced a new hire, and we knew we had to get the scoop. Here’s an interview with digital preservationist Tessa Walsh (@bitarchivist) about her new role and how she has made an impact in the digital preservation landscape.
1. What is your role at Artefactual? What are you doing there now and into the future?
My job title is Software Developer. I spend most of my time on software development-related tasks like programming, reviewing other developers’ code, providing estimates and requirements analyses, writing documentation, helping with client support tasks, and providing training for external developers who want to contribute to Artefactual’s open source projects Archivematica and AtoM, for example through Artefactual’s new Archivematica Product Support Program.
Artefactual re-organized internally a few months ago and I am on the Project Development Team, a new group within the company that is focused on fixed term work—things like new feature development, data migrations, theming, analysis, and consulting. I work on many feature development projects, where we implement new Archivematica or AtoM features that are sponsored by clients and then include and support those new features in future public releases of the software. I write code, tests, and documentation, working closely with a systems archivist who manages communication with the client, refines the requirements for the project, and does quality assurance.
In some recent projects we’ve turned those requirements into “feature files” using the Gherkin syntax that can be used as the basis for automated tests. These automated tests help us improve and maintain Archivematica as a project, even as we make some big scalability and performance improvements that involve touching many parts of the codebase. I might also work with other developers and systems archivists, product managers, systems administrators, and others between the original idea for a feature and its inclusion in a public release. So far, I’ve mostly worked with Archivematica, but I’m looking forward to getting more familiar with AtoM as well.
2. What makes you interested in working with software development for digital preservation and archives?
In part, it’s that this niche is such a great confluence of many of my interests. I’m an archivist by training and I’m invested in carrying the cultural record forward with us for future generations and uses. I’ve also been a computer nerd for about as long as I can remember and find a lot of satisfaction in taking software apart and putting it together. In many ways I think my career has been about finding the right balance of these interests and putting myself where I can best contribute to the field of digital preservation. I want to make common digital preservation and curation tasks easier for people doing the work so that they can focus on the most challenging and important parts of their job, whether that’s figuring out a preservation approach for a difficult file format or doing the policy and advocacy work to firmly establish digital preservation as a core activity within an organization.
I started learning software development in earnest during my MLIS program at Simmons College and in the years after as Digital Archivist at the Canadian Centre for Architecture (CCA) from 2015-2018. This was motivated by personal interest for sure, but was also a reaction to my situation. As I worked on building the digital preservation program at the CCA and later at Concordia University Library, I kept hitting walls where tools I wanted for some basic preservation and curation functions didn’t exist. Or, where tools did exist but were borrowed from other fields and not built with archival users and use cases in mind.
Then and since, when I’ve run into this type of situation and had capacity, I’ve tried to make some of those missing tools and share them with the broader community as free and open source software. By way of example:
Brunnhilde, inspired by a similar project by my Artefactual colleague Ross Spencer, was a response to wanting a user-friendly high-level profiling tool for directories and disk images to help with appraisal, accessioning, and minimal processing.
METSFlask resulted from wanting to make it easier for me and others to browse through our Archivematica METS files and get details about the contents of our AIPs without having to read through very large XML files manually.
SCOPE, a collaboration of Artefactual and the CCA, started from a desire to let users browse and search through processed digital holdings, leveraging the descriptive and technical metadata in our finding aids and Archivematica, and download DIPs directly onto a reading room workstation for access without needing to go through complicated reference workflows.
Bulk Reviewer developed out of conversations at the BitCurator Users Forum a few years ago about wanting to improve workflows for identifying and managing sensitive information in digital archives by making better use of bulk_extractor reports.
As I got better as a developer, I also started to feel more comfortable contributing to bigger open source projects like Archivematica. Being a maintainer myself has really taught me the value of managing open source projects through well-organized communities, via companies like Artefactual that work hand-in-hand with users and member organizations like the BitCurator Consortium or Open Preservation Foundation.
3. Can you tell us about one project you're working on at Artefactual and why it's exciting for you?
Right now I’m working on a couple new Archivematica features sponsored by Simon Fraser University Archives that I’m excited about, but I’m most excited about a relatively small change: an addition we’re making to the Archivematica transfer interface that allows users to choose the processing configuration they’d like to use with a transfer from a convenient dropdown list. In terms of lines of code this is a tiny feature but it will be a huge user experience improvement for one of the most common tasks for a large number of Archivematica users. I love projects like that because they get to the heart of my desire to make our tools easier and more pleasant to use.
4. What has been the easiest part of transitioning to working at Artefactual?
By far one of the best and easiest things about starting to work at Artefactual has been how well the company’s values and working practices align with my own. Artefactual embraces open source, “open by default”, and erring on the side of more communication, which are all important values for me as well. And, within the company, everyone is so nice and encouraging of each other. I came out as a trans woman recently, and started using they/them pronouns in the months leading up to coming out. Since day one I’ve gotten nothing but respect from my colleagues, and they have been so kind and supportive in relation to my transition. That really goes a long way to making the work week enjoyable!
It's also so fun to work with other people who, like me, have one foot in software development land and another in archives and digital preservation. Other "developer-archivist" folks like Ashley Blewer and Ross Spencer, certainly, but not just the three of us. Since Artefactual attracts smart and curious people, many of my colleagues have both domain and technical expertise in lots of different areas that you might not necessarily expect from their job title alone. I'm learning new things from my new coworkers all the time and really enjoying that.
5. What has been the most difficult part of transitioning to working at Artefactual?
Starting a new job in the time of COVID-19 quarantine is strange and difficult. Artefactual has been flexible and generous with its employees in relation to the pandemic and it was my plan from the outset to work remotely from home, so I’ve been less disrupted than many others. But—as I try to remind myself and the people around me regularly—I’m still a human living through collective trauma in relative isolation. I’m not as productive as I normally would be and some days I never quite break through the attendant anxiety and grief. And that’s okay! We’re all doing the best we can in these times, and hopefully trying to take care of ourselves and uplift and help each other out as much as we can.
6. Can you recommend any tips to current archivists who want to get into the computational side of archiving/preservation?
This is a question I get a lot, especially from students and new professionals. I don’t think there are “right” answers, but here are some points that I come back to often:
Start with a project, not a technology: You’ll be much more motivated to learn if you’re working toward something that you care about. Yes, read that book or take that online class, but try to apply what you learn to something that interests you or will make something you have to do often easier. For new digital archivists, investing in learning some command line and bash or Python scripting basics can go a long way toward starting to automate repetitive workflows. If that sounds too boring, start by trying to make some digital art or a fun website, and then figure out how to apply it to your professional life later on (or not!).
Work in the open, invite feedback: Put your code on GitHub or Gitlab or another git hosting site with an open source license, write and present about what you’re doing, ask for help on Twitter or by email, be friendly and helpful with others.
Be patient with yourself: Learning new technologies and programming languages is hard, non-linear, and occasionally frustrating. When I get stuck on something in my work or learning, I often have to remind myself to step away, take a walk, get some sleep, and give my brain time to come around. 99% of the time when I do that, I end up being able to move past the issue much more quickly than if I just kept staring at it in frustration. And remember: even the most senior developers stop constantly to read the documentation or look up for the thousandth time what the syntax to do x is in a particular language. That's the nature of the work, not a sign of your skill or aptitude.
7. Where do you see the future of digital preservation going?
I really hope that the future of digital preservation is more inclusive. By that, I mean less intimidating to new professionals, more embracing of new types of organizations and communities outside of the traditional "cultural heritage" bubble, and more diverse and inclusive as a community of practitioners. The archives, library, and digital preservation professions are very white. Bergis Jules spoke about the need to "confront the unbearable whiteness of our profession" in his 2016 NDSA keynote "Confronting Our Failure of Care Around the Legacies of Marginalized People in the Archives," which should be required reading for anyone working in archives and digital preservation. Michelle Caswell reminded us again last year in her "Whose Digital Preservation?" keynote at iPRES 2019 that this is to the detriment of us all. We collectively and individually lose a lot (not least of which is a representative, inclusive, justice-oriented historical record) when our professions are so homogenous. It's also true that tech-focused "digital" positions that often come with higher salaries are disproportionately filled by men. I think a key part of moving digital preservation forward is addressing some of these structural issues around who is doing the work and how they are treated, by implementing better practices in our organizations, acknowledging and working to dismantle white supremacy in our personal spheres, and promoting and financing groups such as We Here, who support BIPOC archives and library workers.
I also want the future of digital preservation to be more sustainable. I co-authored a paper in a recent issue of American Archivist with Keith Pendergrass, Walker Sampson, and Laura Alagna, in which we suggest changes to our collective thinking around appraisal, permanence, and availability that could help move our profession toward a more sustainable future. We believe that responsibly preserving our cultural record for the future means doing our best not to contribute to trends that existentially threaten that future. I've been so happy to see that many of our colleagues in the field agree and have said that they plan to start explicitly considering environmental sustainability as a factor in digital preservation policies and in decisions on appraisal, file format migration, fixity checking practices, storage systems and providers, and methods of delivery, among other areas of our practice.
This isn’t a novel observation, but I think the future of digital preservation work is also going to be focused much more on software and dynamic web-based content, and less on static discrete documents that we can preserve natively as files. This is going to challenge us on technical, organizational, and theoretical levels, but I think it’ll be a great catalyst for growing our conceptual models and software tools in digital preservation and for promoting and proving the value of digital preservation broadly. And, I’m so happy there are folks like the Software Preservation Network who are anticipating these changes and doing a great job of laying the cultural, technological, and legal groundwork to prepare us for that future.
8. How do you pronounce “guymager”?
I say “GAI-mager” out of habit, since that’s what I first heard. But, I think that it’s named after its creator, who is French, so it really should be “GHEE-mager”. Considering the number of hours I’ve put into learning French since moving to Montréal in 2015, I should really do better!
Tessa Walsh is a Software Developer at Artefactual Systems. Previously, Tessa implemented digital preservation programmes at Concordia University Library and the Canadian Centre for Architecture as a Digital Preservation Librarian and Digital Archivist, respectively. She is a recipient of a 2019 NDSA Individual Innovation Award and was a 2018 Summer Fellow at the Library Innovation Lab at Harvard University. Tessa holds an MS in Library and Information Science from Simmons University and a BA in English from the University of Florida. In addition to her work at Artefactual, Tessa is the maintainer of several free and open source software projects that support digital preservation and curation activities, including Brunnhilde, Bulk Reviewer, and METSFlask.
Effective stewardship of digital archival materials and records requires that archivists and digital preservation professionals make decisions that are rooted in sustainability. As Ben Goldman observes in his 2018 essay, all aspects of our work show evidence of the classic definition of sustainability: “meeting the needs of the present without compromising the needs of the future.” It is therefore unsurprising, given growing concern about the impact of human activity on our climate and environment, that archivists are rallying around calls to evaluate the environmental sustainability of our work. The changing conditions related to climate change are in direct conflict with our ability to act as stewards of the collections in our care.
This series hopes to highlight current efforts in this area, acknowledge the challenges, and provide opportunities to learn from our peers. Maybe you work for an institution that has already taken steps, whether large or small, to address the environmental impact of digital preservation. Maybe you have encountered obstacles or resistance in the face of such changes. Maybe you have formed partnerships or developed resources to help advocate and support changes in relation to the sustainability of digital preservation. Whatever the case, we want to hear about it!
Writing for bloggERS! “Another Kind of Glacier” Series
We encourage visual representations: Posts can include or largely consist of comics, flowcharts, a series of memes, etc!
Written content should be roughly 600-800 words in length
Write posts for a wide audience: anyone who stewards, studies, or has an interest in digital archives and electronic records, both within and beyond SAA
This is the fifth of our Dispatches from a Distance, a series of short posts intended as a forum for those of us facing disruption in our professional lives, whether that’s working from home or something else, to stay engaged with the community. There is no specific topic or theme for submissions–rather, this is a space to share your thoughts on current projects or ideas which, on any other day, you might have discussed with your deskmate or a co-worker during lunch. These don’t have to be directly in response to the Covid-19 outbreak (although they can be). Dispatches should be between 200-500 words and can be submitted here.
In my position as a librarian and an archivist, I never lack tasks or projects. What I love about my job is that, if I am tired of working on one project, I can always switch to another. This semester I also worked most evenings and weekends, contending with an overload of service commitments. I was hanging on until the end of March when my commitments would scale back.
In March, just as my university was making preparations to move all classes online, I got sick. I was out for over a week, and I emerged to a very different world. Conferences started cancelling, including several presentations I was preparing for. University service work came to a halt, and I began to work entirely from home.
Two things happened. First, all of my normal tasks and routines ended. Then my supervisor, knowing the difficulty I had fitting professional writing into my work life, told me to focus on writing. As a project-oriented introvert whose professional writing goals were neglected, this was a gift. Yet I didn’t anticipate that having this opportunity would be one of the most difficult tasks I have ever attempted to accomplish. Even though I am managing concerns about the virus and the economy fairly well, I have developed a sense of futility about my work and place in the universe, and I have now learned how skilled I am at avoiding writing. Trying to write makes me feel as though I am trying to swim through mud. I am not sure if that is due to my fear of writing, due to the psychological task of trying to write during a pandemic, or both.
Moving forward and being productive is a process of reinvention for me. Here are some things that are helping.
Creating a new daily routine. Routines are grounding, yet my “old” routine is useless. Questions I now have: should I change before beginning work? Is it okay to wake up, make coffee, and go straight to the computer? Does this help me feel like I’m in work mode? Creating a new pre-work and work schedule focuses my intent.
Setting goals. My norm is trying to creatively fit deadlines into limited time slots, like a puzzle. I am finding that without looming deadlines, I’ve lost the sense of urgency, and I need to set goals. One of the ways that I avoid writing is to continue researching, so I have had to set daily goals about what constitutes real progress.
Staying connected. Remaining connected, particularly through meetings with coworkers and committees, has been important to my sanity. Some meetings are entirely devoted to checking in. Some are routine meetings that provide a sense of normalcy and stability.
I know that my current position is a privileged one—not all information professionals, let alone all individuals, are able to work from home and receive a paycheck. Yet, this is my process, and I am mucking through it.
ArchivesSpace manages all archival description. Accession records and top level description for collections and file series are created directly in ArchivesSpace, while lower-level description, containers, locations, and digital objects are created using asInventory spreadsheets. Overnight, all modified published records are exported using exportPublicData.py and indexed into Solr using indexNewEAD.sh. This Solr index is read by ArcLight.
ArcLight provides discovery and display for archival description exported from ArchivesSpace. It uses URIs from ArchivesSpace digital objects to point to digital content in Hyrax while placing that content in the context of archival description. ArcLight is also really good at systems integration because it allows any system to query it through an unauthenticated API. This allows Hyrax and other tools to easily query ArcLight for description records.
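Because ArcLight is built on Blacklight, other systems can typically retrieve description as JSON over plain HTTP. As a hedged sketch (the `/catalog` endpoint and parameter names are common Blacklight defaults and may differ on a given instance), building such a query might look like:

```python
from urllib.parse import urlencode

def arclight_search_url(base_url, query, per_page=10):
    """Build a JSON search request URL for a Blacklight-based ArcLight site.

    The /catalog endpoint and parameter names here are typical Blacklight
    defaults; an actual ArcLight instance may be configured differently.
    """
    params = {
        "q": query,                     # keyword query
        "search_field": "all_fields",   # search across all indexed fields
        "format": "json",               # ask for a machine-readable response
        "per_page": per_page,
    }
    return base_url.rstrip("/") + "/catalog?" + urlencode(params)
```

A script could then fetch that URL with any HTTP client and read the matching description records from the JSON response, no authentication required.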
Our preservation storage uses network shares managed by our university data center. We limit write access to the SIP and AIP storage directories to one service account used only by the server that runs the scheduled microservices. This means that only tested automated processes can create, edit, or delete SIPs and AIPs. Archivists have read-only access to these directories, which contain standard bags generated by BagIt-python that are validated against BagIt Profiles. Microservices also place a copy of all SIPs in a processing directory where archivists have full access to work directly with the files. These processing packages have specific subdirectories for master files, derivatives, and metadata. This allows other microservices to be run on them with just the package identifier. So, if you needed to batch create derivatives or metadata files, the microservices know which directories to look in.
The microservices themselves have built-in checks, such as making sure a valid AIP exists before deleting a SIP. The data center also has some low-level preservation features in place, and we are working to build additional preservation services that will run asynchronously from the rest of our processing workflows. This system is far from perfect, but it works for now, and at the end of the day, we are relying on the permanent positions in our department as well as in Library Systems and university IT to keep these files available long-term.
These microservices are the glue that keeps most of our workflows working together. Most of the links here point to code in our GitHub page, but we’re also trying to add public information on these processes to our documentation site.
This is a basic Python desktop app for managing lower-level description in ArchivesSpace through Excel spreadsheets using the API. Archivists can place a completed spreadsheet in a designated asInventory input directory and double-click an .exe file to add new archival objects to ArchivesSpace. A separate .exe can export all the child records from a resource or archival object identifier. The exported spreadsheets include the identifier for each archival object, container, and location, so we can easily roundtrip data from ArchivesSpace, edit it in Excel, and push the updates back into ArchivesSpace.
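Under the hood, a tool like this turns each spreadsheet row into a JSON record and POSTs it to the ArchivesSpace API. As a hedged sketch of that mapping (the spreadsheet column names here are hypothetical, not asInventory's actual columns; the record fields follow the public ArchivesSpace JSONModel schema):

```python
def archival_object_json(row, resource_uri, parent_uri=None):
    """Turn one spreadsheet row into an ArchivesSpace archival_object
    record suitable for POSTing to the API.

    The "title"/"level"/"dao" column names are hypothetical; the field
    names in the record follow the ArchivesSpace JSONModel schema.
    """
    record = {
        "jsonmodel_type": "archival_object",
        "title": row["title"],
        "level": row.get("level", "file"),
        "publish": True,
        "resource": {"ref": resource_uri},  # parent resource (collection)
    }
    if parent_uri:
        # Nest under an existing series/subseries archival object.
        record["parent"] = {"ref": parent_uri}
    if row.get("dao"):
        # Link an existing digital object record, as asInventory's
        # DAO column does when a Hyrax URL has been turned into one.
        record["instances"] = [{
            "instance_type": "digital_object",
            "digital_object": {"ref": row["dao"]},
        }]
    return record
```

In practice a client library such as ArchivesSnake would handle authentication and the POST itself; the interesting part is the row-to-record mapping shown here.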
We have since built our born digital description workflow on top of asInventory. The spreadsheet has a “DAO” column and will create a digital object using a URI that is placed there. An archivist can describe digital records in a spreadsheet while adding Hyrax URLs that link to individual or groups of files.
We have been using asInventory for almost 3 years, and it does need some maintenance work. Shifting a lot of the code to the ArchivesSnake library will help make this easier, and I also hope to find a way to eliminate the need for a GUI framework so it runs just like a regular script.
The ArchivesSpace-ArcLight-Workflow Github repository is a set of scripts that keeps our systems connected and up-to-date. exportPublicData.py ensures that all published description in ArchivesSpace is exported each night, and indexNewEAD.sh indexes this description into Solr so it can be used by ArcLight. processNewUploads.py is the most complex process. This script takes all new digital objects uploaded through the Hyrax web interface, stores preservation copies as AIPs, and creates digital object records in ArchivesSpace that point to them. Part of what makes this step challenging is that Hyrax does not have an API, so the script uses Solr and a web scraper as a workaround.
These scripts sound complicated, but they have been relatively stable over the past year or so. I hope we can simplify them too, by relying more on ArchivesSnake and moving some separate functions to other smaller microservices. One example is how the ASpace export script also adds a link for each collection to our website. We can simplify this by moving that task to a separate, smaller script. That way, when one script breaks or needs to be updated, it won't affect the other.
These scripts process digital records by uploading metadata for them in our systems and moving them to our preservation storage.
ingest.py packages files as a SIP and optionally updates ArchivesSpace accession records by adding dates and extents.
We have standard transfer folders for some campus offices with designated paths for new records and log files along with metadata about the transferring office. transferAccession.py runs ingest.py but uses the transfer metadata to create accession records and produces spreadsheet log files so offices can see what they transferred.
confluence.py scrapes files from our campus’s Confluence wiki system, so for offices that use Confluence all I need is access to their page to periodically transfer records.
convertImages.py makes derivative files. This is mostly designed for image files, such as batch converting TIFFs to JPGs or PDFs.
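A sketch of the derivative-making pattern, assuming Pillow for the image work (the actual convertImages.py may differ; the path-mapping helper is the part every derivative script needs):

```python
import os

try:
    from PIL import Image  # Pillow; only needed for the conversion itself
except ImportError:
    Image = None

def derivative_path(src_path, out_dir, ext=".jpg"):
    """Map a master file path to its derivative path, keeping the stem."""
    stem = os.path.splitext(os.path.basename(src_path))[0]
    return os.path.join(out_dir, stem + ext)

def convert_image(src_path, out_dir):
    """Convert one master image (e.g. a TIFF) to a JPG access derivative."""
    if Image is None:
        raise RuntimeError("Pillow is required for image conversion")
    dest = derivative_path(src_path, out_dir)
    with Image.open(src_path) as img:
        # Flatten to RGB so TIFFs with alpha or other modes save cleanly.
        img.convert("RGB").save(dest, "JPEG", quality=90)
    return dest
```

Batch conversion is then just a loop over the master-files subdirectory of a processing package, writing into its derivatives subdirectory.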
listFiles.py is very handy. All it does is create a text file that lists all filenames and paths in a SIP. These can then be easily copied into a spreadsheet.
An archivist can arrange records by creating an asInventory spreadsheet that points to individual or groups of files. buildHyraxUpload.py then creates a TSV file for uploading these files to Hyrax with the relevant ArchivesSpace identifiers.
updateASpace.py takes the output TSV from the Hyrax upload and updates the same inventory spreadsheets. These can then be uploaded back into ArchivesSpace, which creates digital objects that point to Hyrax URLs.
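The TSV-building step can be sketched with the standard csv module. The column names below are hypothetical placeholders, since each Hyrax batch-import setup expects its own headers:

```python
import csv
import io

def build_upload_tsv(rows, fieldnames=("file_path", "ref_id", "title")):
    """Write rows of file/description pairs as a TSV string.

    The column names are hypothetical; a real Hyrax batch-upload
    tool defines its own expected headers.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, delimiter="\t")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()
```

Keeping the ArchivesSpace ref_id in each row is what lets the later update step match Hyrax URLs back to the right archival objects.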
These classes are extensions of the BagIt-python library. They contain a number of methods that are used by other microservices. This lets us easily create() or load() our specific SIP or AIP packages and add files to them. They also include complex things like getting a human-readable extent and date ranges from the filesystem. My favorite feature might be clean() which removes all Thumbs.db, desktop.ini, and .DS_Store files as the package is created.
Example use case
Wild records appear! A university staff member has placed records of the University Senate from the past year in a standard folder share used for transfers.
An archivist runs transferAccession.py, which creates an ArchivesSpace accession record using some JSON in the transfer folder and technical metadata from the filesystem (modified dates and digital extents). It then packages the files using BagIt-python and places one copy in the read-only SIP directory and a working copy in a processing directory.
For outside acquisitions, the archivists usually manually download, export, or image the materials and create an accession record manually. Then, ingest.py packages these materials and adds dates and extents to the accession records when possible.
The archivist makes derivative files for access or preservation. Since there is a designated derivatives directory in the processing package, the archivists can use a variety of manual tools or run other microservices using the package identifier. Scripts such as convertImages.py can batch convert or combine images and PDFs, and other scripts for processing email are still being developed.
The archivist then runs listFiles.py to get a list of file paths and copies them into an asInventory spreadsheet.
The archivist arranges the issues within the University Senate Records. They might create a new subseries and use that identifier in an asInventory spreadsheet to upload a list of files and then download them again to get a list of ref_ids.
The archivist runs buildHyraxUpload.py to create a tab-separated values (TSV) file for uploading files to Hyrax using the description and ref_ids from the asInventory spreadsheet.
After uploading the files to Hyrax, the archivist runs updateASpace.py to add the new Hyrax URLs to the same asInventory spreadsheet and uploads them back to ArchivesSpace. This creates new digital objects that point to Hyrax.
Successes and Challenges
Our set-up will always be a work in progress, and we hope to simplify, replace, or improve most of these processes over time. Since Hyrax and ArcLight have been in place for almost a year, we have noticed some aspects that are working really well and others that we still need to improve on.
I think the biggest success was customizing Hyrax to rely on description pulled from ArcLight. This has proven to be dependable and has allowed us to make significant amounts of born-digital and digitized materials available online without requiring detailed item-level metadata. Instead, we rely on high-level archival description and whatever information we can use at scale from the creator or the file system.
Suddenly we have a backlog. Since description is no longer the biggest barrier to making materials available, the holdup has been the parts of the workflow that require human intervention. Even though we are doing more with each action, large amounts of materials are still held up waiting for a human to process them. The biggest bottlenecks are working with campus offices and donors as well as arrangement and description.
There is also a ton of spreadsheets. I think this is a good thing, as we have discovered many cases where born-digital records come with some kind of existing description, but it often requires data cleaning and transformation. One collection came with authors, titles, and abstracts for each of a few thousand PDF files, but that metadata was trapped in hand-encoded HTML files from the 1990s. Spreadsheets are a really good tool for straddling the divide between the automated and manual processes required to save this kind of metadata, and they are a comfortable environment for many archivists to work in.
You may have noticed that the biggest needs we have now—donor relations, arrangement and description, metadata cleanup—are roles that archivists are really good at and comfortable with. It turned out that once we had effective digital infrastructure in place, it created further demands on archivists and traditional archival processes.
This brings us to the biggest challenge we face now. Since our set-up often requires comfort on the command line, we have severely limited the number of archivists who can work on these materials and required non-archival skills to perform basic archival functions. We are trying to mitigate this in some respects by better distributing individual stages for each collection and providing more documentation. Still, this has clearly been a major flaw, as we need to meet users (in this case other archivists) where they are rather than place further demands on them.
Gregory Wiedeman is the university archivist in the M.E. Grenander Department of Special Collections & Archives at the University at Albany, SUNY where he helps ensure long-term access to the school’s public records. He oversees collecting, processing, and reference for the University Archives and supports the implementation and development of the department’s archival systems.