This is the sixth of our Dispatches from a Distance, a series of short posts intended as a forum for those of us facing disruption in our professional lives, whether that’s working from home or something else, to stay engaged with the community. There is no specific topic or theme for submissions; rather, this is a space to share your thoughts on current projects or ideas which, on any other day, you might have discussed with your deskmate or a co-worker during lunch. These don’t have to be directly in response to the Covid-19 outbreak (although they can be). Dispatches should be between 200 and 500 words and can be submitted here.
This is hard to write about because my journey starts remotely. I ended two jobs and started a new job from home. I didn’t have to transition to working from home and I wasn’t furloughed (thankfully). I don’t know what “normal” is because I haven’t experienced it yet.
At the beginning of March, I accepted an offer to be the librarian/archivist at a small academic library. After years of grad school and hundreds of job rejections, I finally got an offer. And it came right as Governor Cuomo put New York on “pause,” which meant that my transition from two part-time library jobs to a single academic library/archives position happened completely remotely.
I said goodbye to my old colleagues through emails and texts, and said hello to new ones through Zoom chats. As awkward and disappointing as it was to mark normal life events remotely (including my 30th birthday), I am incredibly fortunate to have been able to transition so smoothly. The library director at my new job got me set up with a laptop and a couple of small collections I could work on at home.
This first impression of the library director was encouraging. They gave me the tools and support to feel connected while distant, be productive with limited resources, and be professional while wearing sweatpants. What made these actions impressive was that they happened during a pandemic. It would have been easy for the library to ask me to move my start date or even revoke the offer, but this simple act of doing the right thing gave me the impression that I was important and that the archives were important.
The last time I set foot in the library was during my interview three months ago. Honestly, I don’t remember much except the overwhelming nerves that come with any interview and the rush of adrenaline afterward. While the library director has described the layout of the library a few times, I still don’t know where important places like the archives, my office, the bathroom, or even the library itself are on campus.
Not only am I transitioning remotely from part-time jobs to a full-time position, but I’m also transitioning from graduate student/paraprofessional to professional. That transition is already packed with overwhelming emotions, and compressing it into “working from home” makes it even more difficult. Imposter syndrome hit me hard last week, along with another unreal emotion: a sense of temporariness. It’s difficult to explain and, honestly, I’m not sure I can.
Working from home doesn’t feel productive at all, and starting a new job from home feels like swimming in open water, where each task pulls you up and down like a wave. While everyone else cannot wait to be back in the office, I cannot wait to be in the office for the first time. I cannot wait for this awkward mindset of temporariness to be gone. And I cannot wait to master those waves.
bloggERS! is always proud to see our community of archivists learn new skills and progress to new roles that further not just their own careers, but also advance the systems and infrastructures that make up the digital preservation and electronic resources environment. Back in February, Artefactual announced a new hire and we knew we had to get the scoop. Here’s an interview with digital preservationist, Tessa Walsh (@bitarchivist), about her new role and how she has made an impact in the digital preservation landscape.
1. What is your role at Artefactual? What are you doing there now and into the future?
My job title is Software Developer. I spend most of my time on software development-related tasks like programming, reviewing other developers’ code, providing estimates and requirements analyses, writing documentation, helping with client support tasks, and providing training for external developers who want to contribute to Artefactual’s open source projects Archivematica and AtoM, for example through Artefactual’s new Archivematica Product Support Program.
Artefactual re-organized internally a few months ago and I am on the Project Development Team, a new group within the company that is focused on fixed term work—things like new feature development, data migrations, theming, analysis, and consulting. I work on many feature development projects, where we implement new Archivematica or AtoM features that are sponsored by clients and then include and support those new features in future public releases of the software. I write code, tests, and documentation, working closely with a systems archivist who manages communication with the client, refines the requirements for the project, and does quality assurance.
In some recent projects we’ve turned those requirements into “feature files” using the Gherkin syntax that can be used as the basis for automated tests. These automated tests help us improve and maintain Archivematica as a project, even as we make some big scalability and performance improvements that involve touching many parts of the codebase. I might also work with other developers and systems archivists, product managers, systems administrators, and others between the original idea for a feature and its inclusion in a public release. So far, I’ve mostly worked with Archivematica, but I’m looking forward to getting more familiar with AtoM as well.
2. What makes you interested in working with software development for digital preservation and archives?
In part, it’s that this niche is such a great confluence of many of my interests. I’m an archivist by training and I’m invested in carrying the cultural record forward with us for future generations and uses. I’ve also been a computer nerd for about as long as I can remember and find a lot of satisfaction in taking software apart and putting it together. In many ways I think my career has been about finding the right balance of these interests and putting myself where I can best contribute to the field of digital preservation. I want to make common digital preservation and curation tasks easier for people doing the work so that they can focus on the most challenging and important parts of their job, whether that’s figuring out a preservation approach for a difficult file format or doing the policy and advocacy work to firmly establish digital preservation as a core activity within an organization.
I started learning software development in earnest during my MLIS program at Simmons College and in the years after as Digital Archivist at the Canadian Centre for Architecture (CCA) from 2015-2018. This was motivated by personal interest for sure, but was also a reaction to my situation. As I worked on building the digital preservation program at the CCA and later at Concordia University Library, I kept hitting walls where tools I wanted for some basic preservation and curation functions didn’t exist. Or, where tools did exist but were borrowed from other fields and not built with archival users and use cases in mind.
Then and since, when I’ve run into this type of situation and had capacity, I’ve tried to make some of those missing tools and share them with the broader community as free and open source software. By way of example:
Brunnhilde, inspired by a similar project by my Artefactual colleague Ross Spencer, was a response to wanting a user-friendly high-level profiling tool for directories and disk images to help with appraisal, accessioning, and minimal processing.
METSFlask resulted from wanting to make it easier for me and others to browse through our Archivematica METS files and get details about the contents of our AIPs without having to read through very large XML files manually.
SCOPE, a collaboration of Artefactual and the CCA, started from a desire to let users browse and search through processed digital holdings, leveraging the descriptive and technical metadata in our finding aids and Archivematica, and download DIPs directly onto a reading room workstation for access without needing to go through complicated reference workflows.
Bulk Reviewer developed out of conversations at the BitCurator Users Forum a few years ago about wanting to improve workflows for identifying and managing sensitive information in digital archives by making better use of bulk_extractor reports.
As I got better as a developer, I also started to feel more comfortable contributing to bigger open source projects like Archivematica. Being a maintainer myself has really taught me the value of managing open source projects through well-organized communities, via companies like Artefactual that work hand-in-hand with users and member organizations like the BitCurator Consortium or Open Preservation Foundation.
3. Can you tell us about one project you’re working on at Artefactual and why it’s exciting for you?
Right now I’m working on a couple of new Archivematica features sponsored by Simon Fraser University Archives that I’m excited about, but I’m most excited about a relatively small change: an addition we’re making to the Archivematica transfer interface that allows users to choose the processing configuration they’d like to use with a transfer from a convenient dropdown list. In terms of lines of code this is a tiny feature, but it will be a huge user experience improvement for one of the most common tasks for a large number of Archivematica users. I love projects like that because they get to the heart of my desire to make our tools easier and more pleasant to use.
4. What has been the easiest part of transitioning to working at Artefactual?
By far one of the best and easiest things about starting to work at Artefactual has been how well the company’s values and working practices align with my own. Artefactual embraces open source, “open by default”, and erring on the side of more communication, which are all important values for me as well. And, within the company, everyone is so nice and encouraging of each other. I came out as a trans woman recently, and started using they/them pronouns in the months leading up to coming out. Since day one I’ve gotten nothing but respect from my colleagues, and they have been so kind and supportive in relation to my transition. That really goes a long way to making the work week enjoyable!
It’s also so fun to work with other people who, like me, have one foot in software development land and another in archives and digital preservation. There are other “developer-archivist” folks like Ashley Blewer and Ross Spencer, certainly, but it’s not just the three of us. Since Artefactual attracts smart and curious people, many of my colleagues have both domain and technical expertise in lots of different areas that you might not necessarily expect from their job title alone. I’m learning new things from my new coworkers all the time and really enjoying that.
5. What has been the most difficult part of transitioning to working at Artefactual?
Starting a new job in the time of COVID-19 quarantine is strange and difficult. Artefactual has been flexible and generous with its employees in relation to the pandemic and it was my plan from the outset to work remotely from home, so I’ve been less disrupted than many others. But—as I try to remind myself and the people around me regularly—I’m still a human living through collective trauma in relative isolation. I’m not as productive as I normally would be and some days I never quite break through the attendant anxiety and grief. And that’s okay! We’re all doing the best we can in these times, and hopefully trying to take care of ourselves and uplift and help each other out as much as we can.
6. Can you recommend any tips to current archivists who want to get into the computational side of archiving/preservation?
This is a question I get a lot, especially from students and new professionals. I don’t think there are “right” answers, but here are some points that I come back to often:
Start with a project, not a technology: You’ll be much more motivated to learn if you’re working toward something that you care about. Yes, read that book or take that online class, but try to apply what you learn to something that interests you or will make something you have to do often easier. For new digital archivists, investing in learning some command line and bash or Python scripting basics can go a long way toward starting to automate repetitive workflows. If that sounds too boring, start by trying to make some digital art or a fun website, and then figure out how to apply it to your professional life later on (or not!).
Work in the open, invite feedback: Put your code on GitHub, GitLab, or another git hosting site with an open source license, write and present about what you’re doing, ask for help on Twitter or by email, and be friendly and helpful with others.
Be patient with yourself: Learning new technology/programming languages is hard and non-linear and occasionally frustrating. When I get stuck on something in my work or learning, I often have to remind myself to step away, take a walk, get some sleep, and give my brain time to come around. 99% of the time when I do that, I end up being able to move past the issue much more quickly than if I just kept staring at it in frustration. And remember: even the most senior developers constantly stop to read the documentation or look up, for the thousandth time, the syntax for doing x in a particular language. That’s the nature of the work, not a sign of your skill or aptitude.
7. Where do you see the future of digital preservation going?
I really hope that the future of digital preservation is more inclusive. By that, I mean less intimidating to new professionals, more embracing of new types of organizations and communities outside of the traditional “cultural heritage” bubble, and more diverse and inclusive as a community of practitioners. The archives, library, and digital preservation professions are very white. Bergis Jules spoke about the need to “confront the unbearable whiteness of our profession” in his 2016 NDSA keynote “Confronting Our Failure of Care Around the Legacies of Marginalized People in the Archives,” which should be required reading for anyone working in archives and digital preservation. Michelle Caswell reminded us again last year in her “Whose Digital Preservation?” keynote at iPRES 2019 that this is to the detriment of us all. We collectively and individually lose a lot (not least of which a representative, inclusive, justice-oriented historical record) when our professions are so homogenous. It’s also true that tech-focused “digital” positions that often come with higher salaries are disproportionately filled by men. I think a key part of moving digital preservation forward is addressing some of these structural issues around who is doing the work and how they are treated, by implementing better practices in our organizations, acknowledging and working to dismantle white supremacy in our personal spheres, and promoting and financing groups such as We Here, who support BIPOC archives and library workers.
I also want the future of digital preservation to be more sustainable. I co-authored a paper in a recent issue of American Archivist with Keith Pendergrass, Walker Sampson, and Laura Alagna, in which we suggest changes to our collective thinking around appraisal, permanence, and availability that could help move our profession toward a more sustainable future. We believe that responsibly preserving our cultural record for the future means doing our best not to contribute to trends that existentially threaten that future. I’ve been so happy to see that many of our colleagues in the field agree and have said that they plan to start explicitly considering environmental sustainability as a factor in digital preservation policies and in decisions on appraisal, file format migration policies, fixity checking practices, storage systems and providers, methods of delivery, and other areas of our practice.
This isn’t a novel observation, but I think the future of digital preservation work is also going to be focused much more on software and dynamic web-based content, and less on static discrete documents that we can preserve natively as files. This is going to challenge us on technical, organizational, and theoretical levels, but I think it’ll be a great catalyst for growing our conceptual models and software tools in digital preservation and for promoting and proving the value of digital preservation broadly. And, I’m so happy there are folks like the Software Preservation Network who are anticipating these changes and doing a great job of laying the cultural, technological, and legal groundwork to prepare us for that future.
8. How do you pronounce “guymager”?
I say “GAI-mager” out of habit, since that’s what I first heard. But, I think that it’s named after its creator, who is French, so it really should be “GHEE-mager”. Considering the number of hours I’ve put into learning French since moving to Montréal in 2015, I should really do better!
Tessa Walsh is a Software Developer at Artefactual Systems. Previously, Tessa implemented digital preservation programmes at Concordia University Library and the Canadian Centre for Architecture as a Digital Preservation Librarian and Digital Archivist, respectively. She is a recipient of a 2019 NDSA Individual Innovation Award and was a 2018 Summer Fellow at the Library Innovation Lab at Harvard University. Tessa holds an MS in Library and Information Science from Simmons University and a BA in English from the University of Florida. In addition to her work at Artefactual, Tessa is the maintainer of several free and open source software projects that support digital preservation and curation activities, including Brunnhilde, Bulk Reviewer, and METSFlask.
Effective stewardship of digital archival materials and records requires that archivists and digital preservation professionals make decisions that are rooted in sustainability. As Ben Goldman observes in his 2018 essay, we find evidence in all aspects of our work of the classic definition of sustainability: “meeting the needs of the present without compromising the needs of the future.” It is therefore unsurprising, given growing concern about the impact of human activity on our climate and environment, that archivists are rallying around calls to evaluate the environmental sustainability of our work. The changing conditions related to climate change are in direct conflict with our ability to act as stewards of the collections in our care.
This series hopes to highlight current efforts in this area, acknowledge the challenges, and provide opportunities to learn from our peers. Maybe you work for an institution that has already taken steps, whether large or small, to address the environmental impact of digital preservation. Maybe you have encountered obstacles or resistance in the face of such changes. Maybe you have formed partnerships or developed resources to help advocate and support changes in relation to the sustainability of digital preservation. Whatever the case, we want to hear about it!
Writing for bloggERS! “Another Kind of Glacier” Series
We encourage visual representations: Posts can include or largely consist of comics, flowcharts, a series of memes, etc!
Written content should be roughly 600-800 words in length
Write posts for a wide audience: anyone who stewards, studies, or has an interest in digital archives and electronic records, both within and beyond SAA
This is the fifth of our Dispatches from a Distance, a series of short posts intended as a forum for those of us facing disruption in our professional lives, whether that’s working from home or something else, to stay engaged with the community. There is no specific topic or theme for submissions; rather, this is a space to share your thoughts on current projects or ideas which, on any other day, you might have discussed with your deskmate or a co-worker during lunch. These don’t have to be directly in response to the Covid-19 outbreak (although they can be). Dispatches should be between 200 and 500 words and can be submitted here.
In my position as a librarian and an archivist, I never lack tasks or projects. What I love about my job is that, if I am tired of working on one project, I can always switch to another. This semester I also worked most evenings and weekends, contending with an overload of service commitments. I was hanging on until the end of March when my commitments would scale back.
In March, just as my university was making preparations to move all classes online, I got sick. I was out for over a week, and I emerged to a very different world. Conferences started cancelling, including several presentations I was preparing for. University service work came to a halt, and I began to work entirely from home.
Two things happened. First, all of my normal tasks and routines ended. Then my supervisor, knowing the difficulty I had fitting professional writing into my work life, told me to focus on writing. For a project-oriented introvert whose professional writing goals had been neglected, this was a gift. Yet I didn’t anticipate that this opportunity would become one of the most difficult tasks I have ever attempted. Even though I am managing concerns about the virus and the economy fairly well, I have developed a sense of futility about my work and place in the universe, and I have learned just how skilled I am at avoiding writing. Trying to write makes me feel as though I am trying to swim through mud. I am not sure if that is due to my fear of writing, due to the psychological task of trying to write during a pandemic, or both.
Moving forward and being productive is a process of reinvention for me. Here are some things that are helping.
Creating a new daily routine. Routines are grounding, yet my “old” routine is useless. Questions I now have: should I change clothes before beginning work? Is it okay to wake up, make coffee, and go straight to the computer? Does this help me feel like I’m in work mode? Creating a new pre-work and work schedule focuses my intent.
Setting goals. My norm is trying to creatively fit deadlines into limited time slots, like a puzzle. I am finding that without looming deadlines, I’ve lost the sense of urgency, and I need to set goals. One of the ways that I avoid writing is to continue researching, so I have had to set daily goals about what constitutes real progress.
Staying connected. Remaining connected, particularly through meetings with coworkers and committees, has been important to my sanity. Some meetings are entirely devoted to checking in. Some are routine meetings that provide a sense of normalcy and stability.
I know that my current position is a privileged one—not all information professionals, let alone all individuals, are able to work from home and receive a paycheck. Yet, this is my process, and I am mucking through it.
ArchivesSpace manages all archival description. Accession records and top-level description for collections and file series are created directly in ArchivesSpace, while lower-level description, containers, locations, and digital objects are created using asInventory spreadsheets. Overnight, all modified published records are exported using exportPublicData.py and indexed into Solr using indexNewEAD.sh. This Solr index is read by ArcLight.
ArcLight provides discovery and display for archival description exported from ArchivesSpace. It uses URIs from ArchivesSpace digital objects to point to digital content in Hyrax while placing that content in the context of archival description. ArcLight is also really good at systems integration because it allows any system to query it through an unauthenticated API. This allows Hyrax and other tools to easily query ArcLight for description records.
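ArcLight is built on Blacklight, and Blacklight applications generally return JSON when `.json` is appended to a catalog URL. The sketch below assumes that behavior to show how another tool might query ArcLight without authentication. The base URL `https://archives.example.edu` is hypothetical, `per_page` is assumed to be accepted as a page-size parameter, and the shape of the returned JSON varies between Blacklight versions, so nothing here parses specific fields.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Hypothetical public ArcLight instance; substitute your own base URL.
ARCLIGHT_BASE = "https://archives.example.edu"


def arclight_search_url(query: str, per_page: int = 10, base: str = ARCLIGHT_BASE) -> str:
    """Build a keyword search URL that asks the Blacklight catalog for JSON."""
    params = urlencode({"q": query, "per_page": per_page})
    return f"{base.rstrip('/')}/catalog.json?{params}"


def arclight_record_url(record_id: str, base: str = ARCLIGHT_BASE) -> str:
    """Build the JSON URL for a single description record (collection or component)."""
    return f"{base.rstrip('/')}/catalog/{quote(record_id, safe='')}.json"


def fetch_json(url: str) -> dict:
    """GET a URL without authentication and decode the JSON body."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)
```

A repository application could, for instance, call `fetch_json(arclight_record_url(collection_id))` to pull the current collection-level description instead of storing its own copy.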
Our preservation storage uses network shares managed by our university data center. We limit write access to the SIP and AIP storage directories to one service account used only by the server that runs the scheduled microservices. This means that only tested automated processes can create, edit, or delete SIPs and AIPs. Archivists have read-only access to these directories, which contain standard bags generated by BagIt-python that are validated against BagIt Profiles. Microservices also place a copy of all SIPs in a processing directory where archivists have full access to work directly with the files. These processing packages have specific subdirectories for master files, derivatives, and metadata. This allows other microservices to be run on them with just the package identifier. So, if you needed to batch create derivatives or metadata files, the microservices know which directories to look in.
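Because every processing package shares the same layout, a microservice only needs the package identifier to find its inputs and outputs. A minimal sketch of that lookup follows; the root path `/mnt/processing` and the subdirectory names `masters`, `derivatives`, and `metadata` are hypothetical stand-ins for the real share and folder names.

```python
from pathlib import Path

# Hypothetical root of the archivist-writable processing area.
PROCESSING_ROOT = Path("/mnt/processing")
# Hypothetical names for the standard subdirectories of every processing package.
SUBDIRS = ("masters", "derivatives", "metadata")


def package_dirs(package_id: str, root: Path = PROCESSING_ROOT) -> dict:
    """Resolve the standard subdirectories of one processing package from its identifier."""
    package = root / package_id
    if not package.is_dir():
        raise FileNotFoundError(f"No processing package {package_id!r} under {root}")
    return {name: package / name for name in SUBDIRS}
```

A derivative script, for example, could read from `package_dirs(pkg)["masters"]` and write to `package_dirs(pkg)["derivatives"]` without the archivist ever typing a path.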
The microservices themselves have built-in checks; for example, they make sure a valid AIP exists before deleting a SIP. The data center also has some low-level preservation features in place, and we are working to build additional preservation services that will run asynchronously from the rest of our processing workflows. This system is far from perfect, but it works for now, and at the end of the day, we are relying on the permanent positions in our department as well as in Library Systems and university IT to keep these files available long-term.
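The "valid AIP before deleting the SIP" check can be sketched as a guard function. In practice the existing BagIt-python classes would do the validation (for example with `bagit.Bag(path).is_valid()`); to stay self-contained, this sketch swaps in a simplified standard-library check that reads a bag's `manifest-sha256.txt`, confirms it lists exactly the files under `data/`, and recomputes every checksum. It skips other parts of full BagIt validation such as tag manifests and Payload-Oxum.

```python
import hashlib
import shutil
from pathlib import Path


def sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def aip_is_valid(aip: Path) -> bool:
    """Simplified bag check: manifest lists exactly the payload and all checksums match."""
    manifest = aip / "manifest-sha256.txt"
    data_dir = aip / "data"
    if not manifest.is_file() or not data_dir.is_dir():
        return False
    expected = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if line.strip():
            checksum, rel_path = line.strip().split(maxsplit=1)
            expected[rel_path] = checksum
    payload = {p.relative_to(aip).as_posix() for p in data_dir.rglob("*") if p.is_file()}
    if payload != set(expected):
        return False  # missing or unlisted files
    return all(sha256(aip / rel) == checksum for rel, checksum in expected.items())


def delete_sip_if_safe(sip: Path, aip: Path) -> bool:
    """Remove the SIP only after the AIP passes validation; return True if deleted."""
    if not aip_is_valid(aip):
        return False
    shutil.rmtree(sip)
    return True
```

Returning a boolean instead of raising keeps the scheduled job running; a skipped deletion can simply be logged and retried on the next run.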
These microservices are the glue that keeps most of our workflows working together. Most of the links here point to code on our GitHub page, but we’re also trying to add public information on these processes to our documentation site.
This is a basic Python desktop app for managing lower-level description in ArchivesSpace through Excel spreadsheets using the API. Archivists can place a completed spreadsheet in a designated asInventory input directory and double-click an .exe file to add new archival objects to ArchivesSpace. A separate .exe can export all the child records from a resource or archival object identifier. The exported spreadsheets include the identifier for each archival object, container, and location, so we can easily roundtrip data from ArchivesSpace, edit it in Excel, and push the updates back into ArchivesSpace.
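At its core, asInventory translates spreadsheet rows into ArchivesSpace records. The sketch below shows that step in a simplified form: it reads a CSV export instead of an Excel file, and it turns one row into a minimal `archival_object` JSON record that could be POSTed to the ArchivesSpace API at `/repositories/{repo_id}/archival_objects`. The column names (`title`, `level`, `date`, `parent_uri`) and the resource URI are hypothetical, and a real repository may require more fields or stricter date data than this illustration includes.

```python
import csv
import io


def row_to_archival_object(row: dict, resource_uri: str) -> dict:
    """Translate one spreadsheet row into a minimal ArchivesSpace archival_object payload."""
    record = {
        "jsonmodel_type": "archival_object",
        "title": row["title"].strip(),
        "level": (row.get("level") or "file").strip(),
        "resource": {"ref": resource_uri},
        "publish": True,
    }
    if row.get("parent_uri"):
        # Nest under an existing series or file instead of the collection's top level.
        record["parent"] = {"ref": row["parent_uri"].strip()}
    if row.get("date"):
        record["dates"] = [{
            "jsonmodel_type": "date",
            "date_type": "single",
            "label": "creation",
            "expression": row["date"].strip(),
        }]
    return record


def rows_from_csv(text: str) -> list:
    """Read a CSV export of the inventory sheet into a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(text)))
```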
We have since built our born digital description workflow on top of asInventory. The spreadsheet has a “DAO” column and will create a digital object using a URI that is placed there. An archivist can describe digital records in a spreadsheet while adding Hyrax URLs that link to individual or groups of files.
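What the DAO column implies can be sketched in terms of ArchivesSpace's JSON model: a `digital_object` record whose `file_versions` carry the Hyrax URL, plus an `instance` on the archival object that links to it. The Hyrax URL, the digital object URI, and the use of a random UUID as `digital_object_id` are illustrative assumptions, not details of how asInventory builds these records.

```python
import uuid


def digital_object_for(title: str, dao_uri: str, identifier: str = "") -> dict:
    """Build a digital_object payload whose single file version points at the DAO URI."""
    return {
        "jsonmodel_type": "digital_object",
        "title": title,
        # digital_object_id must be unique in the repository; a UUID is one hypothetical choice.
        "digital_object_id": identifier or str(uuid.uuid4()),
        "publish": True,
        "file_versions": [{"jsonmodel_type": "file_version", "file_uri": dao_uri}],
    }


def with_digital_object_instance(archival_object: dict, digital_object_uri: str) -> dict:
    """Return a copy of the archival object with an instance linking the digital object."""
    linked = dict(archival_object)
    linked["instances"] = list(archival_object.get("instances", [])) + [{
        "jsonmodel_type": "instance",
        "instance_type": "digital_object",
        "digital_object": {"ref": digital_object_uri},
    }]
    return linked
```

The digital object has to be created first, because the instance refers to the URI that ArchivesSpace assigns it.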
We have been using asInventory for almost 3 years, and it does need some maintenance work. Shifting a lot of the code to the ArchivesSnake library will help make this easier, and I also hope to find a way to eliminate the need for a GUI framework so it runs just like a regular script.
The ArchivesSpace-ArcLight-Workflow GitHub repository is a set of scripts that keeps our systems connected and up-to-date. exportPublicData.py ensures that all published description in ArchivesSpace is exported each night, and indexNewEAD.sh indexes this description into Solr so it can be used by ArcLight. processNewUploads.py is the most complex process. This script takes all new digital objects uploaded through the Hyrax web interface, stores preservation copies as AIPs, and creates digital object records in ArchivesSpace that point to them. Part of what makes this step challenging is that Hyrax does not have an API, so the script uses Solr and a web scraper as a workaround.
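Querying Hyrax's Solr index directly is one way to find new uploads without an API. The sketch below builds such a query using standard Solr parameters (`q`, `fq`, `fl`, `rows`, `wt`) and Solr date math; it is not the actual processNewUploads.py logic. The core URL and the field names `system_create_dtsi` and `title_tesim` are assumptions about a typical Hyrax index and should be checked against your own schema.

```python
from urllib.parse import urlencode


def new_uploads_url(solr_core_url: str, since: str = "NOW-1DAY", rows: int = 500) -> str:
    """Build a Solr select URL for objects indexed with a creation date after `since`."""
    params = {
        "q": "*:*",
        # Solr date math; system_create_dtsi is an assumed Hyrax index field name.
        "fq": f"system_create_dtsi:[{since} TO NOW]",
        "fl": "id,title_tesim,system_create_dtsi",
        "rows": rows,
        "wt": "json",
    }
    return f"{solr_core_url.rstrip('/')}/select?{urlencode(params)}"


def upload_ids(solr_response: dict) -> list:
    """Pull document ids out of a standard Solr JSON response."""
    return [doc["id"] for doc in solr_response["response"]["docs"]]
```

Solr only reports what was indexed, so any details it does not store would still have to come from somewhere else, such as the web scraper the post mentions.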
These scripts sound complicated, but they have been relatively stable over the past year or so. I hope we can work on simplifying them too, by relying more on ArchivesSnake and moving some separate functions to other smaller microservices. One example is how the ASpace export script also adds a link for each collection to our website. We can simplify this by moving this task to a separate, smaller script. That way, when one script breaks or needs to be updated, it would not affect the other function.
These scripts process digital records by uploading metadata for them in our systems and moving them to our preservation storage.
ingest.py packages files as a SIP and optionally updates ArchivesSpace accession records by adding dates and extents.
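Adding dates and extents to an accession boils down to appending `date` and `extent` subrecords to the record's JSON. The sketch below assumes the values come from file modification times (in UTC) and a total byte count, and that the repository uses the default `megabytes` extent type and a `creation` date label. Modification times are only a proxy for creation dates. Nothing here is taken from ingest.py itself.

```python
from datetime import datetime, timezone


def _utc_day(timestamp: float) -> str:
    """Convert a POSIX timestamp to an ISO 8601 date in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).date().isoformat()


def date_subrecord(mtimes: list) -> dict:
    """Inclusive date range built from file modification times."""
    begin, end = _utc_day(min(mtimes)), _utc_day(max(mtimes))
    expression = begin[:4] if begin[:4] == end[:4] else f"{begin[:4]}-{end[:4]}"
    return {
        "jsonmodel_type": "date",
        "date_type": "inclusive",
        "label": "creation",
        "begin": begin,
        "end": end,
        "expression": expression,
    }


def extent_subrecord(total_bytes: int) -> dict:
    """Whole-accession extent in decimal megabytes."""
    return {
        "jsonmodel_type": "extent",
        "portion": "whole",
        "number": f"{total_bytes / 1_000_000:.2f}",
        "extent_type": "megabytes",
    }


def add_dates_and_extents(accession: dict, mtimes: list, total_bytes: int) -> dict:
    """Return a copy of an accession record with one new date and one new extent appended."""
    updated = dict(accession)
    updated["dates"] = list(accession.get("dates", [])) + [date_subrecord(mtimes)]
    updated["extents"] = list(accession.get("extents", [])) + [extent_subrecord(total_bytes)]
    return updated
```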
We have standard transfer folders for some campus offices with designated paths for new records and log files, along with metadata about the transferring office. transferAccession.py runs ingest.py but uses the transfer metadata to create accession records and produces spreadsheet log files so offices can see what they transferred.
confluence.py scrapes files from our campus’s Confluence wiki system, so for offices that use Confluence all I need is access to their page to periodically transfer records.
convertImages.py makes derivative files. This is mostly designed for image files, such as batch converting TIFFs to JPGs or PDFs.
listFiles.py is very handy. All it does is create a text file that lists all filenames and paths in a SIP. These can then be easily copied into a spreadsheet.
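A listFiles-style helper takes only a few lines. This sketch is an illustration rather than the real script: it writes one tab-separated "filename, relative path" line per file, sorted by path, so pasting the output into a spreadsheet fills two columns. The output format is an assumption.

```python
from pathlib import Path


def list_files(package: Path, out_file: Path) -> list:
    """Write "filename<TAB>relative path" for every file in a package; return the pairs."""
    pairs = sorted(
        ((p.name, p.relative_to(package).as_posix()) for p in package.rglob("*") if p.is_file()),
        key=lambda pair: pair[1],
    )
    lines = [f"{name}\t{path}" for name, path in pairs]
    out_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return pairs
```

Writing the list outside the package keeps the text file from showing up in its own listing the next time the script runs.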
An archivist can arrange records by creating an asInventory spreadsheet that points to individual or groups of files. buildHyraxUpload.py then creates a TSV file for uploading these files to Hyrax with the relevant ArchivesSpace identifiers.
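The TSV step can be sketched with the standard `csv` module. Each inventory row that points to a file becomes one upload line carrying its ArchivesSpace `ref_id`, and rows used only for arrangement are skipped. The input keys (`file_path`, `title`, `ref_id`) and the output header are hypothetical; the real header depends on whatever upload process consumes the file.

```python
import csv
import io

# Hypothetical header for the upload sheet.
UPLOAD_COLUMNS = ["file", "title", "ref_id", "collection"]


def build_upload_tsv(inventory_rows: list, collection_id: str) -> str:
    """Turn asInventory-style rows into a tab-separated upload sheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=UPLOAD_COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in inventory_rows:
        if not row.get("file_path"):
            continue  # description-only rows (e.g. a new subseries) have nothing to upload
        writer.writerow({
            "file": row["file_path"],
            "title": row["title"],
            "ref_id": row["ref_id"],
            "collection": collection_id,
        })
    return buffer.getvalue()
```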
updateASpace.py takes the output TSV from uploading to Hyrax and updates the same inventory spreadsheets. These can then be uploaded back into ArchivesSpace which will create digital objects that point to Hyrax URLs.
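Matching on `ref_id` is what makes this round trip work. The sketch below reads an upload report and fills the DAO column of the matching inventory rows. It also returns the `ref_id`s of file rows that received no URL, so gaps are easy to catch. The report column `hyrax_url` is hypothetical; the DAO column comes from the asInventory spreadsheet described above.

```python
import csv
import io


def add_hyrax_urls(inventory_rows: list, hyrax_tsv: str) -> tuple:
    """Fill the DAO column from an upload report keyed by ref_id.

    Returns (updated_rows, ref_ids_of_file_rows_with_no_url).
    """
    urls = {
        record["ref_id"]: record["hyrax_url"]
        for record in csv.DictReader(io.StringIO(hyrax_tsv), delimiter="\t")
    }
    updated, missing = [], []
    for row in inventory_rows:
        new_row = dict(row)  # leave the caller's rows untouched
        url = urls.get(row.get("ref_id", ""))
        if url:
            new_row["DAO"] = url
        elif row.get("file_path"):
            missing.append(row.get("ref_id", ""))
        updated.append(new_row)
    return updated, missing
```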
These classes are extensions of the BagIt-python library. They contain a number of methods that are used by other microservices. This lets us easily create() or load() our specific SIP or AIP packages and add files to them. They also include complex things like getting a human-readable extent and date ranges from the filesystem. My favorite feature might be clean(), which removes all Thumbs.db, desktop.ini, and .DS_Store files as the package is created.
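Three of the helpers described above can be approximated with plain functions over a directory. In the real classes they are methods on BagIt-python subclasses. This sketch is not those methods: `clean()` matches junk file names case-insensitively, extents use decimal (1000-based) units, and date ranges come from modification times in UTC. All three are assumptions.

```python
from datetime import datetime, timezone
from pathlib import Path

# System files to drop before packaging (compared case-insensitively).
JUNK_FILES = {"thumbs.db", "desktop.ini", ".ds_store"}


def clean(root: Path) -> list:
    """Delete junk system files anywhere under root; return their relative paths."""
    removed = []
    for path in sorted(root.rglob("*")):  # materialize the listing before deleting anything
        if path.is_file() and path.name.lower() in JUNK_FILES:
            path.unlink()
            removed.append(path.relative_to(root).as_posix())
    return removed


def human_size(num_bytes: int) -> str:
    """Format a byte count with decimal units, e.g. 2500000 -> "2.5 MB"."""
    size = float(num_bytes)
    for unit in ("bytes", "KB", "MB", "GB"):
        if size < 1000:
            return f"{size:.0f} {unit}" if unit == "bytes" else f"{size:.1f} {unit}"
        size /= 1000
    return f"{size:.1f} TB"


def _utc_day(timestamp: float) -> str:
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).date().isoformat()


def date_range(root: Path):
    """Earliest and latest file modification dates under root, or None if there are no files."""
    mtimes = [p.stat().st_mtime for p in root.rglob("*") if p.is_file()]
    if not mtimes:
        return None
    return _utc_day(min(mtimes)), _utc_day(max(mtimes))
```

Running `clean()` before computing the extent and date range keeps system files from inflating the size or skewing the dates.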
Example use case
Wild records appear! A university staff member has placed records of the University Senate from the past year in a standard folder share used for transfers.
An archivist runs transferAccession.py, which creates an ArchivesSpace accession record using some JSON in the transfer folder and technical metadata from the filesystem (modified dates and digital extents). It then packages the files using BagIt-python and places one copy in the read-only SIP directory and a working copy in a processing directory.
For outside acquisitions, the archivists usually manually download, export, or image the materials and create an accession record manually. Then, ingest.py packages these materials and adds dates and extents to the accession records when possible.
The archivist makes derivative files for access or preservation. Since there is a designated derivatives directory in the processing package, the archivists can use a variety of manual tools or run other microservices using the package identifier. Scripts such as convertImages.py can batch convert or combine images and PDFs, and other scripts for processing email are still being developed.
The archivist then runs listFiles.py to get a list of file paths and copies them into an asInventory spreadsheet.
The archivist arranges the issues within the University Senate Records. They might create a new subseries and use that identifier in an asInventory spreadsheet to upload a list of files and then download them again to get a list of ref_ids.
The archivist runs buildHyraxUpload.py to create a tab-separated values (TSV) file for uploading files to Hyrax using the description and ref_ids from the asInventory spreadsheet.
After uploading the files to Hyrax, the archivist runs updateASpace.py to add the new Hyrax URLs to the same asInventory spreadsheet and uploads them back to ArchivesSpace. This creates new digital objects that point to Hyrax.
Successes and Challenges
Our set-up will always be a work in progress, and we hope to simplify, replace, or improve most of these processes over time. Since Hyrax and ArcLight have been in place for almost a year, we have noticed some aspects that are working really well and others that we still need to improve on.
I think the biggest success was customizing Hyrax to rely on description pulled from ArcLight. This has proven to be dependable and has allowed us to make significant amounts of born-digital and digitized materials available online without requiring detailed item-level metadata. Instead, we rely on high-level archival description and whatever information we can use at scale from the creator or the file system.
Suddenly we have a backlog. Since description is no longer the biggest barrier to making materials available, the holdup has been the parts of the workflow that require human intervention. Even though we are doing more with each action, large amounts of materials are still held up waiting for a human to process them. The biggest bottlenecks are working with campus offices and donors as well as arrangement and description.
There are also a ton of spreadsheets. I think this is a good thing, as we have discovered many cases where born-digital records come with some kind of existing description, but it often requires data cleaning and transformation. One collection came with authors, titles, and abstracts for each of a few thousand PDF files, but that metadata was trapped in hand-encoded HTML files from the 1990s. Spreadsheets are a really good tool for straddling the divide between the automated and manual processes required to save this kind of metadata, and they are a comfortable environment for many archivists to work in.
You may have noticed that the biggest needs we have now (donor relations, arrangement and description, metadata cleanup) are roles that archivists are really good at and comfortable with. It turned out that once we had effective digital infrastructure in place, it created further demands on archivists and traditional archival processes.
This brings us to the biggest challenge we face now. Since our set-up often requires comfort on the command line, we have severely limited the number of archivists who can work on these materials and required non-archival skills to perform basic archival functions. We are trying to mitigate this in some respects by better distributing individual stages for each collection and providing more documentation. Still, this has clearly been a major flaw, as we need to meet users (in this case other archivists) where they are rather than place further demands on them.
Gregory Wiedeman is the university archivist in the M.E. Grenander Department of Special Collections & Archives at the University at Albany, SUNY where he helps ensure long-term access to the school’s public records. He oversees collecting, processing, and reference for the University Archives and supports the implementation and development of the department’s archival systems.
This is the fourth of our Dispatches from a Distance, a series of short posts intended as a forum for those of us facing disruption in our professional lives, whether that’s working from home or something else, to stay engaged with the community. There is no specific topic or theme for submissions; rather, this is a space to share your thoughts on current projects or ideas which, on any other day, you might have discussed with your deskmate or a co-worker during lunch. These don’t have to be directly in response to the Covid-19 outbreak (although they can be). Dispatches should be between 200 and 500 words and can be submitted here.
Working from home has its challenges, as many of us have found lately. It is certainly a privilege now to be able to work from home and to remain employed. In some cases, however, this might be an opportunity to disrupt a cycle of toxicity in the work environment. In my case, workplace hostility has been extremely stressful and taken its toll on my mental health, which has in turn affected my physical health. I am looking forward to realigning my focus on what matters in life—myself and my loved ones.
I have not been as present for family and friends as I would like to be, and I need to feel complete to be able to be there for them. Now that I am working from home, I am trying to focus on eating well and sleeping again. Last night I slept the entire night without medication. I woke up refreshed, made myself coffee, and felt good enough to start a new writing project. I have been so lost in and absorbed by the toxic environment at work that I have been stuck in a loop, ruminating over situations again and again, thinking about the retaliation at my workplace and whether or not I should act on it. I was anxious about going back to work, about sobbing in a public restroom stall, or shedding more tears in front of my colleagues or my supervisor. I have continuously turned words over in my mind, telling myself that I am valid, that I have value to the profession, and that others will recognize my contributions even if my supervisors do not.
Sometimes, we hope that an academic community can save us. We hope that academia understands that we are all colleagues working towards a common goal. We hope that academic freedom and its ideals are a humanist approach, and that those who work within its comforts will receive understanding and ethical treatment. I thought that academia would provide me with benefits that I never had, and it does deliver health insurance and other benefits that can be hard to secure otherwise. It does not always guarantee respect, of course. I did not realize that academia can be horribly competitive. My experience has led me to consider finding employment outside of academia, although I am sure any economic sector can be similarly toxic. I have considered leaving the profession altogether, and I still consider it.
Right now, I am hoping to find myself again. I have become completely self-absorbed in my situation, which is unfortunate, but it is a cycle that is very hard to break unless something breaks it for you. As I said, we are certainly privileged right now to be able to work at all, especially to work from home. The epidemic has been awful for the larger community, so I hope to do what I can to make myself complete so that I can engage with the larger world again and, hopefully, be more effective in bringing about change that will benefit the larger community as well as myself.
Archivists have made major progress using disk imaging to safely move content off of floppy disks and other external media. These workflows have the best tools and the most complete documentation. Yet external media make up only a portion of born-digital records, and there is less guidance on processing and providing access to other types of digital content. At UAlbany, we certainly have the disk-in-a-box problem that most repositories face, but partly because of our institutional context, we have found that this aspect has become a minor part of “our set-up.”
Most of our born-digital accessions now come in over network shares, cloud storage, or exports from web applications. Additionally, we take in a lot of born-digital content that can be made publicly available now, without restrictions. Some of this is because we’ve been taking in a lot of institutional records, which are public records (think minutes, formal reports, and publications), but there is also a surprising amount of unrestricted digital material we collect from outside groups, like political advocacy and activist groups. Since we also digitize many records that need to be described and managed, we needed our infrastructure to support them as well. So, one of the major factors in developing our digital processing infrastructure is that we hope to make many records available online soon after we acquire them.
What all that means is that our set-up for processing both born-digital and digitized records is more of an ecosystem than a desktop station or single system. This is great, but it makes the set-up difficult to summarize, and the result was a bit too long for a single post. So, the BloggERS Editorial Board and I decided to split it into two posts. The first part will focus on the theoretical and technical foundation: the principles behind our decision-making and the servers that run all of the systems. The second post will summarize all the different systems and microservices we use, provide a sample use case of how they all work together, and discuss what we’ve learned since all this has been in place and the challenges that remain.
Use one archival descriptive system
Most systems built for managing digital content use bibliographic-style description, which makes them challenging to integrate into archival processes. They assume each “thing” gets a record with a series of elements or statements, much like the archetypical library catalog. This really includes everything from Digital Asset Management Systems or “Digital Repositories” down to DFXML. Archival description instead describes groups of things at progressive levels of detail. Since we use archival description for paper materials, using one system means applying archival description to digital records as well.
In practice, this means that every digital object is connected to archival description managed by ArchivesSpace. This does not mean that we list each PDF or disk image, but merely that there is a record somewhere in ArchivesSpace that refers to it. This can be anything from a collection-level record that includes a big pile of disk images to an item-level record that refers to an individual PDF. The identifier from ArchivesSpace can then help provide intellectual control without having to describe every digital file.
Some digital objects get additional detailed item-level metadata, while others rely on an identifier to pull in description of records in ArchivesSpace. Out of the box, our repository assumed that every object would have a full set of Dublin Core or MODS elements, but we needed most objects to rely only on the ASpace identifier. So we had to modify our repository both to accept less descriptive metadata and to link to metadata records in other systems.
Build networks of systems and limit their use to what they are really good at
We try to keep our systems as boilerplate as possible, avoiding customizations and using their default processes. By systems, I mean software applications, such as ArchivesSpace, Hyrax, DSpace, Solr, or Fedora. These applications might span multiple servers, or multiple systems can run together on a single server. Most systems are really good at performing their core functions, but they get less effective the more edge cases you ask them to handle. Systems are also challenging to maintain, and sticking to the “main line” that everyone else uses ensures that we will have the easiest possible upgrade path.
This means we use ArchivesSpace for managing archival description, ArcLight for displaying description, and Hyrax as a digital repository. We only ask them to do these things, and use smaller tools to perform other functions. We use asInventory as a spreadsheet tool for listing archival materials instead of ArchivesSpace, ArcLight instead of the ArchivesSpace Public User Interface for discovery and display, and network storage for preservation instead of relying on Hyrax and Fedora.
When we need to adapt systems to local practices, instead of customizing them, we try to bring the customizations outside the boundaries of the system and instead rely on their openness and API connections. We create or adapt what I am calling “microservices” to fulfil these local custom functions. These are small, as-simple-as-possible tools that each perform one specific function. Theoretically, at least, they are easy to build and might not be designed to be maintained. Instead, we will adapt or replace them with another super-simple tool when they get problematic or are no longer useful. Microservices do not store or manage data themselves, so when (not if) they stop working, we are not relying on them to immediately fulfil core functions. We will still have to replace them, but we will not have to drop everything and scramble to fix them to serve the next user who walks in the physical or virtual door. In this way, microservices are sort of like the sacrificial anodes of our digital infrastructure.
Computers don’t preserve bits, people preserve bits
Digital preservation is not something any software can do by itself. Preservation requires human attention and labor. There is no system where you can merely ingest your content to preserve it. Instead of relying on a single ideal “preservation” system, our approach is to get digital content onto “good enough” storage and plan to actively manage and maintain it over time. Preservation systems are tools and their effectiveness depends on their context and use.
While we use Fedora Commons as part of the Hyrax/Samvera stack, we do not consider our instance to be a preservation system and we do not use it as such. Hyrax is really complex and challenging to manage, particularly since it stores data in multiple places: in Fedora, in a database, and on a server’s hard disk. Were we to rely on Hyrax as our preservation system, my biggest fear would be that the database and Fedora get out of sync, which would prevent Hyrax from booting or us from using the Rails console. In this scary scenario, Fedora would still manage the digital content and metadata, but we’d have to try and piece together what “wd/37/6c/90/wd376c90g” means and how it connects to the human-readable metadata.
Instead, we use Hyrax as only an access system. We keep all master files and metadata in standard packages on network shares using BagIt-python. Preservation copies, like uncompressed TIFFs and WAVs, are not uploaded to Hyrax in order to limit data duplication, as we make derivatives prior to ingest. When we add metadata through Hyrax, a microservice adds it to the preservation storage overnight. This preserves the master copies in an environment we are more confident that we can maintain, as simplicity might be more important for preservation than complex functionality. It also lowers the stakes for maintaining Hyrax, as we don’t risk losing materials if something goes wrong.
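For readers unfamiliar with BagIt, the “standard packages” mentioned above follow a simple on-disk layout defined in RFC 8493: a `bagit.txt` declaration, a `data/` payload directory, and a payload manifest listing a checksum for every file. The standard-library sketch below writes only those required parts to show the structure. In practice, BagIt-python’s `make_bag()` converts a directory into a bag in place and also writes extras such as `bag-info.txt` and tag manifests. The function names here are hypothetical, and this is not the microservice described in the post.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large masters don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_minimal_bag(source_dir, bag_dir):
    """Copy source_dir into bag_dir/data and write the two required tag files."""
    source_dir, bag_dir = Path(source_dir), Path(bag_dir)
    payload = bag_dir / "data"
    shutil.copytree(source_dir, payload)  # raises if bag_dir/data already exists
    # Bag declaration: version and tag-file encoding.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    # Payload manifest: checksum, whitespace, then the path relative to the bag root.
    lines = [
        f"{sha256_of(path)}  {path.relative_to(bag_dir).as_posix()}"
        for path in sorted(p for p in payload.rglob("*") if p.is_file())
    ]
    (bag_dir / "manifest-sha256.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return bag_dir
```

Because the manifest is plain text, any future tool (or person) can re-verify the files without the software that created the bag, which fits the emphasis on simplicity for preservation.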
These are all the servers we use to process and manage digital records. They are all virtual servers that live in the university data center. Our Library Systems department services the Windows servers, and university IT supports the Linux servers.
ArchivesSpace production server
Oracle Linux server, 2 core, 6GB RAM
Runs ArchivesSpace and MySQL
ArchivesSpace development server
Oracle Linux server, 2 core, 4GB RAM
Runs ArchivesSpace and MySQL
Ruby on Rails production server
Oracle Linux, 4 core, 8GB RAM
Runs applications that use the Ruby on Rails web application framework, including ArcLight, Hyrax, Bento search, and Jekyll website
Serves the Ruby on Rails environment using Passenger and nginx as the webserver
Ruby on Rails development server
Oracle Linux, 4 core, 8GB RAM
Has duplicate development instances of all Ruby on Rails-based applications
Runs scheduled microservices
Solr server
Windows Server, 12GB RAM
Runs the Solr search engine application which is used by ArcLight and Hyrax
Fedora server
Windows Server, 2 core, 4GB RAM
Runs Fedora using Apache Tomcat to support Hyrax
Postgres database server
Supports Hyrax and Fedora
I know this list can be very intimidating for many small and medium-size repositories that struggle to find support for even one web application, but I think it’s important to be open about the technology required to process and actually provide access to digital records. We don’t want to make promises to our donors that we can’t fulfil. Many administrators, donors, and IT staff members don’t expect that archival repositories require this technology. Even at a major research university, we had to spend years changing the culture of expectations to put these tools in place. I hope that if archivists can be more transparent about these requirements, we can help each other make the case for more support.
This is the third of our Dispatches from a Distance, a series of short posts intended as a forum for those of us facing disruption in our professional lives, whether that’s working from home or something else, to stay engaged with the community. There is no specific topic or theme for submissions; rather, this is a space to share your thoughts on current projects or ideas which, on any other day, you might have discussed with your deskmate or a co-worker during lunch. These don’t have to be directly in response to the Covid-19 outbreak (although they can be). Dispatches should be between 200 and 500 words and can be submitted here.
by Sara Mouch
I would say that making the move from campus to working from home has shifted priorities for me as the University of Toledo Archivist at the Ward M. Canaday Center for Special Collections, but everything I do feels like top priority. We’re a large repository (upwards of 10,000 linear feet of collections) with a small shop (2 full-time archivists), so there is never a dearth of things to do, and I claim that all of those things are equally important. However, some of those priorities are currently impossible to address, such as prompt reference assistance, physical collections processing, and digitization projects. Those are priorities that, when on campus, are most present and pressing, distracting from all the others. I have no such distractions now, and the ability to focus on those other equally important but neglected priorities is liberating. I’m a troubleshooter by nature and I’m in a position right now to spend more time than usual untangling problems, such as finding aid errors and poor data management. Quality control takes center stage and, frankly, I love the tedious clean-up, the moments when I can afford perfection over progress, an impossibility in most areas of archives management. From my professional standpoint, this is the light amidst all the darkness.
The limitations inherent in the inaccessibility of physical collections have, however, brought concerns to the forefront. Without the ability to curate and prepare items for display, even the remotest possibility of installing an exhibit in 2020 (as we do annually) shrinks to nothing. Our ability to serve researchers is only as good as the collections that are available, in part or in full, in our digital repository. The suspension of archives orientation sessions that followed the move to online classes means that we can’t put history in the hands of our students. These concerns, however temporary, loom and linger, even as I’m thrilled to have the luxury to learn the vagaries of ArchivesSpace.
But we readjust, and those limitations become opportunities. We can create online guides to collections earmarked for representation in our exhibit. Digitization needs become clearer and we can re-prioritize the scanning of collections destined for the digital repository, even if the actual scanning must wait. Finally, just as teaching faculty must adapt to reaching their students remotely, so too must the archivists who serve both. An online presentation regarding archival research may not have the same impact as interacting with tangible objects and records, but hopefully will convey that history exists in many formats, that archives are a great resource for research and connection, and archivists want to meet students where they are.
The Alexander Turnbull Library holds the archives and special collections for the National Library of New Zealand Te Puna Mātauranga o Aotearoa (NLNZ). While digital materials have existed in the Turnbull Library’s collections since the 1980s, the National Library began to formalise its digital collecting and digital preservation policies in the early 2000s, and established the first Digital Archivist roles in New Zealand. In 2008, the National Library launched the National Digital Heritage Archive (NDHA), which now holds 27 million files in 222 different formats, totalling 311 terabytes.
Since the launch of the NDHA, there has been a marked increase in the size and complexity of incoming digital collections. Collections currently come to the Library on a combination of obsolete and contemporary media, as well as electronic transfer, such as email or File Transfer Protocol (FTP).
Digital Archivists’ workstation setup and equipment
Most staff at the National Library use either a Windows 10 Microsoft Surface Pro or HP EliteBook i5 at a docking station with two monitors to allow for flexibility in where they work. However, the Library’s Digital Archivists have specialised setups to support their work with large digital collections. The computers and workstations below are listed in order of frequency of usage.
Computers and workstations
HP Z200 i7 workstation tower
The Digital Archivists’ main ingest and processing device is an HP Desktop Z220 i7 workstation tower for processing digital collections. The Z220s have a built-in read/write optical disc drive, as well as USB and FireWire ports.
HP Elitebook i7
The device we use second most-frequently is an HP Elitebook i7, which we use for electronic transfers of contemporary content. Our web archivists also use these for harvesting websites and running social media crawls. As there are only a handful of digital archivists in Aotearoa New Zealand, we do a significant amount of training and outreach to archives and organisations that don’t have a dedicated digital specialist on staff. Having a portable device as well as our desktop setups is extremely useful for meetings and workshops offsite.
MacBook Pro 15inch, 2017
The Alexander Turnbull Library is a collecting institution, and we often receive creative works from authors, composers, and artists. We regularly encounter portable hard drives, floppy disks, zip disks, and even optical discs which have been formatted for a Mac operating system, and are incompatible with our corporate Windows machines. And so, MacBook Pro to the rescue! Unfortunately, the MacBook Pro only has ports for USB-C, so we keep several USB-C to USB adapters on hand. The MacBook has access to staff wifi, but is not connected to the corporate network. We’ve recently begun to investigate using HFS+ for Windows software in order to be able to see Macintosh file structures on our main ingest PCs.
Digital Intelligence FRED Forensic Workstation
If we can’t read content on either our corporate machines or the MacBook Pro, then our friend FRED is our next port of call. FRED is a forensic recovery of evidence device, and includes a variety of ports and drives with write blockers built in. We have a 5.25 inch floppy disk drive attached to the FRED, and also use it to mount internal hard drives removed from donated computers and laptops. We don’t create disk images by default on our other workstations, but if a collection is tricky enough to merit the FRED, we will create disk images for it, generally using FTK Imager. The FRED has its own isolated network connection separate from the corporate network so we can analyse high risk materials without compromising the Library’s security.
Adjacent to the FRED, we have an additional non-networked PC (also an HP Z200 i7 workstation tower) where we can analyse materials, download software, test scripts, and generally experiment separately from the corporate network. These are currently still running a Windows 7 build, as some of the drivers we use with legacy media carriers were not compatible with Windows 10 during the initial testing and rollout of Windows 10 devices to Library staff.
Over the years, the Library has collected vintage computers with a variety of hardware and software capabilities and each machine offers different applications and tools in order to help us process and research legacy digital collections. We are also sometimes gifted computers from donors in order to support the processing of their legacy files, and allow us to see exactly what software and programmes they used, and their file systems.
KryoFlux (located at Archives New Zealand)
And for the really tricky legacy media, we are fortunate to be able to call on our colleagues down the road at Archives New Zealand Te Rua Mahara o te Kāwanatanga, who have a KryoFlux set up in their digital preservation lab to read 3.5 inch and 5.25 inch floppy disks. We recently went over there to try to image a set of double-sided, double-density 3.5 inch Macintosh floppy disks from 1986-1989 that we had been unable to read on our legacy Power Macintosh 7300/180. We were able to create disk image files for them using the KryoFlux, but unfortunately the disks contained bad sectors, so we weren’t able to render the files from them.
Drives and accessories
In addition to our hardware and workstation setup, we use a variety of drives and accessories to aid in processing of born-digital materials.
Tableau Forensic USB 3.0 Bridge write blocker
3.5 inch floppy drive
5.25 inch floppy drive
Optical media drive
Memory card readers (CompactFlash cards, Secure Digital (SD) cards, Smart Media cards)
Various connectors and converters
Some of our commonly used software and other processing tools
SafeMover Python script (created in-house at NLNZ to transfer and check fixity for digital collections)
DROID file profiling tool
Karen’s Directory Printer
Free Commander/Double Commander
File List Creator
Hex Editor Neo
HFS+ for Windows
System Centre Endpoint Protection
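SafeMover is an in-house NLNZ script whose code isn’t shown here. The sketch below is a hypothetical illustration of the general approach a transfer-and-fixity tool takes: hash each source file, copy it, hash the copy, and report any mismatches. The function names and the choice of SHA-256 are assumptions, not details of SafeMover.

```python
import hashlib
import os
import shutil


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, read in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def safe_copy_tree(source_dir, target_dir, algorithm="sha256"):
    """Copy every file, then confirm each copy matches its source checksum.

    Returns a list of relative paths whose checksums did not match.
    """
    mismatches = []
    for root, _dirs, files in os.walk(source_dir):
        rel_root = os.path.relpath(root, source_dir)
        dest_root = os.path.join(target_dir, rel_root)
        os.makedirs(dest_root, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(dest_root, name)
            before = file_checksum(src, algorithm)
            shutil.copy2(src, dst)  # copy2 also tries to preserve modified dates
            if file_checksum(dst, algorithm) != before:
                mismatches.append(os.path.normpath(os.path.join(rel_root, name)))
    return mismatches
```

An empty list means every copy matched its source; a production tool would presumably also log the checksums so the files can be re-verified later.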
Valerie Love is the Senior Digital Archivist Kaipupuri Pūranga Matihiko Matua at the Alexander Turnbull Library, National Library of New Zealand Te Puna Mātauranga o Aotearoa
This is the second of our Dispatches from a Distance, a series of short posts intended as a forum for those of us facing disruption in our professional lives, whether that’s working from home or something else, to stay engaged with the community. There is no specific topic or theme for submissions; rather, this is a space to share your thoughts on current projects or ideas which, on any other day, you might have discussed with your deskmate or a co-worker during lunch. These don’t have to be directly in response to the Covid-19 outbreak (although they can be). Dispatches should be between 200 and 500 words and can be submitted here.
by Lori Eaton, MLIS, CA
The economic news swirling around the COVID-19 outbreak frequently references how people with the fewest financial resources will bear the brunt of the pandemic-driven recession. As an archivist and records manager working with foundations, I’ve been awed by how quickly the philanthropic community in Michigan has sprung into action. Foundations are distributing emergency funds, coordinating resources to help nonprofits support clients and staff (for an example, see the Council on Foundations COVID-19 Resource Hub, which provides resources for funders and grantees), and working with grantees who provide direct aid to those in our communities who need it most.
On March 16, 2020, the Detroit-based foundation where I’ve been embedded for the last year made the decision to close the office and asked staff to work remotely. Thankfully, the foundation moved to cloud-based file storage almost a year ago and had recently enhanced its teleconferencing capabilities. Grants are also managed through a cloud-based tool, as are resources for the board of trustees.
Together with learning and impact staff, I’ve been working to gather and organize a digital library of COVID-19 related resources and records generated by the foundation. We’re collecting files the foundation staff creates but also those of funding partners, grantees, nonprofit support organizations, and state and local government. I’ve taken on the task of naming and describing these files and applying a consistent vocabulary.
In the near term, this resource library will help foundation staff keep track of the deluge of information flooding in through emails, Google docs, websites, and conference calls. In the future, it is our hope that this library will help tell the story of how both the foundation and the philanthropy community in Michigan rose to the challenge presented by this pandemic.