What’s Your Set-up?: Processing Digital Records at UAlbany (part 2)

by Gregory Wiedeman


In the last post I wrote about the theoretical and technical foundations for our born-digital records set-up at UAlbany. Here, I try to show the systems we use and how they work in practice.

Systems

ArchivesSpace

ArchivesSpace manages all archival description. Accession records and top level description for collections and file series are created directly in ArchivesSpace, while lower-level description, containers, locations, and digital objects are created using asInventory spreadsheets. Overnight, all modified published records are exported using exportPublicData.py and indexed into Solr using indexNewEAD.sh. This Solr index is read by ArcLight.
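
Stripped to its essentials, the nightly export step looks something like the sketch below, which assumes the ArchivesSnake client library, a single repository with ID 2, and placeholder credentials and paths; the real exportPublicData.py handles more cases than this.

    import time
    from asnake.client import ASnakeClient

    client = ASnakeClient(baseurl="http://localhost:8089",
                          username="admin", password="admin")
    client.authorize()

    # Find resources modified in roughly the last 24 hours.
    yesterday = int(time.time()) - 86400
    resource_ids = client.get("repositories/2/resources",
                              params={"all_ids": True,
                                      "modified_since": yesterday}).json()

    for resource_id in resource_ids:
        resource = client.get(f"repositories/2/resources/{resource_id}").json()
        if not resource.get("publish"):
            continue  # only export published description
        # The EAD export endpoint returns XML for a single resource.
        ead = client.get(f"repositories/2/resource_descriptions/{resource_id}.xml",
                         params={"include_unpublished": False})
        with open(f"exports/{resource_id}.xml", "wb") as f:
            f.write(ead.content)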

ArcLight

ArcLight provides discovery and display for archival description exported from ArchivesSpace. It uses URIs from ArchivesSpace digital objects to point to digital content in Hyrax while placing that content in the context of archival description. ArcLight is also really good at systems integration because it allows any system to query it through an unauthenticated API. This allows Hyrax and other tools to easily query ArcLight for description records.
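
Because ArcLight is a Blacklight application, its search results are also available as JSON, so "querying it" can be as simple as the sketch below. The URL, parameters, and response structure here are assumptions that depend on local configuration rather than guaranteed ArcLight defaults.

    import requests

    ARCLIGHT_URL = "https://archives.example.edu/catalog"  # hypothetical URL

    def get_description(aspace_ref_id):
        """Look up archival description in ArcLight by an ArchivesSpace ref_id."""
        response = requests.get(ARCLIGHT_URL + ".json",
                                params={"q": aspace_ref_id,
                                        "search_field": "all_fields"},
                                timeout=30)
        response.raise_for_status()
        docs = response.json().get("response", {}).get("docs", [])
        return docs[0] if docs else None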

Hyrax

Hyrax manages digital objects and item-level metadata. Some objects have detailed Dublin Core-style metadata, while other objects only have an ArchivesSpace identifier. Some custom client-side JavaScript uses this identifier to query ArcLight for more description to contextualize the object and provide links to more items. This means users can discover a file that does not have detailed metadata, such as Minutes, and Hyrax will display the Scope and Content note of the parent series along with links to more series and collection-level description.

Storage

Our preservation storage uses network shares managed by our university data center. We limit write access to the SIP and AIP storage directories to one service account used only by the server that runs the scheduled microservices. This means that only tested automated processes can create, edit, or delete SIPs and AIPs. Archivists have read-only access to these directories, which contain standard bags generated by BagIt-python that are validated against BagIt Profiles. Microservices also place a copy of all SIPs in a processing directory where archivists have full access to work directly with the files. These processing packages have specific subdirectories for master files, derivatives, and metadata. This allows other microservices to be run on them with just the package identifier. So, if you needed to batch create derivatives or metadata files, the microservices know which directories to look in.
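
As a rough illustration of the packaging step, BagIt-python can create and validate these bags in a few calls; the paths, bag metadata, and checksum choices below are illustrative assumptions rather than our exact configuration.

    import bagit

    # Package a directory of records as a bag with fixity information.
    bag = bagit.make_bag("/processing/example_package",
                         {"Source-Organization": "University at Albany, SUNY"},
                         checksums=["sha256"])

    # Later, load and validate a stored bag before trusting or copying it.
    stored_bag = bagit.Bag("/SIP/example_package")
    stored_bag.validate()  # raises bagit.BagValidationError on any failure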

The microservices themselves have built-in checks in place, such as making sure a valid AIP exists before deleting a SIP. The data center also has some low-level preservation features in place, and we are working to build additional preservation services that will run asynchronously from the rest of our processing workflows. This system is far from perfect, but it works for now, and at the end of the day, we are relying on the permanent positions in our department as well as in Library Systems and university IT to keep these files available long-term.
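
A sketch of the kind of built-in check described above might look like the following: confirm that a valid AIP exists before the SIP is removed. The directory layout and function name are assumptions for illustration, not our actual code.

    import os
    import shutil
    import bagit

    def delete_sip(package_id, sip_root="/Archives/SIP", aip_root="/Archives/AIP"):
        """Delete a SIP only after its AIP exists and validates."""
        aip_path = os.path.join(aip_root, package_id)
        sip_path = os.path.join(sip_root, package_id)

        if not os.path.isdir(aip_path):
            raise RuntimeError(f"No AIP found for {package_id}; SIP not deleted.")

        # Validate the AIP's fixity before touching the SIP.
        bagit.Bag(aip_path).validate()

        shutil.rmtree(sip_path)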

Microservices

These microservices are the glue that keeps most of our workflows working together. Most of the links here point to code on our GitHub page, but we're also trying to add public information on these processes to our documentation site.

asInventory

This is a basic Python desktop app for managing lower-level description in ArchivesSpace through Excel spreadsheets using the API. Archivists can place a completed spreadsheet in a designated asInventory input directory and double-click an .exe file to add new archival objects to ArchivesSpace. A separate .exe can export all the child records from a resource or archival object identifier. The exported spreadsheets include the identifier for each archival object, container, and location, so we can easily roundtrip data from ArchivesSpace, edit it in Excel, and push the updates back into ArchivesSpace. 
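
Under the hood, pushing a spreadsheet row into ArchivesSpace boils down to posting archival object JSON to the API. The sketch below shows the general idea using openpyxl and ArchivesSnake; the column order, repository ID, and resource identifier are hypothetical and not asInventory's actual schema.

    import openpyxl
    from asnake.client import ASnakeClient

    client = ASnakeClient(baseurl="http://localhost:8089",
                          username="admin", password="admin")
    client.authorize()

    sheet = openpyxl.load_workbook("inventory.xlsx").active

    # Assumed columns: title, date, level, parent archival object ID.
    for row in sheet.iter_rows(min_row=2, values_only=True):
        title, date, level, parent_id = row[:4]
        archival_object = {
            "jsonmodel_type": "archival_object",
            "title": title,
            "level": level or "file",
            "dates": [{"expression": date, "date_type": "inclusive",
                       "label": "creation"}] if date else [],
            "resource": {"ref": "/repositories/2/resources/100"},
            "parent": {"ref": f"/repositories/2/archival_objects/{parent_id}"},
        }
        response = client.post("repositories/2/archival_objects",
                               json=archival_object)
        print(response.status_code, response.json())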

We have since built our born digital description workflow on top of asInventory. The spreadsheet has a “DAO” column and will create a digital object using a URI that is placed there. An archivist can describe digital records in a spreadsheet while adding Hyrax URLs that link to individual or groups of files.

We have been using asInventory for almost 3 years, and it does need some maintenance work. Shifting a lot of the code to the ArchivesSnake library will help make this easier, and I also hope to find a way to eliminate the need for a GUI framework so it runs just like a regular script.

Syncing scripts

The ArchivesSpace-ArcLight-Workflow GitHub repository is a set of scripts that keeps our systems connected and up-to-date. exportPublicData.py ensures that all published description in ArchivesSpace is exported each night, and indexNewEAD.sh indexes this description into Solr so it can be used by ArcLight. processNewUploads.py is the most complex process. This script takes all new digital objects uploaded through the Hyrax web interface, stores preservation copies as AIPs, and creates digital object records in ArchivesSpace that point to them. Part of what makes this step challenging is that Hyrax does not have an API, so the script uses Solr and a web scraper as a workaround.
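
That workaround looks roughly like the sketch below: query Hyrax's Solr index for works created in the last day, then handle each one. The Solr URL, work type, and field names are assumptions based on common Hyrax defaults and will differ locally.

    import requests

    SOLR_URL = "http://localhost:8983/solr/hyrax/select"  # hypothetical core name

    params = {
        "q": "has_model_ssim:Dao",                     # assumed work type
        "fq": "system_create_dtsi:[NOW-1DAY TO NOW]",  # uploaded in the last day
        "fl": "id,title_tesim",                        # assumed metadata fields
        "rows": 100,
        "wt": "json",
    }

    new_uploads = requests.get(SOLR_URL, params=params, timeout=30).json()
    for doc in new_uploads["response"]["docs"]:
        print(doc["id"], doc.get("title_tesim"))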

These scripts sound complicated, but they have been relatively stable over the past year or so. I hope we can work on simplifying them too, by relying more on ArchivesSnake and moving some separate functions to other smaller microservices. One example is how the ASpace export script also adds a link for each collection to our website. We can simplify this by moving this task to a separate, smaller script. That way, when one script breaks or needs to be updated, it would not affect the other function.

Ingest and Processing scripts

These scripts process digital records by uploading metadata for them in our systems and moving them to our preservation storage.

  • ingest.py packages files as a SIP and optionally updates ArchivesSpace accession records by adding dates and extents.
  • We have standard transfer folders for some campus offices, with designated paths for new records and log files along with metadata about the transferring office. transferAccession.py runs ingest.py but uses the transfer metadata to create accession records and produces spreadsheet log files so offices can see what they transferred.
  • confluence.py scrapes files from our campus’s Confluence wiki system, so for offices that use Confluence all I need is access to their page to periodically transfer records.
  • convertImages.py makes derivative files. This is mostly designed for image files, such as batch converting TIFFs to JPGs or PDFs.
  • listFiles.py is very handy. All it does is create a text file that lists all filenames and paths in a SIP. These can then be easily copied into a spreadsheet (a minimal sketch of this script appears after this list).
  • An archivist can arrange records by creating an asInventory spreadsheet that points to individual or groups of files. buildHyraxUpload.py then creates a TSV file for uploading these files to Hyrax with the relevant ArchivesSpace identifiers.
  • updateASpace.py takes the output TSV from uploading to Hyrax and updates the same inventory spreadsheets. These can then be uploaded back into ArchivesSpace which will create digital objects that point to Hyrax URLs.
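
As promised above, here is a minimal sketch of what listFiles.py does; the package path is a placeholder, and the real script's options may differ.

    import os

    def list_files(package_path, output_file="filelist.txt"):
        """Write the relative path of every file in a package to a text file."""
        with open(output_file, "w", encoding="utf-8") as out:
            for root, dirs, files in os.walk(package_path):
                for name in sorted(files):
                    relative_path = os.path.relpath(os.path.join(root, name),
                                                    package_path)
                    out.write(relative_path + "\n")

    list_files("/processing/example_package")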

SIP and AIP Package Classes

These classes are extensions of the BagIt-python library. They contain a number of methods that are used by other microservices. This lets us easily create() or load() our specific SIP or AIP packages and add files to them. They also include more complex features, like deriving a human-readable extent and date ranges from the filesystem. My favorite feature might be clean(), which removes all Thumbs.db, desktop.ini, and .DS_Store files as the package is created.
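
A rough sketch of the idea, as a thin subclass of BagIt-python's Bag with the clean-up behavior described above; the class and method names are assumptions about our local code, not its actual interface.

    import os
    import bagit

    JUNK_FILES = {"Thumbs.db", "desktop.ini", ".DS_Store"}

    def clean(directory):
        """Remove OS cruft (Thumbs.db, desktop.ini, .DS_Store) before bagging."""
        for root, dirs, files in os.walk(directory):
            for name in files:
                if name in JUNK_FILES:
                    os.remove(os.path.join(root, name))

    class SIP(bagit.Bag):
        @classmethod
        def create(cls, directory, metadata=None):
            """Clean a directory, package it as a bag, and return it as a SIP."""
            clean(directory)
            bagit.make_bag(directory, metadata or {}, checksums=["sha256"])
            return cls(directory)

        @classmethod
        def load(cls, directory):
            """Load an existing SIP bag from disk."""
            return cls(directory)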

Example use case

  1. Wild records appear! A university staff member has placed records of the University Senate from the past year in a standard folder share used for transfers.
  2. An archivist runs transferAccession.py, which creates an ArchivesSpace accession record using some JSON in the transfer folder and technical metadata from the filesystem (modified dates and digital extents). It then packages the files using BagIt-python and places one copy in the read-only SIP directory and a working copy in a processing directory.
    • For outside acquisitions, the archivists usually download, export, or image the materials and create an accession record manually. Then, ingest.py packages these materials and adds dates and extents to the accession records when possible.
  3. The archivist makes derivative files for access or preservation. Since there is a designated derivatives directory in the processing package, the archivists can use a variety of manual tools or run other microservices using the package identifier. Scripts such as convertImages.py can batch convert or combine images and PDFs, and other scripts for processing email are still being developed.
  4. The archivist then runs listFiles.py to get a list of file paths and copies them into an asInventory spreadsheet.
  5. The archivist arranges the issues within the University Senate Records. They might create a new subseries and use that identifier in an asInventory spreadsheet to upload a list of files and then download them again to get a list of ref_ids.
  6. The archivist runs buildHyraxUpload.py to create a tab-separated values (TSV) file for uploading files to Hyrax using the description and ref_ids from the asInventory spreadsheet (a stripped-down sketch of this step appears after this list).
  7. After uploading the files to Hyrax, the archivist runs updateASpace.py to add the new Hyrax URLs to the same asInventory spreadsheet and uploads them back to ArchivesSpace. This creates new digital objects that point to Hyrax.
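
To make step 6 concrete, a stripped-down version of the TSV-building step might look like the sketch below; the spreadsheet columns and output headers are hypothetical, and Hyrax's actual batch-upload format is a local detail.

    import csv
    import openpyxl

    sheet = openpyxl.load_workbook("senate_inventory.xlsx").active

    with open("hyrax_upload.tsv", "w", newline="", encoding="utf-8") as tsv:
        writer = csv.writer(tsv, delimiter="\t")
        writer.writerow(["file_path", "title", "date", "aspace_ref_id"])
        # Assumed columns: file path, title, date, ArchivesSpace ref_id.
        for row in sheet.iter_rows(min_row=2, values_only=True):
            file_path, title, date, ref_id = row[:4]
            writer.writerow([file_path, title, date, ref_id])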

Successes and Challenges

Our set-up will always be a work in progress, and we hope to simplify, replace, or improve most of these processes over time. Since Hyrax and ArcLight have been in place for almost a year, we have noticed some aspects that are working really well and others that we still need to improve on.

I think the biggest success was customizing Hyrax to rely on description pulled from ArcLight. This has proven to be dependable and has allowed us to make significant amounts of born-digital and digitized materials available online without requiring detailed item-level metadata. Instead, we rely on high-level archival description and whatever information we can use at scale from the creator or the file system.

Suddenly we have a backlog. Since description is no longer the biggest barrier to making materials available, the holdup has been the parts of the workflow that require human intervention. Even though we are doing more with each action, large amounts of materials are still held up waiting for a human to process them. The biggest bottlenecks are working with campus offices and donors as well as arrangement and description.

There are also a ton of spreadsheets. I think this is a good thing, as we have discovered many cases where born-digital records come with some kind of existing description, but it often requires data cleaning and transformation. One collection came with authors, titles, and abstracts for each of a few thousand PDF files, but that metadata was trapped in hand-encoded HTML files from the 1990s. Spreadsheets are a really good tool for straddling the divide between the automated and manual processes required to save this kind of metadata, and this is a comfortable environment for many archivists to work in.[1]

You may have noticed that the biggest needs we have now—donor relations, arrangement and description, metadata cleanup—are roles that archivists are really good at and comfortable with. It turned out that once we had effective digital infrastructure in place, it created further demands on archivists and traditional archival processes.

This brings us to the biggest challenge we face now. Since our set-up often requires comfort on the command line, we have severely limited the number of archivists who can work on these materials and required non-archival skills to perform basic archival functions. We are trying to mitigate this in some respects by better distributing individual stages for each collection and providing more documentation. Still, this has clearly been a major flaw, as we need to meet users (in this case other archivists) where they are rather than place further demands on them.[2]


Gregory Wiedeman is the university archivist in the M.E. Grenander Department of Special Collections & Archives at the University at Albany, SUNY where he helps ensure long-term access to the school’s public records. He oversees collecting, processing, and reference for the University Archives and supports the implementation and development of the department’s archival systems.

Dispatches from a Distance: Disrupting Toxicity at the Workplace

This is the fourth of our Dispatches from a Distance, a series of short posts intended as a forum for those of us facing disruption in our professional lives, whether that’s working from home or something else, to stay engaged with the community. There is no specific topic or theme for submissions–rather, this is a space to share your thoughts on current projects or ideas which, on any other day, you might have discussed with your deskmate or a co-worker during lunch. These don’t have to be directly in response to the Covid-19 outbreak (although they can be). Dispatches should be between 200-500 words and can be submitted here.


Anonymous

Working from home has its challenges, as many of us have found lately. It is certainly a privilege now to be able to work from home and to remain employed. In some cases, however, this might be an opportunity to disrupt a cycle of toxicity in the work environment. In my case, workplace hostility has been extremely stressful and taken its toll on my mental health, which has in turn affected my physical health. I am looking forward to realigning my focus on what matters in life—myself and my loved ones.

I have not been fully present for family or friends as I would like to be, and I need to feel complete to be able to be there for them. Working from home, now, I am trying to focus on eating well and sleeping again. Last night I slept the entire night without medication. I woke up refreshed, made myself coffee, and felt good enough to start a new writing project. I have been so lost and absorbed with the toxic environment at work that I had been stuck in a loop, ruminating over and over on situations, thinking about the retaliation at my workplace and whether or not I should act on it. I was anxious about going back to work, about sobbing in the bathroom stall in a public restroom, or shedding more tears in front of my colleagues or my supervisor. I have continuously turned words over in my mind, telling myself that I was valid, that I have value to the profession, and that others will recognize my contributions even if they aren’t shared by my supervisors.

Sometimes, we hope that an academic community can save us. We hope that academia understands that we are all colleagues working towards a common goal. We hope that academic freedom and its ideals reflect a humanist approach, and that those who work within its comforts will receive understanding and ethical treatment. I thought that academia would provide me with benefits that I never had, and it delivers health insurance and other benefits which can be hard to secure otherwise. It does not always guarantee respect, of course. I did not realize that academia can be horribly competitive. My experience has led me to consider finding employment outside of academia, although I am sure any economic sector can be similarly toxic. I have considered leaving the profession altogether, and I still consider it.

Right now, I am hoping to find myself again. I have become completely self-absorbed in my situation, which is unfortunate, but it is a cycle that is very hard to break unless something breaks it for you. As I said, we are certainly privileged right now to be able to work at all, especially to work from home. The epidemic has been awful for the larger community, so I hope to do what I can to make myself complete so that I can engage with the larger world again and, hopefully, to be more effective in change that will benefit the larger community as well as myself.

What’s Your Set-up?: Processing Digital Records at UAlbany (part 1)

by Gregory Wiedeman


Archivists have made major progress using disk imaging to safely move content off of floppy disks and other external media. These workflows have the best tools with the most complete documentation. Yet external media only makes up a portion of born-digital records, and there is less guidance on processing and providing access to other types of digital content. At UAlbany, we certainly have the disk-in-a-box problem that most repositories face, but partly because of our institutional context, we have found that this aspect has become a minor part of “our set-up.”

Most of our born-digital accessions now come in over network shares, cloud storage, or exports from web applications. Additionally, we take in a lot of born-digital content that can be made publicly available now, without restrictions. Some of this is because we’ve been taking in a lot of institutional records, which are public records (think minutes, formal reports, and publications), but there is also a surprising amount of unrestricted digital material we collect from outside groups, like political advocacy and activist groups. Since we also digitize many records that need to be described and managed, we needed our infrastructure to support them as well. So, one of the major factors in developing our digital processing infrastructure is the hope that we can make many records available online soon after we acquire them.

What all that means is that our set-up for processing both born-digital and digitized records is more of an ecosystem than a desktop station or single system. This is great, but it makes an overview difficult, and the result was a bit too long for a single post. So, the BloggERS Editorial Board and I decided to split it into two posts. The first part will focus on the theoretical and technical foundation: the principles behind our decision-making and the servers that run all of the systems. The second post will summarize all the different systems and microservices we use, provide a sample use case of how they all work together, and discuss what we’ve learned since all this has been in place and the challenges that remain.

Principles

Use one archival descriptive system

Most systems built for managing digital content use bibliographic-style description which makes them challenging to integrate into archival processes. They assume each “thing” gets a record with a series of elements or statements, much like the archetypical library catalog.[1] This really includes everything from Digital Asset Management Systems or “Digital Repositories” down to DFXML. Archival description instead describes groups of things at progressive levels of detail. Since we use archival description for paper materials, using one system means applying archival description to digital records as well.

In practice, this means that every digital object is connected to archival description managed by ArchivesSpace.[2] This does not mean that we list each PDF or disk image, but merely that there is a record somewhere in ArchivesSpace that refers to it. This can be anything from a collection-level record that includes a big pile of disk images to an item-level record that refers to an individual PDF. The identifier from ArchivesSpace can then help provide intellectual control without having to describe every digital file.[3]

Some digital objects get additional detailed item-level metadata,[4] while others rely on an identifier to pull in description of records in ArchivesSpace.[5] Our repository assumes everyone uses a full set of Dublin Core or MODS elements out-of-the-box, but we needed most objects to rely only on the ASpace identifier. So we had to modify our repository to both be able to use less descriptive metadata and to link to metadata records in other systems.
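
One way to resolve such an identifier is the ArchivesSpace find_by_id endpoint, sketched below with ArchivesSnake; in practice our Hyrax customization queries ArcLight instead, and the repository ID and credentials here are placeholders.

    from asnake.client import ASnakeClient

    client = ASnakeClient(baseurl="http://localhost:8089",
                          username="admin", password="admin")
    client.authorize()

    def description_for(ref_id):
        """Fetch the archival object behind an ArchivesSpace ref_id."""
        match = client.get("repositories/2/find_by_id/archival_objects",
                           params={"ref_id[]": ref_id}).json()
        refs = match.get("archival_objects", [])
        return client.get(refs[0]["ref"]).json() if refs else None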

Build networks of systems and limit their use to what they are really good at

We try to keep our systems as boilerplate as possible, avoiding customizations and using their default processes. By systems, I mean software applications, such as ArchivesSpace, Hyrax, DSpace, Solr, or Fedora. These applications might span multiple servers, or multiple systems can run together on a single server. Most systems are really good at performing their core functions, but get less effective the more edge cases you ask them to do. Systems are also challenging to maintain, and sticking to the “main line” that everyone else uses ensures that we will have the easiest possible upgrade path.

This means we use ArchivesSpace for managing archival description, ArcLight for displaying description, and Hyrax as a digital repository. We only ask them to do these things, and use smaller tools to perform other functions. We use asInventory as a spreadsheet tool for listing archival materials instead of ArchivesSpace, ArcLight instead of the ArchivesSpace Public User Interface for discovery and display, and network storage for preservation instead of relying on Hyrax and Fedora.

When we need to adapt systems to local practices, instead of customizing them, we try to bring the customizations outside the boundaries of the system and instead rely on their openness and API connections. We create or adapt what I am calling “microservices” to fulfil these local custom functions. These are small, as-simple-as-possible tools that each perform one specific function. Theoretically, at least, they are easy to build and might not be designed to be maintained. Instead, we will adapt or replace them with another super-simple tool when they get problematic or are no longer useful. Microservices do not store or manage data themselves, so when (not if) they stop working, we are not relying on them to immediately fulfil core functions. We will still have to replace them, but we will not have to drop everything and scramble to fix them to serve the next user who walks in the physical or virtual door. In this way, microservices are sort of like the sacrificial anodes of our digital infrastructure.[6]

Computers don’t preserve bits, people preserve bits

Digital preservation is not something any software can do by itself. Preservation requires human attention and labor. There is no system where you can merely ingest your content to preserve it. Instead of relying on a single ideal “preservation” system, our approach is to get digital content onto “good enough” storage and plan to actively manage and maintain it over time. Preservation systems are tools and their effectiveness depends on their context and use.

While we use Fedora Commons as part of the Hyrax/Samvera stack, we do not consider our instance to be a preservation system and we do not use it as such. Hyrax is really complex and challenging to manage, particularly since it stores data in multiple places: in Fedora, in a database, and on a server’s hard disk. Were we to rely on Hyrax as our preservation system, my biggest fear would be that the database and Fedora would get out of sync, which would prevent Hyrax from booting or using the Rails console. In this scary scenario, Fedora would still manage the digital content and metadata, but we’d have to try to piece together what “wd/37/6c/90/wd376c90g” means and how it connects to the human-readable metadata.

Instead, we use Hyrax only as an access system. We keep all master files and metadata in standard packages on network shares using BagIt-python. Preservation copies, like uncompressed TIFFs and WAVs, are not uploaded to Hyrax in order to limit data duplication, as we make derivatives prior to ingest. When we add metadata through Hyrax, a microservice adds it to the preservation storage overnight. This preserves the master copies in an environment we are more confident we can maintain, as simplicity might be more important for preservation than complex functionality. It also lowers the stakes for maintaining Hyrax, as we don’t risk losing materials if something goes wrong.

Servers

These are all the servers we use to process and manage digital records. They are all virtual servers that live in the university data center. Our Library Systems department services the Windows servers, and university IT supports the Linux servers.

  • ArchivesSpace production server
    • Oracle Linux server, 2 core, 6GB RAM
    • Runs ArchivesSpace and MySQL
  • ArchivesSpace development server
    • Oracle Linux server, 2 core, 4GB RAM
    • Runs ArchivesSpace and MySQL
  • Ruby on Rails production server
    • Oracle Linux, 4 core, 8GB RAM
    • Runs applications that use the Ruby on Rails web application framework, including ArcLight, Hyrax, Bento search, and Jekyll website 
    • Serves the Ruby on Rails environment using Passenger and nginx as the webserver 
  • Ruby on Rails development server
    • Oracle Linux, 4 core, 8GB RAM
    • Has duplicate development instances of all  Ruby on Rails-based applications
    • Runs scheduled microservices
  • Solr server
    • Windows Server, 12GB RAM
    • Runs the Solr search engine application which is used by ArcLight and Hyrax
  • Fedora server
    • Windows Server, 2 core, 4GB RAM
    • Runs Fedora using Apache Tomcat to support Hyrax
  • Postgres database server
    • Windows Server
    • Supports Hyrax and Fedora

I know this list can be very intimidating for many small and medium-size repositories that struggle to find support for even one web application, but I think it’s important to be transparent about the technology required to process and actually provide access to digital records. We don’t want to make promises to our donors that we can’t fulfil. Many administrators, donors, and IT staff members do not assume that archival repositories require this technology. Even at a major research university, we had to spend years changing the culture of expectations to put these tools in place. I hope that if archivists can be more transparent about these requirements, we can help each other make the case for more support.


Gregory Wiedeman is the university archivist in the M.E. Grenander Department of Special Collections & Archives at the University at Albany, SUNY where he helps ensure long-term access to the school’s public records. He oversees collecting, processing, and reference for the University Archives and supports the implementation and development of the department’s archival systems.

Dispatches from a Distance: Flexing Priorities

This is the third of our Dispatches from a Distance, a series of short posts intended as a forum for those of us facing disruption in our professional lives, whether that’s working from home or something else, to stay engaged with the community. There is no specific topic or theme for submissions–rather, this is a space to share your thoughts on current projects or ideas which, on any other day, you might have discussed with your deskmate or a co-worker during lunch. These don’t have to be directly in response to the Covid-19 outbreak (although they can be). Dispatches should be between 200-500 words and can be submitted here.


by Sara Mouch

I would say that making the move from campus to working from home has shifted priorities for me as the University of Toledo Archivist at the Ward M. Canaday Center for Special Collections, but everything I do feels like top priority. We’re a large repository (upwards of 10,000 linear feet of collections) with a small shop (2 full-time archivists), so there is never a dearth of things to do, and I claim that all of those things are equally important. However, some of those priorities are currently impossible to address, such as prompt reference assistance, physical collections processing, and digitization projects. Those are priorities that, when on campus, are most present and pressing, distracting from all the others. I have no such distractions now, and the ability to focus on those other, all so important, but neglected priorities is liberating. I’m a troubleshooter by nature and I’m in a position right now to spend more time than usual untangling problems, such as finding aid errors and poor data management. Quality control takes center stage and, frankly, I love the tedious clean-up, the moments when I can afford perfection over progress, an impossibility in most areas of archives management. From my professional standpoint, this is the light amidst all the darkness.

The limitations inherent in the inaccessibility of physical collections have, however, brought concerns to the forefront. Even the remotest possibility of installing an exhibit for 2020 (as we do annually) becomes nonexistent without the ability to curate and prepare items for display. Our ability to serve researchers is only as good as those collections that are available, in part or in full, in our digital repository. The resulting suspension of archives orientation sessions due to the move to online classes means that we can’t put history in the hands of our students. These concerns, however temporary, loom and linger, even as I’m thrilled to have the luxury to learn the vagaries of ArchivesSpace.

But we readjust, and those limitations become opportunities. We can create online guides to collections earmarked for representation in our exhibit. Digitization needs become clearer, and we can re-prioritize the scanning of collections destined for the digital repository, even if the actual scanning must wait. Finally, just as teaching faculty must adapt to reaching their students remotely, so too must the archivists who serve both. An online presentation regarding archival research may not have the same impact as interacting with tangible objects and records, but hopefully it will convey that history exists in many formats, that archives are a great resource for research and connection, and that archivists want to meet students where they are.

That backlog, though….

What’s Your Setup?: National Library of New Zealand Te Puna Mātauranga o Aotearoa

By Valerie Love

Introduction

The Alexander Turnbull Library holds the archives and special collections for the National Library of New Zealand Te Puna Mātauranga o Aotearoa (NLNZ). While digital materials have existed in the Turnbull Library’s collections since the 1980s, the National Library began to formalise its digital collecting and digital preservation policies in the early 2000s, and established the first Digital Archivist roles in New Zealand. In 2008, the National Library launched the National Digital Heritage Archive (NDHA), which now holds 27 million files spanning 222 different formats and totalling 311 terabytes.

Since the launch of the NDHA, there has been a marked increase in the size and complexity of incoming digital collections. Collections currently come to the Library on a combination of obsolete and contemporary media, as well as electronic transfer, such as email or File Transfer Protocol (FTP).

Digital Archivists’ workstation setup and equipment

Most staff at the National Library use either a Windows 10 Microsoft Surface Pro or HP EliteBook i5 at a docking station with two monitors to allow for flexibility in where they work. However, the Library’s Digital Archivists have specialised setups to support their work with large digital collections. The computers and workstations below are listed in order of frequency of usage. 

Computers and workstations

  1. HP Z200 i7 workstation tower

The Digital Archivists’ main ingest and processing device is an HP Z220 i7 workstation tower. The Z220s have a built-in read/write optical disc drive, as well as USB and FireWire ports.

  2. HP EliteBook i7

The device we use second most frequently is an HP EliteBook i7, which we use for electronic transfers of contemporary content. Our web archivists also use these for harvesting websites and running social media crawls. As there are only a handful of digital archivists in Aotearoa New Zealand, we do a significant amount of training and outreach to archives and organisations that don’t have a dedicated digital specialist on staff. Having a portable device as well as our desktop setups is extremely useful for meetings and workshops offsite.

  3. MacBook Pro 15-inch, 2017

The Alexander Turnbull Library is a collecting institution, and we often receive creative works from authors, composers, and artists. We regularly encounter portable hard drives, floppy disks, zip disks, and even optical discs which have been formatted for a Mac operating system, and are incompatible with our corporate Windows machines. And so, MacBook Pro to the rescue! Unfortunately, the MacBook Pro only has ports for USB-C, so we keep several USB-C to USB adapters on hand. The MacBook has access to staff wifi, but is not connected to the corporate network. We’ve recently begun to investigate using HFS+ for Windows software in order to be able to see Macintosh file structures on our main ingest PCs.

  4. Digital Intelligence FRED Forensic Workstation

If we can’t read content on either our corporate machines or the MacBook Pro, then our friend FRED is our next port of call. FRED is a forensic recovery of evidence device, and includes a variety of ports and drives with write blockers built in. We have a 5.25 inch floppy disk drive attached to the FRED, and also use it to mount internal hard drives removed from donated computers and laptops. We don’t create disk images by default on our other workstations, but if a collection is tricky enough to merit the FRED, we will create disk images for it, generally using FTK Imager. The FRED has its own isolated network connection separate from the corporate network so we can analyse high risk materials without compromising the Library’s security.

  5. Standalone PC

Adjacent to the FRED, we have an additional non-networked PC (also an HP Z200 i7 workstation tower) where we can analyse materials, download software, test scripts, and generally experiment separate from the corporate network. This machine is currently still operating under a Windows 7 build, as some of the drivers we use with legacy media carriers were not compatible with Windows 10 during the initial testing and rollout of Windows 10 devices to Library staff.

  6. A ragtag bunch of computer misfits

[link to https://natlib.govt.nz/blog/posts/a-ragtag-bunch-of-computer-misfits]

Over the years, the Library has collected vintage computers with a variety of hardware and software capabilities, and each machine offers different applications and tools to help us process and research legacy digital collections. We are also sometimes gifted computers from donors to support the processing of their legacy files, which allows us to see exactly what software and programmes they used, and their file systems.

  7. KryoFlux (located at Archives New Zealand)

And for the really tricky legacy media, we are fortunate to be able to call on our colleagues down the road at Archives New Zealand Te Rua Mahara o te Kāwanatanga, who have a KryoFlux set up in their digital preservation lab to read 3.5 inch and 5.25 inch floppy disks. We recently went over there to try to image a set of double-sided, double-density, 3.5 inch Macintosh floppy disks from 1986-1989 that we had been unable to read on our legacy Power Macintosh 7300/180. We were able to create disk image files for them using the KryoFlux, but unfortunately, the disks contained bad sectors, so we weren’t able to render the files from them.

Drives and accessories

In addition to our hardware and workstation setup, we use a variety of drives and accessories to aid in processing of born-digital materials.

  1. Tableau Forensic USB 3.0 Bridge write blocker
  2. 3.5 inch floppy drive
  3. 5.25 inch floppy drive
  4. Optical media drive 
  5. Zip drive 
  6. Memory card readers (CompactFlash cards, Secure Digital (SD) cards, Smart Media cards)
  7. Various connectors and converters

Some of our commonly used software and other processing tools

  1. SafeMover Python script (created in-house at NLNZ to transfer and check fixity for digital collections; a minimal fixity-check sketch appears after this list)
  2. DROID file profiling tool
  3. Karen’s Directory Printer
  4. Free Commander/Double Commander
  5. File List Creator
  6. FTK Imager
  7. OSF Mount
  8. IrfanView
  9. Hex Editor Neo
  10. Duplicate Cleaner
  11. ePADD
  12. HFS+ for Windows
  13. System Centre Endpoint Protection
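
As noted above, the general transfer-and-fixity-check approach a tool like SafeMover takes can be sketched in a few lines: hash the source file, copy it, hash the copy, and compare. This illustrates the technique only and is not NLNZ’s actual script.

    import hashlib
    import shutil
    from pathlib import Path

    def sha256(path, chunk_size=65536):
        """Return the SHA-256 checksum of a file, read in chunks."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def transfer_with_fixity(source_dir, destination_dir):
        """Copy a directory tree and verify each file's checksum after copying."""
        source_dir, destination_dir = Path(source_dir), Path(destination_dir)
        for source_file in source_dir.rglob("*"):
            if not source_file.is_file():
                continue
            target = destination_dir / source_file.relative_to(source_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            original_hash = sha256(source_file)
            shutil.copy2(source_file, target)
            if sha256(target) != original_hash:
                raise RuntimeError(f"Fixity mismatch for {source_file}")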


Valerie Love is the Senior Digital Archivist Kaipupuri Pūranga Matihiko Matua at the Alexander Turnbull Library, National Library of New Zealand Te Puna Mātauranga o Aotearoa

Dispatches from a Distance: Dispatch from a Detroit foundation archivist

This is the second of our Dispatches from a Distance, a series of short posts intended as a forum for those of us facing disruption in our professional lives, whether that’s working from home or something else, to stay engaged with the community. There is no specific topic or theme for submissions–rather, this is a space to share your thoughts on current projects or ideas which, on any other day, you might have discussed with your deskmate or a co-worker during lunch. These don’t have to be directly in response to the Covid-19 outbreak (although they can be). Dispatches should be between 200-500 words and can be submitted here.


by Lori Eaton, MLIS, CA

The economic news swirling around the COVID-19 outbreak frequently references how people with the fewest financial resources will bear the brunt of the pandemic-driven recession. As an archivist and records manager working with foundations, I’ve been awed by how quickly the philanthropic community in Michigan has sprung into action. Foundations are distributing emergency funds, coordinating resources to help nonprofits support clients and staff (for an example, see the Council on Foundations COVID-19 Resource Hub, which provides resources for funders and grantees), and working with grantees who provide direct aid to those in our communities who need it most. 

On March 16, 2020, the Detroit-based foundation where I’ve been embedded for the last year made the decision to close the office and asked staff to work remotely. Thankfully, the foundation moved to cloud-based file storage almost a year ago and had recently enhanced teleconferencing capabilities. Grants are also managed through a cloud-based tool as are board of trustee resources. 

Together with learning and impact staff, I’ve been working to gather and organize a digital library of COVID-19 related resources and records generated by the foundation. We’re collecting files the foundation staff creates but also those of funding partners, grantees, nonprofit support organizations, and state and local government. I’ve taken on the task of naming and describing these files and applying a consistent vocabulary. 

In the near term, this resource library will help foundation staff keep track of the deluge of information flooding in through emails, Google docs, websites, and conference calls. In the future, it is our hope that this library will help tell the story of how both the foundation and the philanthropy community in Michigan rose to the challenge presented by this pandemic.