Dispatches from a Distance: Losses and Gains

This is the first of our Dispatches from a Distance, a series of short posts intended as a forum for those of us facing disruption in our professional lives, whether that’s working from home or something else, to stay engaged with the community. There is no specific topic or theme for submissions; rather, this is a space to share your thoughts on current projects or ideas that, on any other day, you might have discussed with your deskmate or a co-worker over lunch. These don’t have to be direct responses to the COVID-19 outbreak (although they can be). Dispatches should be between 200 and 500 words and can be submitted here.


by Jordan Meyerl

Working from home has its challenges and its benefits, or, as I’ve begun thinking of them, its losses and gains. As a graduate student finishing my degree in May, I find the losses I am experiencing debilitating. While I have met the minimum requirements for my capstone, I had hoped to process more linear feet of material. And while I can still engage in meaningful projects as part of my graduate assistantship with the University of Massachusetts Boston University Archives and Special Collections, the exhibit I so painstakingly helped curate has been delayed until next year. Though I am grateful it has not been cancelled outright, the sense of disappointment and loss still hangs over me.

I am working to balance this feeling of loss with the gains I have made. I have gained more time to work on the written portion of my capstone. I have gained the opportunity to be a curator for A Journal of the Plague Year: An Archive of COVID-19. Along the same lines, I have gained the ability to work on more digital projects through my assistantship and to build skills that make me more marketable. I have also gained the chance to spend more time with my partner and to focus on myself, something I haven’t been able to do in a while.

Since I started graduate school at the University of Massachusetts Boston, I have been career-driven. I am deeply passionate about being an archivist, and I have worked hard to complete my coursework to the best of my ability while also establishing myself within the professional community. I have been so focused on these goals that I have failed to care for myself. And while I am still career-driven and am taking advantage of new opportunities that have cropped up as a result of COVID-19, my greatest gain is definitely the chance to focus on myself.

What’s Your Set-up? Born-Digital Processing at NC State University Libraries

by Brian Dietz


Background

Until January 2018, the NC State University Libraries did our born-digital processing using the BitCurator VM running on a Windows 7 machine. The BCVM bootstrapped our operations, and much of what I think we’ve accomplished over the last several years would not have been possible without this setup. Two years ago, we shifted our workflows to run mostly at the command line on a Mac. Moving to the command line meant we needed a *nix (Unix-like) environment. Cygwin for Windows was not a realistic option, and the Linux subsystem available on Windows 10 had not been released. A dedicated Linux computer wasn’t ideal because of IT support constraints, I no longer wanted to manage virtual machine distributions, and a dual-boot machine seemed too inefficient. Also, of the three major operating systems, I’m most familiar and comfortable with macOS, which is UNIX under the hood and certified as such.

Additionally, Homebrew, a package manager for Mac, made installing and updating the programs we needed, along with their dependencies, relatively simple. Alongside Homebrew, we use pip to update brunnhilde, and freshclam, included with ClamAV, to keep the virus database up to date. HFS Explorer, which we need for exploring Mac-formatted disks, has to be installed and updated manually; it is probably our main pain point, though not a very painful one yet. With the exception of HFS Explorer, we update everything at the time of processing, so the environment is always fresh.
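To make the “update at the time of processing” routine concrete, here is a minimal sketch of what such a refresh could look like, written in Python so the steps can be listed or run in order. It assumes Homebrew, Python 3 with pip, and ClamAV’s freshclam are on the PATH; the function name and the exact step list are illustrative, not NC State’s actual script. HFS Explorer is left out because it is updated by hand.

```python
"""Sketch of an at-processing-time environment refresh (assumptions noted).

Assumes Homebrew, Python 3 with pip, and ClamAV's freshclam are on PATH.
The step list and its order are illustrative, not NC State's actual script.
"""
import shutil
import subprocess
from typing import List

# Each step is an argv list. HFS Explorer is updated by hand, so it is absent.
REFRESH_STEPS: List[List[str]] = [
    ["brew", "update"],   # refresh Homebrew itself and its formula definitions
    ["brew", "upgrade"],  # upgrade installed formulae and their dependencies
    ["python3", "-m", "pip", "install", "--upgrade", "brunnhilde"],  # update brunnhilde
    ["freshclam"],        # download the latest ClamAV virus definitions
]


def refresh_environment(dry_run: bool = True) -> List[List[str]]:
    """Return the planned steps; unless dry_run is True, also run each in order."""
    for argv in REFRESH_STEPS:
        if dry_run:
            continue
        if shutil.which(argv[0]) is None:
            raise FileNotFoundError(f"{argv[0]} is not installed or not on PATH")
        # check=True stops the refresh at the first failing step.
        subprocess.run(argv, check=True)
    return [list(step) for step in REFRESH_STEPS]


if __name__ == "__main__":
    for step in refresh_environment(dry_run=True):
        print(" ".join(step))
```

Running it with `dry_run=False` at the start of a session keeps the toolchain current; depending on how ClamAV’s database directory is configured, freshclam may need elevated privileges.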

Current workstation

We currently have one workstation where we process born-digital materials. We do our work on a Mac Pro:

  • macOS 10.13 High Sierra
  • 3.7 GHz processor
  • 32GB memory
  • 1TB hard drive
  • 5TB NFS-mounted networked storage
  • 5TB Western Digital external drive

We have a number of peripherals:

  • 2 consumer grade Blu-ray optical drives (LG and Samsung)
  • 2 Iomega USB-powered Zip drives (100MB and 250MB)
  • Several 3.5” floppy drives (salvaged from surplused computers), but our go-to is a Sony 83-track drive (model MPF920)
  • One TEAC 5.25” floppy drive (salvaged from a local scrap exchange)
  • Kryoflux board with power supply and ribbon cable with various connectors
  • WiebeTech USB and Forensic UltraPack v4 write blockers
  • Apple iPod (for taking pictures of media, usually transferred via AirDrop)

The tools that we use for exploration/appraisal, extraction, and reporting are largely command line tools:

Exploration

  • diskutil (finding where a volume is mounted)
  • gls, the GNU version of ls (finding volume names, since it shows escape characters (“\”) in its output)
  • hdiutil (mounting disk image files)
  • mmls (finding partition layout of disk images)
  • drutil status (showing information about optical media)

Packaging

  • tar (packaging content from media not being imaged)
  • ddrescue (disk imaging)
  • cdparanoia (packaging content from audio discs)
  • KryoFlux GUI (floppy imaging)

Reporting

  • brunnhilde (file and disk image profiling, duplicate detection)
  • bulk_extractor (PII scanning)
  • ClamAV (virus scanning)
  • ExifTool (metadata)
  • MediaInfo (metadata)
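The Packaging list above maps each kind of media to a tool. The sketch below shows one way a processing script could encode that choice; the media-type labels, the function name, and the output naming scheme are hypothetical, while the tools follow the list (tar for content not being imaged, GNU ddrescue for disk images, cdparanoia for audio discs, and the KryoFlux GUI for floppies, which stays a manual step).

```python
"""Sketch: choose a packaging command by media type.

The media-type labels and output naming scheme are hypothetical; the tool
choices follow the Packaging list (tar, ddrescue, cdparanoia, KryoFlux GUI).
"""
from typing import List, Optional


def packaging_command(media_type: str, source: str, dest_stem: str) -> Optional[List[str]]:
    """Return an argv list for the packaging step, or None for a manual GUI step."""
    if media_type == "logical_copy":
        # Package files from media that will not be imaged into a single tarball.
        return ["tar", "-cf", f"{dest_stem}.tar", "-C", source, "."]
    if media_type == "disk_image":
        # GNU ddrescue: infile, outfile, and a mapfile that records read progress.
        return ["ddrescue", source, f"{dest_stem}.img", f"{dest_stem}.map"]
    if media_type == "audio_cd":
        # Batch mode: one WAV file per track, written to the working directory.
        return ["cdparanoia", "-B"]
    if media_type == "floppy_flux":
        # Floppies are imaged interactively in the KryoFlux GUI.
        return None
    raise ValueError(f"unknown media type: {media_type!r}")
```

Returning an argv list rather than running the tool keeps the choice easy to review before anything touches the media; the caller can pass the list to `subprocess.run` once the write blocker and mount point have been checked.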

Additionally, we perform archival description using ArchivesSpace, and we’ve developed an application called DAEV (“Digital Assets of Enduring Value”) that, among other things, guides processors through a session and interacts with ArchivesSpace to record certain descriptive metadata. 

Working with IT

We have worked closely with our Libraries Information Technology department to acquire and maintain hardware and peripherals, just as we have worked closely with our Digital Library Initiatives department on the development and maintenance of DAEV. For purchasing, we submit larger requests, with justifications, to IT annually, and smaller requests as needs arise (e.g., our Zip drive broke and we need a new one). Our computer is on the refresh cycle, meaning that once it reaches a certain age, it will be replaced with a comparable computer. Especially for peripherals, we provide exact technical specifications and anticipated costs (e.g., Iomega 250MB Zip drive), and IT determines the purchasing process.

I think it’s easy to assume that, because people in IT are among our most knowledgeable colleagues about computing technology, they understand what we’re trying to do and what we’ll need to do it. While they are capable of understanding our needs, their specializations lie elsewhere, and that assumption can result in a less-than-ideal computing situation. My experience is that my coworkers in IT are eager to understand our problems and to help us solve them, but that they really don’t know what our problems are.

The counter-assumption is that we ourselves are supposed to know everything about computing. That’s probably more counterproductive than assuming IT knows everything, because 1) we feel bad when we don’t know everything, and 2) in trying to hide what we don’t know, we end up not getting what we need. I think the ideal situation is for us to know what processes we need to run (and why) and to share those with IT, who should be able to say what sort of processor and how much RAM are needed. If your institution has a division of labor, i.e., specializations, take advantage of it.

So, rather than saying, “We need a computer to support digital archiving,” or “I need a computer with exactly these specs,” we’ll be better off requesting a consultation and explaining what sort of work we need a computer to support. Of course, our first request for a born-digital workstation, which was intended to support a large initiative, came at a late hour and took the form of “We need a computer to support digital archiving,” with the additional assumption of “I thought you knew this was happening.” We got a pretty decent Windows 7 computer that worked well enough.

I also recognize that I may be describing a situation that does not exist in many other institutions. In those cases, perhaps it’s something to work toward through personal and interdepartmental relationship building. At any rate, I recognize and am grateful for the support my institution has extended to my work.

Challenges and opportunities

I’ve got two challenges coming up. Campus IT has required that all Macs be upgraded to macOS Mojave to “meet device security requirements.” From a security perspective, I’m all on board with this. However, in our testing, the KryoFlux is not compatible with Mojave. This appears to be related to a security measure Mojave has in place for controlling USB communication. After several conversations, Libraries IT has recommended assigning us a Windows 10 computer for use with the KryoFlux. Beyond simply having a second computer, I see obvious benefits to this. One is that I’ll be able to install the Linux subsystem on Windows 10 and explore whether going full-out Linux might be an option for us. Another is that I’ll have ready access to FTK Imager again, which comes in handy from time to time.

The other challenge is our optical drives. We have consumer-grade drives, and they work inconsistently: Drive 1 may read Disc X but not Disc Y, while Drive 2 will do the reverse. At the 2019 BitCurator Users Forum, Kam Woods discussed higher-grade optical drives in the “There Are No Dumb Questions” session. (By the way, everyone should consider attending the Forum. It’s a great meeting that’s heavily focused on practice, and it gets better each year. This year, the Forum will be hosted by Arizona State University, October 12-13. The call for proposals will be coming out in early March.)

In the coming months, we’ll be making some significant changes to our workflow, which will include tweaking a few things, reordering some steps, introducing new tools (e.g., walk_to_dfxml and Bulk Reviewer), and, I hope, introducing more automation into the process. We’re also due for a computer refresh, and while we’re sticking with Macs for the time being, we’ll again work with our IT department to review computer specifications.


Brian Dietz is the Digital Program Librarian for Special Collections at NC State University Libraries, where he manages born-digital processing, web archiving, and digitization.

What’s Your Set-Up?: Establishing a Born-Digital Records Program at Brooklyn Historical Society

by Maggie Schreiner and Erica López


In establishing a born-digital records program at Brooklyn Historical Society, one of our main challenges was scaling the recommendations and best practices, which thus far have been primarily articulated by large and well-funded research universities, to fit our reality: a small historical society with limited funding, a very small staff, and no in-house IT support. In navigating this process, we’ve attempted to strike a balance that will allow us to responsibly steward the born-digital records in our collections, be sustainable for our staffing and financial realities, and allow us to engage with and learn from our colleagues doing similar work.

We started our process with research and learning. Our Digital Preservation Committee, which meets monthly, held a reading group. We read and discussed SAA’s Digital Preservation Essentials, reached out to colleagues at local institutions with born-digital records programs for advice, and read widely on the internet (including bloggERS!). Our approach was also strongly influenced by Bonnie Weddle’s presentation “Born Digital Collections: Practical First Steps for Institutions,” given at the Conservation Center for Art & Historic Artifacts’ 2018 conference at the Center for Jewish History. Bonnie’s presentation focused on iterative processes that can be implemented by smaller institutions. It empowered us to envision a BHS-sized program: to start small, and to iterate when possible and in ways that make sense for our staff and our collections.

We first enacted this approach in our equipment decisions. We assembled a workstation that consists of an air-gapped desktop computer and a set of external drives based on our known and anticipated needs (3.5-inch floppy, CD/DVD, and Zip drives, plus memory card readers). Our most expensive piece of equipment was our write blocker (a Tableau TK8u USB 3.0 Bridge), which, based on our research, seemed like the most important place to splurge. We based our equipment decisions on background reading, informal conversations with colleagues about equipment possibilities, and an existing survey of born-digital carriers in our collections. We were also limited by our small budget; the total cost for our workstation was approximately $1,500.

Born-digital records workstation at the Brooklyn Historical Society

A grant from the Gardiner Foundation allowed us to create a paid Digital Preservation Fellowship and hire the amazing Erica López for the position. The goals and timeline for Erica’s position were developed to allow lots of time for research, learning through trial and error, and mistakes. As a small staff, we often find it difficult to create the time and space necessary for experimentation. Erica began by crafting processes for imaging and appraisal: testing software, researching, adapting workflows from other institutions, creating test disk images, and drafting appraisal reports. We opted to use BitCurator because of its active user community. We also reached out to Bonnie Weddle, who generously agreed to answer our questions and review draft workflows. Bonnie’s feedback and support gave us additional confidence that we were on the right track.

Starting from an existing inventory of legacy media in our collections, Erica created disk images of the majority of items and wrote appraisal assessments for each collection. Ultimately, Erica imaged eighty-seven born-digital objects (twelve 3.5-inch floppy disks, thirty-eight DVDs, and thirty-seven CDs), which contained a total of seventy-seven different file formats. Although these numbers may seem very small for some (or even most) institutions, they are big for us! Our archives program is maintained by two FTE staff with multiple responsibilities and supported by vendor IT with no experience supporting the unique needs of archives and special collections.

We encountered a few big bumps during the process! The first was that we unexpectedly had to migrate our archival storage server, and as a result we did not have read-write access to it for several months. This interrupted our planned storage workflow for the disk images that Erica was creating. In hindsight, keeping the disk images inside the virtual machine running BitCurator was a glaring mistake. Inevitably, we had a day when we could no longer launch the virtual machine. After several days of failed attempts to recover the disk images, we decided that Erica would re-image the media. Fortunately, by this time, Erica was very proficient, and it took less than two weeks!
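One way to avoid leaving disk images in a single VM is to copy each new image to storage outside the VM as soon as it is created and to confirm the copy by checksum. The sketch below is a hypothetical illustration of that idea, not BHS’s workflow: the function names are invented, and it assumes the destination (for example, a shared folder or network mount) is visible from inside the VM.

```python
"""Sketch: copy a new disk image out of the VM to storage, verifying fixity.

Function names and paths are hypothetical; assumes the storage location
is mounted inside the VM (e.g., a shared folder or network share).
"""
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large disk images never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_fixity(image: Path, storage_dir: Path) -> str:
    """Copy image into storage_dir, confirm the checksums match, write a sidecar."""
    storage_dir.mkdir(parents=True, exist_ok=True)
    source_hash = sha256_of(image)
    target = storage_dir / image.name
    shutil.copy2(image, target)  # copy2 also preserves timestamps
    if sha256_of(target) != source_hash:
        target.unlink()  # discard a bad copy rather than keep it silently
        raise OSError(f"checksum mismatch copying {image} to {target}")
    # Sidecar in the "<hash>  <name>" format that `shasum -a 256 -c` can check.
    (storage_dir / f"{image.name}.sha256").write_text(f"{source_hash}  {image.name}\n")
    return source_hash
```

Because the sidecar uses the standard checksum-file format, the copy can be re-verified later from outside the VM, even if the VM itself becomes unusable.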

We had also hoped to do a case study on a hard drive in our collection, as Erica’s work had otherwise been limited to smaller removable media. After some experimentation, we discovered that our system could not connect to the drive and that we would need to use a FRED (Forensic Recovery of Evidence Device) to access the content. We booked time at the Metropolitan New York Library Council’s Studio to use their FRED. Erica spent a day imaging the drive and brought back a series of disk images… which to date we have not successfully opened in our BitCurator environment at BHS! After spending several weeks troubleshooting the technical difficulties and reaching out to colleagues, we decided to table the case study. Although this was disappointing, we also recognized that we have made huge strides in our ability to steward born-digital materials, and that we will continually iterate on this work in the future.

What have we learned about creating a BHS-sized born-digital records program? We learned that our equipment meets the majority of our use-case scenarios, that we have access to additional equipment at METRO when needed, and that maybe we aren’t quite ready to tackle more complex legacy media anyway. We learned that’s okay! We haven’t read everything, we don’t have the fanciest equipment, and we didn’t start with any in-house expertise. We did our research, did our best work, made mistakes, and in the end we are much more equipped to steward the born-digital materials in our collections. 


Maggie Schreiner is the Manager of Archives and Special Collections at the Brooklyn Historical Society, an adjunct faculty member in New York University’s Archives and Public History program, and a long-time volunteer at Interference Archive. She has previously held positions at the Fashion Institute of Technology (SUNY), NYU, and Queens Public Library. Maggie holds an MA in Archives and Public History from NYU.

Erica López was born and raised in California by undocumented parents. Education was important, but exploring Los Angeles’s colorful nightlife was more important. After doing hair for over a decade, Erica started studying to be a Spanish teacher at UC-Berkeley. Eventually, Erica quit the Spanish teacher dream and found first film theory and then the archival world. Soon, Erica was finishing up an MA at NYU and working to become an archivist. Erica worked with Brooklyn Historical Society to set up workflows for born-digital collections, and is currently finishing up an internship at The Riverside Church translating and cataloging audio files.