by Brian Dietz
Background
Until January 2018 the NC State University Libraries did our born-digital processing using the BitCurator VM running on a Windows 7 machine. The BCVM bootstrapped our operations, and much of what I think we’ve accomplished over the last several years would not have been possible without this setup. Two years ago, we shifted our workflows to be run mostly at the command line on a Mac computer. The desire to move to the CLI meant we needed a *nix environment. Cygwin for Windows was not a realistic option, and the Linux subsystem, available on Windows 10, had not been released. A dedicated Linux computer wasn’t an ideal option due to IT support. I no longer wanted to manage virtual machine distributions, and a dual boot machine seemed too inefficient. Also, of the three major operating systems, I’m most familiar and comfortable with macOS, which is UNIX under the hood, and certified as such. Additionally, Homebrew, a package manager for Mac, made installing and updating the programs we needed, as well as their dependencies, relatively simple. In addition to Homebrew, we use pip to update brunnhilde, and freshclam, included in ClamAV, to keep the virus database up to date. HFS Explorer, necessary for exploring Mac-formatted disks, is a manual install and update, and it is probably our main pain point (though not a very painful one yet). With the exception of HFS Explorer, updating is done at the time of processing, so the environment is always fresh.
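The update-at-processing-time routine described above could be scripted roughly as follows. This is a minimal sketch rather than our actual script: the function names are hypothetical, and it assumes that `brew`, `pip`, and `freshclam` are all available on the PATH. By default it only prints the commands.

```python
import shlex
import subprocess


def update_commands():
    """Return the update commands as argument lists, in the order they would run."""
    return [
        ["brew", "update"],                             # refresh Homebrew's package index
        ["brew", "upgrade"],                            # upgrade installed Homebrew packages
        ["pip", "install", "--upgrade", "brunnhilde"],  # brunnhilde is updated via pip
        ["freshclam"],                                  # refresh the ClamAV virus database
    ]


def refresh_environment(dry_run=True):
    """Print each command; actually run it only when dry_run is False."""
    for cmd in update_commands():
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the routine at the first failing command
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    refresh_environment(dry_run=True)
```

Running it with `dry_run=False` at the start of a session would perform the updates. HFS Explorer is left out because it is updated by hand.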
Current workstation
We currently have one workstation where we process born-digital materials. We do our work on a Mac Pro:
- macOS 10.13 High Sierra
- 3.7 GHz processor
- 32GB memory
- 1TB hard drive
- 5TB NFS-mounted networked storage
- 5TB Western Digital external drive
We have a number of peripherals:
- 2 consumer grade Blu-ray optical drives (LG and Samsung)
- 2 iomega USB-powered ZIP drives (100MB and 250MB)
- Several 3.5” floppy drives (salvaged from surplused computers), but our go-to is a Sony 83 track drive (model MPF920)
- One TEAC 5.25” floppy drive (salvaged from a local scrap exchange)
- Kryoflux board with power supply and ribbon cable with various connectors
- Wiebetech USB and Forensic UltraPack v4 write blockers
- Apple iPod (for taking pictures of media, usually transferred via AirDrop)
The tools that we use for exploration/appraisal, extraction, and reporting are largely command line tools:
Exploration
- diskutil (finding where a volume is mounted)
- gls (finding volume name, where the GNU version shows escapes (“\”) in print outs)
- hdiutil (mounting disk image files)
- mmls (finding partition layout of disk images)
- drutil status (showing information about optical media)
Packaging
- tar (packaging content from media not being imaged)
- ddrescue (disk imaging)
- cdparanoia (packaging content from audio discs)
- KryoFlux GUI (floppy imaging)
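As an illustration of the two most common packaging paths, here is a minimal sketch that builds (but does not run) a `ddrescue` imaging command and a `tar` packaging command. The device path, directories, and file-naming scheme are hypothetical and not taken from our workflow; the only assumptions about the tools are the standard `ddrescue infile outfile mapfile` form and `tar -cf archive -C directory .`.

```python
import shlex
from pathlib import PurePosixPath


def ddrescue_command(device, out_dir, name):
    """Build a ddrescue call: source device, image file, then map (log) file."""
    out = PurePosixPath(out_dir)
    return ["ddrescue", device, str(out / f"{name}.dd"), str(out / f"{name}.map")]


def tar_command(source_dir, out_dir, name):
    """Build a tar call that packages the contents of source_dir without imaging."""
    archive = PurePosixPath(out_dir) / f"{name}.tar"
    # -C changes into source_dir first, so archive paths are relative to it
    return ["tar", "-cf", str(archive), "-C", str(source_dir), "."]


if __name__ == "__main__":
    # Hypothetical device and paths, printed for review rather than executed
    print(shlex.join(ddrescue_command("/dev/disk2", "/Volumes/storage/item001", "item001")))
    print(shlex.join(tar_command("/Volumes/UNTITLED", "/Volumes/storage/item002", "item002")))
```

Keeping the map file next to the image lets an interrupted `ddrescue` run be resumed with the same command.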
Reporting
- brunnhilde (file and disk image profiling, duplicate detection)
- bulk_extractor (PII scanning)
- clamav (virus scanning)
- Exiftool (metadata)
- Mediainfo (metadata)
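A few of these reporting tools can be chained over the same target. The sketch below only assembles the command lines; the paths and function names are hypothetical, and brunnhilde and Mediainfo are left out to keep it short. It assumes `clamscan -r` with `--log`, `bulk_extractor -o`, and `exiftool -csv -r`, which are documented options of those tools.

```python
import shlex


def clamscan_command(target, log_file):
    """Recursive virus scan of target, writing a report to log_file."""
    return ["clamscan", "-r", f"--log={log_file}", str(target)]


def bulk_extractor_command(image, out_dir):
    """PII scan of a disk image; feature files are written to out_dir."""
    return ["bulk_extractor", "-o", str(out_dir), str(image)]


def exiftool_command(target):
    """Recursive metadata extraction, emitted as CSV on standard output."""
    return ["exiftool", "-csv", "-r", str(target)]


def report_commands(image, files_dir, report_dir):
    """Collect the reporting commands for one item, in the order they would run."""
    return [
        clamscan_command(files_dir, f"{report_dir}/clamscan.log"),
        bulk_extractor_command(image, f"{report_dir}/bulk_extractor"),
        exiftool_command(files_dir),
    ]


if __name__ == "__main__":
    # Hypothetical paths, printed for review rather than executed
    for cmd in report_commands("item001.dd", "/Volumes/item001", "reports"):
        print(shlex.join(cmd))
```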
Additionally, we perform archival description using ArchivesSpace, and we’ve developed an application called DAEV (“Digital Assets of Enduring Value”) that, among other things, guides processors through a session and interacts with ArchivesSpace to record certain descriptive metadata.
Working with IT
We have worked closely with our Libraries Information Technology department to acquire and maintain hardware and peripherals, just as we have worked closely with our Digital Library Initiatives department on the development and maintenance of DAEV. For purchasing, we submit larger requests, with justifications, to IT annually, and smaller requests as needs arise, e.g., our ZIP drive broke and we need a new one. Our computer is on the refresh cycle, meaning once it reaches a certain age, it will be replaced with a comparable computer. Especially with peripherals, we provide exact technical specifications and anticipated costs, e.g., iomega 250MB ZIP drive, and IT determines the purchasing process.
I think it’s easy to assume that, because people in IT are among our most knowledgeable colleagues about computing technology, they understand what it is we’re trying to do and what it is we’ll need to do it. I think that, while they are capable of understanding our needs, their specializations lie elsewhere, and it’s a bad assumption that can result in a less than ideal computing situation. My experience is that my coworkers in IT are eager to understand our problems and to help us solve them, but that they really don’t know what our problems are.
The counter assumption is that we ourselves are supposed to know everything about computing. That’s probably more counterproductive than assuming IT knows everything, because 1) we feel bad when we don’t know everything and 2) in trying to hide what we don’t know, we end up not getting what we need. I think the ideal situation is for us to know what processes we need to run (and why), and to share those with IT, who should be able to say what sort of processor and how much RAM is needed. If your institution has a division of labor, i.e., specializations, take advantage of it.
So, rather than saying, “we need a computer to support digital archiving,” or “I need a computer with exactly these specs,” we’ll be better off requesting a consultation and explaining what sort of work we need a computer to support. Of course, the first computer we requested for a born-digital workstation, which was intended to support a large initiative, came at a late hour and was in the form of “We need a computer to support digital archiving,” with the additional assumption of “I thought you knew this was happening.” We got a pretty decent Windows 7 computer that worked well enough.
I also recognize that I may be describing a situation that does not exist in many other institutions. In those cases, perhaps that’s something to be worked toward, through personal and inter-departmental relationship building. At any rate, I recognize and am grateful for the support my institution has extended to my work.
Challenges and opportunities
I’ve got two challenges coming up. Campus IT has required that all Macs be upgraded to macOS Mojave to “meet device security requirements.” From a security perspective, I’m all on board with this. However, in our testing the Kryoflux is not compatible with Mojave. This appears to be related to a security measure Mojave has in place for controlling USB communication. After several conversations with Libraries IT, they’ve recommended assigning us a Windows 10 computer for use with the Kryoflux. Beyond simply having two computers, I see obvious benefits to this. One is that I’ll be able to install the Linux subsystem on Windows 10 and explore whether going full-out Linux might be an option for us. Another is that I’ll have ready access to FTK Imager again, which comes in handy from time to time.
The other challenge we have is working with our optical drives. We have consumer grade drives, and they work inconsistently. While Drive 1 may read Disc X but not Disc Y, Drive 2 will do the reverse. At the 2019 BitCurator Users Forum, Kam Woods discussed higher grade optical drives in the “There Are No Dumb Questions” session. (By the way, everyone should consider attending the Forum. It’s a great meeting that’s heavily focused on practice, and it gets better each year. This year, the Forum will be hosted by Arizona State University, October 12-13. The call for proposals will be coming out in early March.)
In the coming months we’ll be making some significant changes to our workflow, which will include tweaking a few things, reordering some steps, introducing new tools, e.g., walk_to_dfxml and Bulk Reviewer, and, I hope, introducing more automation into the process. We’re also due for a computer refresh, and, while we’re sticking with Macs for the time being, we’ll again work with our IT to review computer specifications.
Brian Dietz is the Digital Program Librarian for Special Collections at NC State University Libraries, where he manages born-digital processing, web archiving, and digitization.