What’s Your Set-up?: Curation on a Shoestring

by Rachel MacGregor

At the Modern Records Centre, University of Warwick, in the United Kingdom, we have been making steady progress in our digital preservation work. Jessica Venlet from UNC Chapel Hill wrote recently about being in the lucky position of finding “an excellent stock of hardware and two processors” when she started in 2016. We’re a little further behind than this: when I began in 2017 I had a lot less!

What we want is FRED. Who’s he? He’s your Forensic Recovery of Evidence Device (forensic workstation), but costing several thousand dollars, it’s beyond the reach of many of us.  

What I had in 2017: 

  • A Tableau T8-R2 write blocker. Write blockers are very important when working with rewritable media (USB drives, hard drives, etc.) because they prevent accidental alteration of material by blocking overwriting or deletion.
  • A (fingers crossed) working 3.5 inch external floppy disk drive.
  • A lot of enthusiasm.

What I didn’t have: 

  • A budget.

[Image: Dell monitor and computer, keyboard, mouse, and write blocker on a desk in an office, with BitCurator software open on the screen.]
My digital curation workstation – not fancy but it works for me. Photo taken by MacGregor, under CC-BY license.

Whilst doing background research for tackling our born-digital collections, I got interested in the BitCurator toolkit, which is designed to help with the forensic recovery of digital materials. It interested me particularly because:

  • It’s free.
  • It’s open source.
  • It’s created and managed by archivists for archivists.
  • There’s a great user community.
  • There are loads of training materials online and an online discussion group.

I found this excellent blog post by Porter Olsen to help get started. He suggests starting with a standard workstation with a relatively high specification (e.g. 8 GB of RAM). So, I asked our IT folk for one, which they had in stock (yay!).  I specified a Windows operating system and installed a virtual machine, which runs a Linux operating system on which to run BitCurator. 

I’m still exploring BitCurator; it’s a powerful suite of tools with lots of features. However, when trialing it on the papers of the eminent historian Eric Hobsbawm, I found that it was a bit like using a hammer to crack a nut. Whilst it was possible to produce all sorts of sophisticated reports identifying email addresses etc., this isn’t much use on drafts of published articles from the late 1990s and early 2000s. I turned to FTK Imager, which is proprietary but free software. It is widely used in the preservation community, but not designed by, with, or for archivists (as BitCurator is). I guess its popularity derives from the fact that it’s easy to use and will allow you to image (i.e. take a complete copy of the whole media, including deleted and empty space), or just extract the files, without too much time spent learning to use it. There are standard options for disk image output (e.g. a raw byte-for-byte image, or the E01 Expert Witness, SMART, and AFF formats). However, I would like to spend some more time getting to know BitCurator and becoming part of its community. There is always room for new and different tools and I suspect the best approaches are those which embrace diversity.
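To make the idea of a raw byte-for-byte image a little more concrete, here is a minimal Python sketch. It is not how FTK Imager or BitCurator work internally: it simply copies every byte from a source to an image file and records a SHA-256 checksum along the way. The function names and the example paths are my own assumptions for illustration, and it does no bad-sector handling, so treat it as a teaching aid rather than a replacement for a proper imaging tool.

```python
import hashlib


def image_raw(source: str, dest: str, chunk_size: int = 1024 * 1024) -> str:
    """Copy every byte of `source` into `dest`; return the SHA-256 of the bytes read.

    `source` could be an ordinary file or (on Linux, with suitable permissions
    and a write blocker attached) a device path. Both names are hypothetical.
    """
    digest = hashlib.sha256()
    with open(source, "rb") as src, open(dest, "wb") as out:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # an empty read means end of source
                break
            out.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


def sha256_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Re-read a finished image and compute its SHA-256 for verification."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(dest: str, expected_checksum: str) -> bool:
    """True if the image on disk still matches the checksum taken while copying."""
    return sha256_of(dest) == expected_checksum
```

Calling something like `image_raw("/dev/sdb", "disk001.img")` (a hypothetical device and filename) and then `verify_image("disk001.img", checksum)` gives a copy plus a fixity value you can keep with the collection's metadata.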

Another tool that looks useful for disk imaging is one called diskimgr, created by Johan van der Knijff of the Nationale Bibliotheek van Nederland. It will only run on a Linux operating system (not on a virtual machine), so now I am wondering about getting a separate Linux workstation. BitCurator also works more effectively in a native Linux environment as opposed to a virtual machine: it does stall sometimes with larger collections. I wonder if I should have opted for a Linux machine to start with... it’s certainly something to consider when creating a specification for a digital curation workstation.

Once the content is extracted, we need further tools to help us manage and process it. BitCurator does a lot, but there may be extra things that you need depending on your intended workflow. I never go anywhere without DROID software. DROID is useful for loads of stuff like file format identification, creating checksums, deduplication, and lots more. My standard workflow is to create a DROID profile and then use this as part of the appraisal process further down the line. What I don’t yet have is some sort of file viewer; Quick View Plus is the one I have in mind (it’s not free and, as I think I mentioned, my resources are limited!). I would also like to get LibreOffice installed as it deals quite well with old word-processed documents.
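For anyone curious about what checksum-based deduplication looks like underneath, here is a rough Python sketch. It is not DROID's implementation and does no file format identification. It only illustrates the principle that two files with the same checksum have the same content. The function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Checksum a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: str) -> dict:
    """Group files under `root` by checksum; keep only groups with more than one file."""
    by_checksum = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_checksum[file_sha256(path)].append(path)
    return {
        checksum: paths
        for checksum, paths in by_checksum.items()
        if len(paths) > 1
    }
```

Running `find_duplicates` on a folder of extracted files returns each checksum that occurs more than once together with the paths that share it, which is a useful starting list for appraisal.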

I guess I’ll keep adding to it as I go along. I now need to work out the most efficient ways of using the tools I have and capturing the relevant metadata that is produced. I would encourage everyone to take some time experimenting with some of the tools out there and I’d love to hear about how people get on.

Rachel MacGregor is Digital Preservation Officer at the Modern Records Centre, University of Warwick, United Kingdom. Rachel is responsible for developing and implementing digital preservation processes at the Modern Records Centre, including developing cataloguing guidelines, policies and workflows. She is particularly interested in workforce digital skills development.

What’s Your Set-up?: Managing Electronic Records at University of Alaska Anchorage

by Veronica Denison

For years, archivists at the University of Alaska Anchorage Archives and Special Collections have known that we would have to grapple with how to store our electronic records. We were increasingly receiving donations that contained born-digital items, and I also knew I wanted to apply for grants to digitize audio, video, and film in our holdings. The grants would not provide funding for our storage system, which meant we had to come up with alternative means to pay for it.

Prior to 2018, anything we digitized was saved on what we called “scribe drives,” which were shared network drives mapped to the computers in the archives. These drives could store about 4TB of data. In 2017, we digitized 10 ¼-inch audio reels as .mp3 and .wav files, which gave us both access and master copies of those recordings. At the time, we had enough storage space for everything, but in 2018, we received a request to digitize two 16mm film reels from the Dorothy and Grenold Collins papers. Unfortunately, we could not save both the access and master copies to the drive since the files were too large (about 350GB per film for the master copy).

Around the same time, we also received the Anne Nevaldine papers. The collection contained 4 boxes of 35mm slides as well as multiple CDs that in total contained 64,932 files within 1,446 folders, totaling 322GB. I had a volunteer run each of the CDs through the Archives’ segregation machine to check for viruses and then transfer the digital files onto an external hard drive. We thought we had time to figure out a more permanent solution than the external hard drive, but two weeks after I made the finding aid available online, a researcher came in wanting to look at the digital photographs in the collection. This created an issue, as my only option was to give her the external hard drive to look at the images. While she was in the Archives Research Room, I watched her closely to make sure nothing was deleted or moved.

We decided that we needed a system where we could save and access all of our digital content, while also having it backed up, and have the option to make read-only copies available to researchers in the Research Room. We knew we would probably end up having at least 5 TB of data right away if we factored in our current digital items and the possibility of future ones. We initially approached the University’s IT Department to learn about our options. Unfortunately, we were quoted a very high cost (over $10,000 a year for 20TB), so we approached the Library’s IT Department for suggestions. After some discussion about what would be appropriate for our needs, Brad, the Library’s PC and Network Administrator, presented us with some options.

We ultimately decided on a Synology DiskStation DS1817+, which cost $848, with WD Gold 10TB hard drives. We settled on 8 drives for a total of 80TB (to provide growth space), which were $375 each for a cost of $3000. Then we needed a system to hook it to. For that we used a Windows 10 Desktop, which cost $1065. The total cost for the hardware was $4913. However, we also needed a cloud service provider to back up the files. We decided to go with Backblaze, which costs $5 per TB per month. This whole system is a network-attached storage system, which means it is a file-level computer data storage server connected to a computer network. We took to calling it “the NAS” for short. Thankfully, when we presented the need for electronic storage to the Library’s Dean, he was willing to provide us with the funding needed to purchase the items.
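The arithmetic behind those figures can be laid out in a few lines of Python. All prices come from the paragraph above. The 5TB and 20TB backup scenarios are illustrative assumptions (5TB is the initial estimate mentioned earlier, and 20TB matches the size of the University IT quote), and the 80TB figure is raw capacity, since the usable space depends on a RAID configuration that is not described here.

```python
# Prices as stated in the text (US dollars)
NAS_UNIT = 848            # Synology DiskStation DS1817+
DRIVE_PRICE = 375         # one WD Gold 10TB drive
DRIVE_COUNT = 8
DRIVE_SIZE_TB = 10
DESKTOP = 1065            # Windows 10 desktop
BACKBLAZE_PER_TB_MONTH = 5

drive_cost = DRIVE_PRICE * DRIVE_COUNT
hardware_total = NAS_UNIT + drive_cost + DESKTOP
raw_capacity_tb = DRIVE_COUNT * DRIVE_SIZE_TB  # raw, before any RAID overhead


def yearly_backup_cost(stored_tb: float) -> float:
    """Yearly cloud backup cost for a given amount of stored data."""
    return stored_tb * BACKBLAZE_PER_TB_MONTH * 12


print(f"Drives: ${drive_cost}")                        # Drives: $3000
print(f"Hardware total: ${hardware_total}")            # Hardware total: $4913
print(f"Raw capacity: {raw_capacity_tb}TB")            # Raw capacity: 80TB
# Assumed scenarios, not figures from the Archives' actual bills
print(f"Backup at 5TB: ${yearly_backup_cost(5)}/yr")   # Backup at 5TB: $300/yr
print(f"Backup at 20TB: ${yearly_backup_cost(20)}/yr") # Backup at 20TB: $1200/yr
```

Even at 20TB backed up, the yearly cloud cost in this sketch stays well under the quote of over $10,000 a year, with the hardware as a one-off purchase on top.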

Once everything was installed, we had to transfer the files and develop a new system for arranging and saving them. We decided on having three separate drives, two of which would be on the NAS (Master and Access), and one a separate network drive (Reference_Access). The Master drive is the only one that is backed up to Backblaze. The Master drive acts as a dark archive, meaning once items are saved to it, they will not be accessed. Therefore, we created the Access drive where archivists can retrieve the digital contents for reference and use purposes. The Access drive is essentially a copy of the Master drive. There is also a Reference_Access drive, which is mapped separately to each computer within the Archives, and not on the NAS. Reference_Access is the drive researchers use to access digital content in the Research Room and contains the access copies and low resolution .jpgs of photographs that may be high resolution in the Master and Access drives.
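Because the Access drive is meant to mirror the Master drive, one way to check that the two stay in step is to compare checksums across both folder trees. The Python sketch below is only an illustration of that idea, not a description of how the Archives actually verifies its drives. The function names and the report keys are hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Checksum a file in chunks (master film files can be hundreds of GB)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_checksums(root: str) -> dict:
    """Map each file's path relative to `root` to its checksum."""
    base = Path(root)
    return {
        path.relative_to(base).as_posix(): file_sha256(path)
        for path in base.rglob("*")
        if path.is_file()
    }


def compare_trees(master_root: str, access_root: str) -> dict:
    """Report files missing from, extra in, or different on the access copy."""
    master = tree_checksums(master_root)
    access = tree_checksums(access_root)
    shared = set(master) & set(access)
    return {
        "missing_from_access": sorted(set(master) - set(access)),
        "extra_in_access": sorted(set(access) - set(master)),
        "content_differs": sorted(p for p in shared if master[p] != access[p]),
    }
```

An empty list under all three keys means the two trees hold the same files with the same content. Note that reading the Master files to checksum them counts as access, so a check like this would need to fit within whatever dark-archive rules are in place.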

The next step was mapping the Reference_Access drive to the researcher computer in the Research Room and making it read-only, but only for that computer. After working with the University’s IT Department, Brad was able to make it work. Since establishing this system in Spring 2019, the Reference_Access drive has been used by multiple researchers and it works great! They are able to access digital content of collections as easily as looking through a box on a table. We are grateful for all those who helped the Archives have a great mechanism for saving our electronic records, at a relatively low cost.

Veronica Denison is currently the Assistant University Archivist at Kansas State University where she has been since September 2019. Prior to being hired at K-State, she was an archivist for six years at the University of Alaska Anchorage. She holds an MLIS with a Concentration in Archives Management from Simmons College.