I recently participated in the Fall Symposium of the Midwest Archives Conference, Hard Skills for Managing Digital Collections in Archives. The presenters covered a number of excellent tools, including ExifTool, Bagit and MediaInfo, as well as new tricks for analyzing and processing data using that old frenemy, Excel. These tools are tremendously helpful to the work of archivists and are getting better all the time. Here at the Carleton College Archives, each one has greatly increased the speed with which we can process electronic accessions.
But if we are going to keep up with the tremendous flood of new electronic records into our archives, our processing programs need to run even faster. Three commonly used programs—ExifTool, Bagit and DROID—all process collection materials one at a time, and each program also requires additional steps and selection of options to generate the desired output. The convenience of these solutions is often overshadowed by their resource intensiveness. We need the ability to instruct all these programs to process a whole series of records, producing all the various outputs we want, with one or two steps as opposed to a dozen.
To solve this problem at Carleton, we have created a set of batch processors for the programs we regularly use in our archive. For each program in our toolbox, we wrote a Python script that applies the program not just to a single accession but to an entire directory of accessions. We also gave each batch processor instructions to generate all the reports from its program that we might need to manage our digital files.
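The general pattern behind such a batch processor can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in our repositories: the function name `batch_process`, the folder layout (one subfolder per accession) and the choice to save each tool's printed output as one report file per accession are all assumptions made for the example.

```python
import subprocess
import sys
from pathlib import Path


def batch_process(accessions_dir, command, report_dir, suffix=".txt"):
    """Run `command` once per accession folder and save its output as a report.

    Hypothetical helper: assumes each immediate subfolder of `accessions_dir`
    is one accession and that the tool prints its report to standard output.
    """
    accessions_dir = Path(accessions_dir)
    report_dir = Path(report_dir)
    report_dir.mkdir(parents=True, exist_ok=True)

    reports = []
    # Loose files at the top level are ignored; only folders are accessions.
    for accession in sorted(p for p in accessions_dir.iterdir() if p.is_dir()):
        # The accession path is appended as the last argument of the command.
        result = subprocess.run(
            command + [str(accession)], capture_output=True, text=True
        )
        if result.returncode != 0:
            # Note the failure and move on so one bad accession
            # does not stop the whole batch.
            print(f"skipped {accession.name}: {result.stderr.strip()}",
                  file=sys.stderr)
            continue
        report = report_dir / f"{accession.name}{suffix}"
        report.write_text(result.stdout)
        reports.append(report)
    return reports
```

With ExifTool installed and on the system path, a call such as `batch_process("accessions", ["exiftool", "-csv", "-r"], "reports", ".csv")` would, under these assumptions, write one CSV metadata report per accession folder. The same loop could wrap other command line tools by swapping in a different command list.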
These improvements have made a big difference in our ability to process numerous digital accessions quickly and consistently. For anyone who would like to try them, all the batch processors are run from the command line (the Command Prompt on a PC or the Terminal on a Mac) and can be downloaded from our GitHub repositories:
Photo credit: Harpo Shining Shoes animated GIF. From “A Night in Casablanca” via http://wfmu.org/playlists/shows/51730.