by Walker Sampson
This is the eleventh post in the bloggERS Script It! Series.
One of the strengths of the BitCurator Environment (BCE) is the open-ended potential of the toolset. BCE is a customized version of the popular (granted, insofar as desktop Linux can be popular) Ubuntu distribution, and as such it remains a very configurable working environment. While there is a basic notion of a default workflow found in the documentation (i.e., acquire content, run analyses on it, and increasingly, do something based on those analyses, then export all of it to another spot), the range of tools and prepackaged scripts in BCE can be used in whatever order fits the needs of the user. But even aside from this configurability, there is the further option of using new scripts to achieve different or better outcomes.
What is a script? I’m going to shamelessly pull from my book for a brief overview:
A script is a set of commands that you can write and execute in order to automatically run through a sequence of actions. A script can support a number of variations and branching paths, thereby supporting a considerable amount of logic inside it – or it can be quite straightforward, depending upon your needs. A script creates this chain of actions by using the commands and syntax of a command line shell, or by using the commands and functions of a programming language, such as Python, Perl or PHP.
In short, scripts allow the user to string multiple actions together in some defined way. This can open the door to batch operations – repeating the same job automatically for a whole set of items – that speed up processing. Alternatively, a user may notice that they are repeating a chain of actions for a single item in a very routine way. Again, a script may fill in here, grouping together all those actions into a single script that the user need only initiate once. Scripting can also bridge the gap between two programs, or adjust the output of one process to make it fit into the input of another. If you’re interested in scripting, there are basically two (non-exclusive) routes to take: shell scripting or scripting with a programming language.
- For an intro on both writing and running bash shell scripts, one of if not the most popular Unix shell – and the one BitCurator defaults with check out this tutorial by Tania Rascia.
- There are many programming languages that can be used in scripts; Python is a very common one. Learning how to script with Python is tantamount to simply learning Python, so it’s probably best to set upon that path first. Resources abound for this endeavor, and the book Automate the Boring Stuff with Python is free under a Creative Commons license.
The BitCurator Scripts Library
The BitCurator Scripts Library is a spot we designed to help connect users with scripts for the environment. Most scripts are already available online somewhere (typically GitHub), but a single page that inventories these resources can further their use. A brief look at a few of the scripts available will give a better idea of the utility of the library.
- If you’ve ever had the experience of repeatedly trying every listed disk format in the KryoFlux GUI (which numbers well over a dozen) in an attempt to resolve stream files into a legible disk image, the DiskFormatID program can automate that process.
fiwalk, which is used to identify files and their metadata, doesn’t operate on Hierarchical File System (HFS) disk images. This prevents the generation of a DFXML file for HFS disks as well. Given the utility and the volume of metadata located in that single document, along with the fact that other disk images receive the DFXML treatment, this stands out as a frustrating process gap. Dianne Dietrich has fortunately whipped up a script to generate just such a DFXML for all your HFS images!
- The shell scripts available at rl-bitcurator-scripts are a great example of running the same command over multiple files:
fiwalkover a directory, respectively. Conversely,
simgen_prod.shis an example of shell script grouping multiple commands together and running that group over a set of items.
For every script listed, we provide a link (where applicable) to any related resources, such as a paper that explains the thinking behind a script, a webinar or slides where it is discussed, or simply a blog post that introduces the code. Presently, the list includes both bash shell scripts along with Python and Perl scripts.
Scripts have a higher bar to use than programs with a graphic frontend, and some familiarity or comfort with the command line is required. The upside is that scripts can be amazingly versatile and customizable, filling in gaps in process, corralling disparate data into a single presentable sheet, or saving time by automating commands. Along with these strengths, viewing scripts often sparks an idea for one you may actually benefit from or want to write yourself.
Following from this, if you have a script you would like added to the library, please contact us (select ‘Website Feedback’) or simply post in our Google Group. Bear one aspect in mind however: scripts do not need to be perfect. Scripts are meant to be used and adjusted over time, so if you are considering a script to include, please know that it doesn’t need to accommodate every user or situation. If you have a quick and dirty script that completes the task, it will likely be beneficial to someone else, even if, or especially if, they need to adjust it for their work environment.
Walker Sampson is the Digital Archivist at the University of Colorado Boulder Libraries, where he is responsible for the acquisition, accessioning and description of born digital materials, along with the continued preservation and stewardship of all digital materials in the Libraries. He is the coauthor of The No-nonsense Guide to Born-digital Content.