by Matt McShane
I seem to have a real penchant for completing my schoolings in the middle of “once-in-a-lifetime” economic crises: first the 2008 housing recession, and now a global pandemic. And while the current situation has somewhat altered the final semester of my MLIS program—not to mention many others’ situations much more intensely—I was still able to have a very engaging and rewarding practicum experience at The Ohio State University Libraries, working on an incredible digital collection accessioned by the Billy Ireland Cartoon Library and Museum.
U Can with Beakman and Jax might be best known to a lot of people as the predecessor of the short-lived Saturday morning live action science show Beakman’s World, but the comic is arguably more successful than the television show it produced. With an international readership and a run that lasted more than 25 years, it was a success story that entertained and educated readers over generations. It was also the first syndicated newspaper comic to be entirely digitally drawn and distributed. Jok Church, the creator and author, used Adobe Illustrator throughout the run of the comic, and saved and migrated the files in various stages of creation through multiple hard drives. With the exception of a few gaps, the entire run was saved on Jok’s hard drive at the time of his death in April 2016. These are the files we received at University Libraries.
Richard Bolingbroke, a friend of Jok’s and executor of his estate, donated the collection to the Billy Ireland. He also provided us with an in-progress biography of Jok, which gave insight into who he was as a person beyond his work with Beakman and Jax, as well as a condensed history of the publication. This will be useful as the Billy Ireland creates author metadata and information for the collection.
Richard provided us access to direct copies of twenty-four folders via Dropbox, containing nearly 10,000 files, which we downloaded to our local processing server. Each folder contained a year’s worth of comics, from 1993 to 2016, though the years 1995 and 1996 were empty due to a hard drive failure Jok had experienced. We’re still in the process of possibly hunting down any existing backups from these years. In the existing folders, though, we found not only the many years’ worth of terrific comic content, but also a glimpse into Jok’s creative and organizational process. An initial DROID scan of the contents found over 2,000 duplicate files scattered throughout. After speaking with Richard about this, we determined it to be a mistaken copy/paste issue. Rather than manipulate the existing archival collection, we decided to create a distribution collection better organized for user access to the works, with the intention of maintaining archival integrity of the donated collection.
Before either of those goals could be reached, though, our second primary issue was that of file extensions. We found nearly 1,300 files without extensions in the collection, which we determined to be due to older Mac OS’s use of files without extensions appended. Adobe Illustrator produces both .ai and .eps file types. There are other file types among the collection, but these are the primary types for each work. It was impossible to determine which files were .ai versus .eps at a batch level, so the EXIF metadata of all files without extensions were manually examined to determine their proper extension. Using Bulk Rename Utility, we were able to semi-batch the extension appending, but it still required a fair amount of manual labor due to the intermingled nature of the different file types within subfolders.
Even though create dates within EXIF metadata were unreliable because of different versions of Illustrator being used to access files throughout the years, Jok named and organized his files by publication date, which gave us reliable organization metadata for our distribution file. His file and folder organization did shift throughout the years—understandable over two decades and who knows how many machines. This required a fair bit of manual labor in creating and organizing the distribution collection in a standardized file name and subfolder format. The comic was published weekly, albeit with some breaks. Typically there are four different versions: portrait versus landscape and black and white versus color of the finished product. A year\month\date folder tree was created based on how the largest portion of Jok’s files were organized. Once that was completed, we shifted focus to Ohio State’s Accessibility standards, and investigated a batch workflow to convert the comic files to PDF/A. Unfortunately, we could not achieve PDF/A compliance due to the nature of the original files; additionally, the “batch” processing includes a significant human interaction.
Further complicating matters, while we were discovering this, the COVID-19 global pandemic hit Ohio. In response, Ohio State declared all non-essential personnel to move to tele-work, which cut off my access to the server behind the University’s firewall for the remainder of my internship. As a result, we had to put the completion of this project on indefinite hold. Despite these extreme circumstances preventing me from seeing the collection all the way through to public hands, I was able to leave it in an organized state, ready for file conversion and metadata creation.
I learned a lot by being able to handle the collection from the beginning, untouched. One of the biggest takeaways was the importance of gathering information about the collection and its creator up front. Creating a manifest of the objects within the collection is a clear necessity to knowing how the collection should be preserved, and how it should be accessed, but also allowed us to see gaps in the collection, such as the significant number of duplicates and files without extensions. Having this knowledge up front allowed us to better plan our approach to the collection. I have actually suggested increasing students’ exposure to “messy” digital objects collections to my program’s faculty based on my experience with this project.
The other key takeaway I discovered was that sometimes it might be best to dirty your hands, and perform tasks manually. Digital preservation can have a lot of automated shortcuts compared to processing its traditional analog cousins, but not everything can or should be done through batch processes. While it may be technically possible to program a process, it may not really be the best use of time or effort. Part of workflow development is recognizing when the creation of an automated solution outweighs the time and effort to manually perform the task. It may have been possible to code a script to identify and append the file extensions for our objects missing them, but the effort and time to learn, write, and troubleshoot that likely would have be greater than the somewhat tedious work of doing it by hand in this instance. Alternatively, it might be worth looking into automated scripting if this were a significantly larger collection of mislabeled or disorganized objects. Having a good understanding of cost and benefit is important when approaching a problem that can have multiple solutions.
My time on-site with The Ohio State University Libraries was a bit shorter than I had intended, but it still provided me with a great experience and helped to solidify my love for the digital preservation process and work. The fact that U Can with Beakman and Jax is the first digitally created syndicated newspaper comic makes the whole experience that much more apt and impactful. Even though some aspects of work are in limbo at the moment, I am confident that this terrific collection of Jok’s work will be available for the public to enjoy and learn from. Even if I am not able to fully carry the work over the finish line, I am thankful for the opportunity to work on it as much as I did.
Matt McShane, a recent MLIS graduate from Kent State University, is currently focused on landing a role with a cultural heritage institution where he can work hands-on with digital collections, digital preservation, and influence broader preservation policy.