Computer-Generated Archival Description


In our archive at Carleton College, we have implemented a number of automated and semi-automated tools to help process our digital records. We use several batch processes to create access copies, generate checksums, validate file formats, and extract tagged metadata, and we are working on a data accessioner that can automate many of the repetitive steps we perform on our Submission Information Packages (SIPs). While these improvements have been tremendously helpful in processing collections quickly, one part of our workflow is consistently backed up: the creation of descriptive metadata. Settling for minimal descriptive metadata has shortened our processing time for electronic records, but I can already see that this will not be enough for much longer.

Given the accelerating growth of digital accessions in our repositories, how sustainable will human-created descriptive metadata be over the next few years? Perhaps we should turn to automated, computer-based methods for creating descriptions, just as we have for other processing steps. We already rely on optical character recognition (OCR) to improve access to scanned print documents, but other methods hold great promise. Speech-to-text software, while not fully baked yet, is being used by some digitization vendors to create transcriptions of video and audio files. Facial recognition could be a powerful tool for identifying people in photographs, and I could see the same methods being applied to recognizing buildings. Geospatial data based on known reference points, such as an address, can make images of locations more searchable and usable in dynamically generated maps. Automated text analysis could even be used to suggest subject categories.
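To give a rough sense of what machine-suggested subject terms might look like at their simplest, here is a minimal Python sketch that counts word frequencies in a document's text and returns the most common words as candidate terms. This is an illustrative assumption, not a method we use at Carleton: the function name `suggest_terms` and the tiny stopword list are hypothetical, and real subject analysis would need far more than word counts.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real workflow would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "on", "at", "is", "was", "with", "by"}


def suggest_terms(text: str, n: int = 5) -> list[str]:
    """Return the n most frequent non-stopword words in text as candidate subject terms.

    Hypothetical helper for illustration only: it lowercases the text, keeps
    alphabetic words longer than two letters, drops stopwords, and ranks by count.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(n)]


if __name__ == "__main__":
    sample = ("The college library opened a new reading room. "
              "Library staff moved the archives collection to the reading room.")
    print(suggest_terms(sample, 3))
```

Frequent words are not subject headings; mapping candidates like these onto a controlled vocabulary would still call for human review or further tooling, and the errors such a naive approach produces are exactly the trade-off discussed below.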

These methods would, of course, change how we work as professionals and how users access our records. Our descriptive metadata would be much more extensive, but it would probably contain far more errors than we are currently willing to accept. To use this data, researchers might turn away from the traditional finding aid, detailed biographical descriptions, and human-assigned subject headings in favor of term searching, ranked results, and faceted displays. These new tools and changes may be unsettling, but given our mounting backlogs of electronic records, we may have no choice but to embrace them.

Do you have any experience with this kind of cataloging? Does the idea of trusting a machine to do this work cause you to feel dizziness or shortness of breath? Please let us know in the comments below.

Nat Wilson is a Digital Archivist at Carleton College.