Let the Bits Describe Themselves

By Brian Dietz

This post is the third in a bloggERS series about access to born-digital materials.

Over the last three years, the Special Collections Research Center (SCRC) at NCSU Libraries has undertaken an initiative to enhance our capacity for managing born-digital archival materials. For two years, the initiative was staffed half-time by a Libraries’ Fellow and guided by an advisory group representing the department’s core functional areas, with input from colleagues in our Digital Library Initiatives (DLI) department. The initiative has focused on a set of minimally viable tools and workflows for ingest, processing, and access. For ingest, we’re using a combination of tools, like FTK Imager, BitCurator, and FITS; for access, in most cases, we make materials available on a laptop in the SCRC’s reading room.

For processing, we decided that we did not want to dedicate time and money to arranging folders and files below the level of the object, be it a floppy, optical media, hard drive, or set of files. In ArchivesSpace, we create an archival object record in the series to which the files on the media belong; the archival object is given a title as descriptive as possible, based, in part, on information found on the object itself. If no appropriate series exists, we create one. However, if the media contains content that fits into more than one series, we create a new series, “Electronic Media,” in which the record for the media object will go.

The decision not to rearrange files has numerous advantages, including saving staff time, maintaining data authenticity, and allowing users to see the environment in which the creator worked. Most compelling is the fact that we have access to all sorts of metadata about files and their computing environment that we can leverage to make materials discoverable by researchers and to provide them with the resources necessary to do their own arrangement.

For instance, from most media, and for the bulk of files we’re currently dealing with, we can easily grab:

  •    File name
  •    File path
  •    File type
  •    Document size
  •    Dates
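As a rough illustration of how little is needed to capture this basic layer, here is a minimal Python sketch that walks a directory of extracted files and records the fields listed above. It is not the SCRC's ingest workflow, which relies on tools like FTK Imager, BitCurator, and FITS; the function name `collect_file_metadata` and the output field names are hypothetical, and the file type is only guessed from the extension.

```python
import mimetypes
import os
from datetime import datetime, timezone


def collect_file_metadata(root):
    """Walk `root` and return one dict per file with basic filesystem metadata.

    Hypothetical helper for illustration; field names are assumptions.
    """
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            info = os.stat(full_path)
            # Guess the type from the extension only. A format identification
            # tool would inspect the file's contents instead.
            guessed_type, _encoding = mimetypes.guess_type(name)
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            rows.append({
                "file_name": name,
                # Path relative to the media object, not the local machine.
                "file_path": os.path.relpath(full_path, root),
                "file_type": guessed_type or "unknown",
                "size_bytes": info.st_size,
                "modified": modified.isoformat(),
            })
    return rows
```

Calling `collect_file_metadata` on a mounted disk image or an extracted folder would return a list of records, one per file, that can be passed along to later steps.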

If we’re lucky, we might get from embedded metadata:

  •    Creator
  •    Title

And with a little extra work, information about the computing environment can be gathered, like:

  •    Software used to create the files
  •    Operating system information
  •    Word lists

So, what can be done with this metadata? One thing is to combine it into a CSV file that is available for download via our finding aids. There is real potential benefit to the researcher in offering her the ability to do her own arrangement, by sorting on file path, date, document type, or any other criteria she finds that give sense and order to the materials. With staff in DLI, we are developing this feature in our finding aids.
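To make the idea of researcher-driven arrangement concrete, here is a small Python sketch that writes a file inventory to CSV and then re-sorts it on a chosen column. The column names, the sample rows, and the functions `write_inventory` and `sort_inventory` are all hypothetical; the actual layout of the finding-aid CSV is not described in this post.

```python
import csv
import io

# Hypothetical column layout for the downloadable inventory.
FIELDS = ["file_path", "file_type", "size_bytes", "modified"]

# Made-up sample rows, for illustration only.
SAMPLE_ROWS = [
    {"file_path": "My Documents/letters/draft.doc",
     "file_type": "application/msword",
     "size_bytes": 24576, "modified": "1998-03-02T14:05:00"},
    {"file_path": "My Documents/budget.xls",
     "file_type": "application/vnd.ms-excel",
     "size_bytes": 51200, "modified": "1997-11-20T09:30:00"},
    {"file_path": "My Documents/letters/final.doc",
     "file_type": "application/msword",
     "size_bytes": 26624, "modified": "1998-03-09T16:45:00"},
]


def write_inventory(rows, handle):
    """Write the metadata rows to `handle` as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def sort_inventory(handle, key):
    """Read a CSV inventory and return its rows sorted on one text column.

    Values come back from the CSV as strings, so this suits paths, types,
    and ISO 8601 dates; a numeric column such as size_bytes would need
    converting with int() before sorting.
    """
    reader = csv.DictReader(handle)
    return sorted(reader, key=lambda row: row[key])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_inventory(SAMPLE_ROWS, buffer)
    buffer.seek(0)
    for row in sort_inventory(buffer, "modified"):
        print(row["modified"], row["file_path"])
```

In practice a researcher would more likely open the CSV in a spreadsheet and sort there; the point is only that a flat inventory supports many arrangements without anyone moving a file.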

Another tool we’re developing to help researchers explore the contents of electronic media is a virtual filesystem browser.  Using the file and system metadata gathered during ingest, it recreates a file browsing environment—like Explorer (Windows), Finder (Mac), or Nautilus (Ubuntu)—in a web browser, allowing a researcher, from within the context of a finding aid, to navigate virtually the contents of a media object or a set of folders and files. At the file level, there is additional file metadata available for the researcher to consider.

Directories and files in “My Documents,” viewed through the filesystem browser.
File representation, with metadata, viewed through the filesystem browser.
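The core idea behind a virtual filesystem browser, rebuilding a navigable hierarchy from nothing more than recorded file paths, can be sketched in a few lines of Python. This is not the SCRC application, whose design is not described here; `build_tree` and `list_directory` are hypothetical names, and the sketch assumes slash-separated paths that all point to files.

```python
def build_tree(paths):
    """Nest slash-separated file paths into dicts; files map to None.

    Assumes every path names a file (not a bare directory) and that no
    name is used as both a file and a directory.
    """
    tree = {}
    for path in paths:
        parts = [part for part in path.split("/") if part]
        if not parts:
            continue  # skip empty paths
        node = tree
        for directory in parts[:-1]:
            node = node.setdefault(directory, {})
        node[parts[-1]] = None
    return tree


def list_directory(tree, path=""):
    """Return the entries at `path`: directories first, then files."""
    node = tree
    for part in [p for p in path.split("/") if p]:
        node = node[part]
    directories = sorted(
        name + "/" for name, child in node.items() if child is not None
    )
    files = sorted(name for name, child in node.items() if child is None)
    return directories + files


if __name__ == "__main__":
    # Made-up paths, for illustration only.
    recorded_paths = [
        "My Documents/letters/draft.doc",
        "My Documents/letters/final.doc",
        "My Documents/budget.xls",
    ]
    tree = build_tree(recorded_paths)
    print(list_directory(tree, "My Documents"))
```

Because the tree is derived entirely from metadata, a browser built this way can show what is on a disk without touching, or exposing, the files themselves.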

We currently do not plan to provide researchers with access to the files themselves through the filesystem browser, although we are exploring tiers of access, from restricted to unmediated web access, in which the browser will likely have a role. Still, it will allow researchers to get a sense of what kinds of content may be on a disk, which may inform their decisions regarding requests for access, without the expense of symbolically arranging folders or files and tracking that work.

Leveraging this metadata for description and resource discovery initially seemed like a minimally viable product, but, as we go along, I think we’ll find that it’s much better than that.


Linda Sellars and Trevor Thornton provided insightful suggestions and edits for this post.

Brian Dietz is the Digital Program Librarian for Special Collections at NCSU Libraries, where he manages digitization, born-digital processing, and web and social media archiving.


4 thoughts on “Let the Bits Describe Themselves”

  1. Abby Adams January 11, 2016 / 4:24 pm

    Is the virtual filesystem browser you refer to a homegrown system? If so, is the code on Git yet?


  2. Brian Dietz January 20, 2016 / 7:58 pm

    It is homegrown. We’re about to redesign the application to make it more flexible and scalable. The code will be openly available on GitHub once that’s done.

