Digital Object Modeling

Submitted by Erin Faulder

The Division of Rare and Manuscript Collections (RMC) at Cornell University Library (CUL) was a leader in early digitization endeavors. However, infrastructure to support coordination between archival description and digital material has not kept pace. In 2019, RMC implemented ArchivesSpace and I turned my attention to developing practice to connect archival description and digital object management.

CUL has distributed systems for displaying and preserving digitized content, and RMC has historically refrained from describing and linking to digitized content within EAD. As a result, I’ve taken this opportunity to thoughtfully engage the array of systems that we use in order to model digital objects in ASpace to best take advantage of future technological developments.

I could find almost no information about how other institutions represent their digital content in ASpace. Perhaps other institutions imported <dao> elements from EAD, or other structured data from legacy systems, into ASpace and have not critically evaluated, documented, and shared their practice. Further, the ASpace documentation itself makes no recommendations about how to represent digital content in the digital object module, and it’s unclear how widely or consistently the community is using this functionality.

Given the distributed systems at CUL that store RMC’s digital content, ASpace is the system of record for archival description and basic descriptive information for digital content. It should be the hub that connects physical material to digital surrogates in both delivery environments and preservation systems. To appropriately evaluate the possible representations, I set several goals for our model. The model must support our ability to:

  • batch-create digital objects in ASpace based on systems and rules. No human data entry of digital objects should be required. 
  • represent both digitized and born-digital content with a clear indication of which is which.
  • bulk update URLs as access systems change. (Preservation systems have permanent identifiers that require less metadata maintenance.)
  • maintain and represent machine-actionable contextual relationships between
    • physical items and digital surrogates;
    • archival collections and digital material that lives in systems that are largely unaware of archival arrangement and description;
    • the preservation object in one system and the delivery object(s) in another system.
  • enable users, curators, and archivists to answer:
    • Is this thing born digital? 
    • Has this thing been digitized and where is the surrogate?
    • Where do I go to find the version (Preservation vs. Delivery) I want?
    • Where is all of the digital material for this collection?
    • How much of a collection has been digitized?
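
As a rough illustration of the bulk-update goal above, the sketch below rewrites the base URL of delivery links across a set of in-memory records. The record shape (`title`, `file_uri`), the function name `rebase_urls`, and the example hostnames are all hypothetical stand-ins; this is not ASpace's data model or API, and an actual update would go through whatever interface the system provides.

```python
from typing import Dict, List


def rebase_urls(records: List[Dict[str, str]], old_base: str, new_base: str) -> int:
    """Swap old_base for new_base at the start of each record's file_uri.

    Records are modified in place. Returns the number of records changed.
    """
    changed = 0
    for record in records:
        uri = record.get("file_uri", "")
        if uri.startswith(old_base):
            # Keep the path after the old base and attach it to the new base.
            record["file_uri"] = new_base + uri[len(old_base):]
            changed += 1
    return changed


# Hypothetical records; the hostnames are placeholders, not real systems.
records = [
    {"title": "Letter, 1901", "file_uri": "https://old-delivery.example.org/items/123"},
    {"title": "Photograph", "file_uri": "https://other.example.org/items/456"},
]
updated = rebase_urls(
    records,
    "https://old-delivery.example.org/",
    "https://new-delivery.example.org/",
)
print(updated, records[0]["file_uri"])
```

Matching on a URL prefix rather than editing records one by one is what keeps this kind of change rule-based, in line with the goal of avoiding human data entry.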

ASpace is not the system of record for technical, administrative (other than collection-level), or detailed descriptive metadata about our digital objects. Nor does ASpace need to understand how objects are further modeled within delivery or preservation systems. The systems that store the material handle those functions. Setting clear functional boundaries was essential to determining which option would meet my established needs as I balanced flexibility for unimagined future needs and current limited resources to create the digital object records at a large scale.

Given this set of requirements, I drafted four possible modeling scenarios that are represented visually, along with a metadata profile for the digital objects.

I then talked through several real-world examples of digitized material (e.g., A/V, single-page image/text, multi-page image/text) for each of these scenarios with CUL colleagues from metadata and digital lifecycle services. Their fresh, non-archivist questions helped clarify my thinking.

  • Scenario 1:
    • Pros:
      • Simple structure.
    • Cons:
      • RMC’s local ID (used to identify media objects in a human-readable form) only exists on the archival object in the component ID field.
      • Preservation and delivery objects only recognize a relationship with each other through the linked archival object. This is a potential break point if the links aren’t established or maintained accurately.
      • Difficult to identify in a machine-actionable way which object is preservation and which is delivery, other than through a note or by parsing the identifier.
  • Scenario 2:
    • Pros:
      • Preservation and delivery objects are linked through a single object, making the relationship between preservation and delivery object clear.
      • Only one “digital object” represents a range of possible iterations, making search results for digital objects easier to interpret.
      • Local ID easily attached to the digital object.
    • Cons:
      • No place to store the delivery system identifier if using the file version URI for the URL.
      • Difficult to identify in a machine-actionable way which object is preservation and which is delivery, other than through a note or by parsing the URI structure.
      • Challenging to ensure that the identifier is unique across ASpace given legacy practices of assigning local identifiers.
  • Scenario 3:
    • Pros:
      • Preservation and delivery versions as digital object components linked through a single object make the relationship between preservation and delivery object clear.
      • Only one “digital object” represents a range of possible iterations, making search results for digital objects easier to interpret.
      • Local ID easily attached to the digital object.
    • Cons:
      • Creating a human-meaningful label or title for a digital object component is time-consuming.
      • Challenging to ensure identifiers are unique across ASpace given legacy practices of assigning local identifiers.
  • Scenario 4:
    • Pros:
      • High level of granularity in parsing data to objects, potentially providing extensible functionality in the future.
    • Cons:
      • Difficult to identify in a machine-actionable way which object is preservation and which is delivery, other than through a note or by parsing the identifier.
      • Time-consuming to create a human-meaningful label or title for the digital object component, particularly for born-digital material.
      • Complex hierarchy that may be more trouble to navigate in an automated fashion with no significant benefit.

Following several conversations exploring the pros, cons, and non-archival interpretations of these representations, I ultimately decided to use scenario 1. It makes batch-creating digital objects simplest, it was the most intuitive to technologists once explained, and it hacks the ASpace fields from their presumed use the least.

I made two changes to the scenario to address feedback from CUL staff. First, there will be no file-level information in the preservation package objects, since that is managed well in the preservation system already and there’s no direct linking into that system. Identifiers stored in ASpace could allow us to add the information later if we find a need for it. Second, to make the object type machine-actionable, I added a user-defined controlled vocabulary field with the values “Preservation” and “Delivery.” Additionally, to help users in the ASpace interface tell records apart when the digital object titles are identical, I’ll append either [Preservation] or [Delivery] to the title.
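
To make the two changes concrete, here is a minimal sketch of how a preservation/delivery pair might be generated by rule. It uses plain Python dictionaries; the key names (`title`, `identifier`, `object_type`), the helper `make_digital_object`, and the sample identifiers are hypothetical and do not reflect ASpace's actual field names or CUL's identifier scheme.

```python
from typing import Dict

# The two controlled-vocabulary values described in the post.
ROLES = ("Preservation", "Delivery")


def make_digital_object(title: str, identifier: str, role: str) -> Dict[str, str]:
    """Build one digital object record with a role value and a suffixed title."""
    if role not in ROLES:
        raise ValueError(f"role must be one of {ROLES}, got {role!r}")
    return {
        # The bracketed suffix lets people tell otherwise identical titles apart.
        "title": f"{title} [{role}]",
        "identifier": identifier,
        # Hypothetical key standing in for the user-defined controlled field.
        "object_type": role,
    }


# Sample values are invented for illustration only.
preservation = make_digital_object("Diary, 1862", "pres-0001", "Preservation")
delivery = make_digital_object("Diary, 1862", "deliv-0001", "Delivery")
print(preservation["title"])
print(delivery["title"])
```

Storing the role in its own field as well as in the title means a script can filter on the field without parsing title strings.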

The primary limitation of this model is that there is no way to directly define a relationship between the delivery object and preservation object. If the link between digital object(s) and archival object is broken or incorrect, there will be limited options for restoring contextual understanding of content. This lack of direct referencing means that when a patron requests a high-resolution version of an object they found online, an archivist must search for the delivery identifier in ASpace, find the digital object representing the delivery object, navigate to the linked archival object, and then to the linked preservation object in order to request retrieval from preservation storage. This is a clunky go-up-to-go-down mechanism that I hope to find a solution for eventually.
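
The go-up-to-go-down lookup can be sketched as a small traversal. The dictionaries below are hypothetical in-memory stand-ins for ASpace records (the identifiers, keys, and the function `find_preservation_id` are invented for illustration); the point is only to show the path from delivery object to archival object to preservation object, and how a missing link leaves nothing to fall back on.

```python
from typing import Dict, List, Optional

# Hypothetical digital objects, each pointing up to one archival object.
digital_objects: Dict[str, Dict[str, str]] = {
    "deliv-0001": {"role": "Delivery", "archival_object": "ao-42"},
    "pres-0001": {"role": "Preservation", "archival_object": "ao-42"},
}

# Hypothetical archival objects, each listing its linked digital objects.
archival_objects: Dict[str, List[str]] = {
    "ao-42": ["deliv-0001", "pres-0001"],
}


def find_preservation_id(delivery_id: str) -> Optional[str]:
    """Go up from a delivery object to its archival object, then down to
    the sibling preservation object. Returns None if any link is missing."""
    delivery = digital_objects.get(delivery_id)
    if delivery is None or delivery["role"] != "Delivery":
        return None
    for linked_id in archival_objects.get(delivery["archival_object"], []):
        linked = digital_objects.get(linked_id)
        if linked is not None and linked["role"] == "Preservation":
            return linked_id
    return None


print(find_preservation_id("deliv-0001"))
```

If the archival object link is absent or wrong, the function returns `None`, which mirrors the break point described above: the two digital objects carry no direct reference to each other.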

Choosing scenario 1 also means enforcing that digital objects are packaged and managed at the level of archival description. We’ve been moving in this direction for a while, but where existing digitized material is described at a lower level than the existing archival description, that description must be added to ASpace before the digital objects can be added and linked. But that is another blog post entirely.

Erin Faulder, Assistant Director for Digital Strategies for Division of Rare and Manuscript Collections
