By Lori Eaton
As an archivist working largely with digital records, I’ve often felt a disconnect when thinking about physical custody of the electronic records in my care. Physical custody, according to Hunter (2003), falls in the accessioning process between legal control and intellectual control of records. I can make a tactile connection to archival materials in the form of paper records, manuscripts, photos, negatives, slides, film, and video tapes. I can point to them on a shelf or hold them in my hands. But the idea of physical control of digital archives feels slippery to me. Even CDs, DVDs, and various physical media are only temporary locations for digital records. While I can point to the representations of electronic files rendered by my operating system and software applications, does that mean I have physical control? Is physical custody of digital records even possible?
To ease my unease, I decided to investigate where the digital records for one of the clients I work with are physically stored. This client chose Preservica to preserve their digital records. (It works well for them; however, this statement is not intended as an endorsement.) Based on my client’s geographic location, Preservica stores their data, with redundancies, at an Amazon Web Services (AWS) data center in Northern Virginia. Since I currently live in Georgia and travel is not advisable these days, I decided that touring a data center of any kind would help me visualize where the digital archives I manage live. I contacted the manager of a data center in Atlanta called DataBank for a tour.
The first thing I learned was that not all data centers are created equal. DataBank ATL1 serves as a high-performance computing (HPC) center for organizations such as 911 call centers, hospitals, and government agencies that require a high level of security (we’re talking spy-novel level encryption), fast data recoverability, and near-instantaneous access to their data through fiber optic connections rather than via internet-dependent connections. In contrast, requirements for digital preservation data storage are a bit more pedestrian. Reliable, redundant storage is critical for digital collections, but cultural heritage institutions and other organizations preserving electronic archival records are less likely to require encryption or high-speed fiber optic access. Customers like Preservica, and therefore my client, can purchase data center space at a lower price point than a customer that requires HPC services. That’s not to say that security is not a consideration at lower-cost data centers. The AWS security and compliance white paper describes the company’s commitment to managing the security of on-site data.
Setting aside the variables of speed and security, I still wanted to understand where inside the data center the 0s and 1s that actually make up my client’s data are stored. According to a colleague who consumes data storage services at many levels and price points, regardless of the type of data center, the hardware configurations they use to store data are conceptually similar. Spin drives or solid-state drives are connected to one another in RAIDs (redundant arrays of inexpensive disks, or redundant arrays of independent disks). There are different RAID levels, each offering a different balance of redundancy and performance. In some RAIDs, when a disk in an array fails, a new one is installed and the data is rebuilt on it from the remaining disks in the array, so no data is lost. The heart of most data centers is the rows of racks that hold servers, multiple disk arrays, and other related hardware, along with components that help manage, cool, and power the drives.
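To make that rebuild step a little less abstract, here is a minimal sketch in Python of how a parity-based array can recover a lost disk. It assumes a RAID 5-style scheme in which one block holds the XOR of the others; I don’t know which RAID levels AWS or DataBank actually use, and the function names and sample bytes are purely illustrative.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Recover the single missing block from the survivors (data plus parity).

    Because parity is the XOR of every data block, XOR-ing the parity with
    the surviving data blocks cancels them out and leaves the missing block.
    """
    return xor_blocks(surviving_blocks)


# Three hypothetical "disks", each holding one four-byte block of a record.
data_disks = [b"ARCH", b"IVES", b"2020"]

# A fourth block stores the parity of the three data blocks.
parity = xor_blocks(data_disks)

# Simulate losing the second disk, then rebuild it onto a replacement.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = rebuild_missing(survivors)
```

In this toy example, `rebuilt` holds the same bytes as the disk that failed. The point for archivists is that the recovered block is computed rather than fetched from a single fixed place, which is part of why pinning data to one physical location is so hard.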
If I could identify which AWS data center in Northern Virginia manages my client’s data, I might have some hope of pointing to a particular disk, in a particular RAID, on a particular rack and saying, the archival records I’m responsible for managing are stored there. But disk failures mean that my client’s data may not remain in one place for long. When it comes to long-term digital preservation, data repair and redundancy are positive attributes. Add to that the best practice of geographic redundancy and different data storage locations for access copies, preservation copies, and dark archives, and it becomes even more challenging to point to a specific location where digital archives are stored. Maybe that’s how it should be. To stay viable, usable, and accessible, data must remain on the move, a step ahead of bit rot and format obsolescence. But all that movement and multiplication certainly puts quotation marks around the idea of having “physical custody” of digital archives.
Hunter, Gregory S. (2003). Developing and Maintaining Practical Archives. New York: Neal-Schuman, pp. 101-109.
Lori Eaton is an archives consultant working with foundations and nonprofits. Previously, she was project archivist at the Dorothy A. Johnson Center for Philanthropy at Grand Valley State University. She has an MLIS and a Graduate Certificate in Archival Administration from Wayne State University.