The Conversation Must Go On: Climate Change and Archival Practice

By Itza A. Carbajal
This post is part of our BloggERS Another Kind of Glacier series.

On September 20th, 2019, an unprecedented number of archivists joined together in person and online to reignite conversations around Climate Change, archives, and the role of archivists in the ongoing crisis. What initially started as a conversation in search of hope between two archivists, Ted Lee and Itza Carbajal, quickly grew into an archival community-wide search for change (1). Archivists, as part of the “Archives and Climate Change Teach-Ins Action,” engaged through teach-ins, marches, resource gathering, and social media in an effort to talk in parallel with the estimated 4-6 million people striking as part of the Global Climate Strike movement (2). These global strikes, led mostly by young people from around the world and inspired by Greta Thunberg’s Fridays for Future student strikes, occurred in over 163 countries on all seven continents, uniting, for perhaps the first time, the residents of this house called Earth.

What led Ted, myself, and others to seek a shift in the conversations around archives and Climate Change likely began with this simple question: “why must we (as archivists) act?” While a simple question, the “why” in the case of the Climate Strike Teach-Ins was in fact the impetus for me and for many others involved (3). When Ted Lee and I, both archivists and archival scholars, first set out to organize the archivist community through these Teach-Ins, we intended the actions to be 1) opportunities to learn, 2) moments to converse, and 3) sparks to previous conversations around archives and climate change. With over 9 teach-ins, information translated into 5 languages, a comprehensive reading list, and a global Twitterthon, the action #Archivists4ClimateAction undoubtedly sparked a lasting conversation.

In all frankness, I would say that we are no longer at the stage of asking why we would act, but rather why we would not. Not everyone in the house is aware of the growing fire outside and within our own walls, and as a result, the archival community must begin conversations like these. For those new to advocacy, organizing, or activist work, this first question is the starting line (4). Regardless of age, length of work experience, or other backgrounds, we all must start somewhere. The “why” in this question asks us to think about why it matters to act. Returning to the metaphor of our house being on fire, the initial question could be “why should or would I be compelled to act as a result of this fire?” In the case of Climate Change, some may feel more comfortable calling our work advocacy, either on behalf of the field, our jobs, or perhaps our overall environment: the world. Others may feel more compelled to frame their work as organizing or activism, the former focused more on coordinating people and the latter focused on calling attention to an issue. In all three cases, we are striving for some sort of change or solution to what we perceive as a problem.

We had, I would say, already accepted our responsibility to act. Both as inhabitants of this planet, and as practitioners dependent on the survival of humanity in order to make sense of our work, we had an obligation to act. We adopted a strategy, starting a conversation, that was both intentional and logical. As neither Ted nor I were environmental or climate change experts, we knew that we could only advance the conversations so much. But we recognized that our interests and skills lay in teaching, a form of educational conversation. And that led us to our answer for the second question: “what can we as archivists do?”

The Teach-In strategy addressed the discomfort Ted and I initially felt approaching this subject, which frankly still feels overwhelming and outside of our expertise. We felt that using Teach-Ins would allow us, as educators, to immerse ourselves in a topic of our choosing with the intention of sharing that information with our participants. The Teach-In method also allowed us to disrupt the “business as usual” attitude and tendency of many in our field, thus aligning with the original vision of the 2019 Global Climate Strike. As archivists, record managers, curators, librarians, and LIS students paused or walked out of work to attend or participate in these Teach-Ins, there was a recognition that many of us still desire to learn even after completing our formal participation in educational systems such as graduate programs. Plainly, the Teach-Ins resonated with participants across archival backgrounds, workplaces, and programs.

Ted and I chose a strategy that played to our strengths: the Teach-Ins were our preferred method because they gave us a path forward, a way to participate in the conversation by using our existing skills in teaching and organizing. Looking at what we knew, and what we had to contribute, the Teach-Ins made sense. Your skills, levels of comfort, insights, and connections will vary, but for better or worse, the problem of Climate Change will require us all to contribute in big and small ways. This brings me to the last question: “how do we (as archivists and an archival community) take action?” In response, I propose a follow-up question, drawn from the work we started with the Archives and Climate Change Teach-Ins as well as the discussions that led to the formation of ProjectARCC: “how do we continue the momentum built during the Global Climate Strike, build on conversations held, and work towards the changes that our field and community needs?” My simple answer would be to find ways to keep on learning. What happens after you learn will be up to you. But, I believe, the answers will inevitably circle back to the initial two questions – why and what.

As many recognize, Climate Change is neither a new topic nor is it in its early stages. Our house is on fire and for many it is starting to crumble. This blog post attempts to highlight the importance of starting and continuing conversations and actions around Climate Change and its relationship and impact on archivists and archives. The work did not end with the 2019 strike. That was simply the beginning.

Itza A. Carbajal is a Ph.D student at the University of Washington School of Information focusing her research on children and their records. Previously, she worked as the Latin American Metadata Librarian at LLILAS Benson after having received a Master of Science in Information Studies with a focus on archival management and digital records at the University of Texas at Austin School of Information. Before that, she obtained a dual-degree Bachelor of Arts in History and English with a concentration on creative writing and legal studies at the University of Texas at San Antonio. More information: www.itzacarbajal.com

Notes:
1. Itza A. Carbajal and Ted Lee, “If Not Now, When? Archivists Respond to Climate Change,” Archival Outlook, November/December 2019, |PAGE|, https://mydigitalpublication.com/publication/?m=30305&i=635670&p=8
2. “Over 4 Million Join 2 Days of Global Climate Strike,” Global Climate Strike, September 21, 2019, accessed October 6, 2020, https://globalclimatestrike.net/4-million/
3. “Climate Strike Teach-Ins,” Project ARCC Events, September 11, 2019, accessed October 12, 2020, https://projectarcc.org/2019/09/11/climate-strike-teach-ins/
4. I couple these terms together for a reason: while they most definitely mean different things and carry different implications, they are, in my opinion, similar in that they all seek some sort of change.

Dispatches from a Distance: The New Normal is Not Normal

by Emily Higgs

This post is part of Dispatches from a Distance, a series of short posts to provide a forum for those of us facing disruption in our professional lives, whether that’s working from home or something else, to stay engaged with the community. Now that so many of us are returning to full- or part-time on-site work, we’d like to extend this series to include reflections on reopening, returning to work, and other anxieties facing the profession due to COVID-19. There is no specific topic or theme for submissions–rather, this is a space to share your thoughts on current projects or ideas you’d like to share with other readers of the Electronic Records Section blog. Dispatches should be between 200 and 500 words and can be submitted here.


I feel extremely lucky to work at an institution that takes the pandemic seriously enough to have me stay home. It is much, much better than the alternatives. Still, I wouldn’t say I “work from home.” I have friends who work from home in non-pandemic times. They have the proper hardware, infrastructures, support networks, and communication channels to be able to do their work effectively from their personal dwelling.

As for me, I’m just doing the best with what I have. Back when the weather was warming up for the summer months, for example, I quickly realized that I was not equipped to appropriately control the climate of my “office” with my single window-unit A/C; why would I be? I am usually at work for the hottest part of the day. Since then, I have moved apartments, which has done wonders for my productivity (I finally have room for a desk AND a chair). But still, this is an apartment set up for interim pandemic work and not a “real” home office. My internet connection frequently drops. My VPN kicks me off the network every 10 hours, often in the middle of a process I’m running. I have to log back in to systems every hour or so and 2FA-authenticate every time, which means I have to go run and find wherever I left my phone the last time I went downstairs. I’m constantly competing with my partner, who spends 9+ hours teaching on Zoom every day, for precious bandwidth.

We’re running “work from home” scenarios on infrastructures that were never designed to be persistent or long-term. Our IT systems aren’t set up for this, among other structures that typically support our work on-site. If this really is “the new normal,” we’re going to have to do some serious retooling with that in mind.

Dispatches from a Distance: Work/Work Balance

by Marcella Huggard

This post is part of Dispatches from a Distance, a series of short posts to provide a forum for those of us facing disruption in our professional lives, whether that’s working from home or something else, to stay engaged with the community. Now that so many of us are returning to full- or part-time on-site work, we’d like to extend this series to include reflections on reopening, returning to work, and other anxieties facing the profession due to COVID-19. There is no specific topic or theme for submissions–rather, this is a space to share your thoughts on current projects or ideas you’d like to share with other readers of the Electronic Records Section blog. Dispatches should be between 200 and 500 words and can be submitted here.


My special collections and archives library has started the reopening process in preparation for the fall semester. We’re not open to the public yet but expect we will be in a limited fashion for the fall, and in the meantime, those of us in processing and conservation are coming into the building regularly to get back to working with the collections.

Transitioning to working strictly from home was one set of processes—physical, emotional, and mental. Transitioning to a hybrid situation is another set of processes. My staff are working approximately 50% in the office, 50% at home.  This means getting back to processing projects they haven’t really looked at since March, and it means continuing data cleanup projects they started in March, or starting new data cleanup projects from home. It means possibly inconsistent schedules depending on when the building is open (for some, this is good—variety is the spice of life!—for others, routine is essential and this is a disruption). It means adjusting to long stretches wearing a mask and getting sweaty extra quickly when schlepping boxes or archival supplies around. It means still not seeing some co-workers in person as we continue to work split shifts to lower the numbers of people in our building.

I’m taking our university administration’s direction to work from home as much as possible seriously, and I find that a lot of my regular work can be done remotely. Reviewing finding aids? Check. Ongoing data cleanup projects? Check. Research involving materials I’ve already retrieved from other archives and from electronically available resources? Check. Meetings with colleagues to plan projects and determine what we’ll do this fall?  Check. Professional reading, conferences, and workshops? Check. Data entry for processing projects? Check. This means extra disruption, though—“I’ll be able to get a full 4-hour shift in processing collections tomorrow afternoon,” I think happily to myself, until somebody schedules a meeting smack in the middle of what would have been that shift, and I’m adjusting yet again.

The guiding principles for this pandemic have been adaptability and flexibility, and I don’t see that changing anytime soon.

Estimating Energy Use for Digital Preservation, Part II

by Bethany Scott

This post is part of our BloggERS Another Kind of Glacier series. Part I was posted last week.


Conclusions

While the findings of the carbon footprint analysis are predicated on our institutional context and practices, and therefore may be difficult to directly extrapolate to other organizations’ preservation programs, there are several actionable steps and recommendations that sustainability-minded digital preservationists can implement right away. Getting in touch with any campus sustainability officers and investigating environmental sustainability efforts currently underway can provide enlightening information – for instance, you may discover that a portion of the campus energy grid is already renewable-powered, or that your institution is purchasing renewable energy credits (RECs). In my case, I was previously not aware that UH’s Office of Sustainability has published an improvement plan outlining its sustainability goals, including a 10% total campus waste reduction, a 15% campus water use reduction, and a 35% reduction in energy expenditures for campus buildings – all of which will require institutional support from the highest level of UH administration as well as partners among students, faculty, and staff across campus. I am proud to consider myself a partner in UH campus sustainability and look forward to promoting awareness of and advocating for our sustainability goals in the future.

As Keith Pendergrass highlighted in the first post of this series, there are other methods by which digital preservation practitioners can reduce their power draw and carbon footprint, thereby increasing the sustainability of their digital preservation programs – from turning off machines when not in use or scheduling resource-intensive tasks for off-peak times, to making broader policy changes that incorporate sustainability principles and practices.

At UHL, one such policy change I would like to implement is a tiered approach to file format selection, through which we match the file formats and resolution of files created to the scale and scope of the project, the informational and research value of the content, the discovery and access needs of end users, and so on. Existing digital preservation policy documentation outlines file formats and specifications for preservation-quality archival masters for images, audio, and video files that are created through our digitization unit. However, as UHL conducts a greater number of mass digitization projects – and accumulates an ever larger number of high-resolution archival master files – greater flexibility is needed. By choosing to create lower-resolution files for some projects, we would reduce the total storage for our digital collections, thereby reducing our carbon footprint.

For instance, we may choose to retain large, high-resolution archival TIFFs for each page image of a medieval manuscript book, because researchers study minute details in the paper quality, ink and decoration, and the scribe’s lettering and handwriting. By contrast, a digitized UH thesis or dissertation from the mid-20th century could be stored long-term as one relatively small PDF, since the informational value of its contents (and not its physical characteristics) is what we are really trying to preserve. Similarly, we are currently discussing the workflow implications of providing an entire archival folder as a single PDF in our access system. Although the initial goal of this initiative was to make a larger amount of archival material quickly available online for patrons, the much smaller amount of storage needed to store one PDF vs. dozens or hundreds of high-res TIFF masters would also have a positive impact on the sustainability of the digital preservation and access systems.
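To make the storage trade-off concrete, here is a back-of-envelope sketch. The file sizes are purely illustrative assumptions (the post gives no actual figures for UHL's masters or access PDFs), but they show why a single access PDF per folder can shrink storage, and thus the associated energy footprint, by orders of magnitude:

```python
# Back-of-envelope comparison of per-folder storage for two access strategies.
# Both file sizes below are illustrative assumptions, not UHL's actual figures.

TIFF_MB = 50          # assumed size of one high-resolution archival master TIFF
PDF_MB = 40           # assumed size of one combined access PDF for the folder
PAGES_PER_FOLDER = 200  # assumed page count for a typical archival folder

tiff_total_mb = TIFF_MB * PAGES_PER_FOLDER   # storage if every page keeps a TIFF
savings_pct = 100 * (1 - PDF_MB / tiff_total_mb)

print(f"TIFF masters: {tiff_total_mb:,} MB (~{tiff_total_mb / 1000:.0f} GB)")
print(f"Single access PDF: {PDF_MB} MB")
print(f"Storage reduction: {savings_pct:.1f}%")
```

Under these assumed numbers the PDF approach needs well under one percent of the TIFF storage; the real ratio will depend on resolution and compression choices, but the direction of the saving holds.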

UHL’s digital preservation policy also includes requirements for monthly fixity checking of a random sample of preservation packages stored in Archivematica, with a full fixity check of all packages to be conducted every three years during an audit of the overall digital preservation program. Frequent fixity checking is computationally intensive, though, and adds to the total energy expenditure of an institution’s digital preservation program. But in UHL’s local storage infrastructure, storage units run on the ZFS filesystem, which includes self-healing features such as internal checksum checks each time a read/write action is performed. This storage infrastructure was put in place in 2019, but we have not yet updated our policies and procedures for fixity checking to reflect the improved baseline durability of assets in storage.

Best practices calling for frequent fixity checks were developed decades ago – but modern technology like ZFS may be able to passively address our need for file integrity and durability in a less resource-intensive way. Through considered analysis matching the frequency of fixity checking to the features of our storage infrastructure, we may come to the conclusion that less frequent hands-on fixity checks, on a smaller random sample of packages, are sufficient moving forward. Since this is a new area of inquiry for me, I would love to hear thoughts from other digital preservationists about the pros and cons of such an approach – is fixity checking really the end-all, or could we use additional technological elements as part of a broader file integrity strategy over time?
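For readers weighing a sampled approach, here is a minimal sketch of what a periodic random-sample fixity check might look like. It assumes packages are directories of files and that expected SHA-256 digests live in a simple path-to-digest mapping; the function names and manifest shape are hypothetical, not Archivematica's actual API:

```python
import hashlib
import random
from pathlib import Path

def sha256_file(path: Path) -> str:
    """Stream a file through SHA-256 so large masters never load fully into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

def check_sample(packages: list[Path], stored: dict[str, str], rate: float = 0.1):
    """Fixity-check a random sample of preservation packages.

    `stored` maps each file path (as a string) to its expected SHA-256 digest.
    Returns the files whose current digest no longer matches the stored one.
    """
    sample = random.sample(packages, max(1, int(len(packages) * rate)))
    failures = []
    for pkg in sample:
        for f in sorted(p for p in pkg.rglob("*") if p.is_file()):
            if sha256_file(f) != stored.get(str(f)):
                failures.append(f)
    return failures
```

Dialing `rate` down (and trusting ZFS's internal checksumming for the baseline) is exactly the kind of trade-off the paragraph above describes: less compute per month, at the cost of a longer expected time to detect any corruption the filesystem itself misses.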

Future work

I eagerly anticipate refining this electricity consumption research with exact figures and values (rather than estimates) when we are able to more consistently return to campus. We would like to investigate overhead costs such as lighting and HVAC in UHL’s server room, and we plan to grab point-in-time values physically from the power distribution units in the racks. Also, there may be additional power statistics that our Sys Admin can capture from the VMware hosts – which would allow us to begin this portion of the research remotely in the interim. Furthermore, I plan to explore additional factors to provide a broader understanding of the impact of UHL’s energy consumption for digital systems and initiatives. By gaining more details on our total storage capacity, percentage of storage utilization, and GHG emissions per TB, we will be able to communicate about our carbon footprint in a way that will allow other libraries and archives to compare or estimate the environmental impact of their digital programs as well.

I would also like to investigate whether changes in preservation processes, such as the reduced hands-on fixity strategy outlined above, can have a positive impact on our energy expenditure – and whether this strategy can still provide a high level of integrity and durability for our digital assets over time. Finally, as a longer-term initiative I would like to take a deeper look at sustainability factors beyond energy expenditure, such as current practices for recycling e-waste on campus or a possible future life-cycle assessment for our hardware infrastructure. Through these efforts, I hope to help improve the long-term sustainability of UHL’s digital initiatives, and to aid other digital preservationists to undertake similar assessments of their programs and institutions as well.


Bethany Scott is Digital Projects Coordinator at the University of Houston Libraries, where she is a contributor to the development of the BCDAMS ecosystem incorporating Archivematica, ArchivesSpace, Hyrax, and Avalon. As a representative of UH Special Collections, she contributes knowledge on digital preservation, born-digital archives, and archival description to the BCDAMS team.

Estimating Energy Use for Digital Preservation, Part I

by Bethany Scott

This post is part of our BloggERS Another Kind of Glacier series. Part II will be posted next week.


Although the University of Houston Libraries (UHL) has taken steps over the last several years to initiate and grow an effective digital preservation program, until recently we had not yet considered the long-term sustainability of our digital preservation program from an environmental standpoint. As the leader of UHL’s digital preservation program, I aimed to address this disconnect by gathering information on the technology infrastructure used for digital preservation activities and its energy expenditures in collaboration with colleagues from UHL Library Technology Services and the UH Office of Sustainability. I also reviewed and evaluated the requirements of UHL’s digital preservation policy to identify areas where the overall sustainability of the program may be improved in the future by modifying current practices.

Inventory of equipment

I am fortunate to have a close collaborator in UHL’s Systems Administrator, who was instrumental in the process of implementing the technical/software elements of our digital preservation program over the past few years. He provided a detailed overview of our hardware and software infrastructure, both for long-term storage locations and for processing and workflows.

UHL’s digital access and preservation environment is almost 100% virtualized, with all of the major servers and systems for digital preservation – notably, the Archivematica processing location and storage service – running as virtual machines (VMs). The virtual environment runs on VMware ESXi and consists of five physical host servers that are part of a VMware vSAN cluster, which aggregates the disks across all five host servers into a single storage datastore.

VMs where Archivematica’s OS and application data reside may have their virtual disk data spread across multiple hosts at any given time. Therefore, exact resource use for digital preservation processes running via Archivematica is difficult to distinguish from that of other VM systems and processes, including UHL’s digital access systems. After discussing possible approaches for calculating the energy usage, we decided to take a generalized or blanket approach and include all five hosts. This calculation thus represents the energy expenditure for not only the digital preservation system and storage, but also for the A/V Repository and Digital Collections access systems. At UHL, digital access and preservation are strongly linked components of a single large ecosystem, so the decision to look at the overall energy expenditure makes sense from an ecosystem perspective.

In addition to the VM infrastructure described above, all user and project data is housed in the UHL storage environment. The storage environment includes both local shared network drive storage for digitized and born-digital assets in production, and additional shares that are not accessible to content producers or other end users, where data is processed and stored to be later served up by the preservation and access systems. Specifically, with the Archivematica workflow, preservation assets are processed through a series of automated preservation actions including virus scanning, file format characterization, fixity checking, and so on, and are then transferred and ingested to secure preservation storage.

UHL’s storage environment consists of two servers: a production unit and a replication unit. Archivematica’s processing shares are not replicated, but the end storage share is replicated. Again, for purposes of simplification, we generalized that both of these resources are being used as part of the digital preservation program when analyzing power use. Finally, within UHL’s server room there is a pair of redundant network switches that tie all the virtual and storage components together.

The specific hardware components that make up the digital access and preservation infrastructure described above include:

  • One (1) production storage unit: iXsystems TrueNAS M40 HA (Intel Xeon Silver 4114 CPU @ 2.2 GHz and 128 GB RAM)
  • One (1) replication storage unit: iXsystems FreeNAS IXC-4224 P-IXN (Intel Xeon CPU E5-2630 v4 @ 2.2 GHz and 128 GB RAM)
  • Two (2) disk expansion shelves: iXsystems ES60
  • Five (5) VMware ESXi hosts: Dell PowerEdge R630 (Intel Xeon CPU E5-2640 v4 @ 2.4 GHz and 192 GB RAM)
  • Two (2) network switches: HPE Aruba 3810M 16SFP+ 2-slot

Electricity usage

Each of the hardware components listed above has two power supplies. However, the power draw is not always running at the maximum available for those power supplies and is dependent on current workloads, how many disks are in the units, and so on. Therefore, the power being drawn can be quantified but will vary over time.

With the unexpected closure of the campus due to COVID-19, I conducted this analysis remotely with the help of the UH campus Sustainability Coordinator. We compared the estimated maximum power draw based on the technical specifications for the hardware components, the draw when idle, and several partial power draw scenarios, with the understanding that the actual numbers will likely fall somewhere in this range.
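The arithmetic behind these scenarios is straightforward to sketch. The watts-to-annual-kilowatt-hours conversion is standard; the greenhouse-gas factor below (roughly 1.56 lbs CO2e per kWh) is inferred from the totals published in this post and is an assumption, not an official grid emission factor:

```python
# Sketch: annual energy and GHG estimates from a steady power-draw figure.
# The emission factor is inferred from the post's published totals and is an
# assumption, not an official value for the local electricity grid.

HOURS_PER_YEAR = 24 * 365      # 8,760 hours
LBS_CO2E_PER_KWH = 1.5588      # implied by ~124,175.71 lbs / 79,663.44 kWh

def annual_kwh(watts: float) -> float:
    """Annual energy use for a constant draw, in kilowatt-hours."""
    return watts * HOURS_PER_YEAR / 1000

def annual_ghg_lbs(watts: float) -> float:
    """Approximate annual greenhouse gas emissions, in pounds of CO2e."""
    return annual_kwh(watts) * LBS_CO2E_PER_KWH

max_draw = 9094  # estimated maximum draw in watts, from the hardware specs
print(round(annual_kwh(max_draw), 2))     # 79663.44 kWh at maximum draw
print(round(annual_kwh(8639.3), 3))       # 75680.268 kWh at the 95% scenario
```

At the maximum-draw figure this yields roughly 124,000 lbs of CO2e per year; multiplying by about 0.4536 kg per lb gives the roughly 56.3 metric tons cited below.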

Estimated power use and greenhouse gas emissions

Scenario   Daily Usage Total (Watts)   Annual Total (kWh)   Annual GHG (lbs)
Max        9,094                       79,663.44            124,175.71
95%        8,639.3                     75,680.268           117,966.92
90%        8,184.6                     71,697.096           111,758.14
85%        7,729.9                     67,713.924           105,549.35
80%        7,275.2                     63,730.752           99,340.565
Idle       5,365.46                    47,001.43            73,263.666

The estimated maximum annual greenhouse gas emissions derived from power use for the digital access and preservation hardware is over 124,000 pounds, or approximately 56.3 metric tons. To put this in perspective, it’s equivalent to the GHG emissions from nearly 140,000 miles driven by an average passenger vehicle, and to the carbon dioxide emissions from 62,063 pounds of coal burned or 130 barrels of oil consumed. While I hope to refine this analysis further in the future, for now these figures can serve as an entry point to discussions on the importance of environmental sustainability actions – and our plans to reduce our consumption – with Libraries administration, colleagues in the Office of Sustainability, and other campus leaders.

Part II, including conclusions and future work, will be posted next week.


Bethany Scott is Digital Projects Coordinator at the University of Houston Libraries, where she is a contributor to the development of the BCDAMS ecosystem incorporating Archivematica, ArchivesSpace, Hyrax, and Avalon. As a representative of UH Special Collections, she contributes knowledge on digital preservation, born-digital archives, and archival description to the BCDAMS team.

Integrating Environmental Sustainability into Policies and Workflows

by Keith Pendergrass

This is the first post in the BloggERS Another Kind of Glacier series.


Background and Challenges

My efforts to integrate environmental sustainability and digital preservation in my organization—Baker Library Special Collections at Harvard Business School—began several years ago when we were discussing the long-term preservation of forensic disk images in our collections. We came to the conclusion that keeping forensic images instead of (or in addition to) the final preservation file set can have ethical, privacy, and environmental issues. We decided that we would preserve forensic images only in use cases where there was a strong need to do so, such as a legal mandate in our records management program. I talked about our process and results at the BitCurator Users Forum 2017.

From this presentation grew a collaboration with three colleagues who heard me speak that day: Walker Sampson, Tessa Walsh, and Laura Alagna. Together, we reframed my initial inquiry to focus on environmental sustainability and enlarged the scope to include all digital preservation practices and the standards that guide them. The result was our recent article and workshop protocol.

During this time, I began aligning our digital archives work at Baker Library with this research as well as our organization-wide sustainability goals. My early efforts mainly took the form of the stopgap measures that we suggest in our article: turning off machines when not in use; scheduling tasks for off-peak network and electricity grid periods; and purchasing renewable energy certificates that promote additionality, which is done for us by Harvard University as part of its sustainability goals. As these were either unilateral decisions or were being done for me, they were straightforward and quick to implement.

To make more significant environmental gains along the lines of the paradigm shift we propose in our article, however, requires greater change. This, in turn, requires more buy-in and collaboration within and across departments, which often slows the process. In the face of immediate needs and other constraints, it can be easy for decision makers to justify deprioritizing the work required to integrate environmental sustainability into standard practices. With the urgency of the climate and other environmental crises, this can be quite frustrating. However, with repeated effort and clear reasoning, you can make progress on these larger sustainability changes. I found that success most often followed continual reiteration of why I wanted to change policy, procedure, or standard practice, with a focus on how the changes would better align our work and department with organizational sustainability goals. Another key argument was showing how our efforts for environmental sustainability would also result in financial and staffing sustainability.

Below, I share examples of the work we have done at Baker Library Special Collections to include environmental sustainability in some of our policies and workflows. While the details may be specific to our context, the principles are widely applicable: integrate sustainability into your policies so that you have a strong foundation for including environmental concerns in your decision making; and start your efforts with appraisal as it can have the most impact for the time that you put in.

Policies

The first policy in which we integrated environmental sustainability was our technology change management policy, which controls our decision making around the hardware and software we use in our digital archives workflows. The first item we added to the policy was that we must dispose of all hardware following environmental standards for electronic waste and, for items other than hard drives, that we must donate them for reuse whenever possible. The second item involved more collaboration with our IT department, which controls computer refresh cycles, so that we could move away from the standard five-year replacement timeframe for desktop computers. The workstations that we use to capture, appraise, and process digital materials are designed for long service lives, heavy and sustained workloads, and easy component change out. We made our case to IT—as noted above, this was an instance where the complementarity of environmental and financial sustainability was key—and received an exemption for our workstations, which we wrote into our policy to ensure that it becomes standard practice.

We can now keep the workstations as long as they remain serviceable and work with IT to swap out components as they fail or need upgrading. For example, we replaced our current workstations’ six-year-old spinning disk drives with solid state drives when we updated from Windows 7 to Windows 10, improving performance while maintaining compliance with IT’s security requirements. Making changes like this allows us to move from the standard five-year to an expected ten-year service life for these workstations (they are currently at 7.5 years). While the policy change and subsequent maintenance actions are small, they add up over time to provide substantial reductions in the full life-cycle environmental and financial costs of our hardware.
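The life-cycle logic here can be sketched as simple amortization. The figures below are hypothetical placeholders (not Baker Library's actual purchase or carbon numbers), but they show why doubling a workstation's service life roughly halves its annualized costs, even after paying for component upgrades along the way:

```python
def annualized_cost(purchase_cost, embodied_kg_co2e, service_years, upgrade_cost=0):
    """Spread a workstation's one-time financial and embodied-carbon costs
    over its service life. All figures are illustrative, not actual data."""
    return {
        "cost_per_year": (purchase_cost + upgrade_cost) / service_years,
        "kg_co2e_per_year": embodied_kg_co2e / service_years,
    }

# Hypothetical workstation: $2,500 purchase, ~300 kg CO2e embodied carbon.
standard = annualized_cost(2500, 300, service_years=5)
extended = annualized_cost(2500, 300, service_years=10, upgrade_cost=400)  # SSD swap, etc.

print(standard)  # {'cost_per_year': 500.0, 'kg_co2e_per_year': 60.0}
print(extended)  # {'cost_per_year': 290.0, 'kg_co2e_per_year': 30.0}
```

The same comparison works with real procurement and life-cycle-assessment numbers when they are available.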

We also integrated environmental sustainability into our new acquisition policy. The policy outlines the conditions and terms of several areas that affect the acquisition of materials in any format: appraisal, agreements, transfer, accessioning, and documentation. For appraisal, we document the value and costs of a potential acquisition; previously, we had defined those costs fairly narrowly. With the new policy, we broadened the costs that are in scope for our acquisition decisions and, as part of this, included environmental costs. While only a minor point in the policy, it allows us to determine environmental costs in our archival and technical appraisals, and then take those costs into account when making an acquisition decision. Our next step is to figure out how best to measure or estimate environmental impacts for consistency across potential acquisitions. I am hopeful that explicitly integrating environmental sustainability into our first decision point—whether to acquire a collection—will make it easier to include sustainability in other decision points throughout the collection’s life cycle.

Workflows

In a parallel track, we have been integrating environmental sustainability into our workflows, focusing on the appraisal of born-digital and audiovisual materials. This is a direct result of the research article noted above, in which we argue that focusing on selective appraisal can be the most consequential action because it affects the quantity of digital materials that an organization stewards for the remainder of those materials’ life cycle and provides an opportunity to assign levels of preservation commitment. While conducting in-depth appraisal prior to physical or digital transfer is ideal, it is not always practical, so we altered our workflows to increase the opportunities for appraisal after transfer.

For born-digital materials, we added an appraisal point during the initial collection inventory, screening out storage media whose contents are wholly outside of our collecting policy. We then decide on a capture method based on the type of media: we create disk images of smaller-capacity media but often package the contents of larger-capacity media using the BagIt specification (unless we have a use case that requires a forensic image) to reduce the storage capacity needed for the collection and to avoid the ethical and privacy issues previously mentioned. When we do not have control of the storage media—for network-attached storage, cloud storage, etc.—we make every attempt to engage with donors and departments to conduct in-depth appraisal prior to capture, streamlining the remaining appraisal decision points.
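To make the packaging step concrete, here is a minimal, stdlib-only sketch of the BagIt structure: a bag declaration, a data/ payload directory, and a sha256 checksum manifest. A production workflow would use an established tool such as the Library of Congress's bagit-python rather than this simplified code, and the demo payload below is invented for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def make_minimal_bag(payload_dir: str, bag_dir: str) -> None:
    """Package a directory of files into a minimal BagIt-style bag:
    a bag declaration, a data/ payload directory, and a sha256 manifest."""
    bag = Path(bag_dir)
    data = bag / "data"
    data.mkdir(parents=True)

    manifest_lines = []
    for src in sorted(Path(payload_dir).rglob("*")):
        if src.is_file():
            rel = src.relative_to(payload_dir)
            dest = data / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
            digest = hashlib.sha256(dest.read_bytes()).hexdigest()
            manifest_lines.append(f"{digest}  data/{rel.as_posix()}")

    # The tag files the specification requires: declaration and manifest.
    (bag / "bagit.txt").write_text("BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n")
    (bag / "manifest-sha256.txt").write_text("\n".join(manifest_lines) + "\n")

# Demo with a throwaway payload (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    payload = Path(tmp) / "payload"
    payload.mkdir()
    (payload / "letter.txt").write_bytes(b"Dear colleague...")
    make_minimal_bag(str(payload), str(Path(tmp) / "bag"))
    manifest = (Path(tmp) / "bag" / "manifest-sha256.txt").read_text()
    print(manifest.strip().endswith("data/letter.txt"))  # True
```

Because a bag records only files and their checksums, it avoids capturing the deleted-file remnants and slack space that a forensic disk image would preserve.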

After capture, we conduct another round of appraisal now that we can more easily view and analyze the digital materials across the collection. This tends to be a higher-level appraisal during which we make decisions about entire disk images or BagIt bags, or large groupings within them. Finally (for now), we conduct our most granular and selective appraisal during archival processing, when processing archivists, curators, and I work together to determine what materials should be part of the collection’s preservation file set. As our digital archives program is still young, we have not yet explored re-appraisal at further points of the life cycle such as access, file migration, or storage refresh.

For audiovisual materials, we follow a similar approach as we do for born-digital materials. We set up an audiovisual viewing station with equipment for reviewing audiocassettes, microcassettes, VHS and multiple Beta-format video tapes, multiple film formats, and optical discs. We first appraise media items based on labels and collection context; with the viewing station, we can now make more informed appraisal decisions before prioritizing items for digitization. After digitization, we appraise again, making decisions on retention, levels of preservation commitment, and access methods.

While implementing multiple points of selective appraisal throughout workflows is more labor intensive than simply conducting an initial appraisal, several arguments moved us to take this approach: it is a one-time labor cost that helps us reduce on-going storage and maintenance costs; it allows us to target our resources to those materials that have the most value for our community; it decreases the burden of reappraisal and other information maintenance work that we are placing on future archivists; and, not least, it reduces the on-going environmental impact of our work.


Keith Pendergrass is the digital archivist for Baker Library Special Collections at Harvard Business School, where he develops and oversees workflows for born-digital materials. His research and general interests include integration of sustainability principles into digital archives standard practice, systems thinking, energy efficiency, and clean energy and transportation. He holds an MSLIS from Simmons College and a BA from Amherst College.

ml4arc – Machine Learning, Deep Learning, and Natural Language Processing Applications in Archives

by Emily Higgs


On Friday, July 26, 2019, academics and practitioners met at Wilson Library at UNC Chapel Hill for “ml4arc – Machine Learning, Deep Learning, and Natural Language Processing Applications in Archives.” This meeting featured expert panels and participant-driven discussions about how we can use natural language processing (using software to understand text and its meaning) and machine learning (a branch of artificial intelligence that infers patterns from data) in the archives.

The meeting was hosted by the RATOM (Review, Appraisal, and Triage of Mail) Project, a partnership between the State Archives of North Carolina and the School of Information and Library Science at UNC Chapel Hill. RATOM will extend the email processing capabilities currently present in the TOMES software and the BitCurator environment, developing additional modules for identifying and extracting the contents of email-containing formats, NLP tasks, and machine learning approaches. RATOM and the ml4arc meeting are generously supported by the Andrew W. Mellon Foundation.

Presentations at ml4arc were split between successful applications of machine learning and problems that machine learning could potentially address in the future. In his talk, Mike Shallcross from Indiana University identified archival workflow pain points that present opportunities for machine learning. In particular, he sees the potential for machine learning to address issues of authenticity and integrity in digital archives, PII and risk mitigation, aggregate description, and how all these processes are (or are not) scalable and sustainable. Many of the presentations addressed these key areas and how natural language processing and machine learning can aid archivists and records managers. Attendees also saw presentations and demonstrations of email tools such as RATOM, TOMES, and ePADD. Euan Cochrane gave a talk about the EaaSI sandbox and discussed potential relationships between software preservation and machine learning.

The meeting agenda had a strong focus on using machine learning in email archives; collecting and processing email is a large burden for many archives and stands to benefit greatly from machine learning tools. For example, Joanne Kaczmarek from the University of Illinois presented a project processing capstone email accounts using e-discovery and predictive coding software called Ringtail. In partnership with the Illinois State Archives, Kaczmarek used Ringtail to identify groups of “archival” and “non-archival” emails from 62 capstone accounts, and to further break down the “archival” category into “restricted” and “public.” After 3-4 weeks of tagging training data with this software, the team was able to reduce the volume of emails by 45% by excluding “non-archival” messages, and to identify 1.8 million emails that met the criteria to be made available to the public. Done manually, this tagging could easily have taken over 13 years of staff time.

After the ml4arc meeting, I am excited to see the evolution of these projects and how natural language processing and machine learning can help us with our responsibilities as archivists and records managers. From entity extraction to PII identification, there are myriad possibilities for these technologies to help speed up our processes and overcome challenges.


Emily Higgs is the Digital Archivist for the Swarthmore College Peace Collection and Friends Historical Library. Before moving to Swarthmore, she was a North Carolina State University Libraries Fellow. She is also the Assistant Team Leader for the SAA ERS section blog.


Securing Our Digital Legacy: An Introduction to the Digital Preservation Coalition

by Sharon McMeekin, Head of Workforce Development


Nineteen years ago, the digital preservation community gathered in York, UK, for the Cedars Project’s Preservation 2000 conference. It was here that the first seeds were sown for what would become the Digital Preservation Coalition (DPC). Guided by Neil Beagrie, then of King’s College London and Jisc, work to establish the DPC continued over the next 18 months and, in 2002, representatives from 7 organizations signed the articles that formally constituted the DPC.

In the 17 years since its creation, the DPC has gone from strength to strength, the last 10 of them under the leadership of current Executive Director, William Kilbride. The past decade has been a particular period of growth, as shown by the rise in the staff complement from 2 to 7. We now have more than 90 members who represent an increasingly diverse group of organizations from 12 countries, across sectors including cultural heritage, higher education, government, banking, industry, media, research, and international bodies.

DPC staff, chair, and president

Our mission at the DPC is to:

[…] enable our members to deliver resilient long-term access to digital content and services, helping them to derive enduring value from digital assets and raising awareness of the strategic, cultural and technological challenges they face.

We work to achieve this through a broad portfolio of work across six strategic areas of activity: Community Engagement, Advocacy, Workforce Development, Capacity Building, Good Practice and Standards, and Management and Governance. Everything we do is member-driven: our members guide our activities through the DPC Board, Representative Council, and the Sub-Committees that oversee each strategic area.

Although the DPC is driven primarily by the needs of our members, we do also aim to contribute to the broader digital preservation community. As such, many of the resources we develop are made publicly available. In the remainder of this blog post, I’ll be taking a quick look at each of the DPC’s areas of activity and pointing out resources you might find useful.

1 | Community Engagement

First up is our work in the area of Community Engagement. Here our aim is to enable “a growing number of agencies and individuals in all sectors and in all countries to participate in a dynamic and mutually supportive digital preservation community”. Collaboration is a key to digital preservation success, and we hope to encourage and support it by helping build an inclusive and active community. An important step in achieving this aim was the publication of our ‘Inclusion and Diversity Policy’ in 2018.

Webinars are key to building community engagement amongst our members. We invite speakers to talk to our members about particular topics and share experiences through case studies. These webinars are recorded and made available for members to watch at a later date. We also run a monthly ‘Members Lounge’ to allow informal sharing of current work and discussion of issues as they arise, and, on the public end of the website, a popular blog covering case studies, new innovations, thought pieces, recaps of events, and more.

2 | Advocacy

Our advocacy work campaigns “for a political and institutional climate more responsive and better informed about the digital preservation challenge”, as well as “raising awareness about the new opportunities that resilient digital assets create”. This tends to happen on several levels, from enabling and aiding members’ advocacy efforts within their own organizations, through raising legislators’ and policy makers’ awareness of digital preservation, to educating the wider populace.

To help those advocating for digital preservation within their own context, we have recently published our Executive Guide. The Guide provides a grab bag of statements and facts to help make the case for digital preservation, including key messages, motivators, opportunities to be gained and risks faced. We welcome any suggestions for additions or changes to this resource!

Our longest-running advocacy activity is the biennial Digital Preservation Awards, last held in 2018. The Awards aim to celebrate excellence and innovation in digital preservation across a range of categories. This high-profile event has been joined in recent years by two other activities with a broad remit and engagement. The first is the Bit List of Digitally Endangered Species, which highlights at-risk digital information, showing both where preservation work is needed and where efforts have been successful. The second is World Digital Preservation Day (WDPD), a day to showcase digital preservation around the globe. Response to WDPD since its inauguration in 2017 has been exceptionally positive: there have been tweets, blogs, events, webinars, and even a song and dance! This year WDPD is scheduled for 7th November, and we encourage everyone to get involved.

The nominees, winners, and judges for the 2018 Digital Preservation Awards

3 | Workforce Development

Workforce Development activities at the DPC focus on “providing opportunities for our members to acquire, develop and retain competent and responsive workforces that are ready to address the challenges of digital preservation”. There are many threads to this work, but key for our members are the scholarships we provide through our Career Development Fund and free access to the training courses we run.

At the moment we offer three training courses: ‘Getting Started with Digital Preservation’, ‘Making Progress with Digital Preservation’ and ‘Advocacy for Digital Preservation’, but we have plans to expand the portfolio in the coming year. All of our training courses are available to non-members for a modest fee, but at the moment are mostly held face to face in the UK and Ireland. A move to online training provision is, however, planned for 2020. We are also happy to share training resources and have set up a Slack workspace to enable this and greater collaboration with regards to digital preservation training.

Other helpful resources under our Workforce Development heading include the ‘Digital Preservation Handbook’, a free online publication covering digital preservation in the broadest sense. The Handbook aims to be a comprehensive guide for those starting out with digital preservation, whilst also offering links to additional resources. The content for the Handbook was crowd-sourced from experts and has all been peer reviewed. Another useful and slightly less well-known series of publications is our ‘Topical Notes’, originally funded by the National Archives of Ireland and intended to introduce key digital preservation issues to a non-specialist audience (particularly record creators). Each note is only two pages long and jargon-free, so a great resource to help raise awareness.

4 | Capacity Building

Perhaps the biggest area of DPC work covers Capacity Building, that is “supporting and assuring our members in the delivery and maintenance of high quality and sustainable digital preservation services through knowledge exchange, technology watch, research and development.” This can take the form of direct member support, helping with tasks such as policy development and procurement, as well as participation in research projects.

Our more advanced publication series, the Technology Watch Reports, also sits under the Capacity Building heading. Written by experts and peer reviewed, each report takes a deeper dive into a particular digital preservation issue. Our latest report, on Email Preservation, is currently available for member preview but will be publicly released shortly. Some other ‘classics’ include Preserving Social Media, Personal Digital Archiving, and the always popular The Open Archival Information System (OAIS) Reference Model: Introductory Guide (2nd Edition). (I always tell those new to OAIS to start here rather than with the 200+ dry pages of the full standard!)

We also run around six thematic Briefing Day events a year on topical issues. As with the training, these are largely held in the UK and Ireland, but they are now also live-streamed for members. We support a number of Thematic Task Forces and Working Groups, with the ‘Web Archiving and Preservation Working Group’ being particularly active at the moment.

DPC members engaged in a brainstorming session

5 | Good Practice and Standards

Our Good Practice and Standards stream of work was a new addition as of the publication of our latest Strategic Plan (2018-22). Here we are contributing work towards “identifying and developing good practice and standards that make digital preservation achievable, supporting efforts to ensure services are tightly matched to shifting requirements.”

We hope this work will allow us to input into standards with the needs of our members in mind and facilitate the sharing of good practice that already happens across the coalition. This has already borne fruit in the shape of the forthcoming DPC Rapid Assessment Model, a maturity model to help with benchmarking digital preservation progress within your organization. You can read a bit more about it in this blog post by Jen Mitcham and the model will be released publicly in late September.

We also work with vendors through our Supporter Program and events like our ‘Digital Futures’ series to help bridge the gap between practice and solutions.

6 | Management and Governance

Our final stream of work is less focused on digital preservation and more on “ensuring the DPC is a sustainable, competent organization focussed on member needs, providing a robust and trusted platform for collaboration within and beyond the Coalition.” This relates both to the viability of the organization and to good governance. It is essential that everything we do is transparent and that members can both direct what we do and ensure accountability.

The Future

Before I depart, I thought I would share a little bit about some of our plans for the future. In the next few years we’ll be taking steps to further internationalize as an organization. At the moment our membership is roughly 75% UK and Ireland and 25% international, but those numbers are gradually moving closer and we hope that continues. With that in mind we will be investigating new ways to deliver services and resources online, as well as in languages beyond English. We’re starting this year with the publication of our prospectus in German, French and Spanish.

We’re also beginning to look forward to our 20th anniversary in 2022. It’s a Digital Preservation Awards Year, so that’s reason enough for a celebration, but we will also be welcoming the digital preservation community to Glasgow, Scotland, as hosts of iPRES 2022. Plans are already afoot for the conference, and we’re excited to make it a showcase for both the community and one of our home cities. Hopefully we’ll see you there, but I encourage you to make use of our resources and to get in touch soon!

Access our Knowledge Base: https://www.dpconline.org/knowledge-base

Follow us on Twitter: https://twitter.com/dpc_chat

Find out how to join us: https://www.dpconline.org/about/join-us


Sharon McMeekin is Head of Workforce Development with the Digital Preservation Coalition and leads on work including training workshops and their scholarship program. She is also Managing Editor of the ‘Digital Preservation Handbook’. With Masters degrees in Information Technology and Information Management and Preservation, both from the University of Glasgow, Sharon is an archivist by training, specializing in digital preservation. She is also an ILM qualified trainer. Before joining the DPC she spent five years as Digital Archivist with RCAHMS. As an invited speaker, Sharon presents on digital preservation at a wide variety of training events, conferences and university courses.

Student Impressions of Tech Skills for the Field

by Sarah Nguyen


Back in March, during bloggERS’ Making Tech Skills a Strategic Priority series, we distributed an open survey to MLIS, MLS, MI, and MSIS students to understand what they know and have experienced in relation to technology skills as they enter the field.

To be frank, this survey stemmed from personal interests, since I just completed an MLIS core course on Research, Assessment, and Design (re: survey to collect data on the current landscape). I am also interested in what skills I need to build and what classes I should sign up for next quarter (re: what tech skills do I need to become hireable?). While I feel comfortable with a variety of tech-related tools and tasks, I’ve been intimidated by more “high-level” computational languages for some years. This survey was helpful for exploring what skills other LIS pre-professionals are interested in and which skills will help us make these costly degrees worth the time and financial investment that is traditionally required to enter a stable archive or library position.

Method

The survey was open for one month on Google Forms, and distributed to SAA communities, @SAA_ERS Twitter, the Digital Curation Google Group, and a few MLIS university program listservs. There were 15 questions and we received responses from 51 participants. 

Results & Analysis

Here’s a superficial scan of the results. If you would like to come up with your own analyses, feel free to view the raw data on GitHub.

Figure 1. Technology-related skills that students want to learn

The most popular technology-related skill that students are interested in learning is data management (manipulating, querying, transforming data, etc.). This is a pretty broad topic, as it involves many tools and protocols that can range from GUIs to scripts. A separate survey that breaks down specific data management tools might be in order, especially since these skills can be divided into specialty courses and workshops, which then translate into specific job positions. A more specific survey could help demonstrate what types of skills need to be taught in a full semester-long course, and what skills can be covered in a day-long or multi-day workshop.
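For a concrete sense of what “manipulating, querying, transforming data” can mean in practice, here is a small example using Python’s built-in sqlite3 module; the accession records are invented for illustration, not drawn from the survey:

```python
import sqlite3

# Hypothetical accession log: the sort of tabular data an archivist
# might need to load, query, and transform.
records = [
    ("2019-001", "Smith papers", 12.5),
    ("2019-002", "Jones email archive", 48.0),
    ("2019-003", "Department photographs", 3.2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accessions (accession_id TEXT, title TEXT, extent_gb REAL)")
conn.executemany("INSERT INTO accessions VALUES (?, ?, ?)", records)

# Two simple queries: total extent, and the largest single accession.
total_gb = conn.execute("SELECT SUM(extent_gb) FROM accessions").fetchone()[0]
largest = conn.execute(
    "SELECT title FROM accessions ORDER BY extent_gb DESC LIMIT 1"
).fetchone()[0]
print(round(total_gb, 1), largest)  # 63.7 Jones email archive
```

The same kind of task could equally be done in a GUI tool like OpenRefine or a spreadsheet, which is part of why the category is so broad.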

It was interesting to see that even in this day and age, when social media management can be second nature to many students’ daily lives, there was still a notable interest in making it part of their career. This makes me wonder what value students see in knowing how to strategically manage an archives’ social media account. How could this help with the job market, as well as with an archival organization’s main mission?

Looking deeper into the popular data management category, it would be interesting to know the current landscape of knowledge or pedagogy around communicating with IT (e.g., project management and translating users’ needs). In many cases, archivists work separately from, but dependently on, IT system administrators, and this can be frustrating since each department may have distinct concerns about a server or other networks. At June’s NYC Preservathon/Preservashare 2019, there was mention that IT exists to make sure servers and networks are spinning at all hours of the day. Unlike archivists, they are not concerned about the longevity of the content, obsolescence of file formats, or the software to render files. Could it be useful to have a course on how to effectively communicate and take control of issues that fall on the fuzzy lines between archives, data management, and IT? Or, as one survey respondent said, “I think more basic programming courses focusing on tech languages commonly used in archives/libraries would be very helpful.” Personally, I’ve only learned this from experience working in different tech-related jobs. It is not a subject I see in my MLIS course catalog, nor a discussion at conference workshops.

The popularity of data management skills sparked another question: what about knowledge around computer networks and servers? Even though LTO will forever be in our hearts, cloud storage is also a backup medium we’re budgeting for and relying on. Same goes for hosting a database for remote access and/or publishing digital files. A friend mentioned this networking workshop for non-tech savvy learners—Grassroots Networking: Network Administration for Small Organizations/Home Organizations—which could be helpful for multiple skill types including data management, digital forensics, web archiving, web development, etc. This is similar to a course that could be found in computer science or MLIS-adjacent information management departments.

Figure 2. Have you taken/will you take technology-focused courses in your program?
Figure 3. Do you feel comfortable defining the difference between scripting and programming

I can’t say this is statistically significant, but the contrast between the 15.7% of respondents who have not taken/will not take a technology-focused course in their program and the 78.4% who are not aware of the difference between scripting and programming is eyebrow-raising. According to an article in PLOS Computational Biology, the term “script” means “something that is executed directly as is,” while a “program” is “something that is explicitly compiled before being used. The distinction is more one of degree than kind—libraries written in Python are actually compiled to bytecode as they are loaded, for example—so one other way to think of it is ‘things that are edited directly’ and ‘things that are not edited directly’” (Wilson et al. 2017). This distinction is important, since more archives are acquiring, processing, and sharing collections that rely on archivists who can execute jobs such as web-scraping or metadata management (scripts) or who can build and maintain a database (programming). These might be interpreted as trick questions, but the particular semantics, and what is considered technology-focused, are something modern library, archives, and information programs might want to consider.
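The point about Python compiling to bytecode on load can be seen directly with the standard library. This small example (mine, not from the survey or the cited article) explicitly compiles a one-line “script” the same way the interpreter does implicitly when a module is imported:

```python
import py_compile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A one-line "script": something executed directly as is.
    source = Path(tmp) / "hello.py"
    source.write_text('print("hello")\n')

    # py_compile produces the cached bytecode (.pyc) file that importing
    # the module would otherwise create under __pycache__.
    compiled = py_compile.compile(str(source))
    print(Path(compiled).suffix)  # .pyc
```

In other words, the same file is edited directly like a script but compiled like a program, which is why the distinction is “more one of degree than kind.”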

Figure 4. How do you approach new technology?

Figure 4 illustrates the various ways students tackle new technologies. Reading the f* manual (RTFM) and Searching forums are the most common approaches to navigating technology. Here are quotes from a couple students on how they tend to learn a new piece of software:

  • “break whatever I’m trying to do with a new technology into steps and look for tutorials & examples related to each of those steps (i.e. Is this step even possible with X, how to do it, how else to use it, alternatives for accomplishing that step that don’t involve X)”
  • “I tend to google “how to….” for specific tasks and learn new technology on a task-by-task basis.”

In the end, there was overwhelming interest in “more project-based courses that allow skills from other tech classes to be applied.” Unsurprisingly, many of us are looking for full-time, stable jobs after graduating, and the “more practical stuff, like CONTENTdm for archives” seems to be a pressure felt in order to get an entry-level position. And not just entry-level: as continuing education learners, there is also a push to strive for more—several respondents are looking for a challenge to level up their tech skills:

  • “I want more classes with hands-on experience with technical skills. A lot of my classes have been theory based or else they present technology to us in a way that is not easy to process (i.e. a lecture without much hands-on work).”
  • “Higher-level programming, etc. — everything on offer at my school is entry level. Also digital forensics — using tools such as BitCurator.”
  • “Advanced courses for the introductory courses. XML 2 and python 2 to continue to develop the skills.”
  • “A skills building survey of various code/scripting, that offers structured learning (my professor doesn’t give a ton of feedback and most learning is independent, and the main focus is an independent project one comes up with), but that isn’t online. It’s really hard to learn something without face to face interaction, I don’t know why.”

It’ll be interesting to see what skills recent MLIS, MLS, MIS, and MSIM graduates will enter the field with. While many job postings list certain software and skills as requirements, will programs follow suit? I have a feeling this might be a significant question in the larger context of what the purpose of this Master’s degree is and how the curriculum can keep up with the dynamic technology needs of the field.

Disclaimer: 

  1. Potential bias: Those taking the survey might be interested in learning higher-level tech skills because they do not already have them, while those who are already tech-savvy might avoid a basic survey such as this one. This may bias the survey population toward mostly novice tech students.
  2. More data on specific computational languages and technology courses taken are available in the GitHub csv file. As mentioned earlier, I just finished my first year as a part-time MLIS student, so I’m still learning the distinct jobs and nature of the LIS field. Feel free to submit an issue to the GitHub repo, or tweet me @snewyuen if you’d like to talk more about what this data could mean.

Bibliography

Wilson G, Bryan J, Cranston K, Kitzes J, Nederbragt L, Teal TK (2017) Good enough practices in scientific computing. PLoS Computational Biology 13(6): e1005510. https://doi.org/10.1371/journal.pcbi.1005510


Sarah Nguyen with a Uovo storage truck

Sarah Nguyen is an advocate for open, accessible, and secure technologies. While studying as an MLIS candidate with the University of Washington iSchool, she is expressing interests through a few gigs: Project Coordinator for Preserve This Podcast at METRO, Assistant Research Scientist for Investigating & Archiving the Scholarly Git Experience at NYU Libraries, and archivist for the Dance Heritage Coalition/Mark Morris Dance Group. Offline, she can be found riding a Cannondale mtb or practicing movement through dance. (Views do not represent Uovo. And I don’t even work with them. Just liked the truck.)

The Theory and Craft of Digital Preservation: An interview with Trevor Owens

BloggERS! editor Dorothy Waugh recently interviewed Trevor Owens, Head of Digital Content Management at the Library of Congress, about his recent, award-winning book, The Theory and Craft of Digital Preservation.


Who is this book for and how do you imagine it being used?

I attempted to write a book that would be engaging and accessible to anyone who cares about long-term access to digital content and wants to devote time and energy to helping ensure that important digital content is not lost to the ages. In that context, I imagine the primary audience as current and emerging professionals who work to ensure enduring access to cultural heritage: archivists, librarians, curators, conservators, folklorists, oral historians, etc. With that noted, I think the book can also be of use to broader conversations in information science, computer science and engineering, and the digital humanities. 

Tell us about the title of the book and, in particular, your decision to use the word “craft” to describe digital preservation.

The words “theory” and “craft” in the title of the book forecast both the structure and the two central arguments that I advance in the book. 

The first chapters focus on theory. This includes tracing the historical lineages of preservation in libraries, archives, museums, folklore, and historic preservation. I then move to explore work in new media studies and platform studies to round out a nuanced understanding of the nature of digital media. I start there because I think it’s essential that cultural heritage practitioners moor their own frameworks and approaches to digital preservation in a nuanced understanding of the varied and historically contingent nature of preservation as a concept and the complexities of digital media and digital information. 

The latter half of the book is focused on what I describe as the “craft” of digital preservation. My use of the term craft is designed to intentionally challenge the notion that work in digital preservation should be understood as “a science.” Given the complexities of both what counts as preservation in a given context and the varied nature of digital media, I believe it is essential that we explicitly distance ourselves from many of the assumptions and baggage that come along with the ideology of “digital.” 

We can’t build some super system that just solves digital preservation. Digital preservation requires making judgement calls. Digital preservation requires the applied thinking and work of professionals. Digital preservation is not simply a technical question; rather, it involves understanding the nature of the content that matters most to an intended community and making judgement calls about how best to mitigate the risk of losing access to that content. As a result of my focus on craft, I offer less of a “this is exactly what one should do” approach, and more of an invitation to join the community of practice that is developing knowledge and honing and refining their craft. 

Reading the book, I was so happy to see you make connections between the work that we do as archivists and digital preservation. Can you speak to that relationship and why you think it is important?

Archivists are key players in making preservation happen and the emergence of digital content across all kinds of materials and media that archivists work with means that digital preservation is now a core part of the work that archivists do. 

I organize a lot of my discussion about the craft of digital preservation around archival concepts as opposed to library science or curatorial practices. For example, I talk about arrangement and description. I also draw on ideas like MPLP (More Product, Less Process) as key concepts for work in digital preservation, and on work on community archives. 

Old Files. From XKCD: webcomic of romance, sarcasm, math, and language. 2014

Broadly speaking, in the development of digital media, I see a growing context collapse between formats that had been distinct in the past. That is, conservation of oil paintings, management and preservation of bound volumes, and organizing and managing heterogeneous sets of records have some strong similarities, but there are also a lot of differences. The born-digital incarnations of those works (digital art, digital publishing, and digital records) are all made up of digital information and file formats, and they face a related set of digital preservation issues.

With that noted, I think archival practice tends to be particularly well-suited for dealing with the nature of digital content. Archives have long dealt with the problem of scale that is now intensified by digital data. At the same time, archivists have also long dealt with hybrid collections and complex jumbles of formats, forms, and organizational structures, which is increasingly the case for all of the forms transitioning into born-digital content. 

You emphasize that the technical component of digital preservation is sometimes prioritized over social, ethical, and organizational components. What are the risks implicit in overlooking these other important components?

Digital preservation is not primarily a technical problem. The ideology of “digital” is that things should be faster, cheaper, and automatic. The ideology of “digital” suggests that we should need less labor, less expertise, and fewer resources to make digital stuff happen. If we let this line of thinking infect our idea of digital preservation, we are going to see major losses of important data, major failures to respect the ethical and privacy issues relating to digital content, and a lot of money spent on work that fails to get us the results that we want.

In contrast, when we take as a starting point that digital preservation is about investing resources in building strong organizations and teams, ones that participate in the community of practice and work through the complex interactions that emerge between competing library and archives values, then we have a chance both of being effective and of building great, meaningful jobs for professionals.

If digital preservation work is happening in organizations that have an overly technical view of the problem, it is happening despite, not because of, their organization’s approach. That is, there are people doing the work; they just likely aren’t getting credit and recognition for doing that work. Digital preservation happens because of people who understand that the fundamental nature of the work requires continual efforts to secure enough resources to meaningfully mitigate risks of loss, and thoughtful decision-making about building and curating collections of value to communities.

Considerations related to access and discovery form a central part of the book and you encourage readers to “Start simple and prioritize access,” an approach that reminded me of many similar initiatives focused on getting institutions started with the management and preservation of born-digital archives. Can you speak to this approach and tell us how you see the relationship between preservation and access?

A while back, OCLC ran an initiative called “walk before you run,” focused on working with digital archives and digital content. I know it was a major turning point for helping the field build our practices. Our entire community is learning how to do this work and we do it together. We need to try things and see which things work best and which don’t. 

It’s really important to prioritize access in this work. Preservation is fundamentally about access in the future. The best way to know that something will be accessible in the future is to make it accessible now. Then your users will help you. They can tell you if something isn’t working. The more that we can work end-to-end, that is, accession, process, arrange, describe, and make digital content available to our users, the more we are able to focus on how to continually improve that process end-to-end. Without a full end-to-end process in place, it’s impossible to zoom out, look at the whole sequence of processes, and start figuring out where the bottlenecks are and where you need to focus your optimization work. 


Dr. Trevor Owens is a librarian, researcher, policy maker, and educator working on digital infrastructure for libraries. Owens serves as the first Head of Digital Content Management for Library Services at the Library of Congress. He previously worked as a senior program administrator at the United States Institute of Museum and Library Services (IMLS) and, prior to that, as a Digital Archivist for the National Digital Information Infrastructure and Preservation Program and as a history of science curator at the Library of Congress. Owens is the author of three books, including The Theory and Craft of Digital Preservation and Designing Online Communities: How Designers, Developers, Community Managers, and Software Structure Discourse and Knowledge Production on the Web. His research and writing have been featured in: Curator: The Museum Journal, Digital Humanities Quarterly, The Journal of Digital Humanities, D-Lib, Simulation & Gaming, Science Communication, New Directions in Folklore, and American Libraries. In 2014 the Society of American Archivists granted him the Archival Innovator Award, presented annually to recognize the archivist, repository, or organization that best exemplifies the “ability to think outside the professional norm.”