In the lead-up to the 2020 BitCurator Users Forum, the session I looked forward to the most was titled “GREAT QUESTION!”. This was a returning session from the 2019 BitCurator conference, and was an opportunity for attendees to anonymously ask digital preservation questions they might not be comfortable asking otherwise. The session showed that no question was too “simple” or “basic” to be worth discussing, and that no matter where you are as a practitioner, there’s always more to learn.
Last year’s session was at the very end of the forum, and ended things on a fun note. Attendees could submit questions anonymously at any point in the conference, and moderators presented these questions for discussion until the session ended. Most of the questions turned into discussions about tools, methods, perspectives, and professional philosophies in a way that made these topics accessible and exciting. It was reassuring to see how much we’re still collectively figuring out as a field, and that sometimes digital preservation work is less about best practices, and more about adapting those practices so they work for you and your institution.
This year’s session, like the rest of the forum, took place over Zoom. The format didn’t change much from last year, aside from switching to a virtual session and adding more ways to answer questions. Question submissions went to a queue visible to the moderators, as well as an Airtable board where everyone could see both questions and answers. This allowed attendees to see and respond to other questions while the main conversation addressed questions one at a time. Questions in the response queue were prioritized through progressive stacking, a technique that gives priority to marginalized voices. In this case, there was a box on the question submission form which attendees could check if they were part of a group historically underrepresented or marginalized in digital preservation spaces (e.g. attendees of color, LGBTQ attendees). Submissions with this box checked were discussed first.
Attendees could submit answers anonymously via Airtable, answer verbally on Zoom, or respond in the chat. Further discussion (and chatter) happened both out loud and in the chat. It was fun and conversational, but never chaotic. Question topics ranged from virus scanning and fixity checking, to tool recommendations and workload distribution. There were also questions about advocating for digital preservation, the ethical issues inherent in using law enforcement-derived tools for digital archives work, and handling the emotional toll of doing this work in the current moment. Each question sparked thoughtful, informative, and sometimes funny responses, and the option to submit written answers allowed attendees to keep answering questions after the session ended. The question submission form was left open as well, in case anyone thought of a question once the session was over.
Everyone seemed to get a lot out of the experience, and several people mentioned wanting to do something like it at future conferences, or on a regular basis. It was heartening to see that others had the same questions I did; it really emphasized how much we’re all still learning, and how important it is to have a community of fellow practitioners you can rely on and share ideas with. I liked how casual the session felt; since we used the chat in addition to speaking out loud and answering questions via Airtable, it was easier to expand on a point, talk about what worked, and commiserate about what didn’t. This made it a lot less intimidating to jump into the discussion: no one was staring at you, and you weren’t the only person speaking; you were just chatting with colleagues who had the same kinds of experiences, questions, and problems as you. I’m looking forward to seeing more conference sessions like this in the future, and hope to see similar ones in other venues.
Tori Maches is the Digital Archivist at UC San Diego Library. Her work currently includes developing and implementing born-digital processing workflows in Special Collections & Archives, and managing the Library’s overall web archiving work.
San José is in many ways an apt location for a tech-centered library conference like Code4Lib. It is the largest city in Santa Clara Valley (aka Silicon Valley) and home to San Jose State University, one of the biggest library science programs in the country. Yet the tone of the 14th annual Code4Lib conference, which convened on February 19-22, 2019, was cautious and at times critical of the “big tech” landscape. In her opening keynote, Sarah Roberts, Assistant Professor of Information Studies at UCLA, talked about her research on social media content moderation. She said that while this work is deemed critical by social media companies to manage lewd or disturbing content, it is also emotionally taxing, low-paying, and executed by a mostly invisible global labor force. In keeping this work hidden, consumers are led to believe that social media content is either unmediated, or that content moderation is somehow automated. This call for transparency and openness—in how we manage our code, technologies, content, and even our labor practices—was a recurring theme throughout the conference.
There were a number of archivists and archives-adjacent folks attending the conference and a handful of interesting sessions related to digital archives. In a talk entitled “Natural Language Processing for Discovery of Born-Digital Records,” NCSU Libraries Fellow Emily Higgs discussed her exploration of named entity recognition (NER) to aid in describing digital collections. Using spaCy, an open-source natural language processing library, Higgs extracted personal names to a CSV file, with entities ranked by frequency, and included the top five to ten names in the Scope and Content section of the finding aid. She also tested a discovery tool, Open Semantic Desktop Search, to enable researchers to more easily browse through a digital collection using the reading room computer. She noted that while it offered faceted browsing as well as fuzzy and semantic search capabilities, the major drawback was the long indexing time for larger digital collections.
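As a rough illustration of the frequency-ranking step Higgs described, the sketch below is hypothetical, not her actual code: the spaCy extraction is shown only as a comment (it would assume the `en_core_web_sm` model), so the ranking and CSV export run on placeholder names.

```python
from collections import Counter
import csv

# In practice the names would come from spaCy, e.g.:
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   names = [ent.text for ent in nlp(text).ents if ent.label_ == "PERSON"]
# Placeholder output is used here so the ranking step runs on its own.
names = ["Ada Lovelace", "Grace Hopper", "Ada Lovelace",
         "Alan Turing", "Grace Hopper", "Ada Lovelace"]

# Rank entities by frequency, then keep the most common names
# for the finding aid's Scope and Content note.
ranked = Counter(names).most_common()

# Export the full ranked list to CSV for review.
with open("entities.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "count"])
    writer.writerows(ranked)

top_five = [name for name, _ in ranked[:5]]
```

The appeal of this approach is that the archivist stays in the loop: the CSV is a candidate list for human review, not description that goes straight into the finding aid.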
In the realm of web archiving, Ilya Kreymer of Rhizome presented a demo of Webrecorder, a set of free and open source tools for creating and viewing web archives. Funded by two Mellon Foundation grants, Webrecorder is a browser-based application that focuses on capturing high-fidelity web archives. Unlike traditional web crawlers, Webrecorder takes a curated approach to web archiving—think quality over quantity. In his demo, Kreymer quickly and easily archived audio files from a SoundCloud library as well as the most recent Code4Lib conference hashtag posts on Twitter. One of Webrecorder’s most impressive features is its ability to emulate legacy browsers to record things like flash-based websites. Webrecorder has a lot going for it—it’s free and easy to use, with an attractive and intuitive interface. While Kreymer was quick to point out that they haven’t solved web archiving, it was nonetheless exciting to see a concentrated effort towards refining it.
As a metadata librarian, I am probably a little biased here, but one of the most exciting talks of the conference was given by Dhanushka Samarakoon and Harish Maringanti of the University of Utah’s Marriott Library. Inspired by a story they heard on NPR about PoetiX, a sonnet-writing competition where judges are asked to determine if a sonnet was written by man or machine, Samarakoon and Maringanti began to think about the implications of machine learning for metadata creation. Recognizing that metadata is typically where the bottleneck occurs in digital library workflows, they wanted to explore how machine learning technology might simplify descriptive metadata creation for historical image collections. To do this they created a model using data from ImageNet, a database of over 14 million images designed for use in visual object recognition research, along with over 470 photographs with high-quality human-generated metadata from their own digital library collections. Once this data was introduced into a pre-trained neural network, they ran a collection of photographs through the system to see how well the model worked. It wasn’t perfect—for instance, a photo of a man standing next to a cow was described as “Mary Jane standing by a cow,” apparently due to the many people identified as “Mary Jane” in the original digital library dataset. However, it was exciting to see the possibilities of AI in image analysis and the implications this might have for future metadata automation.
At one point during the conference someone took a quick visual poll of how many first-time attendees were in the audience. There were a lot of us. But there were also a lot of Code4Lib veterans. During a lightning talk about the origin of the conference, Karen Coombs, Ryan Wick, and Roy Tennant recalled wanting to create a conference with a “no spectators” motto—where attendees had ample opportunities to engage, participate, and have their voices heard. Unlike most other library conferences, Code4Lib doesn’t have competing programming. Everyone gathers in one large room and attends the same talks and sessions. It was this model of inclusivity, equality, and innovation that I found most appealing about Code4Lib, and will no doubt draw me back in coming years.
For more information about the conference, including streaming video and slides, visit the Code4Lib 2019 website.
Nicole Shibata is the Metadata Librarian at California State University, Northridge.
PASIG 2019 met the week of February 11th at El Colegio de México (commonly known as Colmex) in Mexico City. PASIG stands for Preservation and Archiving Special Interest Group, and the group’s meeting brings together an international group of practitioners, industry experts, vendors, and researchers to discuss practical digital preservation topics and approaches. This meeting was particularly special because it was the first time the group convened in Latin America (past meetings have generally been held in Europe and the United States). Excellent real-time bilingual translation for presentations given in both English and Spanish enabled conversations across geographical and linguistic boundaries and made room to center Latin American preservationists’ perspectives and transformative post-custodial archival practice.
The conference began with broad overviews of digital preservation topics and tools to create a common starting ground, followed by more focused deep-dives on subsequent days. I saw two major themes emerge over the course of the week. The first was the importance of people over technology in digital preservation. From David Minor’s introductory session to Isabel Galina Russell’s overview of the digital preservation landscape in Mexico, presenters continuously surfaced examples of the “people side” of digital preservation (think: preservation policies, appraisal strategies, human labor and decision-making, keeping momentum for programs, communicating to stakeholders, ethical partnerships). One point that struck me during the community archives session was Verónica Reyes-Escudero’s discussion of “cultural competency as a tool for front-end digital preservation.” By conceptualizing interpersonal skills as a technology for facilitating digital preservation, we gain a broader and more ethically grounded idea of what it is we are really trying to do by preserving bits in the first place. Software and hardware are part of the picture, but they are certainly not the whole view.
The second major theme was that digital preservation is best done together. Distributed digital preservation platforms, consortial preservation models, and collaborative research networks were well-represented by speakers from LOCKSS, Texas Digital Library (TDL), Duraspace, Open Preservation Foundation, Software Preservation Network, and others. The takeaway from these sessions was that the sheer resource-intensiveness of digital preservation means that institutions, both large and small, are going to have to collaborate in order to achieve their goals. PASIG seemed to be a place where attendees could foster and strengthen these collective efforts. Throughout the conference, presenters also highlighted failures of collaborative projects and the need for sustainable financial and governance models, particularly in light of recent developments at the Digital Preservation Network (DPN) and Digital Public Library of America (DPLA). I was particularly impressed by Mary Molinaro’s honest and informative discussion about the factors that led to the shuttering of DPN. Molinaro indicated that DPN would soon be publishing a final report in order to transparently share their model, flaws and all, with the broader community.
Touching on both of these themes, Carlos Martínez Suárez of Video Trópico Sur gave a moving keynote about his collaboration with Natalie M. Baur, Preservation Librarian at Colmex, to digitize and preserve video recordings he made while living with indigenous groups in the Mexican state of Chiapas. The question and answer portion of this session highlighted some of the ethical issues surrounding rights and consent when providing access to intimate documentation of people’s lives. While Colmex is not yet focusing on access to this collection, it was informative to hear Baur and others talk a bit about the ongoing technical, legal, and ethical challenges of a work-in-progress collaboration.
Presenters also provided some awesome practical tools for attendees to take home with them. One of the many great open resources session leaders shared was Frances Harrell (NEDCC) and Alexandra Chassanoff (Educopia)’s DigiPET: A Community Built Guide for Digital Preservation Education + Training Google document, a living resource for compiling educational tools that you can add to using this form. Julian Morley also shared a Preservation Storage Cost Model Google sheet that contains a template with a wealth of information about estimating the cost of different digital preservation storage models, including comparisons for several cloud providers. Amy Rudersdorf (AVP), Ben Fino-Radin (Small Data Industries), and Frances Harrell (NEDCC) also discussed helpful frameworks for conducting self-assessments.
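Morley’s spreadsheet is the resource to consult for real figures, but the arithmetic behind this kind of cost model can be sketched in a few lines. Everything below is a hypothetical illustration: the function name and all rates are placeholders, not actual provider pricing.

```python
def storage_cost(size_gb, rate_per_gb_month, copies, years,
                 egress_gb=0, egress_rate=0.0):
    """Rough total cost of keeping `copies` replicas of `size_gb`
    of data for `years`, plus one-time retrieval (egress) charges.
    All rates here are placeholders, not real provider pricing."""
    monthly = size_gb * rate_per_gb_month * copies
    return monthly * 12 * years + egress_gb * egress_rate

# Hypothetical comparison: a "hot" tier vs. a "cold" tier for 5 TB,
# three copies, over ten years (rates are illustrative only).
hot = storage_cost(5000, 0.023, copies=3, years=10)
cold = storage_cost(5000, 0.004, copies=3, years=10,
                    egress_gb=5000, egress_rate=0.09)
```

Comparing tiers this way makes the trade-off visible: cold storage is far cheaper to hold over time, but retrieval adds egress charges that a cost model should account for up front.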
PASIG closed out by spending some time on the challenges involved with preserving emerging and complex formats. On the last afternoon of sessions, Amelia Acker (University of Texas at Austin) spoke about the importance of preserving APIs, terms of service, and other “born-networked” formats when archiving social media. She was followed by a panel of software preservationists who discussed different use cases for preserving binaries, source code, and other software artifacts.
Thanks to the wonderful work of the PASIG 2019 steering, program, and local arrangements committees!
Kelly Bolding is the Project Archivist for Americana Manuscript Collections at Princeton University Library, as well as the team leader for bloggERS! She is interested in developing workflows for processing born-digital and audiovisual materials and making archival description more accurate, ethical, and inclusive.
by Richard Marciano, Victoria Lemieux, and Mark Hedges
The 3rd workshop on Computational Archival Science (CAS) was held on December 12, 2018, in Seattle, following two earlier CAS workshops in 2016 in Washington DC and in 2017 in Boston. It also built on three earlier workshops on ‘Big Humanities Data’ organized by the same chairs at the 2013-2015 conferences, and more directly on a symposium held in April 2016 at the University of Maryland. The current working definition of CAS is:
A transdisciplinary field that integrates computational and archival theories, methods and resources, both to support the creation and preservation of reliable and authentic records/archives and to address large-scale records/archives processing, analysis, storage, and access, with aim of improving efficiency, productivity and precision, in support of recordkeeping, appraisal, arrangement and description, preservation and access decisions, and engaging and undertaking research with archival material.
The workshop featured five sessions and thirteen papers with international presenters and authors from the US, Canada, Germany, the Netherlands, the UK, Bulgaria, South Africa, and Portugal. All details (photos, abstracts, slides, and papers) are available at: http://dcicblog.umd.edu/cas/ieee-big-data-2018-3rd-cas-workshop/. The keynote focused on using digital archives to preserve the history of WWII Japanese-American incarceration and featured Geoff Froh, Deputy Director at Densho.org in Seattle.
This workshop explored the conjunction (and its consequences) of emerging methods and technologies around big data with archival practice and new forms of analysis and historical, social, scientific, and cultural research engagement with archives. The aim was to identify and evaluate current trends, requirements, and potential in these areas, to examine the new questions that they can provoke, and to help determine possible research agendas for the evolution of computational archival science in the coming years. At the same time, we addressed the questions and concerns scholarship is raising about the interpretation of ‘big data’ and the uses to which it is put, in particular appraising the challenges of producing quality – meaning, knowledge and value – from quantity, tracing data and analytic provenance across complex ‘big data’ platforms and knowledge production ecosystems, and addressing data privacy issues.
Computational Thinking and Computational Archival Science
#1: Introducing Computational Thinking into Archival Science Education [William Underwood et al.]
#2: Automating the Detection of Personally Identifiable Information (PII) in Japanese-American WWII Incarceration Camp Records [Richard Marciano et al.]
#3: Computational Archival Practice: Towards a Theory for Archival Engineering [Kenneth Thibodeau]
#4: Stirring The Cauldron: Redefining Computational Archival Science (CAS) for The Big Data Domain [Nathaniel Payne]
Machine Learning in Support of Archival Functions
#5: Protecting Privacy in the Archives: Supervised Machine Learning and Born-Digital Records [Tim Hutchinson]
#6: Computer-Assisted Appraisal and Selection of Archival Materials [Cal Lee]
Metadata and Enterprise Architecture
#7: Measuring Completeness as Metadata Quality Metric in Europeana [Péter Király et al.]
#8: In-place Synchronisation of Hierarchical Archival Descriptions [Mike Bryant et al.]
#9: The Utility Enterprise Architecture for Records Professionals [Shadrack Katuu]
#10: Framing the scope of the common data model for machine-actionable Data Management Plans [João Cardoso et al.]
#11: The Blockchain Litmus Test [Tyler Smith]
Social and Cultural Institution Archives
#12: A Case Study in Creating Transparency in Using Cultural Big Data: The Legacy of Slavery Project [Ryan Cox, Sohan Shah et al.]
#13: Jupyter Notebooks for Generous Archive Interfaces [Mari Wigham et al.]
Finally, we are planning a 4th CAS Workshop in December 2019 at the 2019 IEEE International Conference on Big Data (IEEE BigData 2019) in Los Angeles, CA. Stay tuned for an upcoming CAS#4 workshop call for proposals, where we would welcome SAA member contributions!
Richard Marciano is a professor at the University of Maryland iSchool where he directs the Digital Curation Innovation Center (DCIC). He previously conducted research at the San Diego Supercomputer Center at the University of California San Diego for over a decade. His research interests center on digital preservation, sustainable archives, cyberinfrastructure, and big data. He is also the 2017 recipient of the Emmett Leahy Award for achievements in records and information management. Marciano holds degrees in Avionics and Electrical Engineering, as well as a Master’s and Ph.D. in Computer Science from the University of Iowa. In addition, he conducted postdoctoral research in Computational Geography.
Victoria Lemieux is an associate professor of archival science at the iSchool and lead of the Blockchain research cluster, Blockchain@UBC at the University of British Columbia – Canada’s largest and most diverse research cluster devoted to blockchain technology. Her current research is focused on risk to the availability of trustworthy records, in particular in blockchain record keeping systems, and how these risks impact upon transparency, financial stability, public accountability and human rights. She has organized two summer institutes for Blockchain@UBC to provide training in blockchain and distributed ledgers, and her next summer institute is scheduled for May 27-June 7, 2019. She has received many awards for her professional work and research, including the 2015 Emmett Leahy Award for outstanding contributions to the field of records management, a 2015 World Bank Big Data Innovation Award, a 2016 Emerald Literati Award and a 2018 Britt Literary Award for her research on blockchain technology. She is also a faculty associate at multiple units within UBC, including the Peter Wall Institute for Advanced Studies, Sauder School of Business, and the Institute for Computers, Information and Cognitive Systems.
Mark Hedges is a Senior Lecturer in the Department of Digital Humanities at King’s College London, where he teaches on the MA in Digital Asset and Media Management, and is also Departmental Research Lead. His original academic background was in mathematics and philosophy, and he gained a PhD in mathematics at University College London before embarking on a 17-year career in the software industry; he joined King’s in 2005. His research is concerned primarily with digital archives, research infrastructures, and computational methods, and he has led a range of projects in these areas over the last decade. Most recently he has been working in Rwanda on initiatives relating to digital archives and the transformative impact of digital technologies.
Where: Metropolitan New York Library Council (METRO), New York, NY
Stephen Klein, Digital Services Librarian at the CUNY Graduate Center (CUNY)
Ashley Blewer, AV Preservation Specialist at Artefactual
Kelly Stewart, Digital Preservation Services Manager at Artefactual
On December 3, 2018, the Metropolitan New York Library Council (METRO)’s Digital Preservation Interest Group hosted an informative (and impeccably titled) presentation about how the CUNY Graduate Center (GC) plans to incorporate Archivematica, a web-based, open-source digital asset management software (DAMs) developed by Artefactual, into its document management strategy for student dissertations. Speakers included Stephen Klein, Digital Services Librarian at the CUNY Graduate Center (GC); Ashley Blewer, AV Preservation Specialist at Artefactual; and Kelly Stewart, Digital Preservation Services Manager at Artefactual. The presentation began with an overview from Stephen about the GC’s needs and why they chose Archivematica as a DAMs, followed by an introduction to and demo of Archivematica and Duracloud, an open-source cloud storage service, led by Ashley and Kelly (who was presenting via video-conference call). While this post provides a general summary of the presentation, I would recommend reaching out to any of the presenters for more detailed information about their work. They were all great!
Every year the GC Library receives between 400 and 500 dissertations, theses, and capstones. These submissions can include a wide variety of digital materials, from PDF, video, and audio files, to websites and software. Preservation of these materials is essential if the GC is to provide access to emerging scholarship and retain a record of students’ work towards their degrees. Prior to implementing a DAMs, however, the GC’s strategy for managing digital files of student work was focused primarily on access, not preservation. Access copies of student work were available on CUNY Academic Works, a site that uses Bepress Digital Commons as a CMS. Missing from the workflow, however, was the creation, storage, and management of archival originals. As Stephen explained, if the Open Archival Information System (OAIS) model is a guide for a proper digital preservation workflow, the GC was without the middle, Archival Information Package (AIP), portion of it. Some of the qualities that the GC liked about Archivematica were that it was open-source and highly customizable, came with strong customer support from Artefactual, and had an API that could integrate with tools already in use at the library. GC Library staff hope that Archivematica can eventually integrate with both the library’s electronic submission system (Vireo) and CUNY Academic Works, making the submission, preservation, and access of digital dissertations a much more streamlined, automated, and OAIS-compliant process.
Next, Ashley and Kelly introduced and demoed Archivematica and Duracloud. I was very pleased to see several features of the Archivematica software that were made intentionally intuitive. The design of the interface is very clean and easily customizable to fit different workflows. Also, each AIP that is processed includes a plain-text, human-readable file which serves as extra documentation explaining what Archivematica did to each file. Artefactual recommends pairing Archivematica with Duracloud, although users can choose to integrate the software with local storage or with other cloud services like those offered by Google or Amazon. One of the features I found really interesting about Duracloud is that it comes with various data visualization graphs that show the user how much storage is available and what materials are taking up the most space.
I close by referencing something Ashley wrote in her recent bloggERS post (conveniently she also contributed to this event). She makes an excellent point about how different skill-sets are needed to do digital preservation, from the developers that create the tools that automate digital archival processes to the archivists that advocate for and implement said tools at their institutions. I think this talk was successful precisely because it included the practitioner and vendor perspectives, as well as the unique expertise that comes with each role. Both are needed if we are to meet the challenges and tap into the potential that digital archives present. I hope to see more of these “meetings of the minds” in the future.
Regina Carra is the Archive Project Metadata and Cataloging Coordinator at Mark Morris Dance Group. She is a recent graduate of the Dual Degree MLS/MA program in Library Science and History at Queens College – CUNY.
In December 2017, the IEEE Big Data conference came to Boston, and with it came the second annual computational archival science workshop! Workshop participants were generous enough to come share their work with the local library and archives community during a one-day public unconference held at the Harvard Law School. After some sessions from Harvard librarians that touched on how they use computational methods to explore archival collections, the unconference continued with lightning talks from CAS workshop participants and discussions about what participants need to learn to engage with computational archival science in the future.
So, what is computational archival science? It is defined by CAS scholars as:
“An interdisciplinary field concerned with the application of computational methods and resources to large-scale records/archives processing, analysis, storage, long-term preservation, and access, with aim of improving efficiency, productivity and precision in support of appraisal, arrangement and description, preservation and access decisions, and engaging and undertaking research with archival material.”
Lightning round (and they really did strike like a dozen 90-second bolts of lightning, I promise!) talks from CAS workshop participants ranged from computational curation of digitized records to blockchain to topic modeling for born-digital collections. Following a voting session, participants broke into two rounds of large group discussions to dig deeper into lightning round topics. These discussions considered natural language processing, computational curation of cultural heritage archives, blockchain, and computational finding aids. Slides from lightning round presenters and community notes can be found on the CAS Unconference website.
What did we learn? (What questions do we have now?)
Beyond learning a bit about specific projects that leverage computational methods to explore archival material, we discussed some of the challenges that archivists may bump up against when they want to engage with this work. More questions were raised than answered, but the questions can help us build a solid foundation for future study.
First, and for some of us in attendance perhaps the most important point, is the need to familiarize ourselves with computational methods. Do we have the specific technical knowledge to understand what it really means to say we want to use topic modeling to describe digital records? If not, how can we build our skills with community support? Are our electronic records suitable for computational processes? How might these issues change the way we need to conceptualize or approach appraisal, processing, and access to electronic records?
Many conversations repeatedly turned to bias, privacy, and ethics. How do our biases shape the tools we build and use? What skills do we need to develop in order to recognize and dismantle biases in technology?
What do we need?
The unconference was intended to provide a space to bring more voices into conversations about computational methods in archives and, more specifically, to connect those currently engaged in CAS with other library and archives practitioners. At the end of the day, we worked together to compile a list of things that we felt many of us would need to learn in order to engage with CAS.
These needs include lists of methodologies and existing tools, canonical data and/or open datasets to use in testing such tools, a robust community of practice, postmortem analysis of current/existing projects, and much more. Building a community of practice and skill development for folks without strong programming skills was identified as both particularly important and especially challenging.
The Harvard CAS unconference was planned and administered by Ceilyn Boyd, Jane Kelly, and Jessica Farrell of Harvard Library, with help from Richard Marciano and Bill Underwood from the Digital Curation Innovation Center (DCIC) at the University of Maryland’s iSchool. Many thanks to all the organizers, presenters, and participants!
Jane Kelly is the Historical & Special Collections Assistant at the Harvard Law School Library. She will complete her MSLIS from the iSchool at the University of Illinois, Urbana-Champaign in December 2018.
The 2017 DLF Forum and NDSA’s Digital Preservation took place this October in Pittsburgh, Pennsylvania. Each year the DLF Forum brings together a variety of digital library practitioners, including librarians, archivists, museum professionals, metadata wranglers, technologists, digital humanists, and scholars in support of the Digital Library Federation’s mission to “advance research, learning, social justice, and the public good through the creative design and wise application of digital library technologies.” The National Digital Stewardship Alliance follows up the three-day main forum with Digital Preservation (DigiPres), a day-long conference dedicated to the “long-term preservation and stewardship of digital information and cultural heritage.” While there were a plethora of takeaways from this year’s events for the digital archivist community, for the sake of brevity, this recap will focus on a few broad themes, followed by some highlights related to electronic records specifically.
As an early-career archivist and a first-time DLF/DigiPres attendee, I was impressed by the DLF community’s focus on inclusion and social justice. While technology was central to all aspects of the conference, the sessions centered the social and ethical aspects of digital tools in a way that I found both refreshing and productive. (The theme for this year’s DigiPres was, in fact, “Preservation is Political.”) Rasheedah Phillips, a Philadelphia-based public interest attorney, activist, artist, and science fiction writer, opened the forum with a powerful keynote about the Community Futures Lab, a space she co-founded and designed around principles of Afrofuturism and Black Quantum Futurism. By presenting an alternate model of archiving deeply grounded in the communities affected, Phillips’s talk and Q&A responses brought to light an important critique of the restrictive nature of archival repositories. I left Phillips’s talk thinking about how we might allow the liberatory “futures” she envisions to shape how we design online spaces for engaging with born-digital archival materials, as opposed to modeling these virtual spaces after the physical reading rooms that have alienated many of our potential users.
Other conference sessions echoed Phillips’s challenge to archivists to better engage and center the communities they document, especially those who have been historically marginalized. Ricky Punzalan noted in his talk on access to dispersed ethnographic photographs that collaboration with documented communities should now be a baseline expectation for all digital projects. Rosalie Lack and T-Kay Sangwand spoke about UCLA’s post-custodial approach to ethically developing digital collections across international borders using a collaborative partnership framework. Martha Tenney discussed concrete steps taken by archivists at Barnard College to respect the digital and emotional labor of students whose materials the archives is collecting to fill in gaps in the historical record.
Eira Tansey, Digital Archivist and Records Manager at the University of Cincinnati and organizer for Project ARCC, gave her DigiPres keynote about how our profession can develop an ethic of environmental justice. Weaving stories about the environmental history of Pittsburgh throughout her talk, Tansey called for archivists to commit firmly to ensuring the preservation and usability of environmental information. Related themes of transparency and accountability in the context of preserving and providing access to government and civic data (which is now largely born-digital) were also present throughout the conference sessions. Regarding advocacy and awareness initiatives, Rachel Mattson and Brandon Locke spoke about Endangered Data Week, and several sessions discussed the PEGI Project. Others presented on the challenges of preserving born-digital civic and government information, including how federal institutions and smaller universities are tackling digital preservation given their often limited budgets, as well as how repositories are acquiring and preserving born-digital congressional records.
Collaborative workflow development for born-digital processing was another theme that emerged in a variety of sessions. Annalise Berdini, Charlie Macquarie, Shira Peltzman, and Kate Tasker, all digital archivists representing different University of California campuses, spoke about their process of coming together to create a standardized set of UC-wide guidelines for describing born-digital materials. Representatives from the OSSArcFlow project also presented initial findings from their research into how repositories are integrating open source tools, including BitCurator, Archivematica, and ArchivesSpace, within their born-digital workflows. They reported on concerns about the scalability of various tools and standards, as well as desires to transition from siloed workflows to a more holistic approach and to reduce the time spent transforming the output of one tool to be compatible with another tool in the workflow. Elena Colón-Marrero of the Computer History Museum’s Center for Software History provided a thorough rundown of building a software preservation workflow from the ground up: from inventorying software and establishing a controlled vocabulary for media formats to building a set of digital processing workstations, developing imaging workflows for different media formats, and eventually testing everything out on a case study collection (and she kindly placed her whole talk online!).
Also during the forum, the DLF Born-Digital Access Group met over lunch for an introduction and discussion. The meeting was well attended, and the conversation was lively as members shared their current born-digital access solutions, both pretty and not so pretty (but never perfect); their wildest hopes and dreams for future access models; and their ideas for upcoming projects the group could tackle together. While technical challenges certainly figured into the discussion about impediments to providing better born-digital access, many of the problems participants reported had to do with their institutions being unwilling to take on perceived legal risks. The main action item to come out of the meeting is that the group plans to take steps to expand NDSA’s Levels of Preservation framework to include Levels of Access, along with corresponding tiers of rights issues. The goal would be to help archivists assess the state of existing born-digital access models at their institutions, as well as give them tools to advocate for more robust, user-friendly, and accessible models moving forward.
For additional reports on the conference, reflections from several DLF fellows are available on the DLF blog. In addition to the sessions I mentioned, there are plenty more gems to be found in the openly available community notes (DLF, DigiPres) and OSF Repository of slides (DLF, DigiPres), as well as in the community notes for the Liberal Arts Colleges/HBCU Library Alliance unconference that preceded DLF.
Kelly Bolding is a processing archivist for the Manuscripts Division at Princeton University Library, where she is responsible for the arrangement and description of early American history collections and has been involved in the development of born-digital processing workflows. She holds an MLIS from Rutgers University and a BA in English Literature from Reed College.