Preserve This Podcast!

by Molly Schwartz

Mary Kidd (MLIS ’14) and Dana Gerber-Margie (MLS ’13) first met at a Radio Preservation Task Force meeting in 2016. They bonded over experiences of conference fatigue, but quickly moved on to topics near and dear to both of their hearts: podcasts and audio archiving. Dana has been a long-time podcast super-listener: she subscribes to over 1,400 podcasts and regularly listens to 40-50 of them. While earning her MLS, she launched a podcast recommendation newsletter called “The Audio Signal,” which has grown into a popular podcast publication called The Bello Collective. Mary was a National Digital Stewardship Resident at WNYC, where she created a born-digital preservation strategy for their archives. She had worked on analog archives projects in the past — scanning and transferring collections of tapes — but she has embraced the madness and importance of preserving born-digital audio. Mary and Dana stayed in touch and continued to brainstorm ideas, which blossomed into a workshop about podcast preservation that they taught at the Personal Digital Archives conference at Stanford in 2017, along with Anne Wootton (co-founder of Popup Archive, now at Apple Podcasts).

Then Mary and I connected at the National Digital Stewardship Residency symposium in Washington, DC in 2017. I got my MLS back in 2013, but since then I’ve been working more at the intersection of media, storytelling, and archives. I had started a podcast and was really interested, for selfish reasons, in learning the most up-to-date best practices for born-digital audio preservation. I marched straight up to Mary and said something like, “Hey, let’s work together on an audio preservation project.” Mary set up a three-way Skype call with Dana on the line, and pretty soon we were talking about podcasts. How we love them. How they are at risk because most podcasters host their files on commercial third-party platforms. And how we would love to do a massive outreach and education program where we teach podcasters that their digital files are at risk and give them techniques for preserving them. We wrote these ideas into a grant proposal, with a few numbers and a budget attached, and the Andrew W. Mellon Foundation gave us $142,000 to make it happen. We started working on this grant project, called “Preserve This Podcast,” back in February 2018. We’ve been able to hire people who are just as excited about the idea to help us make it happen, like Sarah Nguyen, a current MLIS student at the University of Washington and our amazing Project Coordinator.

Behaviors chart from the Preserve This Podcast! survey.

One moral of this story is that digital archives conferences really can bring people together and inspire them to advance the field. The other moral is that, after months of consulting audio preservation experts, interviewing podcasters, getting 556 podcasters to take a survey, and reading about the history of podcasting, we can confirm that podcasts are disappearing, and that podcast producers are not adequately equipped to preserve their work against the many forces working against the long-term endurance of digital information and the devices that render it. There is more information about the project on our website (preservethispodcast.org) and in the report about the survey findings. Please reach out to mschwartz@metro.org or snguyen@metro.org if you have any thoughts or ideas.


Molly Schwartz is the Studio Manager at the Metropolitan New York Library Council (METRO). She is the host and producer of two podcasts about libraries and archives — Library Bytegeist and Preserve This Podcast. Molly did a Fulbright grant at the Aalto University Media Lab in Helsinki, was part of the inaugural cohort of National Digital Stewardship Residents in Washington, D.C., and worked at the U.S. State Department as a data analyst. She holds an MLS with a specialization in Archives, Records and Information Management from the University of Maryland at College Park and a BA/MA in History from the Johns Hopkins University.


Students Reflect (Part 1 of 2): Tech Skills In and Out of the Classroom

By London Stever, Hayley Wilson, and Adriana Cásarez

This is the third post in the bloggERS Making Tech Skills a Strategic Priority series.

As part of our “Making Tech Skills a Strategic Priority” series, the bloggERS team asked five current and recent MLIS/MSIS students to reflect on how they have learned the technology skills necessary to tackle their careers after school. One major theme, as expressed by these three writers, is the need for a balance of learning inside and outside the classroom.

London Stever, 2018 graduate, University of Pittsburgh

Approaching the six-month anniversary of my MLIS graduation, I find myself reflecting on my technological growth. Going into graduate school, I expected little technology training. Naively, I believed that most archival jobs were paper-only, except for occasional digitization projects. Imagine my surprise upon finding out that the University of Pittsburgh required an introduction to HTML. This trend continued, as the university insisted that students develop a balanced knowledge of both archival practice and technology.

I took technology-focused courses ranging from a history of computers (useful for those expecting to work with older hardware) to an overview of open-source library repositories and learning management systems (not to be discounted by those going into academia). The most useful of these classes was the required digital humanities course. Since graduating, I have applied the practical introduction to ArchivesSpace and Archivematica – and the in-depth explanation of discoverability, access, and web crawling – to my current work at SAE International.

However, none of the information I learned in those classes would be helpful on its own. University did not prepare me for talking to the IT Department. Terminology used in archives and in IT often overlaps, but usage does not. Custom, in-house programs require troubleshooting, and university technology classes did not teach me those skills. Libraries and archives often need to work with software not specially designed for them, but the university did not address this.

Self-taught classes, YouTube videos, and outside certifications were the most useful technology education for me. Using these, I customized my education to meet both the needs employers mention and my own learning needs, which center on the practical application I did not get in university. I now understand troubleshooting, which allows me to use programs built fifteen years ago. Creating a blog or using a content services platform to increase discoverability and internal access is a breeze. In addition to the university’s balance of digital and analog education, I also needed a balance of library-specific and general technology education.

Hayley Wilson, current student, University of North Carolina at Chapel Hill

When registering for classes at UNC Chapel Hill prior to the Fall semester of 2017, I was informed that I was required to fulfill a technology competency requirement. I had the option either to take an at-home test or to take a technology course (for no credit). I decided to take the technology course because I assumed it would be beneficial to other classes I would be required to take as an MLS student.

As it turns out, as a library science student on the archives and records management track, I had a very strict set of courses I was required to take, with room for only two electives. None of these required courses were focused on technology or building technology skills. I have friends on the Information Science side of the program who are required to take numerous courses that have a strong focus on technology. Fortunately, while at SILS I have had numerous opportunities outside of the classroom to learn and build my technology skills through my various internships and graduate assistant positions. However, I don’t think that every student has the opportunity to do so in their jobs.

Adriana Cásarez, 2018 graduate, University of Texas at Austin

Entering my MSIS program with an interest in digital humanities, I expected my coursework would provide most of the expertise I needed to become a more tech-savvy researcher. Indeed, a survey course in digital humanities gave me an overview of digital tools and methodologies. Additionally, a more intense programming course for cultural data analysis taught me specialized coding for data analysis, machine learning, and data visualization. The programming was challenging and the command line was daunting, but I was fortunate to build a network of motivated peers who also wanted to develop their technical aptitude.

Sometimes, I felt I was learning just as many technical skills outside of my general coursework. The university library offered workshops on digital scholarship tools for the academic community, and by attending as many as I could, I grew my technical skills and knowledge of trends in topics like text analysis, data curation, and metadata. The Digital Scholarship Librarian and I also organized co-working sessions for students working on digital scholarship projects. These sessions created a community of practice where those interested in developing their technical aptitude could share expertise, feedback, and support in a productive space. We discussed the successes and frustrations of our projects and of the technology that we were often independently teaching ourselves to use. These community meetups were invaluable avenues for learning from each other and further developing our technical capabilities.

With increased focus on digital archives, libraries, and scholarship, students often feel expected to just know technical skills or to teach themselves independently. My experience in my MSIS program taught me that others are often in the same boat, experiencing similar frustrations but too embarrassed to ask for help or admit ignorance. Communities of practice are essential to creating an environment where students feel comfortable discussing obstacles and developing technical skills together.


London Stever is an archival consultant at SAE International, where she balances company culture with international and industry standards, including bridging the gap between IT and discovery partners. London graduated from the University of Pittsburgh’s MLIS – Archives program and is currently working on her CompTIA certifications. She values self-education and believes multilingualism and technological literacy are the keys to archival accessibility. Please email london.stever@outlook.com or go to londonstever.com to contact London.


Hayley Wilson is originally from San Diego but moved to New York to attend New York University. She graduated from NYU with a BA in Art History and stayed in NYC to work for a couple of years before moving abroad to work. She then moved to North Carolina for graduate school and will be graduating in May with her master’s degree in Library Science with a concentration in Archives and Records Management.

Adriana Cásarez is a recent MSIS graduate from the University of Texas at Austin. She has worked as a research assistant on a digital classics project for the Quantitative Criticism Lab. She also developed a digital collection of artistic depictions of the Aeneid using cultural heritage APIs. She aspires to work in digital scholarship and advocate for diversity and inclusivity in libraries.

More skills, less pain with Library Carpentry

By Jeffrey C. Oliver, Ph.D.

This is the second post in the bloggERS Making Tech Skills a Strategic Priority series.

Remember that scene in The Matrix where Neo wakes and says “I know kung fu”? Library Carpentry is like that. Almost. Do you need to search lots of files for pieces of text and tire of using Ctrl-F? In the UNIX shell lesson you’ll learn to automate tasks and rapidly extract data from files. Are you managing datasets with not-quite-standardized data fields and formats? In the OpenRefine lesson you’ll easily wrangle data into standard formats for easier processing and de-duplication. There are also Library Carpentry lessons for Python (a popular scripting programming language), Git (a powerful version control system), SQL (a commonly used relational database interface), and many more.
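
To give a flavor of what that shell lesson replaces Ctrl-F with, here is a minimal sketch, written in Python rather than the UNIX shell, of searching a whole directory of files for a piece of text in one pass. This is our own illustration, not a Carpentries lesson excerpt, and the directory name and search term are made up.

    from pathlib import Path

    def search_files(directory, term):
        # Print every matching line across all .txt files under `directory`,
        # roughly what `grep -rn` does in the UNIX shell lesson.
        for path in sorted(Path(directory).rglob("*.txt")):
            text = path.read_text(errors="ignore")
            for number, line in enumerate(text.splitlines(), start=1):
                if term in line:
                    print(f"{path}:{number}: {line.strip()}")

    # Hypothetical example: find every mention of "accession" in a folder of finding aids.
    search_files("finding_aids", "accession")

Once a task like this is scripted, running it over ten files or ten thousand takes the same effort.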

But let me back up a bit.

Library Carpentry is part of the Carpentries, an organization designed to provide training to scientists, researchers, and information professionals on the computational skills necessary for work in this age of big data.

The goals of Library Carpentry align with this series’ initial call for contributions, providing resources for those in data- or information-related fields to work “more with a shovel than with a tweezers.” Library Carpentry workshops are primarily hands-on experiences with tools to make work more efficient and less prone to mistakes when performing repeated tasks.

One of the greatest parts about a Library Carpentry workshop is that it begins at the beginning. That is, the first lesson is an Introduction to Data, a structured discussion and exercise session that breaks down jargon (“What is a version control system?”) and sets down some best practices (naming things is hard).

Not only are the lessons designed for those working in library and information professions, but they’re also designed by “in the trenches” folks who are dealing with these data and information challenges daily. As part of the Mozilla Global Sprint, Library Carpentry ran a two-day hackathon in May 2018 where lessons were developed, revised, remixed, and made pretty darn shiny by contributors at ten different sites. For some, the hackathon itself was an opportunity to learn how to use GitHub as a collaboration tool.

Furthermore, Library Carpentry workshops are led by librarians, like the most recent workshop at the University of Arizona, where lessons were taught by our Digital Scholarship Librarian, our Geospatial Specialist, our Liaison Librarian to Anthropology (among other domains), and our Research Data Management Specialist.

Now, a Library Carpentry workshop won’t make you an expert in Python or the UNIX command line in two days. Even Neo had to practice his kung fu a bit. But workshops are designed to be inclusive and accessible, myth-busting, and – I’ll say it – fun. Don’t take my word for it; here’s a sampling of comments from our most recent workshop:

  • Loved the hands-on practice on regular expressions
  • Really great lesson – I liked the challenging exercises, they were fun! It made SQL feel fun instead of scary
  • Feels very powerful to be able to navigate files this way, quickly & in bulk.

So regardless of how you work with data, Library Carpentry has something to offer. If you’d like to host a Library Carpentry workshop, you can use our request a workshop form. You can also connect to Library Carpentry through social media, the web, or good old-fashioned e-mail. And since you’re probably working with data already, you have something to offer Library Carpentry. This whole endeavor runs on the multi-faceted contributions of the community, so join us: we have cookies. And APIs. And a web scraping lesson. The terrible puns are just a bonus.

IEEE Big Data 2018: 3rd Computational Archival Science (CAS) Workshop Recap

by Richard Marciano, Victoria Lemieux, and Mark Hedges

Introduction

The 3rd workshop on Computational Archival Science (CAS) was held on December 12, 2018, in Seattle, following two earlier CAS workshops in 2016 in Washington DC and in 2017 in Boston. It also built on three earlier workshops on ‘Big Humanities Data’ organized by the same chairs at the 2013-2015 conferences, and more directly on a symposium held in April 2016 at the University of Maryland. The current working definition of CAS is:

A transdisciplinary field that integrates computational and archival theories, methods and resources, both to support the creation and preservation of reliable and authentic records/archives and to address large-scale records/archives processing, analysis, storage, and access, with the aim of improving efficiency, productivity and precision, in support of recordkeeping, appraisal, arrangement and description, preservation and access decisions, and engaging and undertaking research with archival material [1].

The workshop featured five sessions and thirteen papers with international presenters and authors from the US, Canada, Germany, the Netherlands, the UK, Bulgaria, South Africa, and Portugal. All details (photos, abstracts, slides, and papers) are available at: http://dcicblog.umd.edu/cas/ieee-big-data-2018-3rd-cas-workshop/. The keynote focused on using digital archives to preserve the history of WWII Japanese-American incarceration and featured Geoff Froh, Deputy Director at Densho.org in Seattle.

Keynote speaker Geoff Froh, Deputy Director at Densho.org in Seattle presenting on “Reclaiming our Story: Using Digital Archives to Preserve the History of WWII Japanese American Incarceration.”

This workshop explored the conjunction (and its consequences) of emerging methods and technologies around big data with archival practice and new forms of analysis and historical, social, scientific, and cultural research engagement with archives. The aim was to identify and evaluate current trends, requirements, and potential in these areas, to examine the new questions that they can provoke, and to help determine possible research agendas for the evolution of computational archival science in the coming years. At the same time, we addressed the questions and concerns scholarship is raising about the interpretation of ‘big data’ and the uses to which it is put, in particular appraising the challenges of producing quality – meaning, knowledge and value – from quantity, tracing data and analytic provenance across complex ‘big data’ platforms and knowledge production ecosystems, and addressing data privacy issues.

Sessions

  1. Computational Thinking and Computational Archival Science
  • #1: Introducing Computational Thinking into Archival Science Education [William Underwood et al.]
  • #2: Automating the Detection of Personally Identifiable Information (PII) in Japanese-American WWII Incarceration Camp Records [Richard Marciano et al.]
  • #3: Computational Archival Practice: Towards a Theory for Archival Engineering [Kenneth Thibodeau]
  • #4: Stirring The Cauldron: Redefining Computational Archival Science (CAS) for The Big Data Domain [Nathaniel Payne]
  2. Machine Learning in Support of Archival Functions
  • #5: Protecting Privacy in the Archives: Supervised Machine Learning and Born-Digital Records [Tim Hutchinson]
  • #6: Computer-Assisted Appraisal and Selection of Archival Materials [Cal Lee]
  3. Metadata and Enterprise Architecture
  • #7: Measuring Completeness as Metadata Quality Metric in Europeana [Péter Király et al.]
  • #8: In-place Synchronisation of Hierarchical Archival Descriptions [Mike Bryant et al.]
  • #9: The Utility Enterprise Architecture for Records Professionals [Shadrack Katuu]
  4. Data Management
  • #10: Framing the scope of the common data model for machine-actionable Data Management Plans [João Cardoso et al.]
  • #11: The Blockchain Litmus Test [Tyler Smith]
  5. Social and Cultural Institution Archives
  • #12: A Case Study in Creating Transparency in Using Cultural Big Data: The Legacy of Slavery Project [Ryan Cox, Sohan Shah et al.]
  • #13: Jupyter Notebooks for Generous Archive Interfaces [Mari Wigham et al.]

Next Steps

Updates will continue to be provided through the CAS Portal website (http://dcicblog.umd.edu/cas) and through a Google Group you can join at computational-archival-science@googlegroups.com.

Several related events are scheduled in April 2019: (1) a 1 ½ day workshop on “Developing a Computational Framework for Library and Archival Education” will take place on April 3 & 4, 2019, at the iConference 2019 event (See: https://iconference2019.umd.edu/external-events-and-excursions/ for details), and (2) a “Blue Sky” paper session on “Establishing an International Computational Network for Librarians and Archivists” (See: https://www.conftool.com/iConference2019/index.php?page=browseSessions&form_session=356).

Finally, we are planning a 4th CAS Workshop in December 2019 at the 2019 IEEE International Conference on Big Data (IEEE BigData 2019) in Los Angeles, CA. Stay tuned for an upcoming CAS#4 workshop call for proposals, where we would welcome SAA member contributions!

References

[1] Marciano, R., Lemieux, V., Hedges, M., Esteva, M., Underwood, W., Kurtz, M., & Conrad, M. “Archival Records and Training in the Age of Big Data.” In J. Percell, L. C. Sarin, P. T. Jaeger, & J. C. Bertot (Eds.), Re-Envisioning the MLS: Perspectives on the Future of Library and Information Science Education (Advances in Librarianship, Vol. 44B, pp. 179-199). Emerald Publishing Limited, May 17, 2018. See: http://dcicblog.umd.edu/cas/wp-content/uploads/sites/13/2017/06/Marciano-et-al-Archival-Records-and-Training-in-the-Age-of-Big-Data-final.pdf


Richard Marciano is a professor at the University of Maryland iSchool, where he directs the Digital Curation Innovation Center (DCIC). He previously conducted research at the San Diego Supercomputer Center at the University of California San Diego for over a decade. His research interests center on digital preservation, sustainable archives, cyberinfrastructure, and big data. He is also the 2017 recipient of the Emmett Leahy Award for achievements in records and information management. Marciano holds degrees in Avionics and Electrical Engineering, as well as a Master’s and a Ph.D. in Computer Science from the University of Iowa. In addition, he conducted postdoctoral research in Computational Geography.

Victoria Lemieux is an associate professor of archival science at the iSchool and lead of the Blockchain research cluster, Blockchain@UBC at the University of British Columbia – Canada’s largest and most diverse research cluster devoted to blockchain technology. Her current research is focused on risk to the availability of trustworthy records, in particular in blockchain record keeping systems, and how these risks impact upon transparency, financial stability, public accountability and human rights. She has organized two summer institutes for Blockchain@UBC to provide training in blockchain and distributed ledgers, and her next summer institute is scheduled for May 27-June 7, 2019. She has received many awards for her professional work and research, including the 2015 Emmett Leahy Award for outstanding contributions to the field of records management, a 2015 World Bank Big Data Innovation Award, a 2016 Emerald Literati Award and a 2018 Britt Literary Award for her research on blockchain technology. She is also a faculty associate at multiple units within UBC, including the Peter Wall Institute for Advanced Studies, Sauder School of Business, and the Institute for Computers, Information and Cognitive Systems.

Mark Hedges is a Senior Lecturer in the Department of Digital Humanities at King’s College London, where he teaches on the MA in Digital Asset and Media Management, and is also Departmental Research Lead. His original academic background was in mathematics and philosophy, and he gained a PhD in mathematics at University College London before embarking on a 17-year career in the software industry; he joined King’s in 2005. His research is concerned primarily with digital archives, research infrastructures, and computational methods, and he has led a range of projects in these areas over the last decade. Most recently he has been working in Rwanda on initiatives relating to digital archives and the transformative impact of digital technologies.

A Recap of “DAM if you do and DAM if you don’t!”

by Regina Carra

When: December 3, 2018

Where: Metropolitan New York Library Council (METRO), New York, NY

Speakers:

  • Stephen Klein, Digital Services Librarian at the CUNY Graduate Center (CUNY)
  • Ashley Blewer, AV Preservation Specialist at Artefactual
  • Kelly Stewart, Digital Preservation Services Manager at Artefactual

On December 3, 2018, the Metropolitan New York Library Council (METRO)’s Digital Preservation Interest Group hosted an informative (and impeccably titled) presentation about how the CUNY Graduate Center (GC) plans to incorporate Archivematica, a web-based, open-source digital asset management system (DAMS) developed by Artefactual, into its document management strategy for student dissertations. Speakers included Stephen Klein, Digital Services Librarian at the CUNY Graduate Center (GC); Ashley Blewer, AV Preservation Specialist at Artefactual; and Kelly Stewart, Digital Preservation Services Manager at Artefactual. The presentation began with an overview from Stephen about the GC’s needs and why they chose Archivematica as a DAMS, followed by an introduction to and demo of Archivematica and DuraCloud, an open-source cloud storage service, led by Ashley and Kelly (who was presenting via video-conference call). While this post provides a general summary of the presentation, I would recommend reaching out to any of the presenters for more detailed information about their work. They were all great!

Every year the GC Library receives between 400 and 500 dissertations, theses, and capstones. These submissions can include a wide variety of digital materials, from PDF, video, and audio files to websites and software. Preservation of these materials is essential if the GC is to provide access to emerging scholarship and retain a record of students’ work towards their degrees. Prior to implementing a DAMS, however, the GC’s strategy for managing digital files of student work focused primarily on access, not preservation. Access copies of student work were available on CUNY Academic Works, a site that uses Bepress Digital Commons as a CMS. Missing from the workflow, however, was the creation, storage, and management of archival originals. As Stephen explained, if the Open Archival Information System (OAIS) model is a guide for a proper digital preservation workflow, the GC was without its middle portion, the Archival Information Package (AIP). Some of the qualities that the GC liked about Archivematica were that it is open-source and highly customizable, comes with strong customer support from Artefactual, and has an API that could integrate with tools already in use at the library. GC Library staff hope that Archivematica can eventually integrate with both the library’s electronic submission system (Vireo) and CUNY Academic Works, making the submission, preservation, and access of digital dissertations a much more streamlined, automated, and OAIS-compliant process.
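
For readers less familiar with OAIS, the gap Stephen described can be pictured as a pipeline with its middle stage missing. The sketch below is our own simplification, not the GC’s or Artefactual’s code, and the package contents are invented for illustration: a submission (SIP) should be ingested into an archival package (AIP) before access copies (DIP) are derived, whereas the pre-Archivematica workflow effectively jumped straight to access copies.

    from dataclasses import dataclass

    @dataclass
    class SIP:  # Submission Information Package: what the student deposits
        files: list

    @dataclass
    class AIP:  # Archival Information Package: preserved originals plus metadata
        original_files: list
        preservation_metadata: dict

    @dataclass
    class DIP:  # Dissemination Information Package: access copies for users
        access_files: list

    def ingest(sip: SIP) -> AIP:
        # The preservation step that was missing: keep and document the originals.
        return AIP(original_files=sip.files,
                   preservation_metadata={"format": "identified", "fixity": "checked"})

    def disseminate(aip: AIP) -> DIP:
        # The access step: derive copies for a platform like CUNY Academic Works.
        return DIP(access_files=[f + " (access copy)" for f in aip.original_files])

    dip = disseminate(ingest(SIP(files=["dissertation.pdf"])))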

A sample of one of DuraCloud’s data visualization graphs from the presentation slides.

Next, Ashley and Kelly introduced and demoed Archivematica and DuraCloud. I was very pleased to see several features of the Archivematica software that were made intentionally intuitive. The design of the interface is very clean and easily customizable to fit different workflows. Also, each AIP that is processed includes a plain-text, human-readable file which serves as extra documentation explaining what Archivematica did to each file. Artefactual recommends pairing Archivematica with DuraCloud, although users can choose to integrate the software with local storage or with other cloud services like those offered by Google or Amazon. One of the features I found really interesting about DuraCloud is that it comes with various data visualization graphs that show the user how much storage is available and what materials are taking up the most space.

I close by referencing something Ashley wrote in her recent bloggERS post (conveniently, she also contributed to this event). She makes an excellent point about how different skill-sets are needed to do digital preservation, from the developers who create the tools that automate digital archival processes to the archivists who advocate for and implement said tools at their institutions. I think this talk was successful precisely because it included the practitioner and vendor perspectives, as well as the unique expertise that comes with each role. Both are needed if we are to meet the challenges and tap into the potential that digital archives present. I hope to see more of these “meetings of the minds” in the future.

(For more info: Stephen and Ashley and Kelly have generously shared their slides!)


Regina Carra is the Archive Project Metadata and Cataloging Coordinator at Mark Morris Dance Group. She is a recent graduate of the Dual Degree MLS/MA program in Library Science and History at Queens College – CUNY.

The Archivist’s Guide to KryoFlux

by [Matthew] Farrell and Shira Peltzman

As cultural icons go, the floppy disk continues to persist in the contemporary information technology landscape. Though digital storage has moved beyond the 80 KB – 1.44 MB storage capacity of the floppy disk, its image is often shorthand for the concept of saving one’s work (to wit: Microsoft Word 2016 still uses an icon of a 3.5″ floppy disk to indicate save in its user interface). Likewise, floppy disks make up a sizable portion of many archival collections, in number of objects if not storage footprint. If a creator of personal papers or institutional records maintained their work in electronic form in the 1980s or 1990s, chances are high that these are stored on floppy disks. But the persistent image of the ubiquitous floppy disk conceals a long list of challenges that come into play as archivists attempt to capture their data.

For starters, we often grossly underestimate the extent to which the technology was in active development during its heyday. One would be forgiven for assuming that there existed only a small number of floppy disk formats: namely 5.25″ and 3.5″, plus their 8″ forebears. But within each of these sizes there existed myriad variations in density and encoding, all of which complicate the archivist’s task now that these disks have entered our stacks. This is to say nothing of the hardware: 8″ and 5.25″ drives and standard controller boards are no longer made, and the only 3.5″ drive currently manufactured is a USB-enabled device capable only of reading disks that use the more recent encoding methods and store file systems compatible with the host computer. And, of course, none of the above accounts for the stability of these obsolete carriers over time.

Enter KryoFlux, a floppy disk controller board first made available in 2009. KryoFlux is very powerful, allowing users of contemporary Windows, Mac, and Linux machines to interface with legacy floppy drives via a USB port. The KryoFlux does not attempt to mount a floppy disk’s file system to the host computer, granting two chief affordances: users can acquire data (a) independent of their host computer’s file system, and (b) without necessarily knowing the particulars of the disk in question. The latter is particularly useful when attempting to analyze less stable media.

Despite the powerful utility of KryoFlux, uptake among archives and digital preservation programs has been hampered by a lack of accessible documentation and training resources. The official documentation and user forums assume a level of technical knowledge largely absent from traditional archival training. Following several informal conversations at Stanford University’s Born-Digital Archives eXchange events in 2015 and 2016, as well as discussions at various events hosted by the BitCurator Consortium, we formed a working group that included archivists and archival studies students from Emory University, the University of California Los Angeles, Yale University, Duke University, and the University of Texas at Austin to create user-friendly documentation aimed specifically at archivists.

Development of The Archivist’s Guide to KryoFlux began in 2016, with a draft released on Google Docs in Spring 2017. The working group invited feedback over a six-month comment period and was gratified to receive a wide range of comments and questions from the community. Informed by this incredible feedback, a revised version of the Guide is now hosted on GitHub and available for anyone to use, though the use cases described are generally those encountered by archivists working with born-digital collections in institutional and manuscript repositories.

The Guide is written in two parts. “Part One: Getting Started” provides practical guidance on how to set up and begin using the KryoFlux, and aims to be as inclusive and user-friendly as possible. It includes instructions for running KryoFlux using both Mac and Windows operating systems. Instructions for running KryoFlux using Linux are also provided, allowing repositories that use BitCurator (an Ubuntu-based open-source suite of digital archives tools) to incorporate the KryoFlux into their workflows.

“Part Two: In Depth” examines KryoFlux features and floppy disk technology in more detail. This section introduces the variety of floppy disk encoding formats and provides guidance as to how KryoFlux users can identify them. Readers can also find information about working with 40-track floppy disks. Part Two covers KryoFlux-specific output too, including log files and KryoFlux stream files, and suggests ways in which archivists might make use of these files to support digital preservation best practices. Short case studies documenting the experiences of archivists at other institutions are also included here, providing real-life examples of KryoFlux in action.

As with any technology, the KryoFlux hardware and software will undergo updates and changes in the future which will, if we are not careful, erode the currency of the Guide. In an attempt to address this possibility, the working group has chosen to host the Guide as a public GitHub repository. This platform supports versioning and allows for easy collaboration between members of the working group. Perhaps most importantly, GitHub supports the integration of community-driven contributions, including revisions, corrections, and updates. We have established a process for soliciting and reviewing additional contributions and corrections (short answer: submit a pull request via GitHub!), and will annually review the membership of an ongoing working group responsible for monitoring this work to ensure that the Guide remains actively maintained for as long as humanly possible.


On this year’s World Digital Preservation Day, the Digital Preservation Coalition presented The Archivist’s Guide to KryoFlux with the 2018 Digital Preservation Award for Teaching and Communications. It was truly an honor to be recognized alongside the other very worthy finalists, and a cherry-on-top for what we hope will remain a valuable resource for years to come.


[Matthew] Farrell is the Digital Records Archivist in Duke University’s David M. Rubenstein Rare Book & Manuscript Library. Farrell holds an MLS from the University of North Carolina at Chapel Hill.


Shira Peltzman is the Digital Archivist for the UCLA Library where she leads a preservation program for Library Special Collections’ born-digital material. Shira received her M.A. in Moving Image Archiving and Preservation from New York University’s Tisch School of the Arts, and was a member of the inaugural cohort of the National Digital Stewardship Residency in New York (NDSR-NY).

Trained in Classification, Without Classification

by Ashley Blewer

This is the first post in the bloggERS Making Tech Skills a Strategic Priority series.

Hi, SAA ERS readers! My name is Ashley Blewer, and I am sort of an archivist, sort of a developer, and sort of something else I haven’t quite figured out what to call myself. I work for a company called Artefactual Systems, and we make digital preservation and access software called Archivematica and AtoM (Access to Memory), respectively. My job title is AV Preservation Specialist, which is true, that is what I specialize in, and maybe that fulfills part of that “sort of something else I haven’t quite figured out.” I’ve held a lot of different roles in my career, as digital preservation consultant, open source software builder and promoter, developer at a big public library, archivist at a small public film archive, and other things. This, however, is my first time working for an open source technology company that makes software used by libraries, archives, museums, and other organizations in the cultural heritage sector. I think this is a rare vantage point from which to look at the field and its relationship to technology, and I think that even within this rare position, we have an even more unique culture and mentality around archives and technology that I’d like to talk about.

Within the Archivematica team, we have a few loosely defined types of jobs. There are systems archivists, which we speak of internally as analysts; there are developers (software engineers); and there are also systems operations folks (systems administrators and production support engineers). We have a few other roles that sit more at the executive level, but there isn’t a wall between any of these roles, as even those who are classified as being “in management” also work as analysts or systems engineers when called upon to do so. My role also sits between a lot of these loosely defined roles — I suppose I am technically classified as an analyst, and I run with the fellow analyst crew: I attend their meetings, work directly with clients, and handle other preservation-specific duties. But I also have software development skills and can perform more traditionally technical tasks like writing code, changing how things function at an infrastructure level, and reviewing and testing the code that has been written by others. I’m still learning the ropes (I have been at the organization full-time for four months), but I am increasingly able to do some simple system administration tasks too, mostly for clients that need me to log in and check out what’s going on with their systems. This seems to be a way in which roles at my company and within the field (I hope) are naturally evolving. Another example is my brilliant colleague Ross Spencer, who works as a software engineer but has a long-established career within the digital preservation space, so he definitely lends a hand providing crucial insight when doing “analyst-style” work.

We are a technical company, and everyone on staff has skills that are essential components of a well-rounded digital preservation systems infrastructure. For example, all of us know how to use Git (a version control system made popular by GitHub), and we use it as a regular part of our jobs, whether we are writing code or writing documentation for how to use our software. But “being technical” or having technical literacy involves much, much more than writing code. My fellow analysts have to do highly complex and nuanced workflow development and data mapping work, figure out niche bugs associated with some of the microservices we run, and articulate in common human language some of the very technical parts of a large software system. I think Artefactual’s success as a company comes from the collective ability to foster a safe, warm, and collaborative environment that allows anyone on the team to get the advice or support they need to understand a technical problem, and to use that knowledge to better support our software, every Archivematica user (client or non-client), and the larger digital preservation community. This is the most important part of any technical initiative or training, and it is the most fundamental component of any system.

I don’t write this as a representative for Artefactual, but as myself: a person who has held many different roles at many different institutions, all with different relationships to technology. This has by far been the most healthy and educational on-the-job experience I have had, and I think those two things go hand-in-hand. I can only hope that other organizations begin to narrow the divide between “person who does archives work” and “technical person” in a way that supports collaboration and cross-training between people coming into the field with different backgrounds and experiences. We are all in this together, and the only way we are gonna get things done is if we work closely together.



Ashley works at Artefactual Systems as their AV Preservation Specialist, primarily on the Archivematica project. She specializes in time-based media preservation, digital repository management, infrastructure/community building, computer-to-human interpretation, and teaching technical concepts. She is an active contributor to MediaArea’s MediaConch, an open source digital video file conformance checker software project, and to the Bay Area Video Coalition’s QCTools, an open source digitized video analysis software project. She holds Master of Library and Information Science (Archives) and Bachelor of Arts (Graphic Design) degrees from the University of South Carolina.

Announcing the Digital Processing Framework

by Erin Faulder

Development of the Digital Processing Framework began after the second annual Born Digital Archiving eXchange unconference at Stanford University in 2016. There, a group of nine archivists saw a need for standardization, best practices, or general guidelines for processing digital archival materials. What came out of this initial conversation was the Digital Processing Framework (https://hdl.handle.net/1813/57659) developed by a team of 10 digital archives practitioners: Erin Faulder, Laura Uglean Jackson, Susanne Annand, Sally DeBauche, Martin Gengenbach, Karla Irwin, Julie Musson, Shira Peltzman, Kate Tasker, and Dorothy Waugh.

An initial draft of the Digital Processing Framework was presented at the Society of American Archivists’ Annual meeting in 2017. The team received feedback from over one hundred participants who assessed whether the draft was understandable and usable. Based on that feedback, the team refined the framework into a series of 23 activities, each composed of a range of assessment, arrangement, description, and preservation tasks involved in processing digital content. For example, the activity Survey the collection includes tasks like Determine total extent of digital material and Determine estimated date range.

The Digital Processing Framework’s target audience is folks who process born digital content in an archival setting and are looking for guidance in creating processing guidelines and making level-of-effort decisions for collections. The framework does not include recommendations for archivists looking for specific tools to help them process born digital material. We draw on language from the OAIS reference model, so users are expected to have some familiarity with digital preservation, as well as with the management of digital collections and with processing analog material.

Processing born-digital materials is often non-linear, requires technical tools that are selected based on unique institutional contexts, and blends terminology and theories from archival and digital preservation literature. Because of these characteristics, the team first defined 23 activities involved in digital processing that could be generalized across institutions, tools, and localized terminology. These activities may be strung together in a workflow that makes sense for your particular institution. They are:

  • Survey the collection
  • Create processing plan
  • Establish physical control over removable media
  • Create checksums for transfer, preservation, and access copies
  • Determine level of description
  • Identify restricted material based on copyright/donor agreement
  • Gather metadata for description
  • Add description about electronic material to finding aid
  • Record technical metadata
  • Create SIP
  • Run virus scan
  • Organize electronic files according to intellectual arrangement
  • Address presence of duplicate content
  • Perform file format analysis
  • Identify deleted/temporary/system files
  • Manage personally identifiable information (PII) risk
  • Normalize files
  • Create AIP
  • Create DIP for access
  • Publish finding aid
  • Publish catalog record
  • Delete work copies of files

Within each activity are a number of associated tasks. For example, tasks identified as part of the Establish physical control over removable media activity include, among others, assigning a unique identifier to each piece of digital media and creating suitable housing for digital media. Taking inspiration from MPLP (“More Product, Less Process”) and extensible processing methods, the framework assigns these associated tasks to one of three processing tiers. These tiers include: Baseline, which we recommend as the minimum level of processing for born digital content; Moderate, which includes tasks that may be done on collections or parts of collections that are considered as having higher value, risk, or access needs; and Intensive, which includes tasks that should only be done to collections that have exceptional warrant. In assigning tasks to these tiers, practitioners balance the minimum work needed to adequately preserve the content against the volume of work that could happen for nuanced user access. When reading the framework, know that if a task is recommended at the Baseline tier, then it should also be done as part of any higher tier’s work.
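
As a rough illustration of that tiering rule, the sketch below models a single activity with a handful of invented, abbreviated task names; it is our own aid to reading the framework, not part of the framework itself. The point it encodes is that requesting a higher tier always pulls in every lower tier’s tasks.

    # Illustrative task lists for one activity; the names are stand-ins,
    # not the framework's actual wording.
    TASKS = {
        "Baseline": ["assign a unique identifier to each piece of media",
                     "create checksums for preservation copies"],
        "Moderate": ["create suitable housing for digital media"],
        "Intensive": ["document the original media in item-level detail"],
    }

    TIER_ORDER = ["Baseline", "Moderate", "Intensive"]

    def tasks_for(tier):
        # A tier's work includes its own tasks plus every lower tier's tasks.
        included = TIER_ORDER[: TIER_ORDER.index(tier) + 1]
        return [task for t in included for task in TASKS[t]]

    print(tasks_for("Moderate"))  # Baseline tasks appear first: always required.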

We designed this framework to be a step towards a shared vocabulary of what happens as part of digital processing and a recommendation of practice, not a mandate. We encourage archivists to explore the framework and use it however it fits in their institution. This may mean re-defining what tasks fall into which tier(s), adding or removing activities and tasks, or stringing tasks into a defined workflow based on tier or common practice. Further, we encourage the professional community to build upon it in practical and creative ways.


Erin Faulder is the Digital Archivist at Cornell University Library’s Division of Rare and Manuscript Collections. She provides oversight and management of the division’s digital collections. She develops and documents workflows for accessioning, arranging and describing, and providing access to born-digital archival collections. She oversees the digitization of analog collection material. In collaboration with colleagues, Erin develops and refines the digital preservation and access ecosystem at Cornell University Library.

The Top 10 Things We Learned from Building the Queer Omaha Archives, Part 2 – Lessons 6 to 10

by Angela Kroeger and Yumi Ohira

The Queer Omaha Archives (QOA) is an ongoing effort by the University of Nebraska at Omaha Libraries’ Archives and Special Collections to collect and preserve Omaha’s LGBTQIA+ history. This is still a fairly new initiative at the UNO Libraries, having been launched in June 2016. This blog post is adapted and expanded from a presentation entitled “Show Us Your Omaha: Combatting LGBTQ+ Archival Silences,” originally given at the June 2017 Nebraska Library Association College & University Section spring meeting. The QOA was only a year old at that point, and now that another year (plus change) has passed, the collection has continued to grow, and we’ve learned some new lessons.

So here are the top takeaways from UNO’s experience with the QOA.

#6. Words have power, and sometimes also baggage.

Words sometimes mean different things to different people. Each person’s life experience lends context to the way they interpret the words they hear. And certain words have baggage.

We named our collecting initiative the Queer Omaha Archives because in our case, “queer” was the preferred term for all LGBTQIA+ people as well as referring to the academic discipline of queer studies. In the early 1990s, the community in Omaha most commonly referred to themselves as “gays and lesbians.” Later on, bisexuals were included, and the acronym “GLB” came into more common use. Eventually, when trans people were finally acknowledged, it became “GLBT.” Then there was a push to switch the order to “LGBT.” And then more letters started piling on, until we ended up with the LGBTQIA+ commonly used today. Sometimes, it is taken even further, and we’ve seen LGBTQIAPK+, LGBTQQIP2SAA, LGBTQIAGNC, and other increasingly long and difficult-to-remember variants. (Although, Angela confesses to finding QUILTBAG to be quite charming.) The acronym keeps shifting, but we didn’t want our name to shift, so we followed the students’ lead (remember the QTS “Cuties”?) and just went with “queer.” “Queer” means everyone.

Except . . . “queer” has baggage. Heavy, painful baggage. At Pride 2016, an older man, who had donated materials to our archive, stopped by our booth and we had a conversation about the word. For him, it was the slur his enemies had been verbally assaulting him with for decades. The word still had a sharp edge for him. Angela (being younger than this donor, but older than most of the students on campus) acknowledged that they were just old enough for the word to be startling and sometimes uncomfortable. But many Millennials and Generation Z youths, as well as some older adults, have embraced “queer” as an identity. Notably, many of the younger people on campus have expressed their disdain for being put into boxes. Identifying as “gay” or “lesbian” or “bi” seems too limiting to them. Our older patron left our booth somewhat comforted by the knowledge that for much of the population, especially the younger generations, “queer” has lost its sting and taken on a positive, liberating openness.

But what about other LGBTQIA+ people who haven’t stopped by to talk to us, to learn what we mean when we call our archives “queer”? Who feels sufficiently put off by this word that they might choose against sharing their stories with our archive? We aren’t planning to change our name, but we are aware that our choice of word may give some potential donors and potential users a reason to hesitate before approaching us.

So whatever community your archive serves, think about the words that community uses to describe themselves, and the words others use to describe them, and whether those words might have connotations you don’t intend.

#7. Find your community. Partnerships, donors, and volunteers are the keys to success.

It goes without saying that archives are built from communities. We don’t (usually) create the records. We invite them, gather them, describe them, preserve them, and make them available to all, but the records (usually) come from somewhere else.

Especially if you’re building an archive for an underrepresented community, you need buy-in from the members of that community if you want your project to be successful. You need to prove that you’re going to be trustworthy, honorable, reliable stewards of the community’s resources. You need someone in your archive who is willing and able to go out into that community and engage with them. For us, that was UNO Libraries’ Archives and Special Collections Director Amy Schindler, who has a gift for outreach. Though herself cis and straight, she has put in the effort to prove herself a friend to Omaha’s LGBTQIA+ community.

You also need members of that community to be your advocates. For us, our advocates were our first donors, the people who took that leap of faith and trusted us with their resources. We started with our own university. The work of the archivist and the archives would not have been possible without the collaboration and support of campus partners. UNO GSRC Director Dr. Jessi Hitchins and UNO Sociology Professor Dr. Jay Irwin together provided the crucial mailing list for the QOA’s initial publicity and networking. Dr. Irwin and his students collected the interviews which launched the LGBTQ+ Voices oral history project. Retired UNO professor Dr. Meredith Bacon donated her personal papers and extensive library of trans resources. From outside the UNO community, Terry Sweeney, who with his partner Pat Phalen had served as editor of The New Voice of Nebraska, donated a complete set of that publication, along with a substantial collection of papers, photographs, and artifacts, and he volunteered in the archives for many weeks, creating detailed and accurate descriptions of the items. These four people, and many others, have become our advocates, friends, and champions within the Omaha LGBTQIA+ community.

Our lesson here: Find your champions. Prove your trustworthiness, and your champions will help open doors into the community.

#8. Be respectful, be interested, and be present.

Outreach is key to building connections, bringing in both donors and users for the collection. This isn’t Field of Dreams, where “If you build it, they will come.” You need to forge the partnerships first, in order to even be able to build it. And they won’t come if they don’t know about it and don’t believe in its value. (“They” in this case meaning the community or communities your archives serve, and “it” of course meaning your archives or special collections for that community.)

Fig. 3: UNO Libraries table at a Transgender Day of Remembrance event.

Yumi and Angela are both behind-the-scenes technical services types, so we don’t have quite as much contact with patrons and donors as some others in our department, but we’ve helped staff tables at events such as Pride, Transgender Day of Remembrance, and Transgender Day of Visibility. We also work to create a welcoming atmosphere for guests who come to the archives for events, tours, and research. We recognize the critical importance of the work our director does, going out and meeting people, attending events, talking to everyone, and inviting everyone to visit. As our director Amy Schindler said in the article “Collaborative Approaches to Digital Projects,” “Engagement with community partners is key . . .”

There’s also something to be said for simply ensuring that folks within the archives, and the library as a whole for that matter, have a basic awareness of the QOA and other collecting initiatives, so that we can better fulfill our mission of connecting people to the resources they need. After all, when someone walks into the library, before they even reach the archives, any staff member might be their first point of contact. So be sure to do outreach within your own institution, as well.

#9. Let me sing you the song of my administrative support.

The QOA grew out of efforts by UNO students, UNO employees, and Omaha communities to address the underrepresentation of LGBTQIA+ communities in Omaha.

The QOA initiative was inspired by Josh Burford, who delivered a presentation about collecting and archiving historical materials related to queering history. This presentation was co-hosted by UNO’s Gender and Sexuality Resource Center during LGBTQIA+ History Month. After this event, the UNO community became keenly interested in collecting and preserving historical materials and oral history interviews about “Queer Omaha,” and began collaborating with our local LGBTQIA+ communities. In Summer 2016, the QOA was officially launched to preserve the enduring legacy of LGBTQIA+ communities in greater Omaha. The QOA is an effort to combat an archival silence in the community, and digital collections and digital engagement are especially effective tools for making LGBTQIA+ communities aware that the archives welcome their records!

But none of this would have been possible without administrative support. If the library administration or the university administration had been indifferent (or worse, hostile) to the idea of building an LGBTQIA+ archive, we might not have been allowed to allocate staff time and other resources to this project. We might have been told “no.” Thank goodness, our library dean is 100% on board. And UNO is deeply committed to inclusion as one of our core values, which has created a campus-wide openness to the LGBTQIA+ community, resulting in an environment perfectly conducive to building this archive. In fact, in November 2017, UNO was identified as the most LGBTQIA+-friendly college in the state of Nebraska by the Campus Pride Index in partnership with BestColleges.com. An initiative like the QOA will always go much more smoothly when your administration is on your side.

#10. The Neverending Story.

We recognize that we still have a long way to go. There are quite a few gaps within our collection. We have the papers of a trans woman, but only oral histories from trans men. We don’t yet have anything from intersex folks or asexuals. We have very little from queer people of color or queer immigrants, although we do have some oral histories from those groups, thanks to the efforts of Dr. Jay Irwin, who launched the oral history project, and Luke Wegener, who was hired as a dedicated oral history associate for UNO Libraries. A major focus of the LGBTQIA+ oral history interview project is filling identified gaps within the collection, actively seeking more voices of color and other underrepresented groups within the LGBTQIA+ community. However, despite our efforts to increase the diversity within the collection, we haven’t successfully identified potential donors or interviewees to represent all of the letters within LGBTQIA, much less the +.

This isn’t, and should never be, a series of checkboxes on a list. “Oh, we have a trans woman. We don’t need any more trans women.” No, that’s not how it works. We seek to fill the gaps, while continuing to welcome additional material from groups already represented. We are absolutely not going to turn away a white cis gay man just because we already have multiple resources from white cis gay men. Every individual is different. Every individual brings a new perspective. We want our collection to include as many voices as possible. So we need to promote our collection more. We need to do more outreach. We need to attract more donors, users, and champions. This will remain an ongoing effort without an endpoint. There is always room for growth.


Angela Kroeger is the Archives and Special Collections associate at the University of Nebraska at Omaha and a lifelong Nebraskan. They received their B.A. in English from the University of Nebraska at Omaha and their Master’s in Library and Information Science from the University of Missouri.

Yumi Ohira is the Digital Initiatives Librarian at the University of Nebraska at Omaha. Ohira is originally from Japan, where she received a B.S. in Applied Physics from Fukuoka University. Ohira moved to the United States to attend the University of Kansas and Southern Illinois University-Carbondale, where she was awarded an M.F.A. in Studio Art. Ohira went on to study at Emporia State University, Kansas, where she received an M.L.S. and Archive Studies Certification.