PASIG (Preservation and Archiving Special Interest Group) 2019 Recap

by Kelly Bolding

PASIG 2019 met the week of February 11th at El Colegio de México (commonly known as Colmex) in Mexico City. PASIG stands for Preservation and Archiving Special Interest Group, and the group's meeting brings together an international group of practitioners, industry experts, vendors, and researchers to discuss practical digital preservation topics and approaches. This meeting was particularly special because it was the first time the group convened in Latin America (past meetings have generally been held in Europe and the United States). Excellent real-time bilingual translation for presentations given in both English and Spanish enabled conversations across geographical and linguistic boundaries and made room to center Latin American preservationists' perspectives and transformative post-custodial archival practice.

Perla Rodriguez of the Universidad Nacional Autónoma de México (UNAM) discusses an audiovisual preservation case study.

The conference began with broad overviews of digital preservation topics and tools to create a common starting ground, followed by more focused deep-dives on subsequent days. I saw two major themes emerge over the course of the week. The first was the importance of people over technology in digital preservation. From David Minor’s introductory session to Isabel Galina Russell’s overview of the digital preservation landscape in Mexico, presenters continuously surfaced examples of the “people side” of digital preservation (think: preservation policies, appraisal strategies, human labor and decision-making, keeping momentum for programs, communicating to stakeholders, ethical partnerships). One point that struck me during the community archives session was Verónica Reyes-Escudero’s discussion of “cultural competency as a tool for front-end digital preservation.” By conceptualizing interpersonal skills as a technology for facilitating digital preservation, we gain a broader and more ethically grounded idea of what it is we are really trying to do by preserving bits in the first place. Software and hardware are part of the picture, but they are certainly not the whole view.

The second major theme was that digital preservation is best done together. Distributed digital preservation platforms, consortial preservation models, and collaborative research networks were well represented, with speakers from LOCKSS, Texas Digital Library (TDL), DuraSpace, Open Preservation Foundation, Software Preservation Network, and others. The takeaway from these sessions was that the sheer resource-intensiveness of digital preservation means that institutions, both large and small, are going to have to collaborate in order to achieve their goals. PASIG seemed to be a place where attendees could foster and strengthen these collective efforts. Throughout the conference, presenters also highlighted failures of collaborative projects and the need for sustainable financial and governance models, particularly in light of recent developments at the Digital Preservation Network (DPN) and Digital Public Library of America (DPLA). I was particularly impressed by Mary Molinaro's honest and informative discussion about the factors that led to the shuttering of DPN. Molinaro indicated that DPN would soon be publishing a final report in order to transparently share their model, flaws and all, with the broader community.

Touching on both of these themes, Carlos Martínez Suárez of Video Trópico Sur gave a moving keynote about his collaboration with Natalie M. Baur, Preservation Librarian at Colmex, to digitize and preserve video recordings he made while living with indigenous groups in the Mexican state of Chiapas. The question and answer portion of this session highlighted some of the ethical issues surrounding rights and consent when providing access to intimate documentation of people’s lives. While Colmex is not yet focusing on access to this collection, it was informative to hear Baur and others talk a bit about the ongoing technical, legal, and ethical challenges of a work-in-progress collaboration.

Presenters also provided some awesome practical tools for attendees to take home with them. One of the many great open resources shared by session leaders was Frances Harrell (NEDCC) and Alexandra Chassanoff (Educopia)'s DigiPET: A Community Built Guide for Digital Preservation Education + Training Google document, a living resource for compiling educational tools that you can add to using this form. Julian Morley also shared a Preservation Storage Cost Model Google sheet that contains a template with a wealth of information about estimating the cost of different digital preservation storage models, including comparisons for several cloud providers. Amy Rudersdorf (AVP), Ben Fino-Radin (Small Data Industries), and Frances Harrell (NEDCC) also discussed helpful frameworks for conducting self-assessments.

Selina Aragon, Daina Bouquin, Don Brower, and Seth Anderson discuss the challenges of software preservation.

PASIG closed out by spending some time on the challenges involved with preserving emerging and complex formats. On the last afternoon of sessions, Amelia Acker (University of Texas at Austin) spoke about the importance of preserving APIs, terms of service, and other “born-networked” formats when archiving social media. She was followed by a panel of software preservationists who discussed different use cases for preserving binaries, source code, and other software artifacts.

Conference slides are all available online.

Thanks to the wonderful work of the PASIG 2019 steering, program, and local arrangements committees!


Kelly Bolding is the Project Archivist for Americana Manuscript Collections at Princeton University Library, as well as the team leader for bloggERS! She is interested in developing workflows for processing born-digital and audiovisual materials and making archival description more accurate, ethical, and inclusive.


Contribute to an ERS Community Project!

Please take this short survey to contribute to the 2019 ERS Community Project! The survey closes on Friday, March 29.

In December 2018, the ERS Steering Committee put out a call for ideas for a 2019 ERS community project. We’re thankful for the community input and are pleased to announce that we’re building a master list of digital archives and digital preservation resources that can be used for reference, or to provide a resource overlay for existing best practice and workflow documentation. The Committee has begun compiling resources and thinking about how they connect, but broader input is essential to this project’s success.

At this stage, we are interested in getting a sense of what the most useful resources are in our community. Please take our survey to share your top three go-to resources as well as any areas of electronic records work that you feel lack guidance and documentation. We are thinking of resources broadly, so feel free to suggest your three favorite journal articles, blogs, handbooks, workflows, tools and manuals, or any other style of resource that helps you process and preserve born-digital collections.

After the survey closes on Friday, March 29, we’ll compile and share the results. We also hope to eventually open up a community documentation space where anyone can add to our current list of resources. Once the data collection period is over, we’ll determine the best way to share a more polished version of this resource list.

On behalf of the ERS Steering Committee, thank you for participating!

  • Jessica Farrell
  • Jane Kelly
  • Susan Malsbury
  • Donald Mennerich
  • Kelsey O’Connell
  • Alice Prael
  • Jessica Venlet
  • Dorothy Waugh

Just do it: Building technical capacity among Princeton’s Archival Description and Processing Team

by Alexis Antracoli

This is the fifth post in the bloggERS Making Tech Skills a Strategic Priority series.

ArchivesSpace, Archivematica, BitCurator, EAD, the list goes on! The contemporary archivist is tasked with not only processing paper collections, but also with processing digital records and managing the descriptive data we create. This work requires technical skills that archivists twenty or even ten years ago didn't need to master. It's also rare that archivists get extensive training in the technical aspects of the field during their graduate programs. So, how can a team of archivists build the skills they'll need to meet the needs of an increasingly technical field? At the Princeton University Library, the newly formed Archival Description and Processing Team (ADAPT) is committed to meeting these challenges by building technical capacity across the team. We are achieving this by working on real-world projects that require technical skills, leveraging existing knowledge and skills in the organization, seeking outside training, and championing supervisor support for using time to grow our technical skills.

One of the most important requirements for growing technical capacity on the processing team is supervisor support for the effort. Workshops, training, and solving technical problems take a significant amount of time. Without management support for the time needed to develop technical skills, the team would not be able to experiment, attend trainings, or practice writing code. As the manager of ADAPT, I make this possible by encouraging staff to set specific goals related to developing technical skills on their yearly performance evaluations; I also accept that it might take us a little longer to complete all of our processing. To fit this work into my own schedule, I identify real-world problems and block out time on my schedule to work on them or arrange meetings with colleagues who can assist me. Blocking out time in advance helps me stick to my commitment to building my technical skills. While the time needed to develop these skills means that some work happens more slowly today, having a team that can manipulate data and automate processes is an investment that will result in a more productive and efficient processing team in the future.

With the support to devote time to building technical skills, ADAPT staff use a number of resources to improve their skills. Working with internal staff who already have skills they want to learn has been one successful approach. This has generally paired well with the need to solve real-world data problems. For example, we recently identified the need to move some old container information to individual component-level scope and content notes in a finding aid. We were able to complete this after several in-house training sessions on XPath and XQuery taught by a Library staff member. This introductory training helped us realize that the problem could be solved with XQuery scripting and we took on the project, while drawing on the in-house XQuery expert for assistance. This combination of identifying real-world problems and leveraging existing knowledge within the organization leads both to increased technical skills and projects getting done. It also builds confidence and knowledge that can be more easily applied to the next situation that requires a particular kind of technical expertise.
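The container-to-scopecontent migration described above can be sketched in miniature. The fragment below uses Python's standard-library ElementTree rather than the XQuery the team actually wrote, and the element names follow EAD 2002; the sample finding aid and its container text are hypothetical, not Princeton's real data.

```python
# Sketch: copy legacy <container> text into component-level <scopecontent>
# notes in an EAD 2002 finding aid. A Python/ElementTree stand-in for the
# XQuery approach described in the post; the input fragment is invented.
import xml.etree.ElementTree as ET

EAD_NS = "urn:isbn:1-931666-22-9"  # EAD 2002 namespace
ET.register_namespace("", EAD_NS)

def container_to_scopecontent(ead_xml: str) -> str:
    """For each <c> component with a <container>, add a scope note."""
    root = ET.fromstring(ead_xml)
    for comp in root.iter(f"{{{EAD_NS}}}c"):
        container = comp.find(f"{{{EAD_NS}}}container")
        if container is not None and container.text:
            note = ET.SubElement(comp, f"{{{EAD_NS}}}scopecontent")
            p = ET.SubElement(note, f"{{{EAD_NS}}}p")
            p.text = f"Original container: {container.text}"
    return ET.tostring(root, encoding="unicode")

sample = (
    '<ead xmlns="urn:isbn:1-931666-22-9"><dsc>'
    '<c level="file"><did><unittitle>Letters</unittitle></did>'
    '<container>Box 3, Folder 12</container></c>'
    '</dsc></ead>'
)
print(container_to_scopecontent(sample))
```

A real migration would also decide whether to delete the old container elements and would run against the full finding aid, but the shape of the transformation is the same.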

Finally, building in-house expertise requires allowing staff to determine what technical skills they want to build and how they might go about doing it. Often that requires outside training. Over the past several years, we have brought workshops to campus on working with the command line and using the ArchivesSpace API. Staff have also identified online courses and classes offered by the Office of Information Technology as important resources for building their technical skills. Providing support and time to attend these various trainings or complete online courses during the work day creates an environment where individuals can explore their interests and the team can build a variety of technical skills that complement each other.
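For a flavor of what an ArchivesSpace API workshop covers, here is a minimal sketch of the two basic moves: exchanging credentials for a session token, then attaching that token to subsequent requests. The host, username, password, and token below are placeholders for illustration, and the actual network calls are left commented out.

```python
# Sketch of the ArchivesSpace backend API workflow: POST credentials to
# /users/:user/login for a session token, then send the token in the
# X-ArchivesSpace-Session header. Host and credentials are placeholders.
import urllib.parse
import urllib.request

def login_request(host: str, user: str, password: str) -> urllib.request.Request:
    """Build the POST that exchanges credentials for a session token."""
    url = f"{host}/users/{user}/login"
    data = urllib.parse.urlencode({"password": password}).encode()
    return urllib.request.Request(url, data=data, method="POST")

def authed_request(host: str, path: str, token: str) -> urllib.request.Request:
    """Build a GET carrying the session token ArchivesSpace expects."""
    return urllib.request.Request(
        f"{host}{path}",
        headers={"X-ArchivesSpace-Session": token},
    )

req = login_request("http://localhost:8089", "admin", "admin")
print(req.full_url)  # http://localhost:8089/users/admin/login
# resp = urllib.request.urlopen(req)  # JSON response includes a "session" key
# repos = urllib.request.urlopen(authed_request(host, "/repositories", token))
repos_req = authed_request("http://localhost:8089", "/repositories", "TOKEN")
print(repos_req.full_url)
```

Even a small exercise like listing repositories this way demystifies the API enough to move on to bulk description updates, which is where the real processing payoff is.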

As archival work evolves, having deeper technology skills across the team improves our ability to get our work done. With the right support, tapping into in-house resources, and seeking out additional training, it's possible to build increased technological capability within the processing team. In turn, the team will increasingly be able to tackle the day-to-day technical challenges of managing digital records and descriptive data more efficiently.


Alexis Antracoli is Assistant University Archivist for Technical Services at Princeton University Library where she leads the Archival Processing and Description Team. She has published on web archiving and the archiving of born-digital audio visual content. Alexis is active in the Society of American Archivists, where she serves as Chair of the Web Archiving Section and on the Finance Committee. She is also active in Archives for Black Lives in Philadelphia, an informal group of local archivists who work on projects that engage issues at the intersection of the archival profession and the Black Lives Matter movement. She is especially interested in applying user experience research and user-center design to archival discovery systems, developing and applying inclusive description practices, and web archiving. She holds an M.S.I. in Archives and Records Management from the University of Michigan, a Ph.D. in American History from Brandeis University, and a B.A. in History from Boston College.

Digitizing the Stars: Harvard University’s Glass Plate Collection

by Shana Scott

When our team of experts at Anderson Archival isn’t busy with our own historical collection preservation projects, we like to dive into researching other preservation and digitization undertakings. We usually dedicate ourselves to the intimate collections of individuals or private institutions, so we relish opportunities to investigate projects like Harvard University’s Glass Plate Collection.

For most of the sciences, century-old information would be considered at best a historical curiosity and at worst obsolete. But for the last hundred and forty years, the Harvard College Observatory has housed one of the most comprehensive collections of photographs of the night sky as seen from planet Earth, and this data is more than priceless: it's breakable. For nearly a decade, Harvard has been working not only to protect the historical collection but also to bring it, and its enormous amount of underutilized data, into the digital age.

Star Gazing in Glass

Before computers and cameras, the only way to see the stars was to look up with the naked eye or through a telescope. With the advent of the camera, a whole new way to study the stars was born, but taking photographs of the heavens isn’t as easy as pointing and clicking. Photographs taken by telescopes were produced on 8″x10″ or 8″x14″ glass plates coated in a silver emulsion exposed over a period of time. This created a photographic negative on the glass that could be studied during the day.

(DASCH Portion of Plate b41215) Halley’s comet taken on April 21, 1910 from Arequipa, Peru.

This allowed a far more thorough study of the stars than one night of stargazing could offer. By adjusting the telescopes used and exposure times, stars too faint for the human eye to see could be recorded and analyzed. It was Henry Draper who took this technology to the next level.

In 1872, amateur astronomer Dr. Henry Draper used a prism over the glass plate and became the first to successfully record a star's spectrum. Dr. Draper and his wife, Anna, intended to devote his retirement to the study of stellar spectroscopy, but he died before they could begin. To continue her husband's work, Anna Draper donated much of her fortune and Dr. Draper's equipment to the Harvard Observatory for the study of stellar spectroscopy. Harvard had already begun photographing on glass plates, but with Anna Draper's continual contributions, Harvard expanded its efforts, photographing both the stars and their spectrums.

Harvard now houses over 500,000 glass plates of both the northern and southern hemispheres, starting in 1882 and ending in 1992 when digital methods outpaced traditional photography. This collection of nightly recordings, which began as the Henry Draper Memorial, has been the basis for many of astronomy’s advancements in understanding the universe.

The Women of Harvard’s Observatory

Edward C. Pickering was the director of the Harvard Observatory when the Henry Draper Memorial was formed, but he did more than merely advance the field through photographing the stars. He fostered the education and professional study of some of astronomy's most influential members—women who, at that time, might never have received the chance—or credit—Pickering provided.

Instead of hiring men to study the plates during the day, Pickering hired women. He felt they were more detailed, patient, and, he admitted, cheaper. Williamina Fleming was one of those female computers. She developed the Henry Draper Catalogue of Stellar Spectra and is credited with being the first to see the Horsehead nebula through her work examining the plates.

The Horsehead nebula taken by the Hubble Space Telescope in infrared light in 2013.
Image Credit: NASA/ESA/Hubble Heritage Team
(DASCH Portion of Plate b2312) The collection’s first image of the Horsehead Nebula taken on February 7, 1888 from Cambridge.

The Draper Catalogue included the first classification of stars based on stellar spectra, as created by Fleming. Later, this classification system would be modified by another notable female astronomer at Harvard, Annie Jump Cannon. Cannon’s classification and organizational scheme became the official method of cataloguing stars by the International Solar Union in 1910, and it continues to be used today.

Another notable female computer was Henrietta Swan Leavitt, who figured out a way to judge the distance of stars based on the brightness of variable stars in the Small Magellanic Cloud. Leavitt's Law is still used to determine astronomical distances. The Glass Universe by Dava Sobel chronicles the stories of many of the female computers and the creation of Harvard Observatory's plate collection.

Digital Access to a Sky Century @ Harvard (DASCH)

The Harvard Plate Collection is one of the most comprehensive records of the night sky, but less than one percent of it has been studied. For all of the great work done by the Harvard women and the astronomers who followed them, the fragility of the glass plates meant someone had to travel to Harvard to see them, and even then the study of a single star over a hundred years required a great deal of time. For every discovery made from the plate collection, like finding Pluto, hundreds or thousands more are waiting to be found.

(DASCH Single scan tile from Plate mc24889) First discovery image of Pluto with Clyde Tombaugh’s notes written on the plate. Taken at Cambridge on April 23, 1930.
Initial enhanced color image of Pluto released in July 2015 during New Horizon’s flyby.
Source: NASA/JHUAPL/SwRI
This is a more accurate image of the natural colors of Pluto as the human eye would see it. Taken by New Horizons in July 2015.
Source: NASA/Johns Hopkins University Applied Physics Laboratory/Southwest Research Institute/Alex Parker

With all of this unused, breakable data and advances in computing power, Professor Jonathan Grindlay began organizing and funding DASCH in 2003 in an effort to digitize the entire hundred-year collection of plates. But Grindlay had an extra obstacle to overcome. Many of the plates bear handwritten notes from the female computers and other astronomers, so Grindlay had to balance the historical significance of the collection with the vast data it offered. To do this, the plates are scanned at low resolution with the marks in place, then cleaned and rescanned at the extremely high resolution necessary for data recording.

A custom scanner had to be designed and constructed specifically for the glass plates, and new software was created to bring the digitized images into line with current astronomical data methods. The project hasn't been without its setbacks, either. Finding funding for the project is a constant problem, and in January 2016 the Observatory's lowest level flooded. Around 61,000 glass plates were submerged and had to be frozen immediately to prevent mold from damaging the negatives. While the plates are intact, many still need to be unfrozen and restored before being scanned. The custom scanner also had to be replaced because of the flooding.

George Champine Logbook Archive

In conjunction with the plate scanning, a second project is necessary to make the plates usable for extended study. The original logbooks of the female computers contain more than their observations of the plates. These books record the time, date, telescope, emulsion type, and a host of other identifying information necessary to place and digitally extrapolate the stars on the plates. Over 800 logbooks (nearly 80,000 images in total) were photographed by volunteer George Champine.

Those images are now in the time-consuming process of being manually transcribed. Harvard Observatory partnered with the Smithsonian Institution to enlist volunteers who work every day reading and transcribing the vital information in these logbooks. Without this data, the software can’t accurately use the star data scanned from the plates.

Despite all the challenges and setbacks, 314,797 plates have been scanned as of December 2018. The data released and analyzed from the DASCH project has already made new discoveries about variable stars. Once the entire collection of historical documents is digitized, more than a hundred years will be added to the digital collection of astronomical data, and they will be free for anyone to access and study, professional or amateur.

The Harvard Plate Collection is a great example of an extraordinary resource underused because of its medium. Digital conversion of data is a great way to help any field of research. While Harvard's plate digitization project provides a model for the conversion of complex data into digital form, not all institutions have the resources to attempt such a large enterprise. If you have a collection in need of digitization, contact Anderson Archival today at 314.259.1900 or email us at info@andersonarchival.com.


Shana Scott is a Digital Archivist and Content Specialist with Anderson Archival, and has been digitally preserving historical materials for over three years. She is involved in every level of the archiving process, creating collections that are relevant, accessible, and impactful. Scott has an MA in Professional Writing and Publishing from Southeast Missouri State University and is a member of SFWA.

Call for Contributions: Conversations Series!

Digital skills have become increasingly important for both new and established archivists of all stripes, not just those with “digital” in their job titles. This series aims to foster relationships and facilitate the sharing of knowledge between archivists who are already working with born-digital records and those who are interested in building their digital skills. In collaboration with SAA’s Students and New Professionals (SNAP) section, the Electronic Records Section seeks students and new professionals to conduct brief interviews of people working with born-digital records about what it’s like on a daily basis, as well as career pathways, helpful skill sets, and other topics. Students/new professionals will then write up the interviews for publication on both the ERS and SNAP section blogs. We are currently seeking volunteers for both interviewers and interviewees. Please see additional information about both roles below, and fill out this short Google form to sign up!

Call for interviewers:

  • You are: a student or new professional (or anyone else interested in learning more about what digital archivists do)
  • You will:
    • Get paired with an archivist who is well-versed in digital records
    • Schedule and conduct a brief interview (via chat/email, video, phone, etc.), using your own interview questions (plus a few we’ll suggest)
    • Write up the interview into a blog post and run it by your interviewee for review
    • Build a relationship with a cool archivist; learn and help others learn about born-digital archives work

Call for interviewees:

  • You are: a digital archivist or an archivist with any job title who works with born-digital records
  • You will:
    • Get paired with a student/new professional
    • Participate in a brief interview (via chat/email, video, phone, etc.)
    • Review the interview write-up prior to publication
    • Build a relationship with an awesome student/new professional; generously share your expertise/wisdom with others in the field

Writing for bloggERS! “Conversations” Series

  • Written content should be roughly 600-800 words in length (ok to exceed a bit)
  • Write posts for a wide audience: anyone who stewards, studies, or has an interest in digital archives and electronic records, both within and beyond SAA
  • Align with other editorial guidelines as outlined in the bloggERS! guidelines for writers.

Posts for this series will start in March, so let us know if you are interested in participating by filling out this (short) Google form ASAP. Send any questions to ers.mailer.blog@gmail.com!

Students Reflect (Part 2 of 2): Failure and Learning Tech Skills

This is the fourth post in the bloggERS Making Tech Skills a Strategic Priority series.

As part of our “Making Tech Skills a Strategic Priority” series, the bloggERS team asked five current or recent MLIS/MSIS students to reflect on how they have learned the technology skills necessary to tackle their careers after school. In this post, Anna Speth and Jane Kelly reflect thoughtfully on adapting their mindsets to embrace new challenges and learn from failure.

Anna Speth, 2017 graduate, Simmons College

I am about to celebrate a year in my first full-time position, Librarian for Emerging Technology and Digital Projects at Pepperdine University. In this role I work on digital initiatives, often in tandem with the archive, and direct our emerging technology makerspace. By choosing to center my graduate career on digital archiving, I felt well prepared for the digital initiatives piece. However, running the makerspace has been a whirlwind of grappling with the world of emerging tech. My best piece of advice (which we've all heard a million times) is to maintain a "learner mindset." I'm a traditional learner who has mastered the lecture-memorize-regurgitate academic system. This approach doesn't do much when it comes to hands-on tech. I am faced with 3D printers, VR systems, Arduinos, Ozobots, CONTENTdm, and more with minimal instruction. I watch tutorials, but these rarely offer a path to in-depth understanding. Instead, I've had to overcome the mindset that I'm not a tech person and will make something worse by messing with it. If the 3D printer doesn't work, you certainly aren't going to make it worse by taking it apart and trying to put it back together. If you don't know how to reorder a multipage object on the backend of CONTENTdm, create a hidden sandbox collection and start experimenting. Remember that the internet – Google, user forums, Reddit, company reps – is your friend. Also remember (and I tell this to kids in the makerspace just as often as I tell it to myself) that failure is your friend. If you mess something up, then all you've done is learn more about how the system works by learning how it doesn't work. Iteration and perseverance are key, and, as this traditional learner has realized, a whole lot of fun!

Jane Kelly, 2018 graduate, University of Illinois at Urbana-Champaign

Developing new tech skills has, at least for me, been a process of learning to fail. The intensive Introduction to Computer Science course I took several years ago was supposed to be fun – a benefit of being able to take college courses for almost nothing as a staff member on campus. It might have been fun for the first three weeks of the semester, but that was followed by a lot of agonizing, handwringing, and tears.

I now reflect on my time in that course as an intensive introduction to failure. This shift in mentality – learning how to fail, and how to accept it – has been key for me in being open to developing my tech skills on the job. I don’t worry so much about messing up, not knowing the answer, or the possibility of breaking my computer.

As a humanities student, it simply was never acceptable to me to turn in an assignment incomplete or “wrong.” In that computer science class, and in the information processing course I took at the iSchool at the University of Illinois a couple years later, an incomplete assignment could be a stellar attempt, proof of lessons learned, and an indication of where help is required. The rubric for good work is different for a computer science problem set than a history paper. It has been a valuable lesson to revisit as I try to develop my skills independently and in the workplace.

I have acquired and maintained my tech skills through a combination of computer science coursework before and during library school, an in-person SAA pre-conference session that my employer paid for, and, of course, the internet. Apps like Learn to Code with Python or free online courses can be an introduction to a programming language or a quick refresher, since I inevitably forget much of what I learn in class before I can put it to work at a job. Google and Stack Exchange are lifesavers, both because I can often find the answer to my question about the mysterious error code in the terminal window and because I can reassure myself that I'm not the first person to ask it.

More than anything, my openness to what I once thought of as failure has been pivotal to my development. It can take a long time to learn and understand exactly what is going on under the hood with some new software or process, but that’s okay. Sometimes a fake-it-til-you-make-it mentality is exactly what’s needed to push yourself to tackle a new challenge. For me, learning tech skills is learning to be okay with failure as a learning process.



Anna Speth is the Librarian for Emerging Technology and Digital Projects at Pepperdine's Payson Library, where she co-directs a makerspace and works with digital initiatives. Anna focuses on the point of connection between technology and history. She holds a BA from Duke University and an MLIS from Simmons College.


Jane Kelly is the Web Archiving Assistant for the #metoo Digital Media Collection at the Schlesinger Library on the History of Women in America and a 2018 graduate of the iSchool at the University of Illinois. Her interests lie at the intersection of digital archives and the people who use them.

Preserve This Podcast!

by Molly Schwartz

Mary Kidd (MLIS ’14) and Dana Gerber-Margie (MLS ’13) first met at a Radio Preservation Task Force meeting in 2016. They bonded over experiences of conference fatigue, but quickly moved onto topics near and dear to both of their hearts: podcasts and audio archiving. Dana Gerber-Margie has been a long-time podcast super-listener. She is subscribed to over 1400 podcasts, and she regularly listens to 40-50 of them. She launched a podcast recommendation newsletter when she was getting her MLS, called “The Audio Signal,” which has grown into a popular podcast publication called The Bello Collective. Mary was a National Digital Stewardship Resident at WNYC, where she was creating a born-digital preservation strategy for their archives. She had worked on analog archives projects in the past — scanning and transferring collections of tapes — but she’s embraced the madness and importance of preserving born-digital audio. Mary and Dana stayed in touch and continued to brainstorm ideas, which blossomed into a workshop about podcast preservation that they taught at the Personal Digital Archives conference at Stanford in 2017, along with Anne Wootton (co-founder of Popup Archive, now at Apple Podcasts).

Then Mary and I connected at the National Digital Stewardship Residency symposium in Washington, DC in 2017. I got my MLS back in 2013, but since then I’ve been working more at the intersection of media, storytelling, and archives. I had started a podcast and was really interested, for selfish reasons, in learning the most up-to-date best practices for born-digital audio preservation. I marched straight up to Mary and said something like, “hey, let’s work together on an audio preservation project.” Mary set up a three-way Skype call with Dana on the line, and pretty soon we were talking about podcasts. How we love them. How they are at risk because most podcasters host their files on commercial third-party platforms. And how we would love to do a massive outreach and education program where we teach podcasters that their digital files are at risk and give them techniques for preserving them. We wrote these ideas into a grant proposal, with a few numbers and a budget attached, and the Andrew W. Mellon Foundation gave us $142,000 to make it happen. We started working on this grant project, called “Preserve This Podcast,” back in February 2018. We’ve been able to hire people who are just as excited about the idea to help us make it happen. Like Sarah Nguyen, a current MLIS student at the University of Washington and our amazing Project Coordinator.

Behaviors chart from the Preserve This Podcast! survey.

One moral of this story is that digital archives conferences really can bring people together and inspire them to advance the field. The other moral is that, after months of consulting audio preservation experts, interviewing podcasters, getting 556 podcasters to take a survey, and reading about the history of podcasting, we can confirm that podcasts are disappearing and that podcast producers are not adequately equipped to preserve their work against the many forces that threaten the long-term survival of digital information. There is more information about the project on our website (preservethispodcast.org) and in our report on the survey findings. Please reach out to mschwartz@metro.org or snguyen@metro.org if you have any thoughts or ideas.


Molly Schwartz is the Studio Manager at the Metropolitan New York Library Council (METRO). She is the host and producer of two podcasts about libraries and archives — Library Bytegeist and Preserve This Podcast. Molly did a Fulbright grant at the Aalto University Media Lab in Helsinki, was part of the inaugural cohort of National Digital Stewardship Residents in Washington, D.C., and worked at the U.S. State Department as a data analyst. She holds an MLS with a specialization in Archives, Records and Information Management from the University of Maryland at College Park and a BA/MA in History from the Johns Hopkins University.

Students Reflect (Part 1 of 2): Tech Skills In and Out of the Classroom

By London Stever, Hayley Wilson, and Adriana Casarez

This is the third post in the bloggERS Making Tech Skills a Strategic Priority series.

As part of our “Making Tech Skills a Strategic Priority” series, the bloggERS team asked five current and recent MLIS/MSIS students to reflect on how they have learned the technology skills necessary to tackle their careers after school. One major theme, as expressed by these three writers, is the need for a balance of learning inside and outside the classroom.

London Stever, 2018 graduate, University of Pittsburgh

Approaching the six-month anniversary of my MLIS graduation, I find myself reflecting on my technological growth. Going into graduate school, I expected little technology training. Naively, I believed that most archival jobs were paper-only, excepting occasional digitization projects. Imagine my surprise upon finding out the University of Pittsburgh required an introduction to HTML. This trend continued, as the university insisted students have balanced knowledge.

I took technology-focused courses ranging from a history of computers (useful for those expecting to work with older hardware) to an overview of open-source library repositories and learning management systems (not to be discounted by those going into academia). The most useful of these classes was the required digital humanities course. Since graduating, I have applied the practical introduction to ArchivesSpace and Archivematica – and the in-depth explanation of discoverability, access, and web crawling – to my current work at SAE International.

However, none of the information I learned in those classes would be helpful on its own. University did not prepare me for talking to the IT Department. Terminology used in archives and in IT often overlaps, but usage does not. Custom, in-house programs require troubleshooting, and university technology classes did not teach me those skills. Libraries and archives often need to work with software not specially designed for them, but the university did not address this.

Self-taught classes, YouTube videos, and outside certifications were the most useful technology education for me. Using these, I customized my education to meet the needs employers mention and my own learning needs, which center on the practical application I did not get in university. I now understand troubleshooting, which lets me work with programs built fifteen years ago. Creating a blog or using a content services platform to increase discoverability and internal access is a breeze. In addition to the balanced digital-to-analog education of university, I also needed a balance of library-specific and general technology education.

Hayley Wilson, current student, University of North Carolina at Chapel Hill

When registering for classes at UNC Chapel Hill prior to the Fall semester of 2017, I was informed that I was required to fulfill a technology competency requirement. I had the option either to take an at-home test or to take a technology course (for no credit). I decided to take the technology course because I assumed it would be beneficial to other classes I would be required to take as an MLS student.

As it turns out, as a library science student on the archives and records management track, I had a very strict set of courses I was required to take, with room for only two electives. None of these required courses were focused on technology or building technology skills. I have friends on the Information Science side of the program who are required to take numerous courses that have a strong focus on technology. Fortunately, while at SILS I have had numerous opportunities outside of the classroom to learn and build my technology skills through my various internships and graduate assistant positions. However, I don’t think that every student has the opportunity to do so in their jobs.

Adriana Cásarez, 2018 graduate, University of Texas at Austin

Entering my MSIS program with an interest in digital humanities, I expected my coursework would provide most of the expertise I needed to become a more tech-savvy researcher. Indeed, a survey course in digital humanities gave me an overview of digital tools and methodologies. Additionally, a more intense programming course for cultural data analysis taught me specialized coding for data analysis, machine learning and data visualization. The programming was challenging and using the command line was daunting, but I was fortunate to develop a network of motivated peers who also wanted to develop their technical aptitude.  

Sometimes, I felt I was learning just as many technical skills outside of my general coursework. The university library offered workshops on digital scholarship tools for the academic community. My technical skills and knowledge of trends in topics like text analysis, data curation, and metadata grew by attending as many as I could. The Digital Scholarship Librarian and I also organized co-working sessions for students working on digital scholarship projects. These sessions created a community of practice to share expertise, feedback, and support with others interested in developing their technical aptitude in a productive space. We discussed the successes and frustrations with our projects and with the technology that we were often independently teaching ourselves to use. These community meetups were invaluable avenues to learn from each other and further develop our technical capabilities.

With the increased focus on digital archives, libraries, and scholarship, students often feel expected to just know, or to teach themselves, technical skills independently. My experience in my MSIS program taught me that others are often in the same boat, experiencing similar frustrations but too embarrassed to ask for help or admit ignorance. Communities of practice are essential to create an environment where students feel comfortable discussing obstacles and developing technical skills together.


London Stever is an archival consultant at SAE International, where she balances company culture with international and industry standards, including bridging the gap between IT and discovery partners. London graduated from the University of Pittsburgh’s MLIS – Archives program and is currently working on her CompTIA certifications. She values self-education and believes multilingualism and technological literacy are the keys to archival accessibility. Please email london.stever@outlook.com or go to londonstever.com to contact London.


Hayley Wilson is originally from San Diego but moved to New York to attend New York University. She graduated from NYU with a BA in Art History and stayed in NYC to work for a couple of years before moving abroad to work. She then moved to North Carolina for graduate school and will be graduating in May with her master’s degree in Library Science with a concentration in Archives and Records Management.

Adriana Cásarez is a recent MSIS graduate from the University of Texas at Austin. She has worked as a research assistant on a digital classics project for the Quantitative Criticism Lab. She also developed a digital collection of artistic depictions of the Aeneid using cultural heritage APIs. She aspires to work in digital scholarship and advocate for diversity and inclusivity in libraries.

More skills, less pain with Library Carpentry

By Jeffrey C. Oliver, Ph.D

This is the second post in the bloggERS Making Tech Skills a Strategic Priority series.

Remember that scene in The Matrix where Neo wakes and says “I know kung fu”? Library Carpentry is like that. Almost. Do you need to search lots of files for pieces of text and tire of using Ctrl-F? In the UNIX shell lesson you’ll learn to automate tasks and rapidly extract data from files. Are you managing datasets with not-quite-standardized data fields and formats? In the OpenRefine lesson you’ll easily wrangle data into standard formats for easier processing and de-duplication. There are also Library Carpentry lessons for Python (a popular scripting programming language), Git (a powerful version control system), SQL (a commonly used relational database interface), and many more.
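To make that Ctrl-F comparison concrete, here is a minimal sketch of the kind of task the shell lesson covers (an illustrative example of my own, not actual lesson material — the file and directory names are hypothetical):

```shell
# Hypothetical sketch: find which files in a directory mention a phrase,
# without opening each one and pressing Ctrl-F.
mkdir -p notes
printf 'accession 2019-042 received\n' > notes/a.txt
printf 'no relevant text here\n' > notes/b.txt

# grep -l prints only the names of files containing the pattern
grep -l 'accession' notes/*.txt   # prints: notes/a.txt
```

The same one-liner scales unchanged from two files to two thousand, which is the whole point of learning it.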

But let me back up a bit.

Library Carpentry is part of the Carpentries, an organization designed to provide training to scientists, researchers, and information professionals on the computational skills necessary for work in this age of big data.

The goals of Library Carpentry align with this series’ initial call for contributions, providing resources for those in data- or information-related fields to work “more with a shovel than with a tweezers.” Library Carpentry workshops are primarily hands-on experiences with tools to make work more efficient and less prone to mistakes when performing repeated tasks.

One of the greatest parts of a Library Carpentry workshop is that it begins at the beginning. That is, the first lesson is an Introduction to Data, a structured discussion and exercise session that breaks down jargon (“What is a version control system?”) and sets down some best practices (naming things is hard).

Not only are the lessons designed for those working in library and information professions, but they’re also designed by “in the trenches” folks who are dealing with these data and information challenges daily. As part of the Mozilla Global Sprint, Library Carpentry ran a two-day hackathon in May 2018 where lessons were developed, revised, remixed, and made pretty darn shiny by contributors at ten different sites. For some, the hackathon itself was an opportunity to learn how to use GitHub as a collaboration tool.

Furthermore, Library Carpentry workshops are led by librarians, like the most recent workshop at the University of Arizona, where lessons were taught by our Digital Scholarship Librarian, our Geospatial Specialist, our Liaison Librarian to Anthropology (among other domains), and our Research Data Management Specialist.

Now, a Library Carpentry workshop won’t make you an expert in Python or the UNIX command line in two days. Even Neo had to practice his kung fu a bit. But workshops are designed to be inclusive and accessible, myth-busting, and – I’ll say it – fun. Don’t take my word for it, here’s a sampling of comments from our most recent workshop:

  • Loved the hands-on practice on regular expressions
  • Really great lesson – I liked the challenging exercises, they were fun! It made SQL feel fun instead of scary
  • Feels very powerful to be able to navigate files this way, quickly & in bulk.
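The regular-expressions comment above maps onto exercises like this one (a sketch of my own for illustration, not taken from the lesson; the file name and text are made up):

```shell
# Illustrative sketch: pull every four-digit year out of free text with
# a regular expression, instead of scanning for dates by eye.
printf 'Founded 1887; digitized 2016; see box 12.\n' > history.txt

# -o prints only the matching text; -E enables extended regex syntax
grep -oE '[0-9]{4}' history.txt   # prints 1887, then 2016
```

Note that “12” is not matched: the pattern demands exactly four digits in a row, which is the kind of precision that makes regular expressions feel powerful rather than scary.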

So regardless of how you work with data, Library Carpentry has something to offer. If you’d like to host a Library Carpentry workshop, you can use our request-a-workshop form. You can also connect to Library Carpentry through social media, the web, or good old-fashioned e-mail. And since you’re probably working with data already, you have something to offer Library Carpentry. This whole endeavor runs on the multi-faceted contributions of the community, so join us, we have cookies. And APIs. And a web scraping lesson. The terrible puns are just a bonus.

IEEE Big Data 2018: 3rd Computational Archival Science (CAS) Workshop Recap

by Richard Marciano, Victoria Lemieux, and Mark Hedges

Introduction

The 3rd workshop on Computational Archival Science (CAS) was held on December 12, 2018, in Seattle, following two earlier CAS workshops in 2016 in Washington DC and in 2017 in Boston. It also built on three earlier workshops on ‘Big Humanities Data’ organized by the same chairs at the 2013-2015 conferences, and more directly on a symposium held in April 2016 at the University of Maryland. The current working definition of CAS is:

A transdisciplinary field that integrates computational and archival theories, methods and resources, both to support the creation and preservation of reliable and authentic records/archives and to address large-scale records/archives processing, analysis, storage, and access, with the aim of improving efficiency, productivity and precision, in support of recordkeeping, appraisal, arrangement and description, preservation and access decisions, and engaging and undertaking research with archival material [1].

The workshop featured five sessions and thirteen papers with international presenters and authors from the US, Canada, Germany, the Netherlands, the UK, Bulgaria, South Africa, and Portugal. All details (photos, abstracts, slides, and papers) are available at: http://dcicblog.umd.edu/cas/ieee-big-data-2018-3rd-cas-workshop/. The keynote focused on using digital archives to preserve the history of WWII Japanese-American incarceration and featured Geoff Froh, Deputy Director at Densho.org in Seattle.

Keynote speaker Geoff Froh, Deputy Director at Densho.org in Seattle presenting on “Reclaiming our Story: Using Digital Archives to Preserve the History of WWII Japanese American Incarceration.”

This workshop explored the conjunction (and its consequences) of emerging methods and technologies around big data with archival practice and new forms of analysis and historical, social, scientific, and cultural research engagement with archives. The aim was to identify and evaluate current trends, requirements, and potential in these areas, to examine the new questions that they can provoke, and to help determine possible research agendas for the evolution of computational archival science in the coming years. At the same time, we addressed the questions and concerns scholarship is raising about the interpretation of ‘big data’ and the uses to which it is put, in particular appraising the challenges of producing quality – meaning, knowledge and value – from quantity, tracing data and analytic provenance across complex ‘big data’ platforms and knowledge production ecosystems, and addressing data privacy issues.

Sessions

  1. Computational Thinking and Computational Archival Science
  • #1: Introducing Computational Thinking into Archival Science Education [William Underwood et al.]
  • #2: Automating the Detection of Personally Identifiable Information (PII) in Japanese-American WWII Incarceration Camp Records [Richard Marciano et al.]
  • #3: Computational Archival Practice: Towards a Theory for Archival Engineering [Kenneth Thibodeau]
  • #4: Stirring The Cauldron: Redefining Computational Archival Science (CAS) for The Big Data Domain [Nathaniel Payne]
  2. Machine Learning in Support of Archival Functions
  • #5: Protecting Privacy in the Archives: Supervised Machine Learning and Born-Digital Records [Tim Hutchinson]
  • #6: Computer-Assisted Appraisal and Selection of Archival Materials [Cal Lee]
  3. Metadata and Enterprise Architecture
  • #7: Measuring Completeness as Metadata Quality Metric in Europeana [Péter Király et al.]
  • #8: In-place Synchronisation of Hierarchical Archival Descriptions [Mike Bryant et al.]
  • #9: The Utility Enterprise Architecture for Records Professionals [Shadrack Katuu]
  4. Data Management
  • #10: Framing the scope of the common data model for machine-actionable Data Management Plans [João Cardoso et al.]
  • #11: The Blockchain Litmus Test [Tyler Smith]
  5. Social and Cultural Institution Archives
  • #12: A Case Study in Creating Transparency in Using Cultural Big Data: The Legacy of Slavery Project [Ryan Cox, Sohan Shah et al.]
  • #13: Jupyter Notebooks for Generous Archive Interfaces [Mari Wigham et al.]

Next Steps

Updates will continue to be provided through the CAS Portal website (http://dcicblog.umd.edu/cas) and through a Google Group you can join at computational-archival-science@googlegroups.com.

Several related events are scheduled in April 2019: (1) a 1 ½ day workshop on “Developing a Computational Framework for Library and Archival Education” will take place on April 3 & 4, 2019, at the iConference 2019 event (See: https://iconference2019.umd.edu/external-events-and-excursions/ for details), and (2) a “Blue Sky” paper session on “Establishing an International Computational Network for Librarians and Archivists” (See: https://www.conftool.com/iConference2019/index.php?page=browseSessions&form_session=356).

Finally, we are planning a 4th CAS Workshop in December 2019 at the 2019 IEEE International Conference on Big Data (IEEE BigData 2019) in Los Angeles, CA. Stay tuned for an upcoming CAS#4 workshop call for proposals, where we would welcome SAA member contributions!

References

[1] Marciano, R., Lemieux, V., Hedges, M., Esteva, M., Underwood, W., Kurtz, M. & Conrad, M. “Archival records and training in the Age of Big Data.” In J. Percell, L. C. Sarin, P. T. Jaeger & J. C. Bertot (Eds.), Re-Envisioning the MLS: Perspectives on the Future of Library and Information Science Education (Advances in Librarianship, Volume 44B, pp. 179-199). Emerald Publishing Limited. May 17, 2018. See: http://dcicblog.umd.edu/cas/wp-content/uploads/sites/13/2017/06/Marciano-et-al-Archival-Records-and-Training-in-the-Age-of-Big-Data-final.pdf


Richard Marciano is a professor at the University of Maryland iSchool, where he directs the Digital Curation Innovation Center (DCIC). He previously conducted research at the San Diego Supercomputer Center at the University of California San Diego for over a decade. His research interests center on digital preservation, sustainable archives, cyberinfrastructure, and big data. He is also the 2017 recipient of the Emmett Leahy Award for achievements in records and information management. Marciano holds degrees in Avionics and Electrical Engineering and a Master’s and Ph.D. in Computer Science from the University of Iowa. In addition, he conducted postdoctoral research in Computational Geography.

Victoria Lemieux is an associate professor of archival science at the iSchool and lead of the Blockchain research cluster, Blockchain@UBC at the University of British Columbia – Canada’s largest and most diverse research cluster devoted to blockchain technology. Her current research is focused on risk to the availability of trustworthy records, in particular in blockchain record keeping systems, and how these risks impact upon transparency, financial stability, public accountability and human rights. She has organized two summer institutes for Blockchain@UBC to provide training in blockchain and distributed ledgers, and her next summer institute is scheduled for May 27-June 7, 2019. She has received many awards for her professional work and research, including the 2015 Emmett Leahy Award for outstanding contributions to the field of records management, a 2015 World Bank Big Data Innovation Award, a 2016 Emerald Literati Award and a 2018 Britt Literary Award for her research on blockchain technology. She is also a faculty associate at multiple units within UBC, including the Peter Wall Institute for Advanced Studies, Sauder School of Business, and the Institute for Computers, Information and Cognitive Systems.

Mark Hedges is a Senior Lecturer in the Department of Digital Humanities at King’s College London, where he teaches on the MA in Digital Asset and Media Management and serves as Departmental Research Lead. His original academic background was in mathematics and philosophy, and he gained a PhD in mathematics at University College London before a 17-year career in the software industry; he joined King’s in 2005. His research is concerned primarily with digital archives, research infrastructures, and computational methods, and he has led a range of projects in these areas over the last decade. Most recently he has been working in Rwanda on initiatives relating to digital archives and the transformative impact of digital technologies.