Software Preservation Network: Community Roadmapping for Moving Forward

By Susan Malsbury

This is the fifth post in our series on the Software Preservation Network 2016 Forum.
____

The final session of the Software Preservation Forum was a community roadmapping activity with two objectives: to synthesize topics, patterns, and projects that came up during the forum, and to articulate steps and the time frame for future work. This session built off of two earlier activities in the day: an icebreaker in the morning and a brainstorming activity in the afternoon.

For the morning icebreaker, participants, armed with blank index cards and a pen, found someone in the room they hadn’t met before. After brief introductions, they each shared one challenge that their organization faced with software and/or software preservation, and they wrote their partner’s challenge on their own index card. After five rounds of this, participants returned to their tables for the opening remarks from Jessica Meyerson, Zach Vowell, and Cal Lee.

At the afternoon brainstorming activity, participants took the cards from the morning icebreaker as well as fresh cards and again paired with someone they hadn’t met. Each pair looked over their notes from the morning and wrote out goals, tasks, and projects that could respond to the challenges. By that point, we had had three excellent sessions as well as casual conversations over lunch and coffee breaks to further inform potential projects.

I paired with Amy Stevenson from the Microsoft Corporation. Even though her organization is very different from mine (the New York Public Library), we easily identified projects that would address our own challenges as well as the challenges we gathered in the morning. The projects we identified included a software registry, educational resources, and a clearinghouse to provide discovery for software. We then placed our cards on a butcher paper timeline at the front of the room that spanned from right now to 2022, a six-year time frame with the first full year being 2017.

During the fourth session on partnerships, Jessica Meyerson entered the goals, projects, and ideas from the timeline into a spreadsheet so that for the fifth session we were ready to get roadmapping! For this session we broke into three groups to discuss the roadmap and to work on our own group’s copy of the spreadsheet. Our group subdivided into smaller groups, each of which took a year of the timeline to edit and comment on. While we all focused on our year, conversation between subgroups flowed freely and people felt comfortable moving projects into other years or streamlining ideas across the entire time frame. Links to the master spreadsheet and our three versions can be found here.

Despite our having three separate groups, it was remarkable how much our edited roadmaps aligned with one another. Not surprisingly, most people felt that it was important to front-load steps regarding research, developing platforms for sharing information, and identifying similar projects to form partnerships. Projects in the later years would grow from this earlier research: creating the registry, establishing a coalition, and developing software metadata models.

I found the forum, and this session in particular, to be energizing. I had attended the talk that Jessica Meyerson and Zach Vowell gave at SAA in 2014 when they first formed the Software Preservation Network. While I was intrigued by the idea of software preservation, it seemed a far-off concept to me. At that time, there were still many other issues regarding digital archives that seemed far more pressing. When I heard other people’s challenges at the forum, and had space to think about my own, I realized how important and timely software preservation is. As digital archives best practices are being codified, more and more we are realizing how dependent we are on (often obsolete) software to do our work.

____

Susan Malsbury is the Digital Archivist for The New York Public Library, working with born digital archival material across the three research centers of the Library. In this role, she assists curators with acquisitions; oversees technical services staff handling ingest and processing; and coordinates with public service staff to design and implement access systems for born digital content. Susan has worked with archives at NYPL in various capacities since 2007.

Pathways to Automated Appraisal for Born-Digital Records: An SAA 2016 ERS Breakout Discussion Recap

By Lora Davis
____

In a stroke of brilliant SAA scheduling (or, perhaps, blind chance) the 2016 Electronic Records Section’s annual business meeting immediately followed Thursday afternoon’s session 201 “From 0 to 400 GB: Confronting the Challenges of Born-Digital Photographs.” During this session, panelists Kristen Yarmey, Ed Busch, Chris Prom, Molly Tighe, and Gregory Wiedeman discussed a variety of steps they’ve taken to answer the question “What next?” following the (physical or digital) delivery of born-digital campus photographs to their repositories. I listened intently as Wiedeman recounted how he has employed the API of his campus’ chosen cloud-based online public photo database (SmugMug) to automate the description of born-digital campus photographs at large scale. By reusing the existing photographer-generated descriptive metadata stored in SmugMug, Wiedeman’s campus photographs “describe themselves.” This struck a chord with me as I look forward to my own institution’s upcoming National Digital Stewardship Residency project “Large-Scale Digital Stewardship: Preserving Johns Hopkins University’s Born-Digital Visual History.” But, I wondered, could a similar method be employed to automate appraisal?

As the formal portion of the ERS business meeting concluded, the Section broke into several unconference-style small group discussions. Inspired by the above, I volunteered to lead one on potential methods for automating the appraisal of born-digital records. Breakout participant Tammi Kim kept discussion notes, as a group of about 20 ERS members engaged in discussion. As is often the case, our conversation occasionally deviated from the primary topic of appraisal, but even these tangents proved fruitful. Some of the topics discussed and questions raised include:

  • The differences and distinctions between born-digital appraisal and weeding. Is the goal of minimizing the total size of digital records ingested (say, reducing 50TB of born-digital campus photographs to 10TB) analogous to actually doing appraisal on these records?
  • Could the type of facial recognition software discussed in session 201 be used not only for description purposes, but also to identify VIPs and other photographic content that would inform appraisal decisions?
  • If the record’s creator (say, a campus photographer) assigned rights or permissions metadata to a digital object, might that rights metadata be employed for appraisal in an MPLP-like fashion?
  • What are the differences between photographic and text-based digital records? Is automated, machine-actionable appraisal more likely to succeed with one type of record than another? (E.g., it is easier to search for text in word processing documents and OCRed PDFs than it is to “search” in photographs.)
  • How can “micro-tools” like ArchiveFinder (product mentioned, but I cannot locate a GitHub page) and FileAnalyzer help with the appraisal of large, complex directories of digital files? Additionally, while tools like ExifTool can read, write, and edit embedded technical metadata, how useful is technical metadata to appraisal decisions?
  • How might content creators be brought into appraisal decisions after content has been transferred to a repository? Can we ask creators to enhance or add metadata after the fact?
  • Where does appraisal actually fit in with processing workflows, especially when working with larger files like video and disk images? How do you manage the need for increased storage even at the appraisal stage?
  • What “traditional” approaches to analog appraisal do not necessarily apply to digital? Where does potential future use of records fit in with born-digital appraisal decisions?
  • Are born digital archives even sustainable monetarily or ecologically? Are we building the Tower of Babel? What about server farms and the offset of dirty fuels?

I encourage anyone who attended this discussion to add to this post and/or correct any of my poor-memory-induced misstatements above by commenting below. Similarly, whether you attended the breakout or not, let’s continue this conversation in the comments section!

Lora Davis is Digital Archivist at Johns Hopkins University, where she is tasked with creating, documenting, and managing workflows for acquiring, describing, processing, preserving, and providing access to born‐digital materials. Prior to her appointment at JHU in January 2016, Lora worked at Colgate University and the University of Delaware.

 

Announcing the First-Ever #bdaccess Twitter Chats: 10/27 @ 2 and 9pm EST

By Jess Farrell and Sarah Dorpinghaus

This post is the fifteenth in a bloggERS series about access to born-digital materials.

____

Contemplating how to provide access to born-digital materials? Wondering how to meet researcher needs for accessing and analyzing files? We are too! Join us for a Twitter chat on providing access to born digital records.

*When?* Thursday, October 27 at 2:00pm and 9:00pm EST
*How?* Follow #bdaccess for the discussion
*Who?* Researchers, information professionals, and anyone else interested in using born-digital records

Newly-conceived #bdaccess chats are organized by an ad-hoc group that formed at the 2015 SAA annual meeting. We are currently developing a bootcamp to share ideas and tools for providing access to born-digital materials and have teamed up with the Digital Library Federation to spread the word about the project.

Understanding how researchers want to access and use digital archives is key to our curriculum’s success, so we’re taking it to the Twitter streets to gather feedback from digital researchers. The following five questions will guide the discussion:

Q1. _What research topic(s) of yours and/or content types have required the use of born digital materials?_

Q2. _What challenges have you faced in accessing and/or using born digital content? Any suggested improvements?_

Q3. _What discovery methods do you think are most suitable for research with born digital material?_

Q4. _What information or tools do/could help provide the context needed to evaluate and use born digital material?_

Q5. _What information about collecting/providing access would you like to see accompanying born digital archives?_

Can’t join on the 27th? Follow #bdaccess for ongoing discussion and future chats!

____

Jess Farrell is the curator of digital collections at Harvard Law School. Along with managing and preserving digital history, she’s currently fixated on inclusive collecting, labor issues in libraries, and decolonizing description.

Sarah Dorpinghaus is the Director of Digital Services at the University of Kentucky Libraries Special Collections Research Center. Although her research interests lie in the realm of born-digital archives, she has a budding pencil collection.

Software Preservation Network: Prospects in Software Preservation Partnerships

By Karl-Rainer Blumenthal

This is the fourth post in our series on the Software Preservation Network 2016 Forum.
____

To me, the emphasis on the importance of partnership and collaboration was the brightest highlight of August’s Software Preservation Network (SPN) Forum at Georgia State University. The event’s theme, “Action Research: Empowering the Cultural Heritage Community and Mapping Out Next Steps for Software Preservation,” permeated early panels, presentations, and brainstorming exercises, empowering as they did the attending stewards of cultural heritage and technology to advocate the next steps most critical to their own goals in order to build the most broadly representative community. After considering surveys of collection and preservation practices, and case studies evocative of their legal and procedural challenges, attendees collaboratively summarized the specific obstacles to be overcome, strategies worth pursuing together, and goals that represent success. Four stewards guided us through this task with the day’s final panel of case studies, ideas, and a participatory exercise. Under the deceptively simple title of “Partnerships,” this group grounded its discourse in practical cases and progressively widened its circle to encompass the variously missioned parties needed to make software preservation a reality at scale.

Tim Walsh (@bitarchivist), Digital Archivist at the Canadian Centre for Architecture (CCA), introduced the origins of his museum’s software preservation mission in its research program Archaeology of the Digital. Advancing one of the day’s key motifs–of software as environment beyond mere artifact–Walsh explained that the CCA’s ongoing mission to preserve tools of the design trades compels it to preserve whole systems environments in order to provide researcher access to obsolete computer-assisted design (CAD) programs and their files. “There are no valid migration pathways,” he assured us; rather, emulation is necessary to sustain access even when it is limited to the reading room. Attaining even that level of accessibility required CCA to reach license agreements with the creators/owners of legacy software, one of the first, most foundational partnerships that any stewarding organization must consider. To grow further still, these partnerships will need to include technical specialists and resource providers beyond CCA’s limited archives and IT staff.

Aliza Leventhal (@alizaleventhal), Corporate Librarian/Archivist at Sasaki Associates, confronts these challenges in her role within a multi-disciplinary design practice, where unencumbered access to the products of at least 14 different CAD programs is a regular need. To meet that need she has similarly reached out to software proprietors, and has likewise cultivated an expanding community of stewards in the form of the SAA Architectural Records Roundtable’s CAD/BIM Taskforce. The Taskforce embraces a clearinghouse role for resources “that address the legal, technical and curatorial complexities” of preserving especially environmentally-dependent collections in repositories like her own and Walsh’s. In order to do so, however, Leventhal reminded us that more definitive standards for the actual artifacts, environments, and documentation that we seek to preserve must first be established by independent and (inter-)national authorities like the International Organization for Standardization (ISO), the American Institute of Architects (AIA), the National Institute of Building Sciences, and as-yet-unfounded organizations in the design arts realm. Among other things, after all, more technical alignment in this regard could enable multi-institutional repositories to distribute and share acquisition, storage, and access resources and expertise.

Nicholas Taylor (@nullhandle), Web Archiving Service Manager at Stanford University Libraries, asked attendees to imagine a future SPN serving such a role itself–as a multi-institutional service partnership that distributes legal, technical, and curatorial repository management responsibilities in the model of the LOCKSS Program. Citing the CLOCKSS Archive and other private networks as complementary examples from the realms of digital images, government documents, and scholarly publications, Taylor posited that such a partnership would empower participants to act independently as centralizing service nodes, and together in overarching governance. A community-governed partnership would need to meet functional technical requirements for preservation, speak to representative use cases, and, critically, articulate a sustainable business model in order to engender buy-in. If successful, though, it could among other things consolidate the broader field’s needs for licensing and IP agreements like CCA’s.

In addition to meeting its member organizations’ needs, this version of SPN, or a partnership like it, could benefit an even wider international community. Ryder Kouba (@rsko83), Digital Collections Archivist at the American University in Cairo, spoke to this potential from his perspective on the Technology and Research Working Group of UNESCO’s PERSIST Project. The project has already produced guidance on selecting digital materials for preservation among UNESCO’s 200+ member states. Its longer term ambitions, however, include the maintenance of the virtual environments in which members’ legacy software can be preserved and accessed. Defining the functional requirements and features of such a global resource will take the sustained and detailed input of a similarly globally-spanning community, beginning in the room in which the SPN Forum took place, but continuing on to the International Conference on Digital Preservation (iPres) and international convocations beyond.

Attendees compose matrices of software preservation needs, challenges, strategies, and outcomes. Photos by Karl-Rainer Blumenthal (left) and @karirene69 (right), CC BY-NC 2.0.

The different scales of partnership thus articulated, the panelists ended their session by facilitating breakout groups in the mapping of discrete problems that partnerships can solve through their necessary steps and towards ideal outcomes. At my table, for instance, the issue of “orphaned” software–software without advocates for long-term preservation–was projected through consolidation in a kind of PRONOM-like registry to get the maintenance that it deserves from partners invested in a LOCKSS-like network. Conceptually simple as each suggestion could be, it could also prompt such different valuations and/or reservations from among just the people in the room as to illustrate how difficult the prioritization of software preservation work can be for a team of partners, rather than independent actors. To accomplish the Forum attendees’ goals equitably as well as efficiently, more consensus needed to be reached concerning the timeline of next steps and meaningful benchmarks, something that we tackled in a final brainstorming session that Susan Malsbury will describe next!

____

Karl-Rainer Blumenthal is a Web Archivist for the Internet Archive’s Archive-It service, where he works with 450+ partner institutions to preserve and share web heritage. Karl seeks to steward collaboration among diversely missioned and resourced cultural heritage organizations through his professional work and research, as we continuously seek new, broadly accessible solutions to the challenges of complex media preservation.