Software Preservation Network: Community Roadmapping for Moving Forward

By Susan Malsbury

This is the fifth post in our series on the Software Preservation Network 2016 Forum.
____


The final session of the Software Preservation Network Forum was a community roadmapping activity with two objectives: to synthesize topics, patterns, and projects that came up during the forum, and to articulate steps and a time frame for future work. This session built on two earlier activities from the day: an icebreaker in the morning and a brainstorming activity in the afternoon.

For the morning icebreaker, participants, armed with blank index cards and a pen, each found someone in the room they hadn’t met before. After brief introductions, the partners each shared one challenge that their organization faced with software and/or software preservation, and each wrote their partner’s challenge on their own index card. After five rounds of this, participants returned to their tables for opening remarks from Jessica Meyerson, Zach Vowell, and Cal Lee.

At the afternoon brainstorming activity, participants took the cards from the morning icebreaker, along with fresh cards, and again paired with someone they hadn’t met. Each pair looked over their notes from the morning and wrote out goals, tasks, and projects that could respond to the challenges. By that point, three excellent sessions, along with casual conversations over lunch and coffee breaks, had given us plenty to draw on in shaping potential projects.

I paired with Amy Stevenson from the Microsoft Corporation. Even though her organization is very different from mine (the New York Public Library), we easily identified projects that would address our own challenges as well as the challenges we had gathered in the morning. The projects we identified included a software registry, educational resources, and a clearinghouse to support discovery of software. We then placed our cards on a butcher-paper timeline at the front of the room that spanned from the present to 2022, a six-year time frame with 2017 as the first full year.

During the fourth session on partnerships, Jessica Meyerson entered the goals, projects, and ideas from the timeline into a spreadsheet, so that by the fifth session we were ready to get roadmapping! For this session we broke into three groups to discuss the roadmap, each working on its own copy of the spreadsheet. My group subdivided into smaller teams, each of which took one year of the timeline to edit and comment on. While each team focused on its year, conversation between teams flowed freely, and people felt comfortable moving projects into other years or streamlining ideas across the entire time frame. Links to the master spreadsheet and our three versions can be found here.

Despite our working in three separate groups, it was remarkable how closely the edited roadmaps aligned. Not surprisingly, most people felt it was important to front-load research, the development of platforms for sharing information, and the identification of similar projects with which to form partnerships. Projects in the later years would grow out of this earlier work: creating the registry, establishing a coalition, and developing software metadata models.

I found the forum, and this session in particular, energizing. I had attended the talk that Jessica Meyerson and Zach Vowell gave at SAA in 2014 when they first formed the Software Preservation Network. While I was intrigued by the idea of software preservation, it seemed a far-off concept to me; at that time, many other issues in digital archives seemed far more pressing. When I heard other people’s challenges at the forum, and had space to think about my own, I realized how important and timely software preservation is. As best practices for digital archives are codified, we are increasingly realizing how dependent we are on (often obsolete) software to do our work.

____

Susan Malsbury is the Digital Archivist for The New York Public Library, working with born digital archival material across the three research centers of the Library. In this role, she assists curators with acquisitions; oversees technical services staff handling ingest and processing; and coordinates with public service staff to design and implement access systems for born digital content. Susan has worked with archives at NYPL in various capacities since 2007.

Pathways to Automated Appraisal for Born-Digital Records: An SAA 2016 ERS Breakout Discussion Recap

By Lora Davis
____

In a stroke of brilliant SAA scheduling (or, perhaps, blind chance), the 2016 Electronic Records Section’s annual business meeting immediately followed Thursday afternoon’s session 201, “From 0 to 400 GB: Confronting the Challenges of Born-Digital Photographs.” During this session, panelists Kristen Yarmey, Ed Busch, Chris Prom, Molly Tighe, and Gregory Wiedeman discussed a variety of steps they’ve taken to answer the question “What next?” following the (physical or digital) delivery of born-digital campus photographs to their repositories. I listened intently as Wiedeman recounted how he has used the API of SmugMug, the cloud-based public photo database his campus chose, to automate the description of born-digital campus photographs at large scale. By reusing the existing photographer-generated descriptive metadata stored in SmugMug, Wiedeman’s campus photographs “describe themselves.” This struck a chord with me as I look ahead to my own institution’s upcoming National Digital Stewardship Residency project, “Large-Scale Digital Stewardship: Preserving Johns Hopkins University’s Born-Digital Visual History.” But, I wondered, could a similar method be used to automate appraisal?

As the formal portion of the ERS business meeting concluded, the Section broke into several unconference-style small group discussions. Inspired by session 201, I volunteered to lead one on potential methods for automating the appraisal of born-digital records. Breakout participant Tammi Kim took notes as a group of about 20 ERS members joined the discussion. As is often the case, our conversation occasionally strayed from the primary topic of appraisal, but even these tangents proved fruitful. Topics discussed and questions raised included:

  • The differences and distinctions between born-digital appraisal and weeding. Is the goal of minimizing the total size of digital records ingested (say, reducing 50TB of born-digital campus photographs to 10TB) analogous to actually doing appraisal on these records?
  • Could the type of facial recognition software discussed in session 201 be used not only for description purposes, but also to identify VIPs and other photographic content that would inform appraisal decisions?
  • If the record’s creator (say, a campus photographer) assigned rights or permissions metadata to a digital object, might that rights metadata be employed for appraisal in an MPLP-like fashion?
  • What are the differences between photographic and text-based digital records? Is automated, machine-actionable appraisal more likely to succeed with one type of record than another? (For example, it is easier to search for text in word-processing documents and OCRed PDFs than it is to “search” photographs.)
  • How can “micro-tools” like ArchiveFinder (mentioned in the discussion, though I could not locate a GitHub page for it) and FileAnalyzer help with the appraisal of large, complex directories of digital files? And while tools like ExifTool can read, write, and edit embedded technical metadata, how useful is technical metadata for appraisal decisions?
  • How might content creators be brought into appraisal decisions after content has been transferred to a repository? Can we ask creators to enhance or add metadata after the fact?
  • Where does appraisal actually fit in with processing workflows, especially when working with larger files like video and disk images? How do you manage the need for increased storage even at the appraisal stage?
  • Which “traditional” approaches to analog appraisal do not necessarily apply to digital records? Where does the potential future use of records fit into born-digital appraisal decisions?
  • Are born-digital archives even sustainable, financially or ecologically? Are we building the Tower of Babel? What about the server farms we rely on, and offsetting their use of dirty fuels?

I encourage anyone who attended this discussion to add to this post and/or correct any of my poor-memory-induced misstatements above by commenting below. Similarly, whether you attended the breakout or not, let’s continue this conversation in the comments section!

Lora Davis is Digital Archivist at Johns Hopkins University, where she is tasked with creating, documenting, and managing workflows for acquiring, describing, processing, preserving, and providing access to born-digital materials. Prior to her appointment at JHU in January 2016, Lora worked at Colgate University and the University of Delaware.


Software Preservation Network: Legal and Policy Aspects of Software Preservation

By Brandon Butler

This is the second post in our series on the Software Preservation Network 2016 Forum.
____

The legal landscape surrounding software is a morass. (That’s a legal term of art; Black’s Law Dictionary tells us it is synonymous with “dumpster fire” and “Trump rally.”) Do you own the software on your computer? (Some of it, maybe, but some you merely lease.) Can you resell it? (In some cases you cannot.) Can you repair it? (Kinda! Or not….) Can you crack the DRM on software for research? (In a few, narrowly defined contexts.) When are you bound by a 1000-page software license agreement: when you break a printed seal on a CD-ROM, check a box during an app store checkout process, or ignore the small print on a download website? (Don’t even try to sort that one out; anarchy prevails.) Should some software even be copyrightable? (Don’t ask!) And on and on.

Those are just the questions we could ask about software in the abstract. Things get even more interesting when you talk about preserving and providing broad access to specific software titles, especially old ones. And so we did, at the very first session of the Software Preservation Network (SPN) Forum in Atlanta. (Notes and resources for the session are here.)

Our intrepid guides through this fog were Zach Vowell of California Polytechnic State University, a Co-PI on the Software Preservation Network project, and Henry Lowood of Stanford University, whose Cabrinety Archive is a well-known trove of software history.

Zach kicked off the discussion with a brief description of the scope of the SPN’s IMLS-funded investigation. He then described what they had learned so far from the advice of Harvard Law School’s Cyberlaw Clinic, which SPN retained to help map the legal landscape. The Clinic identified several areas of law implicated by software preservation, and handicapped their relevance:

  • Copyright – the chief concern by far.
  • Contract law issues – another relatively big issue, given the prevalence of software license agreements.
  • The Digital Millennium Copyright Act (DMCA) – significant where software is protected by DRM (like dongles, encryption, and so on).
  • Trademark dilution – because providing access to old software associated with valuable trademarks might harm the value of the brand. (This has been litigated and seems less worrisome, at least to me.)
  • Patent – patent terms are much shorter than copyright terms, and patents are harder to obtain, but some software may be protected by patent.
  • The Computer Fraud and Abuse Act (CFAA) – an anti-hacking statute that mostly addresses unauthorized interaction with servers and networks, so only an issue for software that accesses a third-party server.

Zach said that a two-tier, hybrid approach had emerged from the Clinic’s analysis:

  1. For older, orphaned, and relatively low-risk works (obscure or out-of-business publishers, etc.), fair use should in principle allow many research and preservation uses. The Clinic noted that no case has yet addressed this question directly, but the general principles of fair use should favor archives.
  2. For newer works, with larger commercial owners still in business, libraries might pursue licenses to allow preservation and research use.

Henry Lowood brought the discussion down from abstract issues to the more concrete questions he has faced in working with a substantial collection of software. Chief among them: what should a software deed of gift look like? Ideally, it should convey copyrights or broad use rights as well as the physical property (samples from Stanford that treat IP ownership expressly are in the Google Drive folder for this session, and the ARL Model Deed of Gift also does this well). This is often impossible, however, because software, like other media given to libraries, is often donated by mere owners of copies who have no copyrights to convey. For digital objects, copies without rights are especially problematic.

Perhaps the most remarkable part of Lowood’s discussion was his account of the relative futility of searching for copyright owners and asking permission. Like others before him, Lowood reported finding very few possible owners and getting even fewer useful responses. Indeed, software seems to have a special version of the orphan works problem: even when you find a software publisher, they are often unable to say whether they still own the copyright, citing confusing, long-lost, and short-term agreements with independent developers. Lowood said that they could find putative owners only around 25–30% of the time, and that when they did, half of those owners disclaimed ownership.

Discussion after the panel raised several interesting points. I suggested the use of “quitclaim deeds” that would allow putative owners to grant permission without requiring them to promise that they were, indeed, the owners. Others suggested a clearinghouse of information about rights, and of documents to use for licensing and transfer of software and IP. Participants also suggested leveraging current licensing negotiations with big firms to obtain perpetual rights (or “life of file” rights; models from video and ebook licensing were discussed), and perhaps rights to older titles. In general, participants agreed that advocacy is needed to put this issue on the radar of university counsel and others involved in negotiating software deals. There was also agreement that reading room access should be the absolute floor of access, and that the community should push to adopt online “virtual” reading rooms as a reasonable extension of that practice.

____

Brandon Butler is the first Director of Information Policy at the University of Virginia Library. He provides guidance and education to the Library and its user community on intellectual property and related issues, and advocates on the Library’s behalf for provisions in law and policy at the federal, state, local, and campus level that enable broad access to information in support of education and research. Butler is the author or co-author of a range of articles, book chapters, guides, presentations, and infographics about copyright, with a focus on libraries and the fair use doctrine.

Software Preservation Network Series

By Jessica Meyerson and Zach Vowell

This post is the first in our series on the Software Preservation Network 2016 Forum.

____

The Software Preservation Network (SPN) 2016 Forum was held on Monday, August 1, 2016, on the Georgia State University campus in downtown Atlanta, Georgia. The Forum’s theme, “Action Research: Empowering the Cultural Heritage Community and Mapping Out Next Steps for Software Preservation,” reflected the mission of SPN: to solicit community input and build consensus around next steps for preserving software at scale, as part of the larger effort to ensure long-term access to digital objects. Over the next few weeks, bloggERS will publish a series of posts about the Forum, written by attendees. This blog post series speaks to the core beliefs of the Software Preservation Network team:

  • Reflection is essential to our practice. Our volunteer blog post authors are a team of reflective practitioners, helping us to derive and articulate insights from their embodied experience as Forum attendees and participants.
  • The practice of critical reflection around software preservation must bring in members of complementary domains as active participants in a coordinated effort to develop a sustainable, national strategy for proprietary software licensing and collection, drawing heavily on the collective, embodied experience and expertise of researcher-practitioners in law, archives, libraries, museums, software development, and other domains.

Community participation was key to the Forum’s success and proposals were invited on topics including:

  • Current collaborations/consortial efforts
  • Collective software licensing approaches
  • Preservation efforts
  • Emulated or virtualized access options
  • Organizational structures that have worked for other multi-institutional initiatives that may work for software preservation

Our call for proposals received an enthusiastic response, so much so that we embarked on a happy experiment to push the conversation forward and closer to actionable next steps. We asked our participants to scrap their original proposals and work together in teams to identify overlaps and intersections across projects AND design an activity to facilitate meaningful engagement among attendees. They all said yes: to ambiguity, to experimentation, and to dedicating more of their time and energy to making the Forum a valuable experience. The final Forum schedule can be found here, but for a preview of what you’ll be hearing about over the course of this blog post series, below is a list of sessions and their participants:

ICE BREAKER ACTIVITY

SESSION 1 – Legal and Policy Aspects of Software Preservation

  • Henry Lowood – Stanford University
  • Zach Vowell – Software Preservation Network

SESSION 2 – Current Collecting, Processing of and Access to Legacy Software

  • Glynn Edwards – Stanford University
  • Jason Scott – Internet Archive
  • Doug White – National Software Reference Library
  • Paula Jabloner – Computer History Museum

SESSION 3 – Research and Data on Software Preservation

  • Micah Altman – Massachusetts Institute of Technology
  • Jessica Meyerson & Zach Vowell – Software Preservation Network

BRAINSTORMING BREAK

SESSION 4 – Partnerships Forming Around Software Preservation

  • Aliza Leventhal – Sasaki Associates
  • Tim Walsh – Canadian Centre for Architecture
  • Nicholas Taylor – Stanford University
  • Ryder Kouba – The American University in Cairo

SESSION 5 – Community Roadmapping

As you read the posts in this series, if you are inspired to get involved with this growing community of dedicated colleagues, there are several ways to dive in:

  • Submit a use case. To make analysis and comparison easier (finding common themes across use cases), we ask that you follow this general structure.
  • We are scheduled to send out a version of our software preservation community roadmap on these listservs; please let us know if there are other groups of folks who might be interested.
  • Sign up to participate in the working groups that have been formed around the community roadmap.

____

Zach Vowell has worked with born-digital collection material since 2007, and has served as Digital Archivist at the Robert E. Kennedy Library, California Polytechnic State University, San Luis Obispo since 2013. At Cal Poly, he is co-primary investigator of the IMLS-funded Software Preservation Network project, and leads digital preservation efforts within Kennedy Library’s Special Collections. Zach has long recognized the need to strategically preserve software in order to provide long-term access to archival collections.

Jessica Meyerson is Digital Archivist at the Briscoe Center for American History at the University of Texas at Austin, where she is responsible for building infrastructure to support digital preservation and access. Jessica earned her M.S.I.S. from the University of Texas at Austin with specializations in digital archives and preservation. She is Co-PI on the IMLS-funded Software Preservation Network, a role that allows her to promote the essential role of software preservation in responsible and effective digital stewardship.