By Tim Walsh
This is the third post in our series on the Software Preservation Network 2016 Forum.
Session Two of the Software Preservation Network Forum, held at Georgia State University on August 1, 2016, focused on the activities of institutions already collecting and providing access to software. Despite the differences in institutional practice and use cases, the three presenters highlighted a number of common themes and challenges.
The first presenter was Glynn Edwards, Head of Technical Services for Special Collections at Stanford University. Edwards spoke largely about the question of metadata for software collections, identifying the need for standard vocabularies based on the metadata that is useful for access, discovery, delivery (e.g. in emulation-as-a-service platforms), and preservation of software titles. Stanford’s own experiences with software collections such as the Cabrinety Collection and the Richard Bartle papers demonstrate a need for new metadata schemas for describing software; MODS, used within Stanford’s digital repository, has proven insufficient. Edwards highlighted other work going on in this area, including the Game Metadata and Citation Project (GAMECIP), with which cultural heritage institutions can and should coordinate when defining metadata schemas for software. Regarding access, Edwards reiterated Henry Lowood’s point from earlier in the Forum that Stanford defaults to reading room-only access to software titles in its collections, reaching out to rights holders to gain permission for broader distribution models such as worldwide internet access.
Edwards was followed by Doug White, Computer Scientist at the National Institute of Standards and Technology (NIST). White and his team at NIST maintain the National Software Reference Library (NSRL), a software collection initially designed to aid law enforcement in forensic investigations. The NSRL publishes metadata about titles as the NIST Standard Reference Data Set, and has developed other tools such as SWIDTags, which can aid in software ID tagging. White discussed some of the potential applications of the NSRL for cultural heritage, including in identification and cataloging of software titles. White stressed that the NSRL is open to collaborating with other collections and to providing access to titles in the NSRL collection to researchers.
The session’s final presenter was Paula Jabloner, Director of Digital Collections at the Computer History Museum (CHM). The CHM has long collected physical artifacts and is beginning to actively process a historic software collection acquired over many years. In some cases, the CHM has received software titles with perpetual licensing agreements rather than deeds of gift. The Museum’s new Center for Software History will explore software through a curatorial lens while expanding access and preservation activities. The CHM has also openly released source code for several key pieces of early software, including MS-DOS, early Photoshop, Apple II DOS, and MacPaint. Some of these source code collections have had hundreds of thousands of page views, but it’s not yet known what users are doing with the code. Jabloner stressed that the CHM also has issues describing software with existing metadata schemas, and that shared schemas to enable interoperability of metadata are much needed.
Following the individual presentations, and Jabloner’s comment that the Computer History Museum’s internal use cases for software collections (exhibits, curatorial research, etc.) are much clearer at this point than public or general access use cases, participants broke out into groups to brainstorm and develop use cases for software collections based on their institutional contexts, and to examine issues of metadata in context.
As a whole, Session Two of the SPN Forum was extremely informative and interesting. The need for standard metadata practice surrounding software was clearly demonstrated, as was the need to collaborate with other communities working on similar questions around software, such as game preservation and forensic investigation. Collecting institutions may also need to rethink their models of ownership and rights transfer for software titles, as the Computer History Museum has done by accepting perpetual licensing agreements rather than deeds of gift. Finally, the session made clear that we as archivists, librarians, and curators need a much clearer idea of who our users are and what the use cases are for software collections, a project that will be continued by Software Preservation Network volunteers in the months to come!
Tim Walsh is the Digital Archivist at the Canadian Centre for Architecture in Montreal, Quebec, where he develops and oversees workflows for processing, preservation, and access of born-digital materials, including computer-aided design (CAD) and other software-dependent file formats.