SAA 2019 recap | Session 504: Building Community History Web Archives: Lessons Learned from the Community Webs Program

by Steven Gentry


Introduction

Session 504 focused on the Community Webs program and the experiences of archivists who worked at either the Schomburg Center for Research in Black Culture or the Grand Rapids Public Library. The panelists consisted of Sylvie Rollason-Cass (Web Archivist, Internet Archive), Makiba Foster (Manager, African American Research Library and Cultural Center, formerly the Assistant Chief Librarian, the Schomburg Center for Research in Black Culture), and Julie Tabberer (Head of Grand Rapids History & Special Collections).

Note: The content of this recap has been paraphrased from the panelists’ presentations and all quoted content is drawn directly from the panelists’ presentations.

Session summary

Sylvie Rollason-Cass began with an overview of web archiving and web archives, including:

  • The definition of web archiving.
  • The major components of web archives, including relevant capture tools (e.g. web crawlers, such as Wget or Heritrix) and playback software (e.g. Webrecorder Player).
  • The ARC and WARC web archive file formats. 

Rollason-Cass then noted both the necessity of web archiving—especially due to the web’s ephemeral nature—and that many organizations archiving web content are higher education institutions. The Community Webs program was therefore designed to get more public library institutions involved in web archiving, which is critical given that these institutions often collect unique local and/or regional material.

After a brief description of the issues facing public libraries and public library archives—such as a lack of relevant case studies—Rollason-Cass provided information about the institutions that joined the program, the resources provided by the Internet Archive as part of the program (e.g. a multi-year subscription to Archive-It), and the project’s results, including:

  • The creation of more than 200 diverse web archives (see the Remembering 1 October web archive for one example).
  • Institutions’ creation of collection development policies pertaining specifically to web archives, in addition to other local resources.
  • The production of an online course entitled “Web Archiving for Public Libraries.” 
  • The creation of the Community Webs program website.

Rollason-Cass concluded by noting that although some issues—such as resource limitations—may continue to limit public libraries’ involvement in web archiving, the Community Webs program has greatly increased the ability for other institutions to confidently archive web content. 

Makiba Foster then addressed her experiences as a Community Webs program member. After a brief description of the Schomburg Center, its mission, and its unique role as a place where “collections, community, and current events converge,” Foster highlighted her specific reasons for becoming more engaged with web archiving:

  • Like many other institutions, the Schomburg Center has long collected clippings files—and web archiving would allow this practice to continue.
  • Materials that document the experiences of the Black community are prominent on the World Wide Web.
  • Marginalized community members often publish content on the Web.

Foster then described the #HashtagSyllabusMovement collection, a web archive of educational material “related to publicly produced and crowd-sourced content highlighting race, police violence, and other social justice issues within the Black community.” Foster knew this content could be lost, so—even before participating in the Community Webs program—she began collecting URLs. Upon joining the program, she used Archive-It to archive relevant materials (e.g. Google Docs, blog posts) dating from 2014 to the present. Although some content was lost, the #HashtagSyllabusMovement collection continues to grow—Foster hopes it will eventually include international educational content—and demonstrates the value of web archiving.

In her conclusion, Foster addressed various successes, challenges, and future endeavors:

  • Challenges:
    • Learning web archiving technology and having confidence in one’s decisions.
    • Curating content for the Center’s five divisions.
    • “Getting institutional support.”
  • Future Directions:
    • A new digital archivist will work with each division to collect and advocate for web archives.
    • Considering how to both do outreach for and catalog web archives.
    • Ideally, working alongside community groups to help them implement web archiving practices.

The final speaker, Julie Tabberer, addressed the value of public libraries’ involvement in web archiving. After a brief overview of the Grand Rapids Public Library, the necessity of archives, and the importance of public libraries’ unique collecting efforts, Tabberer posed the following question: “Does it matter if public libraries are doing web archiving?”

To test her hypothesis that “public libraries document mostly community web content [unlike academic archives],” Tabberer analyzed the seed URLs of fifty academic and public libraries to answer two specific questions:

  • “Is the institution crawling their own website?”
  • “What type of content [e.g. domain types] is being crawled [by each institution]?”

After acknowledging some caveats with her sampling and analysis—such as the fact that data analysis is still ongoing and that only Archive-It websites were examined—Tabberer showed audience members several graphics revealing that academic libraries (1) crawl their own websites more often than public libraries do and (2) capture more academic websites than public libraries do.
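
Tabberer did not share her analysis scripts, but the general approach is easy to picture. The sketch below is a hypothetical Python illustration: it assumes a CSV export of Archive-It seeds with invented column names, buckets each seed by top-level domain as a rough proxy for content type, and counts how often an institution’s seeds point back at its own website.

```python
# Hypothetical sketch of the seed-list analysis Tabberer described. Assumes a CSV
# of Archive-It seeds with invented column names "institution_domain" and "seed_url".
import csv
from collections import Counter
from urllib.parse import urlparse

def classify(host: str) -> str:
    """Bucket a seed's host by top-level domain, a rough proxy for content type."""
    if host.endswith(".edu"):
        return "academic"
    if host.endswith(".gov"):
        return "government"
    if host.endswith(".org"):
        return "organization"
    return "other"

domain_types = Counter()
self_crawls = 0

with open("seed_urls.csv", newline="") as f:
    for row in csv.DictReader(f):
        host = urlparse(row["seed_url"]).netloc.lower()
        domain_types[classify(host)] += 1
        # Question 1: is the institution crawling its own website?
        if row["institution_domain"].lower() in host:
            self_crawls += 1

print(domain_types)
print("seeds pointing at the institution's own site:", self_crawls)
```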

Tabberer then concluded with several questions and arguments for the audience to consider:

  • In addition to encouraging more public libraries to archive web content—especially given their values of access and social justice—what other information institutions are underrepresented in this community?
  • Are librarians and archivists really collecting content that represents the community?
  • Even though resource limitations are problematic, academic institutions must expand their web archiving efforts.

Steven Gentry currently serves as a Project Archivist at the Bentley Historical Library. His responsibilities include assisting with accessioning efforts, processing complex collections, and building various finding aids. He previously worked at St. Mary’s College of Maryland, Tufts University, and the University of Baltimore.


SAA 2019 recap | Session 204: Demystifying the Digital: Providing User Access to Born-Digital Records in Varying Contexts

by Steven Gentry


Introduction

Session 204 addressed how three dissimilar institutions—North Carolina State University (NCSU), the Wisconsin Historical Society (WHS), and the Canadian Centre for Architecture (CCA)—are connecting their patrons with born-digital archival content. The panelists consisted of Emily Higgs (NCSU Libraries Fellow, North Carolina State University), Hannah Wang (Electronic Records & Digital Preservation Archivist, Wisconsin Historical Society), and Stefana Breitwieser (Digital Archivist, Canadian Centre for Architecture). In addition, Kelly Stewart (Director of Archival and Digital Preservation Services, Artefactual Systems) briefly spoke about the development of SCOPE, the tool featured in Breitwieser’s presentation.

Note: The content of this recap has been paraphrased from the panelists’ presentations and all quoted content is drawn directly from the panelists’ presentations.

Session summary

Emily Higgs’s presentation focused on the different ways that NCSU’s Special Collections Research Center (SCRC) staff enhance access to their born-digital archives. After a brief overview of NCSU’s collections, Higgs first described their lightweight workflow for connecting researchers with requested digital content, a process that involves SCRC staff accessing an administrator account on a reading room MacBook; transferring copies of requested content to a read-only folder shared with a researcher account; and limiting the computer’s overall capabilities, such as restricting its internet access and ports (the latter is accomplished via Endpoint Protector Basic). Should a patron want copies of the material, they simply drag and drop those resources into another folder for SCRC staff to review.

Higgs then described an experimental named-entity recognition (NER) workflow that employs spaCy and allows archivists to better describe files in NCSU’s finding aids. The workflow employs a Jupyter notebook (see her GitHub repository for more information) to automate the following process:

  • “Define directory [to be analyzed by spaCy].”
  • “Walk directory…[to retrieve] text files [such as PDFs].”
  • “Extract text (textract).”
  • “Process and NER (spaCy).”
  • “Data cleaning.”
  • “Ranked output of entities (csv) [which is based on the number of times a particular name appears in the files].”

Once the process is completed, the most frequent 5-10 names are placed in an ArchivesSpace scope and content note. Higgs concluded by emphasizing this workflow’s overall ease of use and noting that—in the future—staff will integrate application programming interfaces (APIs) to enhance the workflow’s efficiency.
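
Higgs’s actual notebook lives in her GitHub repository; the sketch below only approximates the steps listed above using spaCy and textract. The directory path, the file-extension filter, and the top-10 cutoff are illustrative assumptions rather than details from her code.

```python
# Approximation of the NER workflow Higgs described; illustrative, not her exact notebook code.
import os
from collections import Counter

import spacy
import textract  # pulls plain text out of PDFs, Word files, etc.

nlp = spacy.load("en_core_web_sm")
directory = "/path/to/collection"          # 1. define directory (placeholder path)
entity_counts = Counter()

# 2. walk the directory to find text-bearing files
for root, _, files in os.walk(directory):
    for name in files:
        if not name.lower().endswith((".pdf", ".docx", ".txt")):
            continue
        path = os.path.join(root, name)
        try:
            text = textract.process(path).decode("utf-8")   # 3. extract text (textract)
        except Exception:
            continue                                        # skip files textract cannot parse
        doc = nlp(text)                                     # 4. process and NER (spaCy)
        for ent in doc.ents:
            if ent.label_ == "PERSON":
                entity_counts[ent.text.strip()] += 1        # 5. light data cleaning

# 6. ranked output of entities; the top 5-10 names feed the scope and content note
for person, count in entity_counts.most_common(10):
    print(f"{person},{count}")
```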

Next to speak was Hannah Wang, who addressed how the Wisconsin Historical Society (WHS) has made its born-digital state government records more accessible. Wang began her presentation by discussing the Wisconsin State Preservation of Electronic Records (WiSPER) project and its two goals:

  • “Ingest a minimum of 75 GB of scheduled [and processed] electronic records from state agencies.”
  • “Develop public access interface.” 

Wang then explained the reasons behind Preservica’s selection:

  • WHS’s lack of significant IT support meant an easily implementable tool was preferred over open-source and/or homegrown solutions.
  • Preservica allowed WHS to simultaneously preserve and provide (tiered) access to digital records.
  • Preservica has a public-facing WordPress site, which fulfilled the second WiSPER grant objective.

Wang then addressed how WHS staff appropriately restricted access to digital records by placing records into one of three groupings:

  • “Content that has a legal restriction.”
  • “Content that requires registration and onsite viewing [such as email addresses].”
  • “Open, unrestricted content.” 

WHS staff achieved this goal by employing different methods to identify and restrict digital records:

  • For identification: 
    • Reviewing “[record] retention schedules…[and consulting with] agency [staff who would notify WHS personnel of sensitive content].” 
    • Using tools like bulk_extractor.
    • Reading records if necessary.
  • For restricting records:
    • Employing scripts—such as batch scripts—to transfer and restrict individual files and whole SIPs.
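
Wang did not share the scripts themselves, so the following is only a rough illustration of the restriction step: a short Python sketch that assumes a plain-text list of flagged file paths (for example, compiled from bulk_extractor hits and agency review) and moves those files into a restricted area of a SIP.

```python
# Illustration only: quarantine files flagged as restricted inside a transfer.
# Assumes a plain-text list of flagged paths (e.g. compiled from bulk_extractor
# hits and agency review); this is not WHS's actual script.
import shutil
from pathlib import Path

sip_root = Path("sip_0001")                 # hypothetical SIP directory
restricted_root = sip_root / "restricted"
restricted_root.mkdir(exist_ok=True)

with open("flagged_files.txt") as flagged:  # one path per line, relative to the SIP
    for line in flagged:
        src = sip_root / line.strip()
        if src.is_file():
            dest = restricted_root / src.name
            shutil.move(str(src), str(dest))
            print(f"restricted: {src} -> {dest}")
```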

Wang demonstrated how WHS makes its restricted content accessible via Preservica:

  • “Content that has a legal restriction”: Only higher levels of description can be searched by external researchers, although patrons have information concerning how to access this content.
  • “Content that requires registration and onsite viewing”: Individual files can be located by external researchers, although researchers still need to visit the WHS to view materials. Again, information concerning how to access this content is provided.

Wang concluded her presentation by describing efforts to link materials in Preservica with other descriptive resources, such as WHS’s MARC records; expressing hope that WHS will integrate Preservica with their new ArchivesSpace instance; and discussing the usability testing that resulted in several upgrades to the WHS Electronic Records Portal prior to its release.

The penultimate speaker was Stefana Breitwieser, who spoke about SCOPE and its features. Breitwieser first discussed the “Archaeology of the Digital” project and how—through this project—the CCA acquired the bulk of its digital content, more than “660,000 files (3.5 TB).” To enhance access to these resources, Breitwieser stressed that two problems had to be addressed:

  • “[A] long access workflow [that involved twelve steps].”
  • “Low discoverability.” Breitwieser noted that issues with the CCA’s existing access tool included its inability to search across collections and its failure to use the metadata generated in Archivematica.

CCA staff ultimately decided to work with Artefactual Systems to build SCOPE, “an access interface for DIPs from Archivematica.” The goals of this project included:

  • “Direct user access to access copies of digital archives from [the] reading room.”
  • “Minimal reference intervention [by CCA staff].”
  • “Maximum discoverability using [granular] Archivematica-generated metadata.”
  • “Item-level searching with filtering and facetting.” 

To illustrate SCOPE’s capabilities, Breitwieser demonstrated the tool and its features (e.g. its ability to download DIPs) for the audience. During the presentation, she emphasized that although incredibly useful, SCOPE will ultimately supplement—rather than replace—the CCA’s finding aids. 

Breitwieser concluded by describing the CCA’s reading room—which includes computers that offer a variety of useful software (e.g. computer-aided design, or CAD, software) and, like NCSU’s workstation, only limited technical capabilities—and highlighting the CCA’s much simpler five-step access workflow.

The final speaker, Kelly Stewart, spoke about SCOPE’s development process. Stewart emphasized Artefactual’s use of CCA user stories to develop “feature files”—“logic-based, structured descriptions” of those user stories—which Artefactual staff then used to build SCOPE. Once development was complete, “user acceptance testing” was repeated until SCOPE was deemed ready. Stewart concluded her presentation with the hope that other archivists will implement and improve upon SCOPE.


Steven Gentry currently serves as a Project Archivist at the Bentley Historical Library. His responsibilities include assisting with accessioning efforts, processing complex collections, and building various finding aids. He previously worked at St. Mary’s College of Maryland, Tufts University, and the University of Baltimore.

ml4arc – Machine Learning, Deep Learning, and Natural Language Processing Applications in Archives

by Emily Higgs


On Friday, July 26, 2019, academics and practitioners met at Wilson Library at UNC Chapel Hill for “ml4arc – Machine Learning, Deep Learning, and Natural Language Processing Applications in Archives.” This meeting featured expert panels and participant-driven discussions about how we can use natural language processing – using software to understand text and its meaning – and machine learning – a branch of artificial intelligence that learns to infer patterns from data – in the archives.

The meeting was hosted by the RATOM Project (Review, Appraisal, and Triage of Mail).  The RATOM project is a partnership between the State Archives of North Carolina and the School of Information and Library Science at UNC Chapel Hill. RATOM will extend the email processing capabilities currently present in the TOMES software and BitCurator environment, developing additional modules for identifying and extracting the contents of email-containing formats, NLP tasks, and machine learning approaches. RATOM and the ml4arc meeting are generously supported by the Andrew W. Mellon Foundation.

Presentations at ml4arc were split between successful applications of machine learning and problems that could potentially be addressed by machine learning in the future. In his talk, Mike Shallcross from Indiana University identified archival workflow pain points that provide opportunities for machine learning. In particular, he sees the potential for machine learning to address issues of authenticity and integrity in digital archives, PII and risk mitigation, aggregate description, and how all these processes are (or are not) scalable and sustainable. Many of the presentations addressed these key areas and how natural language processing and machine learning can aid archivists and records managers. Additionally, attendees saw presentations and demonstrations of email tools such as RATOM, TOMES, and ePADD. Euan Cochrane also gave a talk about the EaaSI sandbox and discussed potential relationships between software preservation and machine learning.

The meeting agenda had a strong focus on using machine learning in email archives; collecting and processing email is a significant burden for many archives and stands to benefit greatly from machine learning tools. For example, Joanne Kaczmarek from the University of Illinois presented a project processing capstone email accounts using an e-discovery and predictive coding software called Ringtail. In partnership with the Illinois State Archives, Kaczmarek used Ringtail to identify groups of “archival” and “non-archival” emails from 62 capstone accounts, and to further break down the “archival” category into “restricted” and “public.” After 3-4 weeks of tagging training data with this software, the team was able to reduce the volume of emails by 45% by excluding “non-archival” messages, and to identify 1.8 million emails that met the criteria to be made available to the public. Done manually, this tagging process could easily have taken over 13 years of staff time.

After the ml4arc meeting, I am excited to see the evolution of these projects and how natural language processing and machine learning can help us with our responsibilities as archivists and records managers. From entity extraction to PII identification, there are myriad possibilities for these technologies to help speed up our processes and overcome challenges.


Emily Higgs is the Digital Archivist for the Swarthmore College Peace Collection and Friends Historical Library. Before moving to Swarthmore, she was a North Carolina State University Libraries Fellow. She is also the Assistant Team Leader for the SAA ERS section blog.


OSS4Pres 2.0: Developing functional requirements/features for digital preservation tools

By Heidi Elaine Kelly

____

This is the final post in the bloggERS series describing outcomes of the #OSS4Pres 2.0 workshop at iPRES 2016, addressing open source tool and software development for digital preservation. This post outlines the work of the group tasked with “developing functional requirements/features for OSS tools the community would like to see built/developed (e.g. tools that could be used during ‘pre-ingest’ stage).”

The Functional Requirements for New Tools and Features Group of the OSS4Pres workshop aimed to write user stories focused on new features that developers can build out to better support digital preservation and archives work. The group consisted largely of practitioners who work with digital curation tools regularly and was facilitated by Carl Wilson of the Open Preservation Foundation. While their work largely involved writing user stories for development, the group also came up with requirement lists for specific areas of tool development, outlined below. We hope that these lists help continue to bridge the gap between digital preservation professionals and open source developers by providing a deeper perspective on user needs.

Basic Requirements for Tools:

  • Mostly needed for Mac environment
  • No software installation on donor computer
  • No software dependencies requiring installation (e.g., Java)
  • Must be GUI-based, as most archivists are not skilled with the command line
  • Graceful failure

Descriptive Metadata Extraction Needs (using Apache Tika):

  • Archival date
  • Author
  • Authorship location
  • Subject location
  • Subject
  • Document type
  • Removal of spelling errors to improve extracted text
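
As a rough illustration of this kind of extraction, Apache Tika’s Python bindings (the tika package) return both parsed metadata and extracted text. The snippet below is a sketch rather than anything the group built; the example filename is made up, and the metadata keys shown are only some of those Tika may return, depending on the file format.

```python
# Sketch of descriptive metadata extraction with Apache Tika's Python bindings.
# The filename is made up, and the metadata keys shown are only examples of
# what Tika may return for a given format.
from tika import parser  # the `tika` package starts a local Tika server on first use

parsed = parser.from_file("example_document.pdf")
metadata = parsed.get("metadata", {})
text = parsed.get("content") or ""

# Author and document type come straight from embedded metadata when present;
# dates, locations, and subjects would still need NLP or manual review on top.
print(metadata.get("dc:creator"))        # author, if the file records one
print(metadata.get("Content-Type"))      # document type
print(metadata.get("dcterms:created"))   # creation date, if present
print(text[:500])                        # extracted text for further processing
```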

Technical Metadata Extraction Needs:

  • All datetime information available should be retained (minimum of LastModified Date)
  • Technical manifest report
  • File permissions and file ownership permissions
  • Information about the tool that generated the technical manifest report:
    • tool – name of the tool used to gather the disk image
    • tool version – the version of the tool
    • signature version – if the tool uses ‘signatures’ or other add-ons (e.g. a virus scanner’s signature release, such as July 2014 or v84)
    • datetime process run – the datetime information of when the process ran (usually tools report when the process completed), for each tool used
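
A hedged sketch of what a per-file technical manifest entry covering these fields might look like follows; the tool name, version, and signature values are placeholders rather than references to any real imaging tool.

```python
# Sketch of a per-file technical manifest entry covering the fields listed above.
# The tool name, version, and signature values are placeholders, not a real tool.
import datetime
import json
import stat
from pathlib import Path

def manifest_entry(path: Path) -> dict:
    st = path.stat()
    return {
        "file": str(path),
        "last_modified": datetime.datetime.fromtimestamp(st.st_mtime).isoformat(),
        "permissions": stat.filemode(st.st_mode),   # e.g. "-rw-r--r--"
        "owner_uid": st.st_uid,
        "tool": {
            "name": "example-imaging-tool",          # placeholder
            "version": "0.0.1",                      # placeholder
            "signature_version": None,               # e.g. a virus scanner's signature release
            "run_at": datetime.datetime.now().isoformat(),
        },
    }

entries = [manifest_entry(p) for p in Path("transfer").rglob("*") if p.is_file()]
print(json.dumps(entries, indent=2))
```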

Data Transfer Tool Requirements:

  • Run from portable external device
  • BagIt standard compliant (builds content into a “bag”)
  • Able to select a subset of data – not disk image the whole computer
  • GUI-based tool
  • Original file name (also retained in tech manifest)
  • Original file path (also retained in tech manifest)
  • Directory structure (also retained in tech manifest)
  • Address these issues in filenames (recording the actual filename in the tech manifest): diacritics (e.g. naïve), illegal characters ( \ / : * ? “ < > | ), spaces, em dashes, en dashes, missing file extensions, excessively long file and folder names, etc.
  • Possibly able to connect to an institution’s FTP site or cloud storage service and send the data there when ready for transfer
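
Part of this wish list is already covered by the bagit-python library, which packages a directory into a BagIt bag with per-file checksums. The sketch below is only an illustration of the requirements (selecting a subset, flagging problematic filenames, bagging), not a tool the group produced; the source directory and bag-info values are made up.

```python
# Illustration of two requirements above: flagging problematic filenames and
# packaging a selected subset of data into a BagIt bag. Not a tool the group built;
# the source directory and bag-info values are made up.
import re
from pathlib import Path

import bagit  # the bagit-python library

source = Path("selected_files")            # a chosen subset, not a whole-disk image
ILLEGAL = re.compile(r'[\\/:*?"<>|]')

# Record (rather than silently rename) problematic filenames for the tech manifest.
for p in source.rglob("*"):
    if not p.is_file():
        continue
    if ILLEGAL.search(p.name) or not p.suffix or len(p.name) > 255:
        print(f"flag for review: {p}")

# Build the bag in place; bagit-python writes manifests with per-file checksums.
bag = bagit.make_bag(str(source), {"Source-Organization": "Example Archive"},
                     checksums=["sha256"])
print("bag valid:", bag.is_valid())
```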

Checksum Verification Requirements:

  • File-by-file checksum hash generation
  • Ability to validate the contents of the transfer
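
A minimal sketch of file-by-file checksum generation and transfer validation, assuming two copies of the same directory tree (the directory names are illustrative):

```python
# Minimal sketch of file-by-file checksum generation and transfer validation.
# Directory names are illustrative.
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def make_manifest(root: Path) -> dict:
    return {str(p.relative_to(root)): sha256_of(p)
            for p in root.rglob("*") if p.is_file()}

# Validate the transfer by comparing manifests made before and after the move.
before = make_manifest(Path("source_copy"))
after = make_manifest(Path("transferred_copy"))
print("transfer valid:", before == after)
```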

Reporting Requirements:

  • Ability to highlight/report on possibly problematic files/folders in a separate file

Testing Requirements:

  • Access to a test corpus, with known issues, for testing the tool

Smart Selection & Appraisal Tool Requirements:

  • DRM/TPM detection
  • Regular expressions/fuzzy logic for finding certain terms – e.g. phone numbers, Social Security numbers, and other predefined personal data
  • Blacklisting of files – configurable list of blacklist terms
  • Shortlisting a set of “questionable” files based on parameters that could then be flagged for a human to do further QA/QC
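
The regex and shortlisting ideas can be illustrated with a few lines of Python; the patterns and blacklist terms below are examples only, not a vetted personal-data detector.

```python
# Illustrative regex-based flagging of possible personal data; the patterns and
# blacklist terms are examples only, not a vetted PII detector.
import re
from pathlib import Path

PATTERNS = {
    "phone number": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "possible SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
BLACKLIST = {"confidential", "do not distribute"}   # configurable blacklist terms

def flag_file(path: Path) -> list:
    try:
        text = path.read_text(errors="ignore").lower()
    except OSError:
        return []
    hits = [label for label, rx in PATTERNS.items() if rx.search(text)]
    hits += [term for term in BLACKLIST if term in text]
    return hits

# Shortlist "questionable" files for a human to review further.
for p in Path("appraisal_set").rglob("*.txt"):
    hits = flag_file(p)
    if hits:
        print(f"{p}: {', '.join(hits)}")
```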

Specific Features Needed by the Community:

  • Gathering/generating quantitative metrics for web harvests
  • Mitigation strategies for FFmpeg obsolescence
  • Tesseract language functionality

____

Heidi Elaine Kelly is the Digital Preservation Librarian at Indiana University, where she is responsible for building out the infrastructure to support long-term sustainability of digital content. Previously she was a DiXiT fellow at Huygens ING and an NDSR fellow at the Library of Congress.

OSS4Pres 2.0: Sharing is Caring: Developing an online community space for sharing workflows

By Sam Meister

____

This is the third post in the bloggERS series describing outcomes of the #OSS4Pres 2.0 workshop at iPRES 2016, addressing open source tool and software development for digital preservation. This post outlines the work of the group tasked with “developing requirements for an online community space for sharing workflows, OSS tool integrations, and implementation experiences.” See our other posts for information on the groups that focused on feature development and design requirements for FOSS tools.

Cultural heritage institutions, from small museums to large academic libraries, have made significant progress developing and implementing workflows to manage local digital curation and preservation activities. Many institutions are at different stages in the maturity of these workflows. Some are just getting started, and others have had established workflows for many years. Documentation assists institutions in representing current practices and functions as a benchmark for future organizational decision-making and improvements. Additionally, sharing documentation assists in creating cross-institutional understanding of digital curation and preservation activities and can facilitate collaborations amongst institutions around shared needs.

One of the most commonly voiced recommendations from iPRES 2015 OSS4PRES workshop attendees was the desire for a centralized location for technical and instructional documentation, end-to-end workflows, case studies, and other resources related to the installation, implementation, and use of OSS tools. This resource could serve as a hub that would enable practitioners to freely and openly exchange information, user requirements, and anecdotal accounts of OSS initiatives and implementations.

At the OSS4Pres 2.0 workshop, the group of folks looking at developing an online space for sharing workflows and implementation experience started by defining a simple goal and deliverable for the two hour session:

Develop a list of minimal levels of content that should be included in an open online community space for sharing workflows and other documentation

The group then began a discussion on developing this list of minimal levels by considering the potential value of user stories in informing them. We spent a bit of time proposing a short list of user stories, just enough to provide some insight into the basic structures that would be needed for sharing workflow documentation.

User stories

  • I am using tool 1 and tool 2 and want to know how others have joined them together into a workflow
  • I have a certain type of data to preserve and want to see what workflows other institutions have in place to preserve this data
  • There is a gap in my workflow — a function that we are not carrying out — and I want to see how others have filled this gap
  • I am starting from scratch and need to see some example workflows for inspiration
  • I would like to document my workflow and want to find out how to do this in a way that is useful for others
  • I would like to know why people are using particular tools – is there evidence that they tried another tool, for example, that wasn’t successful?

The group then proceeded to define a workflow object as a series of workflow steps, each with its own attributes, plus a visual representation and organizational context:

  • Workflow step
    • Title / name
    • Description
    • Tools / resources
    • Position / role
  • Visual workflow diagrams / model
  • Organizational context
    • Institution type
    • Content type

Next, we started to draft out the different elements that would be part of an initial minimal level for workflow objects:

Level 1:

  • Title
  • Description
  • Institution / organization type
  • Contact
  • Content type(s)
  • Status
  • Link to external resources
  • Download workflow diagram objects
  • Workflow concerns / reflections / gaps
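
To make this minimal level concrete, a Level 1 workflow record could be captured as simple structured data. The sketch below is one hypothetical rendering of the elements above; the field values are invented and the shape is not a format the group formally adopted.

```python
# One hypothetical rendering of a "Level 1" workflow record as structured data;
# every field value here is invented for illustration.
level_1_workflow = {
    "title": "Example born-digital accessioning workflow",
    "description": "Transfer, virus-check, and bag incoming media.",
    "institution_type": "Public library",
    "contact": "digital.archivist@example.org",
    "content_types": ["disk images", "office documents"],
    "status": "in production",
    "external_resources": ["https://example.org/workflow-docs"],
    "workflow_diagram": "accessioning-workflow.png",
    "concerns_reflections_gaps": "No step yet for handling email attachments.",
    "steps": [
        {
            "title": "Virus scan",
            "description": "Scan transferred files before ingest.",
            "tools": ["ClamAV"],
            "role": "Digital archivist",
        },
    ],
}
```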

After this effort, the group focused on discussing next steps and how an online community space for sharing workflows could be realized. This discussion led toward expanding COPTR to support the sharing of workflow documentation. We outlined a roadmap of next steps toward this goal:

  • Propose / approach COPTR steering group on adding workflows space to COPTR
  • Develop home page and workflow template
  • Add examples
  • Group review
  • Promote / launch
  • Evaluation

The group has continued this work post-workshop and has made good progress setting up a Community Owned Workflows section within COPTR and developing an initial workflow template. We are in the midst of creating and evaluating sample workflows to help with revising and tweaking the template as needed. Based on this process, we hope to launch and start promoting this new online space for sharing workflows in the months ahead. So stay tuned!

____

Sam Meister is the Preservation Communities Manager, working with the MetaArchive Cooperative and BitCurator Consortium communities. Previously, he worked as Digital Archivist and Assistant Professor at the University of Montana. Sam holds a Master of Library and Information Science degree from San Jose State University and a B.A. in Visual Arts from the University of California San Diego. Sam is also an Instructor in the Library of Congress Digital Preservation Education and Outreach Program.

 

Building Bridges and Filling Gaps: OSS4Pres 2.0 at iPRES 2016

By Heidi Elaine Kelly and Shira Peltzman

____

This is the first post in a bloggERS series describing outcomes of the #OSS4Pres 2.0 workshop at iPRES 2016.

Organized by Sam Meister (Educopia), Shira Peltzman (UCLA), Carl Wilson (Open Preservation Foundation), and Heidi Kelly (Indiana University), OSS4PRES 2.0 was a half-day workshop that took place during the 13th annual iPRES 2016 conference in Bern, Switzerland. The workshop aimed to bring together digital preservation practitioners, developers, and administrators in order to discuss the role of open source software (OSS) tools in the field.

Although several months have passed since the workshop wrapped up, we are sharing this information now in an effort to raise awareness of the excellent work completed during this event, to continue the important discussion that took place, and to hopefully broaden involvement in some of the projects that developed. First, however, a bit of background: the initial OSS4PRES workshop was held at iPRES 2015. Attended by over 90 digital preservation professionals from all areas of the open source community, the workshop featured reports on specific issues related to open source tools, followed by small group discussions about the opportunities, challenges, and gaps that attendees observed. The energy from this initial workshop led to both the proposal of a second workshop and a report published in the Code4Lib Journal, “OSS4EVA: Using Open-Source Tools to Fulfill Digital Preservation Requirements.”

The overarching goal for the 2016 workshop was to build bridges and fill gaps within the open source community at large. In order to facilitate a focused and productive discussion, OSS4PRES 2.0 was organized into three groups, each of which was led by one of the workshop’s organizers. Additionally, Shira Peltzman floated between groups to minimize overlap and ensure that each group remained on task. In addition to maximizing our output, one of the benefits of splitting up into groups was that each group was able to focus on disparate but complementary aspects of the open source community.

Develop user stories for existing tools (group leader: Carl Wilson)

Carl’s group was composed principally of digital preservation practitioners. The group scrutinized existing pain points associated with the day-to-day management of digital material, identified needed tools that had not yet been built for the open source community, and began to fill this gap by drafting functional requirements for these tools.

Define requirements for online communities to share information about local digital curation and preservation workflows (group leader: Sam Meister)

With an aim to strengthen the overall infrastructure around open source tools in digital preservation, Sam’s group focused on the larger picture by addressing the needs of the open source community at large. The group drafted a list of requirements for an online community space for sharing workflows, tool integrations, and implementation experiences, to facilitate connections between disparate groups, individuals, and organizations that use and rely upon open source tools.

Define requirements for new tools (group leader: Heidi Kelly)

Heidi’s group looked at how the development of open source digital preservation tools could be improved by implementing a set of minimal requirements to make them more user-friendly. Since a list of these requirements specifically for the preservation community had not existed previously, this list both fills a gap and facilitates the building of bridges, by enabling developers to create tools that are easier to use, implement, and contribute to.

Ultimately OSS4PRES 2.0 was an effort to make the open source community more open and diverse, and in the coming weeks we will highlight what each group managed to accomplish towards that end. The blog posts will provide an in-depth summary of the work completed both during and since the event took place, as well as a summary of next steps and potential project outcomes. Stay tuned!

____

Shira Peltzman is the Digital Archivist for the UCLA Library where she leads the development of a sustainable preservation program for born-digital material. Shira received her M.A. in Moving Image Archiving and Preservation from New York University’s Tisch School of the Arts and was a member of the inaugural class of the National Digital Stewardship Residency in New York (NDSR-NY).

Heidi Elaine Kelly is the Digital Preservation Librarian at Indiana University, where she is responsible for building out the infrastructure to support long-term sustainability of digital content. Previously she was a DiXiT fellow at Huygens ING and an NDSR fellow at the Library of Congress.

The Best of BDAX: Five Themes from the 2016 Born Digital Archiving & eXchange

By Kate Tasker

———

Put 40 digital archivists, programmers, technologists, curators, scholars, and managers in a room together for three days, give them unlimited cups of tea and coffee, and get ready for some seriously productive discussions.

This magic happened at the Born Digital Archiving & eXchange (BDAX) unconference, held at Stanford University on July 18-20, 2016. I joined the other BDAX attendees to tackle the continuing challenges of acquiring, discovering, delivering and preserving born-digital materials.

The discussions highlighted five key themes to me:

1) Born-digital workflows are, generally, specific

We’re all coping with the general challenges of born-digital archiving, but we’re encountering individual collections which need to be addressed with local solutions and resources. BDAXers generously shared examples of use cases and successful workflows, and, although these guidelines couldn’t always translate across diverse institutions (big/small, private/public, IT help/no IT help), they’re a foundation for building best practices which can be adapted to specific needs.

2) We need tools

We need reliable tools that will persist over time to help us understand collections, to record consistent metadata and description, and to discover the characteristics of new content types. Project demos including ePADD, BitCurator Access, bwFLA – Emulation as a Service, UC Irvine’s Virtual Reading Room, the Game Metadata and Citation Project, and the University of Michigan’s ArchivesSpace-Archivematica-DSpace Integration project gave encouragement that tools are maturing and will enable us to work with more confidence and efficiency. (Thanks to all the presenters!)

3) Smart people are on this

A lot of people are doing a lot of work to guide and document efforts in born-digital archiving. We need to share these efforts widely, find common points of application, and build momentum – especially for proposed guidelines, best guesses, and continually changing procedures. (We’re laying this train track as we go, but everybody can get on board!) A brilliant resource from BDAX is a “Topical Brain Dump” Google doc where everyone can share tips related to what we each know about born-digital archives (hat-tip to Kari Smith for creating the doc, and to all BDAXers for their contributions).

4) Talking to each other helps!

Chatting with BDAX colleagues over coffee or lunch provided space to compare notes, seek advice, make connections, and find reassurance that we’re not alone in this difficult endeavor. Published literature is continually emerging on born-digital archiving topics (for example, born-digital description), but if we’re not quite ready to commit our own practices to paper (make that magnetic storage media), then informal conversations allow us to share ideas and experiences.

5) Born-digital archiving needs YOU

BDAX attendees brainstormed a wide range of topics for discussion, illustrating that born-digital archiving collides with traditional processes at all stages of stewardship, from appraisal to access. All of these functions need to be re-examined and potentially re-imagined. It’s a big job (*understatement*) but brings with it the opportunity to gather perspective and expertise from individuals across different roles. We need to make sure everyone is invited to this party.

How to Get Involved

So, what’s next? The BDAX organizers and attendees recognize that there are many, many more colleagues out there who need to be included in these conversations. Continuing efforts are coalescing around processing levels and metrics for born-digital collections; accurately measuring and recording extent statements for digital content; and managing security and storage needs for unprocessed digital accessions. Please, join in!

You can read extensive notes for each session in this shared Google Drive folder (yes, we did talk about how to archive Google docs!) or catch up on Tweets at #bdax2016.

To subscribe to the BDAX email listserv, please email Michael Olson (mgolson[at]stanford[dot]edu), or, to join the new BDAX Slack channel, email Shira Peltzman (speltzman[at]library[dot]ucla[dot]edu).

———

Kate Tasker works with born-digital collections and information management systems at The Bancroft Library, University of California, Berkeley. She has an MLIS from San Jose State University and is a member of the Academy of Certified Archivists. Kate attended Capture Lab in 2015 and is currently designing workflows to provide access to born-digital collections.

Digital Preservation in the News: Copyright and Abandonware

Heads up for anyone with an interest in video game preservation…

The Electronic Frontier Foundation (EFF) is seeking an exemption to the Prohibition on Circumvention of Copyright Protection Systems for Access Control Technologies (17 U.S.C. § 1201(a)(1)). The exemption is proposed for users who want to modify “videogames that are no longer supported by the developer, and that require communication with a server,” in order to serve player communities who want to maintain the functionality of their games–as well as “archivists, historians, and other academic researchers who preserve and study video games[.]” The proposal emphasizes that the games impacted by this exemption would not be persistent worlds (think World of Warcraft or Eve Online), but rather those games “that must communicate with a remote computer (a server) in order to enable core functionality, and that are no longer supported by the developer.”

The exemption is opposed by the Entertainment Software Association (ESA), which represents major American game publishers and platform providers. The ESA response to the EFF proposal argues that the scope of the proposed exemption is too broadly defined, and that “permitting circumvention of video game access controls would increase piracy, significantly reduce users’ options to access copyrighted works on platforms and devices, and decrease the value of these works for copyright owners[.]”

In addition to the comments by the EFF, ESA, and their respective supporters, there are also a number of articles which go into much greater detail on this issue.

What do you think? Should there be a legal exemption for modifying unsupported (but still copyright-protected) video games to ensure their enduring usability?

The latest round of public comment on the proposed exemption closes on May 1, 2015. To voice your opinion, follow this link to Copyright.gov, where you can learn more and submit a comment voicing your opinion on this and other existing proposals.

Martin Gengenbach is an Assistant Archivist at the Gates Archive.