by Nicole Shibata
San José is in many ways an apt location for a tech-centered library conference like Code4Lib. It is the largest city in Santa Clara Valley (aka Silicon Valley) and home to San Jose State University, which has one of the biggest library science programs in the country. Yet the tone of the 14th annual Code4Lib conference, which convened on February 19-22, 2019, was cautious and at times critical of the “big tech” landscape. In her opening keynote, Sarah Roberts, Assistant Professor of Information Studies at UCLA, talked about her research on social media content moderation. She said that while social media companies deem this work critical for managing lewd or disturbing content, it is also emotionally taxing, low-paying, and executed by a mostly invisible global labor force. Because this work is kept hidden, consumers are led to believe either that social media content is unmediated or that content moderation is somehow automated. This push towards transparency and openness (in how we manipulate our code, technologies, content, and even our labor practices) was a recurring theme throughout the conference.
As a first-time Code4Lib attendee with very little programming experience, I found session titles like “Building REST API-backed Single Page Applications (SPAs) with Vue.js” and “Ringers of Jupyter: The Jupyter Notebook As Faux Web App” pretty intimidating. While I did find myself Googling a lot of acronyms throughout the conference, there was a balanced mix of talks for every skill level, and even the more programming-heavy sessions were surprisingly accessible. Pro-tip: follow one of the Code4Lib backchannels like Twitter or Slack for fun, thoughtful side discussions throughout the conference.
There were a number of archivists and archives-adjacent folks attending the conference and a handful of interesting sessions related to digital archives. In a talk entitled “Natural Language Processing for Discovery of Born-Digital Records,” NCSU Libraries Fellow Emily Higgs discussed her exploration of named entity recognition (NER) to aid in describing digital collections. Using the open source natural language processing software, spaCy, Higgs extracted personal names to a CSV file, with entities ranked by frequency, and included the top five to ten names in the Scope and Content section of the finding aid. She also tested a discovery tool, Open Semantic Desktop Search, to enable researchers to more easily browse through a digital collection using the reading room computer. She noted that while it offered faceted browsing as well as fuzzy and semantic search capabilities, the major drawback was the long indexing time for larger digital collections.
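For readers curious what a workflow like this might look like in practice, here is a minimal sketch in Python. It is not Higgs's actual code: the function names, the CSV column names, and the choice of spaCy's small English model (en_core_web_sm) are all assumptions made for illustration. The extraction step needs spaCy and that model installed; the ranking and CSV steps use only the standard library.

```python
import csv
from collections import Counter


def extract_person_names(texts, model="en_core_web_sm"):
    """Return every PERSON entity string spaCy finds in an iterable of texts.

    Assumption: the model name is illustrative; any spaCy pipeline with an
    NER component that emits the PERSON label would work here.
    """
    import spacy  # imported lazily so the helpers below work without spaCy

    nlp = spacy.load(model)
    names = []
    for doc in nlp.pipe(texts):
        names.extend(ent.text for ent in doc.ents if ent.label_ == "PERSON")
    return names


def rank_names(names, top_n=10):
    """Count names (after collapsing stray whitespace) and return the top_n."""
    counts = Counter(" ".join(name.split()) for name in names if name.strip())
    return counts.most_common(top_n)


def write_ranked_csv(ranked, path):
    """Write (name, count) pairs to a CSV file, most frequent first."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["name", "count"])  # hypothetical column names
        writer.writerows(ranked)
```

An archivist could call extract_person_names on the text of each file in a collection, pass the combined list to rank_names with top_n set somewhere between five and ten, and review the resulting CSV before copying names into a finding aid. Human review still matters, since NER output will include misspellings, partial names, and false positives.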
In the realm of web archiving, Ilya Kreymer of Rhizome presented a demo of Webrecorder, a set of free and open source tools for creating and viewing web archives. Funded by two Mellon Foundation grants, Webrecorder is a browser-based application that focuses on capturing high-fidelity web archives. Unlike more traditional web crawlers, Webrecorder is meant to support a more curated approach to web archiving: think quality over quantity. In his demo, Kreymer quickly and easily archived audio files from a SoundCloud library as well as the most recent Code4Lib conference hashtag posts on Twitter. One of Webrecorder’s most impressive features is its ability to emulate legacy browsers to record things like Flash-based websites. Webrecorder has a lot going for it: it’s free and easy to use, with an attractive and intuitive interface. While Kreymer was quick to point out that they haven’t solved web archiving, it was nonetheless exciting to see a concentrated effort towards refining it.
As a metadata librarian, I am probably a little biased here, but one of the most exciting talks of the conference was given by Dhanushka Samarakoon and Harish Maringanti of the University of Utah’s Marriott Library. Inspired by a story they heard on NPR about PoetiX, a sonnet-writing competition where judges are asked to determine if a sonnet was written by man or machine, Samarakoon and Maringanti began to think about the implications of machine learning for metadata creation. Recognizing that metadata is typically where the bottleneck occurs in digital library workflows, they wanted to explore how machine learning technology might simplify descriptive metadata creation for historical image collections. To do this they created a model using two sources of data: ImageNet, a database of over 14 million images designed for use in visual object recognition software research, and over 470 photographs with high quality human-generated metadata from their own digital library collections. Once this data was introduced into a pre-trained neural network, they ran a collection of photographs through the system to see how well the model worked. It wasn’t perfect. For instance, a photo of a man standing next to a cow was described as “Mary Jane standing by a cow,” apparently due to the many people identified as “Mary Jane” in the original digital library dataset. However, it was exciting to see the possibilities of AI in image analysis and the implications this might have for future metadata automation.
At one point during the conference someone took a quick visual poll of how many first-time attendees were in the audience. There were a lot of us. But there were also a lot of Code4Lib veterans. During a lightning talk about the origin of the conference, Karen Coombs, Ryan Wick, and Roy Tennant recalled wanting to create a conference with a “no spectators” motto, one where attendees had ample opportunities to engage, participate, and have their voices heard. Unlike most other library conferences, Code4Lib doesn’t have competing programming. Everyone gathers in one large room and attends the same talks and sessions. It was this model of inclusivity, equality, and innovation that I found most appealing about Code4Lib, and it will no doubt draw me back in coming years.
For more information about the conference, including streaming video and slides, visit the Code4Lib 2019 website.
Nicole Shibata is the Metadata Librarian at California State University, Northridge.