Philly Born Digital Access Bootcamp

by Faith Charlton

As Princeton University Library’s Manuscripts Division processing team continues to move forward in terms of managing its born-digital materials, much of its focus as of late has been on providing access to this content (else, why preserve it?). So, the timing of the Born Digital Access bootcamp that was held in Philadelphia this past summer was very opportune. Among other takeaways, it was helpful and comforting to learn how other institutions are grappling with the issue of providing or restricting access in relation to what Princeton is currently doing.  

The bootcamp, led by Alison Clemens from Yale and Greg Weideman from SUNY Albany, was well organized and very informative, and I really appreciate how community-driven and participatory this initiative is, down to the community notes prepared by one of its organizers, Rachel Appel, who was in attendance. I also appreciated that the content provided a holistic and comprehensive approach to access, including reinforcement of the fact that the ability to provide access to born-digital materials starts at the point of record creation, and that, once access is implemented, its effectiveness should be evaluated through frequent user testing.

One point in particular that Alison and Greg emphasized that stood out to me is how the discovery of born-digital content is often almost as difficult as the delivery of that content. This was exemplified during the user testing portion of the bootcamp where attendees had the opportunity to interact with several discovery platforms that describe and/or provide access to digital records. The testing demonstrated that the barriers that remain in terms of locating and accessing digital content are still fairly significant.

The issues surrounding discovery and delivery are something that archivists at Princeton are trying to manage and improve upon. For example, I’m part of two working groups that are tackling these issues from different angles: the Description and Access to Born Digital Archival Collections and the User Experience working groups. The latter has started to embark on both formal and informal user testing of our finding aids site. One aspect that we’re paying particular attention to is the ease with which users can locate and access digital content. I had the opportunity to contribute one of Princeton’s finding aids as a use case for the user testing portion of the workshop, and received helpful feedback, both positive and negative, from bootcamp attendees about the description and delivery methods found on our site. Although one can access the digital records from this collection, there are some impediments to actually viewing the files; namely, one would have to download a program like Thunderbird in order to view the mbox file of emails, a fact that’s not evident to the user.
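For readers curious about what sits inside an mbox file, here is a minimal sketch that lists the date, sender, and subject of the first few messages using Python's standard `mailbox` module, with no mail client required. The function name `summarize_mbox` and the file name `correspondence.mbox` are hypothetical, chosen only for illustration; this is not a description of how Princeton delivers its email collections.

```python
import mailbox


def summarize_mbox(path, limit=10):
    """Return (date, sender, subject) tuples for the first `limit` messages.

    `path` is a hypothetical local mbox file. create=False makes the call
    fail loudly instead of silently creating an empty mailbox.
    """
    box = mailbox.mbox(path, create=False)
    try:
        summaries = []
        for index, message in enumerate(box):
            if index >= limit:
                break
            summaries.append(
                (
                    message.get("Date", ""),
                    message.get("From", ""),
                    message.get("Subject", ""),
                )
            )
        return summaries
    finally:
        box.close()


if __name__ == "__main__":
    # Hypothetical file name; replace with a real mbox file to try it out.
    for date, sender, subject in summarize_mbox("correspondence.mbox"):
        print(date, "|", sender, "|", subject)
```

A listing like this only exposes message headers. Reading bodies and attachments reliably still takes a mail client or a dedicated tool, which is the impediment described above.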


Technical Services archivists at Princeton are also collaborating with colleagues in Public Services and Systems to determine how we might best provide various methods of access to our born-digital records. Because much of the content in Manuscripts Division collections is (at the moment) restricted due to issues related to copyright, privacy, and donor concerns, we’re trying to determine how we can provide mediated access to content both on and off-site. I was somewhat relieved to learn that, like Princeton, many institutions represented at the bootcamp are still relying on non-networked “frankenstein” computers in the reading room as the only other means of providing access aside from having content openly available online. Hopefully Princeton will be able to provide better forms of mediated access in the near future as we intend to implement a pilot version of networked access in the reading room for various forms of digital content, including text, image, and AV files. The next step could be to implement a “virtual reading room” where users can access content via authentication. As these initiatives are realized, we’ll continue to conduct user testing to make sure that what we’re providing is actually useful to patrons. Princeton staff look forward to continuing to participate in the initiatives of the Born Digital Access group as a way to both learn from and share our experiences with this community.    



Faith Charlton is Lead Processing Archivist for Manuscripts Division Collections at Princeton University Library. She is a certified archivist and holds an MLIS from Drexel University, an MA in History from Villanova University, and a BA in History from The College of New Jersey.


DLF Forum & Digital Preservation 2017 Recap

By Kelly Bolding


The 2017 DLF Forum and NDSA’s Digital Preservation took place this October in Pittsburgh, Pennsylvania. Each year the DLF Forum brings together a variety of digital library practitioners, including librarians, archivists, museum professionals, metadata wranglers, technologists, digital humanists, and scholars in support of the Digital Library Federation’s mission to “advance research, learning, social justice, and the public good through the creative design and wise application of digital library technologies.” The National Digital Stewardship Alliance follows up the three-day main forum with Digital Preservation (DigiPres), a day-long conference dedicated to the “long-term preservation and stewardship of digital information and cultural heritage.” While there were a plethora of takeaways from this year’s events for the digital archivist community, for the sake of brevity, this recap will focus on a few broad themes, followed by some highlights related to electronic records specifically.

As an early career archivist and a first-time DLF/DigiPres attendee, I was impressed by the DLF community’s focus on inclusion and social justice. While technology was central to all aspects of the conference, the sessions centered the social and ethical aspects of digital tools in a way that I found both refreshing and productive. (The theme for this year’s DigiPres was, in fact, “Preservation is Political.”) Rasheedah Phillips, a Philadelphia-based public interest attorney, activist, artist, and science fiction writer, opened the forum with a powerful keynote about the Community Futures Lab, a space she co-founded and designed around principles of Afrofuturism and Black Quantum Futurism. By presenting an alternate model of archiving deeply grounded in the communities affected, Phillips’s talk and Q&A responses brought to light an important critique of the restrictive nature of archival repositories. I left Phillips’s talk thinking about how we might allow the liberatory “futures” she envisions to shape how we design online spaces for engaging with born-digital archival materials, as opposed to modeling these virtual spaces after the physical reading rooms that have alienated many of our potential users.

Other conference sessions echoed Phillips’s challenge to archivists to better engage and center the communities they document, especially those who have been historically marginalized. Ricky Punzalan noted in his talk on access to dispersed ethnographic photographs that collaboration with documented communities should now be a baseline expectation for all digital projects. Rosalie Lack and T-Kay Sangwand spoke about UCLA’s post-custodial approach to ethically developing digital collections across international borders using a collaborative partnership framework. Martha Tenney discussed concrete steps taken by archivists at Barnard College to respect the digital and emotional labor of students whose materials the archives is collecting to fill in gaps in the historical record.

Eira Tansey, Digital Archivist and Records Manager at the University of Cincinnati and organizer for Project ARCC, gave her DigiPres keynote about how our profession can develop an ethic of environmental justice. Weaving stories about the environmental history of Pittsburgh throughout her talk, Tansey called for archivists to commit firmly to ensuring the preservation and usability of environmental information. Related themes of transparency and accountability in the context of preserving and providing access to government and civic data (which is nowadays largely born-digital) were also present throughout the conference sessions. Regarding advocacy and awareness initiatives, Rachel Mattson and Brandon Locke spoke about Endangered Data Week, and several sessions discussed the PEGI Project. Others presented on the challenges of preserving born-digital civic and government information, including how federal institutions and smaller universities are tackling digital preservation given their often limited budgets, as well as how repositories are acquiring and preserving born-digital congressional records.

Collaborative workflow development for born-digital processing was another theme that emerged in a variety of sessions. Annalise Berdini, Charlie Macquarie, Shira Peltzman, and Kate Tasker, all digital archivists representing different University of California campuses, spoke about their process in coming together to create a standardized set of UC-wide guidelines for describing born-digital materials. Representatives from the OSSArcFlow project also presented some initial findings regarding their research into how repositories are integrating open source tools including BitCurator, Archivematica, and ArchivesSpace within their born-digital workflows; they reported on concerns about the scalability of various tools and standards, as well as desires to transition from siloed workflows to a more holistic approach and to reduce the time spent transforming the output of one tool to be compatible with another tool in the workflow. Elena Colón-Marrero of the Computer History Museum’s Center for Software History provided a thorough rundown of building a software preservation workflow from the ground up, from inventorying software and establishing a controlled vocabulary for media formats to building a set of digital processing workstations, developing imaging workflows for different media formats, and eventually testing everything out on a case study collection (and she kindly placed her whole talk online!).

Also during the forum, the DLF Born-Digital Access Group met over lunch for an introduction and discussion. The meeting was well-attended, and the conversation was lively as members shared their current born-digital access solutions, both pretty and not so pretty (but never perfect); their wildest hopes and dreams for future access models; and their ideas for upcoming projects the group could tackle together. While technical challenges certainly figured into the discussion about impediments to providing better born-digital access, many of the problems participants reported had to do with their institutions being unwilling to take on perceived legal risks. The main action item that came out of the meeting is that the group plans to take steps to expand NDSA’s Levels of Preservation framework to include Levels of Access, as well as corresponding tiers of rights issues. The goal would be to help archivists assess the state of existing born-digital access models at their institutions, as well as give them tools to advocate for more robust, user-friendly, and accessible models moving forward.

For additional reports on the conference, reflections from several DLF fellows are available on the DLF blog. In addition to the sessions I mentioned, there are plenty more gems to be found in the openly available community notes (DLF, DigiPres) and OSF Repository of slides (DLF, DigiPres), as well as in the community notes for the Liberal Arts Colleges/HBCU Library Alliance unconference that preceded DLF.


Kelly Bolding is a processing archivist for the Manuscripts Division at Princeton University Library, where she is responsible for the arrangement and description of early American history collections and has been involved in the development of born-digital processing workflows. She holds an MLIS from Rutgers University and a BA in English Literature from Reed College.

Stanford Hosts Pivotal Session of Personal Digital Archiving Conference

By Mike Ashenfelder


In March, Stanford University Libraries hosted Personal Digital Archiving 2017, a conference about preservation and access of digital stuff for individuals and for aggregations of individuals. Presenters included librarians, data scientists, academics, data hobbyists, researchers, humanitarian citizens and more. PDA 2017 differed from previous PDA conferences, though, when an honest, intense discussion erupted about race, privilege and bias.

Topics did not fall into neat categories. Some people collected data, some processed it, some managed it, some analyzed it. But often the presenters’ interests overlapped. Here are just some of the presentations, grouped by loosely related themes.

  • Joan Jeffri’s (Research Center for Arts & Culture/The Actors Fund) project archives the legacy of older performing artists. Jessica Moran (National Library of New Zealand) talked about the digital archives of a contemporary New Zealand composer and Shu-Wen Lin (NYU) talked about archiving an artist’s software-based installation.
  • In separate projects, Andrea Prichett (Berkeley Copwatch), Stacy Wood and Robin Margolis (UCLA), and Ina Kelleher (UC Berkeley) talked about citizens tracking the actions of police officers and holding the police accountable.
  • Stace Maples (Stanford) helped digitize 500,000+ consecutive photos of buildings along Sunset Strip. Pete Schreiner (North Carolina State University) archived the debris from a van that three bands shared for ten years. Schreiner said, “(The van) accrued the detritus of low-budget road life.”
  • Adam Lefloic Lebel (University of Montreal) talked about archiving video games and Eric Kaltman (UC Santa Cruz) talked about the game-research tool, GISST.
  • Robert Douglas Ferguson (McGill) examined personal financial information management among young adults. Chelsea Gunn (University of Pittsburgh) talked about the personal data that service providers collect from their users.
  • Rachel Foss (The British Library) talked about users of born-digital archives. Dorothy Waugh and Elizabeth Russey Roke (Emory) talked about how digital archiving has evolved since Emory acquired the Salman Rushdie collection.
  • Jean-Yves Le Meur (CERN) called for a non-profit international collaboration to archive personal data, “…so that individuals would know they have deposited stuff in a place where they know it is safe for the long term.”
  • Sarah Slade (State Library Victoria) talked about Born Digital 2016, an Australasian public-outreach program. Natalie Milbrodt (Queens Public Library) talked about helping her community archive personal artifacts and oral histories. Russell Martin (DC Public Library) talked about helping the DC community digitize their audio, video, photos and documents. And Jasmyn Castro (Smithsonian African American History Museum) talked about digitizing AV stuff for the general public.
  • Melody Condron (University of Houston) reviewed tools for handling and archiving social media and Wendy Hagenmaier (Georgia Tech) introduced several custom-built resources for preservation and emulation.
  • Sudheendra Hangal and Abhilasha Kumar (Ashoka University) talked about using personal email as a tool to research memory. And Stanford University Libraries demonstrated their ePADD software for appraisal, ingest, processing, discovery and delivery of email archives. Stanford also hosted a hackathon.
  • Carly Dearborn (Purdue) talked about data analysis and management for researchers. Leisa Gibbons (Kent State) analyzed interactions between YouTube and its users, and Nancy Van House (UC Berkeley) and Smiljana Antonijevic Ubois (Penn State) talked about digital scholarly workflow. Gary Wolf (Quantified Self) talked about himself.

Some presentations addressed cultural sensitivity and awareness. Barbara Jenkins (University of Oregon) discussed a collaborative digital project in Afghanistan. Kim Christen (Washington State) demonstrated Mukurtu, built with indigenous communities, and Traditional Knowledge Labels, a metadata tool for adding local cultural protocols. Katrina Vandeven (University of Denver) talked about a transformative insight she had during a Women’s March project, where she suddenly became aware of the bias and “privileged understanding” she brought to it.

The conference ended with observations from a group of panelists who have been involved with the PDA conferences since the beginning: Cathy Marshall, Howard Besser (New York University), Jeff Ubois (MacArthur Foundation), Cliff Lynch (Coalition for Networked Information) and me.

Marshall said, “I still think there’s a long-term tendency toward benign neglect, unless you’re a librarian, in which case you tend to take better care of your stuff than other people do.” She added that cloud services have improved to the point where people can trust online backup. Marshall said, “Besides, it’s much more fun to create new stuff than to be the steward of your own mess.”

Lynch agreed about automated backup. “There used to be a view that catastrophic data loss was part of life and you’d just lose your stuff and start over,” Lynch said. “It was cleansing and terrifying at the same time.” He said the possibility of data loss is still real but less urgent.

Marshall spoke of backing the same stuff up again and again, and how it’s “all over the place.”

Besser described a conversation he had with his sister that they carried out over voicemail, WhatsApp, text and email. “All this is one conversation,” Besser said. “And it’s all over the place.” Lynch predicted that the challenge of organizing digital things is “…going to shift as we see more and more…automatic correlation of things.”

Ubois said, “I think emulation has been proven as something that we can trust.” He also indicated the “cognitive diversity” around the room. He said, “Many of the best talks at PDA over the years have been by persistent fanatics who had something and they were going to grab it and make it available.”

Besser said, “Things that we were talking about…years ago, that were far out ideas, have entered popular discourse…One example is what happens to your digital stuff when you die…Now we have laws in some states about it…and the social media have stepped up somewhat.”

I noted that the first PDA conference included presentations about personal medical records and about genealogy, but those two areas haven’t been covered since. Lynch made a similar statement about how genealogy “…richly deserves a bit more exploration.” I also noted that the general public still needs expert information about digitizing and digital preservation, and we see more examples of university and public librarians taking the initiative to help their communities with PDA.

In a Q&A session, Charles Ransom (University of Michigan) raised the bias issue again when he said, “I was wondering…how privilege plays a part in all of this. Look at the audience and it’s clear that it does,” referring to the overwhelmingly white audience.

Besser said that NYU reaches out to activist, ethnic and neighborhood communities. “Most of us (at this conference) …work with disenfranchised communities,” said Besser. “It doesn’t bring people here to this meeting…but it does mean that at least some of those voices are being heard through outreach.” Besser said that when NYU hosted PDA 2015, they worked toward diversity. “We still had a pretty white audience,” Besser said. “But…it’s more than just who gets the call for proposal…It’s a big society problem that is not really easy to solve and we all have to really work on it.”

I said it was a challenge for PDA conference organizers to reach a wide, diverse audience just through the host institution’s social media tools and a few newsgroups. When asked what the racial mix was at the PDA 2016 conference (which University of Michigan hosted), Ransom said it was about the same as this conference. He said, “I specifically reached out to presenters that I knew and the pushback I got from them was ‘We don’t have a budget to go to Ann Arbor for four days and pay for hotel and travel and registration fees.’”

Audience members suggested having the PDA host invite more local community organizations, so travel and lodging won’t matter, and possibly waiving fees. The University of Houston will host PDA 2018; Melody Condron said UH has a close relationship with Houston community organizations and she will explore ways to involve them in the conference.

Lynch, whose continuous conference travels afford him a global technological perspective, said of the PDA 2017 conference, “I’m really encouraged, particularly by the way we seem to be moving the deeper and harder problems into focus…We’re just now starting to understand the contours of the digital environment.”

The conference videos are available online at https://archive.org/details/pda2017.

Call for Contributions: Collaborating Beyond the Archival Profession

The work of archivists is highly collaborative in nature. While the need for and benefits of collaboration are widely recognized, the practice of collaboration can be, well, complicated.

This year’s ARCHIVES 2017 program featured a number of sessions on collaboration: archivists collaborating with users, Indigenous communities, secondary school teachers, etc. We would like to continue that conversation in a series of posts that cover the practical issues that arise when collaborating with others outside of the archival profession at any stage of the archival enterprise. Give us your stories about working with technologists, videogame enthusiasts, artists, musicians, activists, or anyone else with whom you find yourself collaborating!

A few potential topics and themes for posts:

  • Posts written by non-traditional archivists or others working to preserve heritage materials outside of traditional archival repositories
  • Posts co-written by archivists and collaborators
  • Tips for translating archive jargon, and suggestions for working with others in general
  • Incorporating researcher feedback into archival work
  • The role of empathy in digital archives practice

Writing for bloggERS! Collaborating Beyond the Archival Profession series

  • We encourage visual representations: Posts can include or largely consist of comics, flowcharts, a series of memes, etc!
  • Written content should be 600-800 words in length
  • Write posts for a wide audience: anyone who stewards, studies, or has an interest in digital archives and electronic records, both within and beyond SAA
  • Align with other editorial guidelines as outlined in the bloggERS! guidelines for writers.

Posts for this series will start in November, so let us know if you are interested in contributing by sending an email to ers.mailer.blog@gmail.com!

Hybrid like Frankenstein, but not helpful like a Spork

By Gabby Redwine

This is the third post in the bloggERS series on Archiving Digital Communication.


The predictions have come true: acquisitions of born-digital materials are on the rise, and for the foreseeable future many of these collections will include a combination of digital and analog materials. Working with donors prior to acquisition can help a collecting body ensure that the digital archives it receives fall within its collecting scope in terms of both format and content. This holds true for institutional repositories, collecting institutions of all sorts, and archives gathered and stored by communities and individuals rather than institutions. Donors sometimes can provide insight into how born-digital items in an acquisition relate to each other and to non-digital materials, which can be particularly helpful with acquisitions containing a hybrid of paper and born-digital correspondence.

I’ve helped transfer a few acquisitions containing different kinds of digital correspondence: word processing documents, email on hard drives and in the cloud, emails saved as PDFs, email mailboxes in archived formats, and others. Often the different formats represent an evolution in a person’s correspondence practices over time and across the adoption of different digital technologies. Just as often, a subset of these different types of digital correspondence are duplicated in analog form.

Examples include:

  • Letters originally written in word processing software that also exist as print-outs with corrections and scribbles, not to mention the paper copy received (and perhaps retained in some other archive) by the recipient.
  • Email that has been downloaded, saved, printed, and stored alongside analog letters.
  • An acquisition that includes email as the only correspondence after a particular date, all of which is downloaded and saved as individual PDF files, but only the most important ones are printed and stored among paper records.
  • Email folders received annually from staff with significant duplication in content.
  • Tens of thousands of emails stored in the cloud which have been migrated across different clients/hosts over the last 20 years, some with different foldering and labeling practices.

When the time comes to transfer all or some of this to an archives, the donor and the collecting body must make decisions about what, if anything, is important to include and how to represent the relationship between the different correspondence formats. Involvement with donors early on can be incredibly beneficial, but it can also cause a significant drain on staff resources, particularly in one-person shops.

What is the minimum level of support staff can provide to every donor with digital materials? What are levels of service that could be added in particular circumstances—for example, when a collection is of particular value or a donor requires additional technological support? And how can staff ensure that the minimum level of service provided doesn’t inadvertently place an undue burden on a donor—for example, someone who may not have the resources to hire technological support or might not like to ask for help—that results in important materials being excluded from the historical record?

At the staff end, hybrid correspondence files also raise questions about whether and how to identify both paper and digital duplicates (is it worth the effort?), whether and how to dispose of them (is it worth the time and documentation?), and at what point in the process this work can realistically take place. Many of the individual components of hybrid correspondence archives seem familiar and perhaps even basic to archivists, but once assembled they present challenges that resemble a more complex monster—one that perhaps not even the creator can explain.
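For the digital half of that duplicate question, one low-cost first pass is to group files by checksum. The sketch below is a minimal illustration under stated assumptions: the function names and the `accession_0001` folder are hypothetical, and the approach only finds byte-for-byte identical files. An email saved both as a PDF and inside a mailbox, or a letter that exists as a print-out, will not match and still needs human judgment.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Map each digest to the list of files under `root` that share it.

    Only digests shared by two or more files are returned.
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_digest[sha256_of(path)].append(path)
    return {digest: paths for digest, paths in by_digest.items() if len(paths) > 1}


if __name__ == "__main__":
    # Hypothetical accession folder; replace with a real directory to try it out.
    for digest, paths in find_exact_duplicates("accession_0001").items():
        print(digest[:12], [str(p) for p in paths])
```

A report like this does not answer whether disposing of duplicates is worth the time and documentation, but it can make the scale of the overlap visible before that decision is made.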

I’m writing from the perspective of someone who has been involved with hybrid collections primarily at the acquisition and accessioning end of the spectrum. If any readers have an example of an archival collection in which the hybrid nature of the materials has been helpful (like a Spork!), perhaps during arrangement & description or even to a researcher, please share your experience in the comments.


Gabby Redwine is Digital Archivist at the Beinecke Rare Book & Manuscript Library at Yale.

Call for Contributors: Archiving Digital Communications Series

Archives have long collected correspondence, but as communication has shifted to digital platforms, archivists must discover and develop new tools and methods.  From appraising one massive inbox to describing threaded messages, email has introduced many new challenges to the way we work with correspondence. Likewise, instant messaging, text messaging, collaborative online working environments, and other forms of digital communication have introduced new challenges and opportunities.

We want to hear how you and your institution are managing the acquisition, appraisal, processing, and preservation of, and access to, these complex digital collections. Although the main focus of most programs is email, we’re also interested in hearing how you manage other formats of digital communication.

We’re interested in real-life solutions by working archivists: case studies, workflows, and any kind of practical work with these collections that describes the challenges of acquiring, preserving, and making accessible email and other forms of digital communication.

A few potential topics and themes for posts:

  • Evaluating tools to acquire and process email
  • Case studies on archiving email and other forms of digital communication
  • Integrating practices for digital correspondence with physical correspondence
  • Addressing privacy and legal issues in email collections
  • Collaborating with IT departments and donors to collect email

Writing for bloggERS!

  • Posts should be between 200 and 600 words in length
  • Posts can take many forms: instructional guides, in-depth tool exploration, surveys, dialogues, point-counterpoint debates are all welcome!
  • Write posts for a wide audience: anyone who stewards, studies, or has an interest in digital archives and electronic records, both within and beyond SAA
  • Align with other editorial guidelines as outlined in the bloggERS! guidelines for writers.

Call for Contributors – Digital Archives Pathways Series

Archivists by their very nature are jacks of all trades, and the same goes for those who work with digital collection materials. Archives programs and iSchools are increasingly offering coursework in digital archives theory and practice, but not all digital archivists got their chops through academic channels, and for many archivists, digital only describes part of their responsibilities.

While all archivists must determine their own path for professional growth, the field of digital archives is also uniquely challenging. Preparation and training for this work require dedication, creativity, and engagement. Processing, preserving, and providing access to digital materials, and expertise in specialized content such as legacy media and web archiving are ever-expanding challenges.

In the Digital Archives Pathways series, we are looking for stories about the non-traditional, accidental, idiosyncratic, or unique path you took to become a digital archivist, however you define that in your work. What do you consider essential to your training, and what do you wish had been a larger part of it? How might your journey towards digital archives work be characterized as non-traditional? How do you plan on continuing your education in digital archives?

Writing for bloggERS! Digital Archives Pathways Series:

  • We encourage visual representations: Posts can include or consist of comics, flowcharts, a series of memes, etc!
  • Written content should be 200-600 words in length
  • Write posts for a wide audience: anyone who stewards, studies, or has an interest in digital archives and electronic records, both within and beyond SAA
  • Align with other editorial guidelines as outlined in the bloggERS! guidelines for writers.

Posts for this series will start in July, so let us know ASAP if you are interested in contributing by sending an email to ers.mailer.blog@gmail.com!

Digital Preservation in NYC

This year’s meeting of the Preservation and Archiving Special Interest Group (PASIG) took place this October at the Museum of Modern Art in New York.  PASIG brings together an international community to share successes and challenges of digital preservation, with an emphasis on practical applications and solutions.

The conference was three days long, and kicked off with a day of “Bootcamp/101” sessions, focused on bringing everyone up to speed on what it is we’re preserving and how we can go about building infrastructures to support preservation.  Unfortunately I wasn’t able to arrive until Day 2, but many of the presentation slides are available online at the conference’s figshare page.

I arrived on Thursday morning, ready to jump into a morning of presentations and panel discussions on reproducibility and research data.  Vicky Steeves started the presentations with an explanation of reproducibility vs. replication, a distinction well worth making, especially for those of us with less experience working with research data.

“Reproducibility independently confirms results with the same data (and/or code). Replication independently confirms results with new data (and/or code).”

Steeves pointed out that the concerns of reproducibility are really an iceberg, because the environment in which the research was conducted often goes unnoticed–especially in a technological environment where research tools may rely on a certain version of a browser, hardware, or software tools.  These tools may be updated or change in a way that isn’t immediately visible.

One potential solution to this problem was presented by Fernando Chirigati of New York University.  He introduced the tool ReproZip, which allows the researcher to package the data files, libraries, and environment variables.  ReproZip runs in the background while the experiment is conducted, and documents the variables and technological dependencies that later researchers will need to reproduce the experiment once tools and browsers have changed.  The packaged data and environment variables can be archived, then unpackaged by ReproZip for future use.

Both Peter Burnhill from the University of Edinburgh and Rachel Trent from George Washington University Libraries discussed the problem of reproducing research reliant on web resources.  Burnhill’s presentation, “Web Today, Gone Tomorrow,” focused on the lack of persistence in web addresses, and the need for ongoing preservation of online articles and other academic resources.  To get an idea of the scope of this problem, 20-30% of referenced URLs are lost within 2 weeks of publication.  Burnhill presented the Hiberlink project, which aims to find solutions for this preservation gap through partnerships with academic publishing outlets.

Rachel Trent’s presentation, “Documenting the Demographic Imagination,” discussed the challenges of preserving social media data for reproducible research.  Given the continued migration from one social media forum to another (MySpace to Facebook to Twitter, etc.), the archivist can’t assume that future researchers will understand the basis of any of these websites.  Trent discussed the use of social media managers and web harvesters to automate the collection of social media data, and what metadata can be automatically extracted using these tools.  Trent and her team are now looking for feedback from the community on what’s missing from their social media metadata, and how researchers want to interact with this metadata.

After a brief lunch break, we dove into the challenges of preserving complex and very large data.  Karen Cariani presented on the public broadcasting media library and archives of WGBH.  For audio and video files, the preservation needs are significant, and uncompressed preservation masters are very large.  The formats are complicated, and proxy files are necessary for access purposes.  Cariani discussed how the HydraDAM2 project worked to fill this preservation gap by extending the HydraDAM system to work with the Fedora 4 repository and creating a Hydra “head” for digital A/V preservation.

Ben Fino-Radin continued on the theme of preservation at scale, discussing the creation of workflows for digitized time-based media holdings at the MoMA.  The digital repository uses Archivematica for ingest, Arkivum for storage, and Binder for managing these digital assets.  A single 120-minute film, once restored at 4K resolution, contains 4 terabytes of data, so the workflows and systems for managing these files have to move quickly and efficiently.  This also means that the MoMA must be efficient in prioritizing film digitization efforts.
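
To put the 4-terabyte figure in perspective, here is a rough back-of-the-envelope sketch of the arithmetic. Only the 120-minute runtime and the 4 TB size come from the talk as reported above. The decimal terabyte, the 24 frames-per-second rate, and the 1 gigabit-per-second network link are assumptions chosen purely for illustration, and the function names are hypothetical.

```python
# Back-of-the-envelope arithmetic for a single restored film.
# From the post: 120 minutes of runtime, 4 terabytes of data.
# Assumptions (not from the post): 1 TB = 10**12 bytes, 24 frames per
# second, and a 1 gigabit-per-second link for moving the file.

FILM_BYTES = 4 * 10**12          # 4 TB, decimal
RUNTIME_SECONDS = 120 * 60       # 120 minutes
FRAMES_PER_SECOND = 24           # assumed frame rate
LINK_BITS_PER_SECOND = 10**9     # assumed 1 Gbit/s link


def average_rate(total_bytes: float, seconds: float) -> float:
    """Average bytes stored per second of footage."""
    return total_bytes / seconds


def transfer_hours(total_bytes: float, link_bits_per_second: float) -> float:
    """Hours needed to move the file over a link at full, sustained speed."""
    return (total_bytes * 8) / link_bits_per_second / 3600


if __name__ == "__main__":
    rate = average_rate(FILM_BYTES, RUNTIME_SECONDS)
    print(f"Average data rate: {rate / 1e6:.1f} MB per second of footage")
    print(f"Average frame size: {rate / FRAMES_PER_SECOND / 1e6:.1f} MB")
    hours = transfer_hours(FILM_BYTES, LINK_BITS_PER_SECOND)
    print(f"Transfer time on the assumed link: {hours:.1f} hours")
```

Under these assumptions, one film averages several hundred megabytes per second of footage and takes most of a working day to copy once, which helps explain why the workflows have to be efficient.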

Day three focused on sustainability: not only sustaining our cultural and scientific heritage through digital preservation, but also sustaining our planet and communities.  Eira Tansey from the University of Cincinnati made the obvious but rarely discussed point that archives require energy, digital archives especially so.  She urged the audience to consider the energy required for preservation during their daily work in the archive.  Some common practices of digital preservation may be a wasteful use of resources, such as preserving every derivative file as it is migrated from one format to the next, or considering file compression as the enemy of preservation.  She posted the entire text of her talk online: “The Voice of One Crying Out in the Wilderness: Preservation in the Anthropocene.”

Elvia Arroyo-Ramirez, Processing Archivist for Latin American Manuscript Collections, Princeton University, presented “Invisible Defaults and Perceived Limitations: Processing the Juan Gelman Files.”  She discussed how the systems we use contain the biases of the people who create them, pointing to systems that require file names be ‘cleaned’ or ‘scrubbed’ to remove ‘illegal characters’ including Spanish-language diacritic glyphs.  When working with a born-digital collection created in another language, those glyphs are vital to the understanding of those records.  She asked the community how we can intervene to make our tools and technologies reflect our mission to preserve the records and ‘do no harm.’  
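
As a minimal sketch of the kind of default described above, the following Python compares two ways of “cleaning” a file name. It is not taken from the talk or from any particular processing tool. The function names, the sample file name, and the set of characters treated as unsafe are all hypothetical, chosen only to show how an ASCII-only rule erases diacritics while a narrower rule keeps them.

```python
import re
import unicodedata

# Hypothetical set of characters treated as unsafe in file names.
# Real systems differ; this set is only for illustration.
UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def scrub_ascii(name: str) -> str:
    """Lossy default: replace anything outside A-Z, a-z, 0-9, '.', '_', '-'."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", name)


def scrub_preserving(name: str) -> str:
    """Keep accented letters: normalize to NFC, replace only unsafe characters."""
    normalized = unicodedata.normalize("NFC", name)
    return UNSAFE.sub("_", normalized)


if __name__ == "__main__":
    # Hypothetical file name, written with escapes so the code points are
    # unambiguous: "año_nuevo_canción.txt"
    original = "a\u00f1o_nuevo_canci\u00f3n.txt"
    print(scrub_ascii(original))       # diacritic letters become underscores
    print(scrub_preserving(original))  # diacritic letters survive
```

The second function still replaces characters that commonly cause trouble on file systems, but it no longer treats Spanish-language letters as “illegal.” Whether that trade-off is acceptable depends on the systems downstream, which is the kind of intervention the talk asked the community to consider.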

The conference was concluded by Ingrid Burrington, neither an archivist nor a digital preservationist, but a self-described writer, mapmaker and joke maker, and author of Networks of New York:  An Illustrated Field Guide to Urban Internet Infrastructure.  She discussed the physical infrastructure that makes up the internet and the corporate infrastructures that keep it running. She pointed to social media as shaping how we communicate, and to products like Google Maps as shaping our understanding of the world’s geography.  Companies like Google can skew their products away from reality, whether by blurring sensitive government installations or their own data centers. Corporate interest and the public need for information do not always align.

This change of perspective was a great end to the conference, bringing us out of our technical comfort zones and making the audience consider how the work of digital preservation has larger and potentially more dire effects than we may realize.

 


Alice Sara Prael is the Digital Accessioning Archivist at Beinecke Rare Book & Manuscript Library at Yale University.  She works with born-digital archival material through a centralized accessioning service.

Call for Posts: International Perspectives on Digital Preservation

The BloggERS editorial team is planning a series of blog posts to present an international view on digital preservation, and we would like to invite you to participate.

We like to think of our topical blog series as a chance for digital archivists to share information about issues they are facing, solutions they have implemented, and new projects they are working on. We’ve had some great series in the past on digital processing and access, so we thought it might be valuable to get perspectives on digital preservation from various countries and cultures.

We have several goals that we hope the series might reach:

  1. We want to highlight similarities across borders, which will foster information sharing and can lead to fruitful collaborations;
  2. We want to discover differences in practice based on local laws, values, practices, histories; differences in practice give fresh perspective into one’s own work as well as provide new ideas for moving forward;
  3. We want to use the ERS blog to facilitate the development of an international dialogue about the values, technologies, and practices that shape digital preservation needs across the globe;
  4. We hope to encourage future collaborative relationships by giving repositories worldwide a chance to describe their problems and solutions;
  5. We want to offer the blog as a common space for discussions of digital preservation with international points of view.

We want this series of posts to be useful to anyone working anywhere around the globe, not just in the United States. If you’ve run into issues specific to your country or culture and want to describe your issues and share your solutions, or if you’ve got a cool project that might interest an international audience, we’d love to hear from you.

Contact us with post ideas at ers.mailer.blog@gmail.com

Also, check out our Guidelines for Writers.

Get to know the candidates: Greg Wiedeman

The 2016 elections for Electronic Records Section leadership are upon us! Over the next two weeks, we will be presenting additional information provided by the 2016 nominees for ERS leadership positions. For more information about the slate of candidates, you can check out the full 2016 ERS elections site. ERS Members: be sure to vote! Polls are open July 8 through July 22!

Candidate name: Gregory Wiedeman

Running for: Steering Committee

What made you decide you wanted to become an archivist?

As a graduate history student without funding, working in archives was an attractive alternative to poverty. Then I found out how awesome a job it is. I was lucky enough to get great hands-on experience on big projects and I found the work more complex, interesting, and enjoyable than my graduate research. I love all of the problem-solving it takes, and making all of the content we have available for public use.

What is one thing you’d like to see the Electronic Records Section accomplish during your time on the steering committee?

I really see the role of the section as encouraging communication about techniques and best practices. Digital records have tremendous potential to make our collections much more accessible, but there are major hurdles – many of which can be helped by sharing the work many of us are doing independently. The ERS has already done a great job with this bloggERS! series, which has provided a forum for the sharing and discussion of some of the really innovative projects that are pushing the community forward. Yet, I’ve found that many electronic records efforts are smaller, ad hoc, and more continuous than formal or polished. I’d like to find a way to share all the workflows, draft policies, and small scripts so that our peers can reuse and build upon them. In addition to the continuation of the bloggERS! series, I’d like the ERS to look into a mechanism for archivists to self-submit these small, but useful, efforts in a way that promotes permissive reuse and documentation.

What is your favorite GIF?

This time of year it’s always Bartolo:

[GIF of Bartolo]