Latest #bdaccess Twitter Chat Recap

By Daniel Johnson and Seth Anderson

This post is the eighteenth in a bloggERS series about access to born-digital materials.

____

In preparation for the Born Digital Access Bootcamp: A Collaborative Learning Forum at the New England Archivists spring meeting, an ad-hoc born-digital access group with the Digital Library Federation recently held a set of #bdaccess Twitter chats. The discussions aimed to gain insight into the issues that archives and library staff face when providing access to born-digital materials.

Here are a few ideas that were discussed during the two chats:

  • Backlogs, workflows, delivery mechanisms, a lack of known standards, appraisal, and familiarity with software were the major barriers to providing access.
  • Participants were eager to learn more about new tools, existing functioning systems, providing access to restricted material and complicated objects, which institutions are already providing access to data, what researchers want and need, and whether any user testing has been done.
  • Access is being prioritized based on user demand, donor concerns, fragile formats, and a general conviction that born-digital records are not truly preserved unless access is provided.
  • Very little user testing has been done.
  • A variety of archivists, IT staff, and services librarians are needed to provide access.

You can search #bdaccess on Twitter to see how the conversation evolves or view the complete conversation from these chats on Storify.

The Twitter chats were organized by a group formed at the 2015 SAA annual meeting. Stay tuned for future chats and other ways to get involved!

____

Daniel Johnson is the digital preservation librarian at the University of Iowa, exploring, adapting, and implementing digital preservation policies and strategies for the long-term protection and access to digital materials.

Seth Anderson is the project manager of the MoMA Electronic Records Archive initiative, overseeing the implementation of policy, procedures, and tools for the management and preservation of the Museum of Modern Art’s born-digital records.

#bdaccess Twitter Chat Recap

By Jess Farrell and Sarah Dorpinghaus

This post is the sixteenth in a bloggERS series about access to born-digital materials.

____

An ad-hoc born-digital access group with the Digital Library Federation recently held two successful and informative #bdaccess Twitter chats that scratched the surface of the born-digital access landscape. The discussions aimed to gain insight into how researchers want to access and use digital archives and included questions on research topics, access challenges, and discovery methods.

Here are a few ideas that were discussed during the two chats:

You can search #bdaccess on Twitter to see how the conversation evolves or view the complete conversation from these chats on Storify.

The Twitter chats were organized by a group formed at the 2015 SAA annual meeting. We are currently developing a bootcamp to share ideas and tools for providing access to born-digital materials and have teamed up with the Digital Library Federation to spread the word about the project. Stay tuned for future chats and other ways to get involved!

____

Jess Farrell is the curator of digital collections at Harvard Law School. Along with managing and preserving digital history, she’s currently fixated on inclusive collecting, labor issues in libraries, and decolonizing description.

Sarah Dorpinghaus is the Director of Digital Services at the University of Kentucky Libraries Special Collections Research Center. Although her research interests lie in the realm of born-digital archives, she has a budding pencil collection.

Software Preservation Network: Community Roadmapping for Moving Forward

By Susan Malsbury

This is the fifth post in our series on the Software Preservation Network 2016 Forum.
____


The final session of the Software Preservation Forum was a community roadmapping activity with two objectives: to synthesize the topics, patterns, and projects that came up during the forum, and to articulate the steps and time frame for future work. This session built on two earlier activities in the day: an icebreaker in the morning and a brainstorming activity in the afternoon.

For the morning icebreaker, participants, armed with blank index cards and a pen, found someone in the room they hadn’t met before. After brief introductions, they each shared one challenge their organization faced with software and/or software preservation, and each wrote their partner’s challenge on their own index card. After five rounds of this, participants returned to their tables for the opening remarks from Jessica Meyerson, Zach Vowell, and Cal Lee.

For the afternoon brainstorming activity, participants took the cards from the morning icebreaker as well as fresh cards and again paired with someone they hadn’t met. Each pair looked over their notes from the morning and wrote out goals, tasks, and projects that could respond to the challenges. By that point, we had had three excellent sessions, as well as casual conversations over lunch and coffee breaks, to further inform potential projects.

I paired with Amy Stevenson from the Microsoft Corporation. Even though her organization is very different from mine (the New York Public Library), we easily identified projects that would address our own challenges as well as the challenges we gathered in the morning. The projects we identified included the need for a software registry, educational resources, and a clearinghouse to provide discovery for software. We then placed our cards on a butcher-paper timeline at the front of the room that spanned from the present to 2022, a six-year time frame whose first full year was 2017.

During the fourth session on partnerships, Jessica Meyerson entered the goals, projects, and ideas from the timeline into a spreadsheet so that by the fifth session we were ready to get roadmapping! For this session we broke into three groups to discuss the roadmap and work on our own group’s copy of the spreadsheet. Our group subdivided into smaller groups, each of which took a year of the timeline to edit and comment on. While we each focused on our own year, conversation between subgroups flowed freely, and people felt comfortable moving projects into other years or streamlining ideas across the entire time frame. Links to the master spreadsheet and our three versions can be found here.

Despite working in three separate groups, it was remarkable how closely our edited roadmaps aligned with one another. Not surprisingly, most people felt it was important to front-load steps regarding research, developing platforms for sharing information, and identifying similar projects to form partnerships. Projects in the later years would grow from this earlier research: creating the registry, establishing a coalition, and developing software metadata models.

I found the forum, and this session in particular, to be energizing. I had attended the talk that Jessica Meyerson and Zach Vowell gave at SAA in 2014, when they first formed the Software Preservation Network. While I was intrigued by the idea of software preservation, it seemed a far-off concept to me; at that time, there were still many other issues regarding digital archives that seemed far more pressing. When I heard other people’s challenges at the forum, and had space to think about my own, I realized how important and timely software preservation is. As digital archives best practices are being codified, we are realizing more and more how dependent we are on (often obsolete) software to do our work.

____

Susan Malsbury is the Digital Archivist for The New York Public Library, working with born digital archival material across the three research centers of the Library. In this role, she assists curators with acquisitions; oversees technical services staff handling ingest and processing; and coordinates with public service staff to design and implement access systems for born digital content. Susan has worked with archives at NYPL in various capacities since 2007.

Announcing the First-Ever #bdaccess Twitter Chats: 10/27 @ 2 and 9pm EST

By Jess Farrell and Sarah Dorpinghaus

This post is the fifteenth in a bloggERS series about access to born-digital materials.

____

Contemplating how to provide access to born-digital materials? Wondering how to meet researcher needs for accessing and analyzing files? We are too! Join us for a Twitter chat on providing access to born digital records.

*When?* Thursday, October 27 at 2:00pm and 9:00pm EST
*How?* Follow #bdaccess for the discussion
*Who?* Researchers, information professionals, and anyone else interested in using born-digital records

The newly conceived #bdaccess chats are organized by an ad-hoc group that formed at the 2015 SAA annual meeting. We are currently developing a bootcamp to share ideas and tools for providing access to born-digital materials and have teamed up with the Digital Library Federation to spread the word about the project.

Understanding how researchers want to access and use digital archives is key to our curriculum’s success, so we’re taking it to the Twitter streets to gather feedback from digital researchers. The following five questions will guide the discussion:

Q1. _What research topic(s) of yours and/or content types have required the use of born digital materials?_

Q2. _What challenges have you faced in accessing and/or using born digital content? Any suggested improvements?_

Q3. _What discovery methods do you think are most suitable for research with born digital material?_

Q4. _What information or tools do/could help provide the context needed to evaluate and use born digital material?_

Q5. _What information about collecting/providing access would you like to see accompanying born digital archives?_

Can’t join on the 27th? Follow #bdaccess for ongoing discussion and future chats!

____

Jess Farrell is the curator of digital collections at Harvard Law School. Along with managing and preserving digital history, she’s currently fixated on inclusive collecting, labor issues in libraries, and decolonizing description.

Sarah Dorpinghaus is the Director of Digital Services at the University of Kentucky Libraries Special Collections Research Center. Although her research interests lie in the realm of born-digital archives, she has a budding pencil collection.

Software Preservation Network: Prospects in Software Preservation Partnerships

By Karl-Rainer Blumenthal

This is the fourth post in our series on the Software Preservation Network 2016 Forum.
____

To me, the emphasis on the importance of partnership and collaboration was the brightest highlight of August’s Software Preservation Network (SPN) Forum at Georgia State University. The event’s theme, “Action Research: Empowering the Cultural Heritage Community and Mapping Out Next Steps for Software Preservation,” permeated the early panels, presentations, and brainstorming exercises, empowering the attending stewards of cultural heritage and technology to advocate the next steps most critical to their own goals and so build the most broadly representative community. After considering surveys of collection and preservation practices, and case studies evocative of their legal and procedural challenges, attendees collaboratively summarized the specific obstacles to be overcome, the strategies worth pursuing together, and the goals that would represent success. Four stewards guided us through this task with the day’s final panel of case studies, ideas, and a participatory exercise. Under the deceptively simple title of “Partnerships,” this group grounded its discourse in practical cases and progressively widened its circle to encompass the variously missioned parties needed to make software preservation a reality at scale.

Tim Walsh (@bitarchivist), Digital Archivist at the Canadian Centre for Architecture (CCA), introduced the origins of his museum’s software preservation mission in its research program Archaeology of the Digital. Advancing one of the day’s key motifs (software as environment, beyond mere artifact), Walsh explained that the CCA’s ongoing mission to preserve tools of the design trades compels it to preserve whole systems environments in order to provide researcher access to obsolete computer-assisted design (CAD) programs and their files. “There are no valid migration pathways,” he assured us; rather, emulation is necessary to sustain access, even when that access is limited to the reading room. Attaining even that level of accessibility required CCA to reach license agreements with the creators/owners of legacy software, one of the first and most foundational partnerships that any stewarding organization must consider. To grow further still, these partnerships will need to include technical specialists and resource providers beyond CCA’s limited archives and IT staff.

Aliza Leventhal (@alizaleventhal), Corporate Librarian/Archivist at Sasaki Associates, confronts these challenges in her role within a multi-disciplinary design practice, where unencumbered access to the products of at least 14 different CAD programs is a regular need. To meet that need she has similarly reached out to software proprietors, but she has likewise cultivated an expanding community of stewards in the form of the SAA Architectural Records Roundtable’s CAD/BIM Taskforce. The Taskforce embraces a clearinghouse role for resources “that address the legal, technical and curatorial complexities” of preserving especially environment-dependent collections in repositories like her own and Walsh’s. In order to do so, however, Leventhal reminded us that more definitive standards for the actual artifacts, environments, and documentation we seek to preserve must first be established by independent and (inter-)national authorities like the International Organization for Standardization (ISO), the American Institute of Architects (AIA), the National Institute of Building Sciences, and as-yet-unfounded organizations in the design arts realm. Among other things, after all, more technical alignment in this regard could enable multi-institutional repositories to distribute and share acquisition, storage, and access resources and expertise.

Nicholas Taylor (@nullhandle), Web Archiving Service Manager at Stanford University Libraries, asked attendees to imagine a future SPN serving such a role itself: a multi-institutional service partnership that distributes legal, technical, and curatorial repository management responsibilities in the model of the LOCKSS Program. Citing the CLOCKSS Archive and other private networks as complementary examples from the realms of digital images, government documents, and scholarly publications, Taylor posited that such a partnership would empower participants to act independently as centralizing service nodes, and together in overarching governance. A community-governed partnership would need to meet functional technical requirements for preservation, speak to representative use cases, and, critically, articulate a sustainable business model in order to engender buy-in. If successful, though, it could, among other things, consolidate the broader field’s needs for licensing and IP agreements like CCA’s.

In addition to meeting its member organizations’ needs, this version of SPN, or a partnership like it, could benefit an even wider international community. Ryder Kouba (@rsko83), Digital Collections Archivist at the American University in Cairo, spoke to this potential from his perspective on the Technology and Research Working Group of UNESCO’s PERSIST Project. The project has already produced guidance on selecting digital materials for preservation among UNESCO’s 200+ member states. Its longer term ambitions, however, include the maintenance of the virtual environments in which members’ legacy software can be preserved and accessed. Defining the functional requirements and features of such a global resource will take the sustained and detailed input of a similarly globally-spanning community, beginning in the room in which the SPN Forum took place, but continuing on to the International Conference on Digital Preservation (iPres) and international convocations beyond.

Attendees compose matrices of software preservation needs, challenges, strategies, and outcomes. Photos by Karl-Rainer Blumenthal (left) and @karirene69 (right), CC BY-NC 2.0.

Having articulated these different scales of partnership, the panelists ended their session by facilitating breakout groups that mapped discrete problems partnerships can solve, through their necessary steps, toward ideal outcomes. At my table, for instance, the issue of “orphaned” software (software without advocates for long-term preservation) was mapped through consolidation in a PRONOM-like registry toward the maintenance it deserves from partners invested in a LOCKSS-like network. However conceptually simple each suggestion was, it could prompt valuations and reservations different enough, even among just the people in the room, to illustrate how difficult the prioritization of software preservation work can be for a team of partners rather than independent actors. To accomplish the Forum attendees’ goals equitably as well as efficiently, more consensus needed to be reached concerning the timeline of next steps and meaningful benchmarks, something that we tackled in a final brainstorming session that Susan Malsbury will describe next!

____

Karl-Rainer Blumenthal is a Web Archivist for the Internet Archive’s Archive-It service, where he works with 450+ partner institutions to preserve and share web heritage. Karl seeks to steward collaboration among diversely missioned and resourced cultural heritage organizations through his professional work and research, as we continuously seek new, broadly accessible solutions to the challenges of complex media preservation.

Indiana Archives and Records Administration’s Accession Profile Use in Bagger

By Tibaut Houzanme and John Scancella

This post is the seventh in our Spring 2016 series on processing digital materials. This quick report for practitioners draws from the “Bagger’s Enhancements for Digital Accessions” post prepared for the Library of Congress blog The Signal.

———

Context

In the past, the Indiana Archives and Records Administration (IARA) would simply receive, hash, and place digital accessions in storage, with the metadata keyed into a separate Microsoft Access database. Currently, IARA is automating many of its records processes with the APPX-based Archival Enterprise Management system (AXAEM). When the implementation concludes, this open source, integrated records management and digital preservation system will become the main accessioning tool. For now, and for accessions outside AXAEM’s reach, IARA uses Bagger. Both AXAEM and Bagger comply with the BagIt packaging standard, so accessions captured with Bagger can later be readily ingested by AXAEM. IARA anticipates time savings and a reduction in record/metadata silos.
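
For readers unfamiliar with BagIt, the standard packages content in a simple, self-describing directory layout, which is why a bag produced by one compliant tool can be read by another. A minimal bag looks something like this (file and folder names are illustrative; the manifest may use MD5, SHA-256, or another algorithm):

    accession-2016-001/
        bagit.txt            (BagIt version and encoding declaration)
        bag-info.txt         (accession metadata as "Label: value" lines)
        manifest-md5.txt     (one checksum line per payload file)
        data/                (the payload: the files being accessioned)
            minutes/2001-01.pdf
            minutes/2001-02.pdf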

Initial Project Scope

IARA aims to capture required metadata for each accession in a consistent manner. Bagger allows this to be done through a standard profile. IARA developed a profile inspired by the fields and drop-down menus on its State Form (SF 48883). When that profile was initially implemented, Bagger scrambled the order of the metadata fields, and the accession was not easily understood. John Scancella, the lead Bagger developer at the Library of Congress, implemented a change so that Bagger now keeps the metadata in the sequence originally intended in the profile. IARA then added metadata fields for preservation decisions.

Scope Expansion and Metadata Fields

Based on colleagues’ feedback, it appeared that IARA’s profile could be useful to other institutions. A generic version of the profile was then created that uses broader terms and makes all metadata fields optional. This way, each institution can decide which fields to enforce, making the generic profile useful to most digital records projects and collecting institutions.

The two profiles display similar metadata fields covering context (provenance, records series), identity, integrity, physical, logical, inventory, administrative, digital originality, storage media or carrier types, appraisal and classification values, format openness, and curation lifecycle information for each accession. Together with the hash values and file sizes that Bagger collects, this provides a framework to more effectively evaluate, manage, and preserve digital records over the long term.

Below are the profile fields:

Figure 1: IARA Profile with Sample Accession Screen (1 of 2)

Figure 2: IARA Profile with Sample Accession Screen (2 of 2)
The fictitious metadata values in the figures above are for demonstration purposes; the corresponding text file below includes the hash values and file sizes:

Figure 3: Metadata Fields and Values in the bag-info.txt File after Bag Creation

This test accession used random files accessible from the Digital Corpora and Open Preservation websites.
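
For a sense of the file’s general shape, here is a hypothetical bag-info.txt fragment. The reserved labels (Source-Organization, Contact-Name, External-Description, Bagging-Date, Payload-Oxum) come from the BagIt specification; the records-series line is an invented stand-in rather than one of IARA’s actual field names, and all values are fictitious:

    Source-Organization: Example State Agency
    Contact-Name: Jane Archivist
    External-Description: Accession of agency meeting minutes, 2001-2010
    Records-Series-Number: 12345
    Bagging-Date: 2016-05-02
    Payload-Oxum: 279164409.42

Payload-Oxum records the payload’s total size in octets and its file count (here, 279,164,409 bytes across 42 files), which gives a quick integrity check before full checksums are recomputed.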

Adopting or Adapting Profiles

To use IARA’s profile, its generic version, or any other profile in Bagger, download the latest version of Bagger (2.5.0 as of this writing). To start an accession, select the appropriate profile from the dropdown list. This will populate the screen with the profile-specific metadata fields. Select your objects, enter values, and save your bag.

For detailed instructions on how to edit metadata fields and their obligation levels, or how to create a new profile or change an existing one to meet your project’s or institution’s requirements, please refer to the Bagger User Guide in the “doc” folder inside your downloaded Bagger.zip file.

To comment on IARA’s profiles, email erecords[at]iara[dot]in[dot]gov. For Bagger issues, open a GitHub ticket. For technical information on Bagger and these profiles, please refer to the LOC’s Blog.

———

Tibaut Houzanme is Digital Archivist with the Indiana Archives and Records Administration. John Scancella is Information Technology Specialist with the Library of Congress.

Processing Digital Research Data

By Elise Dunham

This is the sixth post in our Spring 2016 series on processing digital materials.

———

The University of Illinois at Urbana-Champaign’s (Illinois) library-based Research Data Service (RDS) will be launching an institutional data repository, the Illinois Data Bank (IDB), in May 2016. The IDB will provide University of Illinois researchers with a repository for research data that will facilitate data sharing and ensure reliable stewardship of published data. The IDB is a web application that transfers deposited datasets into Medusa, the University Library’s digital preservation service for the long-term retention and accessibility of its digital collections. Content is ingested into Medusa via the IDB’s unmediated self-deposit process.

As we conceived of and developed our dataset curation workflow for digital datasets ingested into the IDB, we turned to archivists in the University Archives to gain an understanding of their approach to processing digital materials. [Note: I am not specifying whether data deposited in the IDB is “born digital” or “digitized” because, from an implementation perspective, both types of material can be deposited via the self-deposit system in the IDB. We are not currently offering research data digitization services in the RDS.] There were a few reasons for consulting the archivists: 1) archivists have deep, real-world curation expertise, and we anticipate that many of the challenges we face with data will have solutions whose foundations were developed by archivists; 2) if, through discussing processes, we found areas where the RDS and Archives have converging preservation or curation needs, we could communicate these to the Preservation Services Unit, which develops and manages Medusa; and 3) I’m an archivist by training, and I jump on any opportunity to talk with archivists about archives!

Even though the RDS and the University Archives share a central goal–to preserve and make accessible the digital objects that we steward–we learned that there are some operational and policy differences between our approaches to digital stewardship that necessitate points of variance in our processing/curation workflow:

Appraisal and Selection

In my view, appraisal and selection are fundamental to archival practice. The archives field has developed a rich theoretical foundation when it comes to appraisal and selection, and without these functions the archival endeavor would be wholly unsustainable. Appraisal and selection ideally occur in the very early stages of the archival processing workflow. The IDB curation workflow will differ significantly: by and large, appraisal and selection procedures will not take place until at least five years after a dataset is published in the IDB, making our appraisal process more akin to that of an archives that chooses to appraise records after accessioning, or even during the processing of materials for long-term storage. Our different approaches to appraisal and selection speak to the different functions the RDS and the University Archives fulfill within the Library and the University.

The University Archives is mandated to preserve University records in perpetuity by the General Rules of the University and the Illinois State Records Act. The RDS’s initiating goal, in contrast, is to provide a mechanism for Illinois researchers to be compliant with funder and/or journal requirements to make the results of research publicly available. Here, there is no mandate for the IDB to accept only data deemed to have “enduring value,” and, in fact, the research data curation field is so new that we do not yet have a community-endorsed sense of what “enduring value” means for research data. Standards regarding the enduring value of research data may evolve over the long term in response to discipline-specific circumstances.

To support researchers’ needs and/or desires to share their data in a simple and straightforward way, the IDB ingest process is largely unmediated. Depositing privileges are open to all campus affiliates who have the appropriate University log-in credentials (e.g., faculty, graduate students, and staff), and deposited files are ingested into Medusa immediately upon deposit. RDS curators will do a cursory check of deposits, as doing so remains scalable (see workflow chart below), and the IDB reserves the right to suppress access to deposits for a “compelling reason” (e.g., failure to meet criteria for depositing as outlined in the IDB Accession Policy, violations of publisher policy, etc.). Aside from cases that we assume will be rare, the files as deposited into the IDB, unappraised, are the files that are preserved and made accessible in the IDB.

Preservation Commitment

A striking policy difference between the RDS and the University Archives is that the RDS makes a commitment to preserving and facilitating access to datasets for a minimum of five years after the date of publication in the Illinois Data Bank.

The University Archives, of course, makes a long-term commitment to preserving and making accessible records of the University. I have to say, when I learned that the five-year minimum commitment was the plan for the IDB, I was shocked and a bit dismayed! But after reflecting on the fact that files deposited in the IDB undergo no formal appraisal process at ingest, the concept began to feel more comfortable and reasonable. At a time when terabytes of data are created, oftentimes for single projects, and budgets are a universal concern, there are logistical storage issues to contend with. Now, I fully believe that for us to ensure that we are able to 1) meet current, short-term data sharing needs on our campus and 2) fulfill our commitment to stewarding research data in an effective and scalable manner over time, we have to make a circumspect minimum commitment and establish policies and procedures that enable us to assess the long-term viability of a dataset deposited into the IDB after five years.

The RDS has collaborated with archives and preservation experts at Illinois and, grounding our work in archival appraisal theory, has developed guidelines and processes for reviewing published datasets after their five-year commitment ends to determine whether to retain them, deaccession them, or dedicate more stewardship resources to them. Enacting a systematic approach to appraising the long-term value of research data will enable us to allot resources to datasets in a way that is proportional to their value to research communities and their preservation viability.

Convergences

To show that we’re not all that different after all, I’ll briefly mention a few areas where the University Archives and the RDS are taking similar approaches or facing similar challenges:

  • We are both taking an MPLP-style approach to file conversion. In order to get preservation control of digital content, at minimum, checksums are established for all accessioned files (a minimal sketch of this step appears after this list). As a general rule, if a file can be opened using modern technology, file conversion will not be pursued as an immediate preservation action. Establishing strategies and policies for managing a variety of file formats at scale is an area that will be evolving at Illinois through collaboration of the University Archives, the RDS, and the Preservation Services Unit.
  • Accruals present metadata challenges. How do we establish clear accrual relationships in our metadata when a dataset or a records series is updated annually? Are there ways to automate processes to support management of accruals?
  • Both units do as much as they can to get contextual information about the material being accessioned from the creator, and metadata is enhanced as much as possible throughout curation/processing.
  • The University Archives and the RDS control materials in aggregation, with the University Archives managing at the archival collection level and the RDS managing digital objects at the dataset level.
  • More? Certainly! For both the research data curation community and the archives community, continually adopting pragmatic strategies to manage the information created by humans (and machines!) is paramount, and we will continue to learn from one another.
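
To make the checksum step above concrete, here is a minimal sketch, assuming Java, MD5, and BagIt manifest-style output; it illustrates the general technique rather than either unit’s actual tooling:

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.security.MessageDigest;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    public class ChecksumManifest {
        public static void main(String[] args) throws Exception {
            Path root = Paths.get(args[0]);
            // Gather every regular file under the accession root
            List<Path> files;
            try (Stream<Path> walk = Files.walk(root)) {
                files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
            }
            for (Path file : files) {
                // Hash each file in 8 KB chunks so large files don't exhaust memory
                MessageDigest md = MessageDigest.getInstance("MD5");
                try (InputStream in = Files.newInputStream(file)) {
                    byte[] buffer = new byte[8192];
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        md.update(buffer, 0, n);
                    }
                }
                StringBuilder hex = new StringBuilder();
                for (byte b : md.digest()) {
                    hex.append(String.format("%02x", b));
                }
                // One manifest-style line per file: "<checksum>  <relative path>"
                System.out.println(hex + "  " + root.relativize(file));
            }
        }
    }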

Research Data Alliance Interest Group

If you’re interested in further exploring the areas where the principles and practices in archives and research data curation overlap and where they diverge, join the Research Data Alliance (RDA) Archives and Records Professionals for Research Data Interest Group. You’ll need to register with the RDA (which is free!) and subscribe to the group. If you have any questions, feel free to get in touch!

IDB Curation Workflow

The following represents our planned functional workflow for handling dataset deposits in the Illinois Data Bank:

Workflow graphic created by Elizabeth Wickes.

Learn More

To learn more about the IDB policies and procedures discussed in this post, keep an eye on the Illinois Data Bank website after it launches next month. Of particular interest on the Policies page will be the Accession Policy and the Preservation Review, Retention, Deaccession, Revision, and Withdrawal Procedure document.

Acknowledgements

Bethany Anderson and Chris Prom of the University of Illinois Archives

The rest of the Research Data Preservation Review Policy/Procedures team: Bethany Anderson, Susan Braxton, Heidi Imker, and Kyle Rimkus

The rest of the RDS team: Qian Zhang, Elizabeth Wickes, Colleen Fallaw, and Heidi Imker

———

Elise Dunham is a Data Curation Specialist for the Research Data Service at the University of Illinois at Urbana-Champaign. She holds an MLS from the Simmons College Graduate School of Library and Information Science, where she specialized in archives and metadata. She contributes to the development of the Illinois Data Bank in areas of metadata management, repository policy, and workflow development. Currently she co-chairs the Research Data Alliance Archives and Records Professionals for Research Data Interest Group and is leading the DACS workshop revision working group of the Society of American Archivists Technical Subcommittee for Describing Archives: A Content Standard.

Keeping Track of Time with Data Accessioner

By Kevin Dyke

This post is the fourth in our Spring 2016 series on processing digital materials.

———

When it comes to processing large sets of electronic records, it’s all too easy to get so wrapped up in the task at hand that when you finally come up for air, you look at the clock and think to yourself, “Where did the time go? How long was I gone?” Okay, that may sound rather apocalyptic, but tracking time spent is an important yet easily elided step in electronic records processing.

At the University of Minnesota Libraries, the members of the Electronic Records Task Force are charged with developing workflows and making estimates for future capacity and personnel needs. In an era of very tight budgets, making a strong, well-documented case for additional personnel and resources is critical. To that end, we’ve made some efforts to more systematically track our time as we pilot our workflows.

Chief among those efforts has been a customization of the Data Accessioner tool. Originally written for internal use at the David M. Rubenstein Rare Book & Manuscript Library at Duke University, the project has since become open source, with support for recent releases coming from the POWRR Project. Written in Java and utilizing the common logging library log4j, Data Accessioner is structured in a way that made it possible for someone like me (familiar with programming, but without much Java experience) to enhance the time logging functionality. As we know, some accession tasks take a few minutes, while others can run for many hours (if not days). Enhancing the logging functionality of Data Accessioner allows staff to see accurately how long any data transfer takes, without needing to be physically present. The additional functionality was in itself pretty minor: log the time and folder name before starting accessioning of a folder and upon completion. The most complex part of this process was not writing the additional code, but rather modifying the log4j configuration. Luckily, with an existing configuration file, solid documentation, and countless examples in the wild, I was able to produce a version of Data Accessioner that outputs a daily log as a plain text file, which makes time tracking accessioning jobs much easier. You can see more description of the changes I made and the log output formatting on GitHub. You can download a ZIP file of the application with this addition from that page as well, or use this download link.
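
The heart of the change can be sketched in a few lines of Java. This is an illustrative reconstruction using the log4j 1.x API that Data Accessioner relies on, not the actual Data Accessioner source; the class and method names here are hypothetical:

    import java.io.File;
    import org.apache.log4j.Logger;

    public class FolderMigrator {
        private static final Logger logger = Logger.getLogger(FolderMigrator.class);

        public void migrate(File folder) {
            // log4j's layout stamps every entry with the time, so logging at the
            // start and end of a folder brackets the duration of the transfer
            logger.info("Starting migration of folder: " + folder.getName());
            copyFiles(folder); // hypothetical stand-in for the actual transfer work
            logger.info("Finished migration of folder: " + folder.getName());
        }

        private void copyFiles(File folder) {
            // the actual accessioning work happens here
        }
    }

The daily plain-text log file then comes from the configuration side, for example by pointing one of log4j’s rolling file appenders (such as DailyRollingFileAppender) at a file in the log folder.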

Screenshots and a sample log file:

Main Data Accessioner folder
Contents of log folder
Sample of the beginning and ending of the log file, showing the start and end times for a file migration

With this change, we are now able to better estimate the time it takes to use Data Accessioner. Do the tools you use keep track of how long they take to run? If not, how are you tracking this? Questions or comments can be sent to lib-ertf [at] umn [dot] edu.

———

Kevin Dyke is the spatial data analyst/curator at the University of Minnesota’s John R. Borchert Map Library. He’s a member of the University of Minnesota Libraries’ Electronic Records Task Force, works as a data curator for the Data Repository for the University of Minnesota (DRUM), and is also part of the Committee on Institutional Cooperation’s (CIC) Geospatial Data Discovery Project. He received a Masters degree in Geography from the University of Minnesota and can be reached at dykex005 [at] umn.edu.