Playful Work: Media Carriers and Computers

By Tracy Popp

___

This is the second post in the bloggERS! series Digital Archives Pathways, where archivists discuss the non-traditional, accidental, idiosyncratic, or unique paths they took to become a digital archivist.

Geek and Poke Cartoon, "How to Save your Digital Work for Posterity? Alternative 1: Put it on a Disc"

Although I am not by title a Digital Archivist, I work very closely with our University Archives and other special collections units to make born-digital content accessible and available for processing. So, how did I get to be the first Digital Preservation Coordinator at the University of Illinois Urbana-Champaign and what does that mean? Let me illuminate that through interpretive dance…no, wait. I’ll just tell you via the parts I *think* have contributed to my ending up here. Dance isn’t something I picked up along the way.

I’ve had a fascination with media carriers and digital stuff since childhood. When I was elementary school age, I recall spending time in the Southfield, MI Public Library loading up the microfilm machine to scroll through various newspapers committed to the reel. Was I engaged in some sort of deep historical research as a seven-year-old that required I review these polyester rolls for pertinent info? Nope. I seem to recall that the process of loading the machine and staring at an illuminated screen while I scrolled through words and pictures was engaging in and of itself. Little did I know how much of that I’d be doing later in life…

By a rather indirect route I found myself moving toward a career path in libraries and archives – one that I had not previously considered. I have a BFA in Photography and Intermedia, which, at the time, was the term used for making digital artwork. Concurrently, I picked up a Computer Information Systems minor after finding that building, breaking, and rebuilding computer systems in my spare time also proved an engaging way to support myself.

From working on a visual resources project for an Art History professor, where I converted slides and cleaned up images in Photoshop, to a visit to the Conservation Lab at the Eastman House in Rochester, NY, and other library-related activities, I found my way to graduate school at GSLIS (now the iSchool) at the University of Illinois Urbana-Champaign. There, I had graduate assistant experiences in audiovisual media and visual resources and worked on a pilot project to recover content from legacy born-digital media. My understanding of computer storage media, along with familiarity with a range of operating systems and types of digital content, served as bedrock for this project. I also had the opportunity to build a digital forensics computing workstation and amaze colleagues with the ability to raise files from the dead with my magical powers. My present position is the culmination of that education and a desire to explore and apply a variety of experiences.

As the digital archives landscape is continually evolving, keeping up with professional organizations and meetings is incredibly important. Notably, I completed a Digital Archives Specialist certificate through the Society of American Archivists and recently attended the born-digital archives exchange at Stanford, which was an excellent opportunity to meet colleagues engaged in digital archives work. A range of online resources is helpful too, such as the bloggERS! blog, the BitCurator Google group, and myriad tech forums dedicated to solving hardware and software challenges.

Through experience I’ve learned not to be timid about thoroughly investigating hardware and software – modern computer systems aren’t as fragile as one may think – although static electricity can shut things down pretty quickly, so ground yourself. Hands-on work is essential to understanding and continued learning. Presently, I’m deep into “breaking” a Linux system, which has motivated me to learn command line tools for filtering, scripting, and system administration. I’ve also lost personal data and learned the hard way about working with copies, making backups, and the fallibility of computer media. So, before you experiment with content, make sure it’s not the only copy, of course.   🙂

___

Tracy Popp serves as Digital Preservation Coordinator at the University of Illinois Urbana-Champaign. As part of her duties she manages the Born-Digital Reformatting Lab and works closely with Library and Archives colleagues to manage and preserve digital collections.

Digital Archivist in Disguise

By Amber D’Ambrosio

___

This is the first post in the bloggERS! series Digital Archives Pathways, where archivists discuss the non-traditional, accidental, idiosyncratic, or unique paths they took to become a digital archivist.

 

[Meme image]

This is the warning I’ve received at every conference and workshop since I started graduate school coursework in archives. When I applied for my current position as Processing Archivist & Records Manager, I knew that digital archiving was involved at some level because the job responsibilities included archiving the university’s website. There was also some discussion of digital archiving during the in-person interview, which made me wary.


Prior to this position my experience with digital archiving consisted of a brief introduction to the home-grown system used by the Utah State Archives and some basic information about checksums and multiple copies in multiple locations. My previous position was at a small state university without the infrastructure, funding, or staffing to undertake any kind of digital archiving beyond saving digitized material in multiple places with occasional validation checks by the systems librarian. The closest I came to digital archiving was downloading important records off of the university website to the backed up shared drive used by the library.

I’m still within the first five years of my career as a librarian/archivist, and I remember my graduate program offered a single course on digital records management. I didn’t take it because I didn’t necessarily want to be a records manager, and I wasn’t terribly interested in digital archiving.  As an English major, I assumed that I didn’t have the technical knowledge base to make it a viable option anyway.

But here I am. Undercover digital archivist. I’m a digital archivist by necessity because the archives and records I process and manage as part of my job sometimes show up on hard drives and legacy media. I’m also responsible for archiving the website. How did I do it? How did I go from some vague idea of checksums and LOCKSS to undercover digital archivist? I read. A lot. Fortunately, my institution invested in Archive-It for archiving the website and ArchivesDirect (hosted Archivematica) for managing the bulk of the digital preservation activities. I read all of their documentation. I started reading bloggERS! and about the Bentley Historical Library’s Mellon-funded ArchivesSpace-Archivematica-DSpace Workflow Integration project. My predecessor created a preliminary workflow and processing manual based on the early attempt to self-host Archivematica, so I read that and tried to understand it all.

I started attending the Society of American Archivists’ Digital Archives Specialist (DAS) certificate courses being offered in this region. I talked to our systems team. I read some more. I looked up terminology on Wikipedia. I took more DAS courses, some of which were more helpful than others. I figured out the gaps in the workflow.

Do I feel like a digital archivist after all of that? Not really. I still feel like something of an imposter.

After all, I don’t get to do much digital archiving in the grand scheme of my job. It’s challenging to find time to focus on processing the digital material through our workflow because it is time consuming. For all that we have ArchivesDirect, there’s proper stewardship to consider prior to ingest into Archivematica. I have gradually added steps into the workflow, including verifying fixity when copying from media to our digital processing drive and when copying from that drive to the secure file transfer protocol provided by ArchivesDirect. There are also the inevitable technical hiccups that happen whenever systems are involved. Human errors play a role as well, like that time someone sent me a duplicate of their entire hard drive before they left their job with no warning or explanation of its contents.
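
For readers wondering what that fixity step looks like in practice, here is a minimal sketch in Python. It is not the workflow described above, and the paths and function names are hypothetical; it simply copies a file from removable media to a processing drive and confirms that the copy's checksum matches the original:

```python
import hashlib
import shutil
from pathlib import Path

def sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copy_with_fixity_check(source, destination):
    """Copy source to destination and confirm the checksums match."""
    before = sha256(source)
    shutil.copy2(source, destination)   # copy2 also preserves timestamps
    after = sha256(destination)
    if before != after:
        raise ValueError(f"Fixity check failed for {source}")
    return before

# Hypothetical paths: removable media mounted at E:, processing drive at P:
checksum = copy_with_fixity_check(Path("E:/donor_files/report.doc"),
                                  Path("P:/processing/report.doc"))
print("Verified copy, SHA-256:", checksum)
```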


What have I learned? I can be a digital archivist if I have to be, and the command line isn’t as intimidating as it always seemed. I learned the basics of the command line interface from our digital asset management librarian, the Internet, and trial and error. I wouldn’t claim to have even intermediate knowledge of the command line, but not being intimidated by it makes digital archiving much easier. Being a digital archivist seems to be mostly a willingness and ability to constantly reassess, learn, adapt, and try something else.


___


Amber D’Ambrosio is Processing Archivist & Records Manager at Willamette University, a small, urban liberal arts college in Salem, Oregon, where she manages the collections and wrangles ArchivesSpace and Archivematica. In her spare time she writes, reads about early modern London, hikes, travels, and obsessively visits the Oregon Coast.

 

Inaugural #bdaccess Bootcamp: A Success Story

By Margaret Peachy

This post is the nineteenth in a bloggERS series about access to born-digital materials.

____

At this year’s New England Archivists Spring Meeting, archivists who work with born-digital materials had the opportunity to attend the inaugural Born-Digital Access Bootcamp. The bootcamp was an idea generated at the born-digital hackfest, part of a session at SAA 2015, where a group of about 50 archivists came together to tackle the problem facing most archival repositories: How do we provide access to born-digital records, which can have different technical and ethical requirements than digitized materials? Since 2015, a team has come together to develop a bootcamp curriculum, reach out to organizations outside of SAA, and organize bootcamps at various conferences.

Excerpt of results from a survey administered in advance of the Bootcamp.

Alison Clemens and Jessica Farrell facilitated the day-long camp, which had about 30 people in attendance from institutions of all sizes and types, though the majority were academic. The attendees also brought a broad range of experience to the camp, from those just starting out thinking about this issue, to those who have implemented access solutions.


A Case Study in Failure (and Triumph!) from the Records Management Perspective

By Sarah Dushkin

____

This is the sixth post in the bloggERS series #digitalarchivesfail: A Celebration of Failure.

I’m the Records Coordinator for a global energy engineering, procurement, and construction  contractor, herein referred to as the “Company.” The Company does design, fabrication, installation, and commissioning of upstream and downstream technologies for operators. I manage the program for our hard copy and electronic records produced from our Houston office.

A few years ago our Records Management team was asked by the IT department to help create a process to archive digital records of closed projects created out of the Houston office. I saw the effort as an opportunity to expand the scope and authority of our records program to include digital records. Up to this point, our practice only covered paper records, and we asked employees to apply the paper record policies to their own electronic records.

The Records Management team’s role was limited to providing IT with advice on how to deploy a software tool where files could be stored for a long-term period. We were not included in the discussions on which software tool to use. It took us over a year to develop the new process with IT and standardize it into a published procedure. We had many areas of triumph and failure throughout the process. Here is a synopsis of the project.

Objective:
IT was told that retaining closed projects files on the local server was an unnecessary cost and was tasked with removing them. IT reached out to Records Management to develop a process to maintain the project files for the long-term in a more cost-effective solution that was nearline or offline, where records management policies could be applied.

Vault:
The software chosen was a proprietary cloud-based file storage center or “vault.” It has search, tagging, and records disposition capabilities. It is more cost-effective than storing files on the local server.

Process:
At 80% project completion, Records Management reaches out to active projects to learn how they store their files and when they expect to close. Eighty percent engineering completion is an important milestone because most of the project team is still involved and the bulk of the work is complete. Knowing the project schedule also lets us accurately apply the two-year window before files are migrated off the local server and into the vault. That window was created to ensure that all project files remain available to the project team during the typical warranty period. Two years after a project is closed, all technical files and data are exported from the current management system and ingested into the vault, and access groups are created so employees can view and download the files for reference as needed.

Deployment:
Last year, we began to apply the process to large active projects that had passed 80% engineering completion. Large projects are those that have greater than 5 million in revenue.

Observations:
Recently we have begun to audit the whole project with IT, and are just now identifying our areas of failure and triumph. We will conduct an analysis of these areas and assess where we can make improvements.

Our big areas of failure were related to stakeholder involvement in the development, deployment, and utilization of the vault.

Stakeholders, including the Records Management team, were not involved in the selection or development of the vault software tool. As a result, the vault development project lacked the resources required to make it as successful as possible.

In the deployment of the vault, we did not create an outreach campaign with training courses that would introduce the tool across our very large company. Because of this, many employees are still unaware of the vault. When we talk with departments and projects about methods to save old files for less money, they are reluctant to try the solution because it seems like another way for IT to save money from its budget without thinking about the greater needs of the company. IT is still viewed as a support function that is inessential to the Company’s philosophy.

Lastly, we did not have methods to export project files from all systems for ingest into the vault; nor did we, in North America, have the authority to develop that solution. To be effective, that type of decision and process can only be developed by our corporate office in another country. The Company also does not make information about project closure available to most employees. A project end date can be determined by several factors, including when the final invoice was received or the end of the warranty period. This type of information is essential to the information lifecycle of a project, and since we had no involvement from upper level management, we were not able to devise a solution for easily discovering this information.

We had some triumphs throughout the process, though. Our biggest triumph is that this project gave Records Management an opportunity to showcase our knowledge of records retention and its value as a way to save money and maintain business continuity. We were able to collaborate with IT and promulgate a process. It gave us a great opportunity to grow by building better relationships with the business lines. Although some departments and teams are still skeptical about the value of the vault, when we advertise it to other project teams, they see the vault as evidence that the Company cares about preserving their work. We earned our seat at the table with these players, but we still have to work on winning over more projects and departments. We’ve also preserved more than 30 TB of records and saved the Company several thousand dollars by ingesting inactive project files into the vault.

I am optimistic that when we have support from upper management, we will be able to improve the vault process and infrastructure, and create an effective solution for utilizing records management policies to ensure legal compliance, maintain business continuity, and save money.

____

Sarah Dushkin earned her MSIS from the University of Texas at Austin School of Information with a focus in Archival Enterprise and Records Management. Afterwards, she sought to diversify her expertise by working outside of the traditional archival setting and moved to Houston to work in oil and gas. She has collaborated with management from across her company to refine their records management program and develop a process that includes the retention of electronic records and data. She lives in Sugar Land, Texas with her husband.

Call for Contributors – Digital Archives Pathways Series

Archivists by their very nature are jacks of all trades, and the same goes for those who work with digital collection materials. Archives programs and iSchools are increasingly offering coursework in digital archives theory and practice, but not all digital archivists got their chops through academic channels, and for many archivists, digital only describes part of their responsibilities.

While all archivists must determine their own path for professional growth, the field of digital archives is also uniquely challenging. Preparation and training for this work require dedication, creativity, and engagement. Processing, preserving, and providing access to digital materials, along with expertise in specialized areas such as legacy media and web archiving, are ever-expanding challenges.

In the Digital Archives Pathways series, we are looking for stories about the non-traditional, accidental, idiosyncratic, or unique path you took to become a digital archivist, however you define that in your work. What do you consider essential to your training, and what do you wish had been a larger part of it? How might your journey towards digital archives work be characterized as non-traditional? How do you plan on continuing your education in digital archives?

Writing for bloggERS! Digital Archives Pathways Series:

  • We encourage visual representations: Posts can include or consist of comics, flowcharts, a series of memes, etc!
  • Written content should be 200-600 words in length
  • Write posts for a wide audience: anyone who stewards, studies, or has an interest in digital archives and electronic records, both within and beyond SAA
  • Align with other editorial guidelines as outlined in the bloggERS! guidelines for writers.

Posts for this series will start in July, so let us know ASAP if you are interested in contributing by sending an email to ers.mailer.blog@gmail.com!

Fail4Lib: Acknowledging and Embracing Professional Failure

By Andreas Orphanides

____

This is the fifth post in the bloggERS series #digitalarchivesfail: A Celebration of Failure.

It could be worse.
Image title: Train wreck at Montparnasse
Credit: Studio Lévy et Fils, 1895
Copyright: Public domain

When was the last time you totally, completely, utterly loused up a project or a report or some other task in your professional life? When was the last time you dissected that failure, in meticulous detail, in front of a room full of colleagues? Let’s face it: we’ve all had the first experience, and I’d wager that most of us would pay good money to avoid the second.

It’s a given that we’ll all encounter failure professionally, but there’s a strong cultural disincentive to talk about it. Failure is bad. It is to be avoided at all costs. And should one fail, that failure should be buried away in a dark closet with one’s other skeletons. At the same time, it’s well acknowledged that failure is a critical step on the path to success. It’s only through failing and learning from that experience that we can make the necessary course corrections. In that sense, refusing to acknowledge or unpack failure is a disservice: failure is more valuable when well-understood than when ignored.

This philosophy — that we can gain value from failure by acknowledging and understanding it openly — is the underlying principle behind Fail4Lib, the perennial preconference workshop that takes place at the annual Code4Lib conference, and which completed its fifth iteration (Fail5Lib!) at Code4Lib 2017 in Los Angeles. Jason Casden (now of UNC Libraries) originally conceived of the Fail4Lib idea, and together he and I developed the concept into a workshop about understanding, analyzing, and coming to terms with professional failure in a safe, collegial environment.

Participants in a Fail4Lib workshop engage in a number of activities to foster a healthier relationship with failure: case study discussions to analyze high-profile failures such as the Challenger disaster and the Volkswagen diesel emissions scandal; lightning talks where brave souls share their own professional failures and talk about the lessons they learned; and an open bull session about risk, failure, and organizational culture, to brainstorm on how we can identify and manage failure, and how to encourage our organizations to become more failure-tolerant.

Fail4Lib’s goal is to help its participants to get better at failing. By practicing talking about and thinking about failure, we position ourselves to learn more from the failures of others as well as our own future failures. By sharing and talking through our failures we maximize the value of our experiences, we normalize the practice of openly acknowledging and discussing failure, and we reinforce the message to participants that it happens to all of us. And by brainstorming approaches to allow our institutions to be more failure-tolerant, we can begin making meaningful organizational change towards accepting failure as part of the development process.

The principles I’ve outlined here not only form the framework for the Fail4Lib workshop, they also represent a philosophy for engaging with professional failure in a constructive and blameless way. It’s only by normalizing the experience of failure that we can gain the most from it; in so doing, we make failure more productive, we accelerate our successes, and we make ourselves more resilient.

____

Andreas Orphanides is Associate Head, User Experience at the NCSU Libraries, where he develops user-focused solutions to support teaching, learning, and information discovery. He has facilitated Fail4Lib workshops at the annual Code4Lib conference since 2013. He holds a BA from Oberlin College and an MSLS from UNC-Chapel Hill.

Modeling archival problems in Computational Archival Science (CAS)

By Dr. Maria Esteva

____

It was Richard Marciano who, almost two years ago, convened a small multi-disciplinary group of researchers and professionals with experience using computational methods to solve archival problems, and encouraged us to define the work that we do under the label of Computational Archival Science (CAS). The exercise proved very useful, not only for communicating the concept to others but also for articulating how we think when we go about using computational methods to conduct our work. We introduced and refined the definition amongst a broader group of colleagues at the Finding New Knowledge: Archival Records in the Age of Big Data Symposium in April of 2016.

I would like to bring more archivists into the conversation by explaining how I combine archival and computational thinking. But first, three notes to frame my approach to CAS: a) I learned to do this progressively over the course of many projects, b) I took graduate data analysis courses, and c) it takes a village. I started using data mining methods out of necessity and curiosity, frustrated with the practical limitations of manual methods for addressing electronic records. I had entered the field of archives because its theories, and the problems that they address, are attractive to me, and when I started taking data analysis courses and developing my work, I saw how computational methods could help hypothesize and test archival theories. Coursework in data mining was key to learning methods that I initially understood as “statistics on steroids.” Now I can systematize the process, map it to different problems and inquiries, and suggest the methods that can be used to address them. Finally, my role as a CAS archivist is shaped through my ongoing collaboration with computer scientists and with domain scientists.

In a nutshell, the CAS process goes like this: we first define the problem at hand and identify key archival issues within. On this basis we develop a model, which is an abstraction  of the system that we are concerned with. The model can be a methodology or a workflow, and it may include policies, benchmarks, and deliverables. Then, an algorithm, which is a set of steps that are accomplished within a software and hardware environment, is designed to automate the model and solve the problem.

A project in which I collaborate with Dr. Weijia Xu, a computer scientist at the Texas Advanced Computing Center, and Dr. Scott Brandenberg, an engineering professor at UCLA, illustrates a CAS case. To publish and archive large amounts of complex data from natural hazards engineering experiments, researchers would need to manually enter significant amounts of metadata, which has proven impractical and inconsistent. Instead, they need automated methods to organize and describe their data, which may consist of reports, plans and drawings, data files, and images, among other document types. The archival challenge is to design such a method so that the scientific record of the experiments is accurately represented. For this, the model has to convey the dataset’s provenance and capture the right type of metadata. To build the model we asked the domain scientist to draw out the steps of a typical experiment and to provide terms that characterize its conditions, tools, materials, and resultant data. Using this information we created a data model: a network of classes that represent the experiment process, along with metadata terms describing that process. The figures below show the workflow and corresponding data model for centrifuge experiments.

Figure 1. Workflow of a centrifuge experiment by Dr. Scott Brandenberg

 

Figure 2. Networked data model of the centrifuge experiment process by the archivist

Next, Dr. Weijia Xu created an algorithm that combines text mining methods to: a) identify the terms from the model that are present in data belonging to an experiment, b) extend the terms in the model to related ones present in the data, and c) based on the presence of all the terms, predict the classes to which the data belongs. Using this method, a dataset can be organized around classes/processes and steps, and the corresponding metadata terms describe those classes.
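
As a toy illustration of the class-prediction step only (the real algorithm also extends the model's terms through text mining, and the class names and terms below are invented), matching a data model's terms against text extracted from a file might look like this:

```python
import re
from collections import Counter

# Hypothetical fragment of a data model: classes from the experiment
# workflow mapped to the metadata terms that characterize them.
DATA_MODEL = {
    "specimen_preparation": ["soil", "specimen", "compaction", "moisture"],
    "centrifuge_spin": ["centrifuge", "g-level", "spin", "acceleration"],
    "sensor_data": ["accelerometer", "pore pressure", "transducer", "time series"],
}

def predict_class(text):
    """Count model terms in the extracted text and return the best-matching class."""
    text = text.lower()
    scores = Counter()
    for cls, terms in DATA_MODEL.items():
        for term in terms:
            scores[cls] += len(re.findall(re.escape(term.lower()), text))
    best, hits = scores.most_common(1)[0]
    return best if hits > 0 else None

sample = "Pore pressure transducer and accelerometer time series recorded during the spin."
print(predict_class(sample))  # -> "sensor_data"
```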

In a CAS project, the archivist defines the problem and gathers the requirements that will shape the deliverables. He or she collaborates with the domain scientists to model the “problem” system, and with the computer scientist to design the algorithm. An interesting aspect is how the method is evaluated by all team members using data-driven and qualitative methods. Using the data model as the ground truth, we assess whether data is correctly assigned to classes and whether the metadata terms correctly describe the content of the data files. At the same time, as new terms are found in the dataset and the data model gets refined, the domain scientist and the archivist review the accuracy of the resulting representation and the generalizability of the solution.

I look forward to hearing reactions to this work and about research perspectives and experiences from others in this space.

____
Dr. Maria Esteva is a researcher and data archivist/curator at the Texas Advanced Computing Center at the University of Texas at Austin. She conducts research on, and implements, large-scale archival processing and data curation systems, with High Performance Computing infrastructure resources as a backdrop. Her email is: maria@tacc.utexas.edu

 

OSS4Pres 2.0: Developing functional requirements/features for digital preservation tools

By Heidi Elaine Kelly

____

This is the final post in the bloggERS series describing outcomes of the #OSS4Pres 2.0 workshop at iPRES 2016, addressing open source tool and software development for digital preservation. This post outlines the work of the group tasked with “developing functional requirements/features for OSS tools the community would like to see built/developed (e.g. tools that could be used during ‘pre-ingest’ stage).”

The Functional Requirements for New Tools and Features Group of the OSS4Pres workshop aimed to write user stories focused on new features that developers can build out to better support digital preservation and archives work. The group was composed largely of practitioners who work with digital curation tools regularly and was facilitated by Carl Wilson of the Open Preservation Foundation. While their work largely involved writing user stories for development, the group also came up with requirement lists for specific areas of tool development, outlined below. We hope that these lists help continue to bridge the gap between digital preservation professionals and open source developers by providing a deeper perspective on user needs.

Basic Requirements for Tools:

  • Mostly needed for Mac environment
  • No software installation on donor computer
  • No software dependencies requiring installation (e.g., Java)
  • Must be GUI-based, as most archivists are not skilled with the command line
  • Graceful failure

Descriptive Metadata Extraction Needs (using Apache Tika; a sketch follows this list):

  • Archival date
  • Author
  • Authorship location
  • Subject location
  • Subject
  • Document type
  • Removal of spelling errors to improve extracted text
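
As a rough sketch of how such extraction might be prototyped (an assumption on my part, not a workshop deliverable), the tika-python wrapper around Apache Tika exposes many of these fields. Exact field names vary by file format, and the library requires a Java runtime because it starts a local Tika server on first use:

```python
# pip install tika  -- requires Java; tika-python launches a local Tika server on first use
from tika import parser

parsed = parser.from_file("example.pdf")        # hypothetical file
metadata = parsed.get("metadata", {}) or {}

# Field names differ by format; these are common Tika/Dublin Core keys.
for field in ("dc:creator", "Author", "dcterms:created", "Content-Type", "dc:subject"):
    if field in metadata:
        print(f"{field}: {metadata[field]}")

# Extracted full text, which could feed subject or location detection downstream
text = parsed.get("content") or ""
print(text[:200])
```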

Technical Metadata Extraction Needs:

  • All datetime information available should be retained (minimum of LastModified Date)
  • Technical manifest report
  • File permissions and file ownership permissions
  • Information about the tool that generated the technical manifest report:
    • tool – name of the tool used to gather the disk image
    • tool version – the version of the tool
    • signature version – if the tool uses ‘signatures’ or other add-ons, e.g. which virus scanner software signature – such as signature release July 2014 or v84
    • datetime process run – the datetime information of when the process ran (usually tools will give you when the process was completed) – for each tool that you use

Data Transfer Tool Requirements:

  • Run from portable external device
  • Bag-It standard compliant (build into a “bag”)
  • Able to select a subset of data – not disk image the whole computer
  • GUI-based tool
  • Original file name (also retained in tech manifest)
  • Original file path (also retained in tech manifest)
  • Directory structure (also retained in tech manifest)
  • Address these issues in filenames (and record the actual filename in the tech manifest): diacritics (e.g. naïve), illegal characters ( \ / : * ? " < > | ), spaces, em dashes, en dashes, missing file extensions, excessively long file and folder names, etc. (a sketch of such a check follows this list)
  • Possibly able to connect to “your” FTP site/cloud thingy and send the data there when ready for transfer
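
A minimal sketch of the filename check mentioned above, assuming a hypothetical transfer directory and length threshold; a production tool would also handle renaming and directory names:

```python
import csv
import unicodedata
from pathlib import Path

ILLEGAL = set('\\/:*?"<>|')
MAX_NAME_LENGTH = 128  # hypothetical threshold

def filename_issues(name):
    """Return a list of the problems this requirement list is concerned with."""
    issues = []
    if any(ch in ILLEGAL for ch in name):
        issues.append("illegal characters")
    if name != unicodedata.normalize("NFC", name) or any(ord(ch) > 127 for ch in name):
        issues.append("diacritics / non-ASCII")
    if " " in name:
        issues.append("spaces")
    if "\u2013" in name or "\u2014" in name:
        issues.append("en/em dashes")
    if "." not in name:
        issues.append("missing file extension")
    if len(name) > MAX_NAME_LENGTH:
        issues.append("excessively long name")
    return issues

def report(transfer_root, manifest_path):
    """Record every original filename and any flagged issues in a CSV manifest."""
    with open(manifest_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["original_path", "issues"])
        for path in Path(transfer_root).rglob("*"):
            if path.is_file():
                writer.writerow([str(path), "; ".join(filename_issues(path.name))])

report("selected_files", "technical_manifest.csv")  # hypothetical paths
```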

Checksum Verification Requirements:

  • File-by-file checksum hash generation
  • Ability to validate the contents of the transfer (a sketch of both requirements follows this list)
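
A minimal sketch of both requirements: generating a BagIt-style, file-by-file manifest and validating a transfer against it. The directory names are hypothetical, and a real implementation would more likely use an existing BagIt library:

```python
import hashlib
from pathlib import Path

def sha256(path):
    """SHA-256 hex digest of a file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(root, manifest="manifest-sha256.txt"):
    """File-by-file checksum generation: one 'digest  relative/path' line per file."""
    root = Path(root)
    with open(manifest, "w", encoding="utf-8") as out:
        for path in sorted(p for p in root.rglob("*") if p.is_file()):
            out.write(f"{sha256(path)}  {path.relative_to(root)}\n")

def validate(root, manifest="manifest-sha256.txt"):
    """Re-hash every file listed in the manifest and return mismatched or missing paths."""
    root = Path(root)
    failures = []
    with open(manifest, encoding="utf-8") as f:
        for line in f:
            digest, rel = line.rstrip("\n").split("  ", 1)
            target = root / rel
            if not target.is_file() or sha256(target) != digest:
                failures.append(rel)
    return failures

write_manifest("transfer_2017_001")                 # hypothetical transfer directory
print(validate("transfer_2017_001") or "All files verified")
```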

Reporting Requirements:

  • Ability to highlight/report on possibly problematic files/folders in a separate file

Testing Requirements:

  • Access to test corpora, with known issues, for testing tools

Smart Selection & Appraisal Tool Requirements:

  • DRM/TPMs detection
  • Regular expressions/fuzzy logic for finding certain terms – e.g. phone numbers, Social Security numbers, and other predefined personal data (a sketch follows this list)
  • Blacklisting of files – configurable list of blacklist terms
  • Shortlisting a set of “questionable” files based on parameters that could then be flagged for a human to do further QA/QC
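
A minimal sketch of the regular-expression flagging described above, using hypothetical, US-centric patterns that a real appraisal tool would make configurable (and that would run against text extracted from many formats, not just .txt files):

```python
import re
from pathlib import Path

# Hypothetical patterns; a real tool would make these configurable.
PATTERNS = {
    "phone number": re.compile(r"\b(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
    "SSN-like number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def flag_files(root):
    """Shortlist text files containing possible personal data for human QA/QC."""
    flagged = {}
    for path in Path(root).rglob("*.txt"):
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        hits = [label for label, pattern in PATTERNS.items() if pattern.search(text)]
        if hits:
            flagged[str(path)] = hits
    return flagged

for path, hits in flag_files("selected_files").items():   # hypothetical directory
    print(f"{path}: possible {', '.join(hits)}")
```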

Specific Features Needed by the Community:

  • Gathering/generating quantitative metrics for web harvests
  • Mitigation strategies for FFMPEG obsolescence
  • TESSERACT language functionality

____

Heidi Elaine Kelly is the Digital Preservation Librarian at Indiana University, where she is responsible for building out the infrastructure to support long-term sustainability of digital content. Previously she was a DiXiT fellow at Huygens ING and an NDSR fellow at the Library of Congress.

OSS4Pres 2.0: Sharing is Caring: Developing an online community space for sharing workflows

By Sam Meister

____

This is the third post in the bloggERS series describing outcomes of the #OSS4Pres 2.0 workshop at iPRES 2016, addressing open source tool and software development for digital preservation. This post outlines the work of the group tasked with “developing requirements for an online community space for sharing workflows, OSS tool integrations, and implementation experiences.” See our other posts for information on the groups that focused on feature development and design requirements for FOSS tools.

Cultural heritage institutions, from small museums to large academic libraries, have made significant progress developing and implementing workflows to manage local digital curation and preservation activities. Many institutions are at different stages in the maturity of these workflows. Some are just getting started, and others have had established workflows for many years. Documentation assists institutions in representing current practices and functions as a benchmark for future organizational decision-making and improvements. Additionally, sharing documentation assists in creating cross-institutional understanding of digital curation and preservation activities and can facilitate collaborations amongst institutions around shared needs.

One of the most commonly voiced recommendations from iPRES 2015 OSS4PRES workshop attendees was the desire for a centralized location for technical and instructional documentation, end-to-end workflows, case studies, and other resources related to the installation, implementation, and use of OSS tools. This resource could serve as a hub that would enable practitioners to freely and openly exchange information, user requirements, and anecdotal accounts of OSS initiatives and implementations.

At the OSS4Pres 2.0 workshop, the group of folks looking at developing an online space for sharing workflows and implementation experience started by defining a simple goal and deliverable for the two hour session:

Develop a list of minimal levels of content that should be included in an open online community space for sharing workflows and other documentation

The group then began a discussion on developing this list of minimal levels by thinking about the potential value of user stories in informing these levels. We spent a bit of time proposing a short list of user stories – just enough to provide some insight into the basic structures that would be needed for sharing workflow documentation.

User stories

  • I am using tool 1 and tool 2 and want to know how others have joined them together into a workflow
  • I have a certain type of data to preserve and want to see what workflows other institutions have in place to preserve this data
  • There is a gap in my workflow — a function that we are not carrying out — and I want to see how others have filled this gap
  • I am starting from scratch and need to see some example workflows for inspiration
  • I would like to document my workflow and want to find out how to do this in a way that is useful for others
  • I would like to know why people are using particular tools – is there evidence that they tried another tool, for example, that wasn’t successful?

The group then proceeded to define a workflow object as a series of workflow steps, each with its own attributes, plus a visual representation and organizational context:

  • Workflow step
    • Title / name
    • Description
    • Tools / resources
    • Position / role
  • Visual workflow diagrams / model
  • Organizational context
    • Institution type
    • Content type

Next, we started to draft out the different elements that would be part of an initial minimal level for workflow objects (a hypothetical record using these elements is sketched after the list):

Level 1:

  • Title
  • Description
  • Institution / organization type
  • Contact
  • Content type(s)
  • Status
  • Link to external resources
  • Download workflow diagram objects
  • Workflow concerns / reflections / gaps
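
To make the level concrete, here is a hypothetical Level 1 record sketched as plain Python data; every value is invented, and the eventual COPTR template may organize the fields differently:

```python
# Hypothetical Level 1 workflow record; field values are invented for illustration.
workflow_record = {
    "title": "Born-digital accessioning workflow",
    "description": "Steps for transferring, checking fixity, and packaging donor media.",
    "institution_type": "Academic library",
    "contact": "digital.archivist@example.edu",
    "content_types": ["office documents", "email", "disk images"],
    "status": "In production since 2016",
    "external_resources": ["https://example.edu/workflow-docs"],
    "workflow_diagram": "accessioning_workflow.png",
    "concerns_and_gaps": "No automated step yet for appraisal of sensitive data.",
}
```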

After this effort the group focused on discussing next steps and how an online community space for sharing workflows could be realized. This discussion led toward expanding COPTR to support the sharing of workflow documentation. We outlined a roadmap of next steps toward this goal:

  • Propose / approach COPTR steering group on adding workflows space to COPTR
  • Develop home page and workflow template
  • Add examples
  • Group review
  • Promote / launch
  • Evaluation

The group has continued this work post-workshop and has made good progress setting up a Community Owned Workflows section on COPTR and developing an initial workflow template. We are in the midst of creating and evaluating sample workflows to help with revising and tweaking as needed. Based on this process, we hope to launch and start promoting this new online space for sharing workflows in the months ahead. So stay tuned!

____

Sam Meister is the Preservation Communities Manager, working with the MetaArchive Cooperative and BitCurator Consortium communities. Previously, he worked as Digital Archivist and Assistant Professor at the University of Montana. Sam holds a Master of Library and Information Science degree from San Jose State University and a B.A. in Visual Arts from the University of California San Diego. Sam is also an Instructor in the Library of Congress Digital Preservation Education and Outreach Program.