Embedded Memory

by Michelle Ganz

This is the third post in the BloggERS Embedded Series.

As an archivist in a small corporate repository for an architecture, design, and consulting firm, I have a unique set of challenges and advantages. By being embedded with the creators as they are creating, I have the opportunity to ensure that archival standards are applied at the point of creation rather than after the collection has been transferred to a repository.

My environment is especially unusual in the types of digital files I’m collecting. From the architecture side, William McDonough + Partners, I acquire architectural project files (CAD files), sketches, renderings, photographs and video of project phases, presentations, press, media files surrounding the project, and other associated materials. Many of these files are created with specialized and expensive software.

From McDonough Innovation, William McDonough’s sustainable development advising firm, I collect the world-changing ideas that Mr. McDonough is developing as they evolve. Mr. McDonough is a global thought leader who provides targeted ideas, product concepts, and solutions to a wide range of sustainable growth issues faced by corporate officers, senior executives, product designers, and project managers. He often works with CEOs to set the vision, and then with the management team to set goals and execute projects. His values-driven approach helps companies to embed sustainable growth principles into their corporate culture and to advance progress toward their positive vision. Archiving an idea is a multi-faceted endeavor. Materials can take the form of audio notes, sketches in a variety of programs, call or meeting recordings, and physical whiteboards. Since my role is embedded within the heart of Mr. McDonough’s enterprises, I ensure that all the right events are captured the right way, as they happen. I gather all the important contextual information and metadata about the event and the file. I can obtain permissions at the point of creation and coordinate directly with the people doing the original capture to ensure I get archival quality files.

Challenges in this environment are very different from those my academic counterparts face. In the academic world there was a chain of leadership that I could advocate to as needed. In my small corporate world there is no one to appeal to once my boss makes up their mind. Corporate interests are all focused on ROI (return on investment), and an archival department is a financial black hole; money is invested, but they will never see a financial return. This means that every new project must show ROI in more creative ways. I focus on how a project will free up other people to do more specialized tasks. But even this is often not enough, and I find myself advocating for things like file standards and server space. Many of the archival records are videos of speeches, events, meetings, or other activities and take up a huge amount of server space. A single month’s worth of archival files can be as large as 169 GB. In the academic setting, where the archives is often a part of the library, the IT department is more prepared for the huge amounts of data that come with modern libraries; folding the archival storage needs into this existing digital preservation framework is often just a matter of resource allocation or funds.

Also, nearly every archival function that interacts with outside entities requires permissions these firms are not used to giving. Meetings can include people from 3 or 4 companies in 4 or 5 countries with a variety of NDAs in place with some, but not all, of the parties. In order to record a meeting I must obtain permission from every participant; this can be rather complicated and can create a lot of legal and privacy issues. A procedure was put in place to request permission to record when meetings are set up, as well as when meetings are confirmed. A spreadsheet was created to track all of the responses. For regular meeting participants, annual permissions are obtained. This procedure, while effective, is time-consuming. Many meeting participants are unfamiliar with what an archive is. There are many questions about how the information will be used, stored, disseminated, and accessed. There are also a lot of questions around the final destination of the archive and what that means for their permissions. To help answer these questions I created fact sheets that explain what the archives are, how archival records are collected and used, deposit timelines, copyright basics, and links to more information. To further reassure participants, we give them the option of asking for a meeting to be deleted after the fact.
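The permission procedure described above amounts to a simple rule: a meeting may be recorded only when every participant is covered by a current permission. A minimal Python sketch of that rule follows. The `Permission` record, its field names, the sample names and dates, and the 365-day validity window for annual permissions are all illustrative assumptions; they do not describe the actual tracking spreadsheet.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class Permission:
    """One row of a hypothetical permission-tracking sheet."""
    participant: str
    granted: bool
    granted_on: date
    annual: bool = False              # standing permission for regular participants
    meeting_id: Optional[str] = None  # set when the permission is for one meeting


def covers(p: Permission, participant: str, meeting_id: str, meeting_date: date) -> bool:
    """True if this row lets us record `participant` at this meeting."""
    if p.participant != participant or not p.granted:
        return False
    if p.annual:
        # Assumption: an annual permission is valid for 365 days from the grant date.
        return p.granted_on <= meeting_date < p.granted_on + timedelta(days=365)
    return p.meeting_id == meeting_id


def may_record(participants: List[str], permissions: List[Permission],
               meeting_id: str, meeting_date: date) -> Tuple[bool, List[str]]:
    """Return (ok, missing): ok only if every participant is covered."""
    missing = [
        name for name in participants
        if not any(covers(p, name, meeting_id, meeting_date) for p in permissions)
    ]
    return (not missing, missing)


# Hypothetical example data.
log = [
    Permission("Ana", True, date(2018, 1, 10), annual=True),
    Permission("Ben", True, date(2018, 6, 1), meeting_id="m1"),
]
ok, missing = may_record(["Ana", "Ben", "Cy"], log, "m1", date(2018, 6, 5))
print(ok, missing)
```

Returning the list of uncovered participants, rather than a bare yes or no, shows who still needs to be asked before the meeting.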

This is the server stack for the archives and the two firms. The archive uses 2 blades.
This hub connects the archive to workstations and handles the transfer of TBs of data.

Preservation and access are unique challenges, especially with the architecture files. Many of the project-related files are non-traditional file formats like .dwg, .skb, .indd, .bak, et al., and are created in programs like AutoCAD and SketchUp Pro. I work with the IT department to ensure that the proper backups are completed. We back up to a local server, to a second server that is offsite but in the same city, and to a third, dark archive in California. I also perform regular checks to confirm the files can open. Because projects are often reopened years later, it is impractical to convert the files to a more standardized format. To ensure some level of access without specialized terminals, final elements of the project are saved in a .pdf format. This includes final drawings/renderings and presentations.

Furthermore, I often find myself in the awkward position of arguing with records creators in favor of keeping files that they don’t want but I know have archival value. Without the benefit of patrons, and their potential needs, I am arguing for the future needs of the very people I am arguing with! Without a higher level of administration to appeal to, I am often left with no recourse but to do things that are not in the best interests of the collection. This leads to the unfortunate loss of materials but may not be as bad as it first appears. When considering how traditional archival collections are created and deposited, it is well within reason that these items would never have made it into the collection. I like to think that by being embedded in the creation process, I am able to save far more than would otherwise be deposited if the creators were left to make appraisal decisions on their own.


Michelle Ganz is the Archives Director at McDonough Innovation and the former archivist at Lincoln Memorial University. She received her MILS from the University of Arizona and a BA from the Ohio State University. In addition to her passion for all things archival Michelle loves to cook, read, and watch movies.

Latest #bdaccess Twitter Chat Recap

By Daniel Johnson and Seth Anderson

This post is the eighteenth in a bloggERS series about access to born-digital materials.

____

In preparation for the Born Digital Access Bootcamp: A Collaborative Learning Forum at the New England Archivists spring meeting, an ad-hoc born-digital access group with the Digital Library Federation recently held a set of #bdaccess Twitter chats. The discussions aimed to gain insight into issues that archives and library staff face when providing access to born-digital materials.

Here are a few ideas that were discussed during the two chats:

  • Backlogs, workflows, delivery mechanisms, lack of known standards, appraisal and familiarity with software were major barriers to providing access.
  • Participants were eager to learn more about new tools, existing functioning systems, providing access to restricted material and complicated objects, which institutions are already providing access to data, what researchers want/need, and if any user testing has been done.
  • Access is being prioritized by user demand, donor concerns, fragile formats and a general mandate that born-digital records are not preserved unless access is provided.
  • Very little user testing has been done.
  • A variety of archivists, IT staff and services librarians are needed to provide access.

You can search #bdaccess on Twitter to see how the conversation evolves or view the complete conversation from these chats on Storify.

The Twitter chats were organized by a group formed at the 2015 SAA annual meeting. Stay tuned for future chats and other ways to get involved!

____

Daniel Johnson is the digital preservation librarian at the University of Iowa, exploring, adapting, and implementing digital preservation policies and strategies for the long-term protection and access to digital materials.

Seth Anderson is the project manager of the MoMA Electronic Records Archive initiative, overseeing the implementation of policy, procedures, and tools for the management and preservation of the Museum of Modern Art’s born-digital records.

Digital Preservation, Eh?

by Alexandra Jokinen

This is the third post in our series on international perspectives on digital preservation.

___

Hello / Bonjour!

Welcome to the Canadian edition of International Perspectives on Digital Preservation. My name is Alexandra Jokinen. I am the new(ish) Digital Archives Intern at Dalhousie University in Halifax, Nova Scotia. I work closely with the Digital Archivist, Creighton Barrett, to aid in the development of policies and procedures for some key aspects of the University Libraries’ digital archives program—acquisitions, appraisal, arrangement, description, and preservation.

One of the ways in which we are beginning to tackle this very large, very complex (but exciting!) endeavour is to execute digital preservation on a small scale, focusing on the processing of digital objects within a single collection, and then using those experiences to create documentation and workflows for different aspects of the digital archives program.

The collection chosen to be our guinea pig was a recent donation of work from esteemed Canadian ecologist and environmental scientist, Bill Freedman, who taught and conducted research at Dalhousie from 1979 to 2015. The fonds is a hybrid of analogue and digital materials dating from 1988 to 2015. Digital media carriers include: 1 computer system unit, 5 laptops, 2 external hard drives, 7 USB flash drives, 5 zip disks, 57 CDs, 6 DVDs, 67 5.25 inch floppy disks and 228 3.5 inch floppy disks. This is more digital material than the archives is likely to acquire in future accessions, but the Freedman collection acted as a good test case because it provided us with a comprehensive variety of digital formats to work with.

Our first area of focus was appraisal. For the analogue material in the collection, this process was pretty straightforward: conduct macro-appraisal and functional analysis by physically reviewing material. However, (as could be expected) appraisal of the digital material was much more difficult to complete. The archives recently purchased a forensic recovery of evidence device (FRED) but does not yet have all the necessary software and hardware to read the legacy formats in the collection (such as the floppy disks and zip disks), so we started by investigating the external hard drives and USB flash drives. After examining their content, we were able to get an accurate sense of the information they contained, the organizational structure of the files, and the types of formats created by Freedman. Although we were not able to examine files on the legacy media, we felt that we had enough context to perform appraisal, determine selection criteria and formulate an arrangement structure for the collection.
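To put that appraisal constraint in numbers, the carrier counts from the fonds description can be tallied. The short Python sketch below uses only the counts given in this post. The grouping is an assumption drawn from the wording above: "examined" covers the external hard drives and USB flash drives, and "named legacy" covers only the zip and floppy disks. The post does not say how the CDs, DVDs, laptops, or system unit were handled, so they are left ungrouped.

```python
# Carrier counts as listed in the description of the Freedman fonds.
carriers = {
    "computer system unit": 1,
    "laptop": 5,
    "external hard drive": 2,
    "USB flash drive": 7,
    "zip disk": 5,
    "CD": 57,
    "DVD": 6,
    "5.25 inch floppy disk": 67,
    "3.5 inch floppy disk": 228,
}

# Assumed grouping, based only on what the post states explicitly.
examined = {"external hard drive", "USB flash drive"}
named_legacy = {"zip disk", "5.25 inch floppy disk", "3.5 inch floppy disk"}

total = sum(carriers.values())
examined_count = sum(n for kind, n in carriers.items() if kind in examined)
legacy_count = sum(n for kind, n in carriers.items() if kind in named_legacy)

print(f"total carriers:        {total}")
print(f"examined so far:       {examined_count} ({examined_count / total:.1%})")
print(f"named legacy carriers: {legacy_count} ({legacy_count / total:.1%})")
```

By this count the appraisal drew on 9 of 378 carriers, while the zip and floppy disks alone account for 300.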

The next step of the project will be to physically organize the material. This will involve separating, photographing and reboxing the digital media carriers and updating a new registry of digital media that was created during a recent digital archives collection assessment modelled after OCLC’s 2012 “You’ve Got to Walk Before You Can Run” research report. Then, we will need to process the digital media, which will entail creating disk images with our FRED machine and using forensic tools to analyze the data.  Hopefully, this will allow us to apply the selection criteria used on the analogue records to the digital records and weed out what we do not want to retain. During this process, we will be creating procedure documentation on accessioning digital media as well as updating the archives’ accessioning manual.

The project’s final steps will be to take the born-digital content we have collected and ingest it using Archivematica, creating Archival Information Packages for storage and preservation, and to make it accessible via the Archives Catalogue and Online Collections.

So there you have it! We have a long way to go in terms of digital preservation here at Dalhousie (and we are just getting started!), but hopefully our work over the next several months will ensure that solid policies and procedures are in place for maintaining a trustworthy digital preservation system in the future.

This internship is funded in part by a grant from the Young Canada Works Building Careers in Heritage Program, a Canadian federal government program for graduates transitioning to the workplace.

___

Alexandra Jokinen has a Master’s Degree in Film and Photography Preservation and Collections Management from Ryerson University in Toronto. Previously, she has worked as an Archivist at the Liaison of Independent Filmmakers of Toronto and completed a professional practice project at TIFF Film Reference Library and Special Collections.

Connect with me on LinkedIn!

Processing Digital Research Data

By Elise Dunham

This is the sixth post in our Spring 2016 series on processing digital materials.

———

The University of Illinois at Urbana-Champaign’s (Illinois) library-based Research Data Service (RDS) will be launching an institutional data repository, the Illinois Data Bank (IDB), in May 2016. The IDB will provide University of Illinois researchers with a repository for research data that will facilitate data sharing and ensure reliable stewardship of published data. The IDB is a web application that transfers deposited datasets into Medusa, the University Library’s digital preservation service for the long-term retention and accessibility of its digital collections. Content is ingested into Medusa via the IDB’s unmediated self-deposit process.

As we conceived of and developed our dataset curation workflow for digital datasets ingested in the IDB, we turned to archivists in the University Archives to gain an understanding of their approach to processing digital materials. [Note: I am not specifying whether data deposited in the IDB is “born digital” or “digitized” because, from an implementation perspective, both types of material can be deposited via the self-deposit system in the IDB. We are not currently offering research data digitization services in the RDS.] There were a few reasons for consulting with the archivists: 1) archivists have deep, real-world curation expertise, and we anticipate that many of the challenges we face with data will have solutions whose foundations were developed by archivists; 2) if, through discussing processes, we found areas where the RDS and Archives have converging preservation or curation needs, we could communicate these to the Preservation Services Unit, which develops and manages Medusa; and 3) I’m an archivist by training and I jump on any opportunity to talk with archivists about archives!

Even though the RDS and the University Archives share a central goal–to preserve and make accessible the digital objects that we steward–we learned that there are some operational and policy differences between our approaches to digital stewardship that necessitate points of variance in our processing/curation workflow:

Appraisal and Selection

In my view, appraisal and selection are fundamental to the archives practice. The archives field has developed a rich theoretical foundation when it comes to appraisal and selection, and without these functions the archives endeavor would be wholly unsustainable. Appraisal and selection ideally tend to occur in the very early stages of the archival processing workflow. The IDB curation workflow will differ significantly–by and large, appraisal and selection procedures will not take place until at least five years after a dataset is published in the IDB–making our appraisal process more akin to that of an archives that chooses to appraise records after accessioning or even during the processing of materials for long-term storage. Our different approaches to appraisal and selection speak to the different functions the RDS and the University Archives fulfill within the Library and the University.

The University Archives is mandated to preserve University records in perpetuity by the General Rules of the University and the Illinois State Records Act. The RDS’s initiating goal, in contrast, is to provide a mechanism for Illinois researchers to be compliant with funder and/or journal requirements to make results of research publicly available. Here, there is no mandate for the IDB to accept only data that is deemed to have “enduring value” and, in fact, the research data curation field is so new that we do not yet have a community-endorsed sense of what “enduring value” means for research data. Standards regarding the enduring value of research data may evolve over the long term in response to discipline-specific circumstances.

To support researchers’ needs and/or desires to share their data in a simple and straightforward way, the IDB ingest process is largely unmediated. Depositing privileges are open to all campus affiliates who have the appropriate University log-in credentials (e.g., faculty, graduate students, and staff), and deposited files are ingested into Medusa immediately upon deposit. RDS curators will do a cursory check of deposits, as doing so remains scalable (see workflow chart below), and the IDB reserves the right to suppress access to deposits for a “compelling reason” (e.g., failure to meet criteria for depositing as outlined in the IDB Accession Policy, violations of publisher policy, etc.). Aside from cases that we assume will be rare, the files as deposited into the IDB, unappraised, are the files that are preserved and made accessible in the IDB.

Preservation Commitment

A striking policy difference between the RDS and the University Archives is that the RDS makes a commitment to preserving and facilitating access to datasets for a minimum of five years after the date of publication in the Illinois Data Bank.

The University Archives, of course, makes a long-term commitment to preserving and making accessible records of the University. I have to say, when I learned that the five-year minimum commitment was the plan for the IDB, I was shocked and a bit dismayed! But after reflecting on the fact that files deposited in the IDB undergo no formal appraisal process at ingest, the concept began to feel more comfortable and reasonable. At a time when terabytes of data are created, oftentimes for single projects, and budgets are a universal concern, there are logistical storage issues to contend with. Now, I fully believe that for us to ensure that we are able to 1) meet current, short-term data sharing needs on our campus and 2) fulfill our commitment to stewarding research data in an effective and scalable manner over time, we have to make a circumspect minimum commitment and establish policies and procedures that enable us to assess the long-term viability of a dataset deposited into the IDB after five years.

The RDS has collaborated with archives and preservation experts at Illinois and, basing our work in archival appraisal theory, has developed guidelines and processes for reviewing published datasets after their five-year commitment ends to determine whether to retain, deaccession, or dedicate more stewardship resources to datasets. Enacting a systematic approach to appraising the long-term value of research data will enable us to allot resources to datasets in a way that is proportional to the datasets’ value to research communities and their preservation viability.
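As a rough illustration of how the five-year minimum translates into a review schedule, the Python sketch below computes the earliest date on which a dataset becomes eligible for preservation review. The three outcomes come from the paragraph above. The function and class names are hypothetical, and the handling of a 29 February publication date is an assumption. None of this describes how the IDB or Medusa actually schedules reviews.

```python
from datetime import date
from enum import Enum


class ReviewOutcome(Enum):
    """The three possible results of a preservation review named in the post."""
    RETAIN = "retain"
    DEACCESSION = "deaccession"
    DEDICATE_MORE_RESOURCES = "dedicate more stewardship resources"


COMMITMENT_YEARS = 5  # minimum commitment stated in the post


def review_due(published: date, years: int = COMMITMENT_YEARS) -> date:
    """Earliest date on which the minimum commitment has elapsed."""
    try:
        return published.replace(year=published.year + years)
    except ValueError:
        # Published on 29 February and the target year is not a leap year.
        # Assumption: fall back to 28 February.
        return published.replace(year=published.year + years, day=28)


def is_review_due(published: date, today: date) -> bool:
    """True once the dataset has reached the end of its minimum commitment."""
    return today >= review_due(published)


# Hypothetical publication date, for illustration only.
print(review_due(date(2016, 5, 16)))
```

The review decision itself is a curatorial judgment, so the sketch stops at identifying which datasets are eligible.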

Convergences

To show that we’re not all that different after all, I’ll briefly mention a few areas where the University Archives and the RDS are taking similar approaches or facing similar challenges:

  • We are both taking an MPLP-style approach to file conversion. In order to get preservation control of digital content, at minimum, checksums are established for all accessioned files. As a general rule, if the file can be opened using modern technology, file conversion will not be pursued as an immediate preservation action. Establishing strategies and policies for managing a variety of file formats at scale is an area that will be evolving at Illinois through collaboration of the University Archives, the RDS, and the Preservation Services Unit.
  • Accruals present metadata challenges. How do we establish clear accrual relationships in our metadata when a dataset or a records series is updated annually? Are there ways to automate processes to support management of accruals?
  • Both units do as much as they can to get contextual information about the material being accessioned from the creator, and metadata is enhanced as possible throughout curation/processing.
  • The University Archives and the RDS control materials in aggregation, with the University Archives managing at the archival collection level and the RDS managing digital objects at the dataset level.
  • More? Certainly! For both the research data curation community and the archives community, continually adopting pragmatic strategies to manage the information created by humans (and machines!) is paramount, and we will continue to learn from one another.
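The checksum step mentioned in the first point above can be sketched in a few lines of Python. This is a generic illustration, not the procedure used by the University Archives, the RDS, or Medusa. SHA-256 is an assumed choice of algorithm, and the function names and the in-memory manifest format are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum at accession."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return the relative paths that are missing or whose checksum changed."""
    root = Path(root)
    problems = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel)
    return problems
```

A manifest built at accession and re-verified later detects changed or missing files without opening or converting them, which fits the approach of deferring file conversion.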

Research Data Alliance Interest Group

If you’re interested in further exploring the areas where the principles and practices in archives and research data curation overlap and where they diverge, join the Research Data Alliance (RDA) Archives and Records Professionals for Research Data Interest Group. You’ll need to register with the RDA (which is free!) and subscribe to the group. If you have any questions, feel free to get in touch!

IDB Curation Workflow

The following represents our planned functional workflow for handling dataset deposits in the Illinois Data Bank:

[Figure: IDB curation workflow. Workflow graphic created by Elizabeth Wickes.]

Learn More

To learn more about the IDB policies and procedures discussed in this post, keep an eye on the Illinois Data Bank website after it launches next month. Of particular interest on the Policies page will be the Accession Policy and the Preservation Review, Retention, Deaccession, Revision, and Withdrawal Procedure document.

Acknowledgements

Bethany Anderson and Chris Prom of the University of Illinois Archives

The rest of the Research Data Preservation Review Policy/Procedures team: Bethany Anderson, Susan Braxton, Heidi Imker, and Kyle Rimkus

The rest of the RDS team: Qian Zhang, Elizabeth Wickes, Colleen Fallaw, and Heidi Imker

———

Elise Dunham is a Data Curation Specialist for the Research Data Service at the University of Illinois at Urbana-Champaign. She holds an MLS from the Simmons College Graduate School of Library and Information Science where she specialized in archives and metadata. She contributes to the development of the Illinois Data Bank in areas of metadata management, repository policy, and workflow development. Currently she co-chairs the Research Data Alliance Archives and Records Professionals for Research Data Interest Group and is leading the DACS workshop revision working group of the Society of American Archivists Technical Subcommittee for Describing Archives: A Content Standard.

Digital Preservation System Integration at the University of Michigan’s Bentley Historical Library

By Michael Shallcross

At SAA 2015, Courtney Mumma (formerly of Artefactual Systems) and I participated in a panel discussion at the Electronic Records Section meeting on “implementing digital preservation tools and systems,” with a focus on “the lessons learned through the planning, development, testing, and production of digital preservation applications.”  

The University of Michigan’s Bentley Historical Library is in the midst of a two-year project (2014-2016) funded by the Andrew W. Mellon Foundation to integrate ArchivesSpace, Archivematica, and DSpace in an end-to-end digital archives workflow (for more information on the project itself, see our blog).

Artefactual Systems is responsible for the development work on the project, which has involved adding new functionality to the Archivematica digital preservation system to permit the appraisal and arrangement of digital archives as well as the integration of ArchivesSpace functionality within Archivematica so that users can create and edit archival description in addition to associating digital objects with that information.
