Welcome to the World of Tomorrow: Technology that should give archivists nightmares (or at least indigestion)

by Joshua Kitchens

Advances in technology should not be looked at so much as forward progress, but as a series of ever more complicated things for us to preserve. This is the complicated reality that we as archivists will be facing. For just a moment, instead of considering the present or looking backwards, let us look towards the bright and shiny tomorrow.

Quantum Computing

Quantum computing seems like a real thing. There were some doubts early on about whether or not the quantum computers that existed were real, but that sort of fits the whole definition of theoretical physics. Now it seems that qubits are the new bits. With Google and other tech companies leading the efforts to build machines that can calculate seemingly impossible things at speeds unheard of by today's standards, say goodbye to simple 1s and 0s and hello to 1s and 0s in superposition and entanglement, quantumly speaking. What kinds of records will these machines create? <Shrugs> It is impossible to know just yet, but they are coming, and we should be aware. Unfortunately, I doubt Al will be there to help us figure out where our leap into this new realm of computing has landed us.

Virtual Reality and Augmented Reality

Nothing quite gets my head spinning like thinking about how to deal with the inevitable virtual reality takeover. While we may get to luxuriate in digital evergreen fields with elves, orcs, and cyberspace marines, I can only expect the inevitable need to find a way to preserve these New Age sprites, as I can only imagine that in the future peace treaties will be worked out between a 7-foot-tall virtual anthropomorphic moose and an overly cute chibi panda. While future historians will debate the meaning of 🙂 in the third line of that treaty, we will need to understand the significant properties and other aspects that should be preserved, and what could be said of the record qualities of these virtual spaces. What sorts of technological preservation will be required for these environments? Will we feel an overwhelming sense of dread as we appraise these records? Think about the headset graveyard!!! We should also consider augmented reality, which poses a complex issue of its own. What is the record in this case: the Google Glass overlay onto the real world, or the data behind the overlay? I feel a bit like we are Morpheus searching for our Neo in this case. Will you be the One?

Video Games

In many respects, video games could be included in any discussion of virtual worlds, but for now, let's take Mario head on, or shall we say feet first. Like virtual reality, video games are complex digital objects, but in addition to a game with systems for rendering pixels and dynamic worlds, there is usually a rabid and supportive fan base. These are primarily cultural spaces, sometimes based in a game, like World of Warcraft and Eve Online, and sometimes existing through forums and Twitter hashtags. These groups introduce new language, like "ult" for ultimate. They debate issues that go beyond the game environment, ranging from ethics to trans rights and much more. So for video games, part of understanding the complex record that is a game is understanding the various communities that have been created around them.

Blockchain

Blockchain is the new buzzword on the internet and in business these days. What started out principally as a vehicle and system for recording transactions of a currency unfettered from governmental controls has blossomed into a buzzword-fueled explosion of… well, I'm not entirely sure. What I do know is that graphics cards are prohibitively expensive now, and Kodak has licensed its name to a bitcoin mining company. Kodak has also allowed its name to be used for a company that wants to use blockchains to help track image rights. This is quite a development. Some researchers, such as Hrvoje Stancic, are already thinking about the implications of blockchains for archives and information professionals. So get ready, you might need your hacker specs for this one.
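For anyone who has only met blockchain as a buzzword, the core record-keeping idea is small enough to sketch. Below is a minimal, purely illustrative hash chain in Python: each block stores the hash of the block before it, so tampering with an earlier entry breaks every link after it. It deliberately leaves out mining, consensus, and everything else that makes real systems complicated.

# Toy hash chain illustrating the record-linking idea behind a blockchain.
# No mining, no consensus, no network: just tamper-evident linked records.
import hashlib
import json

def make_block(data, previous_hash):
    payload = {"data": data, "previous_hash": previous_hash}
    payload["hash"] = hashlib.sha256(
        json.dumps({"data": data, "previous_hash": previous_hash},
                   sort_keys=True).encode()
    ).hexdigest()
    return payload

chain = [make_block("genesis", "0" * 64)]
chain.append(make_block("Alice pays Bob 5", chain[-1]["hash"]))
chain.append(make_block("Bob pays Carol 2", chain[-1]["hash"]))

# Verification: each block must point at the actual hash of its predecessor.
for prev, block in zip(chain, chain[1:]):
    assert block["previous_hash"] == prev["hash"]
print("chain verified,", len(chain), "blocks")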


Diving into Computational Archival Science

by Jane Kelly

In December 2017, the IEEE Big Data conference came to Boston, and with it came the second annual computational archival science workshop! Workshop participants were generous enough to come share their work with the local library and archives community during a one-day public unconference held at the Harvard Law School. After some sessions from Harvard librarians that touched on how they use computational methods to explore archival collections, the unconference continued with lightning talks from CAS workshop participants and discussions about what participants need to learn to engage with computational archival science in the future.

So, what is computational archival science? It is defined by CAS scholars as:

“An interdisciplinary field concerned with the application of computational methods and resources to large-scale records/archives processing, analysis, storage, long-term preservation, and access, with aim of improving efficiency, productivity and precision in support of appraisal, arrangement and description, preservation and access decisions, and engaging and undertaking research with archival material.”

Lightning round (and they really did strike like a dozen 90-second bolts of lightning, I promise!) talks from CAS workshop participants ranged from computational curation of digitized records to blockchain to topic modeling for born-digital collections. Following a voting session, participants broke into two rounds of large group discussions to dig deeper into lightning round topics. These discussions considered natural language processing, computational curation of cultural heritage archives, blockchain, and computational finding aids. Slides from lightning round presenters and community notes can be found on the CAS Unconference website.

Lightning round talks.

 

What did we learn? (What questions do we have now?)

Beyond learning a bit about specific projects that leverage computational methods to explore archival material, we discussed some of the challenges that archivists may bump up against when they want to engage with this work. More questions were raised than answered, but the questions can help us build a solid foundation for future study.

First, and for some of us in attendance perhaps the most important point, is the need to familiarize ourselves with computational methods. Do we have the specific technical knowledge to understand what it really means to say we want to use topic modeling to describe digital records? If not, how can we build our skills with community support? Are our electronic records suitable for computational processes? How might these issues change the way we need to conceptualize or approach appraisal, processing, and access to electronic records?
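To make that first question concrete, here is a rough sketch of what "using topic modeling to describe digital records" can look like in practice. It is a generic illustration in Python using scikit-learn, not a description of any project presented at the unconference, and the folder of extracted text is hypothetical.

# Sketch: derive candidate descriptive terms from a folder of plain-text records.
# Assumes scikit-learn is installed; the "extracted_text" folder is hypothetical.
from pathlib import Path

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [p.read_text(errors="ignore") for p in Path("extracted_text").glob("*.txt")]

# Turn each record into a bag-of-words vector, dropping very common English words.
vectorizer = CountVectorizer(stop_words="english", max_df=0.9, min_df=2)
matrix = vectorizer.fit_transform(docs)

# Fit a small LDA model; choosing the number of topics is itself an appraisal question.
lda = LatentDirichletAllocation(n_components=10, random_state=0).fit(matrix)

# Print the top words of each topic as candidate descriptive terms.
terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-8:][::-1]]
    print(f"Topic {i}: {', '.join(top)}")

Even a toy run like this raises the questions above: whether our electronic records are clean enough text to model at all, and whether the resulting word lists are meaningful enough to put in front of researchers.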

Many conversations repeatedly turned to bias, privacy, and other ethical issues. How do our biases shape the tools we build and use? What skills do we need to develop in order to recognize and dismantle biases in technology?

Word cloud from the unconference created by event co-organizer Ceilyn Boyd.

 

What do we need?

The unconference was intended to provide a space to bring more voices into conversations about computational methods in archives and, more specifically, to connect those currently engaged in CAS with other library and archives practitioners. At the end of the day, we worked together to compile a list of things that we felt many of us would need to learn in order to engage with CAS.

These needs include lists of methodologies and existing tools, canonical data and/or open datasets to use in testing such tools, a robust community of practice, postmortem analysis of current/existing projects, and much more. Building a community of practice and skill development for folks without strong programming skills was identified as both particularly important and especially challenging.

Be sure to check out some of the lightning round slides and community notes to learn more about CAS as a field as well as specific projects!

Interested in connecting with the CAS community? Join the CAS Google Group at: computational-archival-science@googlegroups.com!

The Harvard CAS unconference was planned and administered by Ceilyn Boyd, Jane Kelly, and Jessica Farrell of Harvard Library, with help from Richard Marciano and Bill Underwood from the Digital Curation Innovation Center (DCIC) at the University of Maryland’s iSchool. Many thanks to all the organizers, presenters, and participants!


Jane Kelly is the Historical & Special Collections Assistant at the Harvard Law School Library. She will complete her MSLIS from the iSchool at the University of Illinois, Urbana-Champaign in December 2018.

Improving Descriptive Practices for Born-Digital Material in an Archival Context

by Annie Tummino

In 2014/15 I worked at the New York Metropolitan Library Council (METRO) as the Project Manager for the National Digital Stewardship Residency (NDSR) program in New York City, providing administrative support for five fantastic digital stewardship projects. While I gained a great deal of theoretical knowledge during that time, my hands-on work with born-digital materials has been fairly minimal. When I saw that METRO was offering a workshop on "Improving Descriptive Practices for Born-Digital Material in an Archival Context" with Shira Peltzman, former NDSR-NY resident (and currently Digital Archivist for UCLA Library Special Collections), I jumped at the opportunity to sign up.

For the last two years I have served as the archivist at SUNY Maritime College, working as a “lone arranger” on a small library staff. My emphasis has been on modernizing the technical infrastructure for our special collections and archives. We’ve implemented ArchivesSpace for collections management and are in the process of launching a digital collections platform. However, most of my work with born-digital materials has been of a very controlled type; for example, oral histories and student photographs that we’ve collected as part of documentation projects. While we don’t routinely accession born-digital materials, I know the reckoning will occur eventually. A workshop on descriptive practices seemed like a good place to start.

Shira emphasized that a great deal of technical and administrative metadata is captured during processing of born-digital materials, but not all of this information should be recorded in finding aids. Part of the archivist’s job is figuring out which data is meaningful for researchers and important for access. Quoting Bertram Lyons, she also emphasized that possessing a basic understanding of the underlying “chemistry” of digital files will help archivists become better stewards of born-digital materials. To that end, she started the day with a “digital deep dive” outlining some of this underlying digital chemistry, including bits and bytes, character encoding, and plain text versus binary data. This was followed by an activity where we worked in small groups to analyze the interdependencies involved in finding, retrieving, and rendering the contents of files in given scenarios. The activity definitely succeeded in demonstrating the complexities involved in processing digital media, and provided an important foundation for our subsequent discussion of descriptive practice.
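As a quick illustration of the kind of "digital chemistry" that deep dive covered (my own sketch in Python, not material from Shira's workshop), the same characters become different bytes under different encodings, and reading them with the wrong encoding quietly garbles them:

# The same text becomes different bytes depending on the character encoding.
text = "café"
print(text.encode("utf-8"))    # b'caf\xc3\xa9'  (4 characters, 5 bytes)
print(text.encode("latin-1"))  # b'caf\xe9'      (4 characters, 4 bytes)

# Decoding with the wrong encoding silently produces mojibake.
print(text.encode("utf-8").decode("latin-1"))  # cafÃ©

# Binary data, by contrast, is not meant to be read as characters at all;
# the first bytes of a JPEG, for example, are a format signature:
jpeg_signature = bytes([0xFF, 0xD8, 0xFF])
print(jpeg_signature)  # b'\xff\xd8\xff'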

The following bullets, pulled directly from Shira's presentation, succinctly summarize the unique issues archivists face when processing born-digital materials:

  • Processing digital material often requires us to (literally) transform the files we’re working with
  • As stewards of this material, we must be prepared to record, account for, and explain these changes to researchers
  • Having guidelines that help us do this consistently and uniformly is essential

The need for transparency and standardization was a theme that came up again and again throughout the day.

To deal with some of the special challenges inherent in describing born-digital materials, a working group under the aegis of the UC Born-Digital Content Common Knowledge Group (CKG) has developed a UC-wide descriptive standard for born-digital archival material. The elements map to existing descriptive standards (DACS, EAD, and MARC) while offering additional guidance for born-digital materials where gaps exist. The most up-to-date version is on GitHub, where users can make pull requests to specific descriptive elements of the guidelines if they’d like to propose revisions. They have also been deposited in eScholarship, the institutional repository for the UC community.

Working in small groups, workshop participants took a closer look at the UC guidelines, examining particular elements, such as Processing Information; Scope and Content; Physical Description; and Extent. Drawing from our experience, we investigated differences and similarities in using these elements for born-digital materials in comparison to traditional materials. We also discussed the potential impacts of skipping these elements on the research process. We agreed that lack of standardization and transparency sows confusion, as researchers often don’t understand how born-digital media can be accessed, how much of it there is, or how it relates to the collection as a whole.

For our final activity, each group reviewed a published finding aid and identified at least five ways that the description of born-digital materials could be improved in the finding aid. The collections represented were all hybrid, containing born-digital materials as well as papers and other analog formats. It was common for the digital materials to be under-described, with unclear access statuses and procedures. The UC guidelines were extremely helpful in terms of generating ideas for improvements. However, the exercise also led to real talk about resource limitations and implementation. How do born-digital materials fit into an MPLP context? What do the guidelines mean for description in terms of tiered or efficient processing? No solid answers here, but great food for thought.

On the whole, the workshop was a great mix of presentation, discussion, and activities. I left with some immediate ideas to apply in my own institution. I hope more archivists will have opportunities to take workshops like this one and will check out the UC Guidelines.


 


Annie Tummino is the Archivist & Scholarly Communications Librarian at SUNY Maritime College, where she immerses herself in maritime special collections and advocates for Open Access while working in a historic fort by the sea. She received her Masters in Library and Information Studies and Archives Certificate from Queens College-CUNY in December, 2010.

Embedded Archives at the Institute for Social Research

by Kelly Chatain

This is the fourth post in the BloggERS Embedded Series.

As any archivist will tell you, the closer you can work with creators of digital content, the better. I work for the Institute for Social Research (ISR) at the University of Michigan. To be more precise, I am a part of the Survey Research Center (SRC), one of five centers that comprise the Institute and the largest academic social science research center in the United States. But really, I was hired by the Survey Research Operations (SRO) group, the operational arm of SRC, that conducts surveys all over the world collecting vast and varied amounts of data. In short, I am very close to the content creators. They move fast, they produce an extraordinary amount of content, and they needed help.

Being an ‘embedded’ archivist in this context is not just about the end of the line; it’s about understanding and supporting the entire lifecycle. It’s archives, records management, knowledge management, and more, all rolled into one big job description. I’m a functional group of one interacting with every other functional group within SRO to help manage research records in an increasingly fragmented and prolific digital world. I help to build good practices, relationships, and infrastructure among ourselves and other institutions working towards common scientific goals.

Lofty. Let’s break it down a bit.

Find it, back it up, secure it

When I arrived in 2012, SRO had a physical archive of master study files that had been tended to by survey research staff over the years. These records provide important reference points for sampling and contacting respondents, designing questionnaires, training interviewers, monitoring data collection activities, coding data, and more. After the advent of the digital age, a few building moves, and some server upgrades, they also had an extensive shared drive network and an untold number of removable media containing the history of more recent SRO work. My first task was to centralize the older network files, locate and back up the removable media, and make sure sensitive data was out of reach. Treesize Professional is a great tool for this type of work because it creates detailed reports and clear visualizations of disk space usage. This process also produced SRO’s first retention schedule and an updated collection policy for the archive.

Charts produced by Treesize Professional used for the initial records survey and collection.
A small selection of removable media containing earlier digital content.
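TreeSize Professional is a Windows GUI tool, but the same kind of survey can be approximated with a short script. The sketch below (Python; the shared-drive path is hypothetical) totals bytes per top-level folder, which is often enough to see where the bulk of a drive lives before planning any centralization.

# Rough per-folder disk usage survey; the shared-drive root is hypothetical.
import os
from pathlib import Path

root = Path(r"\\server\sro_shared")

totals = {}
for top in sorted(p for p in root.iterdir() if p.is_dir()):
    size = 0
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            try:
                size += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # skip files we cannot stat
    totals[top.name] = size

for folder, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{folder:40} {size / 1e9:8.1f} GB")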

Welcome, GSuite

Despite its academic home, SRO operates more like a business. It serves University of Michigan researchers as well as external researchers (national and international), meeting the unique requirements of increasingly complex studies. It maintains a national field staff of interviewers as well as a centralized telephone call center. The University of Michigan moved to Google Apps for Education (now GSuite) shortly after I arrived, which brought new challenges, particularly in security and organization. GSuite is not the only documentation environment in which SRO operates, but training in the Googleverse coincided nicely with establishing guidance on best practices for email, file management, and organization in general. For instance, we try to label important emails by project (increasingly, decisions are documented only in email), which can then be archived with the other documentation at the end of the study (IMAP to Thunderbird and export to PDF; or Google export to .mbox, then into Thunderbird). Google Drive files are downloaded to our main projects file server in .zip format at the end of the study.
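As a small illustration of what those end-of-study email packages contain, Python's standard mailbox module can read an .mbox export. This is just a sketch for inventorying a (hypothetical) export file before it is filed with the rest of the study documentation; it is not part of the SRO workflow described above.

# Sketch: inventory an .mbox export before archiving it with the study files.
import mailbox

mbox = mailbox.mbox("project_1234_gmail_export.mbox")  # hypothetical file name

for i, msg in enumerate(mbox, start=1):
    date = str(msg.get("Date", "?"))
    subject = str(msg.get("Subject", "(no subject)"))
    print(f"{i:4}  {date:35}  {subject}")

print(f"\n{len(mbox)} messages in export")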

Metadata, metadata, metadata

A marvelous video on YouTube perfectly captures the struggle of data sharing and reuse when documentation isn't available. The survey data that SRO collects is delivered to the principal investigator, but SRO also collects, and requires documentation of, data about the survey process for our own analysis and design purposes. Think study-level descriptions, methodologies, statistics, and more. I'm still working on finding that delicate balance of collecting enough metadata to facilitate discovery and understanding while not putting undue burden on study staff. The answer (in progress) is a SQL database that will extract targeted structured data from as many of our administrative and survey systems as possible, which can then be augmented with manually entered descriptive metadata as needed. In addition, I'm looking to the Data Documentation Initiative, a robust metadata standard for documenting a wide variety of data types and formats, to promote sharing and reuse in the future.

DDI is an international standard for describing data.
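As a rough sketch of the in-progress approach described above (using Python's built-in sqlite3; the table and fields are illustrative, not the actual SRO schema), structured fields pulled from administrative systems can sit alongside a slot for manually entered description:

# Illustrative study-level metadata store; not the actual SRO schema.
import sqlite3

conn = sqlite3.connect("study_metadata.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS studies (
        study_id        TEXT PRIMARY KEY,
        title           TEXT,
        mode            TEXT,     -- e.g. telephone, web, face-to-face
        field_start     TEXT,     -- ISO 8601 dates
        field_end       TEXT,
        completed_cases INTEGER,  -- extracted from survey systems
        description     TEXT      -- manually entered descriptive metadata
    )
""")
conn.execute(
    "INSERT OR REPLACE INTO studies VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("S-0001", "Example Health Survey", "telephone",
     "2017-01-15", "2017-06-30", 2500,
     "Study-level description added as needed."),
)
conn.commit()

for row in conn.execute("SELECT study_id, title, completed_cases FROM studies"):
    print(row)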

Preserve it

The original plan for digital preservation was to implement and maintain our own repository using an existing open-source or proprietary system. Then I found my new family in the International Association for Social Science Information Services & Technology (IASSIST) and realized I don’t have to do this alone. In fact, just across the hall from SRO is the Inter-University Consortium for Political and Social Research (ICPSR), who recently launched a new platform called Archonnex for their data archive(s). Out of the box, Archonnex already delivers much of the basic functionality SRO is looking for, including support for the ever-evolving preservation needs of digital content, but it can also be customized to serve the particular needs of a university, journal, research center, or individual department like SRO.

Searching for data in OpenICPSR, built on the new Archonnex platform.

 

The embedded archivist incorporates a big picture perspective with the specific daily challenges of managing records in ways that not many positions allow. And you never know what you might be working on next…


Kelly Chatain is Associate Archivist at the Institute for Social Research, University of Michigan in Ann Arbor. She holds an MS from Pratt Institute in Brooklyn, NY.

Thoughts from a Newcomer: Code4Lib 2018 Recap

by Erica Titkemeyer

After prioritizing audiovisual preservation conferences for so long, this year I chose to attend my first Code4Lib annual conference in Washington, D.C. I looked forward to the talks most directly related to my work, but also knew that there would be many relatable discussions to participate in, as we are all users/creators of library technology tools and we all seek solutions to similar data management challenges.

The conference started with Chris Bourg’s keynote, detailing research on why marginalized individuals are compelled to leave their positions in tech jobs because of undue discrimination. Calling for increased diversity in the workplace, less mansplaining/whitesplaining, more vouching for and amplification of marginalized colleagues, Dr. Bourg set out to “equip the choir”. She also pointed out that Junot Diaz said it best at ALA’s midwinter conference, when he called for a reckoning, explaining that “A profession that is 88% white means 5000% agony for people of color, no matter how liberal and enlightened you think you are.”

I appreciated her decision to use this opportunity to point out our own shortcomings, contradictions and need to do better. I will also say, if you ever needed proof that there is an equity problem in the tech world, you can:

  1. Listen to Dr. Bourg’s talk and
  2. Read the online trolling and harassment that she has since been subjected to because of it.

Since the backlash, Code4Lib has released a Community Statement in support of her remarks.

Following the keynote, the first round of talks further assured me that I had chosen the right conference to attend. In Andreas Orphanides’ talk, “Systems thinking: a practical field guide,” he cleverly pointed out system failures and hacks we all experience in our daily lives, and how they are analogous to the software we might build and where there might be areas for improvement. I also appreciated Julie Swierczek’s talk “For Beginners – No Experience Necessary,” in which she made the case for improving how we teach true beginners in workshops. She also argued that instructors should not assume everyone is on a level playing field just because the title includes “for beginners,” as it is not likely that attendees will know how to self-select workshops, especially if they are truly beginners to the technology being taught.

As a fan of Arduino (an open source hardware and software platform that supports DIY electronic prototyping), I was curious to hear Monica Maceli’s “Low-cost Preservation Environment Monitoring” talk, in which she described her experience developing an environmental datalogger using the Raspberry Pi (a small board in the same DIY spirit as the Arduino) and comparing the results and associated costs with a commercial datalogger, the PEM2. While it would require staff with appropriate expertise, it seemed to be a worthwhile endeavor for anyone wishing to spend a quarter of the price.
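I have not built one myself, but a minimal datalogger along the lines Maceli described might look something like the sketch below: Python on a Raspberry Pi, assuming a DHT22 temperature/humidity sensor wired to a GPIO pin and the legacy Adafruit_DHT library. The pin number, logging interval, and log file name are all hypothetical.

# Sketch of a low-cost environment datalogger on a Raspberry Pi.
# Assumes a DHT22 sensor on GPIO pin 4 and the legacy Adafruit_DHT library.
import csv
import time
from datetime import datetime

import Adafruit_DHT  # newer setups may use the CircuitPython DHT library instead

SENSOR = Adafruit_DHT.DHT22
PIN = 4                            # hypothetical GPIO pin
LOG_FILE = "environment_log.csv"   # hypothetical log location

while True:
    humidity, temperature = Adafruit_DHT.read_retry(SENSOR, PIN)
    if humidity is not None and temperature is not None:
        with open(LOG_FILE, "a", newline="") as f:
            csv.writer(f).writerow(
                [datetime.now().isoformat(timespec="seconds"),
                 round(temperature, 1), round(humidity, 1)]
            )
    time.sleep(300)  # one reading every five minutes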

With the sunsetting of Flash, I was eager to hear how Jacob Zaborowski’s talk “Save Homestar Runner!: Preserving Flash on the Web” would address the preservation concerns surrounding Homestar Runner, an online cartoon series that began in 2000 using Flash animation. Knowing that tools such as Webrecorder and Archive-It would capture, but not aid in preserving, the SWF files comprising the animations, Zaborowski sought out free and/or open source tools for transcoding the files to a more accessible and preservation-minded format. As with many audiovisual-based digital formats, the tools for transcoding the SWF files were not entirely reliable or capable of migrating all of the unique attributes to a new container with different encodings. At the time of his talk, the folks at Homestar Runner were in the midst of a site redesign to hopefully resolve some of these issues.

While I don’t have the space to summarize all of the talks I found relatable or enlightening during my time at Code4Lib, I think these few that I’ve mentioned show how varied the topics can be, while still managing to complement the information management work we are all charged with doing.


Erica Titkemeyer is the Audiovisual Conservator for the Southern Folklife Collection (SFC) at the University of North Carolina at Chapel Hill. She is the Project Director on the Andrew W. Mellon Foundation grant Extending the Reach of Southern Audiovisual Sources, and oversees the digitization, preservation, and access of audiovisual recordings for the SFC.

Embedded Memory

by Michelle Ganz

This is the third post in the BloggERS Embedded Series.

As an archivist in a small corporate repository for an architecture, design, and consulting firm I have a unique set of challenges and advantages. By being embedded with the creators as they are creating, I have the opportunity to ensure that archival standards are applied at the point of creation rather than after the collection has been transferred to a repository.

My environment is especially unique in the types of digital files I’m collecting. From the architecture side, William McDonough + Partners, I acquire architectural project files (CAD files), sketches, renderings, photographs and video of project phases, presentations, press, media files surrounding the project, and other associated materials. Many of these files are created on specialized, and expensive, software.

From McDonough Innovation, William McDonough’s sustainable development advising firm, I collect the world-changing ideas that Mr. McDonough is developing as they evolve. Mr. McDonough is a global thought leader who provides targeted ideas, product concepts, and solutions to a wide range of sustainable growth issues faced by corporate officers, senior executives, product designers, and project managers. He often works with CEOs to set the vision, and then with the management team to set goals and execute projects. His values-driven approach helps companies to embed sustainable growth principles into their corporate culture and to advance progress toward their positive vision. Archiving an idea is a multi-faceted endeavor. Materials can take the form of audio notes, sketches in a variety of programs, call or meeting recordings, and physical whiteboards. Since my role is embedded within the heart of Mr. McDonough’s enterprises, I ensure that all the right events are captured the right way, as they happen. I gather all the important contextual information and metadata about the event and the file. I can obtain permissions at the point of creation and coordinate directly with the people doing the original capture to ensure I get archival quality files.

Challenges in this environment are very different than what my academic counterparts face. In the academic world there was a chain of leadership that I could advocate to as needed. In my small corporate world there is no one to appeal to once my boss makes up their mind. Corporate interests are all focused on ROI (return on investment), and an archival department is a financial black hole; money is invested, but they will never see a financial return. This means that every new project must show ROI in more creative ways. I focus on how a project will free up other people to do more specialized tasks. But even this is often not enough, and I find myself advocating for things like file standards and server space. Many of the archival records are videos of speeches, events, meetings, or other activities and take up a huge amount of server space. A single month’s worth of archival files can be as large as 169 GB. In the academic setting where the archives is often a part of the library, the IT department is more prepared for the huge amounts of data that come with modern libraries; folding the archival storage needs into this existing digital preservation framework is often just a matter of resource allocation or funds.

Also, nearly every archival function that interacts with outside entities requires permissions these firms are not used to giving. Meetings can include people from 3 or 4 companies in 4 or 5 countries with a variety of NDAs in place with some, but not all, of the parties. In order to record a meeting I must obtain permission from every participant; this can be rather complicated and can create a lot of legal and privacy issues. A procedure was put in place to request permission to record when meetings are set up, as well as when meetings are confirmed. A spreadsheet was created to track all of the responses. For regular meeting participants, annual permissions are obtained. This procedure, while effective, is time-consuming. Many meeting participants are unfamiliar with what an archive is. There are many questions about how the information will be used, stored, disseminated, and accessed. There are also a lot of questions around the final destination of the archive and what that means for their permissions. To help answer these questions I created fact sheets that explain what the archives are, how archival records are collected and used, deposit timelines, copyright basics, and links to more information. To further reassure participants, we give them the option of asking for a meeting to be deleted after the fact.

This is the server stack for the archives and the two firms. The archive uses 2 blades.
This hub connects the archive to workstations and handles the transfer of TBs of data.

Preservation and access are unique challenges, especially with the architecture files. Many of the project-related files are non-traditional file formats like .dwg, .skb, .indd, .bak, et al., and are created in programs like AutoCAD and SketchUp Pro. I work with the IT department to ensure that the proper backups are completed. We back up to a local server as well as one in the city, but offsite, and a third dark archive in California. I also perform regular checks to confirm the files can open. Because projects are often reopened years later, it is impractical to convert the files to a more standardized format. To ensure some level of access without specialized terminals, final elements of the project are saved in .pdf format. This includes final drawings/renderings and presentations.
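Those regular checks are largely manual, but basic fixity can be scripted. The sketch below (Python; the share path and output file are hypothetical, and this is not the firm's actual procedure) records a SHA-256 checksum for every project file so the local, offsite, and dark-archive copies can be compared over time.

# Sketch: record SHA-256 checksums for later fixity comparison.
import csv
import hashlib
from pathlib import Path

root = Path("projects")  # hypothetical project file share

def sha256(path, chunk_size=1 << 20):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

with open("fixity_manifest.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["path", "bytes", "sha256"])
    for p in sorted(root.rglob("*")):
        if p.is_file():
            writer.writerow([str(p), p.stat().st_size, sha256(p)])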

Furthermore, I often find myself in the awkward position of arguing with records creators in favor of keeping files that they don’t want but I know have archival value. Without the benefit of patrons, and their potential needs, I am arguing for the future needs of the very people I am arguing with! Without a higher level of administration to appeal to, I am often left with no recourse but to do things that are not in the best interests of the collection. This leads to the unfortunate loss of materials but may not be as bad as it first appears. When considering how traditional archival collections are created and deposited, it is well within reason that these items would never have made it into the collection. I like to think that by being embedded in the creation process, I am able to save far more than would otherwise be deposited if the creators were left to make appraisal decisions on their own.


Michelle Ganz is the Archives Director at McDonough Innovation and the former archivist at Lincoln Memorial University. She received her MILS from the University of Arizona and a BA from the Ohio State University. In addition to her passion for all things archival, Michelle loves to cook, read, and watch movies.

Modernization of the A.D. Hopkins collection at the Smithsonian Institution Department of Entomology

by Teresa Boyd

This is the second post in the BloggERS Embedded Series.

The Smithsonian Institution’s Department of Entomology has recently finished phase one of their multiyear project to digitize their portion of the A.D. Hopkins notes and records system, which includes about 100 years of observations, both in the field and in the lab. A.D. Hopkins developed the system in order to collect biological and natural history notes about species, the environment they were in, as well as the collectors and locations of collection. This collection was adopted by the United States Department of Agriculture (USDA) when Hopkins was named Chief of Forest Insect Investigations, though Hopkins is known to have developed and used the system while working at West Virginia University in the late 1800s. The Smithsonian Institution’s Department of Entomology has historically worked very closely with the USDA and therefore obtained the largest portion of the Hopkins card file over the years.

 

It was important to Hopkins to collect as much information as possible about specimens because he felt it was the quickest way to understand the situation of potential pests and to find solutions to harmfully invasive species. Part of Hopkins’ methodology was to encourage average citizens to send in specimens and observations to the USDA, the Smithsonian, or one of the forest experiment stations that were located throughout the United States, which were then incorporated into the Hopkins note system. Some of these notes are also documentation about lab research such as specimen rearing, specimen transfers, and communications between lab and field. A few of these notes are also cross-referenced, so often a lab note can be traced back to a field note, making it easier for researchers to quickly see the correlation between field and lab (work that was often done by different individuals). The numbers on each individual card within the A.D. Hopkins system correlate to specimens that are housed in various locations. Traditionally a researcher or scientist would ask for the notes that were associated with a specimen number. By creating an online repository of the notes, the Smithsonian hopes to give researchers new tools to expand their work and perhaps find new ways to use the data collected by past researchers and scientists.

I have been working on this project as a lone archivist for the past 5 years, scanning the card file portion of the collection, and am now working on preparing these scans for a website that will be built specifically for this type of collection. The Smithsonian Institution’s Department of Entomology hopes to begin sending the scans of the cards to the Smithsonian Transcription Center soon to crowdsource the transcriptions. This cuts down on the time it takes to transcribe the older material, which is all handwritten. I will be adding the transcribed notes to the digitized card on the website so that researchers will be able to go to the website, look up a specific card, and see both the original scan and the transcribed notes, making it easy for anyone to use the information contained in the Hopkins collection. Additionally, these scans will be incorporated into the Department of Entomology’s collections database by matching specimens to their unique card numbers, thereby giving researchers the complete picture.
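That matching step could be as simple as a join on the shared card number. The sketch below is purely illustrative (Python; the file names and column names are hypothetical stand-ins for a transcription export and a specimen database export), not the Smithsonian's actual workflow.

# Sketch: link transcribed Hopkins cards to specimen records by card number.
import csv

with open("transcribed_cards.csv", newline="", encoding="utf-8") as f:
    cards = {row["card_number"]: row for row in csv.DictReader(f)}

with open("specimen_records.csv", newline="", encoding="utf-8") as f:
    for specimen in csv.DictReader(f):
        card = cards.get(specimen["hopkins_number"])
        if card:
            print(specimen["catalog_number"], "->", card["transcription"][:60])
        else:
            print(specimen["catalog_number"], "-> no matching Hopkins card")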

The Smithsonian Institution’s work to digitize and make their A.D. Hopkins collection publicly available is not the first of its kind; the USDA had previously accomplished this in the 1980s, and has made their documents available on the USDA website, HUSSI. There is hope that in the future other institutions that have their own portions of the A.D. Hopkins notes and records system will also begin to digitize and make them available online, supplementing the Smithsonian and USDA efforts to make this invaluable data available to researchers. 


Teresa Boyd is an archivist for the Department of State and a volunteer archivist for the Smithsonian Institution’s Department of Entomology. She holds a degree in Library and Information Science from the University of Arizona.

Capturing Common Ground

by Leslie Matthaei

This is the first post in the BloggERS Embedded Series.

Every Tuesday I am asked the same question: “T Coast today?” T Coast, or Tortilla Coast, is the preferred lunch location for some of the photographers that occupy the Photography Branch, in the Curator Division, of the federal agency Architect of the Capitol. The agency oversees the maintenance of buildings and landscapes on Capitol Hill, including the Library of Congress buildings, the Supreme Court, the United States Botanic Garden, the House and Senate Office Buildings, the Capitol Power Plant, and, of course, the United States Capitol. I have joined the professional photographers at T Coast for more than a dozen lunches now. I am here for the Taco Salad and camaraderie, but mostly I am here to listen. And to ask questions. I am an embedded archivist.

The Library of Congress Thomas Jefferson Building
United States Capitol Building
United States Botanic Garden – Bartholdi Park

I use my time at the T Coast lunch table to get to know the photographers and for them to get to know me. I discovered very quickly that the photographers and I have a lot in common. For example, the photographers are often assigned to shoot a long-term project (Collection) which may have multiple phases (Series), and for each phase, they go out on specific days to shoot (File Units). They cull excess and/or duplicate photographs. And they generally have a tried-and-true workflow for ingesting their born-digital objects, editing them in Adobe Lightroom, and then uploading them to an in-house Digital Asset Management system known as PhotoLightbox. Within PhotoLightbox, they are responsible for defining the security status of an individual image or group of images and providing the descriptive metadata. Tapping into parallel duties has allowed me to bridge potential knowledge gaps in explaining what roles and functions I can provide the branch as a whole.

One rather large knowledge gap is descriptive metadata. To be sure, the photographers in our agency are incredibly busy and in high demand. And they are professionally trained photographers. They see the world through aesthetics. It is not necessarily their job to use PhotoLightbox to help a researcher find images of the East Front extension that occurred in the 1950s, for example. That is my role, and when I query PhotoLightbox, the East Front extension project is represented in multiple ways: EFX, East Front (Plaza) Extension, East Extension, Capitol East Front Extension. You may see where this is going: there is no controlled vocabulary. When, in a staff meeting, I pitched the idea of utilizing controlled vocabularies, they immediately understood the need. Following their lead, the conversation turned to having me develop a data entry template for each of their shoots.

An example of the data entry template.
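A data entry template enforces the vocabulary going forward; for the legacy entries already in PhotoLightbox, even a simple mapping from known variants to a preferred term can bring the metadata into line. The sketch below is a generic illustration in Python (the variant list echoes the East Front extension example above; it is not an actual agency vocabulary).

# Sketch: normalize legacy project-name variants to a single controlled term.
PREFERRED = {
    "efx": "East Front Extension",
    "east front (plaza) extension": "East Front Extension",
    "east extension": "East Front Extension",
    "capitol east front extension": "East Front Extension",
    "east front extension": "East Front Extension",
}

def normalize_project(value: str) -> str:
    """Return the controlled term for a legacy entry, or flag it for review."""
    key = value.strip().lower()
    return PREFERRED.get(key, f"UNMAPPED: {value}")

for legacy in ["EFX", "East Extension", "West Terrace"]:
    print(legacy, "->", normalize_project(legacy))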

I admit now that my first spin through PhotoLightbox revealed a pressing need for controlled vocabularies, among other concerns the database presented. I am the type of person who, when I see a problem, wants to fix it immediately. Yet I knew that if my first professional introduction to the photographers was a critique of how unworkable their data entry was and had been over time, I might turn them off immediately. Instead, I went to lunch. I credit the results produced by this particular staff meeting to the time that I put in getting to know the photographers, getting to understand each of their respective workflows, and understanding a little bit about the historic function and purpose of the office within the agency.

I have another half dozen lunches to go before I begin to talk to the photographers about the need for long-term digital preservation of born-digital images and both of our roles in the surrounding concepts and responsibilities. I have a few more lunches after that to get their assistance in codifying the decisions we are making together into branch policies. I feel confident, however, that I have their complete buy-in for the work that I have been tasked to do in the branch. Instead of seeing me as another staff member making them do something they do not want to do, they see me as someone who can help them gain control of and manage their assets in a way that has yet to be done in the branch. I cannot do it alone; I need their help. And some chips and salsa every once in a while.


Leslie Matthaei

Leslie Matthaei is an Archivist in the Photography Branch, Curator Division, Architect of the Capitol. She holds an MLIS from the University of Arizona, and an MA and BA in Media Arts from the University of Arizona.

Partnerships in Advancing Digital Archival Education

by Sohan Shah, Michael J. Kurtz, and Richard Marciano

This is the fourth post in the BloggERS series on Collaborating Beyond the Archival Profession.

The mission of the Digital Curation Innovation Center (DCIC) at the University of Maryland’s iSchool is to integrate archival education with research and technology. The Center does this through innovative instructional design, integrated with student-based project experience. A key element in these projects is forming collaborations with academic, public sector, and industry partners. The DCIC fosters these interdisciplinary partnerships through the use of Big Records and Archival Analytics.

DCIC Lab space at the University of Maryland.

The DCIC works with a wide variety of U.S. and foreign academic research partners. These include, among others, the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign, the University of British Columbia, King’s College London, and the Texas Advanced Computing Center at the University of Texas at Austin. Federal and state agencies who partner by providing access to Big Records collections and their staff expertise include the National Agricultural Library, the National Archives and Records Administration, the National Park Service, the U.S. Holocaust Memorial Museum, and the Maryland State Archives. In addition, the DCIC collaborates with the European Holocaust Research Infrastructure project to provide digital access to Holocaust-era collections documenting cultural looting by the Nazis and subsequent restitution actions. Industry partnerships have involved NetApp and Archival Analytics Solutions.

Students working on a semester-long project with Dr. Richard Marciano, Director, DCIC.

We offer students the opportunity to participate in interdisciplinary digital curation projects with the goal of developing new digital skills and conducting front line research at the intersection of archives, digital curation, Big Data, and analytics. Projects span across justice, human rights, cultural heritage, and cyber-infrastructure themes. Students explore new research opportunities as they work with cutting-edge technology and receive guidance from faculty and staff at the DCIC.

To further digital archival education, DCIC faculty develop courses at the undergraduate and graduate levels that teach digital curation theory and provide experiential learning through team-based digital curation projects. The DCIC has also collaborated with the iSchool to create a Digital Curation for Information Professionals (DCIP) Certificate program designed for working professionals who need training in next generation cloud computing technologies, tools, resources, and best practices to help with the evaluation, selection, and implementation of digital curation solutions. Along these lines, the DCIC will sponsor, with the Archival Educators Section of the Society of American Archivists (SAA), a workshop at the Center on August 13, 2018, immediately prior to the SAA’s Annual Meeting in Washington, D.C. The theme of the workshop is “Integrating Archival Education with Technology and Research.” Further information on the workshop will be forthcoming.

The DCIC seeks to integrate all its educational and research activities by exploring and developing a potentially new trans-discipline, Computational Archival Science (CAS), focused on the computational treatments of archival content. The emergence of CAS follows advances in Computational Social Science, Computational Biology, and Computational Journalism.

For further information about our programs and projects visit our web site at http://dcic.umd.edu. To learn more about CAS, see http://dcicblog.umd.edu/cas. Information about a student-led Data Challenge, which the DCIC is co-sponsoring, can be accessed at http://datachallenge.ischool.umd.edu.


Sohan Shah

Sohan Shah is a Master’s student at the University of Maryland studying Information Management. His focus is on using research and data analytical techniques to make better business decisions. He holds a Bachelor’s degree in Computer Science from Ramaiah Institute of Technology, India, and has worked for 4 years at Microsoft as a Consultant and then as a Technical Lead prior to joining the University of Maryland. Sohan is working at the DCIC to find innovative ways of integrating data analytics with archival education. He is the co-author of “Building Open-Source Digital Curation Services and Repositories at Scale” and is working on other DCIC initiatives such as the Legacy of Slavery and Japanese American WWII Camps. Sohan is also the President of the Master of Information Management Student Association and initiated University of Maryland’s annual “Data Challenge,” bringing together hundreds of students from different academic backgrounds and class years to work with industry experts and build innovative solutions from real-world datasets.

Dr. Michael J. Kurtz is Associate Director of the Digital Curation Innovation Center in the College of Information Studies at the University of Maryland. Prior to this he worked at the U.S. National Archives and Records Administration for 37 years as a professional archivist, manager, and senior executive, retiring as Assistant Archivist in 2011. He received his doctoral degree in European History from Georgetown University in Washington, D.C. Dr. Kurtz has published extensively in the fields of American history and archival management. His works, among others, include: “ The Enhanced ‘International Research Portal for Records Related to Nazi-Era Cultural Property’ Project (IRP2): A Continuing Case Study” (co-author) in Big Data in the Arts and Humanities: Theory and Practice (forthcoming); “Archival Management and Administration,” in Encyclopedia of Library and Information Sciences (Third Edition, 2010); Managing Archival and Manuscript Repositories (2004); America and the Return of Nazi Contraband: The Recovery of Europe’s Cultural Treasures (2006, Paperback edition 2009).

Dr. Richard Marciano is a professor in the College of Information Studies at the University of Maryland and director of the Digital Curation Innovation Center (DCIC).  Prior to that, he conducted research at the San Diego Supercomputer Center (SDSC) at the University of California San Diego (UCSD) for over a decade with an affiliation in the Division of Social Sciences in the Urban Studies and Planning program.  His research interests center on digital preservation, sustainable archives, cyberinfrastructure, and big data.  He is also the 2017 recipient of the Emmett Leahy Award for achievements in records and information management. With partners from KCL, UBC, TACC, and NARA, he has launched a Computational Archival Science (CAS) initiative to explore the opportunities and challenges of applying computational treatments to archival and cultural content. He holds degrees in Avionics and Electrical Engineering, a Master’s and Ph.D. in Computer Science from the University of Iowa, and conducted a Postdoc in Computational Geography.