Caption These Bits! And the Winner Is…

Thanks to everyone who cast a vote in our inaugural caption contest.

Digital drumroll, please! [0110010001110010011101010110110101110010011011110110110001101100]

The winner is:

“Reading from PowerPoint – boring that guy in the back since 1958.”

Submitted by Lora Davis, Assistant Professor and Collections Archivist in the University Libraries, Colgate University. Congrats, Lora!

Stay tuned for the next installment of Caption These Bits!, and in the meantime, remember, RAMAC is ready to answer your questions.


Source: N09-033_a, Tracy O’Neal Photographic Collection, 1923-1975, Photographic Collection. Special Collections and Archives, Georgia State University Library. http://131.96.12.97/cdm/ref/collection/oneal/id/511 via DPLA: http://131.96.12.97/cdm/ref/collection/oneal/id/511

Should We Collect It Because We Can?

The following is a post by Dan Noonan, Digital Resources Archivist at Ohio State University, based on a breakout session at the Electronic Records Section (ERS) meeting during last year’s SAA Annual Meeting.

With an expanding capacity to store information in the digital age, do archivists still need to consider the size of collections when making appraisal decisions? Is it more compelling to accept a collection that fits on a few CDs than one that occupies 30 cubic feet of climate-controlled compact shelving? Should archivists make different acquisition decisions for digital and physical collections? These questions were the topics of a breakout discussion at the 2014 Electronic Records Section meeting during the Society of American Archivists Annual Meeting in Washington, DC.

Participants identified many examples of document sets that are typically not accessioned as a whole but are instead subject to sampling or outright rejection: timecards and attendance records, correspondence (email), financial records (besides annual reports, budgets, and general ledgers), policies, promotion and tenure files, research data, resumes, and syllabi. Appraisal and selection of these types of materials have traditionally been justified by a lack of resources: space to store documents, supplies to house them, and workers to process them. The presence (or potential presence) of sensitive or confidential information has also often led archivists to select out whole categories of documents to avoid the risk of disclosure.

It could be argued that digital files counter many of the standard arguments for selection and appraisal. With appropriate indexing and metadata, it may be easier to understand and appraise large volumes of digital content. Likewise, in a digital environment, locating and redacting sensitive or confidential information could be automated. New tools and systems for managing large collections help support the argument that size may not apply as an appraisal criterion for digital content.
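To illustrate what automated locating and redacting might look like at its simplest, here is a minimal Python sketch. It assumes a single hypothetical category of sensitive data, strings shaped like U.S. Social Security numbers in the hyphenated NNN-NN-NNNN form, and replaces each match with a placeholder while counting the replacements. The pattern, the `redact_ssns` function, and the sample sentence are illustrative assumptions, not a tool discussed in the session. Real redaction workflows would need far broader patterns and human review.

```python
import re

# Hypothetical pattern: matches only the hyphenated NNN-NN-NNNN form.
# \b word boundaries keep it from matching inside longer digit runs.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_ssns(text: str, placeholder: str = "[REDACTED]") -> tuple[str, int]:
    """Return the redacted text and the number of matches replaced."""
    # re.Pattern.subn returns (new_string, number_of_substitutions).
    redacted, count = SSN_PATTERN.subn(placeholder, text)
    return redacted, count


if __name__ == "__main__":
    # Made-up sample text for illustration only.
    sample = "Employee 123-45-6789 filed a timecard on 2014-08-14."
    result, n = redact_ssns(sample)
    print(result)  # Employee [REDACTED] filed a timecard on 2014-08-14.
    print(n)       # 1
```

The replacement count can feed an audit log, which lets staff know which files were changed and check the output by hand.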

However, session participants also noted that digital files pose their own special problems. Digital storage may be cheap and getting cheaper, but institutions with digital collections will still require server space for storage, as well as staff and resources to process, preserve, and provide access to them. And what about more complex digital objects, like audio and video files, research data, and web archives? Preservation-quality versions of these files can be enormous and can quickly consume all of your available storage space. Maintaining these types of content at scale may require powerful, expensive processing workstations and more sophisticated metadata and indexing information to ensure their long-term preservation and accessibility.

Ultimately, the participants agreed that any decision to acquire (or not acquire) a collection should align with an organization’s core collection development policies. Organizations still struggling with these decisions may want to create a decision matrix that weighs the costs and benefits of acquiring different types of content alongside those collection development policies. Such tools, along with staff training, would help personnel decide whether to accept potential digital acquisitions. Archives also need to plan and allocate resources appropriately for the long-term preservation and management of collections, both digital and physical. For digital collections, this planning should account not only for one-time hardware and software purchases but also for equipment replacement, upgrades, and migration; the staff needed to operate and manage the equipment; and other overhead. These costs should be annualized and accounted for in the same way as annual plant operations and maintenance fees, facility rental or lease fees, or mortgage payments.
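The annualizing step can be shown with simple arithmetic. This Python sketch uses straight-line annualization: each one-time purchase is spread evenly over its assumed service life, and costs that already recur every year are added on top. The `OneTimeCost` class, the `annualized_total` function, and every dollar figure and lifespan are hypothetical placeholders, not data from the session.

```python
from dataclasses import dataclass


@dataclass
class OneTimeCost:
    name: str
    amount: float         # purchase price in dollars (hypothetical)
    service_years: float  # assumed years before replacement or migration


def annualized_total(one_time: list[OneTimeCost], recurring: dict[str, float]) -> float:
    """Straight-line annualization: spread each one-time cost over its
    service life, then add the costs that already recur every year."""
    spread = sum(c.amount / c.service_years for c in one_time)
    return spread + sum(recurring.values())


if __name__ == "__main__":
    # Placeholder figures for illustration only.
    purchases = [
        OneTimeCost("storage server", 30000, 5),        # 6,000 per year
        OneTimeCost("processing workstation", 4000, 4),  # 1,000 per year
    ]
    yearly = {"staff time": 20000, "software licenses": 2500, "overhead": 1500}
    print(annualized_total(purchases, yearly))  # 31000.0
```

A shorter assumed service life raises the annual figure. Picking realistic replacement and migration cycles therefore matters as much as the purchase prices.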

Forever is a long time, and it can be difficult to conceptualize in a digital environment. Archival collection policies should be subject to reconsideration, and collection decisions to reappraisal. One participant noted that in the past, archivists regularly excluded materials based on format, for example by turning down a paper collection that was too voluminous to handle, so archivists can expect the conversation to continue in the digital realm.

Vote for the Winner: Caption These Bits!

Last week, we kicked off BloggERS’s caption contest, Caption These Bits!, by asking readers to submit captions for our first image. Thanks to everyone who contributed ideas; the witty electronic records humor out there is inspiring!

The BloggERS editorial team voted for the top three captions, and now we need your help choosing the winner. Cast your vote for your favorite caption by 3/23, and then we’ll announce the winner.



Source: N09-033_a, Tracy O’Neal Photographic Collection, 1923-1975, Photographic Collection. Special Collections and Archives, Georgia State University Library. http://131.96.12.97/cdm/ref/collection/oneal/id/511 via DPLA: http://131.96.12.97/cdm/ref/collection/oneal/id/511

Announcing the BloggERS Caption Contest: Caption These Bits!

Calling all metadata masters, description divas, and preservation pun connoisseurs…

We’re excited to kick off a new recurring bloggERS feature: Caption These Bits!

Once a month, bloggERS will invite readers to submit captions for images related to electronic records and the history of technology, sourced from archives around the world. Submit your caption below by 3/17. Digital archives, preservation, and curation humor encouraged.

We’ll choose three finalists and invite readers to vote for the winner.

This month’s image:

Source: N09-033_a, Tracy O’Neal Photographic Collection, 1923-1975, Photographic Collection. Special Collections and Archives, Georgia State University Library. http://131.96.12.97/cdm/ref/collection/oneal/id/511 via DPLA: http://131.96.12.97/cdm/ref/collection/oneal/id/511

CAPTION THESE BITS!

Big Data and Big Challenges for Archives

The following is a post by Glen McAninch based on a breakout session at the Electronic Records Section (ERS) meeting during last year’s SAA Annual Meeting.

What is “big data,” and how does it relate to what archivists do? Many of us, particularly those outside the federal government, private technology companies, and research-based universities, may doubt that we will ever have to deal with “big data,” but the topic addresses issues that those of us who manage electronic records collections face more and more. No doubt most archivists are beginning to acquire increasingly large collections of electronic records that challenge our abilities to process, preserve, and provide access to them.

Image courtesy of Stefano Bertolo.

So, what is big data? According to Gartner Research, it is “high-volume, high-velocity, high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.” Traditionally, computers excelled at manipulating structured data, but businesses increasingly need to integrate and make sense of data from a wide range of sources, in a variety of formats, at various levels of structure and cleanliness, and they need to do so in a timely fashion.[1]

Most archivists are acquiring collections that are rapidly increasing in volume and complexity and may fit this definition of big data. It is chiefly a matter of scale that separates most of us from the data tsunami faced by some institutions. Can archivists learn processes and acquire tools from those who are using big data sets for non-archival purposes?

Big data is currently used mainly for analytics or research rather than to document specific activities. Data analytics tools allow researchers to manipulate and analyze data stored in multiple formats. The big data issues of greatest concern to archivists are:

  • Appraisal involves selection of records, some of which may be useful for analytic research. When acquiring unstructured and structured records, it is important for archivists to carefully document the context of the original record so that researchers who use big data tools have the proper framework to do analysis.
  • Searching, one of the long suits of “big data” tools, can be leveraged to help archivists improve access to massive amounts of records in multiple formats.
  • Big data sets are often not managed in the traditional way in which archivists and records managers select records for long-term retention. Big data tools such as Apache’s Hadoop emphasize the analysis of objects rather than the management of records, as is done in structured databases. This makes many big data tools unsuited to archival management and preservation needs.
  • How will you and your institution acquire big data? From whom? Will this data come directly from those who collect it, for instance, your university’s department of institutional research? If so, do you want to acquire the full set of raw data, or are you interested only in the outputs and analyses performed on that data? Or will you acquire the work of researchers who have obtained copies of electronic records, extracted selected content from those records, mashed up that content with data from other sources, and then analyzed the result? Is the goal of acquisition to allow future researchers to reinterpret and reanalyze the data, or to document the information that informed certain decisions at an institution?
  • Privacy, the fear that Big Brother is watching us, is a popular concern often associated with big data, and archivists need to address that fear through access restrictions and redaction. Visualization tools are increasingly being used to appraise records and establish links between records, particularly for large email projects. Additionally, users of high-volume data have made advances in crowdsourcing, facial recognition, and other techniques that archivists are adapting.
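The point about search as a long suit of big data tools can be made concrete with a toy inverted index. This is the basic data structure behind full-text search: it maps each term to the set of records that contain it, so a query looks terms up instead of rescanning every record. The Python below is a minimal sketch. The function names, the simple tokenizing rule, and the sample records are hypothetical. Production search tools add relevance ranking, richer tokenization, and distributed storage, and this sketch leaves all of those out.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase, then keep runs of letters and digits (a deliberately simple rule).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of record IDs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for record_id, text in records.items():
        for term in tokenize(text):
            index[term].add(record_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of records containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # .get avoids inserting empty entries into the defaultdict.
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


if __name__ == "__main__":
    # Hypothetical records, e.g. email messages in a processed collection.
    records = {
        "msg-001": "Budget meeting moved to Friday",
        "msg-002": "Draft budget attached for review",
        "msg-003": "Friday lunch plans",
    }
    idx = build_index(records)
    print(sorted(search(idx, "budget")))         # ['msg-001', 'msg-002']
    print(sorted(search(idx, "Friday budget")))  # ['msg-001']
```

The index is built only once, at processing time. After that, each query touches just the matching term entries, which is why this approach scales to very large collections.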

Projects to watch:

  • Brown Dog is a collaborative big data management project based on the integration of heterogeneous datasets and multi-source historical and digital collections.
  • Tools like the CI-BER treemap GIS interface to NARA records, the visual analysis tools being developed by Maria Esteva at the Texas Advanced Computing Center, and Kenton McHenry’s 1940 Census big data analysis, indexing, and visualization at the National Center for Supercomputing Applications (now part of Brown Dog) provide good examples of adapting big data techniques to the mission and spirit of the archival profession.

[1] http://radar.oreilly.com/2012/01/what-is-big-data.html. Accessed 12/17/2014.