HEADLINE: Archiving Digital Material

Proposal - December 1998

COMPUTER SCIENCE AND TELECOMMUNICATIONS BOARD
NATIONAL RESEARCH COUNCIL
2101 Constitution Avenue, N.W.
Washington, D.C. 20418

SCOPE
CONTEXT
Policy Context
Technical Context
PLAN OF ACTION
Statement of Task
Expertise Required
Preliminary Work Plan
Product and Dissemination Plan

SCOPE

Dramatic growth in the quantity of materials in digital form and the rapid pace of technological change challenge existing archival practices and institutions responsible for providing long term access to information.

Concern over the potential loss of a broad array of information in digital form, ranging from cultural items to government records to scientific data, is prompting increased attention to digital archives issues. Obsolescence of hardware and software, fragility of electronic media, and lack of attention or resources all contribute to the risk that digital information will become inaccessible over time.

The humanities community, which has begun to explore opportunities for large scale digitization projects, has expressed particular concern about the lack of an adequate infrastructure and extensible technical platforms for archiving digital material.

This project will describe the pros and cons of various technological approaches, policy options, legal frameworks, institutional responsibilities, and business models. The study will provide guidance for moving from short term actions required to save critical digital information to long term solutions for ensuring availability of digital material over time.

Special attention will be paid to issues raised by humanists about the need to maintain the historical and cultural record in the digital environment. The analysis would be valuable to those charged with setting policy within government and others responsible for establishing organizational requirements for digital archives. The study also would contribute to long-term access to digital material by the public and for research and educational purposes.

 

CONTEXT

Policy Context

The arts and humanities increasingly employ computing technology for such activities as conducting historical studies or enhancing access to rare materials. A number of large scale digitization efforts to convert cultural materials are underway or being considered. In addition, the growth of knowledge originating in digital format and greater use of computers and communications in the humanities necessitate resolution of issues associated with preservation of cultural material in digital formats and development of approaches that will ensure future retention of the existing knowledge base. Among the concerns expressed by humanists is how to ensure that knowledge is accurately represented as it moves from tangible artifacts to digital representations and over time migrates from one digital platform to another. While a number of notable individual IT projects in the humanities have been undertaken, a scalable capacity for ongoing storage and retrieval of digital material does not yet exist. The growing use of digital materials in education creates a further imperative to provide direction for long term access to these electronic resources.

Several efforts in the library community have focused on the issue of digital archives and the implications for traditional library activities. The Commission on Preservation and Access and the Research Libraries Group created a Task Force on Digital Archiving. The Task Force issued a report that called for a national system of "repositories of digital information that are collectively responsible for the long-term accessibility of the nation's social, economic, cultural and intellectual heritage instantiated in digital form." The report highlighted the need to address the legal, institutional, and economic barriers to preservation, as well as the technical challenges. Among the more difficult issues is the identification of institutions that have the responsibility and legal authority for maintaining a digital archive of last resort for the purpose of long term preservation. 1

The challenges of establishing and maintaining digital archives exist in other domains as well. With increasing amounts of material either originating in or being converted to digital formats, policymakers are confronted with the need to maintain access to digital material that is vital for government operations, scientific research, and public access to government information. Preservation of the nation's historical record and of massive amounts of scientific data is threatened by the lack of policies and viable technical approaches to the problem. For example, a combination of pressures to reduce costs, enhance access, increase efficiency, and modernize aging technical infrastructures within government agencies is contributing to an escalation in the amount of material that is available solely in digital format. Much of that material resides either on internal systems or on agency web sites that often operate without established procedures or policy directives. Many of the web sites are maintained by individuals and are not yet institutionalized within agency management structures. Efforts to establish government-wide policies and practices have had limited success. The potential for creating a "digital gap" in recorded history continues to grow as more information exists solely in electronic form. The courts recently rejected guidelines proposed by the National Archives and Records Administration (NARA) for preserving electronic government records, thereby focusing increased attention on the need for resolution of this problem.

Because of the early and heavy reliance on computing technologies in scientific disciplines and the increased use of large scale instruments and sensing devices that collect data, scientists are increasingly concerned with the need for digital archives. From the collection of data through the use of networks to collaborate and ultimately disseminate the results of scientific research, the production of large scale databases for scientific inquiry is now fundamental to the research enterprise, and significant funds continue to be committed to developing these digital resources. The 1995 report Preserving Scientific Data on Our Physical Universe: A Strategy for Archiving the Nation's Scientific Information Resources (CPSMA) provided an important analysis of this issue.

Despite a number of specific initiatives and an increased level of interest, considerable uncertainty about digital archives remains and there is no consensus on directions for approaching the problem. What is required is the development of a strategy that will ensure the preservation of critical digital information in the short term and identify options for creating the combined technical and institutional infrastructure needed for long term solutions. The multi-dimensional nature of the problem and the interdependencies among the array of technological, organizational, legal, and economic issues involved make it particularly challenging. Consensus does not exist on standards and requisite metadata elements, although some efforts to develop standards are underway. Uncertainty about the financial costs and the possibilities for recouping expenses hampers investment in digital archives by owners, distributors, and traditional library and archival institutions. A number of related legal issues concerning intellectual property rights, licensing arrangements, and requirements for authentication increase the complexity of the issue. Furthermore, the roles of different stakeholders, such as government agencies, libraries, archives, professional societies, authors, and publishers, as well as the relationships among them, have yet to be clarified, especially concerning responsibilities for providing long term access to digital materials.

 

Technical Context

Some digital materials are created by conversion from other media, but an increasing amount originates in digital form. Material converted from other formats can lose the intellectual integrity of the original presentation and the intended representation of knowledge. Information "born" digital poses special challenges for preservation because it may vanish before a systematic effort to maintain it can be undertaken. The World Wide Web is an enormous digital resource reflecting many of the problems inherent in preserving digital material. It continues to grow so rapidly that new sites are added and existing ones disappear before the information they contain can be captured. The World Wide Web is growing at an estimated rate of 1.5 million pages a day, and the number of pages that disappear each year continues to grow exponentially as well. In addition, the Web is dynamic and distributed. Online material is manipulated in novel ways to produce unique and often transient views of digital information. Virtual documents are composed by linking in real time to geographically dispersed resources at multiple sites. Preserving the links among different sources of digital information and capturing the content of the linked material, as well as the core item, can prove problematic. Identifying what should be archived and developing ways to accurately represent and provide access to such material over time are major questions.

The fragility of digital media is well recognized and substantial concerns exist about the longevity of different storage media. The rapid obsolescence of different software and hardware platforms creates even greater problems for long term access. While some producers provide for upward compatibility when new software and hardware releases are made, vast amounts of digital information reside on media or are dependent on software that is no longer available. Other problems include migrating file formats from one system to another and establishing standards that can facilitate the migration of data to new environments. Standards for describing the contents of digital documents and for ensuring their authenticity and integrity require further development.
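As an illustration of what an integrity check on a stored item can look like, the following is a minimal sketch, not a method proposed in this document. It assumes that a cryptographic digest is recorded when an item enters the archive and recomputed later to detect change. The choice of SHA-256 and the function names `sha256_digest` and `is_unchanged` are assumptions made for the example. A digest of this kind detects alteration of the bits only; it does not establish who created an item or whether its content is authentic.

```python
import hashlib


def sha256_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`.

    The file is read in chunks so that large items do not have to fit
    in memory. SHA-256 is an assumption of this sketch.
    """
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def is_unchanged(path, recorded_digest):
    """Compare the file's current digest with the one recorded at ingest."""
    return sha256_digest(path) == recorded_digest
```

An archive using this approach would store the recorded digest separately from the item itself, so that damage to one does not silently affect the other.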

One option for preserving digital material is to create a paper artifact of the item, so that it can be preserved using traditional archival techniques. This approach was recommended by the National Archives for electronic federal records, but was considered inadequate by the courts. Other approaches to preserving digital material currently being debated include:

  • Continually migrating digital material to new software and hardware platforms as the physical media deteriorate or the existing formats are superseded by new technology.
  • Establishing long term standards and descriptive formats that have longer life spans.
  • Emulating the original (but now obsolete) system upon which digital material operated to create equivalent access and functionality.

Significant research is needed to identify the most promising directions for long term preservation and to establish a path for ensuring the persistence of digital material for future use.
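The simplest case of the first approach listed above, copying an item unchanged onto new storage before the old medium deteriorates, can be sketched as follows. This is an illustrative sketch under stated assumptions, not a procedure taken from this proposal: it covers only a bit-for-bit media refresh, not conversion to a new file format, and the function name `refresh_copy`, the use of SHA-256, and the fields of the returned record are all hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def _digest(path):
    """SHA-256 hex digest of a file, read in chunks (assumed algorithm)."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def refresh_copy(source, destination):
    """Copy `source` to `destination` and verify the copy bit for bit.

    Returns a small provenance record (hypothetical fields) on success.
    If the digests differ, the bad copy is removed and OSError is raised.
    """
    source = Path(source)
    destination = Path(destination)
    digest_before = _digest(source)

    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)

    digest_after = _digest(destination)
    if digest_after != digest_before:
        destination.unlink()
        raise OSError(f"copy of {source} failed verification")

    return {
        "source": str(source),
        "destination": str(destination),
        "sha256": digest_after,
    }
```

Format migration and emulation raise harder questions than this sketch addresses, because a converted or emulated item is no longer identical bit for bit to the original, so equivalence has to be judged by other means.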

PLAN OF ACTION

 

Statement of Task

This project will examine the challenges associated with establishing and maintaining digital archives and provide recommendations on directions to ensure the long term preservation of digital material. It will explore possible scenarios for creating digital archives, including an analysis of technology options, policy approaches, legal frameworks, and business models. It will pay particular attention to the inter-relationship among the legal, technical, institutional, and economic factors required to create a viable infrastructure for establishing digital archives. It will address the particular issues associated with digital archiving of humanities materials.

The project will seek to provide a road map for moving from short term solutions to a more sustained long term institutional, policy, legal, and technical framework for maintaining access to digital material. Pros and cons of different approaches will be analyzed and major research challenges identified.

Expertise Required

This project will require technical expertise in database design, storage technologies, retrieval systems, digital formats and media, and related standards. Other perspectives required include those of humanists, archivists, librarians, educators, scientists, publishers, and legal experts. Nominations for the study committee will be solicited from CSTB and other Boards within the National Research Council, as well as from a broad range of other sources.

Preliminary Work Plan

CSTB will assemble a study committee of approximately 12-14 members with expertise in the areas outlined above. The committee will attempt to identify the range of technical and policy options that might be used to establish archives of digital materials. Furthermore, through briefings and outreach to key stakeholders (e.g., a workshop in the fact-finding stage), it will seek to understand the pros and cons of different options. A key focus of the committee's work will be on the interrelationship among the institutional, legal, economic, and technical considerations involved in establishing digital archives and the steps required to move from short term solutions to the infrastructure requirements for long term retention of digital material. The committee will attempt to answer questions such as:

  • What research is needed to advance technologies for long term storage? How can storage and maintenance costs be reduced to make large scale digital archives more economically viable? What technologies need to be developed to allow for continued access to and use of digital materials?
  • What methods can be employed to guarantee authenticity and intellectual integrity of archived information? How can knowledge be represented accurately to meet the requirements of different disciplines?
  • Is migration of digital material over time to new formats or systems the optimal approach? Will emulation provide an alternative to migration? What other technology options exist to refresh digital information and ensure that access to it persists despite rapid technological change and the short-term nature of digital media? How are risk assessments performed to determine what information is in jeopardy of being lost?
  • How are viable standards set for archival quality of digital materials? What are the limitations of standards given the fast pace of technological change? How can standards evolve over time? What standards for metadata are needed?
  • What institutions bear responsibility for maintaining digital archives and have the legal authority to ensure preservation of digital material for the benefit of society at large? Within institutions, where do responsibilities lie? Is there an institution or set of institutions of last resort for digital materials? How do the roles of creators, distributors, and libraries and archives change in terms of responsibilities for digital materials?
  • How do intellectual property issues get resolved for archiving digital information? How does the shift from outright ownership to licensing access to digital material through contractual agreements alter existing models for long term access to information?
  • What business models might be appropriate for financing digital archives? What role will commercial information providers have in sustaining digital archives? What type of financial arrangements among institutions could support long-term maintenance of digital materials?
  • What are the inter-relationships among the institutional, legal, economic, and technical issues that need to be considered in developing approaches to digital archives?

The committee will convene for six meetings during the course of the study to solicit input from outside parties, deliberate over its findings and recommendations, and prepare its final report. The budget provides for input to be gathered from a wide range of stakeholders. The results of the committee's deliberations will be summarized in a final report to be delivered to the sponsor 24 months from the date the contract is awarded. An additional two months are budgeted for dissemination activities.

Product and Dissemination Plan

The principal product of this project will be a report summarizing the committee's findings and recommendations. The report will be subject to National Research Council review procedures. The dissemination will be targeted broadly to government policy makers in both the legislative and executive branches and to different stakeholder groups, such as archivists, librarians, humanists, and scientists. Given the uncertainty about which directions are likely to prevail, commercial information providers and consumers also stand to benefit from an objective analysis across these dimensions. The report will be made available on the Internet via the National Academies' World Wide Web server, as well as in paper form. Additional efforts will be made to disseminate the report's findings and recommendations to interested parties in government, academia, and industry through participation in relevant conferences and by publication of summary articles in relevant journals, as appropriate.

 

 

1. Commission on Preservation and Access and Research Libraries Group. Preserving Digital Information: Report of the Task Force on Archiving of Digital Information. 1996.

 


OFFICE LOCATION:
MILTON HARRIS BLDG. RM. 560
2001 WISCONSIN AVENUE, NW
(OFF WHITEHAVEN STREET)
WASHINGTON, D.C. 20007
PHONE: (202) 334-2605
FAX: (202) 334-2318
INTERNET: CSTB@NAS.EDU
WORLD WIDE WEB: WWW2.NAS.EDU/CSTBWEB

 

The National Research Council is the principal operating agency of the National Academy of Sciences and the National Academy of Engineering to serve government and other organizations. The Computer Science and Telecommunications Board addresses national, scientific and policy issues in computing science, telecommunications, and computer technology and their applications.