Archiving Digital Material
Proposal - December 1998
COMPUTER SCIENCE AND TELECOMMUNICATIONS BOARD
NATIONAL RESEARCH COUNCIL
2101 Constitution Avenue, N.W.
Washington, D.C. 20418
SCOPE
CONTEXT
Policy Context
Technical Context
PLAN OF ACTION
Statement of Task
Expertise Required
Preliminary Work Plan
Product and Dissemination Plan
SCOPE
Dramatic growth in the quantity of materials in digital form and
the rapid pace of technological change challenge existing archival
practices and institutions responsible for providing long term access
to information.
Concern over the potential loss of a broad array of information in
digital form, ranging from cultural items to government records to
scientific data, is prompting increased attention to addressing
digital archives issues. Obsolescence of hardware and software,
fragility of electronic media, and lack of attention or resources all
contribute to the risk that digital information will become
inaccessible over time.
The humanities community, which has begun to explore opportunities
for large scale digitization projects, has expressed particular
concern about the lack of an adequate infrastructure and extensible
technical platforms for archiving digital material.
This project will describe the pros and cons of various
technological approaches, policy options, legal frameworks,
institutional responsibilities, and business models. The study will
provide guidance for moving from short term actions required to save
critical digital information to long term solutions for ensuring
availability of digital material over time.
Special attention will be paid to issues raised by humanists about
the need to maintain the historical and cultural record in the
digital environment. The analysis would be valuable to those charged
with setting policy within government and others responsible for
establishing organizational requirements for digital archives. The
study also would contribute to long-term access to digital material
by the public and for research and educational purposes.
CONTEXT
Policy Context
The arts and humanities employ computing technology increasingly
for such activities as conducting historical studies or enhancing
access to rare materials. A number of large scale digitization
efforts to convert cultural materials are underway or being
considered. In addition, the growth of knowledge originating in
digital format and greater use of computers and communities in the
humanities necessitate resolution of issues associated with
preservation of cultural material in digital formats and development
of approaches that will ensure future retention of the existing
knowledge base. Among the concerns expressed by humanists is how to
ensure that knowledge is accurately represented as it moves from
tangible artifacts to digital representations and over time migrates
from one digital platform to another. While a number of notable
individual IT projects in the humanities have been undertaken, a
scalable capacity for ongoing storage and retrieval of digital
material does not yet exist. The growing use of digital
materials in education creates a further imperative to provide
direction for long term access to these electronic resources.
Several efforts in the library community have focused on the issue
of digital archives and the implications for traditional library
activities. The Commission on Preservation and Access and the
Research Libraries Group created a Task Force on Digital Archiving.
The Task Force issued a report
that called for a national system of "repositories of digital
information that are collectively responsible for the long-term
accessibility of the nation's social, economic, cultural and
intellectual heritage instantiated in digital form." The report
highlighted the need to address the legal, institutional, and
economic barriers to preservation, as well as the technical
challenges. Among the more difficult issues is the identification of
institutions that have the responsibility and legal authority for
maintaining a digital archive of last resort for the purpose of long
term preservation. 1
The challenges of establishing and maintaining digital archives
exist in other domains as well. With increasing amounts of material
either originating in or being converted to digital formats,
policymakers are confronted with the need to maintain access to
digital material that is vital for government operations, scientific
research, and public access to government information. Preservation
of the nation's historical record and of massive amounts of scientific
data is threatened by the lack of policies and viable technical
approaches to the problem. For example, a combination of pressures to
reduce costs, enhance access, increase efficiency, and modernize
aging technical infrastructures within government agencies is
contributing to an escalation in the amount of material that is
available solely in digital format. Much of that material resides on
either internal systems or on agency web sites that often operate
without established procedures or policy directives. Many of the web
sites are maintained by individuals and are not yet institutionalized
within agency management structures. Efforts to establish
government-wide policies and practices have had limited success. The
potential for creating a "digital gap" in recorded history continues
to grow as more information exists solely in electronic form. The
courts recently rejected guidelines proposed by the National Archives
and Records Administration (NARA) for preserving electronic
government records, thereby focusing increased attention on the need
for resolution of this problem.
Because of the early and heavy reliance on computing technologies
in scientific disciplines and the increased use of large scale
instruments and sensing devices that collect data, scientists are
increasingly concerned with the need for digital archives. From the
collection of data through the use of networks to collaborate and
ultimately disseminate the results of scientific research, the
production of large scale databases for scientific inquiry is now
fundamental to the research enterprise, and significant funds continue
to be committed to developing these digital resources. Preserving
Scientific Data on Our Physical Universe: A Strategy for Archiving
the Nation's Scientific Information Resources (CPSMA, 1995)
provided an important analysis of this issue.
Despite a number of specific initiatives and an increased level of
interest, considerable uncertainty about digital archives remains and
there is no consensus on directions for approaching the problem.
What is required is the development of a strategy that will ensure
the preservation of critical digital information in the short term
and identify options for creating the combined technical and
institutional infrastructure needed for long term solutions. The
multi-dimensional nature of the problem and the interdependencies
among the array of technological, organizational, legal, and economic
issues involved make it particularly challenging. Consensus does not
exist on standards and requisite metadata elements, although some
efforts to develop standards are underway. Uncertainty about the
financial costs and the possibilities for recouping expenses hampers
investment in digital archives by owners, distributors, and
traditional library and archival institutions. A number of related
legal issues concerning intellectual property rights, licensing
arrangements, and requirements for authentication increase the
complexity of the issue. Furthermore, the roles of different
stakeholders, such as government agencies, libraries, archives,
professional societies, authors, and publishers, as well as the
relationships among them have yet to be clarified, especially
concerning responsibilities for providing long term access to digital
materials.
Technical Context
Some digital materials are created by conversion from other media,
but an increasing amount originates in digital form. Material
converted from other formats can lose the intellectual integrity of
the original presentation and the intended representation of
knowledge. Information "born" digital poses special
challenges for preservation because it may vanish before a systematic
effort to maintain it can be undertaken. The World Wide Web is an
enormous digital resource reflecting many of the problems inherent in
preserving digital material. It continues to grow so rapidly that
new sites are added and existing ones disappear before the
information they contain can be captured. The World Wide Web is
growing at the estimated rate of 1.5 million pages a day, and thus
the number of pages that disappear yearly continues to grow
exponentially as well. In addition, the Web is dynamic and
distributed. Online material is manipulated in novel ways to produce
unique and often transient views of digital information. Virtual
documents are composed by linking in real time to geographically
dispersed resources at multiple sites. Preserving the links among
different sources of digital information and capturing the content of
the linked material, as well as the core item, can prove problematic.
How to identify what should be archived and how to represent and
provide access to such material accurately over time are major
questions.
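For a sense of scale, the estimated daily growth rate cited above can be
converted to an annual figure. The short calculation below is only an
illustration: it assumes the rate of 1.5 million pages a day holds
constant over a 365-day year, which the estimate itself does not claim.

```python
# Estimated growth rate of the World Wide Web, as cited in the text.
PAGES_PER_DAY = 1_500_000

# Assumption for illustration only: the daily rate stays constant all year.
DAYS_PER_YEAR = 365

pages_per_year = PAGES_PER_DAY * DAYS_PER_YEAR
print(f"Approximate new pages per year: {pages_per_year:,}")
```

Under that assumption, the estimate implies more than half a billion new
pages a year that an archive of the Web would need to consider.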
The fragility of digital media is well recognized and substantial
concerns exist about the longevity of different storage media. The
rapid obsolescence of different software and hardware platforms
creates even greater problems for long term access. While some
producers provide for upward compatibility when new software and
hardware releases are made, vast amounts of digital information
reside on media or are dependent on software that is no longer
available. Other problems include migrating file formats from one
system to another and establishing standards that can facilitate the
migration of data to new environments. Standards for describing the
contents of digital documents and for ensuring their authenticity and
integrity require further development.
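One way to picture what a standard for description and integrity might
record is a small descriptive entry that carries a checksum of the
item's content. The sketch below is illustrative only: the field names,
the sample item, and the choice of SHA-256 are assumptions made for this
example and are not drawn from any standard discussed in this proposal.
A matching checksum shows only that the bits are unchanged; it does not
by itself establish who created the item.

```python
import hashlib
import json


def make_record(name: str, content: bytes, file_format: str) -> dict:
    """Build a hypothetical descriptive record with a checksum for one item."""
    return {
        "name": name,
        "format": file_format,
        "size_bytes": len(content),
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def verify(record: dict, content: bytes) -> bool:
    """Return True if the content still matches the recorded checksum."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]


# Sample item invented for this illustration.
original = b"Minutes of the working group meeting."
record = make_record("minutes.txt", original, "text/plain")

print(json.dumps(record, indent=2))
print("unchanged copy verifies:", verify(record, original))
print("altered copy verifies:", verify(record, original + b" "))
```

Recomputing the checksum after each copy or migration and comparing it
with the stored value is one simple way an archive could detect
accidental or deliberate alteration.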
One option for preserving digital material is to create a paper
artifact of the item, so that it can be preserved using traditional
archival techniques. This approach was recommended by the National
Archives for electronic federal records, but was considered
inadequate by the courts. Other approaches to preserving digital
material currently being debated include:
- Continually migrating digital material to new software and
hardware platforms as the physical media deteriorate or the
existing formats are superseded by new technology.
- Establishing long term standards and descriptive formats that
have longer life spans.
- Emulating the original (but now obsolete) system upon which
digital material operated to create equivalent access and
functionality.
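The first of these approaches, migration, can be sketched as a planning
step over an inventory of holdings. This is a minimal illustration under
stated assumptions: the format labels, the mapping of at-risk formats to
successor formats, and the sample inventory are all hypothetical, and
the sketch only selects items for migration without converting anything.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    file_format: str


# Hypothetical mapping from formats judged at risk to a successor format.
MIGRATION_TARGETS = {
    "wordstar": "plain-text",
    "lotus-123": "csv",
}


def plan_migration(items):
    """List (name, old format, new format) for each item in an at-risk format."""
    return [
        (item.name, item.file_format, MIGRATION_TARGETS[item.file_format])
        for item in items
        if item.file_format in MIGRATION_TARGETS
    ]


# Sample inventory invented for this illustration.
inventory = [
    Item("memo.ws", "wordstar"),
    Item("budget.wk1", "lotus-123"),
    Item("notes.txt", "plain-text"),
]

for name, old, new in plan_migration(inventory):
    print(f"{name}: {old} -> {new}")
```

In practice the difficult part is the conversion itself and the judgment
about what is lost in it, which is why the choice among migration,
standards, and emulation remains open.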
Significant research is needed to identify the most promising
directions for long term preservation and to establish a path for
ensuring the persistence of digital material for future use.
PLAN OF ACTION
Statement of Task
This project will examine the challenges associated with
establishing and maintaining digital archives and provide
recommendations on directions to ensure the long term preservation of
digital material. It will explore possible scenarios for creating
digital archives, including an analysis of technology options, policy
approaches, legal frameworks, and business models. It will pay
particular attention to the inter-relationship among the legal,
technical, institutional, and economic factors required to create a
viable infrastructure for establishing digital archives. It will
address the particular issues associated with digital archiving of
humanities materials.
The project will seek to provide a road map for moving from short
term solutions to a more sustained long term institutional, policy,
legal, and technical framework for maintaining access to digital
material. Pros and cons of different approaches will be analyzed and
major research challenges identified.
Expertise Required
This project will require technical expertise in database design,
storage technologies, retrieval systems, digital formats and media,
and related standards. Other perspectives required include those of
humanists, archivists, librarians, educators, scientists, publishers,
and legal experts. Nominations for the study committee will be
solicited from CSTB and other Boards within the National Research
Council, as well as from a broad range of other sources.
Preliminary Work Plan
CSTB will assemble a study committee of approximately 12-14
members with expertise in the areas outlined above. The committee
will attempt to identify the range of technical and policy options
that might be used to establish archives of digital materials.
Furthermore, through briefings and outreach to key stakeholders
(e.g., a workshop in the fact-finding stage), it will seek to
understand the pros and cons of different options. A key focus of
the committee's work will be on the interrelationship among
the institutional, legal, economic, and technical considerations
involved in establishing digital archives and the steps required to
move from short term solutions to the infrastructure requirements for
long term retention of digital material. The committee will attempt
to answer questions such as:
- What research is needed to advance technologies for long term
storage? How can storage and maintenance costs be reduced to
make large scale digital archives more economically viable? What
technologies need to be developed to allow for continued access to
and use of digital materials?
- What methods can be employed to guarantee authenticity and
intellectual integrity of archived information? How can knowledge
be represented accurately to meet the requirements of different
disciplines?
- Is migration of digital material over time to new formats or
systems the optimal approach? Will emulation provide an
alternative to migration? What other technology options exist to
refresh digital information and ensure that access to it persists
despite rapid technological change and the short-term nature of
digital media? How are risk assessments performed to determine
what information is in jeopardy of being lost?
- How are viable standards set for archival quality of digital
materials? What are the limitations of standards given the fast
pace of technological change? How can standards evolve over
time? What standards for metadata are needed?
- What institutions bear responsibility for maintaining digital
archives and have the legal authority to ensure preservation of
digital material for the benefit of society at large? Within
institutions, where do responsibilities lie? Is there an
institution or set of institutions of last resort for digital
materials? How do the roles of creators, distributors, and
libraries and archives change in terms of responsibilities for
digital materials?
- How do intellectual property issues get resolved for archiving
digital information? How does the shift from outright ownership to
licensing access to digital material through contractual
agreements alter existing models for long term access to
information?
- What business models might be appropriate for financing
digital archives? What role will commercial information providers
have in sustaining digital archives? What type of financial
arrangements among institutions could support long-term
maintenance of digital materials?
- What are the inter-relationships among the institutional,
legal, economic, and technical issues that need to be considered
in developing approaches to digital archives?
The committee will convene in 6 meetings during the course of the
study to solicit input from outside parties, deliberate over its
findings and recommendations, and prepare its final report. The
budget provides for input to be gathered from a wide range of
stakeholders. The results of the committee's deliberations
will be summarized in a final report to be delivered to the sponsor
24 months from the date the contract is awarded. An additional two
months of time are budgeted for dissemination activities.
Product and Dissemination Plan
The principal product of this project will be a report summarizing
the committee's findings and recommendations. The report will
be subject to National Research Council review procedures. The
dissemination will be targeted broadly to government policy makers in
both the legislative and executive branches and different stakeholder
groups, such as archivists, librarians, humanists, and scientists.
Given the uncertainty about which directions are likely to prevail,
commercial information providers and consumers also stand to benefit
from an objective analysis across these dimensions. The report will
be made available on the Internet via the National Academies'
World Wide Web server, as well as in paper form. Additional efforts
will be made to disseminate the report's findings and
recommendations to interested parties in government, academia, and
industry through participation in relevant conferences and by
publication of summary articles in relevant journals, as appropriate.
1. Commission on Preservation and Access and Research Libraries
Group. Preserving Digital Information: Report of the Task Force on
Archiving of Digital Information. 1996.
OFFICE LOCATION:
MILTON HARRIS BLDG. RM. 560
2001 WISCONSIN AVENUE, NW
(OFF WHITEHAVEN STREET)
WASHINGTON, D.C. 20007
PHONE: (202) 334-2605
FAX: (202) 334-2318
INTERNET: CSTB@NAS.EDU
WORLD WIDE WEB: WWW2.NAS.EDU/CSTBWEB
The National Research Council is the principal operating agency of
the National Academy of Sciences and the National Academy of
Engineering to serve government and other organizations. The
Computer Science and Telecommunications Board addresses national,
scientific and policy issues in computing science,
telecommunications, and computer technology and their
applications.