Digital preservation refers broadly to the series of managed activities, such as collection, description, migration and redundant storage, that are necessary to ensure continued access to digital materials for as long as necessary.
Digital preservation is “[t]he series of managed activities, policies, strategies and actions to ensure the accurate rendering of digital content for as long as necessary, regardless of the challenges of media failure and technological change.”
Reading and understanding information in digital form requires equipment and software, which change constantly and may not be available within a decade of their introduction. Who today has a punched card reader, a DECtape drive, or a working copy of FORTRAN II? Even newer technology such as 9-track tape is rapidly becoming obsolete. We cannot save the machines if there are no spare parts available, and we cannot save the software if no one is left who knows how to use it.
Rapid changes in the means of recording information, in the formats for storage, and in the technologies for use threaten to render the life of information in the digital age as, to borrow a phrase from Hobbes, "nasty, brutish and short." Some information no doubt deserves such a fate, but numerous examples illustrate the danger of losing valuable cultural memories that may appear in digital form. Consider, for example, the now famous, but often misrepresented, case of the 1960 Census.
As it compiled the decennial census in the early sixties, the Census Bureau retained records for its own use in what it regarded as “permanent” storage. In 1976, the National Archives identified seven series of aggregated data from the 1960 Census files as having long-term historical value. A large portion of the selected records, however, resided on tapes that the Bureau could read only with a UNIVAC type II-A tape drive. By the mid-seventies, that particular tape drive was long obsolete, and the Census Bureau faced a significant engineering challenge in preserving the data from the UNIVAC type II-A tapes. By 1979, the Bureau had successfully copied onto industry-standard tapes nearly all the data judged then to have long-term value.
Though largely successful in the end, the data rescue effort was a signal event that helped move the Committee on the Records of Government six years later to proclaim that “the United States is in danger of losing its memory.” The Committee did not bother to describe the actual details of the migration of the 1960 census records. Nor did it analyze the effects on the integrity of the constitutionally mandated census of the nearly 10,000 (of approximately 1.5 million) records of aggregated data that the rescue effort did not successfully recover. Instead, it chose to register its warning on the dangers of machine obsolescence in apocryphal terms. With more than a little hyperbole, it wrote that “when the computer tapes containing the raw data from the 1960 federal census came to the attention of NARS [the National Archives and Records Service], there were only two machines in the world capable of reading those tapes: one in Japan and the other already deposited in the Smithsonian as a relic.”
Other examples lack the memorable but false details accompanying the 1960 Census story, but they do equally illustrate how readily we can lose our heritage in electronic form when the custodian makes no plans for long-term retention in a changing technical environment. In 1964, the first electronic mail message was sent from either the Massachusetts Institute of Technology, the Carnegie Institute of Technology or Cambridge University. The message does not survive, however, and so there is no documentary record to determine which group sent the pathbreaking message. Satellite observations of Brazil in the 1970s, critical for establishing a time-line of changes in the Amazon basin, are also lost on the now obsolete tapes to which they were written.
Today, increasingly powerful and easy-to-use information technologies, especially those that support the World Wide Web, have unleashed the production and distribution of digital information. Such information is penetrating and transforming nearly every aspect of our culture. To preserve effectively for future generations the portion of this rapidly expanding corpus that represents our cultural record, we must understand the costs of doing so and commit society technically, legally, economically and organizationally to the full dimensions of the task. Failure to look for trusted means and methods of digital preservation will exact a stiff, long-term cultural penalty.
The materials subject to digital preservation may be born digital or be the products of digitization projects. Preservation is critical in the digital context to ensure continued long-term access to historically, scientifically and socially valuable materials, so that future generations will be able to benefit from works created in the present day.
Libraries, archives and other preservation institutions have been responsible for much of the preservation of analog works that has occurred in past centuries. Many books, musical compositions, drawings and other works are still available today for scholars and historians to read, hear and see because of the preservation efforts of these institutions.
The astounding growth of digital information has outpaced the evolution of institutions and policies to support its preservation. Increasingly, our cultural and intellectual assets are being created and distributed in digital format. They include news and blogs, videos, multiplayer games, music, film, and books, as well as corporate and business documents, e-mails, and scientific and legal records. If properly managed, these assets will become the raw material from which future knowledge will be built and from which historians will reconstruct the complete story of our unique and changing times. Yet the solutions for long-term preservation of these materials are still very much in the development stages. Meeting the new challenges presented by digital preservation will require public policy support to encourage the development of the necessary infrastructure, technologies, and preservation practices.
Digital preservation activities are undertaken today by a range of preservation institutions, including libraries, archives and museums. Such institutions may operate independently or may be located within other bodies such as educational institutions or government entities.
Digital preservation is complex and expensive: it requires a technical infrastructure and expertise that may not be readily available in traditional cultural memory institutions. It also requires the ability to conduct long-term planning and careful, secure data management and manipulation. Moreover, the costs of digital preservation are front-loaded: decisions concerning what to preserve need to be made early in the life of the work. Digital materials need to be actively managed, not just left on shelves. Owing to the inherent instability of many digital media formats and the frequent obsolescence of the formats and equipment needed to render digital files readable, active steps to preserve materials, including, for instance, migrating material to preservable formats and adding standardized metadata, may need to be taken quite early in the life of a digital work. At the same time, the exponential growth in the volume of digital material being created today makes it impossible for any one institution to collect and preserve more than a small fraction of the total.
Types of digital preservation
There are three general types of digital preservation:
- Long-term preservation that provides continued access to digital materials, or at least to the information contained in them, indefinitely.
- Medium-term preservation that provides continued access to digital materials beyond changes in technology for a defined period of time, but not indefinitely.
- Short-term preservation that provides access to digital materials either for a defined period of time while use is predicted, without extending beyond the foreseeable future, or until they become inaccessible because of changes in technology.
Works in digital form present significant challenges for preservation that most analog works do not. Many analog materials remain stable for long periods of time and require only intermittent interventions for purposes of preservation. Moreover, degradation of an analog work is usually gradual enough to provide advance warning that preservation efforts are required. For example, one can perform a fold test to determine whether the paper on which a book was printed has become brittle, or smell the vinegar that signals degradation of films. Digital materials, in contrast, cannot be left unattended for long: their preservation requires regular intervention. They may suffer from “bit rot,” a degradation that usually cannot be discerned by the naked eye and therefore may not be discovered until someone tries to use the work. Bit rot often renders the entire digital copy useless. Technological obsolescence is another problem for digital works. Even if their bits remain intact, the hardware and software required to access them may be difficult or impossible to obtain. Because of these characteristics, preservation of digital materials must begin at or shortly after production or acquisition.
The benefits of digital preservation must be continuously weighed against the costs. Assessment of benefits must rely extensively on input from the relevant stakeholder communities, be conducted openly, and be consistent with the mission of the organization paying for the preservation. Such assessment should include consideration of the full range of benefits, both tangible and intangible. The assessment should compare the costs of preserving a data set with the possibility and costs of regenerating the data. When reproducing data is not possible, preservation should be the preferred choice where feasible.
Cost analyses should be informed by comprehensive and reliable information. Similar analyses should be conducted for plans to digitize physical artifacts (books, documents, reference samples and specimens, etc.) for preservation and access. Recognizing that current analyses are limited by the lack of comprehensive economic theory and management frameworks for long-term digital preservation, organizations should work together to support research and development to improve the conceptual foundations and methodologies in this area.
It is clear, however, that in many cases the digital equivalents of those analog works preserved in the past are not being preserved in any systematic way, in part because digital preservation triggers copyright concerns in a way that analog preservation does not. Many of the activities involved in digital preservation, such as making multiple copies of a work, distributing copies among multiple institutions, and migrating works to new technological formats and media, involve the exercise of exclusive rights, including but not limited to the reproduction right.
Long term management of a digital work usually requires that multiple copies of the work be made over the course of its lifetime. One purpose for making copies is for security and disaster preparedness. Since it is always possible that digital works can be destroyed due to fire, flood, or other calamity, it is necessary to retain one or more redundant copies in different locations. Another purpose is to migrate information content from an old to a new technology, such as copying works from a floppy disk to a server. Access to content — either by users or by institution staff to verify its integrity — also may entail making a copy on a screen and in computer memory.
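Verifying the integrity of redundant copies is commonly done by comparing each copy's cryptographic digest against one recorded earlier, a practice often called a fixity check. The following is a minimal sketch in Python, assuming each replica is an ordinary local file and that a SHA-256 digest was stored when the work was first ingested; the function names `sha256_of` and `audit_replicas`, the file names and the sample content are hypothetical illustrations, not part of any particular preservation system. Computing the digest itself reads the work into computer memory, which is the kind of transient copy mentioned above.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_replicas(replicas: list[Path], recorded_digest: str) -> dict[Path, bool]:
    """Map each replica to True if its current digest still matches the one
    recorded at ingest, or False if the copy has changed (e.g. through bit rot)."""
    return {path: sha256_of(path) == recorded_digest for path in replicas}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        content = b"archived record"
        recorded = hashlib.sha256(content).hexdigest()  # digest stored at ingest

        # Two hypothetical replicas held at different sites.
        site_a = Path(tmp, "site_a.dat")
        site_b = Path(tmp, "site_b.dat")
        site_a.write_bytes(content)

        # Simulate silent corruption: flip one bit in the second replica.
        damaged = bytearray(content)
        damaged[0] ^= 0x01
        site_b.write_bytes(bytes(damaged))

        for path, ok in audit_replicas([site_a, site_b], recorded).items():
            print(path.name, "OK" if ok else "MISMATCH")
        # prints:
        # site_a.dat OK
        # site_b.dat MISMATCH
```

Hashing in fixed-size chunks keeps memory use bounded even for very large objects. In a sketch like this, a replica reported as a mismatch would typically be replaced from another copy whose digest still matches, which is one reason for keeping copies in more than one location.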
The right of distribution may be implicated by disseminating digital copies to multiple institutions to protect against catastrophic loss. And, to the extent access is required for digital preservation best practices, that access may implicate the right of public performance or public display.
As a result, digital preservation can generally be carried out lawfully only where: (1) the material is not protected by copyright (i.e., it is in the public domain); (2) the copying is permitted under an exception in the copyright law or related legislation (e.g., pursuant to an exception for libraries, archives or other preservation institutions or legal deposit); or (3) digital preservation is undertaken by the owner of copyright in the work or with the permission of the owner.
International legal issues
The Berne Convention for the Protection of Literary and Artistic Works provides the foundation for governance of copyright law internationally. The principal modern updates to the Berne Convention are the World Intellectual Property Organization (WIPO) Copyright Treaty (WCT) and the WIPO Performances and Phonograms Treaty (WPPT), as well as the World Trade Organization’s Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPs).
Together, these agreements require members to provide authors of literary and artistic works with a number of exclusive rights with respect to their works, including the rights of reproduction, adaptation, broadcasting, public performance, communication to the public and distribution to the public, subject to certain limitations and exceptions. In addition, performers of phonograms (also referred to as sound recordings) are provided with a right of fixation, and performers and producers of phonograms are granted rights of reproduction, distribution, rental, and making available their fixed performances. All of these rights are subject to limitations and exceptions.
Article 9(2) of the Berne Convention sets out what is known as the three-step test:
“It shall be a matter for legislation in the countries of the Union to permit the reproduction of [literary and artistic works] in certain special cases, provided that such reproduction does not conflict with a normal exploitation of the work and does not unreasonably prejudice the legitimate interests of the author.”
The WIPO Copyright Treaty builds upon the Berne Convention’s three-step test by providing that contracting parties may provide for limitations or exceptions to the rights granted under that treaty or under the Berne Convention in “certain special cases that do not conflict with a normal exploitation of the work and do not unreasonably prejudice the legitimate interests of the author.” Art. 10. In other words, the WIPO Copyright Treaty makes the three-step test applicable to exceptions and limitations with respect to any of the rights granted to authors under either that Treaty or the Berne Convention. The WPPT similarly makes the three-step test applicable to rights granted under that treaty.
Thus, while these treaties do not mandate any exceptions or limitations specific to preservation activities or preservation institutions, the treaties do permit such exceptions or limitations, provided they comport with the three-step test.
The EU Information Society Directive permits, but does not require, Member States of the European Union to provide exceptions and limitations for certain activities of publicly accessible libraries, educational establishments, museums and archives. The permitted exceptions and limitations are: (1) for specific acts of reproduction of copyrighted works which are not for direct or indirect economic or commercial advantage, art. 5(2)(c); and (2) for use by communication or making available of copyrighted works in their collections, for the purpose of research or private study, to individual members of the public by dedicated terminals on the premises of such establishments, provided those works are not subject to purchase or licensing terms to the contrary, art. 5(3)(n).
The other countries surveyed have specific exceptions for libraries and archives (and sometimes also for other preservation institutions) in their copyright laws. There are some similarities among jurisdictions, but also some significant variations. Some variations are due to unique characteristics of a particular country’s legal system. Other variations reflect the rapid pace of technological change and the fact that some countries have updated their laws more recently than others to try to accommodate the activities of libraries, archives and other preservation institutions in the digital environment.
Although Australian copyright law sets out a number of exceptions designed to facilitate preservation of cultural collections, as well as a scheme requiring publishers to deposit copies of published printed material with the National Library of Australia (NLA), a number of significant gaps mean that Australian federal law does not currently support compulsory collection and preservation of digital material. Digital preservation activities in Australia are therefore not governed by uniform standards and requirements. However, a number of voluntary (permission-based) digital archiving schemes have been in operation since the 1990s. Led by the NLA, libraries, including the libraries of educational institutions, play a primary role in digital preservation.
Major digital preservation initiatives in the Netherlands involve different types of works. The first digitization projects started in the late 1990s. Those first initiatives were undertaken to rescue Dutch heritage, but other initiatives followed. The Koninklijke Bibliotheek, the National Library of the Netherlands (National Library or KB), plays an important role in many of the digitization projects.
There is no national strategy as such for digital preservation in the UK. There is work going on in different sectors and there are some organizations that are trying to bring these sectors together. There has been some strategic activity in science. Her Majesty’s Treasury, the (then) Department of Trade and Industry (DTI) and the (then) Department for Education and Skills identified a need for an e-infrastructure for research in 2004. A preservation and curation working group was established by the Office for Science and Innovation to focus on this specific area. This group made several recommendations, including taking into account long-term preservation when reviewing legislation, policy and codes of practice. Another recommendation was that the DTI and Research Councils should fund research by universities and industry to address challenges. Another key recommendation was that there should be a DTI funded “national information infrastructure development programme.”
The Joint Information Systems Committee (JISC), the British Library (BL) and, to a lesser extent, the National Preservation Office (NPO) were instrumental in the development of digital preservation in the libraries and archives communities. The agenda of a JISC- and BL-sponsored workshop in 1995 was influenced by the draft version of the RLG/CPA Task Force on Digital Archiving report (1996). The JISC and the NPO followed up the workshop’s recommendations in various ways, including identifying good practice and commissioning a set of studies on digital preservation. The focus of a second workshop in 1999 was preservation strategy. Many of the recommendations from this workshop related to the adoption of standards, the need for guidelines, and the need to identify and publicize good practice. Another theme running through the recommendations was the need for coordination and cooperation. The workshop endorsed several recommendations from the JISC/NPO studies, including the need for a national forum and for training. The recommendations for research represented a move beyond the mainly basic exploratory activities of the JISC/NPO studies.
A national forum has now been established in the form of the Digital Preservation Coalition (DPC). The National Preservation Office had initially taken on a digital preservation role, but it has since relinquished it: the NPO is now an “allied organisation” with the DPC. The DPC has been particularly strong in raising awareness amongst the stakeholders. It has brought together information on best practice in digital preservation management in its handbook, and it runs training and other events for members to share knowledge and best practice. The Digital Curation Centre (DCC) is jointly funded by the JISC and the UK’s e-Science programme. The DCC undertakes research and disseminates good practice in digital curation and preservation. The JISC continues to support developments in digital preservation through its programmes and its funding of projects and initiatives.
The Research Information Network (2007) has developed “a framework of principles and guidelines” for caring for digital research data; this framework includes digital preservation. The UK Research Data Service (UKRDS) feasibility study is being carried out during 2008. It is funded by the Higher Education Funding Council for England (HEFCE) through its Shared Services programme, with support from JISC. The aim is to assess the feasibility and costs of developing and maintaining a national shared digital research data service for the UK Higher Education sector.
A wide range of digital preservation-related activities are taking place in the United States. These activities include assembling and maintaining digital archives, developing technical tools for digital preservation, and identifying best practices for ensuring long term availability of digital content. Much of this work is decentralized, and is occurring both in the public and in the private sectors. Entities undertaking such efforts have taken different approaches to the technical, legal and administrative issues raised by digital preservation. Standards and practices are still developing, and this is likely to continue for some time to come.
Initial efforts to bring digital information under institutional stewardship have focused largely on research materials, including scholarly literature. Digital preservation activities are now addressing a wider range of content. Current digital preservation projects encompass both “born digital” material and digitized analog material. Preservation activities extend to public domain materials as well as to those protected by copyright. In the latter case, preservation projects generally rely on copyright exceptions or on agreements with right holders.
The Library of Congress has worked collaboratively with government, academic, commercial, and professional communities across the nation on many digital preservation activities, primarily through the National Digital Information Infrastructure and Preservation Program (NDIIPP), but also through its other programs, including:
- Web Capture. The Library’s Web Capture team is charged with building a Library-wide understanding and technical infrastructure for capturing Web content. The team, in collaboration with a variety of Library staff and national and international partners, is identifying policy issues, establishing best practices, and building tools to collect and preserve Web content. The Library is also a founding member of the International Internet Preservation Consortium.
- The National Audio-Visual Conservation Center (NAVCC) is the first centralized facility in America specifically planned and designed for the acquisition, cataloging, storage and preservation of the nation’s collection of moving images and recorded sounds. This collaborative initiative is the result of a partnership between the Packard Humanities Institute, the U.S. Congress, the Library of Congress and the Architect of the Capitol. The NAVCC will use state-of-the-art technologies to significantly increase preservation capacities and capabilities, and new large-scale digital acquisition and archiving systems that will serve as a prototype for the global audiovisual community.
- NDSA Glossary.
- Berne Convention, art. 9(2).
- Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the harmonization of certain aspects of copyright and related rights in the information society.
- Tasmania and the Northern Territory are the only Australian jurisdictions to have passed laws requiring compulsory deposit of electronic material: Libraries Act 1984 (Tas), s 22; Publications (Legal Deposit) Act 2004 (NT), ss 7 and 13.
- See Digitalpreservation.gov.
- Library of Congress National Digital Information Infrastructure and Preservation Program, the Joint Information Systems Committee, the Open Access to Knowledge (OAK) Law Project, and the SURFfoundation, International Study on the Impact of Copyright Law on Digital Preservation (July 2008).
- American Memory
- Chronicling America
- Digital curation
- Digital curator
- Digital Preservation Coalition
- Law Library Legal Blawgs Web Archive
- Library of Congress Web Archives
- National Digital Newspaper Program
- Preservation data
- Preserving Virtual Worlds: Final Report
- World Digital Library