Definition
Information provenance is the accurate historical record of an information object such as a digital text, an image, or an audio file.
Overview
Provenance begins with identification of the original form and authorship of an object or its constituent components and continues with identification of each subsequent alteration to the object. Provenance information can include not only what was changed but also who or what produced the change, when the change was made, and other attributes of the object. As reliance on networked information and transactional processes grows, technical means of establishing information provenance become increasingly important. The goal of information provenance capabilities is to track the pedigree of a digital object from its origin through all transformations leading to its current state.
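The kind of record described above (what changed, who or what changed it, and when, from origin to current state) can be illustrated with a small append-only log. This is only a minimal sketch, not a description of any particular provenance system: the names `ProvenanceEntry`, `ProvenanceLog`, `record`, and `verify` are hypothetical, and linking entries by SHA-256 digest is an assumed design choice for making later edits to the history detectable.

```python
import hashlib
import json
from dataclasses import dataclass, field, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceEntry:
    """One step in an object's history (hypothetical structure)."""
    agent: str         # who or what produced the change
    action: str        # what was changed
    timestamp: str     # when the change was made (ISO 8601, UTC)
    content_hash: str  # SHA-256 of the object after this step
    prev_hash: str     # digest of the previous entry ("" for the origin)

    def digest(self) -> str:
        # Serialize all fields unambiguously before hashing.
        payload = json.dumps([self.agent, self.action, self.timestamp,
                              self.content_hash, self.prev_hash])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class ProvenanceLog:
    """Append-only history from the original form to the current state."""
    entries: list = field(default_factory=list)

    def record(self, agent: str, action: str, content: bytes) -> ProvenanceEntry:
        prev = self.entries[-1].digest() if self.entries else ""
        entry = ProvenanceEntry(
            agent=agent,
            action=action,
            timestamp=datetime.now(timezone.utc).isoformat(),
            content_hash=hashlib.sha256(content).hexdigest(),
            prev_hash=prev,
        )
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Each entry must point at the digest of the entry before it.
        prev = ""
        for entry in self.entries:
            if entry.prev_hash != prev:
                return False
            prev = entry.digest()
        return True


# Usage: an origin step followed by one alteration.
log = ProvenanceLog()
log.record("alice", "created document", b"first draft")
log.record("bob", "corrected section 2", b"second draft")

# Rewriting an earlier entry breaks the link held by its successor.
tampered = ProvenanceLog(entries=list(log.entries))
tampered.entries[0] = replace(tampered.entries[0], agent="mallory")
```

In this sketch, `verify()` detects changes to any entry that has a successor. Protecting the most recent entry would need something extra, such as a signature or an externally stored digest, which is outside the scope of the example.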
The provenance of information can help a user determine whether to trust it and how to interpret it. Information provenance techniques are also needed to control information sharing. Partners (e.g., allies, collaborators, or corporations engaged in a joint project) generally want to share information, but only to a limited extent. For example, there may be a constraint that information of certain types can be shared only among certain partners. Enforcing such constraints is complicated by alternate data formats and by transformations that combine data and deliver derived results. For example, the classification level of information derived from otherwise unclassified sources may prevent its public release. In addition, as diverse datasets are combined, accurate information may be interspersed with inaccurate information. To verify the provenance of the information, data about its source and derivation (or aggregation) must be propagated with the information itself.
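One way to picture source and derivation data travelling with the information itself is to attach labels to each value and recompute them whenever values are combined. The following is a sketch under stated assumptions: `Labeled`, `combine`, and `can_share` are hypothetical names, and the rule that a derived item may be shared only with partners permitted for every input is an assumed, conservative policy rather than one taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Labeled:
    """A value carrying its own provenance labels (hypothetical structure)."""
    value: object
    sources: frozenset  # identifiers of every source that contributed
    allowed: frozenset  # partners permitted to receive this item


def combine(func, *items: Labeled) -> Labeled:
    """Derive a new item, propagating labels alongside the result."""
    if not items:
        raise ValueError("combine() needs at least one input")
    value = func(*(item.value for item in items))
    # The derived item descends from all inputs, so sources are unioned.
    sources = frozenset().union(*(item.sources for item in items))
    # Assumed policy: only partners allowed for every input may receive it.
    allowed = frozenset.intersection(*(item.allowed for item in items))
    return Labeled(value, sources, allowed)


def can_share(item: Labeled, partner: str) -> bool:
    return partner in item.allowed


# Usage: two inputs with different sharing constraints.
a = Labeled(10, frozenset({"sensor_a"}), frozenset({"partner_1", "partner_2"}))
b = Labeled(5, frozenset({"report_b"}), frozenset({"partner_2", "partner_3"}))
total = combine(lambda x, y: x + y, a, b)
```

The intersection rule does not cover the case noted above, where a derived result is more sensitive than any of its individual sources. Handling that would require a policy that inspects the combination itself, not just the labels of the inputs.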
Information provenance combines key concepts from operating system security, access control, and authentication. R&D advances have enabled application of some information provenance techniques in such areas as security software and management of large-scale collections of digital materials, often referred to as digital libraries. However, next-generation versions of related technologies, such as metadata processing, taxonomies and ontologies, and digital rights management, need to be integrated into more sophisticated information provenance capabilities.