Integrity

DEFINITION: A key capability of digital repositories that conform to ISO 14721 (OAIS) is ensuring the integrity of the records in their custody, which involves two related preservation actions. The first action applies a cryptographic hash algorithm that reduces any digital object, regardless of size or content type, to a fixed-length bit stream (e.g., 160 bits for SHA-1 or 256 bits for SHA-256). This fixed-length bit stream is called a hash digest, and it serves as a digital fingerprint. With a sufficiently strong hash algorithm, it is "computationally infeasible" to find two different digital objects that have the same hash digest or to reconstruct a digital object from its hash digest.
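
The fixed-length property can be illustrated with a minimal sketch using Python's standard hashlib module. The helper name digest_hex and the sample inputs are assumptions made for illustration only; they are not part of any repository tool.

```python
import hashlib


def digest_hex(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hexadecimal hash digest of a byte string (hypothetical helper)."""
    return hashlib.new(algorithm, data).hexdigest()


# Two inputs of very different size (illustrative sample data).
small = b"a"
large = b"a" * 1_000_000

for name in ("md5", "sha1", "sha256"):
    d_small = digest_hex(small, name)
    d_large = digest_hex(large, name)
    # Each hexadecimal character encodes 4 bits, so the digest length in bits
    # is the same for both inputs even though the digests themselves differ.
    print(name, len(d_small) * 4, len(d_large) * 4, d_small != d_large)
```

Whatever the input size, MD5 yields 128 bits, SHA-1 yields 160 bits, and SHA-256 yields 256 bits.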

The second action establishes fixity evidence that supports an unbroken electronic chain of custody, which is captured in the Preservation Description Information (PDI) of Archival Information Packages (AIPs). Hash digests alone cannot support this chain of custody, because migration to newer file formats changes the underlying bit streams and therefore the digests. Affixing a digital signature that authenticates the AIP after each preservation action mitigates this issue. Over time, digital signatures support a strong, unbroken chain of electronic custody.
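
To see why a digest does not survive migration, consider a rough sketch in which a change of text encoding stands in for a file format migration. This stand-in is an assumption chosen to keep the example small; the sample text and the helper name sha256_hex are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as hexadecimal text."""
    return hashlib.sha256(data).hexdigest()


# The same record content stored as two different bit streams.
# Re-encoding is only a stand-in for a real format migration.
text = "Minutes of the board meeting, 12 March."
as_utf8 = text.encode("utf-8")
as_utf16 = text.encode("utf-16")

# The intellectual content is identical after decoding...
same_content = as_utf8.decode("utf-8") == as_utf16.decode("utf-16")
# ...but the digests differ because the underlying bits differ.
same_digest = sha256_hex(as_utf8) == sha256_hex(as_utf16)

print("same content:", same_content)
print("same digest:", same_digest)
```

The content comparison succeeds while the digest comparison fails, which is why a digest taken before migration cannot by itself vouch for the migrated object.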

 
Level 0: The archival repository has no documented procedure for integrity protection of electronic records in its custody.

Level 1: The archival repository generates and preserves MD5 hash digests before and after device/media renewal and other archival storage preservation actions.

Level 2: The archival repository generates and preserves SHA-1 hash digests before and after device/media renewal and other internal preservation actions for partially conforming ISO 14721 AIPs.

Level 3: The archival repository generates SHA-2 hash digests before and after device/media renewal and other internal preservation actions for all fully conforming ISO 14721 AIPs and stores them in the Preservation Description Information (PDI) of the AIPs.

Level 4 has two parts:

Level 4a: The archival repository encapsulates fully conforming ISO 14721 AIPs in XML and signs them with a digital signature.

Level 4b: Integrity protection procedures are continuously evaluated and updated as new tools and approaches become available.

 

Resources

Resources associated with the Integrity Framework element provide background information and useful examples that can be consulted when developing a policy or moving forward with policy development.


Level 0

The archival repository has no documented procedure for integrity protection of electronic records in its custody. 

Move to Level 1: Develop a procedure for generating and preserving MD5 hash values before and after device/media renewal and other preservation actions.

Jump to Level 2: Develop a procedure for generating and preserving SHA-1 hash values before and after device/media renewal and other internal preservation actions for partially conforming AIPs.

Level 1

The archival repository generates and preserves MD5 hash digests before and after device/media renewal and other archival storage preservation actions.
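
A before-and-after check around a media renewal might look like the following sketch. It assumes that renewal is a simple file copy; the function names file_digest and renew_media, and the shape of the returned record, are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def renew_media(source: Path, destination: Path, algorithm: str = "md5") -> dict:
    """Copy a file to new media and record digests from before and after.

    Assumption: "renewal" here is a plain copy; a real workflow may differ.
    """
    before = file_digest(source, algorithm)
    shutil.copyfile(source, destination)
    after = file_digest(destination, algorithm)
    return {
        "algorithm": algorithm,
        "before": before,
        "after": after,
        "intact": before == after,  # True when the copy is bit-for-bit identical
    }
```

Passing algorithm="sha1" or algorithm="sha256" to the same functions covers the higher levels without changing the procedure.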

Move to Level 2: Develop a procedure for generating and preserving SHA-1 hash values before and after device/media renewal and other internal preservation actions for partially conforming AIPs.

Jump to Level 3: Develop a procedure for generating and preserving SHA-2 hash values before and after device/media renewal and other preservation actions. Store these in the Preservation Description Information (PDI) of the AIP.

Level 2

The archival repository generates and preserves SHA-1 hash digests before and after device/media renewal and other internal preservation actions for partially conforming ISO 14721 AIPs.

Move to Level 3: Develop a procedure for generating and preserving SHA-2 hash values before and after device/media renewal and other preservation actions. Store these in the Preservation Description Information (PDI) of the AIP.

Jump to Level 4: Encapsulate fully conforming AIPs in XML and sign them with digital signatures to support an unbroken chain of custody. Continuously evaluate integrity protection procedures and update them as new tools and approaches become available.

Level 3

The archival repository generates SHA-2 hash digests before and after device/media renewal and other internal preservation actions for all fully conforming ISO 14721 AIPs and stores them in the Preservation Description Information (PDI) of the AIPs.
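
One way to picture storing SHA-2 digests in the PDI is the sketch below. ISO 14721 does not prescribe this layout: the dictionary structure, the field names, and the functions fixity_event and record_action are all assumptions made for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone


def fixity_event(content: bytes, action: str, stage: str) -> dict:
    """Build one fixity entry (hypothetical layout) for the PDI."""
    return {
        "action": action,
        "stage": stage,  # "before" or "after" the preservation action
        "algorithm": "SHA-256",
        "digest": hashlib.sha256(content).hexdigest(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def record_action(pdi: dict, before: bytes, after: bytes, action: str) -> bool:
    """Append before/after digests to the PDI and report whether they match."""
    events = pdi.setdefault("fixity", [])
    pre = fixity_event(before, action, "before")
    post = fixity_event(after, action, "after")
    events.extend([pre, post])
    return pre["digest"] == post["digest"]


# Illustrative use with made-up identifiers and content.
pdi = {"aip_id": "example-aip-0001"}
unchanged = record_action(pdi, b"record bytes", b"record bytes", "media renewal")
print(unchanged)
print(json.dumps(pdi, indent=2))
```

Keeping both entries, rather than only a pass/fail flag, leaves evidence that can be rechecked later.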

Move to Level 4: Encapsulate fully conforming AIPs in XML and sign them with digital signatures to support an unbroken chain of custody [4a]. Continuously evaluate integrity protection procedures and update them as new tools and approaches become available [4b]. [Keep in mind that newer might not be better.]

Level 4a

The archival repository encapsulates fully conforming ISO 14721 AIPs in XML and signs them with a digital signature.
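
The encapsulation step alone can be sketched with Python's standard library. The element names (aip, pdi, fixity, content) and the functions encapsulate and verify are hypothetical, not a schema taken from ISO 14721. The signing step is deliberately omitted: a real digital signature needs an asymmetric key pair and a dedicated cryptographic library, and the bytes returned by encapsulate are what such a signature would cover.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def encapsulate(aip_id: str, content: bytes) -> bytes:
    """Wrap content and its SHA-256 digest in a small XML document.

    The element names are illustrative assumptions only.
    """
    root = ET.Element("aip", {"id": aip_id})
    pdi = ET.SubElement(root, "pdi")
    fixity = ET.SubElement(pdi, "fixity", {"algorithm": "SHA-256"})
    fixity.text = hashlib.sha256(content).hexdigest()
    data = ET.SubElement(root, "content", {"encoding": "base64"})
    data.text = base64.b64encode(content).decode("ascii")
    return ET.tostring(root, encoding="utf-8")


def verify(xml_bytes: bytes) -> bool:
    """Recompute the digest of the embedded content and compare it with the PDI."""
    root = ET.fromstring(xml_bytes)
    content = base64.b64decode(root.findtext("content"))
    return hashlib.sha256(content).hexdigest() == root.findtext("pdi/fixity")
```

Note that verify only checks internal consistency. Without a signature over the whole document, someone who alters the content could also replace the stored digest, which is the gap that signing is meant to close.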

Level 4b

Integrity protection procedures are continuously evaluated and updated as new tools and approaches become available. [Keep in mind that newer might not be better.]


Helpful Hints

Something to Consider

  • Using increasingly strong hash algorithms can be considered "best practice" for protecting files, but a stronger algorithm may not be necessary. MD5 hash values may be enough if your data was created in house and will stay in house. In general, the stronger the checksum algorithm, the longer it takes to run, but the more 'secure' it is. Your repository's needs should dictate the level of checksum security you use.
    • Note that MD5 has been considered 'broken' since 2004-2005, when researchers showed how to construct different files that have the same hash value. This is one way to create malicious files and pass them off as 'good' files. The chance of two files 'colliding' accidentally is very low; therefore many still consider MD5 hashes appropriate for periodic checks on materials.
    • Consider running both MD5 and SHA-256 checksums. Use one to monitor your files over time, and save the other to check in the future. The odds that both collide for the same pair of files are very low.
  • Specific procedures for generating and preserving hash values will depend on many things, including your workflows, storage environments, and the number and size of files you are working with.
  • Calculated hash values need to be kept alongside the materials to which they refer or in a reference file. Hash values should be kept for as long as you need to verify that a file hasn't changed.
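
The hints above on running two checksums and keeping them in a reference file can be sketched as follows. The manifest file name, its JSON layout, and the function names dual_digests, write_manifest, and changed_files are assumptions for illustration; an existing packaging convention may already define its own manifest format.

```python
import hashlib
import json
from pathlib import Path


def dual_digests(path: Path, chunk_size: int = 1 << 20) -> dict:
    """Compute MD5 and SHA-256 in a single pass over the file."""
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}


def write_manifest(directory: Path, manifest_name: str = "manifest.json") -> Path:
    """Store both digests for every file in a reference file beside the files."""
    manifest = {
        p.name: dual_digests(p)
        for p in sorted(directory.iterdir())
        if p.is_file() and p.name != manifest_name
    }
    out = directory / manifest_name
    out.write_text(json.dumps(manifest, indent=2))
    return out


def changed_files(directory: Path, manifest_name: str = "manifest.json",
                  algorithm: str = "md5") -> list:
    """List files that are missing or whose digest no longer matches the manifest."""
    manifest = json.loads((directory / manifest_name).read_text())
    return [
        name
        for name, digests in manifest.items()
        if not (directory / name).is_file()
        or dual_digests(directory / name)[algorithm] != digests[algorithm]
    ]
```

Routine checks could call changed_files with algorithm="md5", while the stored SHA-256 values remain available for a later check with algorithm="sha256".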
