Open Standards / Neutral Formats

DEFINITION: A fundamental requirement of a sustainable digital preservation program, one that ensures long-term access to usable and understandable electronic records, is mitigating file format obsolescence. Open standard technology neutral (“OS/TN”) file formats are developed in an open, public setting, issued by a recognized standards organization, and have few or no technology dependencies. Current [2012] preferred OS/TN format examples include:

  • HTML, Plain Text, XML, ODF, and PDF/A for text
  • CSV for spreadsheets
  • JPEG 2000 for photographs
  • PDF/A, PNG, and TIFF for scanned images
  • SVG for vector graphics
  • BWF for audio
  • MPEG-4 and Motion JPEG2000 for video
  • WARC for web pages.

Over time, new digital preservation tools and solutions will emerge that will require new OS/TN file formats. OS/TN formats are designed to be backward compatible, which allows them to support interoperability across technology platforms over long periods of time.

NOTE: Recommendations for OS/TN formats below were made in 2012.

 
Level 0: The Archives/RM unit has not adopted any OS/TN file format as a digital preservation format.
Level 1: The Archives/RM unit has adopted at least one OS/TN file format as a digital preservation format.
Level 2: The Archives/RM unit has adopted at least three OS/TN file formats as digital preservation formats.
Level 3: The Archives/RM unit has adopted OS/TN formats for specific file types, as described in Levels 3a through 3g.

Level 3a: The Archives/RM unit has adopted an OS/TN format for text. EX: HTML, Plain Text, XML, ODF, and PDF/A

Level 3b: The Archives/RM unit has adopted an OS/TN format for spreadsheets. EX: CSV

Level 3c: The Archives/RM unit has adopted an OS/TN format for raster / bit-map images (scanned and born digital). EX: JPEG 2000 and TIFF for born-digital photographs; PDF/A, PNG, and TIFF for scanned images

Level 3d: The Archives/RM unit has adopted an OS/TN format for vector graphics. EX: SVG

Level 3e: The Archives/RM unit has adopted an OS/TN format for audio. EX: BWF

Level 3f: The Archives/RM unit has adopted an OS/TN format for video. EX: MPEG-4 and Motion JPEG2000

Level 3g: The Archives/RM unit has adopted an OS/TN format for web pages. EX: WARC

Level 4: The Archives/RM unit continuously monitors the sustainability of OS/TN file formats and adopts them as appropriate for use as preservation formats.
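To make the level definitions above concrete, here is a minimal Python sketch of how a unit might score itself. It is an illustration under stated assumptions, not part of the assessment itself: the `Assessment` class, the file-type keys, and the rule that Level 3 requires all seven sub-levels (3a through 3g) are hypothetical choices made for this example. Each format is counted once even when it is adopted for more than one file type (for instance, PDF/A for both text and scanned images).

```python
from dataclasses import dataclass, field

# Sub-levels 3a-3g and the file type each one covers.
SUBLEVELS = {
    "3a": "text",
    "3b": "spreadsheets",
    "3c": "raster images",
    "3d": "vector graphics",
    "3e": "audio",
    "3f": "video",
    "3g": "web pages",
}


@dataclass
class Assessment:
    # Hypothetical structure: file type -> OS/TN formats adopted for it.
    adopted: dict[str, set[str]] = field(default_factory=dict)
    # Level 4 also requires ongoing sustainability monitoring.
    monitors_sustainability: bool = False


def maturity_level(a: Assessment) -> tuple[int, list[str]]:
    """Return (level, sub-levels met) under the assumptions stated above."""
    # Count each format once, even if adopted for several file types.
    distinct_formats = set().union(*a.adopted.values())
    met = [code for code, ftype in SUBLEVELS.items() if a.adopted.get(ftype)]

    # Assumption: Level 3 means every sub-level from 3a to 3g is met.
    if len(met) == len(SUBLEVELS):
        return (4 if a.monitors_sustainability else 3), met
    if len(distinct_formats) >= 3:
        return 2, met
    if len(distinct_formats) >= 1:
        return 1, met
    return 0, met


# Example: three distinct formats across two file types.
example = Assessment(adopted={"text": {"PDF/A", "XML"}, "raster images": {"TIFF"}})
print(maturity_level(example))  # (2, ['3a', '3c'])
```

In this example the unit holds three distinct formats, so the sketch reports Level 2 along with the two sub-levels already met, which shows what remains to reach Level 3.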

 

Resources

Resources associated with the Open Standards / Neutral Formats element provide background information and useful examples to consult when determining which formats meet your requirements for long-term preservation.


Level 0

The Archives/RM unit has not adopted any OS/TN file format as a digital preservation format.

Move to Level 1: Adopt at least one OS/TN file format as a digital preservation format.

Jump to Level 2: Adopt at least three OS/TN file formats as digital preservation formats.

Level 1

The Archives/RM unit has adopted at least one OS/TN file format as a digital preservation format.

Move to Level 2: Adopt at least three OS/TN file formats as digital preservation formats.

Jump to Level 3: Use OS/TN formats for as many file types as you can, such as text, spreadsheets, raster images, vector graphics, audio, video, and web pages.

Level 2

The Archives/RM unit has adopted at least three OS/TN file formats as digital preservation formats. 

Move to Level 3: Use OS/TN formats for as many file types as you can, such as text, spreadsheets, raster images, vector graphics, audio, video, and web pages.

Jump to Level 4: Use OS/TN formats as preservation formats for common file types and monitor their sustainability over time.

Level 3a

The Archives/RM unit has adopted an OS/TN format for text. EX: HTML, Plain Text, XML, ODF, and PDF/A

Level 3b

The Archives/RM unit has adopted an OS/TN format for spreadsheets. EX: CSV

Level 3c

The Archives/RM unit has adopted an OS/TN format for raster / bit-map images (scanned and born digital). EX: JPEG 2000 and TIFF for born-digital photographs; PDF/A, PNG, and TIFF for scanned images

Level 3d

The Archives/RM unit has adopted an OS/TN format for vector graphics. EX: SVG

Level 3e

The Archives/RM unit has adopted an OS/TN format for audio. EX: BWF

Level 3f

The Archives/RM unit has adopted an OS/TN format for video. EX: MPEG-4 and Motion JPEG2000

Level 3g

The Archives/RM unit has adopted an OS/TN format for web pages. EX: WARC

Move to Level 4: Adopt OS/TN formats for the file types listed above, monitor the sustainability of current and future OS/TN formats, and adopt them as appropriate as preservation formats.

Level 4

The Archives/RM unit continuously monitors the sustainability of OS/TN file formats and adopts them as appropriate for use as preservation formats.
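Level 4 is an ongoing process rather than a one-time check. One way to support it, sketched here under assumptions, is to keep an inventory of the formats you hold and flag any whose sustainability review is overdue, starting with the formats that cover the most files. The `FormatRecord` fields, the `review_queue` function, and the default one-year review interval are all hypothetical; the inventory values are illustrative only.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class FormatRecord:
    name: str            # e.g. "PDF/A-2b" or "TIFF"
    last_reviewed: date  # date of the last sustainability review
    holdings: int        # number of files held in this format


def review_queue(records: list[FormatRecord], today: date,
                 interval_days: int = 365) -> list[FormatRecord]:
    """Return formats whose last review is older than the interval,
    ordered so that formats with the most holdings come first."""
    cutoff = today - timedelta(days=interval_days)
    overdue = [r for r in records if r.last_reviewed < cutoff]
    return sorted(overdue, key=lambda r: r.holdings, reverse=True)


# Illustrative inventory, not real holdings data.
inventory = [
    FormatRecord("TIFF", date(2022, 6, 1), 10_000),
    FormatRecord("PDF/A-1b", date(2023, 6, 1), 25_000),
    FormatRecord("BWF", date(2021, 1, 15), 800),
]
for rec in review_queue(inventory, today=date(2024, 1, 1)):
    print(rec.name)  # prints TIFF, then BWF
```

Ordering by holdings is one possible prioritization; a unit might instead weight formats by known risk factors drawn from a format registry.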

Helpful Hints

Something to Consider

    • For some types of files, even commonly used ones, there is no best-practice consensus on a single ideal preservation format. Evaluate your options in your own environment before making final choices, and document the details of each decision.
      • Best practices for video preservation require specifying each layer of encoding (audio, video, subtitles) as well as the wrapper. Simply indicating MPEG-4 or Motion JPEG2000 can be misleading.
      • Even PDF/A has several subtypes, including PDF/A-1a, PDF/A-1b, PDF/A-2a, and PDF/A-2b; specify the exact subtype for consistency.
    • While using and preferring OS/TN formats provides a solid foundation, it will not always be possible to use them.
      • Certain components of GIS data might be in formats that have no OS/TN equivalent but still need to be preserved.
      • Other formats, such as Word and other Microsoft formats, are not OS/TN but are very common and well supported, and may not raise any immediate preservation concerns. Consider your own environment and make these decisions for yourself. For example, if your organization and the organizations it works with support Windows, there may be no strong need to spend the time and effort converting to OpenOffice or PDF/A at this time.
    • When converting any format to another, your institution must consider that information in the original file may be lost in the conversion to an OS/TN format. This information could be underlying data or the original rendering of that data. Consider preserving objects in two formats: the original format and an OS/TN format. As with all formats, including OS/TN formats, it is important to understand the formats you have and evaluate the risks of preserving them long term.
      • For example, an Excel spreadsheet converted to CSV loses its metadata, formatting, and embedded formulas; in some cases it should instead be saved as ODF or even kept in the original XLSX format.
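Preserving an object in two formats is easier to manage when the original and its OS/TN copy are recorded together. The sketch below pairs them in a simple record with SHA-256 checksums so that either copy can later be verified. The record layout and the `pair_record` and `verify` functions are hypothetical; adapt them to whatever metadata schema your repository uses.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute a SHA-256 checksum, reading the file in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pair_record(original: Path, derivative: Path, derivative_format: str) -> dict:
    """Hypothetical record linking an original file to its OS/TN copy."""
    return {
        "original": {"path": str(original), "sha256": sha256_of(original)},
        "derivative": {
            "path": str(derivative),
            "format": derivative_format,
            "sha256": sha256_of(derivative),
        },
    }


def verify(record: dict) -> bool:
    """Re-check both copies against their recorded checksums."""
    return all(
        sha256_of(Path(entry["path"])) == entry["sha256"]
        for entry in (record["original"], record["derivative"])
    )
```

For example, `pair_record(Path("budget.xlsx"), Path("budget.csv"), "CSV")` (hypothetical file names) keeps the spreadsheet with its formulas alongside the CSV derivative, and the resulting dictionary could be stored with `json.dumps` next to the files.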

Example OS/TN Formats

Open standards at the time of writing the original self-assessment (2012) included:

Sustainability of Formats

File format registries provide information about file formats that can help when reviewing the sustainability factors of individual formats.

The Sustainability of Digital Formats website has pages that address preferences for still images, sound files, text, moving image, and web archiving.  
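Reviewing format sustainability starts with knowing which formats you actually hold. Registries such as PRONOM record byte-level signatures that identification tools such as DROID and Siegfried match against files. The sketch below uses a small, hand-picked table of well-known signatures to illustrate the idea; it is not a replacement for those tools. It cannot distinguish PDF/A from ordinary PDF (that requires a validator), and it labels a WAVE file as BWF only if it contains a `bext` (Broadcast Audio Extension) chunk. The `identify` and `has_bext_chunk` names are hypothetical.

```python
# Hand-picked signatures at offset 0 (illustrative subset only).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"%PDF-", "PDF (PDF/A conformance not checked)"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000 (JP2)"),
    (b"WARC/", "WARC (uncompressed)"),
]


def has_bext_chunk(data: bytes) -> bool:
    """Walk RIFF chunks after the 12-byte header looking for 'bext'."""
    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        size = int.from_bytes(data[pos + 4:pos + 8], "little")
        if chunk_id == b"bext":
            return True
        # Chunk data is padded to an even number of bytes.
        pos += 8 + size + (size & 1)
    return False


def identify(data: bytes) -> str:
    """Return a rough format label for the bytes of a file."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        # BWF is a WAVE file carrying a Broadcast Audio Extension chunk.
        return "BWF" if has_bext_chunk(data) else "WAVE (no bext chunk)"
    return "unknown"
```

For simplicity `identify` takes the whole file's bytes (e.g. from `open(path, "rb").read()`); a production tool would read only as much of the file as it needs.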

File Format Discussions
