File Format Recommendations

Question

This document identifies file format recommendations for Emory’s preservation repository.

The Digital Preservation Functional Requirements Group (FRG) of the DLP Project was tasked with identifying “preferred file formats for preservation of major formats/content types (still images, audio, video, disk images, supplemental/generic files).” This document outlines those recommendations.

Decision

Emory’s preservation repository should be able to accept any type of file.

Assumptions

  • Deposit service owners may make recommendations or identify requirements around the deposit of files. For example, the Electronic Theses and Dissertations service owner may require depositors to deposit a .pdf file rather than a .doc file.
  • Content viewers may require specific derivative file types to utilize features of the viewer (e.g. book viewers may require .jp2 files to enable page turner features). Creating those derivative file types may in turn require that a depositor deposits particular preservation file types (e.g. to generate .jp2 files the depositor may need to deposit .tiff files).
  • Deposit service owners making recommendations for similar content types will collaborate on recommendations and requirements so they align.
  • Bulk deposit service owners may make recommendations or identify requirements around the normalization of files. For example, the book digitization service owner may identify the need to deposit .tiff files, .alto files, and similar file types in order to generate .pdf files.
  • Automatic generation of file types (i.e. derivative files and normalized files) will be performed by workflows.
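
The split between the repository-level decision and service-level requirements described above can be sketched as follows. This is a minimal illustration only: the service names ("etd", "book_digitization"), the extension sets, and both function names are hypothetical and are not decisions recorded in this document.

```python
from pathlib import PurePath

# Hypothetical per-service requirements. The service keys and extension
# sets are illustrative assumptions drawn from the examples in the text.
SERVICE_REQUIREMENTS = {
    "etd": {".pdf"},
    "book_digitization": {".tiff", ".alto"},
}


def repository_accepts(filename: str) -> bool:
    """The preservation repository itself accepts any type of file."""
    return True


def service_accepts(service: str, filename: str) -> bool:
    """Apply a deposit service's own requirement, if it has defined one."""
    allowed = SERVICE_REQUIREMENTS.get(service)
    if allowed is None:
        # No requirement defined for this service: fall back to the
        # repository-level decision and accept the file.
        return True
    return PurePath(filename).suffix.lower() in allowed
```

Under these assumptions, service_accepts("etd", "thesis.doc") is rejected by the deposit service even though repository_accepts("thesis.doc") would succeed, which keeps format restrictions with the service owner rather than in the repository.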

Risks

  • Users of self-deposit services may deposit malformed and/or invalid files.
  • Users of self-deposit services may deposit files that don’t conform to the needs of content viewers.
  • Not fully defining the requirements for file formats may extend the timeline for implementation.
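
As one possible mitigation for the first risk, a deposit service could compare a file's leading bytes with the signature commonly associated with its extension. The sketch below is an assumption, not a requirement from this document: the function name and the small signature table are hypothetical, and a signature match only catches gross mismatches; it is not full format validation.

```python
from typing import Optional

# Well-known leading byte signatures for the formats named in this
# document. The table is deliberately small and illustrative.
SIGNATURES = {
    ".pdf": (b"%PDF-",),
    ".tiff": (b"II*\x00", b"MM\x00*"),  # little-endian and big-endian TIFF
    ".jp2": (b"\x00\x00\x00\x0cjP  \r\n\x87\n",),
}


def looks_like(extension: str, head: bytes) -> Optional[bool]:
    """Return True or False when a signature is known, None otherwise.

    `head` is the first few bytes of the deposited file.
    """
    known = SIGNATURES.get(extension.lower())
    if known is None:
        return None  # no signature on record; the file is still accepted
    return any(head.startswith(sig) for sig in known)
```

A result of None reflects the decision above: the repository accepts any type of file, so an unknown extension is not treated as an error.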