Technical Metadata

Prepared by: Emily Porter

Last revised: Feb 2018

Status: Final Draft for Approval


Technical/Characterization Metadata

Overview

Technical/characterization metadata describes the characteristics and composition of a digital file (such as its size, MIME type, compression, and encoding) and is an important component of digital preservation. Technical metadata applies only to digital file assets (i.e., PCDM Files), not to intellectual objects (i.e., PCDM Objects).

The DLP Metadata Implementation Working Group (M-IWG) analyzed characterization metadata separately from other types of preservation metadata because it is primarily auto-extracted from a file by means of tools such as JHOVE, FITS, or MediaInfo, as opposed to being manually created by a content curator. Additional preservation-related metadata are documented separately based on requirements identified by the Digital Preservation Functional Requirements Group.
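As a rough illustration of what auto-extraction yields, the sketch below reads a handful of baseline properties out of a FITS-style XML document using only the Python standard library. The sample XML is hand-written and heavily abbreviated for this example, not real tool output. The element names and namespace follow the FITS output format as commonly documented, but they should be checked against the FITS version actually deployed. The function name `extract_baseline` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of FITS output documents (verify against the deployed FITS version).
FITS_NS = {"fits": "http://hul.harvard.edu/ois/xml/ns/fits/fits_output"}

# Hand-written, abbreviated sample for illustration only; real FITS output
# carries tool attributions and many more elements.
SAMPLE_FITS_XML = """<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output">
  <identification>
    <identity format="Portable Document Format" mimetype="application/pdf"/>
  </identification>
  <fileinfo>
    <filename>report.pdf</filename>
    <size>20480</size>
    <md5checksum>0123456789abcdef0123456789abcdef</md5checksum>
  </fileinfo>
  <filestatus>
    <well-formed>true</well-formed>
    <valid>true</valid>
  </filestatus>
</fits>"""


def _text(root, path):
    """Return the stripped text of the first element matching path, or None."""
    node = root.find(path, FITS_NS)
    if node is None or node.text is None:
        return None
    return node.text.strip()


def extract_baseline(fits_xml):
    """Collect a few generic baseline properties from a FITS XML string."""
    root = ET.fromstring(fits_xml)
    identity = root.find("fits:identification/fits:identity", FITS_NS)
    size = _text(root, "fits:fileinfo/fits:size")
    return {
        "filename": _text(root, "fits:fileinfo/fits:filename"),
        "size": int(size) if size is not None else None,
        "md5checksum": _text(root, "fits:fileinfo/fits:md5checksum"),
        "mimetype": identity.get("mimetype") if identity is not None else None,
        "format": identity.get("format") if identity is not None else None,
        "well-formed": _text(root, "fits:filestatus/fits:well-formed"),
        "valid": _text(root, "fits:filestatus/fits:valid"),
    }


baseline = extract_baseline(SAMPLE_FITS_XML)
```

Properties that are absent from the XML come back as `None`, which keeps the distinction between "not extracted" and an empty value visible to downstream indexing.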

While most significant to preservation curators, some technical metadata properties (duration, size, file type, filename, etc.) may also be worth presenting to end users, both to help them interact with the file and to help them understand the context in which it was originally created.

The following elements are recommended as baseline technical metadata for both generic and specific formats (analogous to Emory Primary Content Types). The recommendations are based on a review of Samvera community metadata standards, Emory stakeholder-supplied fields, Samvera’s primary technical file characterization tool (FITS), and the way FITS data is incorporated into the HydraWorks characterization software (see Appendices A and B). In 2016, a number of test cases were evaluated against the versions of FITS available at the time (see Appendix B).

Indexing and display recommendations are also noted below for selected metadata units, based on feedback provided in 2016 by the Metadata Implementation Working Group and additional stakeholders from the Digitization and Digital Curation team. The tables that follow indicate where system-actionable (searchable, viewable) technical metadata may be useful, as opposed to characterization metadata that remains in a static text/XML file included as part of an archival package. The properties listed are based on FITS 1.2, but the version implemented for the DLP repository is configurable, depending on the versions supported by Samvera software.
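One way to make the Staff Display and Patron Display recommendations actionable is to hold them as a small lookup and filter extracted metadata by audience. The sketch below is illustrative only: the names `DisplayRule`, `DISPLAY_RULES`, and `visible_fields` are hypothetical, and the flags shown are a subset transcribed from the generic baseline table in this document.

```python
from typing import NamedTuple


class DisplayRule(NamedTuple):
    staff: bool   # shown in staff-facing views
    patron: bool  # shown in patron-facing views


# Subset of the generic baseline table (Staff Display / Patron Display columns).
DISPLAY_RULES = {
    "filename": DisplayRule(staff=True, patron=True),
    "size": DisplayRule(staff=True, patron=True),
    "mimetype": DisplayRule(staff=True, patron=True),
    "md5checksum": DisplayRule(staff=True, patron=False),
    "filepath": DisplayRule(staff=True, patron=False),
    "PUID": DisplayRule(staff=False, patron=False),
}


def visible_fields(characterization, audience):
    """Return only the properties recommended for display to the given audience."""
    if audience not in ("staff", "patron"):
        raise ValueError(f"unknown audience: {audience!r}")
    return {
        key: value
        for key, value in characterization.items()
        if key in DISPLAY_RULES and getattr(DISPLAY_RULES[key], audience)
    }


extracted = {
    "filename": "report.pdf",
    "size": 20480,
    "mimetype": "application/pdf",
    "md5checksum": "0123456789abcdef0123456789abcdef",
    "PUID": "fmt/18",
}
patron_view = visible_fields(extracted, "patron")
staff_view = visible_fields(extracted, "staff")
```

Properties without a rule are hidden by default, so anything newly extracted stays out of both views until a display decision has been recorded for it.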

See also - Merged Spreadsheet/Inventory View

Baseline Generic Technical Metadata

The following properties are recommended for all types of digital files, regardless of format. Note: common digital file properties may also be expressed/stored as Preservation metadata (PREMIS object characteristics).

 

| Format | Property Label | Extraction Source | Staff Display | Patron Display | Samvera Profile* | PREMIS 3** |
| --- | --- | --- | --- | --- | --- | --- |
| Generic | md5checksum | FITS | Y | | Y | Y |
| Generic | SHAchecksum | DLP? | Y | | | Y |
| Generic | SHA256checksum | DLP? | Y | | | Y |
| Generic | filename | FITS | Y | Y | Y | Y |
| Generic | filepath | FITS | Y | | | |
| Generic | size | FITS | Y | Y | Y | Y |
| Generic | valid | FITS | Y | | Y | |
| Generic | well-formed | FITS | Y | | Y | |
| Generic | created | FITS | Y | | Y | Y |
| Generic | creatingApplicationVersion | FITS | Y | | Y? | Y |
| Generic | creatingApplicationName | FITS | Y | | | Y |
| Generic | creatingOS [Operating System] | FITS | Y | | | |
| Generic | mimetype | FITS | Y | Y | Y | Y |
| Generic | Format [label for mimetype] | FITS | Y | | | Y |
| Generic | PUID | FITS/Droid | | | Y | |
| Generic | copyrightBasis*** | FITS | Y*** | | | Y |
| Generic | copyrightNote*** | FITS | Y*** | | | Y |
| Generic | rightsBasis*** | FITS | Y*** | | | Y |
| Generic | fslastmodified [file system] | FITS | Y | | | Y |
| Generic | inhibitorType | FITS | | | | Y |
| Generic | inhibitorTarget | FITS | | | | Y |

* Indicated in Samvera Technical Metadata Profile and/or HydraWorks Characterization Profile

** Indicated in PREMIS standard (digital object characteristics)

*** Addressed elsewhere in MIWG specifications
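The checksum properties in the table above can be computed in a single pass over a file. The following is a minimal sketch using Python's standard `hashlib`. It assumes that "SHAchecksum" refers to SHA-1, which this document does not actually specify, and the function name `file_checksums` is hypothetical.

```python
import hashlib


def file_checksums(path, algorithms=("md5", "sha1", "sha256"), chunk_size=1024 * 1024):
    """Compute several digests of a file while reading it only once.

    Assumption: "sha1" stands in for the table's SHAchecksum property.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large audio/video files are not
        # loaded into memory all at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Calling `file_checksums("example.tif")` returns a dictionary keyed by algorithm name, which can then be stored under the corresponding property labels.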

Baseline for Audio 

| Format | Property Label | Extraction Source | Staff Display | Patron Display |
| --- | --- | --- | --- | --- |
| Audio | channels | FITS, HydraWorks | Y | Y |
| Audio | duration | FITS, HydraWorks | Y | Y |
| Audio | audioDataEncoding | FITS | | |
| Audio | dataFormat | HydraWorks* | | |
| Audio | avgBitRate | FITS | | |
| Audio | avgPacketSize | FITS | | |
| Audio | bitDepth | FITS, HydraWorks | | |
| Audio | bitrate | FITS | | |
| Audio | blockAlign | FITS | | |
| Audio | blockSizeMax | FITS | | |
| Audio | blockSizeMin | FITS | | |
| Audio | byteOrder | FITS | | |
| Audio | maxBitRate | FITS | | |
| Audio | maxPacketSize | FITS | | |
| Audio | numSamples | FITS | | |
| Audio | offset | FITS, HydraWorks | | |
| Audio | sampleRate | FITS, HydraWorks | | |
| Audio | software | FITS | | |
| Audio | soundField | FITS | | |
| Audio | time | FITS | | |
| Audio | wordSize | FITS | | |

*Listed in HydraWorks, but not in current FITS specification

Baseline for Video

| Format | Property Label | Extraction Source | Staff Display | Patron Display |
| --- | --- | --- | --- | --- |
| Video | duration | FITS, HydraWorks | Y | Y |
| Video | frameRate | FITS, HydraWorks | Y | |
| Video | frameRateMode | FITS | Y | |
| Video | imageHeight | FITS, HydraWorks | Y | |
| Video | imageWidth | FITS, HydraWorks | Y | |
| Video | sampleRate | FITS, HydraWorks | Y | |
| Video | apertureSetting | FITS | | |
| Video | bitDepth | FITS | | |
| Video | bitRate | FITS | | |
| Video | blockSizeMax | FITS | | |
| Video | blockSizeMin | FITS | | |
| Video | channels | FITS | | |
| Video | dataFormatType | FITS | | |
| Video | digitalCameraManufacturer | FITS | | |
| Video | digitalCameraModelName | FITS | | |
| Video | exposureTime | FITS | | |
| Video | exposureProgram | FITS | | |
| Video | fNumber | FITS | | |
| Video | focus | FITS | | |
| Video | gain | FITS | | |
| Video | GPS [~30 properties] | FITS | | |
| Video | imageStabilization | FITS | | |
| Video | shutterSpeedValue | FITS | | |
| Video | videoStreamType | FITS | | |
| Video | whiteBalance | FITS | | |
| Video | xSamplingFrequency | FITS | | |
| Video | ySamplingFrequency | FITS | | |

[Broadcast standard, color space, chroma subsampling, compression mode, display aspect ratio, and scan type were previously available in FITS but had been removed as of 2018. Similar property names exist in the Image section.]

Baseline for Documents/Text

| Format | Property Label | Extraction Source | Staff Display | Patron Display |
| --- | --- | --- | --- | --- |
| Document | pageCount | FITS, HydraWorks | Y | Y |
| Document | charset | FITS, HydraWorks | Y | |
| Document | title | FITS, HydraWorks | | |
| Document | author | FITS | | |
| Document | language | FITS, HydraWorks | | |
| Document | markupBasis | FITS, HydraWorks | | |
| Document | markupBasisVersion | FITS | | |
| Document | markupLanguage | FITS, HydraWorks | | |
| Document | hasAnnotations | FITS | | |
| Document | hasOutline | FITS | | |
| Document | isProtected | FITS | | |
| Document | isRightsManaged | FITS | | |
| Document | isTagged | FITS | | |
| Document | linebreak | FITS | | |
| Document | paragraphCount | HydraWorks | | |
| Document | tableCount | HydraWorks | | |
| Document | graphicsCount | HydraWorks | | |

Baseline for Images

| Format | Property Label | Extraction Source | Staff Display | Patron Display |
| --- | --- | --- | --- | --- |
| Image | byteOrder | FITS, HydraWorks | Y | |
| Image | colorSpace | FITS, HydraWorks | Y | |
| Image | compressionScheme | FITS, HydraWorks | Y | |
| Image | imageHeight | FITS, HydraWorks | Y | |
| Image | imageWidth | FITS, HydraWorks | Y | |
| Image | iccProfileName | FITS | Y | |
| Image | iccProfileVersion | FITS | Y | |
| Image | apertureValue | FITS | | |
| Image | bitsPerSample | FITS | | |
| Image | brightnessValue | FITS | | |
| Image | captureDevice | FITS, HydraWorks | | |
| Image | cfaPattern | FITS | | |
| Image | cfaPattern2 | FITS | | |
| Image | colorMap | FITS, HydraWorks | | |
| Image | digitalCameraManufacturer | FITS | | |
| Image | digitalCameraModelName | FITS | | |
| Image | digitalCameraSerialNo | FITS | | |
| Image | exifVersion | FITS, HydraWorks | | |
| Image | exposureBiasValue | FITS | | |
| Image | exposureIndex | FITS | | |
| Image | exposureProgram | FITS | | |
| Image | exposureTime | FITS | | |
| Image | extraSamples | FITS | | |
| Image | flash | FITS | | |
| Image | flashEnergy | FITS | | |
| Image | focalLength | FITS | | |
| Image | gps [31 properties] | FITS, some HydraWorks | | |
| Image | grayResponseUnit | FITS | | |
| Image | imageProducer | FITS, HydraWorks | | |
| Image | isoSpeedRating | FITS | | |
| Image | lightSource | FITS | | |
| Image | maxApertureValue | FITS | | |
| Image | meteringMode | FITS | | |
| Image | oECF | FITS | | |
| Image | orientation | FITS, HydraWorks | | |
| Image | primaryChromaticities [6+] | FITS | | |
| Image | qualityLayers | FITS | | |
| Image | referenceBlackWhite | FITS | | |
| Image | resolutionLevels | FITS | | |
| Image | samplesPerPixel | FITS | | |
| Image | samplingFrequencyUnit | FITS | | |
| Image | scannerManufacturer | FITS | | |
| Image | scannerModelName | FITS | | |
| Image | scanningSoftwareName | FITS, HydraWorks? | | |
| Image | scanningSoftwareVersionNo | FITS, HydraWorks? | | |
| Image | sensingMethod | FITS | | |
| Image | shutterSpeedValue | FITS | | |
| Image | spectralSensitivity | FITS | | |
| Image | subjectDistance | FITS | | |
| Image | tileHeight | FITS | | |
| Image | tileWidth | FITS | | |
| Image | whitePointXValue | FITS | | |
| Image | whitePointYValue | FITS | | |
| Image | xSamplingFrequency | FITS | | |
| Image | ySamplingFrequency | FITS | | |
| Image | YCbCrCoefficients | FITS | | |
| Image | YCbCrPositioning | FITS | | |
| Image | YCbCrSubSampling | FITS | | |

Appendix A – Samvera Technical Metadata Profile (Summary)

 

See the Samvera wiki for more detail. Note: this community-developed profile also incorporates review of the Europeana Technical Metadata specifications.

Namespaces

Properties 

| Element/Property | Label | Obligation | Repeat | Description | RDF Range |
| --- | --- | --- | --- | --- | --- |
| ebucore:filename | File Name | Required | N | Equivalent: nfo:fileName, premis:hasOriginalName | xsd:string |
| ebucore:fileSize | File Size | Required | N | Equivalent: nfo:fileSize, dct:extent, premis:hasSize | xsd:integer |
| rdfs:label* | Label | Recommended | N | Descriptive label (for file, vs. Title) | xsd:string |
| ebucore:dateCreated | Date Created | Recommended | N | | |
| premis:hasMessageDigest | File Hash | Recommended | Y | The output of the message digest algorithm. | xsd:string |
| premishash:md5 | MD5 Checksum | Optional | N | MD5 checksum value | [string?] |
| rdf:type* | File Format Type | Recommended | N | Category or Genre of the File | pcdm:Document |
| ebucore:hasMimeType | Has Mime Type | Recommended | N | Equivalent: dcterms:format; pronom:internetMediaType | xsd:string |
| ebucore:dateModified** | Date Modified | Optional | N | Equivalent: dcterms:modified | [string/date] |
| pronom:puid | File Format | Optional | N | Pronom ID to uniquely identify file format | xsd:string |
| sweetjpl:hasByteOrder*** | Byte Order | Optional | N | [Not supplied] | |

* Not included in Emory recommendations for Technical/Characterization Metadata, because we consider them to be more descriptive/manual in nature vs. automated characterization

**M-IWG recommends additional clarifications on what type of date/activity is stored here 

*** Limited extraction in FITS (available for some formats only)
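To show how the profile above might be applied, the sketch below maps extracted characterization values onto the profile's properties and checks the two Required elements. It is an assumption-laden illustration: the pairing of characterization keys (such as `size` or `created`) with profile properties is inferred for this example rather than stated by the profile, statements are represented as plain tuples instead of going through an RDF library, and the names `PROFILE` and `to_statements` are hypothetical. The starred properties rdfs:label and rdf:type are left out, consistent with the first footnote.

```python
# Characterization key -> (profile property, obligation).
# The key-to-property pairing is an assumption made for this sketch.
PROFILE = {
    "filename": ("ebucore:filename", "Required"),
    "size": ("ebucore:fileSize", "Required"),
    "created": ("ebucore:dateCreated", "Recommended"),
    "mimetype": ("ebucore:hasMimeType", "Recommended"),
    "md5checksum": ("premishash:md5", "Optional"),
    "PUID": ("pronom:puid", "Optional"),
}


def to_statements(subject, characterization):
    """Turn extracted values into (subject, property, value) tuples.

    Raises ValueError when a Required property has no value.
    """
    missing = [
        key
        for key, (_, obligation) in PROFILE.items()
        if obligation == "Required" and characterization.get(key) is None
    ]
    if missing:
        raise ValueError("missing required properties: " + ", ".join(missing))
    return [
        (subject, PROFILE[key][0], value)
        for key, value in characterization.items()
        if key in PROFILE and value is not None
    ]


statements = to_statements(
    "file:1",  # placeholder identifier for a PCDM File
    {"filename": "report.pdf", "size": 20480, "mimetype": "application/pdf", "valid": "true"},
)
```

Values with no corresponding profile property (here, `valid`) are simply not emitted. They could still be retained in the static characterization file that accompanies the archival package.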

 

Appendix B – MWG Analysis Documents and FITS Information (2016 - 2018)

Detailed element-level analysis

The Metadata Working Group’s analysis included multiple sources for technical metadata:

  • FITS (documentation and use of standalone application)
  • Hydra-Works Extraction of FITS and metadata properties defined as RDF
  • Hydra Technical Metadata Profile
  • Emory DAMS (provides extracted technical metadata when indexing file assets)
  • The Keep AV Technical Metadata

Working Files 


FITS Testing (2016)

The team performed limited testing of FITS in multiple environments (standalone: versions 0.10 and 0.8, and Sufia 6-integrated): 

FITS: Use of Metadata Standards

When configured to output standard XML metadata, FITS references the following standards to encode its metadata values:

  • Audio: AES 
  • Documents: DocumentMD
  • Images: MIX
  • Text: TextMD
  • Video: EbuCore
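The list above can be captured as a simple lookup, for example to record which schema a given file's standard-format output is expected to use. This is a minimal sketch: choosing the content category from the MIME type is an assumption of this example, not documented FITS behavior, and `DOCUMENT_MIMETYPES` is a hypothetical, deliberately short list.

```python
# Content category -> standard schema, as listed above.
STANDARD_SCHEMAS = {
    "audio": "AES",
    "document": "DocumentMD",
    "image": "MIX",
    "text": "TextMD",
    "video": "EbuCore",
}

# Hypothetical examples of MIME types treated as documents in this sketch.
DOCUMENT_MIMETYPES = {"application/pdf", "application/msword"}


def schema_for_mimetype(mimetype):
    """Guess the standard schema for a MIME type, or return None if unknown."""
    normalized = mimetype.strip().lower()
    major = normalized.split("/", 1)[0]
    if major in ("audio", "image", "text", "video"):
        return STANDARD_SCHEMAS[major]
    if normalized in DOCUMENT_MIMETYPES:
        return STANDARD_SCHEMAS["document"]
    return None
```

For instance, `schema_for_mimetype("image/tiff")` returns `"MIX"`, while an unrecognized type such as `"application/zip"` returns `None`.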