Technical Metadata
Prepared by: Emily Porter
Last revised: Feb 2018
Status: Final Draft for Approval
Technical/Characterization Metadata
Overview
Technical/characterization metadata provides information about the characteristics and composition of a digital file (such as its size, mime type, compression, and encoding information), and is an important component of digital preservation. Technical metadata is only applicable to digital file assets (i.e. PCDM Files), not intellectual objects (i.e. PCDM Objects).
The DLP Metadata Implementation Working Group (M-IWG) analyzed characterization metadata separately from other types of preservation metadata because it is primarily auto-extracted from a file by means of tools such as JHOVE, FITS, or MediaInfo, as opposed to being manually created by a content curator. Additional preservation-related metadata are documented separately based on requirements identified by the Digital Preservation Functional Requirements Group.
While most significant to preservation curators, some technical metadata properties (duration, size, file type, filename, etc.) may also be useful to present to end users, both to show how to interact with the file and to explain the context in which it was originally created.
The following elements are recommended as baseline technical metadata for both generic and specific formats (analogous to Emory Primary Content Types). The recommendations are based on a review of Samvera community metadata standards, Emory stakeholder-supplied fields, and Samvera’s primary technical file characterization tool, FITS, including the use of FITS data in the HydraWorks characterization software (see Appendices A and B). In 2016, a number of test cases were evaluated against the then-available versions of FITS (see Appendix B).
Indexing and display recommendations are also noted below for selected metadata units, based on feedback provided in 2016 by the Metadata Implementation Working Group and additional stakeholders from the Digitization and Digital Curation team. The tables that follow indicate where system-actionable (searchable, viewable) technical metadata may be useful, as opposed to characterization metadata that remains in a static text/XML file included as part of an archival package. The properties listed are based on FITS 1.2, but the version implemented for the DLP repository is configurable based on the versions supported by Samvera software.
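To illustrate the difference between a static FITS XML file and system-actionable properties, the following is a minimal sketch of pulling generic properties out of FITS output so they could be indexed. The sample XML is hand-written and heavily trimmed, and the function name extract_generic_properties is hypothetical. The namespace and element names follow the FITS output format as commonly documented and should be checked against the FITS version actually deployed.

```python
import xml.etree.ElementTree as ET

# Namespace used by FITS output documents (verify against your FITS version).
FITS_NS = {"fits": "http://hul.harvard.edu/ois/xml/ns/fits/fits_output"}

# Hand-written, trimmed sample for illustration only. Real FITS output carries
# many more elements and attributes (tool names, conflict status, etc.).
SAMPLE_FITS_XML = """\
<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output">
  <identification>
    <identity format="Tagged Image File Format" mimetype="image/tiff"/>
  </identification>
  <fileinfo>
    <filename>example.tif</filename>
    <size>1048576</size>
    <md5checksum>0123456789abcdef0123456789abcdef</md5checksum>
  </fileinfo>
  <filestatus>
    <well-formed>true</well-formed>
    <valid>true</valid>
  </filestatus>
</fits>
"""


def extract_generic_properties(fits_xml: str) -> dict:
    """Flatten the identification, fileinfo, and filestatus sections into a dict."""
    root = ET.fromstring(fits_xml)
    props = {}

    identity = root.find("fits:identification/fits:identity", FITS_NS)
    if identity is not None:
        props["format"] = identity.get("format")
        props["mimetype"] = identity.get("mimetype")

    for section in ("fileinfo", "filestatus"):
        parent = root.find(f"fits:{section}", FITS_NS)
        if parent is None:
            continue
        for child in parent:
            # Drop the "{namespace}" prefix ElementTree adds to tag names.
            name = child.tag.split("}", 1)[-1]
            props[name] = (child.text or "").strip()
    return props


if __name__ == "__main__":
    for key, value in extract_generic_properties(SAMPLE_FITS_XML).items():
        print(f"{key}: {value}")
```

All values are returned as strings. Converting size to an integer or valid to a boolean is left to the indexing layer in this sketch.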
See also - Merged Spreadsheet/Inventory View
Baseline Generic Technical Metadata
The following properties are recommended for all types of digital files, regardless of format. Note: common digital file properties may also be expressed/stored as Preservation metadata (PREMIS object characteristics).
Format | Property Label | Extraction Source | Staff Display | Patron Display | Samvera Profile* | PREMIS 3** |
Generic | md5checksum | FITS | Y | | Y | Y |
Generic | SHAchecksum | DLP? | Y | | | Y |
Generic | SHA256checksum | DLP? | Y | | | Y |
Generic | filename | FITS | Y | Y | Y | Y |
Generic | filepath | FITS | Y | | | |
Generic | size | FITS | Y | Y | Y | Y |
Generic | valid | FITS | Y | | Y | |
Generic | well-formed | FITS | Y | | Y | |
Generic | created | FITS | Y | | Y | Y |
Generic | creatingApplicationVersion | FITS | Y | | Y? | Y |
Generic | creatingApplicationName | FITS | Y | | | Y |
Generic | creatingOS [Operating System] | FITS | Y | | | |
Generic | mimetype | FITS | Y | Y | Y | Y |
Generic | Format [label for mimetype] | FITS | Y | | | Y |
Generic | PUID | FITS/Droid | | | Y | |
Generic | copyrightBasis*** | FITS | Y*** | | | Y |
Generic | copyrightNote*** | FITS | Y*** | | | Y |
Generic | rightsBasis*** | FITS | Y*** | | | Y |
Generic | fslastmodified [file system] | FITS | Y | | | Y |
Generic | inhibitorType | FITS | | | | Y |
Generic | inhibitorTarget | FITS | | | | Y |
* Indicated in Samvera Technical Metadata Profile and/or HydraWorks Characterization Profile
**Indicated in PREMIS standard (digital object characteristics)
*** Addressed elsewhere in MIWG specifications
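The checksum rows above list "DLP?" as a possible extraction source, meaning the repository itself may need to compute those values. As a hedged sketch of what that could look like, the following computes MD5 and SHA-256 digests in a single pass, along with size, filename, and an extension-based MIME type guess. The function name characterize_file and the returned key names are assumptions for illustration, not part of the DLP design.

```python
import hashlib
import mimetypes
import os


def characterize_file(path: str, chunk_size: int = 1024 * 1024) -> dict:
    """Compute a few generic properties without loading the whole file in memory."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)

    # guess_type() only inspects the file extension; it does not read content.
    guessed_type, _encoding = mimetypes.guess_type(path)

    return {
        "filename": os.path.basename(path),
        "size": os.path.getsize(path),  # size in bytes
        "md5checksum": md5.hexdigest(),
        "SHA256checksum": sha256.hexdigest(),
        "mimetype": guessed_type,
    }
```

A content-based identification tool such as FITS or DROID remains the more reliable source for mimetype and PUID. The extension-based guess here is only a stand-in.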
Baseline for Audio
Format | Property Label | Extraction Source | Staff Display | Patron Display |
Audio | channels | FITS, HydraWorks | Y | Y |
Audio | duration | FITS, HydraWorks | Y | Y |
Audio | audioDataEncoding | FITS | | |
Audio | dataFormat | HydraWorks* | | |
Audio | avgBitRate | FITS | | |
Audio | avgPacketSize | FITS | | |
Audio | bitDepth | FITS, HydraWorks | | |
Audio | bitrate | FITS | | |
Audio | blockAlign | FITS | | |
Audio | blockSizeMax | FITS | | |
Audio | blockSizeMin | FITS | | |
Audio | byteOrder | FITS | | |
Audio | maxBitRate | FITS | | |
Audio | maxPacketSize | FITS | | |
Audio | numSamples | FITS | | |
Audio | offset | FITS, HydraWorks | | |
Audio | sampleRate | FITS, HydraWorks | | |
Audio | software | FITS | | |
Audio | soundField | FITS | | |
Audio | time | FITS | | |
Audio | wordSize | FITS | | |
*Listed in HydraWorks, but not in current FITS specification
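One way to make the Staff Display and Patron Display columns actionable is to treat them as per-property flags and filter extracted values by audience. The sketch below assumes a hypothetical PropertySpec structure and transcribes only four audio rows. It is not a description of how the DLP repository or Samvera implements display rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertySpec:
    label: str
    staff_display: bool
    patron_display: bool


# A few rows transcribed from the audio table; a blank cell is read as False.
AUDIO_SPECS = [
    PropertySpec("channels", True, True),
    PropertySpec("duration", True, True),
    PropertySpec("bitDepth", False, False),
    PropertySpec("sampleRate", False, False),
]


def visible_properties(values: dict, specs: list, audience: str) -> dict:
    """Return only the extracted values flagged for display to the given audience."""
    if audience not in ("staff", "patron"):
        raise ValueError("audience must be 'staff' or 'patron'")
    flag = "staff_display" if audience == "staff" else "patron_display"
    return {
        spec.label: values[spec.label]
        for spec in specs
        if getattr(spec, flag) and spec.label in values
    }


if __name__ == "__main__":
    extracted = {"channels": "2", "duration": "0:03:25", "bitDepth": "16"}
    print(visible_properties(extracted, AUDIO_SPECS, "patron"))
```

Properties that are extracted but not flagged for either audience would still be retained in the stored characterization output. This filter only governs what is shown.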
Baseline for Video
Format | Property Label | Extraction Source | Staff Display | Patron Display |
Video | duration | FITS, HydraWorks | Y | Y |
Video | frameRate | FITS, HydraWorks | Y | |
Video | frameRateMode | FITS | Y | |
Video | imageHeight | FITS, HydraWorks | Y | |
Video | imageWidth | FITS, HydraWorks | Y | |
Video | sampleRate | FITS, HydraWorks | Y | |
Video | apertureSetting | FITS | | |
Video | bitDepth | FITS | | |
Video | bitRate | FITS | | |
Video | blockSizeMax | FITS | | |
Video | blockSizeMin | FITS | | |
Video | channels | FITS | | |
Video | dataFormatType | FITS | | |
Video | digitalCameraManufacturer | FITS | | |
Video | digitalCameraModelName | FITS | | |
Video | exposureTime | FITS | | |
Video | exposureProgram | FITS | | |
Video | fNumber | FITS | | |
Video | focus | FITS | | |
Video | gain | FITS | | |
Video | GPS [~30 properties] | FITS | | |
Video | imageStabilization | FITS | | |
Video | shutterSpeedValue | FITS | | |
Video | videoStreamType | FITS | | |
Video | whiteBalance | FITS | | |
Video | xSamplingFrequency | FITS | | |
Video | ySamplingFrequency | FITS | | |
[Broadcast standard, color space, chroma subsampling, compression mode, display aspect ratio, scan type were previously available in FITS, but removed as of 2018. Similar property names exist in the Image section.]
Baseline for Documents/Text
Format | Property Label | Extraction Source | Staff Display | Patron Display |
Document | | FITS, HydraWorks | Y | Y |
Document | charset | FITS, HydraWorks | Y | |
Document | title | FITS, HydraWorks | | |
Document | author | FITS | | |
Document | language | FITS, HydraWorks | | |
Document | markupBasis | FITS, HydraWorks | | |
Document | markupBasisVersion | FITS | | |
Document | markupLanguage | FITS, HydraWorks | | |
Document | hasAnnotations | FITS | | |
Document | hasOutline | FITS | | |
Document | isProtected | FITS | | |
Document | isRightsManaged | FITS | | |
Document | isTagged | FITS | | |
Document | linebreak | FITS | | |
Document | paragraphCount | HydraWorks | | |
Document | tableCount | HydraWorks | | |
Document | graphicsCount | HydraWorks | | |
Baseline for Images
Format | Property Label | Extraction Source | Staff Display | Patron Display |
Image | byteOrder | FITS, HydraWorks | Y | |
Image | colorSpace | FITS, HydraWorks | Y | |
Image | compressionScheme | FITS, HydraWorks | Y | |
Image | imageHeight | FITS, HydraWorks | Y | |
Image | imageWidth | FITS, HydraWorks | Y | |
Image | iccProfileName | FITS | Y | |
Image | iccProfileVersion | FITS | Y | |
Image | apertureValue | FITS | | |
Image | bitsPerSample | FITS | | |
Image | brightnessValue | FITS | | |
Image | captureDevice | FITS, HydraWorks | | |
Image | cfaPattern | FITS | | |
Image | cfaPattern2 | FITS | | |
Image | colorMap | FITS, HydraWorks | | |
Image | digitalCameraManufacturer | FITS | | |
Image | digitalCameraModelName | FITS | | |
Image | digitalCameraSerialNo | FITS | | |
Image | exifVersion | FITS, HydraWorks | | |
Image | exposureBiasValue | FITS | | |
Image | exposureIndex | FITS | | |
Image | exposureProgram | FITS | | |
Image | exposureTime | FITS | | |
Image | extraSamples | FITS | | |
Image | flash | FITS | | |
Image | flashEnergy | FITS | | |
Image | focalLength | FITS | | |
Image | gps [31 Properties] | FITS, some HydraWorks | | |
Image | grayResponseUnit | FITS | | |
Image | imageProducer | FITS, HydraWorks | | |
Image | isoSpeedRating | FITS | | |
Image | lightSource | FITS | | |
Image | maxApertureValue | FITS | | |
Image | meteringMode | FITS | | |
Image | oECF | FITS | | |
Image | orientation | FITS, HydraWorks | | |
Image | primaryChromaticities [6+] | FITS | | |
Image | qualityLayers | FITS | | |
Image | referenceBlackWhite | FITS | | |
Image | resolutionLevels | FITS | | |
Image | samplesPerPixel | FITS | | |
Image | samplingFrequencyUnit | FITS | | |
Image | scannerManufacturer | FITS | | |
Image | scannerModelName | FITS | | |
Image | scannerModelNumber | FITS | | |
Image | scannerModelSerialNo | FITS | | |
Image | scanningSoftwareName | FITS, HydraWorks? | | |
Image | scanningSoftwareVersionNo | FITS, HydraWorks? | | |
Image | sensingMethod | FITS | | |
Image | shutterSpeedValue | FITS | | |
Image | spectralSensitivity | FITS | | |
Image | subjectDistance | FITS | | |
Image | tileHeight | FITS | | |
Image | tileWidth | FITS | | |
Image | whitePointXValue | FITS | | |
Image | whitePointYValue | FITS | | |
Image | xSamplingFrequency | FITS | | |
Image | ySamplingFrequency | FITS | | |
Image | YCbCrCoefficients | FITS | | |
Image | YCbCrPositioning | FITS | | |
Image | YCbCrSubSampling | FITS | | |
Appendix A – Samvera Technical Metadata Profile (Summary)
See the Samvera wiki for more detail. Note: this community-developed profile also incorporates review of the Europeana Technical Metadata specifications.
Namespaces
Prefix | Namespace |
ebucore | |
rdfs | |
premis | |
rdf | |
pronom | (unpublished) |
sweetjpl | |
Properties
Element/Property | Label | Obligation | Repeat | Description | RDF Range |
ebucore:filename | File Name | Required | N | Equivalent: nfo:fileName, premis:hasOriginalName | xsd:string |
ebucore:fileSize | File Size | Required | N | Equivalent: nfo:fileSize, dct:extent, premis:hasSize | xsd:integer |
rdfs:label* | Label | Recommended | N | Descriptive label (for file, vs. Title) | xsd:string |
ebucore:dateCreated | Date Created | Recommended | N | | |
premis:hasMessageDigest | File Hash | Recommended | Y | The output of the message digest algorithm. | xsd:string |
premishash:md5 | MD5 Checksum | Optional | N | MD5 checksum value | [string?] |
rdf:type* | File Format Type | Recommended | N | Category or genre of the file | pcdm: |
ebucore:hasMimeType | Has Mime Type | Recommended | N | Equivalent: dcterms:format, pronom:internetMediaType | xsd:string |
ebucore:dateModified** | Date Modified | Optional | N | Equivalent: dcterms:modified | [string/date] |
pronom:puid | File Format | Optional | N | PRONOM ID to uniquely identify file format | xsd:string |
sweetjpl:hasByteOrder*** | Byte Order | Optional | N | [Not supplied] | |
* Not included in Emory recommendations for Technical/Characterization Metadata, because we consider them to be more descriptive/manual in nature vs. automated characterization
**M-IWG recommends additional clarifications on what type of date/activity is stored here
*** Limited extraction in FITS (available for some formats only)
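The Obligation and Repeat columns in the profile can be read as simple validation rules. The following is a minimal sketch that checks a file's metadata record against a subset of the rows above. The PROFILE dictionary, the check_record function, and the record shape (property name mapped to a list of values) are all hypothetical and are not drawn from Samvera or HydraWorks code.

```python
# Subset of the profile table: obligation and repeatability per property.
PROFILE = {
    "ebucore:filename": {"obligation": "Required", "repeatable": False},
    "ebucore:fileSize": {"obligation": "Required", "repeatable": False},
    "ebucore:dateCreated": {"obligation": "Recommended", "repeatable": False},
    "premis:hasMessageDigest": {"obligation": "Recommended", "repeatable": True},
    "ebucore:hasMimeType": {"obligation": "Recommended", "repeatable": False},
    "pronom:puid": {"obligation": "Optional", "repeatable": False},
}


def check_record(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for prop, rule in PROFILE.items():
        values = record.get(prop, [])
        if rule["obligation"] == "Required" and not values:
            problems.append(f"missing required property {prop}")
        if not rule["repeatable"] and len(values) > 1:
            problems.append(f"{prop} is not repeatable but has {len(values)} values")
    return problems


if __name__ == "__main__":
    record = {
        "ebucore:filename": ["example.tif"],
        "ebucore:fileSize": [1048576],
        "premis:hasMessageDigest": ["urn:md5:...", "urn:sha256:..."],
    }
    print(check_record(record))
```

Only Required properties produce an error when absent in this sketch. Whether missing Recommended properties should raise warnings is a policy decision left open here.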
Appendix B – MWG Analysis Documents and FITS Information (2016 - 2018)
Detailed element-level analysis
The Metadata Working Group’s analysis drew on multiple sources for technical metadata:
- FITS (documentation and use of standalone application)
- Hydra-Works Extraction of FITS and metadata properties defined as RDF
- Hydra Technical Metadata Profile
- Emory DAMS (provides extracted technical metadata when indexing file assets)
- The Keep AV Technical Metadata
Working Files
- Spreadsheet (2016)
FITS Testing (2016)
The team performed limited testing of FITS in multiple environments (standalone: versions 0.10 and 0.8, and Sufia 6-integrated).
FITS: Use of Metadata Standards
When configured to output standard XML metadata, FITS references the following standards to encode its metadata values:
- Audio: AES
- Documents: DocumentMD
- Images: MIX
- Text: TextMD
- Video: EbuCore