Preservation Events and Workflows Metadata Specification

Prepared by: DLP Metadata Implementation Working Group (M-IWG)

Last Revised Date: March 2018

Status: Final Draft for Approval

Reviewed by: M-IWG, DLP Core Team




Overview

As part of its larger charter, the DLP Metadata Implementation Working Group (M-IWG) includes a task for identifying:

Metadata specifications and requirements for… [a] Preservation Metadata standard (e.g. use of PREMIS/events)

This document includes specifications for Preservation Events and Workflows metadata relative to requirements identified by the M-IWG and the Digital Preservation Functional Requirements Group. Requirements for the characteristics of individual digital files are documented separately in M-IWG’s Technical/Characterization Metadata Specification. Additional supplemental preservation metadata (aka “Source Metadata”) that describes the original carrier or originating environment from which a digital surrogate is derived may be included in a digital object’s preservation package, but is not directly actionable by the DLP repository software.

This specification is implementation-agnostic because custom local development will be required to generate events and workflows auditing: these data entities could be managed in RDF or as other types of data. System-initiated events will likely be performed by multiple tools in different layers of the Samvera stack (some locally developed, some within Hyrax, some within Fedora itself).

Detailed information about specific metadata units is documented in the Preservation Events and Workflows worksheet.

Work Process

A review of current state metadata and practices was conducted, including the following sources:

A review of the preservation audit metadata currently generated for our Fedora 3 revealed variable practices, some utilizing PREMIS (including some locally defined event names) and some recording audit data in other ways. M-IWG’s initial approach was to identify local events currently in use and share the inventory with the Digital Preservation FRG for further analysis. Additionally, M-IWG summarized the revised list of Library of Congress Preservation Event Types, which was released during the course of our work and considerably expanded the original set of events.

The Digital Preservation FRG reviewed the list of local and standard LC Preservation Events and assessed them for future-state use. Their resulting requirements proposed a broader use of Workflows in addition to Events, which expands on the conventions of the LC Preservation Events list. Workflows and Events are also referenced in the 2018 Digital Collections Steering Committee’s Retention Policy. M-IWG analyzed the new requirements and policy outputs to produce a metadata specification to support these needs.

Preservation Events and Workflows Metadata Specification

Full details for the metadata described below are found in the Preservation Events and Workflows worksheet. (Note: this spreadsheet includes information for some metadata units documented in the Rights Metadata Inventory.)


The diagram that follows shows the relationship of preservation workflows and events as related to individual objects in the repository context.


Some semantic units apply to both workflows and events, as noted below.

Workflow-level Metadata

  • [Workflow] Identifier

  • [Workflow] Type

  • Object Identifier

  • Initiating User

  • Start

  • End

  • Notes

  • Outcome

  • Rights Basis

  • Rights Notes

  • Rights Basis - Review Date

  • Rights Basis - Reviewer

  • Rights Basis - URI

  • Object Visibility Change

Event-level Metadata

  • [Event] Identifier

  • Workflow Identifier [for a parent workflow]

  • [Event] Type

  • Object Identifier

  • Initiating User [or system process name]

  • Start

  • End

  • Outcome

  • System Event Detail

  • System Event Software Version
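
As an illustration only, the two unit lists above could be modeled as a pair of record types in which each event optionally points back to a parent workflow. This is a minimal sketch, not a prescribed implementation: the class names, field names, optional/required choices, and the sample values ("ingest", "fixity check", the identifiers) are assumptions made for the example, and the specification itself remains serialization-agnostic.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class PreservationEvent:
    """Event-level metadata units (field names are illustrative)."""
    identifier: str
    event_type: str
    object_identifier: str
    initiator: str  # initiating user or system process name
    start: datetime
    end: Optional[datetime] = None
    outcome: Optional[str] = None
    workflow_identifier: Optional[str] = None  # parent workflow, if any
    system_event_detail: Optional[str] = None
    system_event_software_version: Optional[str] = None


@dataclass
class PreservationWorkflow:
    """Workflow-level metadata units (field names are illustrative)."""
    identifier: str
    workflow_type: str
    object_identifier: str
    initiating_user: str
    start: datetime
    end: Optional[datetime] = None
    notes: Optional[str] = None
    outcome: Optional[str] = None
    rights_basis: Optional[str] = None
    rights_notes: Optional[str] = None
    rights_basis_review_date: Optional[datetime] = None
    rights_basis_reviewer: Optional[str] = None
    rights_basis_uri: Optional[str] = None
    object_visibility_change: Optional[str] = None
    events: List[PreservationEvent] = field(default_factory=list)

    def add_event(self, event: PreservationEvent) -> None:
        """Attach an event and record this workflow as its parent."""
        event.workflow_identifier = self.identifier
        self.events.append(event)


# Hypothetical usage: one workflow with a single system-initiated event.
workflow = PreservationWorkflow(
    identifier="wf-001",
    workflow_type="ingest",
    object_identifier="obj-123",
    initiating_user="curator1",
    start=datetime(2018, 3, 1, 9, 0),
)
workflow.add_event(
    PreservationEvent(
        identifier="ev-001",
        event_type="fixity check",
        object_identifier="obj-123",
        initiator="fixity-service",
        start=datetime(2018, 3, 1, 9, 1),
        outcome="success",
    )
)
```

Keeping the parent link on the event (rather than only a list on the workflow) mirrors the "Workflow Identifier [for a parent workflow]" unit, and allows events that occur outside any workflow to leave it empty.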

Preservation Workflows Rights Information

As noted in the Rights Metadata specification, additional rights information may be recorded with a human-initiated preservation workflow. In these cases, repository staff users may need to record additional rights information to explain the context of a particular preservation activity that impacts access to material. This metadata is more appropriate to workflows than to individual events, because the identified workflows are primarily human-initiated and may be triggered by a rights-related factor. Assigning rights metadata to each individual Event would also be duplicative.

Preservation Workflows/Events Rights Metadata units identified:

  • Preservation Rights Basis [e.g. In Copyright - Section 108; Administrative Decision]

  • Preservation Rights Basis - Review Date

  • Preservation Rights Basis - Reviewer

  • Preservation Rights Basis - Note

  • Preservation Rights Basis - URI
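
One way to picture the rights units listed above is as a single small record attached once to a workflow, rather than repeated on each event. The sketch below is an assumption-laden example: the class name, field names, the helper `rights_record`, and the reviewer value are hypothetical, and only the basis value "Administrative Decision" comes from the list above.

```python
from dataclasses import asdict, dataclass
from datetime import date
from typing import Dict, Optional


@dataclass
class PreservationRightsBasis:
    """Rights units recorded at the workflow level (names are illustrative)."""
    basis: str
    review_date: Optional[date] = None
    reviewer: Optional[str] = None
    note: Optional[str] = None
    uri: Optional[str] = None


def rights_record(rights: PreservationRightsBasis) -> Dict[str, str]:
    """Return only the populated units, with dates rendered as ISO 8601 text."""
    record: Dict[str, str] = {}
    for key, value in asdict(rights).items():
        if value is None:
            continue  # omit units that were not recorded
        record[key] = value.isoformat() if isinstance(value, date) else value
    return record


# Hypothetical usage: a staff reviewer records the basis for a workflow.
rights = PreservationRightsBasis(
    basis="Administrative Decision",
    review_date=date(2018, 3, 15),
    reviewer="staff-reviewer",
)
summary = rights_record(rights)
```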

Additional Recommendations for Implementation

The following activities are recommended for implementation phase efforts:

  • Adjust or expand metadata as needed to accommodate specific event-level functionality once more implementation details are known. Specific event outcome details (e.g. fixity check results) could be stored as lengthy text notes, but may benefit from more granular field definitions.

  • Revisit metadata units for optimization in their final implementation serialization: revisions may be needed if the metadata is stored in RDF vs. XML vs. a relational database

  • Index events and workflows metadata and/or expose it through search or reporting capabilities, so that content curators are able to monitor the preservation health of their content and of the system as a whole

  • Work with implementation team to determine any required events that should always occur for a given preservation workflow, so that if the events do not run when expected, this information is also recorded

  • For migration purposes, migrate legacy content’s PREMIS and/or audit trails as supplemental preservation files, and generate new preservation metadata going forward from the date of re-ingest to the DLP repository

  • Establish a local vocabulary for labeling/relating supplemental preservation files, extending the PCDM File Use vocabulary if appropriate (see suggested values)
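
The recommendation above about events that should always occur for a given workflow can be sketched as a simple comparison between an expected list and the events actually recorded. Everything specific in this example is an assumption: the `REQUIRED_EVENTS` mapping, the workflow type "ingest", and the event type labels are placeholders, since the actual required events are to be determined with the implementation team.

```python
from typing import Dict, Iterable, List

# Hypothetical mapping of workflow type to the event types expected to run.
REQUIRED_EVENTS: Dict[str, List[str]] = {
    "ingest": ["virus check", "fixity check", "ingestion"],
}


def missing_required_events(
    workflow_type: str, recorded_event_types: Iterable[str]
) -> List[str]:
    """Return required event types that were not recorded, in expected order."""
    required = REQUIRED_EVENTS.get(workflow_type, [])
    recorded = set(recorded_event_types)
    return [event_type for event_type in required if event_type not in recorded]


# Hypothetical usage: the fixity check did not run during this ingest.
missing = missing_required_events("ingest", ["virus check", "ingestion"])
```

A non-empty result could itself be recorded against the workflow (for example in its Notes or Outcome), so that the absence of an expected event is captured rather than silently lost.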