Preservation Events and Workflows Metadata Specification
Prepared by: DLP Metadata Implementation Working Group (M-IWG)
Last Revised Date: March 2018
Status: Final Draft for Approval
Reviewed by: M-IWG, DLP Core Team
Overview
As part of its larger charter, the DLP Metadata Implementation Working Group (M-IWG) includes a task for identifying:
Metadata specifications and requirements for… [a] Preservation Metadata standard (e.g. use of PREMIS/events)
This document includes specifications for Preservation Events and Workflows metadata relative to requirements identified by the M-IWG and Digital Preservation Functional Requirements Group. Requirements for individual digital files’ characteristics are documented separately in M-IWG’s Technical/Characterization Metadata Specification. Additional supplemental preservation metadata (aka “Source Metadata”) that describes the original carrier or originating environment from which a digital surrogate is derived may be included in a digital object’s preservation package, but is not directly actionable by the DLP repository software.
This specification is implementation-agnostic because custom local development will be required to generate events and workflows auditing; these data entities could be managed in RDF or as other types of data. System-initiated events will likely be performed by multiple tools in different layers of the Samvera stack (some locally developed, some within Hyrax, some within Fedora itself).
Detailed information about specific metadata units is documented in the Preservation Events and Workflows worksheet.
Work Process
A review of current state metadata and practices was conducted, including the following sources:
The Keep
OpenEmory
ETDs (legacy application)
Library of Congress Preservation Event Types (2017 revision)
Emory Libraries Digital Preservation Policy
Emory Libraries Digital Collections Retention Policy
A review of the preservation audit metadata currently generated for our Fedora 3 systems revealed variable practices: some systems utilize PREMIS (including some locally defined event names) and some record audit data in other ways. M-IWG’s initial approach was to identify local events currently in use and share the inventory with the Digital Preservation FRG for further analysis. Additionally, M-IWG summarized the revised Library of Congress Preservation Event Types list, which was released during the course of our work and considerably expanded the original set of events.
The Digital Preservation FRG reviewed the list of local and standard LC Preservation Events and assessed them for future state utilization. Their resulting requirements proposed a broader use of Workflows in addition to Events, which expands on the conventions of the LC Preservation Events list. Workflows and Events are also referenced in the 2018 Digital Collections Steering Committee’s Retention Policy. M-IWG analyzed the new requirements and policy outputs to produce a metadata specification to support these needs.
Preservation Events and Workflow Metadata Specification
Full details for the metadata described below are found in the Preservation Events and Workflows worksheet. (Note: this spreadsheet includes information for some metadata units documented in the Rights Metadata Inventory.)
The diagram that follows shows the relationship of preservation workflows and events as related to individual objects in the repository context.
Some semantic units apply to both workflows and events, as noted below.
Workflow-level Metadata
[Workflow] Identifier
[Workflow] Type
Object Identifier
Initiating User
Start
End
Notes
Outcome
Rights Basis
Rights Notes
Rights Basis - Review Date
Rights Basis - Reviewer
Rights Basis - URI
Object Visibility Change
Event-level Metadata
[Event] Identifier
Workflow Identifier [for a parent workflow]
[Event] Type
Object Identifier
Initiating User [or system process name]
Start
End
Outcome
System Event Detail
System Event Software Version
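The two field lists above can be sketched as a pair of record types. This is a minimal illustration in Python, not a prescribed serialization: the specification is implementation-agnostic, so the class names, attribute names, choice of required versus optional fields, and the sample values ("ingest", "wf-0001", and so on) are all assumptions made for the example. The authoritative definitions remain in the Preservation Events and Workflows worksheet.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import List, Optional


@dataclass
class PreservationEvent:
    """One event-level record (attribute names are illustrative)."""
    identifier: str
    event_type: str
    object_identifier: str
    initiating_user: str  # a staff user or a system process name
    start: datetime
    end: Optional[datetime] = None
    outcome: Optional[str] = None
    workflow_identifier: Optional[str] = None  # parent workflow, if any
    system_event_detail: Optional[str] = None
    system_event_software_version: Optional[str] = None


@dataclass
class PreservationWorkflow:
    """One workflow-level record; rights fields live here, not on events."""
    identifier: str
    workflow_type: str
    object_identifier: str
    initiating_user: str
    start: datetime
    end: Optional[datetime] = None
    notes: Optional[str] = None
    outcome: Optional[str] = None
    rights_basis: Optional[str] = None
    rights_notes: Optional[str] = None
    rights_basis_review_date: Optional[date] = None
    rights_basis_reviewer: Optional[str] = None
    rights_basis_uri: Optional[str] = None
    object_visibility_change: Optional[str] = None
    events: List[PreservationEvent] = field(default_factory=list)

    def add_event(self, event: PreservationEvent) -> None:
        """Attach an event and record this workflow as its parent."""
        event.workflow_identifier = self.identifier
        self.events.append(event)


# Hypothetical sample values, for illustration only.
workflow = PreservationWorkflow(
    identifier="wf-0001",
    workflow_type="ingest",
    object_identifier="obj-123",
    initiating_user="staff_user",
    start=datetime(2018, 3, 1, 9, 0),
)
event = PreservationEvent(
    identifier="ev-0001",
    event_type="fixity check",
    object_identifier="obj-123",
    initiating_user="fixity_service",
    start=datetime(2018, 3, 1, 9, 5),
    outcome="success",
)
workflow.add_event(event)
```

Keeping the parent link as a plain identifier on the event, rather than a nested object reference, is one way to let events exist without a workflow (for example, scheduled system checks) while still allowing human-initiated workflows to group their events.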
Preservation Workflows Rights Information
As noted in the Rights Metadata specification, additional rights information may be recorded with a human-initiated preservation workflow. In these cases, repository staff users may need to record additional rights information to explain the context of a particular preservation activity that impacts access to material. This metadata is more appropriate to workflows than to individual events, because the identified workflows are primarily human-initiated and may be triggered by a rights-related factor. Assigning rights metadata to each individual Event would also be duplicative.
Preservation Workflows/Events Rights Metadata units identified:
Preservation Rights Basis [e.g. In Copyright - Section 108; Administrative Decision]
Preservation Rights Basis - Review Date
Preservation Rights Basis - Reviewer
Preservation Rights Basis - Note
Preservation Rights Basis - URI
Additional Recommendations for Implementation
The following activities are recommended for implementation phase efforts:
Adjust or expand metadata as needed to accommodate specific event-level functionality once more implementation details are known. Specific event outcome details (e.g. fixity check results) could be stored as lengthy text notes, but may benefit from more granular field definitions.
Revisit metadata units for optimization in their final implementation serialization: revisions may be needed if the metadata is stored in RDF vs. XML vs. a relational database.
Index and/or enable events and workflows metadata in search or reporting capabilities, so that content curators are able to monitor the preservation health of their content and the system as a whole.
Work with the implementation team to determine any required events that should always occur for a given preservation workflow, so that if the events do not run when expected, this information is also recorded.
For migration purposes, migrate legacy content’s PREMIS and/or audit trails as supplemental preservation files, and generate new preservation metadata moving forward relative to the date of re-ingest to the DLP repository.
Establish a local vocabulary for labeling/relating supplemental preservation files, extending the PCDM File Use vocabulary if appropriate (see suggested values).
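The recommendation above about required events (recording when an expected event does not run) could be supported by a simple completeness check. The sketch below is one possible approach in Python, under stated assumptions: the mapping name REQUIRED_EVENTS, the function name, and the workflow and event type values are hypothetical, since the actual required events are still to be determined with the implementation team.

```python
from typing import Dict, Iterable, List

# Hypothetical mapping of workflow type to the event types that should
# always occur for it. Real values would come from the implementation team.
REQUIRED_EVENTS: Dict[str, List[str]] = {
    "ingest": ["virus check", "fixity check"],
}


def missing_required_events(
    workflow_type: str, recorded_event_types: Iterable[str]
) -> List[str]:
    """Return required event types that were not recorded for a workflow.

    Workflow types with no entry in REQUIRED_EVENTS have no requirements,
    so an empty list is returned for them.
    """
    recorded = set(recorded_event_types)
    return [
        event_type
        for event_type in REQUIRED_EVENTS.get(workflow_type, [])
        if event_type not in recorded
    ]


# An ingest workflow that only recorded a virus check.
missing = missing_required_events("ingest", ["virus check"])
print(missing)  # ['fixity check']
```

The returned list could then be written into the workflow's Notes or Outcome, so that the absence of an expected event is itself part of the audit record.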