Preservation Services in the Cor Repository

Emory’s Cor preservation repository provides preservation support for Emory’s unique and rare digital assets. Its preservation services include:

  • Fixity checking
  • Extraction of technical and administrative metadata
  • Virus checking
  • File format identification and validation
  • Replication of files

Works ingested into the repository also receive preservation events and workflow metadata that document major lifecycle activities such as Accessioning, Ingest, Decommissioning, and Deletion.
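As an illustration of what a recorded preservation event might carry, the sketch below models one as a small immutable record. The class name, field names, and outcome values are assumptions made for this example and are not taken from the Cor data model; only the lifecycle activity names come from the text above.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

# Lifecycle activities named in this document.
LIFECYCLE_ACTIVITIES = ("Accessioning", "Ingest", "Decommissioning", "Deletion")


@dataclass(frozen=True)
class PreservationEvent:
    """Hypothetical shape of one preservation event (not Cor's schema)."""
    event_type: str          # e.g. "Fixity check" or "Virus check"
    workflow: str            # lifecycle activity the event belongs to
    outcome: str             # assumed vocabulary, e.g. "success" / "failure"
    initiating_user: str     # person or system account that triggered it
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    details: str = ""

    def to_record(self):
        """Return a plain dict with an ISO 8601 timestamp, ready to serialize."""
        record = asdict(self)
        record["timestamp"] = self.timestamp.isoformat()
        return record
```

Keeping the record immutable reflects the idea that an event documents something that already happened and should not be edited afterwards.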

Storage

Content files submitted to the Cor repository are first transferred to a pre-ingest space (Amazon EFS storage), where they are retained for a minimum of 6 months after a Collection is fully ingested. Once quality assurance testing is complete, the files are transferred to Glacier storage and retained permanently.

Ingested content files are stored in Amazon S3 in the US-East region (Virginia) and are retained permanently.

Every 48 hours, newly ingested content files are replicated to a second S3 bucket located in the US-West region (Oregon).
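One way to picture the 48-hour replication pass is as a comparison between two manifests. The sketch below is illustrative only: it assumes each bucket can be summarized as a mapping from object key to sha1 digest, and all function and variable names are hypothetical. It does not describe how Cor actually performs replication, which may rely on AWS-native features.

```python
from datetime import datetime, timedelta, timezone

# Interval stated in this document.
REPLICATION_INTERVAL = timedelta(hours=48)


def keys_pending_replication(primary, replica):
    """Return keys that are missing from the replica or differ from the primary.

    Both arguments are dicts mapping object key -> sha1 hex digest
    (an assumed manifest format, used only for this example).
    """
    return sorted(
        key for key, digest in primary.items()
        if replica.get(key) != digest
    )


def next_replication_due(last_run):
    """Return the time at which the next replication pass would be due."""
    return last_run + REPLICATION_INTERVAL
```

Comparing digests rather than only key names means a replica object that exists but differs from the primary is also flagged.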

Backups and Restoration

The Cor repository receives regular backups of the following preservation data, which are retained for a minimum of 6 months:

  • Application databases: backed up daily
  • Fedora (preservation metadata): backed up daily
  • SOLR index: backed up daily
  • S3 content storage: replicated every 48 hours

A full backup and restoration of production data was last tested in June 2020.
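The six-month minimum retention for backups described in this section can be expressed as a simple date calculation. This is a minimal sketch under stated assumptions: "6 months" is interpreted as six calendar months, the day is clamped to the end of shorter months, and the function names are hypothetical rather than part of any Cor tooling.

```python
import calendar
from datetime import date

MINIMUM_RETENTION_MONTHS = 6  # minimum stated in this document


def add_months(start, months):
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month, so
    31 August plus six months falls on the last day of February.
    """
    index = start.year * 12 + (start.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def earliest_removal_date(created):
    """Return the first date on which a backup has met the minimum retention."""
    return add_months(created, MINIMUM_RETENTION_MONTHS)


def has_met_minimum_retention(created, today):
    """Return True once a backup created on `created` is at least six months old."""
    return today >= earliest_removal_date(created)
```

Because the policy states a minimum, meeting it only makes a backup eligible for removal; it does not imply that removal happens on that date.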

Implemented Preservation Events and Workflows: Summary

The Cor repository supports both major preservation lifecycle workflows and specific preservation events and actions. More information about the requirements identified by the Preservation Functional Requirements Group is available on our wiki.

System-generated Preservation Events

| Event/Service | Workflow Context | Works | Preservation Master Files | Derivative Files | Notes |
|---|---|---|---|---|---|
| Policy assignment | Ingest | X | | | Works: records the Visibility/access control assigned at time of Ingest |
| Modification | At-rest/monitoring | | | | Works: records the initiating user and timestamp when a work is modified after ingest |
| Validation | Ingest | X | X | | Works: validates that SIP includes all required components. Files: FITS validation for the identified format |
| Virus check | Ingest | | X | | Master file only is scanned at time of ingest |
| Message digest calculation | Ingest | | X | X | All files: sha1. Master file: sha1, md5, sha256 |
| Characterization | Ingest | | X | X * | *Derivatives receive minimal characterization |
| File submission | Ingest | | X | X | All files are submitted to preservation storage and then a second copy is replicated |
| Fixity check | Accession, Ingest | | X | X | Files transferred to AWS in bulk receive fixity checking, but events are not recorded until ingest. Fixity services check all files using sha1. Both copies of files in S3 are checked every six months. Can also be run on-demand in the Curate product |
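The message digest and fixity events listed above can be illustrated with Python's standard hashlib module. This is a sketch, not the repository's implementation: the function names and the shape of the result are assumptions, and only the algorithm choices (sha1 for all files; sha1, md5, and sha256 for master files) come from this page.

```python
import hashlib

# Algorithm sets as described on this page.
MASTER_ALGORITHMS = ("sha1", "md5", "sha256")
DERIVATIVE_ALGORITHMS = ("sha1",)


def compute_digests(path, algorithms, chunk_size=1024 * 1024):
    """Read the file once in chunks and return {algorithm: hex digest}."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def fixity_check(path, recorded_sha1):
    """Recompute sha1 and compare it with the digest recorded earlier.

    The returned dict is a hypothetical event payload, not Cor's format.
    """
    actual = compute_digests(path, ("sha1",))["sha1"]
    return {
        "path": str(path),
        "algorithm": "sha1",
        "expected": recorded_sha1,
        "actual": actual,
        "outcome": "pass" if actual == recorded_sha1 else "fail",
    }
```

Reading in fixed-size chunks keeps memory use flat for large master files, and feeding every hasher from the same read avoids scanning the file once per algorithm.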

Preservation Workflows

The following major lifecycle workflows were initially identified through the Digital Preservation Functional Requirements Group and have been further refined during the implementation of the repository system. In version 1 of the Cor repository, some workflows are not fully automated, and some workflows are not yet implemented. Future releases of the repository will expand on this initial functionality.

| Workflow | Description | Status | Implementation Notes |
|---|---|---|---|
| Accession | Process by which depositors prepare the components of a digital object for submission to Emory’s preservation repository; most activities occur outside of the repository system | Implemented | Manual processes for appraisal and preparation of material. Fixity checking during file transfers to pre-ingest storage. Automated processes for generating ingest-ready submission packages. Repository support for Accession workflow metadata |
| Ingest | Process in which the repository software collects or generates the components of a digital object and transfers it to the preservation environment | Implemented | Repository performs highest priority preservation events and provides support for Ingest workflow metadata |
| At-rest/Monitoring | Ongoing monitoring of ingested objects and files | Partial | Repository provides fixity checking (ongoing and on-demand) with basic reporting. Repository provides storage replication and monitoring |
| Versioning | Formal capture of modifications to an ingested object and its files so that an actionable version history is created | Planned | Repository enables basic audit trails for FileSets |
| Dissemination | Large-scale dissemination of objects to third parties for preservation or discovery | Planned | |
| Decommission | Long-term or permanent removal of object from public access | Implemented | Manual review processes with support for Decommission workflow metadata |
| Deletion | Permanent removal of content files from public access and repository | Implemented | Manual review and deletion processes with support for Deletion workflow metadata |
