Emory’s Cor preservation repository provides preservation support for Emory’s unique and rare digital assets. Its preservation services include:
Fixity checking
Extraction of technical and administrative metadata
Virus checking
File format identification and validation
Replication of files
Works ingested into the repository also receive preservation events and workflow metadata that document major lifecycle activities such as Accessioning, Ingest, Decommissioning, and Deletion.
Storage
Content files submitted to the Cor repository are first transferred to a pre-ingest space, where they are retained for a minimum of 6 months after a Collection is fully ingested. Once quality assurance testing is completed, they are transferred to Glacier storage and retained permanently.
Ingested content files are stored in Amazon S3 in the US-East region (Virginia) and are retained permanently.
Every 48 hours, newly ingested content files are replicated to a second S3 bucket located in the US-West region (Oregon).
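As a rough illustration of the replication step described above, the following Python sketch compares two checksum inventories and lists the objects that still need to be copied to the replica. The inventory dictionaries, object keys, digest values, and the function name pending_replication are all hypothetical; this is not the Cor repository's actual replication code, and a real job would read bucket listings through an AWS SDK rather than from in-memory data.

```python
from typing import Dict, List


def pending_replication(primary: Dict[str, str], replica: Dict[str, str]) -> List[str]:
    """Return keys that are missing from the replica or whose checksum differs."""
    return sorted(
        key for key, checksum in primary.items()
        if replica.get(key) != checksum
    )


# Hypothetical inventories: object key -> recorded checksum (placeholder values).
primary_inventory = {
    "work-001/master.tif": "aaa111",
    "work-001/derivative.jpg": "bbb222",
    "work-002/master.wav": "ccc333",
}
replica_inventory = {
    "work-001/master.tif": "aaa111",
    "work-001/derivative.jpg": "stale0",
}

to_copy = pending_replication(primary_inventory, replica_inventory)
print(to_copy)
```

Comparing checksums rather than only key names means a replica copy that exists but no longer matches the primary is also queued for copying.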
Backups and Restoration
The Cor repository receives regular backups of the following preservation data, which are retained for a minimum of 6 months:
Application databases: backed up daily
Fedora (preservation metadata): backed up daily
SOLR index: backed up daily
S3 content storage: replicated every 48 hours
A full backup and restoration of production data was last tested in June 2020.
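The retention rule above can be expressed as a simple date check. The sketch below is illustrative only: it assumes that "6 months" can be approximated as 183 days, that backups older than the minimum may be pruned, and that backups are tracked by name and date. None of these details are stated in the policy, and the names MIN_RETENTION and may_prune are hypothetical.

```python
from datetime import date, timedelta

# Assumption: "6 months" is approximated as 183 days for this sketch.
MIN_RETENTION = timedelta(days=183)


def may_prune(backup_date: date, today: date) -> bool:
    """A backup may be pruned only once it is older than the minimum retention."""
    return today - backup_date > MIN_RETENTION


# Hypothetical backup catalogue: backup name -> date the backup was taken.
today = date(2020, 12, 31)
backups = {
    "db-2020-01-15": date(2020, 1, 15),
    "db-2020-09-01": date(2020, 9, 1),
}

prunable = [name for name, taken in backups.items() if may_prune(taken, today)]
print(prunable)
```

Because the policy states a minimum, a check like this only identifies backups that could be removed; it does not imply that they must be.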
Implemented Preservation Events and Workflows: Summary
The Cor repository supports both major preservation lifecycle workflows and specific preservation events/actions. More information about requirements identified by the Preservation Functional Requirements Group is available on our wiki.
System-generated Preservation Events
| Event/Service | Workflow Context | Works | Preservation Master Files | Derivative Files | Notes |
|---|---|---|---|---|---|
| Policy assignment | Ingest | X | | | Works: records the Visibility/access control assigned at time of Ingest |
| Modification | At-rest/monitoring | | | | Works: records the initiating user and timestamp when a work is modified after ingest |
| Validation | Ingest | X | X | | Works: validates that SIP includes all required components. Files: FITS validation for the identified format |
| Virus check | Ingest | | X | | Master file only is scanned at time of ingest |
| Message digest calculation | Ingest | | X | X | All files: sha1. Master file: sha1, md5, sha256 |
| Characterization | Ingest | | X | X * | *Derivatives receive minimal characterization |
| File submission | Ingest | | X | X | All files are submitted to preservation storage and then a second copy is replicated |
| Fixity check | Accession, Ingest | | X | X | Files transferred to AWS in bulk receive fixity checking, but events are not recorded until ingest. Fixity services check all files using sha1. Can also be run on-demand in the Curate product |
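The message digest and fixity events listed above can be sketched with Python's standard hashlib module. This is a minimal illustration, not the repository's implementation: the function names compute_digests and fixity_ok and the temporary demo file are hypothetical. The algorithm choices (sha1 for every file, plus md5 and sha256 for master files) follow the notes above.

```python
import hashlib
import os
import tempfile
from typing import Dict, Iterable


def compute_digests(path: str, algorithms: Iterable[str]) -> Dict[str, str]:
    """Read a file once in chunks and return a hex digest per named algorithm."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def fixity_ok(path: str, recorded_sha1: str) -> bool:
    """Recompute sha1 and compare it with the digest recorded earlier."""
    return compute_digests(path, ["sha1"])["sha1"] == recorded_sha1


# Demo on a throwaway file standing in for a master file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    demo_path = tmp.name

master_digests = compute_digests(demo_path, ["sha1", "md5", "sha256"])
passes_before_change = fixity_ok(demo_path, master_digests["sha1"])

# Simulate corruption by appending a byte, then re-run the fixity check.
with open(demo_path, "ab") as handle:
    handle.write(b"!")
passes_after_change = fixity_ok(demo_path, master_digests["sha1"])

os.remove(demo_path)
print(master_digests["sha1"], passes_before_change, passes_after_change)
```

Reading the file in chunks keeps memory use flat for large master files, and updating all hashers in one pass avoids reading the file three times.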
Preservation Workflows
The following major lifecycle workflows were initially identified through the Digital Preservation Functional Requirements Group and have been further refined during the implementation of the repository system. In version 1 of the Cor repository, some workflows are not fully automated, and some workflows are not yet implemented. Future releases of the repository will expand on this initial functionality.
| Workflow | Description | Status | Implementation Notes |
|---|---|---|---|
| Accession | Process by which depositors prepare the components of a digital object for submission to Emory’s preservation repository; most activities occur outside of the repository system | Implemented | Manual processes for appraisal and preparation of material; fixity checking during file transfers to pre-ingest storage; automated processes for generating ingest-ready submission packages; repository support for Accession workflow metadata |
| Ingest | Process in which the repository software collects or generates the components of a digital object and transfers it to the preservation environment | Implemented | Repository performs highest priority preservation events and provides support for Ingest workflow metadata |
| At-rest/Monitoring | Ongoing monitoring of ingested objects and files | Partial | Repository provides fixity checking (ongoing and on-demand) with basic reporting; repository provides storage replication and monitoring |
| Versioning | Formal capture of modifications to an ingested object and its files so that an actionable version history is created | Planned | Repository enables basic audit trails for FileSets |
| Dissemination | Large-scale dissemination of objects to third parties for preservation or discovery | Planned | |
| Decommission | Long-term or permanent removal of object from public access | Implemented | Manual review processes with support for Decommission workflow metadata |
| Deletion | Permanent removal of content files from public access and repository | Implemented | Manual review and deletion processes with support for Deletion workflow metadata |