Preservation Services in the Cor Repository
- Emily Porter
Emory’s Cor preservation repository provides robust preservation support for Emory’s unique and rare digital assets. The repository’s preservation services include:
- Fixity checking
- Extraction of technical and administrative metadata
- Virus checking
- File format identification and validation
- Replication of files
Works ingested into the repository also receive preservation events and workflow metadata that document major lifecycle activities such as Accessioning, Ingest, Decommissioning, and Deletion.
Storage
Content files submitted to the Cor repository are first transferred to a pre-ingest space (Amazon EFS storage). These files are retained for a minimum of 6 months after a Collection is fully ingested. Once quality assurance testing is completed, they are transferred to Glacier storage and retained permanently.
Ingested content files are stored in Amazon S3 in the US-East region (Virginia) and are retained permanently.
Every 48 hours, newly ingested content files are replicated to a second S3 bucket located in the US-West region (Oregon).
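One way to confirm that the two copies described above stay in step is to compare an inventory of each bucket. The following is a minimal sketch, not the repository's actual replication code: it assumes each inventory has already been exported as a mapping of object key to checksum, and the function name and report fields are hypothetical.

```python
def compare_inventories(primary, replica):
    """Compare two inventories given as {object key: checksum} dictionaries.

    Hypothetical helper: 'primary' stands for the US-East bucket and
    'replica' for the US-West bucket. No AWS calls are made here.
    """
    # Keys present in the primary copy but not yet replicated.
    missing = sorted(key for key in primary if key not in replica)
    # Keys present in both copies whose checksums differ.
    mismatched = sorted(
        key for key in primary
        if key in replica and primary[key] != replica[key]
    )
    # Keys that exist only in the replica.
    unexpected = sorted(key for key in replica if key not in primary)
    return {"missing": missing, "mismatched": mismatched, "unexpected": unexpected}
```

Because replication runs every 48 hours, recently ingested files may legitimately appear under "missing" until the next replication cycle completes.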
Backups and Restoration
The Cor repository receives regular backups of the following preservation data, which are retained for a minimum of 6 months:
- Application databases: backed up daily
- Fedora (preservation metadata): backed up daily
- SOLR index: backed up daily
- S3 content storage: replicated every 48 hours
A full backup and restoration of production data was last tested in June 2020.
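The six-month minimum retention for backups can be expressed as a simple date calculation. This sketch is illustrative only: it assumes "6 months" means six calendar months counted from the backup date, and the helper names are hypothetical rather than part of the repository software.

```python
import calendar
from datetime import date

MIN_RETENTION_MONTHS = 6  # minimum retention stated above


def add_months(day, months):
    """Return the date 'months' calendar months after 'day'.

    The day of month is clamped to the end of the target month,
    so 31 August plus six months falls on the last day of February.
    """
    index = day.year * 12 + (day.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def earliest_removal_date(backup_date):
    """First date on which a backup has met the minimum retention period."""
    return add_months(backup_date, MIN_RETENTION_MONTHS)


def may_be_removed(backup_date, today):
    """True once the backup has been kept for at least the minimum period."""
    return today >= earliest_removal_date(backup_date)
```

Since the policy sets a minimum, not a maximum, a check like this only identifies backups that are eligible for removal; it does not imply that they are removed on that date.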
Implemented Preservation Events and Workflows: Summary
The Cor repository supports both major preservation lifecycle workflows and specific preservation events/actions. More information about requirements identified by the Preservation Functional Requirements Group is available on our wiki.
System-generated Preservation Events
Event/Service | Workflow Context | Works | Preservation Master Files | Derivative Files | Notes |
Policy assignment | Ingest | X | | | Works: records the Visibility/access control assigned at time of Ingest |
Modification | At-rest/monitoring | | | | Works: records the initiating user and timestamp when a work is modified after ingest |
Validation | Ingest | X | X | | Works: validates that SIP includes all required components. Files: FITS validation for the identified format |
Virus check | Ingest | | X | | Master file only is scanned at time of ingest |
Message digest calculation | Ingest | | X | X | All files: sha1. Master file: sha1, md5, sha256 |
Characterization | Ingest | | X | X * | *Derivatives receive minimal characterization |
File submission | Ingest | | X | X | All files are submitted to preservation storage and then a second copy is replicated |
Fixity check | Accession, Ingest | | X | X | Files transferred to AWS in bulk receive fixity checking, but events are not recorded until ingest. Fixity services check all files using sha1. Both copies of files in S3 are checked every six months. Can also be run on-demand in the Curate product |
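The message digest and fixity check events in the table can be illustrated with Python's standard hashlib module. This is a minimal sketch under stated assumptions, not the repository's implementation: the function names, the constants, and the shape of the returned result are hypothetical, and only the choice of algorithms (sha1 for all files; sha1, md5, and sha256 for master files) comes from the table.

```python
import hashlib

# Algorithms taken from the table above; the constant names are hypothetical.
ALL_FILE_ALGORITHMS = ("sha1",)
MASTER_FILE_ALGORITHMS = ("sha1", "md5", "sha256")


def compute_digests(path, algorithms, chunk_size=1024 * 1024):
    """Return {algorithm: hex digest} for one file, reading it once in chunks."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        # Feed each chunk to every hasher so large files are read only once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def check_fixity(path, recorded_sha1):
    """Recompute sha1 and compare it with the digest recorded at ingest."""
    actual = compute_digests(path, ("sha1",))["sha1"]
    return {
        "path": str(path),
        "algorithm": "sha1",
        "expected": recorded_sha1,
        "actual": actual,
        "passed": actual == recorded_sha1,
    }
```

A scheduled job could call a check like `check_fixity` for each stored copy and record the result as a preservation event, while an on-demand check would call it for a single file.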
Preservation Workflows
The following major lifecycle workflows were initially identified through the Digital Preservation Functional Requirements Group and have been further refined during the implementation of the repository system. In version 1 of the Cor repository, some workflows are not fully automated, and some workflows are not yet implemented. Future releases of the repository will expand on this initial functionality.
Workflow | Description | Status | Implementation Notes |
Accession | Process by which depositors prepare the components of a digital object for submission to Emory’s preservation repository; most activities occur outside of the repository system | Implemented | Manual processes for appraisal and preparation of material; fixity checking during file transfers to pre-ingest storage; automated processes for generating ingest-ready submission packages; repository support for Accession workflow metadata |
Ingest | Process in which the repository software collects or generates the components of a digital object and transfers it to the preservation environment | Implemented | Repository performs the highest-priority preservation events and provides support for Ingest workflow metadata |
At-rest/monitoring | Ongoing monitoring of ingested objects and files | Partial | Repository provides fixity checking (ongoing and on-demand) with basic reporting; repository provides storage replication and monitoring |
Versioning | Formal capture of modifications to an ingested object and its files so that an actionable version history is created | Planned | Repository enables basic audit trails for FileSets |
Dissemination | Large-scale dissemination of objects to third parties for preservation or discovery | Planned | |
Decommission | Long-term or permanent removal of object from public access | Implemented | Manual review processes with support for Decommission workflow metadata |
Deletion | Permanent removal of content files from public access and repository | Implemented | Manual review and deletion processes with support for Deletion workflow metadata |