Page Comparison

...

The following spreadsheet template shows the required or recommended formatting for a Curate-ready pull-list. While the pull-lists prepared during the digitization and review process may vary, the following columns are required for Curate's bulk import method. For information about metadata requirements, see the Cor Metadata Field Usage documentation.

...

Column Heading	Explanation
Item ID	A numeric ID for each individual work in the spreadsheet (e.g. the original row number). Recommended for cross-referencing across pull-list versions later.
deduplication_key*	A unique ID for each individual volume in the collection; typically an ARK or barcode number
other_identifiers	Concatenated list of other local identifiers, e.g. barcodes, digwf IDs, OCLC, etc. Each identifier should contain a prefix indicating its type, and multiple values should be separated by pipes (|).
emory_ark	Emory ARK id, if applicable
system_of_record_ID*	Alma MMSID
institution*	Name(s) of institutions providing the material, e.g. Emory University
holding_repository*	Name of Library providing the material
administrative_unit	Name of administrative unit within the Library, if applicable
CSV Call Number	Call number will be supplied from Alma, but it is useful to have this on the pull-list for reference.
Enumeration	Volume-level enumeration, if applicable (e.g. Volume 1, Copy 1, Edition, etc.)
CSV Title	Title will be supplied from Alma, but it is useful to have this on the pull-list for reference.
content_type*	Supplied as URI. Recommended value: http://id.loc.gov/vocabulary/resourceTypes/txt
emory_rights_statements*	The rights statement supplied by the Emory Libraries
internal_rights_note	Additional internal rights notes or documentation
rights_statement*	Supplied as URI from rightsstatements.org values, e.g. http://rightsstatements.org/vocab/NoC-US/1.0/
visibility*	See available access controls (Public, Public Low View, Emory Low Download, Rose High View, Private)
data_classifications*	Emory-defined data classification type: Public, Confidential, Internal, Restricted
sensitive_material	Indicate "Yes" if the volume contains sensitive material
sensitive_material_note	Provide additional context for any sensitive material determination
transfer_engineer	The name of the digitization technician
date_digitized	The date of digitization for the volume (EDTF format)
Barcode*	This is used to generate certain volume-level filenames
Base_Path*	The base directory path where content files are stored on the server
MBytes*	The overall file size for all content files in the work
PDF_Path**	The base directory path for the volume-level PDF file for the work
PDF_Cnt**	The count of PDF files to be imported
OCR_Path**	The base directory path for the volume-level OCR file for the work
OCR_Cnt**	The count of volume-level OCR files to be imported
Disp_Path*	Directory containing the page-level image files (TIFFs) > Primary Content: Preservation Master File
Disp_Cnt*	The count of page-level image files to be imported
Txt_Path**	Directory containing the page-level plain text files > Primary Content: Transcript File
Txt_Cnt**	The count of page-level text files to be imported
POS_Path**	For Kirtas outputs: directory containing the page-level POS files > Primary Content: Extracted Text File
POS_Cnt**	For Kirtas outputs: count of page-level POS files to be imported
ALTO_Path**	For LIMB outputs: directory containing the page-level ALTO XML files > Primary Content: Extracted Text File
ALTO_Cnt**	For LIMB outputs: count of page-level ALTO XML files to be imported
METS_Path**	For LIMB outputs: directory for the volume-level METS file to be imported
METS_Cnt**	For LIMB outputs: count of volume-level METS files to be imported
Accession.workflow_rights_basis	Rights basis determination (e.g. Public Domain) for digitization
Accession.workflow_rights_basis_date	Date of rights review (EDTF format)
Accession.workflow_rights_basis_reviewer	Name of individual or office performing rights review
Accession.workflow_rights_basis_note	Rights-related notes about digitization/preservation
Accession.workflow_notes	General notes about digitization/preservation or acquisition
Ingest.workflow_rights_basis	Rights basis determination (e.g. Public Domain) for digitization/preservation
Ingest.workflow_rights_basis_date	Date of rights review (EDTF format)
Ingest.workflow_rights_basis_reviewer	Name of individual or office performing rights review
Ingest.workflow_rights_basis_note	Rights-related notes about ingest or migration
Ingest.workflow_notes	General notes about ingest or migration, e.g. Migrated to Cor repository from LSDI Kirtas workflow during Phase 1 Migrations, 2019
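
The column list above lends itself to a quick check before a pull-list is handed to the bulk importer. The following is a minimal sketch, not part of Curate itself: it assumes the pull-list has been exported as a CSV with the column headings shown above, and it assumes that a single asterisk marks a required column (the legend is not reproduced here). The function names are hypothetical.

```python
import csv
import io

# Assumption: headings marked with a single asterisk in the table are required.
REQUIRED_COLUMNS = [
    "deduplication_key", "system_of_record_ID", "institution",
    "holding_repository", "content_type", "emory_rights_statements",
    "rights_statement", "visibility", "data_classifications",
    "Barcode", "Base_Path", "MBytes", "Disp_Path", "Disp_Cnt",
]


def missing_required_columns(fieldnames):
    """Return the required headings that are absent from the header row."""
    present = {name.strip() for name in fieldnames if name}
    return [col for col in REQUIRED_COLUMNS if col not in present]


def rows_missing_values(rows):
    """Return (row_number, column) pairs where a required cell is empty."""
    problems = []
    for number, row in enumerate(rows, start=2):  # row 1 is the header
        for col in REQUIRED_COLUMNS:
            if col in row and not (row[col] or "").strip():
                problems.append((number, col))
    return problems


def split_identifiers(value):
    """Split a pipe-separated other_identifiers cell into a list of values."""
    return [part.strip() for part in value.split("|") if part.strip()]


def check_pull_list(csv_text):
    """Run both checks on CSV text; returns (missing_columns, empty_cells)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = missing_required_columns(reader.fieldnames or [])
    empty = rows_missing_values(list(reader))
    return missing, empty
```

A pull-list passes this sketch when both returned lists are empty. The double-asterisk columns are deliberately left out, since which of them apply depends on the digitization workflow (Kirtas or LIMB).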

...

Kirtas outputs
LIMB outputs
[Barcode#].pdf
[Barcode#].mets.xml
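
As an illustration of how the Barcode column can drive volume-level filenames, here is a small sketch built only on the two patterns listed above. It assumes that the PDF pattern applies to both workflows and that the METS file exists only for LIMB outputs (the latter follows the METS_Path description); the function name and the `limb` flag are hypothetical and not part of the Curate preprocessor.

```python
def volume_level_filenames(barcode, limb=False):
    """Build expected volume-level filenames from a barcode.

    Assumption: every volume has [Barcode#].pdf, and LIMB outputs
    additionally have [Barcode#].mets.xml.
    """
    barcode = str(barcode).strip()
    if not barcode:
        raise ValueError("barcode is required")
    names = [f"{barcode}.pdf"]
    if limb:
        names.append(f"{barcode}.mets.xml")
    return names
```

Calling `volume_level_filenames(row["Barcode"], limb=True)` for each pull-list row gives a list of names that can be compared against the contents of the directories named in PDF_Path and METS_Path.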

Page-Level Files

The Curate book import preprocessor makes the following assumptions:

...

Version	Old Version 5	New Version 6
Changes made by	Emily Porter	Emily Porter
Saved on	Nov 09, 2020	Nov 09, 2020
