Page Comparison

...

The following spreadsheet template shows the required and recommended formatting for a Curate-ready pull-list. The pull-lists prepared during the digitization and review process may vary, but a pull-list submitted to Curate should follow the columns below.

Required fields are indicated with an asterisk.

Note: additional metadata will also be extracted from Alma/MARC catalog records; the following fields are recommended for the pull-list itself.
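Because Curate depends on the asterisked columns, a quick check before submission can catch gaps before ingest. The Python sketch below is one way to do this and is not part of Curate. It assumes the pull-list is saved as a CSV whose header row uses the bare column names from the table (for example `deduplication_key`, without the asterisk). `check_pull_list` and `REQUIRED_COLUMNS` are hypothetical names. The sketch reports missing required columns and empty required values. It also flags repeated `deduplication_key` values, since that key is meant to be unique per volume. `content_type` is omitted because its row in the table is ambiguous.

```python
import csv
import io

# Columns marked with an asterisk in the template table.
# Assumption: the CSV header uses the bare names, without the asterisk.
REQUIRED_COLUMNS = [
    "deduplication_key",
    "system_of_record_ID",
    "institution",
    "holding_repository",
    "emory_rights_statements",
    "rights_statement",
    "visibility",
    "data_classifications",
]


def check_pull_list(csv_text: str) -> list[str]:
    """Return a list of problems found in a pull-list CSV (empty if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [f"missing required column: {col}"
                for col in REQUIRED_COLUMNS if col not in header]
    if problems:
        return problems  # per-row checks are meaningless without the columns

    seen_keys = set()
    # Line 1 of the file is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for col in REQUIRED_COLUMNS:
            if not (row.get(col) or "").strip():
                problems.append(f"row {line_no}: empty {col}")
        # deduplication_key must identify exactly one volume.
        key = (row.get("deduplication_key") or "").strip()
        if key and key in seen_keys:
            problems.append(f"row {line_no}: duplicate deduplication_key {key}")
        seen_keys.add(key)
    return problems
```

An empty return value means no problems were found. Pass in the full text of the exported CSV, for example the result of reading the file opened with `newline=""`.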

Column Heading	Explanation
Item ID	A numeric ID for each individual work in the spreadsheet (e.g., row number)
deduplication_key*	A unique ID for each individual volume in the collection; typically an ARK or barcode number
other_identifiers	Concatenated list of other local identifiers, e.g., barcodes, DigWF IDs, OCLC numbers
emory_ark	Emory ARK ID, if applicable
system_of_record_ID*	Alma MMSID
institution*	Name(s) of institutions providing the material, e.g. Emory University
holding_repository*	Name of the library providing the material
administrative_unit	Name of administrative unit within the Library, if applicable
CSV Call Number	Not required but useful for reference. Call number will be supplied from Alma.
Enumeration
CSV Title
content_type	OCLC Number	ALMA MMSID	Barcode	DigWF ID	*
emory_rights_statements*
internal_rights_note
rights_statement*
visibility*
data_classifications*
sensitive_material
sensitive_material_note
transfer_engineer
date_digitized
Base_Path	The base directory path where content files are stored on the server
MBytes	Total size, in megabytes, of all content files in the work
PDF_Path	The base directory path for the volume-level PDF file for the work
PDF_Cnt	The count of PDF files to be imported
OCR_Path	The base directory path for the volume-level OCR file for the work
OCR_Cnt	The count of volume-level OCR files to be imported
Disp_Path	Directory containing the page-level image files (TIFFs) > Primary Content: Preservation Master File
Disp_Cnt	The count of page-level image files to be imported
Txt_Path	Directory containing the page-level plain text files > Primary Content: Transcript File
Txt_Cnt	The count of page-level text files to be imported
POS_Path	For Kirtas outputs: directory containing the page-level POS files > Primary Content: Extracted Text File
POS_Cnt	For Kirtas outputs: count of page-level POS files to be imported
ALTO_Path	For LIMB outputs: directory containing the page-level ALTO XML files > Primary Content: Extracted Text File
ALTO_Cnt	For LIMB outputs: count of page-level ALTO XML files to be imported
METS_Path	For LIMB outputs: directory containing the volume-level METS file to be imported
METS_Cnt	For LIMB outputs: count of volume-level METS files to be imported
Accession.workflow_rights_basis
Accession.workflow_rights_basis_date
Accession.workflow_rights_basis_reviewer
Accession.workflow_rights_basis_note
Ingest.workflow_rights_basis
Ingest.workflow_rights_basis_date
Ingest.workflow_rights_basis_reviewer
Ingest.workflow_rights_basis_note
Ingest.workflow_notes
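Each `*_Path` column is paired with a `*_Cnt` column, so the counts can be checked against the files actually on the server before import. The sketch below is hypothetical; `check_counts` and `count_files` are illustrative names, not Curate functions. It rests on three assumptions. First, each `*_Path` value is either absolute or relative to `Base_Path`. Second, each directory holds only the files to be imported. Third, a blank path column means that content type is not supplied, for example `ALTO_Path` for a Kirtas output.

```python
import os

# Path/count column prefixes from the template.
# POS applies to Kirtas outputs; ALTO and METS apply to LIMB outputs.
PATH_COUNT_PAIRS = ["PDF", "OCR", "Disp", "Txt", "POS", "ALTO", "METS"]


def count_files(directory: str) -> int:
    """Count regular files (not subdirectories) directly inside directory."""
    with os.scandir(directory) as entries:
        return sum(1 for entry in entries if entry.is_file())


def check_counts(row: dict) -> list[str]:
    """Compare each *_Cnt value in one pull-list row with the files on disk.

    Assumption: each *_Path is absolute or relative to Base_Path, and each
    directory contains only the files to be imported.
    """
    base = (row.get("Base_Path") or "").strip()
    problems = []
    for prefix in PATH_COUNT_PAIRS:
        path = (row.get(f"{prefix}_Path") or "").strip()
        if not path:
            continue  # this content type is not supplied for the work
        # os.path.join returns `path` unchanged if it is already absolute.
        full = os.path.join(base, path)
        if not os.path.isdir(full):
            problems.append(f"{prefix}_Path not found: {full}")
            continue
        expected = int((row.get(f"{prefix}_Cnt") or "").strip() or "0")
        actual = count_files(full)
        if actual != expected:
            problems.append(f"{prefix}_Cnt is {expected} but {full} holds {actual} files")
    return problems
```

Each row can be taken from a `csv.DictReader` over the pull-list. A similar loop could compare `MBytes` against the summed file sizes, though that would need a tolerance for rounding.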

...

Version	Old Version 2	New Version 3
Changes made by	Emily Porter	Emily Porter
Saved on	Nov 09, 2020	Nov 09, 2020
