...
Column Heading | Explanation |
---|---|
Item ID | A numeric ID for each individual work in the spreadsheet (e.g. the original row number). Recommended for cross-referencing across pull-list versions later. |
deduplication_key* | A unique ID for each individual volume in the collection; typically an ARK or barcode number |
other_identifiers | Concatenated list of other local identifiers, e.g. barcodes, digwf IDs, OCLC numbers, etc. Each identifier should carry a prefix indicating its type, and multiple values should be separated by pipes. |
emory_ark | Emory ARK id, if applicable |
system_of_record_ID* | Alma MMSID |
institution* | Name(s) of institutions providing the material, e.g. Emory University |
holding_repository* | Name of Library providing the material |
administrative_unit | Name of administrative unit within the Library, if applicable |
CSV Call Number | The call number will be supplied from Alma, but it is useful to have this on the pull-list for reference. |
Enumeration | Volume-level enumeration, if applicable (e.g. Volume 1, Copy 1, Edition etc.) |
CSV Title | Title will be supplied from Alma, but it is useful to have this on the pull-list for reference. |
content_type* | Supplied as URI. Recommended value: http://id.loc.gov/vocabulary/resourceTypes/txt |
emory_rights_statements* | The Emory Libraries supplied rights statement |
internal_rights_note | Additional internal rights notes or documentation |
rights_statement* | Supplied as URI from rightsstatements.org values, e.g. http://rightsstatements.org/vocab/NoC-US/1.0/ |
visibility* | See available access controls (Public, Public Low View, Emory Low Download, Rose High View, Private) |
data_classifications* | Emory defined data classification type: Public, Confidential, Internal, Restricted |
sensitive_material | Indicate "Yes" if the volume contains sensitive material |
sensitive_material_note | Provide additional context for any sensitive material determination |
transfer_engineer | The name of the digitization technician |
date_digitized | The date of digitization for the volume (EDTF format) |
Barcode* | This is used to generate certain volume-level filenames |
Base_Path* | The base directory path where content files are stored on the server |
MBytes* | The overall file size for all content files in the work |
PDF_Path** | The base directory path for the volume-level PDF file for the work |
PDF_Cnt** | The count of PDF files to be imported |
OCR_Path** | The base directory path for the volume-level OCR file for the work |
OCR_Cnt** | The count of volume-level OCR files to be imported |
Disp_Path* | Directory containing the page level image files (TIFFs) > Primary Content: Preservation Master File |
Disp_Cnt* | The count of page-level image files to be imported |
Txt_Path** | Directory containing the page level plain text files > Primary Content: Transcript File |
Txt_Cnt** | The count of page-level text files to be imported |
POS_Path** | For Kirtas outputs: directory containing the page level POS files > Primary Content: Extracted Text File |
POS_Cnt** | For Kirtas outputs: count of page level POS files to be imported |
ALTO_Path** | For LIMB outputs: directory containing the page-level ALTO XML files > Primary Content: Extracted Text File |
ALTO_Cnt** | For LIMB outputs: count of page-level ALTO XML files to be imported |
METS_Path** | For LIMB outputs: directory for volume-level METS file to be imported |
METS_Cnt** | For LIMB outputs: count of volume-level METS files to be imported |
Accession.workflow_rights_basis | Rights basis determination (e.g. Public Domain) for digitization |
Accession.workflow_rights_basis_date | Date of rights review (EDTF format) |
Accession.workflow_rights_basis_reviewer | Name of individual or office performing rights review |
Accession.workflow_rights_basis_note | Rights-related notes about digitization/preservation |
Accession.workflow_notes | General notes about digitization/preservation or acquisition |
Ingest.workflow_rights_basis | Rights basis determination (e.g. Public Domain) for digitization/preservation |
Ingest.workflow_rights_basis_date | Date of rights review (EDTF format) |
Ingest.workflow_rights_basis_reviewer | Name of individual or office performing rights review |
Ingest.workflow_rights_basis_note | Rights-related notes about ingest or migration |
Ingest.workflow_notes | General notes about ingest or migration, e.g. Migrated to Cor repository from LSDI Kirtas workflow during Phase 1 Migrations, 2019 |
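
The other_identifiers column asks for type-prefixed values separated by pipes. A minimal Python sketch of how such a cell could be assembled is shown here. The `type:value` prefix style, the function name `build_other_identifiers`, and the sample values are assumptions made for illustration only; confirm the actual prefix convention with LTDS before preparing a pull-list.

```python
def build_other_identifiers(identifiers):
    """Join (type, value) pairs into one pipe-separated string.

    The "type:value" prefix style is an assumption for this sketch;
    the source only says each identifier needs a type prefix.
    """
    parts = []
    for id_type, value in identifiers:
        value = str(value).strip()
        if not value:
            continue  # skip identifiers that were left blank
        if "|" in value:
            # A pipe inside a value would be read as a separator.
            raise ValueError(f"identifier contains a pipe: {value!r}")
        parts.append(f"{id_type}:{value}")
    return "|".join(parts)


# Made-up sample values, for illustration only.
cell = build_other_identifiers([
    ("barcode", "010000123456"),
    ("oclc", "12345678"),
    ("digwf", ""),
])
print(cell)
```

Blank values are skipped so that the cell never contains empty segments between pipes.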
...
Filename Conventions for Bulk Import
The Curate bulk-import process is optimized to work with the following filename conventions in use within digitized book collections. If your collection's files use a different convention, please contact LTDS for support.
Volume-Level Files
The Curate book import preprocessor makes the following assumptions:
- Kirtas outputs use "Output" as the base filename for the volume-level PDF and OCR files:
  - Output.pdf
  - Output.xml
- LIMB outputs use the barcode number for the volume as the filename for the volume-level PDF and METS files:
  - [Barcode#].pdf
  - [Barcode#].mets.xml
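
The volume-level naming assumptions above can be sketched as a small lookup. This is an illustrative helper, not the preprocessor's actual code; the function name `expected_volume_files` and the dictionary keys are hypothetical.

```python
def expected_volume_files(workflow, barcode=None):
    """Return the volume-level filenames expected for a workflow.

    Hypothetical helper that mirrors the conventions listed above.
    """
    workflow = workflow.strip().lower()
    if workflow == "kirtas":
        # Kirtas outputs always use "Output" as the base filename.
        return {"pdf": "Output.pdf", "ocr": "Output.xml"}
    if workflow == "limb":
        if not barcode:
            raise ValueError("LIMB filenames require a barcode")
        # LIMB outputs are named after the volume's barcode.
        return {"pdf": f"{barcode}.pdf", "mets": f"{barcode}.mets.xml"}
    raise ValueError(f"unknown workflow: {workflow!r}")


print(expected_volume_files("Kirtas"))
print(expected_volume_files("LIMB", barcode="010000123456"))  # made-up barcode
```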
Page-Level Files
While file naming practices may vary, it is strongly recommended that all filenames contain or end with a numeric sequence, such as "0001.tif". The Curate book import preprocessor makes the following assumptions about page-level files:
- Kirtas filenames must have 4 digits, using 0 as padding (0001.tif, 0085.tif, etc.)
- LIMB filenames must have 8 digits, using 0 as padding (00000001.tif, 00000085.tif, etc.)
Some file sequences start with zero, some with one. This should be identified as part of the collection preparation process.
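
As a rough illustration of the page-level conventions, the following Python sketch builds zero-padded filenames and reports whether a set of files starts at zero or one. The names `PAD_WIDTH`, `page_filename`, and `first_sequence_number` are hypothetical, and the sketch assumes the numeric sequence sits immediately before the file extension.

```python
import re

# Padding widths taken from the conventions above.
PAD_WIDTH = {"kirtas": 4, "limb": 8}


def page_filename(workflow, number, extension="tif"):
    """Build a zero-padded page-level filename for the given workflow."""
    width = PAD_WIDTH[workflow.strip().lower()]
    return f"{number:0{width}d}.{extension}"


def first_sequence_number(filenames):
    """Return the lowest trailing sequence number found in the filenames.

    Assumes the digits appear directly before the extension,
    as in "0001.tif".
    """
    numbers = []
    for name in filenames:
        match = re.search(r"(\d+)\.[^.]+$", name)
        if match:
            numbers.append(int(match.group(1)))
    if not numbers:
        raise ValueError("no numeric sequence found in filenames")
    return min(numbers)


print(page_filename("kirtas", 85))
print(page_filename("limb", 85))
print(first_sequence_number(["0000.tif", "0001.tif", "0002.tif"]))
```

Running a check like `first_sequence_number` over a directory listing during collection preparation is one way to record whether a given volume's sequence begins at zero or one.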
...