...

Column Heading | Explanation
Item ID | A numeric ID for each individual work in the spreadsheet (e.g. the original row number). Recommended for cross-referencing across pull-list versions later.
deduplication_key* | A unique ID for each individual volume in the collection; typically an ARK or barcode number
other_identifiers | Concatenated list of other local identifiers, e.g. barcodes, digwf IDs, OCLC numbers. Each identifier should carry a prefix indicating its type, and multiple values should be separated by pipes
emory_ark | Emory ARK ID, if applicable
system_of_record_ID* | Alma MMSID
institution* | Name(s) of institutions providing the material, e.g. Emory University
holding_repository* | Name of the library providing the material
administrative_unit | Name of the administrative unit within the library, if applicable
CSV Call Number | The call number will be supplied from Alma, but it is useful to have this on the pull-list for reference.
Enumeration | Volume-level enumeration, if applicable (e.g. Volume 1, Copy 1, Edition, etc.)
CSV Title | The title will be supplied from Alma, but it is useful to have this on the pull-list for reference.
content_type* | Supplied as URI. Recommended value: http://id.loc.gov/vocabulary/resourceTypes/txt
emory_rights_statements* | The rights statement supplied by the Emory Libraries
internal_rights_note | Additional internal rights notes or documentation
rights_statement* | Supplied as URI from rightsstatements.org values, e.g. http://rightsstatements.org/vocab/NoC-US/1.0/
visibility* | See available access controls (Public, Public Low View, Emory Low Download, Rose High View, Private)
data_classifications* | Emory-defined data classification type: Public, Confidential, Internal, Restricted
sensitive_material | Indicate "Yes" if the volume contains sensitive material
sensitive_material_note | Provide additional context for any sensitive material determination
transfer_engineer | The name of the digitization technician
date_digitized | The date of digitization for the volume (EDTF format)
Barcode* | This is used to generate certain volume-level filenames
Base_Path* | The base directory path where content files are stored on the server
MBytes* | The overall file size for all content files in the work
PDF_Path** | The base directory path for the volume-level PDF file for the work
PDF_Cnt** | The count of PDF files to be imported
OCR_Path** | The base directory path for the volume-level OCR file for the work
OCR_Cnt** | The count of volume-level OCR files to be imported
Disp_Path* | Directory containing the page-level image files (TIFFs) > Primary Content: Preservation Master File
Disp_Cnt* | The count of page-level image files to be imported
Txt_Path** | Directory containing the page-level plain text files > Primary Content: Transcript File
Txt_Cnt** | The count of page-level text files to be imported
POS_Path** | For Kirtas outputs: directory containing the page-level POS files > Primary Content: Extracted Text File
POS_Cnt** | For Kirtas outputs: count of page-level POS files to be imported
ALTO_Path** | For LIMB outputs: directory containing the page-level ALTO XML files > Primary Content: Extracted Text File
ALTO_Cnt** | For LIMB outputs: count of page-level ALTO XML files to be imported
METS_Path** | For LIMB outputs: directory for the volume-level METS file to be imported
METS_Cnt** | For LIMB outputs: count of volume-level METS files to be imported
Accession.workflow_rights_basis | Rights basis determination (e.g. Public Domain) for digitization
Accession.workflow_rights_basis_date | Date of rights review (EDTF format)
Accession.workflow_rights_basis_reviewer | Name of individual or office performing rights review
Accession.workflow_rights_basis_note | Rights-related notes about digitization/preservation
Accession.workflow_notes | General notes about digitization/preservation or acquisition
Ingest.workflow_rights_basis | Rights basis determination (e.g. Public Domain) for digitization/preservation
Ingest.workflow_rights_basis_date | Date of rights review (EDTF format)
Ingest.workflow_rights_basis_reviewer | Name of individual or office performing rights review
Ingest.workflow_rights_basis_note | Rights-related notes about ingest or migration
Ingest.workflow_notes | General notes about ingest or migration, e.g. Migrated to Cor repository from LSDI Kirtas workflow during Phase 1 Migrations, 2019
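
As an illustration of two of the column rules above, the following Python sketch splits an other_identifiers value into typed identifiers and checks a *_Cnt value against the files actually present in a directory. It is not part of Curate. The function names are hypothetical, and the "prefix:value" shape (with a colon) is an assumption, since the guidance only asks for a type prefix and pipe separators. How Base_Path combines with the individual *_Path columns is not specified here, so the caller passes an already-resolved directory.

```python
from pathlib import Path


def split_identifiers(value: str) -> list[tuple[str, str]]:
    """Split a pipe-separated other_identifiers cell into (prefix, id) pairs.

    Assumption: each identifier looks like "prefix:value". The colon is a
    hypothetical delimiter chosen for this sketch.
    """
    pairs = []
    for part in value.split("|"):
        part = part.strip()
        if not part:
            continue  # tolerate empty cells and stray separators
        prefix, sep, ident = part.partition(":")
        prefix, ident = prefix.strip(), ident.strip()
        if not sep or not prefix or not ident:
            raise ValueError(f"identifier without a type prefix: {part!r}")
        pairs.append((prefix, ident))
    return pairs


def count_matches(directory: str, expected: int, suffix: str) -> bool:
    """Return True when `directory` holds exactly `expected` files ending in `suffix`."""
    found = [
        p for p in Path(directory).iterdir()
        if p.is_file() and p.name.lower().endswith(suffix.lower())
    ]
    return len(found) == expected
```

For example, split_identifiers("barcode:010002|oclc:12345") yields one pair per identifier, and count_matches(disp_dir, disp_cnt, ".tif") can flag a Disp_Cnt that disagrees with the directory contents before import.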

...

Filename Conventions for Bulk Import

The Curate bulk-import process is optimized to work with the following filename conventions in use within digitized book collections. If your collection's files use a different convention, please contact LTDS for support.

Volume-Level Files

The Curate book import preprocessor makes the following assumptions:

  • Kirtas outputs use "Output" as the base filename for the volume-level PDF and OCR files:
    • Output.pdf
    • Output.xml
  • LIMB outputs use the barcode number for the volume as the filename for the volume-level PDF and METS files:
    • [Barcode#].pdf
    • [Barcode#].mets.xml
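
The volume-level assumptions above can be sketched as a small lookup. This is an illustration only, not the preprocessor's actual code; the function name and the dictionary keys ("pdf", "ocr", "mets") are hypothetical.

```python
def volume_level_filenames(system: str, barcode: str = "") -> dict[str, str]:
    """Return the expected volume-level filenames for a digitization system.

    `system` is "Kirtas" or "LIMB" (case-insensitive). The returned keys
    are illustrative labels, not names used by Curate.
    """
    system = system.strip().lower()
    if system == "kirtas":
        # Kirtas outputs use the fixed base filename "Output".
        return {"pdf": "Output.pdf", "ocr": "Output.xml"}
    if system == "limb":
        # LIMB outputs are named after the volume's barcode.
        if not barcode:
            raise ValueError("LIMB volume-level filenames require a barcode")
        return {"pdf": f"{barcode}.pdf", "mets": f"{barcode}.mets.xml"}
    raise ValueError(f"unknown digitization system: {system!r}")
```

Calling volume_level_filenames("LIMB", "010002123456") returns the PDF and METS names for that (made-up) barcode, which can then be compared against the contents of PDF_Path and METS_Path.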

Page-Level Files

While file naming practices may vary, it is strongly recommended that all filenames contain or end with a numeric sequence, such as "0001.tif". The Curate book import preprocessor makes the following assumptions about page-level files:

  • Kirtas filenames must have 4 digits, using 0 as padding (0001.tif, 0085.tif, etc.)
  • LIMB filenames must have 8 digits, using 0 as padding (00000001.tif, 00000085.tif, etc.)

Some file sequences start with zero, some with one. This should be identified as part of the collection preparation process.

...