Note: this documentation is still under development; additional sections are forthcoming.

Overview

The book import process includes the following steps, some of which will require assistance from the LTDS team:

  1. Preparation of pull-list spreadsheet with metadata and file paths per volume
  2. Export of Alma records for all books/serials in the collection
  3. Preparation of Collection-level metadata spreadsheet
  4. File transfer of all needed files, using the directory structure recorded in the pull-list filepaths
  5. Curate bulk import process

Metadata Preparation

Digitized books utilize metadata from two sources: the original pull-list spreadsheet used for digitization reviews as well as Alma catalog records.

The following pull-list metadata fields are required for ingest into the repository:

  • Holding Repository
  • System of Record ID (Alma MMSID)
  • Content Type
  • Emory Rights Statement (Rights - Public Note)
  • Rights Statement (Desc - RightsStatement.org Designation (URI))
  • Data Classifications
  • Visibility
  • Institution
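
As a rough illustration, the required-field list above could be checked before ingest with a short script. This is a minimal sketch, not part of the Curate tooling: it assumes the pull-list has been saved as CSV and that the required fields correspond to the column headings named in `REQUIRED_COLUMNS` (that mapping is an assumption, and the function names are hypothetical).

```python
import csv
import io

# Assumed mapping of the required pull-list fields to CSV column headings.
REQUIRED_COLUMNS = [
    "holding_repository",
    "system_of_record_ID",
    "content_type",
    "emory_rights_statements",
    "rights_statement",
    "data_classifications",
    "visibility",
    "institution",
]


def find_missing_required(rows):
    """Return (row_number, column) pairs where a required value is blank or absent."""
    problems = []
    # Row 1 of the spreadsheet is the header, so data rows start at 2.
    for row_number, row in enumerate(rows, start=2):
        for column in REQUIRED_COLUMNS:
            if not (row.get(column) or "").strip():
                problems.append((row_number, column))
    return problems


def check_pull_list(csv_text):
    """Parse CSV text with a header row and report missing required values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return find_missing_required(reader)
```

Each reported pair gives the spreadsheet row and the column that still needs a value, so gaps can be fixed before the pull-list is handed off for import.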

 The following metadata fields are required in books/serials’ Alma records for ingest into the repository:

  • Title
  • Date Issued/Date Created

Reformatting Pull-List Spreadsheets for Curate Ingest

The following spreadsheet template shows the required or recommended formatting for a Curate-ready pull-list. While the pull-lists prepared during the digitization and review process may vary, the following columns are required for Curate. Note: additional metadata will also be extracted from Alma records; the following fields are recommended for the pull-list itself.

Column Heading                            Explanation
Item ID                                   A numeric ID for each individual work in the spreadsheet (e.g. row number)
deduplication_key                         A unique ID for each individual volume in the collection; typically an ARK or barcode number
other_identifiers
emory_ark
system_of_record_ID                       Alma MMSID
institution
holding_repository
administrative_unit
CSV Call Number                           Not required but useful for reference. Call number will be supplied from Alma.
Enumeration
CSV Title
content_type
OCLC Number
ALMA MMSID
Barcode
DigWF ID
emory_rights_statements
internal_rights_note
rights_statement
visibility
data_classifications
sensitive_material
sensitive_material_note
transfer_engineer
date_digitized
Base_Path                                 The base directory path where content files are stored on the server
MBytes                                    The overall file size for all content files in the work
PDF_Path                                  The base directory path for volume-level PDF file for the work
PDF_Cnt                                   The count of PDF files to be imported
OCR_Path                                  The base directory path for volume-level OCR file for the work
OCR_Cnt                                   The count of volume-level OCR files to be imported
Disp_Path
Disp_Cnt
Txt_Path
Txt_Cnt
POS_Path
POS_Cnt
Accession.workflow_rights_basis
Accession.workflow_rights_basis_date
Accession.workflow_rights_basis_reviewer
Accession.workflow_rights_basis_note
Ingest.workflow_rights_basis
Ingest.workflow_rights_basis_date
Ingest.workflow_rights_basis_reviewer
Ingest.workflow_rights_basis_note
Ingest.workflow_notes
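
The path and count columns lend themselves to a quick consistency check after file transfer. The sketch below is illustrative only and rests on assumptions the documentation does not confirm: that `PDF_Path` holds a directory readable from the machine running the script, that PDF files end in `.pdf`, and that they sit directly in that directory. The helper names are hypothetical.

```python
from pathlib import Path


def count_files(directory, suffix):
    """Count files directly under `directory` whose names end with `suffix` (case-insensitive)."""
    folder = Path(directory)
    if not folder.is_dir():
        return 0
    return sum(
        1
        for entry in folder.iterdir()
        if entry.is_file() and entry.name.lower().endswith(suffix.lower())
    )


def count_mismatches(row, checks=(("PDF_Path", "PDF_Cnt", ".pdf"),)):
    """Return (path_column, expected, found) for each recorded count that disagrees with disk."""
    mismatches = []
    for path_column, count_column, suffix in checks:
        path_value = (row.get(path_column) or "").strip()
        if not path_value:
            continue  # no path recorded for this file type
        expected = int(row.get(count_column) or 0)
        found = count_files(path_value, suffix)
        if found != expected:
            mismatches.append((path_column, expected, found))
    return mismatches
```

Other path and count pairs (for example `OCR_Path` and `OCR_Cnt`) could be added to `checks` once the file extension used for those files is known.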