Note: this documentation is still under development; additional sections are forthcoming.
Overview
The book import process includes the following steps, some of which will require assistance from the LTDS team:
- Preparation of pull-list spreadsheet with metadata and file paths per volume
- Export of Alma records for all books/serials in the collection
- Preparation of Collection-level metadata spreadsheet
- File transfer of all needed files, using the directory structure recorded in the pull-list filepaths
- Curate bulk import process
Metadata Preparation
Digitized books utilize metadata from two sources: the original pull-list spreadsheet used for digitization reviews as well as Alma catalog records.
The following Pull-list metadata fields are required for ingest into the repository:
- Holding Repository
- System of Record ID (Alma MMSID)
- Content Type
- Emory Rights Statement (Rights - Public Note)
- Rights Statement (Desc - RightsStatement.org Designation (URI))
- Data Classifications
- Visibility
- Institution
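Before handing a pull-list off for ingest, it can help to confirm that none of these required fields is blank. The following is a minimal sketch, not part of the Curate tooling: it assumes the pull-list has been exported as CSV and that the required fields map onto the template column headings `holding_repository`, `system_of_record_ID`, `content_type`, `emory_rights_statements`, `rights_statement`, `data_classifications`, `visibility`, and `institution`. That mapping and the function name `find_missing_required` are assumptions made for illustration.

```python
import csv
import io

# Assumed mapping of the required pull-list fields to template column headings.
REQUIRED_COLUMNS = [
    "holding_repository",
    "system_of_record_ID",
    "content_type",
    "emory_rights_statements",
    "rights_statement",
    "data_classifications",
    "visibility",
    "institution",
]


def find_missing_required(csv_file, required=REQUIRED_COLUMNS):
    """Return (row_number, column) pairs where a required value is absent or blank.

    Row numbers follow spreadsheet convention: the header is row 1,
    so the first data row is row 2.
    """
    problems = []
    for row_number, row in enumerate(csv.DictReader(csv_file), start=2):
        for column in required:
            value = row.get(column)  # None if the column is missing entirely
            if value is None or not value.strip():
                problems.append((row_number, column))
    return problems


if __name__ == "__main__":
    # Hypothetical two-row pull-list; the second row lacks an MMSID and a visibility.
    sample = io.StringIO(
        ",".join(REQUIRED_COLUMNS) + "\n"
        "Library A,123,Text,Note,http://example.org/rs,Public,Public,Emory\n"
        "Library A,,Text,Note,http://example.org/rs,Public,,Emory\n"
    )
    for row_number, column in find_missing_required(sample):
        print(f"Row {row_number}: missing {column}")
```

For a real export, open the file with `open(path, newline="", encoding="utf-8")` and pass the file object in place of the sample.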
The following metadata fields are required in books/serials’ Alma records for ingest into the repository:
- Title
- Date Issued/Date Created
Reformatting Pull-List Spreadsheets for Curate Ingest
The following spreadsheet template shows the required or recommended formatting for a Curate-ready pull-list. While the pull-lists prepared during the digitization and review process may vary, the following columns are required for Curate. Note: additional metadata will also be extracted from Alma records; the following fields are recommended for the pull-list itself.
Column Heading | Explanation |
---|---|
Item ID | A numeric ID for each individual work in the spreadsheet (e.g., row number) |
deduplication_key | A unique ID for each individual volume in the collection; typically an ARK or barcode number |
other_identifiers | |
emory_ark | |
system_of_record_ID | Alma MMSID |
institution | |
holding_repository | |
administrative_unit | |
CSV Call Number | Not required but useful for reference. Call number will be supplied from Alma. |
Enumeration | |
CSV Title | |
content_type | |
OCLC Number | |
ALMA MMSID | |
Barcode | |
DigWF ID | |
emory_rights_statements | |
internal_rights_note | |
rights_statement | |
visibility | |
data_classifications | |
sensitive_material | |
sensitive_material_note | |
transfer_engineer | |
date_digitized | |
Base_Path | The base directory path where content files are stored on the server |
MBytes | The overall file size for all content files in the work |
PDF_Path | The base directory path for the work's volume-level PDF file |
PDF_Cnt | The count of PDF files to be imported |
OCR_Path | The base directory path for the work's volume-level OCR file |
OCR_Cnt | The count of volume-level OCR files to be imported |
Disp_Path | |
Disp_Cnt | |
Txt_Path | |
Txt_Cnt | |
POS_Path | |
POS_Cnt | |
Accession.workflow_rights_basis | |
Accession.workflow_rights_basis_date | |
Accession.workflow_rights_basis_reviewer | |
Accession.workflow_rights_basis_note | |
Ingest.workflow_rights_basis | |
Ingest.workflow_rights_basis_date | |
Ingest.workflow_rights_basis_reviewer | |
Ingest.workflow_rights_basis_note | |
Ingest.workflow_notes | |
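
Because `deduplication_key` is meant to be unique for each volume, a quick check for repeated or blank keys can catch problems before import. The sketch below assumes the pull-list is available as CSV with the column headings from the template above; the function name `find_duplicate_keys` and the sample key values are hypothetical, and this is not a description of how Curate itself performs deduplication.

```python
import csv
import io
from collections import defaultdict


def find_duplicate_keys(csv_file, key_column="deduplication_key"):
    """Map each repeated or blank key to the spreadsheet rows where it appears.

    Row numbers follow spreadsheet convention: the header is row 1.
    A blank key is always reported, even if it occurs only once.
    """
    rows_by_key = defaultdict(list)
    for row_number, row in enumerate(csv.DictReader(csv_file), start=2):
        key = (row.get(key_column) or "").strip()
        rows_by_key[key].append(row_number)
    return {
        key: rows
        for key, rows in rows_by_key.items()
        if len(rows) > 1 or key == ""
    }


if __name__ == "__main__":
    # Hypothetical pull-list fragment: KEY-A repeats and the last row has no key.
    sample = io.StringIO(
        "Item ID,deduplication_key\n"
        "1,KEY-A\n"
        "2,KEY-B\n"
        "3,KEY-A\n"
        "4,\n"
    )
    for key, rows in find_duplicate_keys(sample).items():
        label = key if key else "(blank)"
        print(f"{label}: rows {rows}")
```

Any key reported here would need to be resolved in the spreadsheet, since the template describes the key as unique per volume.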