Note: this documentation is still under development; additional sections are forthcoming.
Overview
The book import process includes the following steps, some of which will require assistance from the LTDS team:
- Preparation of pull-list spreadsheet with metadata and file paths per volume
- Export of Alma records for all books/serials in the collection
- Preparation of Collection-level metadata spreadsheet
- File transfer of all needed files, using the directory structure recorded in the pull-list filepaths
- Curate bulk import process
Metadata Preparation
Digitized books use metadata from two sources: the original pull-list spreadsheet used for digitization reviews, and Alma catalog records.
The following pull-list metadata fields are required for ingest into the repository:
- Holding Repository
- System of Record ID (Alma MMSID)
- Content Type
- Emory Rights Statement (Rights - Public Note)
- Rights Statement (Desc - RightsStatement.org Designation (URI))
- Data Classifications
- Visibility
- Institution
The following metadata fields are required in books/serials’ Alma records for ingest into the repository:
- Title
- Date Issued/Date Created
Reformatting Pull-List Spreadsheets for Curate Ingest
The following spreadsheet template shows the required and recommended columns for a Curate-ready pull-list. While the pull-lists prepared during the digitization and review process may vary, the columns marked as required below must be present for Curate.
Required fields are indicated with an asterisk.
Note: additional metadata will also be extracted from Alma/MARC catalog records; the following fields are recommended for the pull-list itself.
Column Heading | Explanation |
---|---|
Item ID | A numeric ID for each individual work in the spreadsheet (e.g. row number) |
deduplication_key* | A unique ID for each individual volume in the collection; typically an ARK or barcode number |
other_identifiers | Concatenated list of other local identifiers, e.g. barcodes, digwf IDs, OCLC numbers, etc. |
emory_ark | Emory ARK ID, if applicable |
system_of_record_ID* | Alma MMSID |
institution* | Name(s) of institutions providing the material, e.g. Emory University |
holding_repository* | Name of Library providing the material |
administrative_unit | Name of administrative unit within the Library, if applicable |
CSV Call Number | Not required but useful for reference. Call number will be supplied from Alma. |
Enumeration | |
CSV Title | |
content_type* | |
emory_rights_statements* | |
internal_rights_note | |
rights_statement* | |
visibility* | |
data_classifications* | |
sensitive_material | |
sensitive_material_note | |
transfer_engineer | |
date_digitized | |
Base_Path | The base directory path where content files are stored on the server |
MBytes | The overall file size for all content files in the work |
PDF_Path | The base directory path for the volume-level PDF file for the work |
PDF_Cnt | The count of PDF files to be imported |
OCR_Path | The base directory path for the volume-level OCR file for the work |
OCR_Cnt | The count of volume-level OCR files to be imported |
Disp_Path | Directory containing the page-level image files (TIFFs) > Primary Content: Preservation Master File |
Disp_Cnt | The count of page-level image files to be imported |
Txt_Path | Directory containing the page-level plain text files > Primary Content: Transcript File |
Txt_Cnt | The count of page-level text files to be imported |
POS_Path | For Kirtas outputs: directory containing the page-level POS files > Primary Content: Extracted Text File |
POS_Cnt | For Kirtas outputs: count of page-level POS files to be imported |
ALTO_Path | For LIMB outputs: directory containing the page-level ALTO XML files > Primary Content: Extracted Text File |
ALTO_Cnt | For LIMB outputs: count of page-level ALTO XML files to be imported |
METS_Path | For LIMB outputs: directory containing the volume-level METS file to be imported |
METS_Cnt | For LIMB outputs: count of volume-level METS files to be imported |
Accession.workflow_rights_basis | |
Accession.workflow_rights_basis_date | |
Accession.workflow_rights_basis_reviewer | |
Accession.workflow_rights_basis_note | |
Ingest.workflow_rights_basis | |
Ingest.workflow_rights_basis_date | |
Ingest.workflow_rights_basis_reviewer | |
Ingest.workflow_rights_basis_note | |
Ingest.workflow_notes | |
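
Before handing a pull-list to the LTDS team, it may help to check it against the template above. The following is a minimal sketch in Python, not part of the Curate tooling. It assumes the pull-list has been exported to CSV with the column headings from the table (without the asterisks), and the function names `load_pull_list` and `validate_pull_list` are hypothetical. It checks three things taken from the table: every required column has a value, each `deduplication_key` is unique, and any filled-in `*_Cnt` column holds a whole number.

```python
import csv
import io

# Columns marked with an asterisk in the template above.
# Assumption: the CSV headings omit the asterisk itself.
REQUIRED_COLUMNS = [
    "deduplication_key",
    "system_of_record_ID",
    "institution",
    "holding_repository",
    "content_type",
    "emory_rights_statements",
    "rights_statement",
    "visibility",
    "data_classifications",
]


def load_pull_list(text):
    """Parse CSV text into a list of dicts keyed by column heading."""
    return list(csv.DictReader(io.StringIO(text)))


def validate_pull_list(rows):
    """Return a list of human-readable problems found in the rows."""
    problems = []
    seen_keys = {}  # deduplication_key -> first row number where it appeared
    for row_no, row in enumerate(rows, start=2):  # row 1 is the header
        # Required columns must be present and non-empty.
        for column in REQUIRED_COLUMNS:
            if not (row.get(column) or "").strip():
                problems.append(f"row {row_no}: '{column}' is missing or empty")

        # deduplication_key must be unique per volume.
        key = (row.get("deduplication_key") or "").strip()
        if key:
            if key in seen_keys:
                problems.append(
                    f"row {row_no}: deduplication_key '{key}' "
                    f"already used in row {seen_keys[key]}"
                )
            else:
                seen_keys[key] = row_no

        # Any filled-in count column (PDF_Cnt, Disp_Cnt, ...) must be a whole number.
        for column, value in row.items():
            if isinstance(column, str) and column.endswith("_Cnt"):
                text = (value or "").strip()
                if text and not text.isdigit():
                    problems.append(
                        f"row {row_no}: '{column}' is not a whole number: {text!r}"
                    )
    return problems
```

To use it, read the exported CSV file into a string, pass it through `load_pull_list`, and print whatever `validate_pull_list` returns; an empty list means none of these particular checks failed. The sketch does not inspect the file system, so it cannot confirm that the `*_Cnt` values match the files actually present under the corresponding `*_Path` directories.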