Controlled Vocabulary Usage

Prepared by: DLP Metadata Implementation Working Group

Date: March 2018

Status: Final Draft




Overview

This document supplements the metadata specifications produced by the DLP Metadata Implementation Working Group (M-IWG). M-IWG documentation stored as spreadsheets includes a Value Constraints column, which indicates whether a field uses a controlled term entry. Sources and/or values for controlled terms are documented in greater detail here. The authorities and local terms documented here are starter recommendations and may need to be expanded over time. Note: some sections of this documentation will remain incomplete until implementation occurs.

Original Google Document (restricted)

Descriptive Metadata

Final metadata specification worksheet

Institution (D1) - Controlled Terms - External

Recommended values are from the Library of Congress Name Authority File (LCNAF). The usual value is “Emory University”; additional participating institutions may be added in the future.

Holding Repository (D2) - Controlled Terms - External

Recommended values are from the Library of Congress Name Authority File (LCNAF).

Administrative Unit (D3) - Controlled Terms - Local

Local terms for different administrative units within the same library

  • Data type: String?

  • Sample value: Emory University Archives

  • Range of values:

    1. Emory University Archives

    2. Stuart A. Rose Manuscript, Archives, and Rare Book Library

  • Current systems usage: DAMS, Keep
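Because Administrative Unit draws on a short local list, an ingest or form layer can reject unknown values before they are stored. The sketch below assumes values arrive as plain strings and that an exact match (after trimming whitespace) is required; the function name `validate_administrative_unit` is hypothetical.

```python
# Local term list for Administrative Unit (D3), as documented above.
ADMINISTRATIVE_UNITS = frozenset({
    "Emory University Archives",
    "Stuart A. Rose Manuscript, Archives, and Rare Book Library",
})


def validate_administrative_unit(value: str) -> str:
    """Return the trimmed value if it is a known local term; otherwise raise ValueError."""
    cleaned = value.strip()
    if cleaned not in ADMINISTRATIVE_UNITS:
        raise ValueError(f"Unknown administrative unit: {value!r}")
    return cleaned
```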

Content Type (D5) - Controlled Terms - External

Values are from LC Resource Types. For ETDs, a local list is used for supplemental files only. A larger list of values is maintained on GitHub, but only six are available in the current release.

  • Data type: URI, String

  • Sample value: Still Image

  • Range of values: LC Resource Types; ETD Local List (Video; Image; Text; Dataset; Sound; Software)

  • Current systems usage: DAMS, DB, ETD, OE, Keep

Content Genre (D6) - Controlled Terms - Local/External

Values are from a mix of controlled terms (Getty AAT, LCSH, MARC Genre) and some local terms (for ETDs?).

Primary Language (D10) - Controlled Terms - External

Values are a mix of local lists (DAMS, DV) and controlled terms (ISO639-2b and the MARC List for Languages).

  • Data type: URI

  • Sample value: Latvian

  • Range of values: ISO639-2b; MARC List for Languages

  • Current systems usage: DAMS (currently uses an incomplete local list), DB (MARC List), ETD (local list: English; French; Spanish; plans to expand in the future, possibly to the ISO list?), OE (MARC List), Keep (no languages), DV (local list, similar to ISO639)
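The data type is URI while the sample value is a label, so some label-to-URI step is implied. A minimal sketch, assuming the id.loc.gov URI patterns shown in `VOCAB_BASES` and using a small illustrative subset of ISO 639-2/B codes (not a complete list); `language_uri` is a hypothetical helper. The bibliographic (/B) code for French is `fre` rather than the terminological `fra`, which is why the /B variant matters here.

```python
# Small illustrative subset of ISO 639-2/B codes; not a complete list.
LANGUAGE_CODES = {
    "English": "eng",
    "French": "fre",  # bibliographic (/B) code; the terminological (/T) code is "fra"
    "Spanish": "spa",
    "Latvian": "lav",
}

# Assumed base URIs for the two LC-hosted vocabularies.
VOCAB_BASES = {
    "iso639-2": "http://id.loc.gov/vocabulary/iso639-2/",
    "marc": "http://id.loc.gov/vocabulary/languages/",
}


def language_uri(label: str, vocab: str = "iso639-2") -> str:
    """Build a vocabulary URI from a language label (KeyError if the label is not in the subset)."""
    code = LANGUAGE_CODES[label]
    return VOCAB_BASES[vocab] + code
```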

Place of Publication/Production (D28) - Controlled Terms - External

Recommended values are from GeoNames or another database (such as Getty TGN?).

Subjects - Topics (D30) - Controlled Terms - External

Recommended values are from LCSH, Getty Vocabs, and FAST. (ETDs currently use ProQuest research topics: a minimum of one and a maximum of three.)

Subject - Names (D31) - Controlled Terms - External

Recommended values from LCNAF, VIAF, ULAN

Subject - Geographic Names (D32) - Controlled Terms - External

Recommended values are from GeoNames or another database (such as Getty TGN?).

Subject - Time Periods (D33) - Controlled Terms - Local/External

Recommended values are from a database of time periods (LCSH, AAT?).

Thesis/Dissertation Degree (D39) - Controlled Terms - Local

Effectively a subtype of Submission Type.

School (D42) - Controlled Terms - Local

Top-level unit.

Department/Program (D40) - Controlled Terms - Local

Mid-level unit.

Academic Subfield/Discipline (D41) - Controlled Terms - Local

Lowest-level unit. Certain departments/programs within Laney and Rollins have subfields.
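The School, Department/Program, and Academic Subfield fields form a three-level hierarchy, so a submitted combination can be checked as a path through nested mappings. In this sketch every school, department, and subfield name is a hypothetical placeholder, and a subfield is treated as optional even where subfields exist; both are assumptions rather than documented rules.

```python
from typing import Optional

# Placeholder hierarchy: school -> department/program -> set of subfields.
# All names are hypothetical; real values come from the ETD local lists.
HIERARCHY: dict[str, dict[str, set[str]]] = {
    "School A": {
        "Department A1": {"Subfield A1-x", "Subfield A1-y"},
        "Department A2": set(),  # a department with no subfields
    },
    "School B": {
        "Department B1": {"Subfield B1-x"},
    },
}


def is_valid_path(school: str, department: str, subfield: Optional[str] = None) -> bool:
    """Check that department belongs to school and, if given, subfield belongs to department."""
    departments = HIERARCHY.get(school)
    if departments is None or department not in departments:
        return False
    if subfield is None:
        return True  # assumption: a subfield is optional
    return subfield in departments[department]
```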

Publisher Version (D44) - Controlled Terms - Local (could investigate external ontology like SPAR)

  • Data type: String or URI depending?

  • Sample value: Final Published Version

  • Range of values:

    1. Preprint: Prior to Peer Review

    2. Post-print: After Peer Review

    3. Final Publisher PDF

  • Current systems usage: OE

Role - Creator (D45) - Controlled Terms - External/Local?

Recommended values are either locally created IDs or entries from external databases such as LCNAF and VIAF.

Role - Contributor (D46) - Controlled Terms - External/Local?

Recommended values are either locally created IDs or entries from external databases such as LCNAF and VIAF.

  • Data type: String or URI depending?

  • Sample value: John Doe

  • Range of values: LCNAF; VIAF; Emory Shared Data; local terms

  • Current systems usage: DAMS, DB, OE, Keep, DV

Role - Thesis/Dissertation Advisor (D47) - Controlled Terms - Local/External?

Currently free text. There are plans to connect this field to ESD (Emory Shared Data).

  • Data type: String

  • Sample value: Witte,John

  • Range of values: N/A

  • Current systems usage: ETD

Role - Committee Member (D48) - Controlled Terms - Local

Currently free text. There are plans to connect this field to ESD (Emory Shared Data).

  • Data type: String

  • Sample value: Smith,Ted A.

  • Range of values: N/A

  • Current systems usage: ETD
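Advisor and committee member names are currently free text in an inverted `Family,Given` form, as the sample values show. Until the planned ESD connection exists, normalizing that form may help with matching. This sketch assumes every entry contains one separating comma; `parse_inverted_name` and `display_form` are hypothetical helpers.

```python
from typing import NamedTuple


class PersonName(NamedTuple):
    family: str
    given: str


def parse_inverted_name(value: str) -> PersonName:
    """Split an inverted 'Family,Given' entry such as 'Smith,Ted A.' at the first comma."""
    family, sep, given = value.partition(",")
    if not sep:
        raise ValueError(f"Expected 'Family,Given' form: {value!r}")
    return PersonName(family.strip(), given.strip())


def display_form(name: PersonName) -> str:
    """Return the inverted form with a space after the comma."""
    return f"{name.family}, {name.given}"
```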

Role - Degree Granting Institution (D49) - Controlled Terms - Local

Values are similar to those of Institution (D1). The value is stored (for use in ProQuest exports) but not displayed locally.

Role - Sponsor (D50) - Controlled Terms - Local

Values are locally created IDs or are fed from the Emory personnel database.

  • Data type: String or URI depending?

  • Sample value: Harold K. Simon

  • Range of values: [from Emory Shared Data feed]

  • Current systems usage: OE

Role - Partnering Agencies (D52) - Controlled Terms - Local

Local list (Rollins only). Multiple partnering agencies can be selected.

  • Data type: String

  • Sample value: Centers for Disease Control and Prevention

  • Range of values: ETD local values

  • Current systems usage: ETD

Role - Grant/Funding Agency (D53) - Controlled Terms - Local

Locally created list of terms; if an appropriate data source can be determined we could use an authority instead.

  • Data type: String?

  • Sample value: National Institute of Environmental Health Sciences : NIEHS

  • Range of values:

  • Current systems usage: OE, DV
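The sample value pairs an agency name with its acronym, separated by ` : `. Assuming that convention holds across the whole local list (the documentation does not state it), the two parts can be separated for display or for later matching against an authority. `split_agency` is a hypothetical helper.

```python
from typing import Optional


def split_agency(value: str) -> tuple[str, Optional[str]]:
    """Split 'Agency Name : ACRONYM' into (name, acronym); acronym is None when absent."""
    name, sep, acronym = value.partition(" : ")
    return name.strip(), (acronym.strip() if sep else None)
```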

Creator/Contributor - Institutional Affiliation (D55) - Controlled Terms - External

Similar to Institution (D1) and Role - Degree Granting Institution (D49)

Rights Statement - Controlled (D57) - Controlled Terms - External

Values will be taken from RightsStatements.org (available by default in Hyrax).
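RightsStatements.org statements are identified by URIs of the form `http://rightsstatements.org/vocab/<code>/1.0/`, for example `http://rightsstatements.org/vocab/InC/1.0/`. The sketch below checks a stored value against that shape and extracts the statement code; it assumes version 1.0 URIs only and does not verify that the code is an actually published statement. `statement_code` is a hypothetical helper.

```python
import re

# Assumed shape of a version 1.0 RightsStatements.org statement URI.
RIGHTS_STATEMENT_URI = re.compile(r"^http://rightsstatements\.org/vocab/([A-Za-z-]+)/1\.0/$")


def statement_code(uri: str) -> str:
    """Return the statement code (e.g. 'InC', 'NoC-US') or raise ValueError for other URIs."""
    match = RIGHTS_STATEMENT_URI.match(uri.strip())
    if match is None:
        raise ValueError(f"Not a RightsStatements.org statement URI: {uri!r}")
    return match.group(1)
```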

Rights Holder (D58) - Controlled Terms - External/Local

Prefer controlled name entries if available. May need to be locally managed name entries or auto-suggested terms.

Re-use License (D60) - Controlled Terms - External

Values will be taken from Creative Commons, or from other vocabularies if applicable (GNU, ODL). Current usage is Creative Commons only. For ETDs, this field may be added in a future release.

Geographic Unit (D66) - Controlled Terms - Local

Currently free-text, but recommend we either implement the below set of local terms, or configure this field to auto-suggest already stored values.

Preservation Metadata

Final Preservation Events/Workflows metadata worksheet

Preservation Event/Workflow Type (PE3)

Controlled terms from the LC Preservation Events vocabulary.

Initiating User (PE5)

Controlled terms - username or system process name (Names are supplied by the DLP application)

  • Data type: text or URI (depending on implementation)

  • Sample value: eporter

  • Range of values: List of systems/vetted users

  • Current systems usage: New/DLP

Preservation Event/Workflow - Rights Basis (PE12)

Local controlled terms

  • Data type: String

  • Sample value: Deed of Gift/Sale

  • Range of values:

    • Preservation System Policy (default value unless manually overridden)

    • In copyright

    • In copyright - Section 108

    • In copyright - Section 107

    • Public Domain

    • License

    • Deed of Gift/Sale

    • Institutional Policy

    • Statute

    • Administrative Signoff

  • Current systems usage: New/DLP
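The Rights Basis list names a default that applies unless a value is manually overridden. A minimal sketch of that rule, assuming a missing or blank override means "use the default"; `rights_basis` is a hypothetical helper.

```python
from typing import Optional

RIGHTS_BASIS_TERMS = (
    "Preservation System Policy",
    "In copyright",
    "In copyright - Section 108",
    "In copyright - Section 107",
    "Public Domain",
    "License",
    "Deed of Gift/Sale",
    "Institutional Policy",
    "Statute",
    "Administrative Signoff",
)
DEFAULT_RIGHTS_BASIS = "Preservation System Policy"


def rights_basis(override: Optional[str] = None) -> str:
    """Return the default unless a valid manual override is supplied."""
    if override is None or not override.strip():
        return DEFAULT_RIGHTS_BASIS  # assumption: blank counts as "not overridden"
    value = override.strip()
    if value not in RIGHTS_BASIS_TERMS:
        raise ValueError(f"Unknown rights basis: {override!r}")
    return value
```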

File Use Vocabulary - Local Terms

Proposed additions to the PCDM File Use Vocabulary, which is used to relate files within a digital object (used in a preservation package). These terms could be used as relationship entries, or they could serve as a file-labeling scheme, depending on implementation needs.

  • Data type: URI or free text

  • Sample value: http://pcdm.org/use#ExtractedText

  • Range of values: Original PCDM File Use Vocabulary terms:

    • extracted text

    • intermediate file

    • original file

    • preservation metadata file

    • service file

    • thumbnail image

    • transcript

    • character positioning data

  • Additional local terms proposed:

    • Primary File [content file, e.g. ETDs]

    • Supplemental File [content, e.g. ETDs]

    • PREMIS

    • METS

    • Supplemental Technical Metadata

    • Supplemental Descriptive Metadata

    • Supplemental Source Metadata

    • License/agreement

  • Current systems usage: New/DLP
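The sample value shows a PCDM File Use term expressed as `http://pcdm.org/use#` plus an UpperCamelCase fragment. The sketch below applies that pattern to both term lists. Two assumptions: the fragments for PCDM terms other than `ExtractedText` are derived mechanically and should be checked against the published vocabulary, and `https://example.org/dlp/file-use#` is a hypothetical placeholder namespace for the proposed local terms.

```python
import re

# Base namespace of the PCDM File Use vocabulary (matches the sample value).
PCDM_USE = "http://pcdm.org/use#"
# Hypothetical placeholder namespace for the proposed local terms.
LOCAL_USE = "https://example.org/dlp/file-use#"

PCDM_TERMS = frozenset({
    "extracted text", "intermediate file", "original file",
    "preservation metadata file", "service file", "thumbnail image",
    "transcript", "character positioning data",
})
LOCAL_TERMS = frozenset({
    "primary file", "supplemental file", "premis", "mets",
    "supplemental technical metadata", "supplemental descriptive metadata",
    "supplemental source metadata", "license/agreement",
})


def to_fragment(label: str) -> str:
    """Join words (split on spaces or '/') into an UpperCamelCase fragment."""
    words = re.split(r"[\s/]+", label.strip())
    return "".join(word[:1].upper() + word[1:] for word in words if word)


def file_use_uri(label: str) -> str:
    """Return a URI for a file-use label, choosing the namespace by term list."""
    key = label.strip().lower()
    if key in PCDM_TERMS:
        base = PCDM_USE
    elif key in LOCAL_TERMS:
        base = LOCAL_USE
    else:
        raise ValueError(f"Unknown file use term: {label!r}")
    return base + to_fragment(label)
```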

Rights Metadata

Final Rights metadata specification worksheet

Note: additional Rights metadata is documented in the Descriptive and Preservation sections above.

Data Classification (R10)

Categorization of the types of data that may be found in a repository object. Pending Emory IT Security policy development; we will initially use terms provided

  • Data type: String

  • Sample value: Confidential

Sensitive/Objectionable Material (R11)

Indicates if the materials contain sensitive or objectionable information.

  • Data type: String

  • Sample value: Yes

  • Range of values:

    • Yes

    • No (default)

  • Current systems usage: New/DLP
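Sensitive/Objectionable Material (R11) and the ETD copyright screening questions (R7 to R9) share the Yes/No pattern, but only R11 documents a default (No). The sketch below normalizes an answer and applies a default only when one is given; treating a blank answer as an error when no default exists is an assumption, not a documented rule. `parse_yes_no` is a hypothetical helper.

```python
from typing import Optional


def parse_yes_no(value: Optional[str], default: Optional[str] = None) -> str:
    """Normalize a Yes/No answer; fall back to default when the value is blank."""
    if value is None or not value.strip():
        if default is None:
            raise ValueError("A Yes/No answer is required")
        return default
    normalized = value.strip().capitalize()  # "yes" -> "Yes", "NO" -> "No"
    if normalized not in ("Yes", "No"):
        raise ValueError(f"Expected Yes or No, got {value!r}")
    return normalized
```

For R11 the call would be `parse_yes_no(answer, default="No")`; for R7 to R9 it would omit the default.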

Copyright Question #1 [Permissions beyond Fair use...] (R7) (Yes/No)

ETD submission screening question

  • Data type: String

  • Sample value: Yes

  • Range of values:

    1. Yes

    2. No

  • Current systems usage: ETD

Copyright Question #2 - Does thesis contain content for which you no longer own copyright… (R8) (Yes/No)

ETD submission screening question

  • Data type: String

  • Sample value: No

  • Range of values:

    1. Yes

    2. No

  • Current systems usage: ETD

Copyright Question #3 [Patentable Material] (R9) (Yes/No)

ETD submission screening question

  • Data type: String

  • Sample value: Yes

  • Range of values:

    1. Yes

    2. No

  • Current systems usage: ETD

Embargo (ETDR6) (Yes/No)

If No is selected, the record will be “Open Access”.

  • Data type: String

  • Sample value: N/A

  • Range of values:

    1. Yes

    2. No

  • Current systems usage: ETD

Administrative Metadata

Note: Additional administrative metadata was inventoried, but the M-IWG decided not to normalize it within the scope of the DLP Working Group, because it will be affected by future implementation decisions that may impact staff workflows. Documentation will be added here as this metadata is finalized.

Visibility (AD32)

Supplied for migration or bulk ingest scenarios, to specify collection, object, or file visibility.

  • Local terms, based on Hyrax visibility options.

  • Values

    • Public

    • Emory Network

    • Private

    • [Additional terms TBD as access controls are finalized]
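Since these terms are based on Hyrax visibility options, a migration or bulk-ingest script would need to translate them into the visibility strings Hyrax stores. Stock Hyrax uses `open`, `authenticated`, and `restricted`; mapping "Emory Network" to `authenticated` (labeled "Institution" in the default Hyrax interface) is an assumption, as is the function name `hyrax_visibility`.

```python
# Assumed mapping from local visibility labels to stock Hyrax visibility strings.
VISIBILITY_MAP = {
    "Public": "open",
    "Emory Network": "authenticated",  # assumption: Hyrax's institution-level access
    "Private": "restricted",
}


def hyrax_visibility(label: str) -> str:
    """Translate a local visibility term, rejecting terms not yet defined."""
    try:
        return VISIBILITY_MAP[label.strip()]
    except KeyError:
        raise ValueError(f"Unsupported visibility term: {label!r}") from None
```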

Viewer Settings (AD33)

Supplied for migration or bulk ingest scenarios, to specify visibility and file viewer access controls.

  • Local terms, to flag IIIF viewer configuration options.

  • Values:

    • Standard

    • Restricted

    • [Additional terms TBD as access controls are finalized]


Other Controlled Values in New ETD system

The following entries document unique controlled values used by the new ETD application. These fields were not included in the broader M-IWG inventory and normalization process, either because the transition from the old to the new ETD system occurred in parallel or because they are more closely tied to application functionality than to traditional metadata. Metadata for the 2017 ETD Rewrite Project is partially recorded here.

Embargo Level (ETDR7) - Controlled Terms - Local

The level of embargo applied to a thesis or dissertation, i.e., which parts of the record are withheld from users. Title and author cannot be restricted.

  • Data type: String

  • Sample value: Files

  • Range of values:

    1. Files

    2. Files and Table of Contents

    3. Files, Table of Contents and Abstract

  • Current systems usage: ETD 2017
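The embargo levels are cumulative: each withholds everything the previous level does plus one more part of the record, and title and author are never withheld. The sketch below models that with hypothetical field names (`files`, `table_of_contents`, `abstract`, `title`, `author`) and returns the fields that remain visible at a given level.

```python
# Hypothetical field names withheld at each embargo level (cumulative).
EMBARGO_LEVELS = {
    "Files": ("files",),
    "Files and Table of Contents": ("files", "table_of_contents"),
    "Files, Table of Contents and Abstract": ("files", "table_of_contents", "abstract"),
}
# Per the documentation, title and author cannot be restricted.
NEVER_RESTRICTED = frozenset({"title", "author"})


def visible_fields(record_fields: list[str], level: str) -> list[str]:
    """Return the fields that stay visible under the given embargo level."""
    hidden = set(EMBARGO_LEVELS[level]) - NEVER_RESTRICTED  # guard: never hide title/author
    return [field for field in record_fields if field not in hidden]
```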

Embargo Length (ETDR8) - Controlled Terms - Local

Options depend on the school selected; each school offers different embargo lengths.

  • Data type: String

  • Sample value: 6 Months

  • Range of values:

    1. 6 Months

    2. 1 Year

    3. 2 Years

    4. 6 Years

  • Current systems usage: ETD 2017
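Embargo lengths are stored as strings, so calculating a release date requires parsing them. A minimal sketch, assuming the embargo runs from a start date supplied by the caller (the documentation does not say which date starts the clock) and accepting singular or plural units; per-school restrictions on allowed lengths are not modeled. When the start day does not exist in the target month (e.g. 29 February), the sketch clamps to that month's last day. `embargo_end` and `add_months` are hypothetical helpers.

```python
import calendar
import re
from datetime import date


def add_months(start: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    total = start.month - 1 + months
    year = start.year + total // 12
    month = total % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def embargo_end(start: date, length: str) -> date:
    """Return the release date for a length such as '6 Months' or '2 Years'."""
    match = re.fullmatch(r"\s*(\d+)\s+(months?|years?)\s*", length, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Unrecognized embargo length: {length!r}")
    count = int(match.group(1))
    unit = match.group(2).lower()
    months = count if unit.startswith("month") else count * 12
    return add_months(start, months)
```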

Graduation Dates (ETD)

New graduation dates are added by developers over time.

  • Data type: String

  • Sample value: Fall 2018

  • Range of values:

    1. Fall 2017

    2. Spring 2018

    3. Summer 2018

    4. Fall 2020

  • Current systems usage: ETD 2017
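Because graduation dates are stored as `Season Year` strings, alphabetical sorting would not put them in chronological order. This sketch assumes the calendar order Spring, Summer, Fall within a year and that every value uses one of those three seasons; `term_key` and `sort_terms` are hypothetical helpers.

```python
# Assumed order of terms within one calendar year.
TERM_ORDER = {"Spring": 0, "Summer": 1, "Fall": 2}


def term_key(label: str) -> tuple[int, int]:
    """Sort key for labels like 'Fall 2017': year first, then season."""
    season, year = label.split()
    return (int(year), TERM_ORDER[season])


def sort_terms(labels: list[str]) -> list[str]:
    """Return graduation-date labels in chronological order."""
    return sorted(labels, key=term_key)
```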

Submission Type (ETD)

The type of degree granted; an umbrella field for Thesis/Dissertation Degree (D39).

  • Data type: String

  • Sample value: Dissertation

  • Range of values:

    1. Honor’s thesis

    2. Master’s thesis

    3. Dissertation

  • Current systems usage: ETD 2017