Metadata/Data Entry Functional Requirements

Prepared by: Metadata Implementation Working Group (M-IWG)

Status: Final Draft

Date: Feb 2018

Reviewed by: M-IWG

This document describes additional requirements identified by the DLP Metadata Implementation Working Group (M-IWG) for data entry of specific metadata units, in cases where a Samvera application provides an editor to create or update metadata. Note: M-IWG also recommends that the implementation team consult with M-IWG members in one or more dedicated metadata development sprints to adjust metadata specifications as needed for implementation.



Names

As a general best practice, M-IWG recommends that names of people, organizations, and geographic places be controlled using either a local or an external authority, rather than entered by users as free text. Standardizing name entries during data entry improves browse and display and enables more precise searches.
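As an illustration of controlling names against a local authority, the sketch below maps variant spellings to an authorized heading before a value is saved. The authority records, the normalization rules (accents, case, periods, whitespace), and the function names are all hypothetical; a production setup would more likely query the configured authority service.

```python
import unicodedata
from typing import Dict, List, Optional

# Hypothetical local authority: authorized heading -> known variant forms.
LOCAL_AUTHORITY: Dict[str, List[str]] = {
    "Doe, Jane": ["Doe, J.", "Jane Doe"],
    "Müller, Hans": ["Muller, Hans"],
}


def normalize(name: str) -> str:
    """Fold accents and case, drop periods, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.replace(".", "").casefold().split())


def build_index(authority: Dict[str, List[str]]) -> Dict[str, str]:
    """Map the normalized form of every heading and variant to its heading."""
    index: Dict[str, str] = {}
    for heading, variants in authority.items():
        for form in [heading, *variants]:
            index[normalize(form)] = heading
    return index


INDEX = build_index(LOCAL_AUTHORITY)


def resolve_name(entry: str, index: Dict[str, str] = INDEX) -> Optional[str]:
    """Return the authorized heading, or None so the editor can prompt the user."""
    return index.get(normalize(entry))
```

An unmatched entry returns None, at which point the editor could offer to create a new local authority record instead of saving free text.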

Date Entries

In general, the dates identified by M-IWG are machine-readable date/time elements that enable date-related functionality such as sorting. They should not contain free-text entries such as “Circa 1921” or “Approximately 1200 BCE” unless such values can be encoded in a machine-readable scheme.

Date entries should always include at least the year, and should also include the month and day if known. M-IWG recommends using a standard such as W3C-DTF or ISO 8601 that allows variable granularity of date entries (e.g. 1964, 1964-01, 1964-01-01).
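A minimal sketch of checking the date-only W3C-DTF forms (YYYY, YYYY-MM, YYYY-MM-DD) and reporting which granularity was entered. The function name is hypothetical, and the time components that W3C-DTF also allows are deliberately out of scope here.

```python
import re
from datetime import date
from typing import Optional, Tuple

# Date-only W3C-DTF forms: YYYY, YYYY-MM, or YYYY-MM-DD.
_DTF_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")

Parts = Tuple[int, Optional[int], Optional[int]]


def parse_dtf_date(value: str) -> Tuple[str, Parts]:
    """Return (granularity, (year, month, day)); raise ValueError if invalid."""
    match = _DTF_DATE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a W3C-DTF date: {value!r}")
    year, month, day = (int(g) if g else None for g in match.groups())
    if month is not None and not 1 <= month <= 12:
        raise ValueError(f"month out of range: {value!r}")
    if day is not None:
        date(year, month, day)  # raises ValueError for days such as 1964-02-30
        return "day", (year, month, day)
    if month is not None:
        return "month", (year, month, None)
    return "year", (year, None, None)
```

Free-text values such as “Circa 1921” are rejected rather than stored.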

The Extended Date/Time Format (EDTF), developed by the Library of Congress, accommodates a variety of library use cases for complex dates and will be included in the next ISO 8601 revision as an extension.

Major Use Cases Needed for Date Entries:

  1. Known dates:

    1. Single date

    2. Date range - start and end

  2. Approximate dates:

    1. Single date

    2. Date range - start and end

  3. Unknown date (for a required entry such as Date Created)

    1. Ability to complete required date fields where date is completely unknown
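The use cases above can be illustrated with EDTF-style values. The classifier below handles only a small, assumed subset: dates of year, month, or day precision, the ~ (approximate) qualifier, start/end intervals, and a fully unspecified year written either as uuuu (as cited in this document) or XXXX (the notation of the 2019 EDTF specification). Other EDTF features, such as ? (uncertain) and % (uncertain and approximate), are not covered, calendar validity is not checked, and the function name is hypothetical.

```python
import re

# Assumed subset of EDTF: YYYY, YYYY-MM or YYYY-MM-DD, optionally followed by "~".
_DATE = re.compile(r"(\d{4}(?:-\d{2}(?:-\d{2})?)?)(~?)")
# Fully unspecified year: "uuuu" as cited in this document, "XXXX" in the 2019 EDTF spec.
_UNKNOWN_YEAR = {"uuuu", "XXXX"}


def classify_date_entry(value: str) -> str:
    """Name the use case a date entry represents, or raise ValueError."""
    if value in _UNKNOWN_YEAR:
        return "unknown"
    parts = value.split("/")
    if len(parts) > 2:
        raise ValueError(f"too many interval parts: {value!r}")
    matches = [_DATE.fullmatch(part) for part in parts]
    if not all(matches):
        raise ValueError(f"unsupported date entry: {value!r}")
    # Any "~" on either end makes the entry approximate.
    certainty = "approximate" if any(m.group(2) for m in matches) else "known"
    shape = "range" if len(parts) == 2 else "single"
    return f"{certainty} {shape}"
```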

Per the Core Metadata standard, at least one type of date should be recorded (typically Date Created or Date Published). When a required date cannot be determined, the UI should provide an option to override the date entry. M-IWG recommends letting the user indicate that the date cannot be determined, but not storing that indication as a date/time value unless an encoding that permits this is enabled. The EDTF standard noted previously provides ways of recording this information, such as uuuu for a completely unknown year. Ruby gems supporting EDTF have been developed (edtf-ruby, edtf-humanize).
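One way to model this recommendation is to keep the user's "cannot be determined" choice as a separate flag and write an encoded unknown value only when an encoding that permits it is enabled. In this sketch the class, its fields, and the edtf_enabled switch are hypothetical; XXXX follows the 2019 EDTF notation, and an implementation should use whatever notation its EDTF library expects (this document cites uuuu).

```python
from dataclasses import dataclass
from typing import Optional

# Unspecified year in the 2019 EDTF notation; earlier drafts used "u" for unspecified digits.
UNKNOWN_YEAR = "XXXX"


@dataclass
class RequiredDateEntry:
    value: Optional[str] = None  # date as entered, if any
    undetermined: bool = False   # editor option: "date cannot be determined"

    def __post_init__(self) -> None:
        if self.value and self.undetermined:
            raise ValueError("a date cannot be both entered and marked undetermined")

    def is_complete(self) -> bool:
        """A required date passes validation with a value or an explicit override."""
        return bool(self.value) or self.undetermined

    def stored_value(self, edtf_enabled: bool) -> Optional[str]:
        """Value written to the date field itself."""
        if self.value:
            return self.value
        if self.undetermined and edtf_enabled:
            return UNKNOWN_YEAR
        # Without an encoding that permits it, the override stays a flag
        # and no placeholder date is stored.
        return None
```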

Dates (Descriptive, Administrative, Rights)

Date Created (D13) - Date

  1. Per the Core Metadata standard, Date Created or Date Issued/Published must be populated as applicable.

  2. Requirements:

    1. Known date (single)

    2. Approximate date (range)

    3. Unknown date

  3. Current usage:

    1. DAMS (free-text, but policy dictates ISO 8601 extended format; single dates or date ranges; approximate dates)

    2. Digitized Books, Keep (ISO 8601 extended format)

    3. Dataverse (free-text)

Date Issued/Published (D14) - Date

  1. Per the Core Metadata standard, Date Created or Date Issued/Published must be populated as applicable. This unit is required if applicable, and should be populated whenever known.

  2. Requirements

    1. Known date (single)

  3. Current usage:

    1. DAMS (free-text)

    2. Digitized Books, ETDs (W3C-DTF?)

    3. OpenEmory (W3C-DTF)

    4. Keep (ISO 8601 extended format)

    5. Dataverse (free-text)

Conference Dates (D15) - Date

  1. Requirements

    1. Known date (range)

  2. Current usage:

    1. OpenEmory

Data Collection (Start and End Dates) (D16) - Date

  1. Requirements

    1. Known date (range)

    2. Repeatable?

  2. Current usage:

    1. Dataverse (free-text date ranges)
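Conference Dates (D15) and Data Collection (D16) both require known date ranges. A minimal sketch of a start/end check that tolerates mixed granularity by comparing the earliest day the start could fall on with the latest day the end could fall on. The function names are hypothetical; if D16 turns out to be repeatable, the same check would apply to each start/end pair.

```python
import calendar
import re
from datetime import date
from typing import Tuple

_DTF_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def bounds(value: str) -> Tuple[date, date]:
    """Earliest and latest day covered by a YYYY, YYYY-MM or YYYY-MM-DD value."""
    match = _DTF_DATE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a supported date: {value!r}")
    year, month, day = (int(g) if g else None for g in match.groups())
    if month is None:
        return date(year, 1, 1), date(year, 12, 31)
    if day is None:
        last_day = calendar.monthrange(year, month)[1]  # ValueError for bad months
        return date(year, month, 1), date(year, month, last_day)
    exact = date(year, month, day)
    return exact, exact


def is_valid_range(start: str, end: str) -> bool:
    """True if the start can fall on or before the end; malformed input raises."""
    return bounds(start)[0] <= bounds(end)[1]
```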

Copyright Date (D59) - Date (New)

  1. Requirements

    1. Known date (single)

  2. Current usage:

    1. Planned for future OpenEmory workflows (currently mixed with other information in a text string)

    2. Exists in MARC records

Values Entered as Strings vs. URIs

M-IWG has indicated for each metadata unit whether the value should be a string (free text), a URI, or either. The choice may also depend on whether controlled terms are used. The following scenarios apply:

Authority URIs

For URIs from a formal authority (such as the Library of Congress), M-IWG has recommended vocabularies to be configured (see the Controlled Vocabulary Usage documentation). End users should be shown the label for an authority URI rather than the URI itself.
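To show labels rather than URIs without a live lookup on every page view, an application could resolve the label when the editor saves the URI and cache it. This is a sketch only: the class name is hypothetical, and fetch_label stands in for whatever lookup the configured vocabulary service provides.

```python
from typing import Callable, Dict, Optional


class AuthorityLabelCache:
    """Stores authority labels so display does not depend on a live lookup."""

    def __init__(self, fetch_label: Callable[[str], Optional[str]]) -> None:
        # Hypothetical hook into the configured vocabulary service.
        self._fetch_label = fetch_label
        self._labels: Dict[str, str] = {}

    def remember(self, uri: str) -> None:
        """Resolve and keep the label when the editor saves a URI."""
        label = self._fetch_label(uri)
        if label:
            self._labels[uri] = label

    def display(self, uri: str) -> str:
        """Label for end users; the raw URI only if no label was ever resolved."""
        return self._labels.get(uri, uri)
```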

Other URIs/URLs

In other cases, users enter plain URLs for which no linked data source is configured. The display label is then taken either from the field label (e.g. Publisher Version) or from a sub-type selected in the editor. See the Related Material and Standard Identifier sections of this document for other use cases involving URIs.

Local Term Entries - Auto Suggest

For some metadata units, no formal controlled vocabulary (local or external) may exist. M-IWG recommends configuring these fields to auto-suggest previously entered values.

Examples:

  • Keywords
  • Names entered for:
    • Role - Creator (Emory person, if a local authority/lookup is not established)
    • Role - Contributor (Emory person, if a local authority/lookup is not established)
    • Role - Grant/Funding Agency
    • Other Roles identified in the Descriptive Metadata Specification
    • Copyright Holder
  • Geographic Unit (if not established as a local vocabulary)
  • Titles for journals/parent works (if not harvested from an external source)
  • Conference/Meeting Name (if not harvested from an external source)
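A minimal sketch of auto-suggesting previously entered values: case-insensitive prefix matching, with variants that differ only in case merged, the most common spelling shown, and frequently used values listed first. The function name, the ranking rule, and the default limit are assumptions, not requirements from this document.

```python
from collections import Counter
from typing import Dict, Iterable, List


def suggest(prefix: str, existing: Iterable[str], limit: int = 5) -> List[str]:
    """Suggest previously entered values that start with the typed prefix."""
    needle = prefix.strip().casefold()
    if not needle:
        return []
    # Group spellings that differ only in case, counting each spelling.
    groups: Dict[str, Counter] = {}
    for raw in existing:
        value = raw.strip()
        if value:
            groups.setdefault(value.casefold(), Counter())[value] += 1
    matches = [
        (sum(spellings.values()), spellings.most_common(1)[0][0])
        for key, spellings in groups.items()
        if key.startswith(needle)
    ]
    # Most-used values first, then alphabetical.
    matches.sort(key=lambda m: (-m[0], m[1].casefold()))
    return [spelling for _, spelling in matches[:limit]]
```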

Related Material

The Descriptive Metadata profile contains metadata units for manually specifying related material (external to the repository) for an object. Related material can be stored either as a free-text note or as a URI for web-based material. In URI mode, the user should choose from a defined set of relationship types. The editor should capture both the URI and the relationship type, so that the application can display a label for the type of related link presented to end users.

Related Material (URI) (D23)

  1. Requirements: The following relationship types should be configured:

    1. Finding Aid

    2. Related Publication

    3. Related Dataset

  2. Current usage:

    1. Dataverse
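The following sketch captures the two storage modes for Related Material: a free-text note, or a URI paired with one of the configured relationship types, whose name becomes the display label. The class names, the http/https check, and the label format are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional
from urllib.parse import urlparse


class RelationshipType(Enum):
    FINDING_AID = "Finding Aid"
    RELATED_PUBLICATION = "Related Publication"
    RELATED_DATASET = "Related Dataset"


@dataclass(frozen=True)
class RelatedMaterial:
    uri: Optional[str] = None
    relationship: Optional[RelationshipType] = None
    note: Optional[str] = None  # free-text mode

    def __post_init__(self) -> None:
        # Exactly one mode must be used: a URI or a note, not both or neither.
        if bool(self.uri) == bool(self.note):
            raise ValueError("enter either a URI or a free-text note")
        if self.uri:
            if self.relationship is None:
                raise ValueError("a related-material URI needs a relationship type")
            if urlparse(self.uri).scheme not in ("http", "https"):
                raise ValueError(f"expected a web URI: {self.uri!r}")

    def display(self) -> str:
        """Text shown to end users: the relationship label, or the note itself."""
        if self.uri and self.relationship:
            return f"{self.relationship.value}: {self.uri}"
        return self.note or ""
```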

Standard Identifiers

The Descriptive Metadata profile contains metadata units for specifying additional identifiers for a repository asset that are issued and maintained by external organizations. For the Standard Identifier metadata unit, the user interface should let the user select which standard the identifier belongs to. This enables the application to display an appropriate label for the type of identifier listed.

Standard Identifier (D18)

  1. Requirements: the following identifier sub-types should be selectable by the user when entering data:

    1. Handle

    2. ISBN

    3. ISSN

    4. PubMed Central ID

    5. DOI

  2. Current usage:

    1. Digitized Books

    2. OpenEmory

    3. Dataverse
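A sketch of the sub-type selection: each option carries its display label and a loose, assumed shape check. The patterns do not verify ISBN or ISSN check digits or resolve Handles and DOIs; they only catch obvious mis-entries. The enum and function names are hypothetical.

```python
import re
from enum import Enum


class IdentifierType(Enum):
    """Editor sub-types: (display label, loose shape pattern)."""
    HANDLE = ("Handle", r"[^/\s]+/\S+")
    ISBN = ("ISBN", r"(?:\d[- ]?){9}[\dXx]|(?:\d[- ]?){12}\d")
    ISSN = ("ISSN", r"\d{4}-\d{3}[\dXx]")
    PMCID = ("PubMed Central ID", r"PMC\d+")
    DOI = ("DOI", r"10\.\d{4,9}/\S+")


def display_identifier(kind: IdentifierType, value: str) -> str:
    """Label the identifier for display; raise ValueError on an obvious mismatch."""
    label, pattern = kind.value
    value = value.strip()
    if re.fullmatch(pattern, value) is None:
        raise ValueError(f"{value!r} does not look like a {label}")
    return f"{label}: {value}"
```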

Citation Builders

If auto-citation builders are implemented for end users, the following fields are recommended:

Research Data Sets

  • Author/Creator
  • Year (Issued) 
  • Title
  • Persistent ID
  • Repository Name
  • Content Version
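If a citation builder is implemented, the data set fields above could be assembled as follows. The field order and punctuation are a hypothetical layout rather than a specific citation style, and the class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetCitationFields:
    creators: List[str]
    year_issued: str
    title: str
    persistent_id: str
    repository_name: str
    content_version: str = ""  # optional; omitted from the citation when empty


def _sentence(text: str) -> str:
    """Close a citation element with a period unless it already ends in punctuation."""
    text = text.strip()
    return text if text.endswith((".", "?", "!")) else text + "."


def format_dataset_citation(f: DatasetCitationFields) -> str:
    """Join the recommended fields in a fixed, illustrative order."""
    parts = [
        _sentence(f"{'; '.join(f.creators)} ({f.year_issued})"),
        _sentence(f.title),
    ]
    if f.content_version:
        parts.append(_sentence(f"Version {f.content_version}"))
    parts.append(_sentence(f.repository_name))
    parts.append(f.persistent_id)
    return " ".join(parts)
```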

Archival Material

  • Persistent Identifier
  • Title
  • Collection
  • Holding Repository
  • Institution

Publications/Presentations

The following additional citation-supporting metadata is recommended for publication-oriented materials and supports indexing by Google Scholar:

  • Title of parent work
  • Volume
  • Issue
  • Conference Dates
  • Page start/end
  • Name of Conference where presented
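Google Scholar's inclusion guidelines describe citation_* meta tags (for example citation_journal_title, citation_volume, citation_issue, citation_firstpage, citation_lastpage, and citation_conference_title). The sketch below emits these tags from hypothetical editor field names, assuming the parent work is a journal; it has no tag for Conference Dates, so that field is left out.

```python
from html import escape
from typing import Dict, List, Tuple

# Hypothetical editor field -> Google Scholar citation_* meta tag name.
SCHOLAR_TAGS: List[Tuple[str, str]] = [
    ("parent_title", "citation_journal_title"),  # assumes the parent work is a journal
    ("volume", "citation_volume"),
    ("issue", "citation_issue"),
    ("page_start", "citation_firstpage"),
    ("page_end", "citation_lastpage"),
    ("conference_name", "citation_conference_title"),
]


def scholar_meta_tags(record: Dict[str, str]) -> List[str]:
    """Emit one <meta> tag per populated field, HTML-escaping the values."""
    tags = []
    for field, tag in SCHOLAR_TAGS:
        value = record.get(field, "").strip()
        if value:
            tags.append(f'<meta name="{tag}" content="{escape(value)}">')
    return tags
```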