Glossary of Terminology

This document is a work in progress. Project terms and definitions established by the project team are listed here. Note: some definitions are for internal use only and may differ from definitions established in the broader Digital Library community.


Bit-Level Preservation

Provided by: Digital Preservation FRG

Bit-level preservation is a baseline level of preservation activity that ensures objects, once ingested, can be maintained in a valid and uncorrupted state; this is done through the use of Bit Stream Copying and Fixity Checking. Bit-level preservation also attempts to provide representation information for the digital object through documentation of the object's file formats.

Bit Stream Copying

Provided by: Digital Preservation FRG

Bit Stream Copying, more commonly known as “backing up your data,” is used to ensure the redundancy of digital content. 
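
As a rough illustration only, the sketch below copies one file into several storage locations and confirms that each copy is byte-identical to the source. The function name `copy_bitstream` and the idea of passing a list of destination directories are assumptions made for this example; they do not describe how the DLP repository actually replicates content.

```python
import filecmp
import shutil
from pathlib import Path


def copy_bitstream(source, destination_dirs):
    """Copy `source` into each directory and confirm each copy is byte-identical.

    Hypothetical helper for illustration; not part of any repository software.
    """
    source = Path(source)
    copies = []
    for directory in destination_dirs:
        target = Path(directory) / source.name
        # copy2 copies the file contents and, where possible, its timestamps.
        shutil.copy2(source, target)
        # shallow=False forces a byte-by-byte comparison of the two files.
        if not filecmp.cmp(source, target, shallow=False):
            raise IOError(f"copy at {target} differs from {source}")
        copies.append(target)
    return copies
```

Comparing the bytes immediately after copying catches a failed write at copy time; detecting later corruption is the job of Fixity Checking.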

Controlled Vocabulary

Provided by: Metadata IWG

Source: Adapted from Getty Institute and Wikipedia definitions

A formally defined and maintained set of terms and phrases supporting information retrieval and knowledge organization systems, such as authorities, taxonomies, and controlled lists; these terms may also be expressed in RDF as a set of predicates, as used in an ontology.

AKA: "Authority", "Taxonomy", "Controlled List"

Dark Archive

Provided by: DLP Steering

Source: adapted from various definitions (SAA, DPN, Portico)

A dark archive is a repository of material that cannot be or is not intended to be accessed by end-users, has no user interface, and is only available to a select few curatorial staff. By contrast, the DLP repository will be a "grey" archive in which some material may be restricted at the time of ingest, with the intent of making it accessible in the future.

Descriptive Metadata

Provided by: Metadata IWG

Source: adapted from metadata.emory.edu  

Descriptive metadata describes content for search and discovery contexts: it helps connect users to resources and provides important context about a resource once it is discovered. This type of metadata drives the ability to search, browse, sort, and filter information.

Digital Object

Provided by: Repository Architecture IWG

Source: http://pcdm.org/models#Object

An Object is an intellectual entity, sometimes called a "work", "digital object", etc. Objects have descriptive metadata, access metadata, may contain files and other Objects as member "parts" or "components". Each level of a work is therefore represented by an Object instance, and is capable of standing on its own, being linked to from Collections and other Objects. 
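
The nesting described above can be pictured with a small data structure. This is only a sketch under stated assumptions: the class name `RepositoryObject`, its fields, and the `all_files` method are invented for illustration and are neither the PCDM ontology itself nor the way a Samvera application models objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RepositoryObject:
    """Hypothetical stand-in for an Object: metadata, files, and member Objects."""

    title: str
    descriptive_metadata: Dict[str, str] = field(default_factory=dict)
    files: List[str] = field(default_factory=list)
    members: List["RepositoryObject"] = field(default_factory=list)

    def all_files(self):
        """Return this Object's files followed by those of its members, recursively."""
        collected = list(self.files)
        for member in self.members:
            collected.extend(member.all_files())
        return collected
```

For example, a book could be one `RepositoryObject` whose `members` are page Objects; each page can still stand on its own and be linked to from elsewhere.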

Digital Preservation

Provided by: Digital Preservation FRG

Source: ALA ALCTS Short Definition - Digital preservation combines policies, strategies and actions that ensure access to digital content over time.

Other contributions: https://www.loc.gov/preservation/digital/ and http://ndsa.org/activities/levels-of-digital-preservation/

Digital preservation combines policies, strategies and actions that provide long-term access to digital content. In particular it focuses on storage and geographic location, fixity and data integrity, information security, metadata, file formats, digital content packaging and ingest. To ensure long-term integrity, digital preservation requires ongoing monitoring, curation, and reporting of the content being preserved.

Emory SIP (Submission Information Package)

Provided by:

Source: OAIS definition - An Information Package that is delivered by the Producer to the OAIS [Open Archival Information System] for use in the construction or update of one or more AIPs and/or the associated Descriptive Information.

Emory context:

  • TBD

Emory AIP (Archival Information Package)

Provided by: 

Source: OAIS definition - An Information Package, consisting of the Content Information and the associated Preservation Description Information (PDI), which is preserved in an OAIS.

Emory context:

  • TBD

Emory DIP (Dissemination Information Package)

Provided by: Digital Preservation

Source: OAIS definition - The information package, derived from one or more AIPs, received by the consumer in response to a request to the OAIS [Open Archival Information System].

Emory context:

  • At Emory, we may produce two kinds of DIPs (internal and external).
  • Internal DIPs are selected views of an AIP targeted for the consumer, presented by a Hydra application's front-end. For end-users, this is typically a delivery of access copies and selected metadata fields.
  • External DIPs are packages intended to be exported to other systems for dissemination purposes. This may include packages for HathiTrust, DPLA, Digital Library of Georgia, Primo, etc.  Information packages constructed for preservation services like DPN, APTrust, etc. are not included in the definition of a DIP.

Fixity Checking

Provided by: Digital Preservation FRG

Checking the fixity of digital content verifies that the content remains unchanged or “fixed” over a period of time. Usually, fixity checking is accomplished by computing a checksum of the content with a hashing algorithm (MD5, SHA-256, etc.), recording it, and periodically recomputing it to confirm that the value has not changed.
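
A minimal sketch of such a check, using Python's standard `hashlib` module, might look like the following. The function names `compute_checksum` and `fixity_ok` and the default choice of SHA-256 are assumptions for this example, not a description of the repository's actual fixity service.

```python
import hashlib


def compute_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use flat even for very large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, recorded_checksum, algorithm="sha256"):
    """Return True when the file's current checksum matches the recorded one."""
    return compute_checksum(path, algorithm) == recorded_checksum
```

In this sketch the checksum would be recorded once, for instance at ingest, and `fixity_ok` would be run later on a schedule; a `False` result signals that the stored bytes have changed.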

Format Migration

Provided by: Digital Preservation FRG

Source: Preserving Digital Information: Report of the Task Force on Archiving of Digital Information

Format migration is a set of organized tasks designed to achieve the periodic transfer of digital content from one hardware/software configuration to another. Its ultimate purpose is to preserve the integrity of digital content and to retain the ability to retrieve, display, and otherwise use it in the face of constantly changing technology.

Hyrax

Provided by: DLP Staff

Source: Hyrax website

Hyrax is an application built on the Samvera framework, providing a web-based user interface around common repository features and social features. Hyrax offers customizable self-deposit, proxy deposit, and mediated deposit workflows. 

Metadata Application Profile

Provided by: Metadata IWG

Source: Wikipedia

The set of semantic units (descriptive, rights, technical, preservation, etc.) to be defined for the Hydra repository's applications and digital objects that we will manage in Fedora.

AKA:  "Element Set", "Emory Core Metadata", DPLA Metadata Application Profile

Ontology

Provided by: Metadata IWG

Source: Internal definition

A formal specification that defines our repository entities, their properties, and their relationships to one another.

AKA: Data model, concept map, domain specification, OWL, RDF Schema

Preservation Effort

Provided by: Digital Preservation FRG

The amount of human or computer activity expended in implementing digital preservation strategies.


Preservation Metadata

Provided by: Metadata IWG

Source: Internal definition (DRAFT)

Information detailing preservation activities applied to a digital object in the repository, such as system events, human-initiated workflows, or an audit trail of modifications. Additional information about the composition and encoding of digital surrogates is recorded through Technical/Characterization metadata.

Primary Content Types

Provided by: Repository Architecture IWG

Source: Internal definition

A list of major categories, based on the IETF's Internet Media Types categorization (mimetype families), used to describe Emory repository content in a consistent way for deliverables in multiple Working Groups.
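
To illustrate what a "mimetype family" is, the sketch below extracts the top-level type from a media type string. The function name `content_family` is hypothetical, and the families it returns are simply the top-level media types; the project's actual list of Primary Content Types is not reproduced here and may group content differently.

```python
def content_family(mime_type):
    """Return the top-level type of a media type, e.g. 'image' for 'image/tiff'."""
    # Split "type/subtype" at the first slash; parameters stay with the subtype.
    family, _, subtype = mime_type.strip().lower().partition("/")
    if not family or not subtype:
        raise ValueError(f"not a valid media type: {mime_type!r}")
    return family
```

For instance, `content_family("image/tiff")` and `content_family("image/jpeg")` both yield `"image"`, so the two files would fall under the same family.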

Rights Metadata

Provided by: Metadata IWG

Source: Internal definition

For the DLP Repository, rights metadata falls into three categories: Rights Status, Rights Determinations, and Rights for Preservation Workflows and Events. Rights Status metadata indicates the current state of the object’s rights, such as its copyright status, license or agreement status, and additional information about its access or appropriate use. This information is stored directly on the object and informs the selection of appropriate system access controls and visibility. Rights Status metadata has largely been identified already in the Descriptive Metadata specification and is mostly end user-facing. Rights Determinations information may be stored as part of Descriptive or Administrative metadata and supports library staff activities for researching and providing rights determinations, which then inform repository users’ rights to modify and/or access repository material. Additional rights information may be recorded with a human-initiated preservation workflow, such as those identified by the Digital Preservation Functional Requirements Group. In these cases, repository staff users may need to record additional rights information to explain the context of a particular preservation activity that impacts access to material.

Samvera (formerly Hydra)

Provided by: DLP Staff

Source: Samvera website

Samvera is an open-source repository solution built collaboratively to address a broad range of repository needs. Rather than being one-size-fits-all, Samvera leverages an ecosystem of components that lets institutions assemble and deploy robust and durable repository applications that are tailored to their users' needs and workflows. Samvera's platform is built atop the Fedora repository for digital asset management and utilizes Apache Solr for searching & indexing.

Semantic Units

Provided by: Metadata IWG

Source: Internal definition from Metadata IWG, utilizing concepts from the PREMIS 3.0 Data Dictionary

A container (such as an element, property, or field) that provides a metadata value, so that it can be assessed independently of its encoding or implementation environment.

Emory context:

  • This definition enables the group to work on mappings across multiple conventions (XML, relational databases, RDF).

Technical Metadata

Provided by: Metadata IWG

Source: Internal definition

Technical metadata, also known as characterization metadata, refers specifically to a digital asset and provides information about its file composition, such as mimetype, file size, creating software, and compression.
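
As a small, hedged example, two of the fields named above (mimetype and file size) can be gathered with Python's standard library. The function name `basic_technical_metadata` and the dictionary keys are assumptions for this sketch. Note that `mimetypes.guess_type` infers the type from the file extension only; dedicated characterization tools inspect the file contents and report far more detail.

```python
import mimetypes
import os


def basic_technical_metadata(path):
    """Return a few technical metadata fields for the file at `path`."""
    # guess_type looks only at the file name, so the result is None
    # when the extension is missing or unrecognized.
    mime_type, _encoding = mimetypes.guess_type(path)
    return {
        "file_name": os.path.basename(path),
        "mime_type": mime_type,
        "file_size_bytes": os.path.getsize(path),
    }
```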

Workflow/Administrative Metadata

Provided by: Metadata IWG

Source: internal definition, leveraging aspects of DCC definition and SAA definition

Information about the workflow and processing of digital objects in the repository, primarily to support staff users' activities.