Current State Review

Prepared by: Deposit Functional Requirements Group

Last Revised: March 2018

Status: Approved

Summary of Activity

The Deposit Functional Requirements Group’s charge was to provide system functional requirements related to different methods of deposit/ingest into the repository, such as end-user deposit, staff deposit, mediated submission workflows, and batch ingest. The first chartered deliverable instructed the group to:

Document current ingest processes

While the individual process maps themselves comprise the bulk of the deliverable, this document summarizes the activity and analyzes common activities, variant/unique activities, and areas for potential process enhancement.

Work Process

The group members identified, documented, and reviewed the following processes in order to provide background on active services, workflows, and participants across the institution (documentation available on Box, with authentication):

  1. Digitized Audio
  2. Course Reserves
  3. Dataverse/Research Data Sets
  4. Digitized Books
  5. Disk Images
  6. ETD Application (Legacy)
  7. ETD Application (New/Hyrax)
  8. The Keep (Ingest Summary for Multiple Workflows)
  9. OpenEmory
  10. Still Images (General)
  11. Still Images (Aeon Requests)
  12. Digitized Video

The set of process maps expanded on prior work conducted in 2016, and includes both Fedora-integrated processes and repository-related processes for which direct repository integration is not yet available. In reviewing the documented process maps, group members first clarified the current process details and later discussed gaps and ideas for future state process improvement.

A point of clarification and scoping that arose in the group’s work involves the relationship of broad deposit activities to the specific “ingest” concept as defined by the OAIS model:

“the Ingest entity accepts information from producers in the form of SIPs [Submission Information Packages],… generates an AIP [Archival Information Package] from one or more SIPs and extracts Descriptive Information from the AIPs (metadata for search and retrieval, thumbnail images for browsing, etc.). Finally, the Ingest function transfers the newly created AIPs to Archival Storage and the associated Descriptive Information to Data Management.”
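As a rough illustration of the flow described in the OAIS passage above, the following minimal Python sketch models the three steps: generating an AIP from one or more SIPs, extracting Descriptive Information, and transferring each to its destination. All class names, field names, and the dictionary-based stand-ins for Archival Storage and Data Management are hypothetical and chosen for illustration only; they do not describe any existing Emory system or the OAIS specification's data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SIP:
    """Submission Information Package from a producer (hypothetical shape)."""
    producer: str
    files: List[str]
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class AIP:
    """Archival Information Package built from one or more SIPs (hypothetical shape)."""
    identifier: str
    files: List[str]
    metadata: Dict[str, str]


def generate_aip(identifier: str, sips: List[SIP]) -> AIP:
    """Combine one or more SIPs into a single AIP."""
    if not sips:
        raise ValueError("at least one SIP is required")
    files: List[str] = []
    metadata: Dict[str, str] = {}
    for sip in sips:
        files.extend(sip.files)
        metadata.update(sip.metadata)  # later SIPs override earlier keys
    return AIP(identifier=identifier, files=files, metadata=metadata)


def extract_descriptive_information(aip: AIP) -> Dict[str, str]:
    """Keep only the metadata used for search and retrieval."""
    wanted = ("title", "creator", "date")  # illustrative field names only
    return {key: value for key, value in aip.metadata.items() if key in wanted}


def ingest(
    identifier: str,
    sips: List[SIP],
    archival_storage: Dict[str, AIP],
    data_management: Dict[str, Dict[str, str]],
) -> AIP:
    """Generate the AIP, then hand it and its description to their destinations."""
    aip = generate_aip(identifier, sips)
    archival_storage[aip.identifier] = aip
    data_management[aip.identifier] = extract_descriptive_information(aip)
    return aip
```

In this sketch, the submission package preparatory activities documented in the process maps would all occur before `ingest` is called; the function only covers the narrower OAIS sense of the term.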

While the Deposit FRG’s charter notes as out of scope, “Design of workflows, processes, or tools to construct SIPs for staff deposit scenarios”, process maps documented by team members reference a variety of submission package preparatory activities, which were retained to inform future implementation and to provide workflow context.

In addition to helping segue to future-state discussions, the review of current process maps also identified specific roles and participants, which led to the delineation of user profiles for deposit, developed separately as another chartered deliverable.

Current State Processes Review

Common Deposit Process Map Components

Common activities were indicated across the majority of process maps:

  • Digitization, reformatting or capture of digital surrogates
  • Creation, harvesting, or automated extraction of metadata
  • Preparation of additional supplemental files accompanying content
  • Verification and quality control, including formal reviews and approvals
  • Submission/uploading of content and metadata into a repository or repository-like system
  • Assignment of access controls for deposited material, including embargoes

While these activities are core to preparing materials for ingest, the sequence of activities and the types of users performing them vary across units and systems. Some workflows contain unique steps and sub-processes, which are noted in the section that follows.

Variable and Unique Deposit Process Map Components

Current state processes initially diverge between “self-deposit”, where Emory content creators submit material through a mediated review process, and “staff deposit”, in which material is more directly deposited by library staff or student assistants.

As anticipated by deposit types described in the charter, and further elucidated by workflow discussions, Working Group members identified a number of additional distinct deposit activities:

  • Harvest of metadata and content files from outside systems (e.g. OpenEmory and its relationship to Emory FIRST; ingest of Bags from an external server for the Keep)
  • Multiple methods of descriptive metadata creation for self-deposit and staff deposit are manifested in different metadata editing interfaces (also reflected in the Metadata Implementation Working Groups’ Systems of Record Analysis)
  • Process participant roles (now generalized as user profiles) were noted to perform similar actions but with different labels, and their activities occur in different sequences across the process maps
  • Activity sequences vary: common/core activities in various workflows occur in different order in the current state. Some workflows also incorporate status tracking for deposit prep activities in data sources outside of the repository
  • Certain units deposit objects in a batch and collection-centric context, while others coordinate the deposit of single objects submitted by content creators
  • Self-deposit processes contain unique review and mediation steps vs. staff deposit: the ETD mediation process further contains distinct sub-processes for each of the participating schools
  • ETD deposits involve dependencies with graduation and academic program approval for publication/release
  • Approvals for “publishing”/visibility vary – sometimes self-deposited objects are immediately made visible, and sometimes they must undergo approval in order to be disseminated
  • Some deposit processes are tightly coupled with systems and data sources external to the Emory Libraries, such as Emory Shared Data, Emory Human Resources data, EmoryFIRST; research data is currently deposited into the Odum Institute-hosted Dataverse; some process maps include dissemination to third party platforms such as HathiTrust

Current State Gaps / Desired Future State Features

As part of the documentation process, process map creators were also encouraged to identify known “pain points”. The following were identified by Working Group members as areas for future improvement:

  • Ability to upload large files
  • Batch upload/creation of objects (including ingest from a spreadsheet)
  • Progress indicators for upload of large files and other repository deposit steps
  • Ability to create complex object structures (including whole/part relationships)
  • Saving mediated submissions as drafts (identified for new ETD Application)
  • Complexity of “capture” process for creating digital surrogates requires specialized expertise (not delegable to student assistants)
  • Opportunities for greater automation of steps (e.g. reducing mediation by email, work recorded in spreadsheets, ensuring destination Collections are appropriately configured prior to deposit)
  • Dependencies on external systems that have independent operations and roadmaps (such as EmoryFIRST, SymplecticElements, HathiTrust, and Odum Dataverse) affect the Libraries’ workflows
  • Storing and managing deposit agreements as part of repository package (variable formats and approaches currently exist)
  • Expansion of potential depositor bases (e.g. expanding OpenEmory to include graduate students, enabling submissions from campus staff for University Archives) 
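
To make two of the items above more concrete (batch creation of objects from a spreadsheet, and whole/part relationships), the following is a minimal Python sketch that reads a spreadsheet exported as CSV and links part records to their parent. The column names (identifier, title, file, parent) and the record layout are assumptions made for illustration; they are not drawn from any existing Emory workflow or from Hyrax.

```python
import csv
import io
from typing import Any, Dict, List


def load_batch(csv_text: str) -> List[Dict[str, Any]]:
    """Parse spreadsheet rows (exported as CSV) into deposit records.

    The column names used here are hypothetical.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records: Dict[str, Dict[str, Any]] = {}

    # First pass: build one record per row and reject bad identifiers.
    for row in reader:
        identifier = (row.get("identifier") or "").strip()
        if not identifier:
            raise ValueError("row is missing an identifier")
        if identifier in records:
            raise ValueError(f"duplicate identifier: {identifier}")
        records[identifier] = {
            "identifier": identifier,
            "title": (row.get("title") or "").strip(),
            "file": (row.get("file") or "").strip(),
            "parent": (row.get("parent") or "").strip() or None,
            "parts": [],
        }

    # Second pass: attach each part to its parent (whole/part relationship).
    for record in records.values():
        parent_id = record["parent"]
        if parent_id is None:
            continue
        if parent_id not in records:
            raise ValueError(f"unknown parent: {parent_id}")
        records[parent_id]["parts"].append(record["identifier"])

    return list(records.values())


# Illustrative sample data only.
SAMPLE = """identifier,title,file,parent
book-1,Example Volume,,
page-1,Page 1,page1.tif,book-1
page-2,Page 2,page2.tif,book-1
"""

batch = load_batch(SAMPLE)
```

Validating the whole spreadsheet before any object is created, as this sketch does, is one possible design choice: a depositor would see every identifier or parent error up front rather than after a partial batch has been written.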

Additional desired future state features were identified through the separate Deposit Requirements activity.

Considerations for Implementation

The following considerations are also noted for future implementation work:

  • DLP implementation teams should coordinate with service owners to adapt existing pre-deposit activities and workflows as needed to work with future repository software: current state repository-related applications may provide more specialized and unit-customized SIP preparation functions than DLP/Hyrax will provide.
  • Service owners should also adjust workflows and practices as needed relative to the new Digital Preservation Policy. The Digital Preservation FRG’s SIP Decision Document describes the elements required of a SIP:
    • Content deposited to Emory’s preservation repository must include the primary characteristics identified in Emory’s Digital Preservation Policy. Secondary characteristics may be deposited alongside the primary characteristics.
    • The Digital Preservation Policy further elaborates on these elements.
    • The Archival Information Package (AIP) specification also describes required and optional files and datastreams for inclusion in the preservation object.
  • Implementation teams should coordinate with service owners to investigate relationships between self-deposited content across self-deposit workflows (e.g. relationships between ETDs, OpenEmory, and research data created by Emory scholars)
  • Implementation teams should investigate the Hyrax roadmap for batch-upload capability for importing from spreadsheets
  • Monitor the impact of cloud hosting/storage on current deposit activities (in scope for the Technology Implementation Working Group and Technical Design Phase)