Reporting Requirements
Prepared by: Repository Management FRG
Last Revised: March 2018
Reviewed by: DLP Core Team, DLP Steering
Status: Approved
Charter Deliverable Context
As part of its in-scope activities, the Management FRG was charged with
- “Develop[ing] requirements for reporting (e.g. inventory, storage projections, staff activity, throughput: turnaround for ingest/process materials; volume of work)”.
This document summarizes requirements gathering in this area. In considering the scope of “reporting”, the group received requests for information-seeking activities ranging from simple searches to more complex queries requiring data calculation and customized, filtered outputs. The reporting needs gathered included some that might be met by an internal staff search, and some that might be addressed by Google/Web analytics. Traditional characteristics of integrated product reporting include:
- Simple count of items based on criteria
- Growth/change of common repository-wide data over time
- Ability to run a canned report, adjusting with pre-set filters
- Export/download capability
- Integrated user interface display or visual
- Integration with product interface/dashboard
- Potential to be scheduled/run at pre-defined intervals
More customized reporting needs also surfaced, such as the ability to query against multiple criteria, perform calculations on data, and display results with groupings/breakdowns, calculated values, and multiple metadata fields per row. These requests will be assessed in greater detail to determine the feasibility of direct product integration vs. developer-assisted delivery.
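As an illustration of the "canned report" characteristics listed above (a count based on criteria, pre-set filters, and an export), the following is a minimal sketch in Python. It is not drawn from Hyrax or any existing DLP code: the record structure, the field names (`collection`, `content_type`, `visibility`), and both function names are assumptions made for the example.

```python
import csv
import io
from collections import Counter

# Hypothetical sample records; field names and values are illustrative only.
RECORDS = [
    {"id": "obj-1", "collection": "Maps", "content_type": "image", "visibility": "public"},
    {"id": "obj-2", "collection": "Maps", "content_type": "image", "visibility": "restricted"},
    {"id": "obj-3", "collection": "Theses", "content_type": "text", "visibility": "public"},
]


def run_canned_report(records, group_by, **filters):
    """Count records per `group_by` value after applying exact-match filters."""
    matching = [
        r for r in records
        if all(r.get(field) == value for field, value in filters.items())
    ]
    return Counter(r[group_by] for r in matching)


def to_csv(counts, group_by):
    """Render the counts as CSV text suitable for export/download."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([group_by, "count"])
    for value, count in sorted(counts.items()):
        writer.writerow([value, count])
    return buffer.getvalue()


# A "canned" report: items per collection, adjusted with a pre-set filter.
report = run_canned_report(RECORDS, "collection", visibility="public")
print(to_csv(report, "collection"))
```

The more customized requests (calculations, multiple fields per row) would need logic beyond this kind of single-field count, which is one reason they are assessed separately.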
Data Collection
The Management FRG produced a reporting requirements gathering template in January, with input from Library Technology and Digital Strategies. The questionnaire prompted respondents to identify the underlying question the repository data should inform, the audience for the report, the anticipated data sources and metadata needed for the output, and the level of urgency of the reporting need. Additional details such as timespans for information, types of users needing access to the resulting data, and anticipated calculations on the data were also requested in order to inform the underlying queries and filtering needed.
Responses were gathered from FRG members’ immediate areas of representation (Rose Library, Scholarly Communications Office, Content Division, Digital Library Program) as well as the Chair of the Digital Collections Steering Committee. The full responses are available in Box.
Analysis of Reporting Requests
The FRG received over 50 reporting requests. Due to the large volume of responses received, and the varying levels of urgency of the requests, the FRG members agreed to focus their efforts on compiling a summary of overall reporting needs rather than producing an implementation-level assessment of each response.
Each reporting request received an initial review and appraisal, which was logged in an inventory of report requests received. Basic summary information was recorded, including:
- Submitting unit
- Report Summary/Title
- Priority (Submitter-supplied)
- Category
Summary - Categories and Urgency of Reporting Needs
Category | Instances | Includes High Priority Request* |
Rights/Embargoes | 18% | Y |
Academic Program Statistics | 14% | |
Content Analysis | 14% | |
Workflow/Processing Activity | 13% | |
Metadata Management* | 11% | Y |
Inventory | 7% | Y |
Individual Staff Activity | 7% | |
Preservation Health | 5% | Y |
Storage | 4% | Y |
End User Analytics/Usage Analytics | 4% | |
Dissemination | 4% | |
* One or more reports in this category were indicated as an urgent need (13% of all reports in total)
Reporting Requests Submitted: Highest Priority
The following reporting requests were submitted with the highest priority option indicated:
- How many digital objects exist in the repository
- Total storage utilization across all services/copies
- Overview of the collections held by my library/stewardship area (number of collections and items in the repository, the number of items by collection, overall size, and the size by content type)
- Preservation status - fixity checks
- Preservation status - file formats and versions
- List of objects that fulfill multiple metadata criteria/requirements
- Which objects in the repository do not contain all required metadata elements
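The last request in this list (objects that do not contain all required metadata elements) can be sketched as a simple completeness check. This is only an illustration under stated assumptions: the required field names, the record layout, and the function names are hypothetical, and the actual required elements would come from the repository's metadata profile.

```python
# Hypothetical required elements; the real list would come from the
# repository's metadata requirements.
REQUIRED_FIELDS = ("title", "creator", "rights_statement")


def missing_required(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in `record`."""
    return [f for f in required if not str(record.get(f) or "").strip()]


def incomplete_objects(records, required=REQUIRED_FIELDS):
    """Map object id -> missing fields, for objects missing at least one."""
    report = {}
    for record in records:
        missing = missing_required(record, required)
        if missing:
            report[record["id"]] = missing
    return report


# Illustrative sample data only.
SAMPLE = [
    {"id": "obj-1", "title": "Atlas", "creator": "Smith", "rights_statement": "In Copyright"},
    {"id": "obj-2", "title": "Letters", "creator": "", "rights_statement": "In Copyright"},
    {"id": "obj-3", "title": "Diary"},
]

for object_id, fields in incomplete_objects(SAMPLE).items():
    print(object_id, "is missing:", ", ".join(fields))
```

Treating empty strings as missing is a design choice made for this sketch; a real report would need to confirm how empty values are stored in the underlying data.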
The following factors impacting technical feasibility of report requests were also noted in the initial appraisal:
- Existing Hyrax base product functionality
- Planned Hyrax enhancements
- Google Analytics capability
- Achievable via internal/staff search or browse interface
- Custom/local development anticipated
- Potential gaps in data sources/metadata
Note: this initial assessment is subject to change, based on deeper implementation reviews by technical team members and changes to Samvera product offerings over the coming year.
Capability | % of Reports | Extent/Note |
Current Hyrax Dashboard | 4% | Addresses request needs as is |
Current Hyrax Dashboard | 41% | Partial match for request |
Planned for Hyrax Dashboard | 43% | Partial match for request |
Current Google Analytics | 20% | Partial match for request |
Anticipated DLP staff search or browse interface | 73% | Partial match for request (implementation TBD) |
Potential data gaps | 48% | (Data confirmations needed) |
Custom development anticipated | 82% | (Additional assessment needed) |
Relation to Staff/Internal Search Needs
In some cases, reporting requests gathered by the Management FRG could be at least partially addressed by using an internal staff search of planned repository metadata (including more extensive administrative, technical, and preservation metadata that is visible to internal staff users only). For example, a count of objects meeting specified criteria in existing metadata may readily be returned, but a detailed output returning a customized list of fields with filters/groupings applied would require custom development. The Management FRG and Metadata IWG will review these report requests to inform additional staff search needs.
Hyrax Reporting Capabilities
As part of its review and assessment, the Management FRG also reviewed Samvera community documentation regarding current and planned Hyrax software offerings for reporting and analytics. The following reporting dashboard features are indicated as available in the current Hyrax version, based on available Samvera community documentation as of January 2018.
Current Product: Administrative Dashboard (available to all registered users):
- Number of objects deposited
- Number of registered users
- Number of downloads for individual files
- Number of pageviews for individual files
- Visualizations for downloads and page views for each file; for open access files, these visualizations are viewable by anyone (via the "Analytics" link)
Current Product: Statistics (available only to users in the administrator role):
- Various statistics (number of users, number of files) within date ranges (e.g., number of new users between September 2014 and September 2015)
- Number of total works, collections, and files in system (with breakdown by access controls)
- Number of total users, who they are, and the number of files they've deposited
- Top five users (users with most files deposited)
- Top file formats
- New user signups
- Repository growth - past 90 days
- Repository Objects - Status (Published)
- Administrative Set - works and files per Administrative Set
- Collections - # of "items"
- Collections by Visibility
- Collections by Resource Type, Creator, Contributor, Keyword, Subject, Publisher, Type/Collection Type
Planned Enhancements
The Hyrax Analytics Working Group has identified additional reporting and dashboard enhancements beyond the current product offering; development is planned for 2018. Areas for enhancements include:
- General dashboard enhancements/additions
- Collection-based reporting scope (all Collections, and/or individual Collections)
- Work/Object-level reporting (repository-wide)
- File-level reporting (repository-wide)
- Users (repository-wide: depositors and visitors)
Additional Requirements and Recommendations for Implementation
Ability to Scope/Limit Reporting Data
The majority of reporting requests included a need to constrain the resulting report data to one particular collection, Library, or business unit. This general need is significant because it will require more custom development to achieve in our future shared repository context vs. our current applications’ structure. In current Samvera product offerings, reporting capabilities are either scoped to report on the entire repository, or may potentially be available at an individual Collection, Object, or File level. Additionally, as noted earlier in this document, some statistics are only available to users who have been granted full administrator capabilities. The following user stories show examples of this need:
- As a Library/Collection Manager, I want to get usage analytics for material in my Library only, so that I can report to my Library staff about how our collections are being utilized
- As a Library/Collection Manager, I want to be able to view user activity for users modifying objects/files in my collections only, so that I can contact a specific user in my team about their work
- As a Library/Collection Manager, I want to run reports that are scoped to my Library or collection hierarchy only, so that I can exclude data that doesn't relate to my material
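To make the scoping need in these user stories concrete, the sketch below restricts a usage total to one Library and, optionally, to one branch of a collection hierarchy. It is a hedged illustration only: the `UsageEvent` structure, its fields, and the function names are assumptions for the example and do not reflect the Hyrax data model or any planned implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    """Hypothetical usage record; not a Hyrax or Samvera structure."""
    object_id: str
    library: str
    collection_path: tuple  # hierarchy from top level down, e.g. ("Manuscripts", "Letters")
    views: int


def in_scope(event, library=None, collection_path=()):
    """True if the event belongs to the given Library and collection branch."""
    if library is not None and event.library != library:
        return False
    prefix = tuple(collection_path)
    return event.collection_path[:len(prefix)] == prefix


def scoped_views(events, library=None, collection_path=()):
    """Total views for events inside the requested scope."""
    return sum(e.views for e in events if in_scope(e, library, collection_path))


# Illustrative sample data only.
EVENTS = [
    UsageEvent("obj-1", "Library A", ("Manuscripts", "Letters"), 10),
    UsageEvent("obj-2", "Library A", ("Manuscripts", "Diaries"), 5),
    UsageEvent("obj-3", "Library A", ("Photographs",), 7),
    UsageEvent("obj-4", "Library B", ("Manuscripts",), 100),
]

print(scoped_views(EVENTS, library="Library A"))
print(scoped_views(EVENTS, library="Library A", collection_path=("Manuscripts",)))
```

In this sketch the scope is passed in explicitly. In a shared repository the scope would more likely be derived from the requesting user's permissions, which is where the custom development noted above comes in.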
Additional Recommendations for Implementation:
The Management FRG recommends the following additional actions in preparation for implementation:
- Monitor the development efforts of the Samvera Hydra Analytics Working Group (HAWG) in planned future releases of Hyrax
- Share related use cases with the Samvera Permissions Analysis Working Group (PAWG) to inform permission needs for reporting-related entities and the product interface: in some cases, reporting capabilities exist which are only available to administrator-level users
- Work with service owners and DLP developers to confirm full implementation details for report requests submitted, focusing on the most urgent requests first
- Investigate capabilities to perform ad hoc, complex metadata searches in Blacklight and/or Solr to support metadata management requirements, which may be too variable for long-term reporting; additionally, investigate the ability of Blacklight and Solr to export the results of arbitrary complex queries
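As a starting point for the last recommendation, the sketch below builds a Solr request that combines several metadata criteria as filter queries and asks for CSV output. The parameters `q`, `fq`, `fl`, `rows`, and `wt=csv` are standard Solr query parameters. The base URL, the core name `repo`, the field names, and the function name are assumptions for illustration and would need to be confirmed against the actual index.

```python
from urllib.parse import urlencode


def build_solr_csv_query(base_url, criteria, fields, rows=1000):
    """Build a Solr /select URL that filters on every criterion and returns CSV.

    `criteria` maps field names to exact values; each becomes one `fq` filter.
    """
    params = [("q", "*:*")]
    for field, value in criteria.items():
        # Escape backslashes and quotes so the value stays one quoted phrase.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        params.append(("fq", f'{field}:"{escaped}"'))
    params += [("fl", ",".join(fields)), ("rows", str(rows)), ("wt", "csv")]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core and field names, for illustration only.
url = build_solr_csv_query(
    "http://localhost:8983/solr/repo",
    {"collection": "Maps", "rights_status": "In Copyright"},
    ["id", "title", "collection", "rights_status"],
)
print(url)
```

This only constructs the request. Whether such queries should be exposed through Blacklight, run directly against Solr by developers, or wrapped in a staff-facing tool is the open question the recommendation raises.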