Use of the Hydra/Sufia repository and Portland Common Data Model for research data description, organization, and access

Metadata

Title

Use of the Hydra/Sufia repository and Portland Common Data Model for research data description, organization, and access

Description

Oregon State University's ScholarsArchive OSU institutional repository contains over 26,000 undergraduate, master's, and doctoral theses and dissertations published from 1902 to the present. Increasingly, these electronic theses and dissertations (ETDs) have multiple supplementary files associated with them, including datasets and software code. In addition, faculty are increasingly depositing research articles that build on the research contained in ETDs and supplementary datasets. In our current repository, supplementary data and related content files for ETDs are co-located with the thesis document, with no representation of the relationships among file types and no differentiation in how the files are described. The metadata primarily describes the thesis document (PDF) and does not adequately describe the accompanying supplementary files and related documents. This creates problems for reporting, description, discovery, and reuse of those supplementary files, and for contextualizing the research contained in the ETD with other content in the repository.

In 2015, Oregon State University Libraries and Press (OSULP) began to migrate the ScholarsArchive OSU institutional repository from DSpace to the Hydra-Sufia platform. The selection of the Hydra-Sufia repository platform, which takes advantage of the Portland Common Data Model (PCDM), provides the library with an opportunity to represent the intellectual and structural relationships among distinct but related files (ETDs, appendices, datasets, presentations, papers, external resources, etc.). This paper describes and demonstrates OSULP's prototype repository architecture, which explicitly defines relationships between ETDs and their supplementary files and datasets. We demonstrate the benefits of describing individual files as primary objects in the repository and of using PCDM to contextualize each file in relation to other resources in the repository. We provide concrete examples of how this architectural migration improves the representation of repository content to end users. Finally, we demonstrate that this data model allows us to improve the discovery, and facilitate the publication, of datasets.
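
As a rough illustration of the kind of structure PCDM makes possible, the sketch below models an ETD as an object that has the thesis PDF as a file and a separately described dataset as a member object. It is not OSULP's implementation: the `Resource` class, the `triples` helper, the `ex:` identifiers, and the titles are all hypothetical, and only the PCDM namespace and the `Object`, `File`, `hasMember`, and `hasFile` terms come from the PCDM vocabulary itself.

```python
from dataclasses import dataclass, field

# Namespace of the Portland Common Data Model vocabulary.
PCDM = "http://pcdm.org/models#"


@dataclass
class Resource:
    """Hypothetical in-memory stand-in for a repository resource."""
    uri: str        # made-up identifier, for illustration only
    rdf_type: str   # PCDM class name: "Object" or "File"
    title: str      # each resource carries its own description
    members: list = field(default_factory=list)  # related via pcdm:hasMember
    files: list = field(default_factory=list)    # related via pcdm:hasFile


def triples(resource):
    """Yield (subject, predicate, object) triples for a resource tree."""
    yield (resource.uri, "rdf:type", PCDM + resource.rdf_type)
    for member in resource.members:
        yield (resource.uri, PCDM + "hasMember", member.uri)
        yield from triples(member)
    for f in resource.files:
        yield (resource.uri, PCDM + "hasFile", f.uri)
        yield from triples(f)


# Assumed example content: one thesis with one supplementary dataset.
thesis_pdf = Resource("ex:file/thesis.pdf", "File", "Thesis document (PDF)")
data_csv = Resource("ex:file/data.csv", "File", "Field measurements (CSV)")
dataset = Resource("ex:dataset/1", "Object", "Supplementary dataset",
                   files=[data_csv])
etd = Resource("ex:etd/1", "Object", "Example doctoral dissertation",
               members=[dataset], files=[thesis_pdf])

graph = list(triples(etd))
for subject, predicate, obj in graph:
    print(subject, predicate, obj)
```

Because the dataset is an object in its own right rather than an attachment to the thesis, it can carry its own title and other metadata while the `hasMember` link preserves its connection to the ETD.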

Date

2016