Data deposition

Data deposition is the process of placing research data into a secure, organized repository, ensuring it is preserved, accessible, and reusable over time. In Research Data Management (RDM), data deposition serves as a cornerstone for maintaining research transparency, reproducibility, and longevity. Data deposition is closely tied to the principles of open science and FAIR (Findable, Accessible, Interoperable, Reusable) data, as it facilitates data sharing and accessibility among researchers and the public.

As research becomes increasingly data-driven and collaborative, effective data deposition practices are essential to managing data throughout its lifecycle. Data repositories – whether institutional, discipline-specific, or open-access – provide structured environments that support data storage, organization, and retrieval, safeguarding research outputs for future use and discovery.

In this chapter you will learn about:

  • the reasons for depositing, including increasing research openness, improving data accessibility, fostering collaboration and data sharing;
  • the steps in the data deposit process, including data preparation, choosing an appropriate repository, creating metadata, defining licensing and access rights, etc.;
  • the challenges of data deposit, which are affected by both data privacy concerns and resource constraints;
  • good practices in data deposition – early planning, use of standardised metadata schemes, selection of an appropriate repository, etc.;
  • future trends in data deposit, driven by technological advances and changes in data sharing norms.

Data deposition is crucial in modern research due to its numerous benefits for researchers, institutions, and society at large. By securely depositing data in a repository, researchers ensure that their work is accessible, verifiable, and reusable.

Key reasons why data deposition is essential:

  • data preservation: repositories provide long-term storage, protecting data from loss due to hardware failures, accidental deletions, or other risks;
  • increased research transparency: depositing data makes research results more transparent, allowing others to verify findings, replicate studies, and validate results;
  • compliance with funding and publishing requirements: many funding bodies and journals now require data deposition as a condition for grant approval or publication, promoting responsible data management;
  • enhanced data discoverability: deposited data is indexed and organized within repositories, making it easier to search, discover, and access;
  • facilitation of collaboration and data sharing: data in repositories is accessible to a wider audience, fostering collaboration across disciplines and regions.

By supporting data deposition, research institutions and funders contribute to a culture of data-sharing and openness, accelerating scientific progress and enabling more efficient use of research resources.

The data deposition process involves several steps that ensure data is adequately prepared, organized, and submitted to a repository. This process may vary slightly depending on the type of data, discipline, or repository requirements, but generally includes:

  • data preparation: researchers should ensure that data is clean, organized, and accompanied by comprehensive metadata, which includes details such as data source, collection methods, and contextual information;
  • choosing an appropriate repository: researchers should select a repository that aligns with their field or institutional guidelines. Some widely used repositories include Zenodo (general research data), Dryad (biological and environmental sciences), and ICPSR (social science data);
  • metadata creation: adding descriptive metadata is crucial for discoverability and reuse. Metadata should follow standards relevant to the data type or field, such as Dublin Core for general data or MIxS for genomic and metagenomic sequence data (including microbiome samples);
  • licensing and access rights: researchers should determine the appropriate licensing for their data, deciding on open or restricted access based on the data’s sensitivity or proprietary nature;
  • submission and verification: after uploading data, researchers may need to verify the deposition, ensuring all information is correct and complete. Some repositories offer a peer review or curation process to ensure data quality;
  • assigning persistent identifiers: once deposited, data is often assigned a Digital Object Identifier (DOI), a unique and permanent identifier that makes data easy to cite and track.

Each step in the deposition process is critical to making data secure, accessible, and usable, supporting the goals of RDM and data-sharing principles.
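The metadata and identifier steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names are the 15 elements of the Dublin Core Metadata Element Set, but the set of "required" elements is a hypothetical local policy (Dublin Core itself makes every element optional), and the record and DOI shown are made up. A real repository will apply its own rules at submission.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Assumption: a local policy, not part of Dublin Core (which makes every
# element optional). Adjust to the chosen repository's requirements.
REQUIRED_BY_LOCAL_POLICY = {"title", "creator", "date", "description", "rights"}


def check_record(record: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the record passes."""
    problems = []
    for key in sorted(set(record) - DUBLIN_CORE_ELEMENTS):
        problems.append(f"unknown element: {key}")
    for key in sorted(REQUIRED_BY_LOCAL_POLICY):
        value = record.get(key)
        if value is None or not str(value).strip():
            problems.append(f"missing required element: {key}")
    return problems


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI into a resolvable link via the doi.org resolver."""
    doi = doi.strip()
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + doi


# Hypothetical record for illustration only.
record = {
    "title": "Soil moisture measurements, 2023 field season",
    "creator": "Doe, Jane",
    "date": "2024-01-15",
    "description": "Hourly sensor readings from three plots.",
    "rights": "CC BY 4.0",
    "format": "text/csv",
}

print(check_record(record))  # an empty list means nothing is missing
print(doi_to_url("10.1234/example.5678"))  # made-up DOI
```

Running a check like this before upload catches missing fields early, when they are still cheap to fix.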

Data deposition, while valuable, presents several challenges that researchers and institutions must address to ensure effective data management:

  • data sensitivity and privacy concerns: depositing sensitive data, such as health or personal information, requires strict privacy measures. Anonymizing or de-identifying data can be complex and time-consuming;
  • resource constraints: the deposition process, including data cleaning, metadata creation, and submission, can be resource-intensive, requiring time, expertise, and sometimes funding;
  • repository selection: choosing an appropriate repository can be challenging due to the variety of options and requirements. Some researchers may also be uncertain whether to use an institutional, discipline-specific, or general-purpose repository;
  • licensing and access decisions: determining appropriate licensing for data can be confusing, especially for researchers unfamiliar with open-access requirements, Creative Commons licenses, or proprietary data considerations;
  • quality control and standardization: ensuring data quality, consistency, and compliance with metadata standards can be challenging, especially for large datasets with complex structures;
  • long-term maintenance and updates: over time, deposited data may require updates or corrections. Ensuring data remains accurate and usable can be challenging, particularly if resources for maintenance are limited.

These challenges underscore the need for institutional support, training, and adequate resources to help researchers navigate the data deposition process effectively.
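To give a sense of why de-identification takes effort, the sketch below shows one small part of it: dropping direct identifiers and replacing a participant ID with a keyed hash (pseudonymization). The column names and the key are hypothetical. This is not full anonymization: remaining fields such as age or location can still identify people in combination, so sensitive data should be reviewed against institutional and legal requirements before deposit.

```python
import hashlib
import hmac

# Hypothetical column names; adapt to the actual dataset.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}
ID_COLUMN = "participant_id"


def pseudonymize(rows, secret_key: bytes):
    """Drop direct identifiers and replace the ID with a keyed hash.

    The same ID always maps to the same pseudonym, so records can still be
    linked, but the mapping cannot be recomputed without the secret key.
    """
    cleaned = []
    for row in rows:
        out = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
        digest = hmac.new(
            secret_key, str(row[ID_COLUMN]).encode("utf-8"), hashlib.sha256
        ).hexdigest()
        out[ID_COLUMN] = digest[:16]  # shortened for readability
        cleaned.append(out)
    return cleaned


rows = [
    {"participant_id": 1, "name": "A. Person", "email": "a@example.org", "age": 34},
    {"participant_id": 2, "name": "B. Person", "email": "b@example.org", "age": 51},
]

# Assumption: the key is stored securely and never deposited with the data.
for row in pseudonymize(rows, secret_key=b"replace-with-a-real-secret"):
    print(row)
```

The function builds new dictionaries rather than editing the input, so the original records stay untouched for internal use.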

Following best practices can make data deposition more efficient and effective, helping researchers create well-organized, accessible datasets that meet repository standards and user needs.

Key best practices for data deposition:

  • plan early: researchers should plan for data deposition early in the project, considering data management, cleaning, and metadata requirements. Early planning helps streamline the deposition process later;
  • use standardized metadata schemas: adopting metadata standards such as DataCite or Dublin Core ensures that data is compatible with repository requirements and easily discoverable by others;
  • choose the right repository: selecting a repository that fits the data’s discipline or type increases visibility and accessibility. Researchers can consult institutional guidelines or use registry platforms like re3data.org for guidance;
  • establish clear licensing and access rights: deciding on licensing and access levels ensures that data can be shared responsibly. Researchers should review open-access policies and consult their institution or repository for guidance;
  • collaborate with data management experts: consulting data librarians, IT staff, or RDM professionals can help researchers meet repository requirements, develop high-quality metadata, and select appropriate storage formats;
  • document thoroughly: comprehensive documentation, including methodology, file organization, and data formats, aids in making data understandable and reusable, even for users unfamiliar with the original research.

By following these best practices, researchers can ensure that their deposited data is well-prepared, accessible, and valuable for long-term use, benefiting both the research community and society.
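Part of thorough documentation is a data dictionary that tells a new user what each column contains. The sketch below drafts one from a CSV file using only the Python standard library. The type labels and the sample data are assumptions made for illustration, and the guessed types are coarse, so the output is a starting point for a human-written description rather than a replacement for it.

```python
import csv
import io


def infer_type(values):
    """Guess a coarse type label for a column from its non-empty values."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "empty"
    for label, cast in (("integer", int), ("number", float)):
        try:
            for v in non_empty:
                cast(v)
            return label
        except ValueError:
            continue
    return "text"


def data_dictionary(csv_text: str):
    """Return one entry per column: name, guessed type, count of missing values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        # Short rows yield None for absent cells; treat those as empty.
        values = [row[name] or "" for row in rows]
        entries.append({
            "column": name,
            "type": infer_type(values),
            "missing": sum(1 for v in values if not v.strip()),
        })
    return entries


# Hypothetical sample data.
sample = """plot,depth_cm,moisture,notes
A,10,0.31,
A,20,0.28,sensor recalibrated
B,10,,
"""

for entry in data_dictionary(sample):
    print(entry)
```

Units, collection methods, and the meaning of codes still have to be added by the researcher, since none of that can be inferred from the file alone.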

The field of data deposition continues to evolve alongside technological progress and changes in data-sharing norms. Future trends in data deposition are likely to include automation, improved interoperability, and expanded data-sharing capabilities.

Emerging trends and future directions:

  • automated data and metadata generation: artificial intelligence and machine learning are increasingly used to automate data cleaning, metadata generation, and file organization, reducing researchers’ workload and improving accuracy;
  • improved repository interoperability: in the future, repositories could integrate with one another, making it easier to combine, search, and access data from different sources and fostering interdisciplinary research;
  • blockchain technology for data integrity: blockchain technology could provide verifiable records of data deposition, improving transparency and tracking changes to data over time;
  • real-time data deposition and updates: as data collection and analysis become faster, researchers could deposit data in real time or update deposited data more frequently, improving data availability and accuracy;
  • integration with open science initiatives: data deposition practices will most likely align more closely with open science policies that emphasize public data accessibility and reproducibility;
  • user-friendly repositories and visualization tools: repositories will likely offer more user-friendly interfaces and data visualization features, making it simpler for researchers and the public to understand, explore, and reuse data.

These trends point to a future in which data deposition is more automated, integrated, and accessible, fostering a more collaborative and open research environment.
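The idea behind verifiable deposition records can be illustrated with a hash chain, the basic building block that blockchains extend. In the sketch below each log entry stores the hash of the previous entry, so changing any earlier record invalidates everything after it. This is a simplified, single-party illustration: it has no distributed consensus, the event fields are hypothetical, and it does not describe how any existing repository works.

```python
import hashlib
import json


def entry_hash(entry: dict) -> str:
    """Hash an entry's content together with the previous entry's hash."""
    payload = json.dumps(
        {"prev": entry["prev"], "event": entry["event"]}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> None:
    """Add an event, linking it to the hash of the entry before it."""
    prev = log[-1]["hash"] if log else "0" * 64
    entry = {"prev": prev, "event": event}
    entry["hash"] = entry_hash(entry)
    log.append(entry)


def verify(log: list) -> bool:
    """Recompute every hash; any edit to an earlier entry breaks the chain."""
    prev = "0" * 64
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(entry):
            return False
        prev = entry["hash"]
    return True


log = []
append_event(log, {"action": "deposit", "file": "data_v1.csv"})
append_event(log, {"action": "update", "file": "data_v2.csv"})
print(verify(log))  # True

# Altering a past record is detected on the next verification.
log[0]["event"]["file"] = "tampered.csv"
print(verify(log))  # False
```

A chain kept by one party only shows that its records are internally consistent. The appeal of a shared ledger is that copies held by others make it hard to rewrite the whole chain unnoticed.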
