FAIR principles and data
The advancement of technology and the rise of data-driven research have led to unprecedented amounts of data being generated across various scientific fields. While the availability of data offers immense potential for new discoveries, it also introduces challenges in data management, accessibility, and long-term usability. The FAIR data principles were introduced to address these challenges and provide a framework that makes data Findable, Accessible, Interoperable, and Reusable. The goal of FAIR is to ensure that research data is efficiently shared, easily located, and usable in diverse contexts, thus fostering collaboration and accelerating scientific progress.
The FAIR principles were formally established in 2016 through the publication of “The FAIR Guiding Principles for scientific data management and stewardship” by Wilkinson et al. These principles have since gained global recognition and are widely implemented across scientific disciplines, aiming to create a sustainable ecosystem of open science where data is a readily available resource.
In this chapter, you can expect to learn about:
- FAIR data principles that promote effective data sharing and use;
- the role of FAIR-compliant data in scientific research, promoting re-use and collaboration between research organisations;
- practical implementation of the FAIR principles, covering the methods and tools available to make these principles a reality;
- challenges in implementing FAIR data, covering the main problems that researchers may face and ways to address them;
- the future of FAIR data and its implications for science, including what the FAIR principles will mean for future research and how they may benefit society.
The acronym FAIR stands for Findable, Accessible, Interoperable, and Reusable. Each principle addresses a specific aspect of data management and provides guidelines for effective data sharing and usage:
- Findable: data should be easy to locate for both humans and computers. This requires rich metadata, unique identifiers (such as DOIs), and storage in searchable repositories;
- Accessible: once found, data should be retrievable by its identifier through well-defined, standardized protocols (for example, HTTPS). FAIR does not mean that all data must be openly accessible; rather, data should be available under specified conditions with clear access permissions;
- Interoperable: data must be compatible with other datasets and computational tools, which necessitates the use of standardized formats, shared vocabularies, and frameworks that ensure data integration and comparability;
- Reusable: to maximize the value of data, it should be reusable for future research. Reusability depends on the provision of detailed metadata, usage licenses, and adherence to community standards.
The FAIR principles are meant to work synergistically, ensuring that data is not only available but also usable in a meaningful way, supporting cross-disciplinary research and innovation.
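As a rough illustration of how the four principles map onto a metadata record, the sketch below checks a dataset description for a handful of FAIR-relevant fields. The record structure, the field names, the grouping of fields under each principle, and the example DOI are all assumptions made for illustration; this is not an official FAIR assessment method.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    identifier: str = ""         # persistent identifier, e.g. a DOI
    title: str = ""
    description: str = ""
    access_protocol: str = ""    # how the data is retrieved, e.g. "https"
    access_conditions: str = ""  # e.g. "open" or "on request"
    file_format: str = ""        # ideally an open, standard format
    vocabulary: str = ""         # shared vocabulary used for keywords
    license: str = ""            # usage license
    provenance: str = ""         # how and by whom the data was produced


# Assumed grouping of fields under each principle (not an official mapping).
REQUIRED_FIELDS = {
    "Findable": ["identifier", "title", "description"],
    "Accessible": ["access_protocol", "access_conditions"],
    "Interoperable": ["file_format", "vocabulary"],
    "Reusable": ["license", "provenance"],
}


def fair_gaps(record):
    """Return, for each principle, the list of fields that are still empty."""
    return {
        principle: [name for name in names if not getattr(record, name).strip()]
        for principle, names in REQUIRED_FIELDS.items()
    }


record = DatasetRecord(
    identifier="10.5555/example-dataset",  # made-up DOI for illustration
    title="Soil moisture measurements",
    description="Hourly soil moisture readings from a hypothetical field site.",
    access_protocol="https",
    access_conditions="open",
    file_format="text/csv",
    license="CC-BY-4.0",
)

gaps = fair_gaps(record)
for principle, missing in gaps.items():
    print(principle, "missing:", ", ".join(missing) or "nothing")
```

A check like this only confirms that fields are filled in. Whether the metadata is rich and accurate enough for reuse still needs human judgment.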
FAIR data is crucial for modern research as it promotes transparency, reproducibility, and long-term data usability. By adhering to FAIR principles, researchers can make data more accessible to a wider audience, which facilitates secondary research and enables verification of results. FAIR data also helps prevent data loss and waste by providing systematic ways to store and manage research data.
Benefits of FAIR data:
- enhanced research efficiency: researchers save time by reusing existing data rather than replicating data collection;
- increased impact: some studies suggest that publications whose underlying data is openly shared tend to receive more citations;
- better collaboration: FAIR data principles make it easier for interdisciplinary teams to collaborate, as standardized data allows seamless integration across domains.
The COVID-19 pandemic exemplified the importance of FAIR data: scientists worldwide rapidly shared open data on the virus, its spread, and vaccine trial results, which helped accelerate vaccine development.
The successful implementation of FAIR data principles requires specific tools, platforms, and standards. Key components include data repositories, metadata standards, persistent identifiers, and data management tools.
Key tools and platforms:
- data repositories: platforms like Zenodo, Dryad, and Figshare are popular repositories that support the storage and sharing of FAIR-compliant data;
- data management plans (DMPs): tools such as the DMPTool help researchers outline how they will handle data following FAIR principles, from data collection through to sharing and archiving;
- metadata standards: to support interoperability, metadata standards such as Dublin Core for general-purpose description or MIxS (Minimum Information about any (x) Sequence) for genomic and metagenomic sequence data can be applied. Adopting standardized formats enables data to be meaningfully combined with other datasets;
- persistent identifiers (PIDs): permanently assigned identifiers, such as DOIs, that uniquely identify an object, person, organisation, dataset, digital resource or other entity. Their main purpose is to ensure that the associated resource or information remains easy to find and access over time, despite technological or structural changes.
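Two of the components above, persistent identifiers and metadata standards, can be sketched in a few lines. The example below turns a DOI into a resolvable link through the doi.org resolver and serializes a small set of Dublin Core elements to XML. The helper names `doi_to_url` and `dublin_core_record`, the simplified DOI pattern, and the `metadata` wrapper element are assumptions for illustration; real repositories apply their own schemas and stricter validation.

```python
import re
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set (title, creator, identifier, ...).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Simplified pattern: "10.", a numeric registrant prefix, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi):
    """Normalize a DOI string and return its https://doi.org/ resolver link."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI-like string: {doi!r}")
    return "https://doi.org/" + doi


def dublin_core_record(fields):
    """Serialize a dict of Dublin Core element names and values to XML."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# The DOI of Wilkinson et al. (2016), taken from the reference list.
url = doi_to_url("doi:10.1038/sdata.2016.18")
xml_text = dublin_core_record({
    "title": "The FAIR Guiding Principles for scientific data management and stewardship",
    "creator": "Wilkinson M.D.",
    "date": "2016",
    "identifier": url,
})
print(url)
print(xml_text)
```

Storing the identifier as a full resolver link keeps the record usable for both humans, who can click it, and software, which can dereference it.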
Funding agencies and research institutions often require researchers to create FAIR-aligned data management plans, highlighting the commitment to sustainable data practices.
While the benefits of FAIR data are substantial, the adoption of these principles comes with several challenges:
- cost and infrastructure: implementing FAIR data practices requires investment in technology and infrastructure. Many researchers, especially in developing countries, lack the financial resources to maintain FAIR-compliant repositories or advanced data storage systems;
- cultural barriers: in some fields, researchers may hesitate to share data due to concerns about data misuse or losing control over their work. Shifting to a culture of openness and collaboration necessitates changes in incentives and recognition systems;
- complexity of standards: the diversity of data types and formats across scientific fields makes it challenging to create universal standards. Tailored solutions are often necessary, but this can complicate the goal of interoperability;
- privacy and security: particularly in fields such as healthcare, there are ethical and legal considerations around data sharing. Balancing the openness required by FAIR with privacy concerns is an ongoing issue, requiring robust data governance and anonymization techniques.
Addressing these challenges will require a concerted effort from governments, institutions, and researchers to provide financial, technical, and policy support.
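For the privacy challenge in particular, one common building block is pseudonymization: replacing direct identifiers with keyed hashes before sharing. The sketch below is a minimal example under stated assumptions; the function name `pseudonymize`, the field names, and the sample records are hypothetical. Pseudonymization alone does not amount to full anonymization, because the remaining fields may still allow re-identification.

```python
import hashlib
import hmac


def pseudonymize(records, id_field, secret_key, drop_fields=()):
    """Replace the identifier with a keyed hash and drop listed fields.

    The same identifier always maps to the same pseudonym, so records of one
    person stay linkable, but the original value cannot be recomputed without
    the secret key.
    """
    shared = []
    for rec in records:
        clean = {
            key: value
            for key, value in rec.items()
            if key != id_field and key not in drop_fields
        }
        digest = hmac.new(
            secret_key, str(rec[id_field]).encode("utf-8"), hashlib.sha256
        ).hexdigest()
        clean["pseudonym"] = digest[:16]  # shortened for readability
        shared.append(clean)
    return shared


# Made-up records for illustration only.
rows = [
    {"patient_id": "P001", "name": "Alice Example", "systolic_bp": 128},
    {"patient_id": "P002", "name": "Bob Example", "systolic_bp": 141},
    {"patient_id": "P001", "name": "Alice Example", "systolic_bp": 131},
]

# In practice the key would be stored securely and never published.
shared = pseudonymize(rows, "patient_id", b"replace-with-a-secret-key",
                      drop_fields=("name",))
for row in shared:
    print(row)
```

A keyed hash (HMAC) is used here rather than a plain hash so that someone who knows the set of possible identifiers cannot simply hash them all and match the results.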
The future of FAIR data is promising as technological advancements and policy shifts increasingly favor open science and data sharing. Innovations in AI, machine learning, and blockchain technology are likely to enhance data management, making FAIR principles easier to implement. AI tools, for example, can automate data curation and help to detect and correct metadata errors, streamlining the process of maintaining FAIR data.
Policy and funding support. Organizations such as the European Union, through its Horizon 2020 program and its successor Horizon Europe, and the United States’ National Institutes of Health have introduced requirements for data management aligned with FAIR principles. Many funding agencies now require open-access publication and data sharing as conditions of funding, encouraging researchers worldwide to adopt FAIR practices.
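Automated detection and correction of metadata errors does not have to start with AI. The sketch below is a simple rule-based stand-in, not a machine learning method: it flags missing fields and rewrites dates into ISO 8601 form. The function names, the required fields, and the list of accepted date formats are assumptions for illustration; in particular, slash-separated dates are assumed to be month-first.

```python
from datetime import datetime

# Accepted input formats (an assumption); slash dates are read month-first.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def normalize_date(value):
    """Return the date as an ISO 8601 string, or None if no format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_metadata(record):
    """Return a corrected copy of the record and a list of remaining problems."""
    fixed = dict(record)
    problems = []
    for key in ("title", "license"):  # hypothetical required fields
        if not str(fixed.get(key, "")).strip():
            problems.append(f"missing {key}")
    if "date" in fixed:
        iso = normalize_date(str(fixed["date"]))
        if iso is None:
            problems.append(f"unparseable date: {fixed['date']!r}")
        else:
            fixed["date"] = iso
    return fixed, problems


fixed, problems = clean_metadata(
    {"title": "Survey results", "date": "11/23/2024", "license": ""}
)
print(fixed)
print(problems)
```

Rules like these catch only mechanical errors. Suggesting missing keywords or spotting implausible values is where the AI-based tools mentioned above would come in.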
Future prospects:
- AI-powered data management: AI can facilitate the curation of FAIR data by identifying patterns in datasets, suggesting metadata, and automating data cleaning;
- autonomous data sharing: blockchain technology may offer tamper-evident ways to share and verify data, helping to establish provenance and trust;
- increased collaboration across disciplines: as FAIR data becomes more prevalent, interdisciplinary research will benefit from seamless data integration across domains, driving scientific discoveries.
By adhering to FAIR principles, the scientific community can ensure that data is a shared, reusable resource, fueling innovation and creating a more collaborative, transparent, and productive research environment.
Barker M., Chue Hong N.P., Katz D.S., Lamprecht A.L., Martinez-Ortiz C., Psomopoulos F., Harrow J., Castro L. J., Gruenpeter M., Martinez P.A., Honeyman T. (2022). Introducing the FAIR Principles for research software. Scientific Data, Vol. 9, pp. 1–6. DOI: 10.1038/s41597-022-01710-x
Boeckhout M., Zielhuis G.A., Bredenoord A.L. (2018). The FAIR guiding principles for data stewardship: fair enough? European Journal of Human Genetics, Vol. 26, pp. 931–936. DOI: 10.1038/s41431-018-0160-0
GO FAIR. (2024). FAIR Principles. [online] [accessed 11/23/2024]. Available: https://www.go-fair.org/fair-principles/
Izglītības un zinātnes ministrija [Ministry of Education and Science of Latvia]. (2021). Latvijas atvērtās zinātnes stratēģija 2021.-2027. gadam [Latvian Open Science Strategy 2021-2027]. 30 pp.
Mons B., Schultes E., Liu F., Jacobsen A. (2020). The FAIR principles: First generation implementation choices and challenges. Data Intelligence, Vol. 2, pp. 1–9. DOI: 10.1162/dint_e_00023
OpenAIRE. (2020). How to make your data FAIR. [online] [accessed 11/24/2024]. Available: https://www.openaire.eu/how-to-make-your-data-fair
Wilkinson M.D., Dumontier M., Aalbersberg IJ.J., Appleton G., Axton M., Baak A., Blomberg N., Boiten J.W., da Silva Santos L.B., Bourne P.E., Bouwman J., Brookes A.J., Clark T., Crosas M., Dillo I., Dumon O., Edmunds S., Evelo C.T., Finkers R., … Mons B. (2016). Comment: The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, Vol. 3, pp. 1–9. DOI: 10.1038/sdata.2016.18