Published in

Frontiers Media, Frontiers in Systems Biology, (2), 2022

DOI: 10.3389/fsysb.2022.918792

Links

Tools

Export citation

Search in Google Scholar

Data Availability of Open T-Cell Receptor Repertoire Data, a Systematic Assessment

This paper is made freely available by the publisher.
This paper is made freely available by the publisher.

Full text: Download

Green circle
Preprint: archiving allowed
Green circle
Postprint: archiving allowed
Green circle
Published version: archiving allowed
Data provided by SHERPA/RoMEO

Abstract

Modern data-driven research has the power to promote novel biomedical discoveries through secondary analyses of raw data. Therefore, it is important to ensure data-driven research with great reproducibility and robustness for promoting a precise and accurate secondary analysis of the immunogenomics data. In scientific research, rigorous conduct in designing and conducting experiments is needed, specifically in scientific writing and reporting results. It is also crucial to make raw data available, discoverable, and well described or annotated in order to promote future re-analysis of the data. In order to assess the data availability of published T cell receptor (TCR) repertoire data, we examined 11,918 TCR-Seq samples corresponding to 134 TCR-Seq studies ranging from 2006 to 2022. Among the 134 studies, only 38.1% had publicly available raw TCR-Seq data shared in public repositories. We also found a statistically significant association between the presence of data availability statements and the increase in raw data availability (p = 0.014). Yet, 46.8% of studies with data availability statements failed to share the raw TCR-Seq data. There is a pressing need for the biomedical community to increase awareness of the importance of promoting raw data availability in scientific research and take immediate action to improve its raw data availability enabling cost-effective secondary analysis of existing immunogenomics data by the larger scientific community.