F1000Research, Wellcome Open Research, (7), p. 187, 2022
DOI: 10.12688/wellcomeopenres.17605.1
Full text: Download
The vision of the Earth BioGenome Project1 is to complete reference genomes for all of the planet’s ~2M described eukaryotic species in the coming decade. To contribute to this global endeavour, the Darwin Tree of Life Project (DToL2) was launched in 2019 with the aim of generating complete genomes for the ~70k described eukaryotic species that can be found in Britain and Ireland. One of the early tasks of the DToL project was to determine, define, and standardise the important metadata that must accompany every sample contributing to this ambitious project. This ensures high-quality contextual information is available for the associated data, enabling a richer set of information upon which to search and filter datasets as well as enabling interoperability between datasets used for downstream analysis. Here we describe some of the key factors we considered in the process of determining, defining, and documenting the metadata required for DToL project samples. The manifest and Standard Operating Procedure that are referred to throughout this paper are likely to be useful for other projects, and we encourage re-use while maintaining the standards and rules set out here.