Published in

Wiley, Molecular Ecology Resources, 4(23), p. 742-755, 2022

DOI: 10.1111/1755-0998.13746

Navigating the seven challenges of taxonomic reference databases in metabarcoding analyses

Journal article published in 2022 by François Keck, Marjorie Couton and Florian Altermatt
This paper was not found in any repository, but could be made available legally by the author.

Full text: Unavailable

Preprint: archiving allowed
Postprint: archiving restricted
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Assessment of biodiversity using metabarcoding data, such as from bulk or environmental DNA sampling, is becoming increasingly relevant in ecology, biodiversity sciences and monitoring. The taxonomic identification of species from their DNA sequences relies strongly on reference databases that link genetic sequences to taxonomic names. These databases vary in completeness and availability, depending on the taxonomic group studied and the genetic region targeted. The incompleteness of reference databases is commonly invoked to explain why metabarcoding fails to detect species that are supposedly present. However, reference databases suffer from further, generally overlooked problems that can lead to false or inaccurate taxonomic assignments. Here, we synthesize all possible problems inherent to reference databases. In particular, we identify a complete list of seven mutually nonexclusive classes of challenges that arise when selecting, developing and using a reference database for taxonomic assignment: (i) mislabelling, (ii) sequencing errors, (iii) sequence conflict, (iv) taxonomic conflict, (v) low taxonomic resolution, (vi) missing taxa and (vii) missing intraspecific variants. For each problem, we describe its possible consequences for the taxonomic assignment process and illustrate it with examples taken from the literature or obtained by quantitative analyses of public databases such as GenBank or BOLD. Finally, we discuss possible solutions to the identified problems and how to navigate them. Only by raising users' awareness of the limitations of metabarcoding data and DNA reference databases will these data be interpreted adequately.
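To make two of these challenges concrete, the following minimal Python sketch flags (iii) sequence conflicts, read here as identical sequences carrying different species labels, and one form of (iv) taxonomic conflict, read here as the same taxon name placed under different parent lineages. This is not the authors' method. The toy records, the placeholder taxon names (`GenusA`, `FamilyB`, ...) and the function names `find_sequence_conflicts` and `find_taxonomic_conflicts` are hypothetical, and a real database such as GenBank or BOLD would first have to be parsed into this simple (accession, sequence, lineage) form.

```python
from collections import defaultdict

# Hypothetical record layout: (accession, sequence, lineage), where lineage is
# ordered from the highest rank to species, e.g. (order, family, genus, species).
Record = tuple[str, str, tuple[str, ...]]

# Hypothetical toy reference database; names and sequences are placeholders.
RECORDS: list[Record] = [
    ("A1", "ACGTACGT", ("OrderX", "FamilyA", "GenusA", "GenusA sp1")),
    ("A2", "acgtacgt", ("OrderX", "FamilyA", "GenusA", "GenusA sp2")),
    ("A3", "TTGCAAGC", ("OrderX", "FamilyB", "GenusB", "GenusB sp1")),
    ("A4", "GGCCTTAA", ("OrderX", "FamilyC", "GenusB", "GenusB sp2")),
]


def find_sequence_conflicts(records: list[Record]) -> dict[str, list[str]]:
    """Map each sequence shared by records with different species labels
    to the sorted list of those labels (sequence conflict)."""
    species_by_seq: dict[str, set[str]] = defaultdict(set)
    for _accession, seq, lineage in records:
        # Upper-case so that case differences do not hide identical sequences.
        species_by_seq[seq.upper()].add(lineage[-1])
    return {
        seq: sorted(species)
        for seq, species in species_by_seq.items()
        if len(species) > 1
    }


def find_taxonomic_conflicts(
    records: list[Record],
) -> dict[str, list[tuple[str, ...]]]:
    """Map each taxon name that appears under more than one parent lineage
    to the sorted list of those parent lineages (one kind of taxonomic conflict)."""
    parents_by_name: dict[str, set[tuple[str, ...]]] = defaultdict(set)
    for _accession, _seq, lineage in records:
        for depth, name in enumerate(lineage):
            # The parent lineage is every rank above the current one.
            parents_by_name[name].add(lineage[:depth])
    return {
        name: sorted(parents)
        for name, parents in parents_by_name.items()
        if len(parents) > 1
    }


if __name__ == "__main__":
    print(find_sequence_conflicts(RECORDS))
    print(find_taxonomic_conflicts(RECORDS))
```

On the toy records, the first check reports that `ACGTACGT` is labelled both `GenusA sp1` and `GenusA sp2`, and the second reports that `GenusB` sits under both `FamilyB` and `FamilyC`. A flag does not identify the cause: identical sequences with different species labels may reflect mislabelling or a marker too conserved to separate the species (low taxonomic resolution), which is one reason these challenges are not mutually exclusive.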