Scaling‐up RADseq methods for large datasets of non‐invasive samples: Lessons for library construction and data preprocessing

Arantes, Larissa S.; Caccavo, Jilda A.; Sullivan, James K.; Sparmann, Sarah; Mbedi, Susan; Höner, Oliver P.; Mazzoni, Camila J.

Published in

Wiley, Molecular Ecology Resources, 2023

DOI: 10.1111/1755-0998.13859

Abstract

Genetic non‐invasive sampling (gNIS) is a critical tool for population genetics studies, supporting conservation efforts while imposing minimal impact on wildlife. However, gNIS often presents variable levels of DNA degradation and non‐endogenous contamination, which can incur considerable processing costs. Furthermore, the use of restriction‐site‐associated DNA sequencing methods (RADseq) for assessing thousands of genetic markers introduces the challenge of obtaining large sets of shared loci with similar coverage across multiple individuals. Here, we present an approach to handling large‐scale gNIS‐based datasets using data from the spotted hyena population inhabiting the Ngorongoro Crater in Tanzania. We generated 3RADseq data for more than a thousand individuals, mostly from faecal mucus samples collected non‐invasively and varying in DNA degradation and contamination level. Using small‐scale sequencing, we screened samples for endogenous DNA content, removed highly contaminated samples, confirmed that fragment lengths overlapped between libraries, and balanced individual representation in a sequencing pool. We evaluated the impact of (1) DNA degradation and contamination of non‐invasive samples, (2) PCR duplicates and (3) different SNP filters on genotype accuracy, based on Mendelian error estimated for parent–offspring trio datasets. Our results showed that, when balanced for sequencing depth, contaminated samples presented genotype error rates similar to those of non‐contaminated samples. We also showed that PCR duplicates and different SNP filters impact genotype accuracy. In summary, we showed the potential of using gNIS for large‐scale genetic monitoring based on SNPs and demonstrated how to improve control over library preparation by using a weighted re‐pooling strategy that considers the endogenous DNA content.
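The abstract uses Mendelian error in parent–offspring trios as its measure of genotype accuracy. As a rough illustration of that idea only (not the authors' pipeline), the sketch below assumes biallelic SNPs coded as the number of alternate alleles (0, 1 or 2), with `None` for missing calls. The function names and the input layout are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Genotype = Optional[int]  # 0, 1 or 2 alternate alleles; None = missing call

# Alleles (0 = reference, 1 = alternate) that each genotype can transmit.
GAMETES = {0: {0}, 1: {0, 1}, 2: {1}}


def is_mendelian_error(mother: int, father: int, child: int) -> bool:
    """Return True if the child genotype cannot arise from the two parents."""
    possible = {m + f for m in GAMETES[mother] for f in GAMETES[father]}
    return child not in possible


def mendelian_error_rate(
    trios: Iterable[Tuple[List[Genotype], List[Genotype], List[Genotype]]]
) -> float:
    """Fraction of fully genotyped trio-by-SNP comparisons that are inconsistent.

    Each trio is (mother, father, child), where every element is a list of
    genotypes over the same ordered set of SNPs (an assumed layout).
    """
    errors = 0
    compared = 0
    for mother, father, child in trios:
        for m, f, c in zip(mother, father, child):
            if m is None or f is None or c is None:
                continue  # skip sites with any missing call in the trio
            compared += 1
            errors += is_mendelian_error(m, f, c)
    return errors / compared if compared else float("nan")


# Toy data: one trio, four SNPs, the last one missing in the mother.
example_trios = [([0, 1, 2, None], [0, 1, 2, 0], [1, 2, 1, 0])]
example_rate = mendelian_error_rate(example_trios)
```

In this toy trio, the first and third SNPs are inconsistent with inheritance and the fourth is skipped, so two of three comparisons count as errors. A rate of this kind could be recomputed under different SNP filters or with and without PCR duplicates to compare their effect on genotype accuracy.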
