Published in

F1000Research, 12, p. 327, 2023

DOI: 10.12688/f1000research.129581.1

AmpSeqR: an R package for amplicon deep sequencing data analysis

Journal article published in 2023 by Jiru Han, Jacob E. Munro, Melanie Bahlo
This paper is made freely available by the publisher.

Abstract

Amplicon sequencing (AmpSeq) is a methodology that targets specific genomic regions of interest for polymerase chain reaction (PCR) amplification so that they can be sequenced to a high depth of coverage. Amplicons are typically chosen to be highly polymorphic, usually with several highly informative, high-frequency single nucleotide polymorphisms (SNPs) segregating in an amplicon of 100–200 base pairs (bp). This allows high-sensitivity detection and quantification of the frequency of each sequence within each sample, making AmpSeq suitable for applications such as low-frequency somatic mosaicism detection or minor clone detection in mixed samples. AmpSeq is increasingly applied in both biological and medical studies, in areas such as cancer, infectious diseases and brain mosaicism. Current bioinformatics pipelines for AmpSeq data processing lack downstream analysis, have difficulty distinguishing true sequences from PCR and sequencing errors and artifacts, and often require bioinformatic expertise. We present a new R package, AmpSeqR, designed for the processing of deep short-read amplicon sequencing data, with a focus on infectious diseases. The pipeline integrates several existing R packages, combining them with newly developed functions that filter reads to remove noise and improve the accuracy of the detected sequence data, permitting detection of very low-frequency clones in mixed samples. The package provides functions for data pre-processing, amplicon sequence variant (ASV) estimation, data post-processing, and data visualization, and automatically generates a comprehensive R Markdown report that contains all essential results, facilitating easy inclusion in reports and publications. AmpSeqR is publicly available at https://github.com/bahlolab/AmpSeqR.
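
To make the idea of quantifying the frequency of each sequence within a sample concrete, here is a minimal Python sketch. It is not AmpSeqR code and does not reproduce the package's filtering: the function name `sequence_frequencies`, the thresholds `min_count` and `min_freq`, and the toy reads are all assumptions chosen for illustration. It counts identical reads for one sample and one amplicon, drops sequences that look like noise under those assumed thresholds, and renormalizes the remaining frequencies.

```python
from collections import Counter


def sequence_frequencies(reads, min_count=2, min_freq=0.01):
    """Return {sequence: frequency} for one sample and one amplicon.

    min_count and min_freq are hypothetical thresholds for this sketch,
    not AmpSeqR defaults.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Keep sequences supported by enough reads and a high enough share of reads.
    kept = {
        seq: n
        for seq, n in counts.items()
        if n >= min_count and n / total >= min_freq
    }
    kept_total = sum(kept.values())
    # Renormalize so the retained frequencies sum to 1.
    return {seq: n / kept_total for seq, n in kept.items()}


# Toy sample: a major clone, a minor clone, and two singleton reads
# standing in for PCR or sequencing errors.
reads = ["ACGT"] * 950 + ["ACGA"] * 48 + ["TCGT"] + ["ACGG"]
freqs = sequence_frequencies(reads)
for seq, freq in sorted(freqs.items(), key=lambda item: -item[1]):
    print(f"{seq}\t{freq:.4f}")
```

In this toy sample the two singleton sequences are discarded and the minor clone is retained at a frequency just under 5%. A real pipeline needs an error model rather than fixed cut-offs to separate true low-frequency clones from artifacts.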