Long-read amplicon denoising

Kumar, Venkatesh; Vollbrecht, Thomas; Chernyshev, Mark; Mohan, Sanjay; Hanst, Brian; Bavafa, Nicholas; Lorenzo, Antonia; Kumar, Nikesh; Ketteringham, Robert; Eren, Kemal; Golden, Michael; Oliveira, Michelli F.; Murrell, Ben

Published in

Oxford University Press, Nucleic Acids Research, 18(47), p. e104-e104, 2019

DOI: 10.1093/nar/gkz657

Journal article published in 2019 by Venkatesh Kumar, Thomas Vollbrecht, Mark Chernyshev, Sanjay Mohan, Brian Hanst, Nicholas Bavafa, Antonia Lorenzo, Nikesh Kumar, Robert Ketteringham, Kemal Eren, Michael Golden, Michelli F. Oliveira, Ben Murrell

This paper is made freely available by the publisher.

Preprint: archiving allowed

Postprint: archiving allowed

Published version: archiving allowed

Abstract

Long-read next-generation amplicon sequencing shows promise for studying complete genes or genomes from complex and diverse populations. Current long-read sequencing technologies have challenging error profiles, hindering data processing and incorporation into downstream analyses. Here we consider the problem of how to reconstruct, free of sequencing error, the true sequence variants and their associated frequencies from PacBio reads. Called ‘amplicon denoising’, this problem has been extensively studied for short-read sequencing technologies, but current solutions do not always successfully generalize to long reads with high indel error rates. We introduce two methods: one that runs nearly instantly and is very accurate for medium-length reads and high template coverage, and another, slower method that is more robust when reads are very long or coverage is lower. On two Mock Virus Community datasets with ground truth, each sequenced on a different PacBio instrument, and on a number of simulated datasets, we compare our two approaches to each other and to existing algorithms. We outperform all tested methods in accuracy, with competitive run times even for our slower method, successfully discriminating templates that differ by just a single nucleotide. Julia implementations of Fast Amplicon Denoising (FAD) and Robust Amplicon Denoising (RAD), and a webserver interface, are freely available.
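To make the problem statement concrete, the following is a minimal toy sketch of amplicon denoising in Python. It is not FAD or RAD, whose algorithms are not described in the abstract. It assumes a naive heuristic: any sequence observed at least `min_count` times is treated as a true variant, and every other read is assigned to its nearest variant by edit distance. The names `naive_denoise`, `edit_distance`, and `min_count`, and the example reads, are hypothetical and chosen only for illustration.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def naive_denoise(reads, min_count=2):
    """Toy heuristic (an assumption, not the paper's method): sequences seen
    at least `min_count` times are kept as variants; all other reads are
    assigned to the closest kept variant. Returns variant -> frequency."""
    counts = Counter(reads)
    variants = [seq for seq, n in counts.items() if n >= min_count]
    if not variants:
        return {}
    totals = {v: counts[v] for v in variants}
    for seq, n in counts.items():
        if seq in totals:
            continue
        nearest = min(variants, key=lambda v: edit_distance(seq, v))
        totals[nearest] += n
    total_reads = sum(totals.values())
    return {v: c / total_reads for v, c in totals.items()}


# Hypothetical reads: two templates that differ by a single nucleotide,
# plus two reads carrying indel errors.
reads = (
    ["ACGTACGT"] * 5      # template A, error-free reads
    + ["ACGTTCGT"] * 3    # template B, one substitution away from A
    + ["ACGTACG"]         # A with a one-base deletion
    + ["ACGTTTCGT"]       # B with a one-base insertion
)
frequencies = naive_denoise(reads)
```

A heuristic like this depends on some reads being exact copies of their template, which becomes less likely as reads get longer and per-base error rates rise. That dependence is one way to see why long reads with high indel error rates make the problem harder.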
