UMIc: A Preprocessing Method for UMI Deduplication and Reads Correction

Tsagiopoulou, Maria; Maniou, Maria Christina; Pechlivanis, Nikolaos; Togkousidis, Anastasis; Kotrová, Michaela; Hutzenlaub, Tobias; Kappas, Ilias; Chatzidimitriou, Anastasia; Psomopoulos, Fotis

Published in: Frontiers in Genetics (Frontiers Media), volume 12, 2021

DOI: 10.3389/fgene.2021.660366



Journal article published in 2021 by Maria Tsagiopoulou, Maria Christina Maniou, Nikolaos Pechlivanis, Anastasis Togkousidis, Michaela Kotrová, Tobias Hutzenlaub, Ilias Kappas, Anastasia Chatzidimitriou, Fotis Psomopoulos

This paper is made freely available by the publisher.


Preprint: archiving allowed


Postprint: archiving allowed


Published version: archiving allowed


Abstract

A recent refinement in high-throughput sequencing involves the incorporation of unique molecular identifiers (UMIs), which are random oligonucleotide barcodes, during the library preparation steps. A UMI adds a unique identity to different DNA/RNA input molecules through polymerase chain reaction (PCR) amplification, thus reducing the bias of this step. Here, we propose an alignment-free framework, called UMIc, that serves as a preprocessing step for FASTQ files and performs deduplication and correction of reads by building a consensus sequence for each UMI. Our approach takes into account the frequency and the Phred quality of nucleotides, as well as the distances between the UMIs and between the actual sequences. We have tested the tool on different scenarios of UMI-tagged library data, with wide applicability in mind. UMIc is an open-source tool implemented in R and is freely available from https://github.com/BiodataAnalysisGroup/UMIc.
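To make the idea of a quality-aware consensus concrete, the following is a minimal illustrative sketch in Python. It is not the UMIc implementation (which is written in R) and does not reproduce its actual rules or thresholds. It assumes equal-length reads, Phred+33 quality strings, and exact-match grouping by UMI; the function names (`phred_scores`, `hamming`, `consensus`, `deduplicate`) are hypothetical and chosen for illustration only. Each read votes for its base at every position with a weight equal to its Phred score, so both nucleotide frequency and nucleotide quality influence the result.

```python
from collections import defaultdict


def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string (assumed Phred+33) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def hamming(a, b):
    """Count mismatching positions between two equal-length strings,
    e.g. to compare two UMIs or two read sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def consensus(reads):
    """Build a quality-weighted consensus from (sequence, quality) pairs.

    Assumption for this sketch: all reads have the same length.
    """
    decoded = [(seq, phred_scores(qual)) for seq, qual in reads]
    length = len(decoded[0][0])
    result = []
    for i in range(length):
        weights = defaultdict(int)
        for seq, scores in decoded:
            # A base seen often, or seen with high quality, gains weight.
            weights[seq[i]] += scores[i]
        # Sorting the candidate bases makes tie-breaking deterministic.
        result.append(max(sorted(weights), key=weights.get))
    return "".join(result)


def deduplicate(records):
    """Collapse (umi, sequence, quality) records into one consensus per UMI.

    Assumption for this sketch: reads are grouped by exact UMI match only.
    """
    groups = defaultdict(list)
    for umi, seq, qual in records:
        groups[umi].append((seq, qual))
    return {umi: consensus(reads) for umi, reads in groups.items()}


# Toy input: three reads share one UMI, and one of them carries a
# low-quality mismatch ('#' decodes to Phred 2) at the last position.
records = [
    ("AACG", "ACGT", "IIII"),
    ("AACG", "ACGA", "III#"),
    ("AACG", "ACGT", "IIII"),
    ("TTGC", "GGCA", "IIII"),
]
collapsed = deduplicate(records)
```

In this toy input the three reads tagged `AACG` collapse to a single sequence, and the low-quality mismatch is outvoted. A distance function such as `hamming` could additionally be used to merge UMIs or sequences that differ by only a few positions, which this sketch does not attempt.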
