Published in Oxford University Press, Bioinformatics, 33(7), pp. 964-970, 2016

DOI: 10.1093/bioinformatics/btw748

Improved VCF normalization for accurate VCF comparison

Journal article published in 2016 by Arash Bayat, Bruno Gaëta, Aleksandar Ignjatovic, Sri Parameswaran
This paper is made freely available by the publisher.

Preprint: archiving allowed
Postprint: archiving restricted
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Motivation: The Variant Call Format (VCF) is widely used to store data about genetic variation. Variant calling workflows detect potential variants in large numbers of short sequence reads generated by DNA sequencing and report them in VCF format. To evaluate the accuracy of variant callers, it is critical to correctly compare their output against a reference VCF file containing a gold-standard set of variants. However, comparing VCF files is complicated because a single genomic variant can be represented in several different ways, so different software does not necessarily report it identically.

Results: We introduce a VCF normalization method called Best Alignment Normalisation (BAN) that makes VCF file comparison more accurate. BAN applies all the variants in a VCF file to the reference genome to create a sample genome, and then recalls the variants by aligning this sample genome back to the reference genome. Since the purpose of BAN is to obtain an accurate result at the time of VCF comparison, we define the better normalization method as the one that produces less disagreement between the outputs of different VCF comparators.

Availability and Implementation: The BAN Linux bash script, along with the required software, is publicly available at https://sites.google.com/site/banadf16

Supplementary information: Supplementary data are available at Bioinformatics online.
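The apply-then-realign idea in the Results paragraph can be illustrated with a toy Python sketch. It is not the BAN script and makes several assumptions of its own: a single short haploid sequence, variants given as hypothetical (position, REF, ALT) tuples with 0-based positions, no overlapping variants, and a plain unit-cost Needleman-Wunsch alignment standing in for whatever aligner BAN actually uses. The helper names `apply_variants`, `recall_variants` and `normalize` are illustrative only.

```python
# Toy illustration of "apply variants, then recall them by realignment".
# NOT the BAN script: apply_variants, recall_variants and normalize are
# hypothetical names, and unit-cost Needleman-Wunsch stands in for BAN's aligner.
from typing import List, Tuple

# Hypothetical variant record: (0-based position, REF allele, ALT allele).
Variant = Tuple[int, str, str]


def apply_variants(reference: str, variants: List[Variant]) -> str:
    """Build the 'sample genome' by replacing each REF allele with its ALT allele."""
    pieces, cursor = [], 0
    for pos, ref, alt in sorted(variants):
        if pos < cursor:
            raise ValueError("overlapping variants are not supported in this sketch")
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"REF {ref!r} does not match the reference at {pos}")
        pieces.append(reference[cursor:pos])  # unchanged bases before the variant
        pieces.append(alt)                     # the variant's ALT allele
        cursor = pos + len(ref)
    pieces.append(reference[cursor:])
    return "".join(pieces)


def recall_variants(reference: str, sample: str) -> List[Variant]:
    """Globally align sample to reference and report each run of differing columns."""
    n, m = len(reference), len(sample)
    # cost[i][j] = edit distance between reference[:i] and sample[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (reference[i - 1] != sample[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the end, preferring the diagonal; walking backwards this way
    # tends to push gaps towards the left, so indels come out left-shifted.
    cols: List[Tuple[str, str]] = []  # alignment columns (ref base, sample base)
    i, j = n, m
    while i > 0 or j > 0:
        on_diag = (i > 0 and j > 0 and
                   cost[i][j] == cost[i - 1][j - 1] + (reference[i - 1] != sample[j - 1]))
        if on_diag:
            cols.append((reference[i - 1], sample[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            cols.append((reference[i - 1], ""))  # base deleted in the sample
            i -= 1
        else:
            cols.append(("", sample[j - 1]))  # base inserted in the sample
            j -= 1
    cols.reverse()

    # Merge consecutive non-matching columns into one VCF-style record.
    variants: List[Variant] = []
    ref_pos, k = 0, 0
    while k < len(cols):
        if cols[k][0] == cols[k][1]:  # matching column
            ref_pos += 1
            k += 1
            continue
        start, ref_run, alt_run = ref_pos, "", ""
        while k < len(cols) and cols[k][0] != cols[k][1]:
            ref_run += cols[k][0]
            alt_run += cols[k][1]
            ref_pos += len(cols[k][0])
            k += 1
        if not ref_run or not alt_run:
            # Pure indel: prepend the preceding (matching) base, as VCF does.
            if start == 0:
                raise ValueError("indels at the first base are not handled in this sketch")
            start -= 1
            ref_run = reference[start] + ref_run
            alt_run = reference[start] + alt_run
        variants.append((start, ref_run, alt_run))
    return variants


def normalize(reference: str, variants: List[Variant]) -> List[Variant]:
    """Replay the variants onto the reference, then recall them from the alignment."""
    return recall_variants(reference, apply_variants(reference, variants))


if __name__ == "__main__":
    ref = "GAAAT"
    # Two valid encodings of deleting one A from the homopolymer
    left, right = [(0, "GA", "G")], [(2, "AA", "A")]
    print(apply_variants(ref, left), apply_variants(ref, right))  # GAAT GAAT
    print(normalize(ref, left) == normalize(ref, right))          # True
```

In the usage example, deleting one A from GAAAT can be written either as (0, "GA", "G") or as (2, "AA", "A"). Both replay to the same sample sequence GAAT, and both come back from `normalize` as [(0, "GA", "G")], so a comparator working on the normalized records sees one variant instead of two apparently different ones. Comparing the replayed sequences rather than the raw records is what removes the dependence on how each caller chose to write the variant.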