A general approach to single-nucleotide polymorphism discovery

Marth, Gabor T.; Korf, Ian; Yandell, Mark D.; Yeh, Raymond T.; Gu, Zhijie; Zakeri, Hamideh; Stitziel, Nathan O.; Hillier, LaDeana; Kwok, Pui-Yan; Gish, Warren R.

Published in

Nature Research, Nature Genetics, 23(4), p. 452-456, 1999

DOI: 10.1038/70570


Journal article published in 1999 by Gabor T. Marth, Ian Korf, Mark D. Yandell, Raymond T. Yeh, Zhijie Gu, Hamideh Zakeri, Nathan O. Stitziel, LaDeana Hillier, Pui-Yan Kwok, Warren R. Gish

This paper is made freely available by the publisher.


Abstract

Single-nucleotide polymorphisms (SNPs) are the most abundant form of human genetic variation and a resource for mapping complex genetic traits. The large volume of data produced by high-throughput sequencing projects is a rich and largely untapped source of SNPs (refs 2, 3, 4, 5). We present here a unified approach to the discovery of variations in genetic sequence data of arbitrary DNA sources. We propose to use the rapidly emerging genomic sequence as a template on which to layer often unmapped, fragmentary sequence data and to use base quality values to discern true allelic variations from sequencing errors. By taking advantage of the genomic sequence we are able to use simpler yet more accurate methods for sequence organization: fragment clustering, paralogue identification and multiple alignment. We analyse these sequences with a novel, Bayesian inference engine, POLYBAYES, to calculate the probability that a given site is polymorphic. Rigorous treatment of base quality permits completely automated evaluation of the full length of all sequences, without limitations on alignment depth. We demonstrate this approach by accurate SNP predictions in human ESTs aligned to finished and working-draft quality genomic sequences, a data set representative of the typical challenges of sequence-based SNP discovery.
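The idea of using base quality values to separate true allelic variation from sequencing error can be illustrated with a small calculation. The sketch below is not the POLYBAYES model itself; it is a deliberately simplified Bayesian calculation for a single alignment column, under assumptions that are ours and not the paper's: each read is an independent observation, a Phred quality Q corresponds to an error probability of 10^(-Q/10), errors are spread evenly over the three other bases, a polymorphic site has exactly two alleles that are sampled with equal probability, and the prior probability of a SNP (`prior_snp`) is an illustrative value. The function names are hypothetical.

```python
from itertools import combinations
from math import prod

BASES = "ACGT"


def phred_to_error(quality):
    """Convert a Phred quality value to a base-call error probability."""
    return 10 ** (-quality / 10)


def base_likelihood(observed, true_base, error):
    """P(observed base | true base), with errors split evenly over 3 bases."""
    return 1 - error if observed == true_base else error / 3


def snp_posterior(column, prior_snp=0.003):
    """Posterior probability that an alignment column is polymorphic.

    column    -- list of (base, phred_quality) pairs, one per aligned read
    prior_snp -- assumed prior probability that a site is polymorphic
                 (illustrative value, not taken from the abstract)
    """
    reads = [(base, phred_to_error(q)) for base, q in column]

    # Monomorphic hypotheses: every read comes from the same true base.
    mono = sum(
        (1 - prior_snp) / len(BASES)
        * prod(base_likelihood(b, allele, e) for b, e in reads)
        for allele in BASES
    )

    # Polymorphic hypotheses: two alleles, each read drawn from either
    # allele with probability 0.5 (a simplifying assumption).
    pairs = list(combinations(BASES, 2))
    poly = sum(
        prior_snp / len(pairs)
        * prod(
            0.5 * base_likelihood(b, a1, e) + 0.5 * base_likelihood(b, a2, e)
            for b, e in reads
        )
        for a1, a2 in pairs
    )

    return poly / (mono + poly)


if __name__ == "__main__":
    balanced = [("A", 40)] * 4 + [("G", 40)] * 4       # two well-supported alleles
    uniform = [("A", 40)] * 8                          # no disagreement
    weak_mismatch = [("A", 40)] * 7 + [("G", 10)]      # one low-quality mismatch
    strong_mismatch = [("A", 40)] * 7 + [("G", 40)]    # one high-quality mismatch
    for name, col in [("balanced", balanced), ("uniform", uniform),
                      ("weak mismatch", weak_mismatch),
                      ("strong mismatch", strong_mismatch)]:
        print(f"{name}: P(SNP) = {snp_posterior(col):.3g}")
```

In this toy model a mismatch carried by a low-quality base call contributes far less evidence for a polymorphism than the same mismatch at high quality, which is the behaviour the abstract attributes to a rigorous treatment of base quality. Because the score is a product over reads, it applies to a column of any depth.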
