Distribution of words with a predefined range of mismatches to a DNA probe in bacterial genomes

Melko, O. Michael; Mushegian, Arcady R.

Published in

Oxford University Press (OUP), Bioinformatics, 20(1), p. 67-74

DOI: 10.1093/bioinformatics/btg374

Journal article published in 2003 by O. Michael Melko, Arcady R. Mushegian

This paper is made freely available by the publisher.

Abstract

MOTIVATION: Hybridization of oligonucleotides with longer nucleotide sequences is an essential step in nucleic acid biosynthesis in vitro and in vivo, in oligonucleotide-based diagnostics, and in therapeutic applications of oligonucleotides. A major factor determining sensitivity and selectivity of hybridization is the number of base pair mismatches that occur in an ungapped alignment of the oligonucleotide (probe) and a longer sequence (target).

RESULTS: The k-distance match count between the probe and the target is defined as the number of ungapped alignments between the two sequences that have exactly k mismatches, and the k-neighbor match count is defined as the sum of the j-distance match counts for j between 0 and k. We derive a novel formula for the probability of a k-distance match. This formula is based on the assumption that the target is strand-symmetric Bernoulli text (i.e. nucleotides are independently, identically distributed in the target and satisfy Chargaff's second parity rule). Our model predicts that the GC-content in both the probe and the target significantly affects the match count expectation. The ratio of k-neighbor match counts in two distinct genomes for a given probe is a measure of its specificity. We calculated such ratios for pairs of bacterial genomes with different combinations of length, GC-content and phylogenetic distance. Examination of the extreme values of these ratios indicates that probes with a high discriminative power exist for each tested pair.
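The definitions in the abstract can be illustrated with a minimal Python sketch. It is not the authors' formula or implementation: the function names are hypothetical, only the given strand of the target is scanned, and the probability calculation is the textbook one that follows directly from the stated assumptions (independent positions, with P(A) = P(T) = (1 - GC)/2 and P(G) = P(C) = GC/2). Under those assumptions, each probe position matches the target independently with the frequency of its own base, so the mismatch count at a single alignment is a sum of independent, non-identical Bernoulli trials.

```python
from collections import Counter


def mismatch_profile(probe: str, target: str) -> Counter:
    """Map k -> k-distance match count: the number of ungapped alignments
    of probe against target (given strand only) with exactly k mismatches."""
    m = len(probe)
    profile = Counter()
    for start in range(len(target) - m + 1):
        window = target[start:start + m]
        k = sum(1 for a, b in zip(probe, window) if a != b)
        profile[k] += 1
    return profile


def k_neighbor_count(profile: Counter, k: int) -> int:
    """k-neighbor match count: sum of j-distance match counts for 0 <= j <= k."""
    return sum(profile[j] for j in range(k + 1))


def k_distance_probabilities(probe: str, gc: float) -> list:
    """dist[k] = probability of exactly k mismatches at one alignment,
    ASSUMING an i.i.d. target with P(G) = P(C) = gc/2 and
    P(A) = P(T) = (1 - gc)/2. Illustrative only, not the paper's formula."""
    match_prob = {"G": gc / 2, "C": gc / 2,
                  "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    dist = [1.0]  # before any probe position: zero mismatches with certainty
    for base in probe:
        p = match_prob[base]
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * p            # this position matches
            nxt[k + 1] += q * (1 - p)  # this position mismatches
        dist = nxt
    return dist


# Toy example: a 3-mer probe against an 11-base target (9 alignments).
profile = mismatch_profile("ACG", "ACGTACGAACC")
neighbors_1 = k_neighbor_count(profile, 1)   # alignments with 0 or 1 mismatch
dist = k_distance_probabilities("ACG", gc=0.5)
```

Multiplying `dist[k]` by the number of alignments (target length minus probe length plus one) gives the expected k-distance match count under this model. Because the per-position match probability depends on whether the probe base is G/C or A/T, the expectation changes with the GC-content of both the probe and the target. A specificity ratio of the kind described in the abstract would then be the quotient of `k_neighbor_count` values computed for the same probe in two genomes.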
