Studies in Probabilistic Sequence Alignment and Evolution

Abstract

The complete sequencing of whole genomes presents opportunities for detailed study of molecular evolution. This thesis combines theoretical developments of Bayesian approaches in bioinformatics with an analysis of duplications in the recently completed C. elegans genome. Developments in the Bayesian probabilistic framework for sequence analysis using hidden Markov models (HMMs) are described. The principal HMM algorithms, including alignment, training and model comparison, are reviewed. Theory is developed for predicting alignment accuracy and is tested using simulations. Software that provides accuracy measures for multiple alignments, based on the popular HMMER suite of profile-based alignment algorithms, is presented and evaluated against the Pfam database of multiple alignments. Several of these statistical techniques are applied to an analysis of genomic duplications in the C. elegans genome. The completion of this genome, the first animal genome to be sequenced, offers an opportunity to study the random duplications that are believed to be the first step in the evolution of a new gene. The construction of a database of non-coding duplications is described, and measurements of molecular evolutionary parameters in C. elegans are calculated from the data and reported. A method of dating gene duplications using alignments between conserved introns is presented and compared to existing methods using Bayesian techniques developed earlier in the dissertation. Among the principal agents involved in creating genomic duplications are transposons; one of the simplest transposon families is the Tc1/mariner family, of which two distinct active subfamilies are well known in C. elegans. Using profile HMMs, six new subfamilies of mariner-like transposon have been identified in the C. elegans genome. Several of the new subfamilies display interesting homologies to one another, suggestive of common mechanisms of transpositional catalysis.
Finally, the software tools developed during this project are described and made available for public retrieval from the Sanger Centre web site.
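As a generic illustration of one of the principal HMM algorithms the abstract mentions (not code from the thesis or from HMMER), the forward algorithm below computes the likelihood P(x | M) of a sequence under a small discrete HMM. This is the quantity that Bayesian model comparison builds on. It is a minimal sketch under stated assumptions: the function names, the two-state model and its probabilities are hypothetical, there is no explicit end state, and the insert/delete states of a real profile HMM are omitted. The work is done in log space so that long sequences do not underflow.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def safe_log(p):
    """log(p), mapping zero probabilities to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def forward_log_likelihood(seq, states, start, trans, emit):
    """Return log P(seq | model) for a discrete HMM via the forward algorithm.

    start[s], trans[s][t] and emit[s][symbol] are plain probabilities.
    Hypothetical simplification: no explicit end state, so the final step
    sums over all states at the last position.
    """
    # Initialisation: f_s(1) = start_s * e_s(x_1)
    f = {s: safe_log(start[s]) + safe_log(emit[s][seq[0]]) for s in states}
    # Recursion: f_t(i) = e_t(x_i) * sum_s f_s(i-1) * a_{s,t}
    for symbol in seq[1:]:
        f = {
            t: safe_log(emit[t][symbol])
            + logsumexp([f[s] + safe_log(trans[s][t]) for s in states])
            for t in states
        }
    # Termination: P(x) = sum_s f_s(L)
    return logsumexp(list(f.values()))


# Hypothetical two-state DNA model, for illustration only.
STATES = ["A", "B"]
START = {"A": 0.6, "B": 0.4}
TRANS = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.2, "B": 0.8}}
EMIT = {
    "A": {"a": 0.3, "c": 0.2, "g": 0.2, "t": 0.3},
    "B": {"a": 0.1, "c": 0.4, "g": 0.4, "t": 0.1},
}

if __name__ == "__main__":
    print(forward_log_likelihood("acgtgc", STATES, START, TRANS, EMIT))
```

Comparing two candidate models on the same sequence then amounts to comparing their log-likelihoods, combined with prior odds in a Bayesian setting. Training methods such as Baum-Welch reuse the same forward quantities together with a matching backward pass.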