Sequence similarity is more relevant than species specificity in probabilistic backtranslation

Ferro, Alfredo; Giugno, Rosalba; Pigola, Giuseppe; Pulvirenti, Alfredo; Di Pietro, Cinzia; Purrello, Michele; Ragusa, Marco

Published in

BioMed Central, BMC Bioinformatics, 1(8), 2007

DOI: 10.1186/1471-2105-8-58

This paper is made freely available by the publisher.

Abstract

Background: Backtranslation is the process of decoding a sequence of amino acids into the corresponding codons. All synthetic gene design systems include a backtranslation module. The degeneracy of the genetic code makes backtranslation potentially ambiguous, since most amino acids are encoded by multiple codons. The common approach to overcome this difficulty is based on imitating codon usage within the target species.

Results: This paper describes EasyBack, a new parameter-free, fully automated software tool for backtranslation using Hidden Markov Models. EasyBack is not based on imitating codon usage within the target species; instead, it uses a sequence-similarity criterion. The model is trained on a set of proteins with known cDNA coding sequences; this set is constructed by querying the NCBI databases with BLAST using the input protein. Unlike existing software, the proposed method allows the quality of the prediction to be estimated. When tested on a group of proteins that show different degrees of sequence conservation, EasyBack outperforms other published methods in terms of precision.

Conclusion: The prediction quality of a protein backtranslation method is markedly increased by replacing the criterion of the most used codon in the same species with a Hidden Markov Model trained on a set of the most similar sequences from all species. Moreover, the proposed method allows the quality of the prediction to be estimated probabilistically.
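The abstract does not spell out the model's topology, so the following is only a minimal sketch under stated assumptions, not EasyBack's published implementation. It assumes a first-order HMM whose hidden states are codons, where each codon emits its own amino acid under the standard genetic code, and codon-to-codon transition probabilities are estimated with additive (Laplace) smoothing from the cDNA coding sequences of the homologs that a BLAST search would return. Viterbi decoding then yields the most probable codon sequence for the input protein. The function names `train` and `backtranslate`, the `pseudo` smoothing parameter, and the use of the path log-probability as a confidence signal are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict

# Standard genetic code: DNA codon -> one-letter amino acid ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def train(cds_list, pseudo=1.0):
    """Hypothetical trainer: log start and codon-to-codon transition
    probabilities from in-frame coding sequences, Laplace-smoothed."""
    codons = list(CODE)
    start = defaultdict(float)
    trans = defaultdict(lambda: defaultdict(float))
    for cds in cds_list:
        cds = cds.upper()
        seq = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
        # Skip empty sequences and any containing ambiguous bases (e.g. 'N').
        if not seq or any(c not in CODE for c in seq):
            continue
        start[seq[0]] += 1
        for a, b in zip(seq, seq[1:]):
            trans[a][b] += 1
    n = len(codons)
    start_total = sum(start.values()) + pseudo * n
    log_start = {c: math.log((start[c] + pseudo) / start_total) for c in codons}
    log_trans = {}
    for a in codons:
        total = sum(trans[a].values()) + pseudo * n
        log_trans[a] = {b: math.log((trans[a][b] + pseudo) / total) for b in codons}
    return log_start, log_trans


def backtranslate(protein, log_start, log_trans):
    """Viterbi decoding: most probable codon sequence for `protein`
    and its log-probability under the (hypothetical) trained model."""
    if not protein:
        return "", 0.0
    # Each codon emits exactly its own amino acid, so only synonymous
    # codons are candidate hidden states at each position.
    states = [[c for c in CODE if CODE[c] == aa] for aa in protein.upper()]
    if any(not s for s in states):
        raise ValueError("protein contains a symbol outside the genetic code")
    score = {c: log_start[c] for c in states[0]}
    backptrs = []
    for cands in states[1:]:
        new_score, ptr = {}, {}
        for b in cands:
            best = max(score, key=lambda a: score[a] + log_trans[a][b])
            new_score[b] = score[best] + log_trans[best][b]
            ptr[b] = best
        score = new_score
        backptrs.append(ptr)
    # Trace back from the highest-scoring final codon.
    last = max(score, key=score.get)
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return "".join(reversed(path)), score[last]


if __name__ == "__main__":
    # Toy "homolog" coding sequences standing in for BLAST/NCBI results.
    log_start, log_trans = train(["ATGGCCAAAGCC", "ATGGCCAAAGCC", "ATGGCAAAG"])
    codons, logp = backtranslate("MAK", log_start, log_trans)
    print(codons)  # ATGGCCAAA
```

Because emissions are deterministic in this sketch, the decoder only ever compares synonymous codons, and the codon choice at each position is driven by which codon pairs were observed in the training sequences rather than by genome-wide codon usage. The retrieval step that would build `cds_list` from the input protein is not shown.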
