Published in

Oxford University Press, Nucleic Acids Research, 12(41), p. e121-e121, 2013

DOI: 10.1093/nar/gkt263


Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions

Journal article published in 2013 by Jaina Mistry, Robert D. Finn, Sean R. Eddy ORCID, Alex Bateman ORCID, Marco Punta
This paper is made freely available by the publisher.


Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving allowed
Data provided by SHERPA/RoMEO

Abstract

Detection of protein homology via sequence similarity has important applications in biology, from protein structure and function prediction to reconstruction of phylogenies. Although current methods for aligning protein sequences are powerful, challenges remain, including problems with homologous overextension of alignments and with regions under convergent evolution. Here, we test the ability of the profile hidden Markov model method HMMER3 to correctly assign homologous sequences to >13,000 manually curated families from the Pfam database. We identify problem families using protein regions that match two or more Pfam families not currently annotated as related in Pfam. We find that HMMER3 E-value estimates seem to be less accurate for families that feature periodic patterns of compositional bias, such as the ones typically observed in coiled-coils. These results support the continued use of manually curated inclusion thresholds in the Pfam database, especially on the subset of families that have been identified as problematic in experiments such as these. They also highlight the need for developing new methods that can correct for this particular type of compositional bias.