Dynamics of domain coverage of the protein sequence universe

Rekapalli, Bhanu; Wuichet, Kristin; Peterson, Gregory D.; Zhulin, Igor B.

Published in

BioMed Central, BMC Genomics, 1(13), 2012

DOI: 10.1186/1471-2164-13-634

Journal article published in 2012 by Bhanu Rekapalli, Kristin Wuichet, Gregory D. Peterson, Igor B. Zhulin

This paper is made freely available by the publisher.

Abstract

Background: The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its “dark matter”.

Results: Here we suggest that the true size of “dark matter” is much larger than stated by current definitions. We propose an approach to reducing the size of “dark matter” by identifying and subtracting regions in protein sequences that are not likely to contain any domain.

Conclusions: Recent improvements in computational domain modeling result in a decrease, albeit slowly, in the relative size of “dark matter”; however, its absolute size increases substantially with the growth of sequence data.
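The subtraction idea in the Results can be illustrated with a small residue-level accounting sketch. It is not the authors' implementation: the abstract does not say how domain-free regions are identified or whether coverage is counted per residue, so the half-open interval representation and the names `merge_intervals`, `covered_length`, and `dark_residues` are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption: regions are half-open [start, end) residue coordinates.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a disjoint, sorted list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_length(intervals: List[Interval], length: int) -> int:
    """Count residues of a protein of the given length covered by any interval."""
    # Clip to the protein boundaries and drop empty intervals.
    clipped = [(max(0, s), min(length, e)) for s, e in intervals]
    clipped = [(s, e) for s, e in clipped if s < e]
    return sum(e - s for s, e in merge_intervals(clipped))


def dark_residues(
    length: int,
    domain_hits: List[Interval],
    excluded_regions: List[Interval],
) -> int:
    """Residues that are neither assigned to a domain nor subtracted as
    unlikely to contain one (hypothetical accounting, see lead-in)."""
    return length - covered_length(list(domain_hits) + list(excluded_regions), length)
```

For example, under these assumptions a 300-residue protein with domain hits at `(10, 110)` and `(100, 150)` has 160 unassigned residues; subtracting a hypothetical domain-free region at `(200, 250)` reduces that to 110.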
