Published in

Frontiers Media, Frontiers in Genetics, (5), 2014

DOI: 10.3389/fgene.2014.00098

categoryCompare, an analytical tool based on feature annotations

This paper is made freely available by the publisher.

Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving allowed
Data provided by SHERPA/RoMEO

Abstract

Assessment of high-throughput –omics data initially focuses on relative or raw levels of a particular feature, such as an expression value for a transcript, protein, or metabolite. At a second level, analyses of annotations, including known or predicted functions and associations of each individual feature, attempt to distill biological context. Most currently available comparative and meta-analysis methods depend on the availability of identical features across data sets and concentrate on determining features that are differentially expressed across experiments, some of which may be considered “biomarkers”. The heterogeneity of measurement platforms and the inherent variability of biological systems confound the search for robust biomarkers indicative of a particular condition. In many instances, however, multiple data sets show involvement of common biological processes or signaling pathways, even though individual features are not commonly measured or differentially expressed between them.

We developed a methodology, categoryCompare, for cross-platform and cross-sample comparison of high-throughput data at the annotation level. We assessed the utility of the approach using hypothetical data and by determining similarities and differences in the set of processes in two instances: 1) denervated skin vs. denervated muscle, and 2) colon from Crohn’s disease vs. colon from ulcerative colitis. The hypothetical data showed that in many cases comparing annotations gave superior results to comparing only at the gene level. Improved analytical results also depended on the number of genes included in the annotation term, the amount of noise in relation to the number of genes expressing in unenriched annotation categories, and the specific method by which samples are combined.

categoryCompare is available from http://bioconductor.org/packages/release/bioc/html/categoryCompare.html
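
The contrast between gene-level and annotation-level comparison can be illustrated with a small sketch. This is not the categoryCompare package or its API (the package itself is written in R); it is a toy Python example that assumes a one-sided hypergeometric enrichment test, a single shared gene universe, and made-up gene and term names (`g0`, `term_X`, and so on). All function names here are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, K, n, N):
    """P(X >= k): chance of seeing at least k annotated genes when drawing
    n genes from a universe of N genes, K of which carry the annotation."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enriched_terms(gene_list, annotations, universe, alpha=0.05):
    """Return the annotation terms over-represented in gene_list (toy test,
    no multiple-testing correction)."""
    genes = set(gene_list) & universe
    hits = set()
    for term, members in annotations.items():
        members = set(members) & universe
        k = len(genes & members)
        if k == 0:
            continue  # no overlap, so the term cannot be enriched
        p = hypergeom_pvalue(k, len(members), len(genes), len(universe))
        if p <= alpha:
            hits.add(term)
    return hits


def compare_lists(list_a, list_b, annotations, universe, alpha=0.05):
    """Compare two gene lists at the gene level and at the annotation level."""
    terms_a = enriched_terms(list_a, annotations, universe, alpha)
    terms_b = enriched_terms(list_b, annotations, universe, alpha)
    return {
        "shared_genes": set(list_a) & set(list_b),
        "shared_terms": terms_a & terms_b,
        "only_a": terms_a - terms_b,
        "only_b": terms_b - terms_a,
    }


# Hypothetical data: 40 genes, three annotation terms of 8 genes each.
universe = {f"g{i}" for i in range(40)}
annotations = {
    "term_X": [f"g{i}" for i in range(0, 8)],
    "term_Y": [f"g{i}" for i in range(8, 16)],
    "term_Z": [f"g{i}" for i in range(16, 24)],
}
# Two experiments that hit different genes belonging to the same term.
list_a = ["g0", "g1", "g2", "g3", "g30"]
list_b = ["g4", "g5", "g6", "g7", "g31"]

result = compare_lists(list_a, list_b, annotations, universe)
print("shared genes:", sorted(result["shared_genes"]))
print("shared terms:", sorted(result["shared_terms"]))
```

In this toy case the two lists have no genes in common, yet both are enriched for `term_X`, which is the kind of overlap that a gene-level comparison would miss. A real analysis would use per-platform gene universes and correct for multiple testing.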