FSOL - a workflow for the detection of patient subgroups and affected molecular features in high-throughput omics data

This paper is available in a repository.
Abstract

In personalized medicine, one major goal is the identification of yet unknown patient subgroups with specific gene or protein expression. Different subgroups can indicate different molecular subtypes of a disease. These subtypes might correlate with disease progression, prognosis or therapy response, and the subgroup-specific genes or proteins are potential drug targets. Using high-throughput molecular data, the aim is to characterize a patient subgroup by identifying both the set of samples that show a distinct expression pattern and the set of affected features. We present FSOL, a new workflow for identifying patient subgroups in two-sample comparisons (e.g. healthy vs. diseased). First, the features are pre-filtered with the univariate score FisherSum (FS), which assesses subgroup-specific expression; FS has been shown to outperform competing methods in several settings. Second, the selected features are compared with respect to the samples that form the affected subgroup. This step uses the OrderedList (OL) method, which was originally developed for comparing result lists from gene expression studies. We compare FSOL with a reference workflow based on biclustering, using both real-world and simulated data. On a leukemia data set, FSOL detects a true biological subgroup with higher stability. On simulated data, FSOL shows higher sensitivity and accuracy than biclustering, especially for small to moderate differences. As an exploratory approach, FSOL may help to identify yet unknown mechanisms in pathologic processes and to generate new research hypotheses.
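To make the second step more concrete, the following is a minimal Python sketch of a weighted overlap score between two rankings, in the spirit of the OrderedList comparison. It assumes (this is our reading, not something stated in this abstract) that the score sums, over list heads of size n, the number of items shared at the top and at the bottom of both rankings, weighted by exp(-alpha * n). In the FSOL setting we further assume that each selected feature induces a ranking of the samples by expression, so two features that point to the same subgroup should share samples at the extremes of their rankings. The function names, the default alpha = 0.1 and the cutoff at half the list length are hypothetical choices for illustration, not the parameters used by FSOL or the OrderedList package.

```python
import math
from typing import Optional, Sequence


def overlap_top(a: Sequence[str], b: Sequence[str], n: int) -> int:
    """Number of items shared by the first n entries of two rankings."""
    return len(set(a[:n]) & set(b[:n]))


def ordered_list_score(
    a: Sequence[str],
    b: Sequence[str],
    alpha: float = 0.1,
    max_n: Optional[int] = None,
) -> float:
    """Hypothetical sketch of a weighted top/bottom overlap between two rankings.

    Sums exp(-alpha * n) * (top-n overlap + bottom-n overlap) for n = 1..max_n,
    so agreement near the extremes of both rankings dominates the score.
    """
    if set(a) != set(b) or len(a) != len(b):
        raise ValueError("both rankings must contain the same items")
    if max_n is None:
        max_n = len(a) // 2  # assumption: only compare heads up to half the list
    rev_a, rev_b = list(reversed(a)), list(reversed(b))
    score = 0.0
    for n in range(1, max_n + 1):
        weight = math.exp(-alpha * n)
        score += weight * (overlap_top(a, b, n) + overlap_top(rev_a, rev_b, n))
    return score


if __name__ == "__main__":
    # Hypothetical samples ranked by the expression of three features.
    samples = [f"s{i}" for i in range(10)]
    feature_1 = samples                   # s0 has the highest expression
    feature_2 = list(samples)             # same sample ranking as feature_1
    feature_3 = list(reversed(samples))   # opposite sample ranking
    print(ordered_list_score(feature_1, feature_2))
    print(ordered_list_score(feature_1, feature_3))
```

If two features rank the samples identically, every head overlaps fully and the score is maximal; if one ranking is the reverse of the other, the top and bottom heads (up to half the list) never share a sample and the score is 0. The exponential weight is what makes such a score focus on the samples at the extremes, which is where a subgroup with distinct expression would be expected to appear.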