Public Library of Science, PLoS ONE, 3(6), p. e18202, 2011
DOI: 10.1371/journal.pone.0018202
Full text: Download
Background Public data integration may help overcome challenges in clinical implementation of microarray profiles. We integrated several ovarian cancer datasets to identify a reproducible predictor of survival. Methodology/Principal Findings Four microarray datasets from different institutions comprising 265 advanced stage tumors were uniformly reprocessed into a single training dataset, also adjusting for inter-laboratory variation (“batch-effect”). Supervised principal component survival analysis was employed to identify prognostic models. Models were independently validated in a 61-patient cohort using a custom array genechip and a publicly available 229-array dataset. Molecular correspondence of high- and low-risk outcome groups between training and validation datasets was demonstrated using Subclass Mapping. Previously established molecular phenotypes in the 2nd validation set were correlated with high and low-risk outcome groups. Functional representational and pathway analysis was used to explore gene networks associated with high and low risk phenotypes. A 19-gene model showed optimal performance in the training set (median OS 31 and 78 months, p