Full text: Download
Publicly available gene expression datasets were analyzed to develop a chromophobe and oncocytoma related gene signature (COGS) to distinguish chRCC from RO. The datasets GSE11151, GSE19982, GSE2109, GSE8271 and GSE11024 were combined into a discovery dataset. The transcriptomic differences were identified with unsupervised learning in the discovery dataset (97.8% accuracy) with density based UMAP (DBU). The top 30 genes were identified by univariate gene expression analysis and ROC analysis, to create a gene signature called COGS. COGS, combined with DBU, was able to differentiate chRCC from RO in the discovery dataset with an accuracy of 97.8%. The classification accuracy of COGS was validated in an independent meta-dataset consisting of TCGA-KICH and GSE12090, where COGS could differentiate chRCC from RO with 100% accuracy. The differentially expressed genes were involved in carbohydrate metabolism, transcriptomic regulation by TP53, beta-catenin-dependent Wnt signaling, and cytokine (IL-4 and IL-13) signaling highly active in cancer cells. Using multiple datasets and machine learning, we constructed and validated COGS as a tool that can differentiate chRCC from RO and complement histology in routine clinical practice to distinguish these two tumors.