Full text: Download
Big data processing, using omics data integration and machine learning (ML) methods, drive efforts to discover diagnostic and prognostic biomarkers for clinical decision making. Previously, we used the TCGA database for gene expression profiling of breast, ovary, and endometrial cancers, and identified a top-scoring network centered on the ERBB2 gene, which plays a crucial role in carcinogenesis in the three estrogen-dependent tumors. Here, we focused on microRNA expression signature similarity, asking whether they could target the ERBB family. We applied an ML approach on integrated TCGA miRNA profiling of breast, endometrium, and ovarian cancer to identify common miRNA signatures differentiating tumor and normal conditions. Using the ML-based algorithm and the miRTarBase database, we found 205 features and 158 miRNAs targeting ERBB isoforms, respectively. By merging the results of both databases and ranking each feature according to the weighted Support Vector Machine model, we prioritized 42 features, with accuracy (0.98), AUC (0.93–95% CI 0.917–0.94), sensitivity (0.85), and specificity (0.99), indicating their diagnostic capability to discriminate between the two conditions. In vitro validations by qRT-PCR experiments, using model and parental cell lines for each tumor type showed that five miRNAs (hsa-mir-323a-3p, hsa-mir-323b-3p, hsa-mir-331-3p, hsa-mir-381-3p, and hsa-mir-1301-3p) had expressed trend concordance between breast, ovarian, and endometrium cancer cell lines compared with normal lines, confirming our in silico predictions. This shows that an integrated computational approach combined with biological knowledge, could identify expression signatures as potential diagnostic biomarkers common to multiple tumors.