Identification of a minimum number of genes to predict triple-negative breast cancer subgroups from gene expression profiles

Akhouayri, Laila; Ostano, Paola; Mello-Grand, Maurizia; Gregnanin, Ilaria; Crivelli, Francesca; Laurora, Sara; Liscia, Daniele; Leone, Francesco; Santoro, Angela; Mulè, Antonino; Guarino, Donatella; Maggiore, Claudia; Carlino, Angela; Magno, Stefano; Scatolini, Maria; Di Leone, Alba; Masetti, Riccardo; Chiorino, Giovanna

Published in

Human Genomics (BioMed Central), 16(1), 2022

DOI: 10.1186/s40246-022-00436-6

This paper is made freely available by the publisher.

Abstract

Background: Triple-negative breast cancer (TNBC) is a highly heterogeneous disease. Several gene expression and mutation profiling approaches have been used to classify it, and all have converged on distinct molecular subtypes, some of which overlap across approaches. However, a standardised tool to routinely classify TNBC in the clinic and guide personalised treatment is still lacking. We aimed to define a specific gene signature for each of the six TNBC subtypes proposed by Lehmann et al. in 2011 (basal-like 1 (BL1); basal-like 2 (BL2); mesenchymal (M); immunomodulatory (IM); mesenchymal stem-like (MSL); and luminal androgen receptor (LAR)), so that each subtype can be predicted accurately.

Methods: Lehmann's TNBCtype subtyping tool was applied to RNA-sequencing data from 482 TNBC samples (GSE164458), and a minimal subtype-specific gene signature was defined by combining two class comparison techniques with seven attribute selection methods. Several machine learning algorithms were tested for subtype prediction, and the best classifier was applied to microarray data from 72 Italian TNBC samples and to the TNBC subset of the BRCA-TCGA data set.

Results: We identified two signatures, of 120 and 81 genes, built from the top up- and downregulated genes that define the six TNBC subtypes. Their prediction accuracy ranged from 88.6% to 89.4% and improved further after the least important genes were removed. Network analysis was used to identify highly interconnected genes within each subgroup. Two druggable matrix metalloproteinases were found in the BL1 and BL2 subsets, and several druggable targets complementary to the androgen receptor or aromatase were found in the LAR subset. Several secondary drug–target interactions were found among the upregulated genes in the M, IM and MSL subsets.

Conclusions: Our study took full advantage of available TNBC data sets to stratify samples and genes into distinct subtypes according to their gene expression profiles. Developing a data mining approach that extracts a large amount of information from several data sets allowed us to identify a well-defined minimal set of genes that may help recognise TNBC subtypes. These genes, most of which have previously been associated with breast cancer, could become novel diagnostic markers and/or therapeutic targets for specific TNBC subsets.
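The Methods describe selecting a minimal subtype-specific gene set by class comparison and attribute selection, then training a classifier and measuring its prediction accuracy. The following is a minimal illustrative sketch of that general workflow, not the authors' pipeline. It assumes a single one-vs-rest Welch t-statistic filter in place of the paper's seven attribute selection methods and a nearest-centroid classifier in place of its machine learning algorithms. It runs on synthetic data in which each of the six subtype labels (BL1, BL2, M, IM, MSL, LAR) overexpresses five hypothetical marker genes. All function names (`make_toy_data`, `welch_t`, `select_genes`, `fit_centroids`, `predict`) are hypothetical.

```python
import random
import statistics

SUBTYPES = ["BL1", "BL2", "M", "IM", "MSL", "LAR"]


def make_toy_data(n_per_class=30, n_genes=200, seed=0):
    """Hypothetical synthetic data (samples x genes); subtype i overexpresses genes 5i..5i+4."""
    rng = random.Random(seed)
    X, y = [], []
    for i, subtype in enumerate(SUBTYPES):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in range(5 * i, 5 * i + 5):
                row[g] += 3.0  # hypothetical marker genes for this subtype
            X.append(row)
            y.append(subtype)
    return X, y


def welch_t(inside, outside):
    """Welch t statistic: positive when the gene is higher inside the class."""
    se = (statistics.variance(inside) / len(inside)
          + statistics.variance(outside) / len(outside)) ** 0.5
    diff = statistics.fmean(inside) - statistics.fmean(outside)
    return diff / se if se > 0 else 0.0


def select_genes(X, y, k):
    """One-vs-rest filter: keep the k most up- and k most downregulated genes per class."""
    chosen = set()
    for label in sorted(set(y)):
        scored = []
        for g in range(len(X[0])):
            inside = [row[g] for row, lab in zip(X, y) if lab == label]
            outside = [row[g] for row, lab in zip(X, y) if lab != label]
            scored.append((welch_t(inside, outside), g))
        scored.sort()
        chosen.update(g for _, g in scored[:k])   # most downregulated in this class
        chosen.update(g for _, g in scored[-k:])  # most upregulated in this class
    return sorted(chosen)


def fit_centroids(X, y, genes):
    """Mean expression of the selected genes for each class."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.fmean(r[g] for r in rows) for g in genes]
    return centroids


def predict(centroids, genes, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(label):
        return sum((row[g] - c) ** 2 for g, c in zip(genes, centroids[label]))
    return min(centroids, key=dist)


X, y = make_toy_data()
order = list(range(len(y)))
random.Random(1).shuffle(order)
train, test = order[:120], order[120:]
X_train = [X[i] for i in train]
y_train = [y[i] for i in train]

genes = select_genes(X_train, y_train, k=5)  # selection uses training data only
cents = fit_centroids(X_train, y_train, genes)
acc = sum(predict(cents, genes, X[i]) == y[i] for i in test) / len(test)
print(f"{len(genes)} genes selected, held-out accuracy = {acc:.3f}")
```

Genes are selected on the training split only. Selecting them on all samples before splitting would leak test information and inflate the reported accuracy. Varying `k` shows how signature size trades off against accuracy, loosely mirroring the pruning of the least important genes described in the Results.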
