Procrustes is a machine-learning approach that removes cross-platform batch effects from clinical RNA sequencing data

Kotlov, Nikita; Shaposhnikov, Kirill; Tazearslan, Cagdas; Chasse, Madison; Baisangurov, Artur; Podsvirova, Svetlana; Fernandez, Dawn; Abdou, Mary; Kaneunyenye, Leznath; Morgan, Kelley; Cheremushkin, Ilya; Zemskiy, Pavel; Chelushkin, Maxim; Sorokina, Maria; Belova, Ekaterina; Khorkova, Svetlana; Lozinsky, Yaroslav; Nuzhdina, Katerina; Vasileva, Elena; Kravchenko, Dmitry; Suryamohan, Kushal; Nomie, Krystle; Curran, John; Fowler, Nathan; Bagaev, Alexander

Published in

Nature Research, Communications Biology, 1(7), 2024

DOI: 10.1038/s42003-024-06020-z

This paper is made freely available by the publisher.

Preprint: archiving allowed

Postprint: archiving forbidden

Published version: archiving allowed

Abstract

With the increased use of gene expression profiling for personalized oncology, optimized RNA sequencing (RNA-seq) protocols and algorithms are necessary to provide comparable expression measurements between exome capture (EC)-based and poly-A RNA-seq. Here, we developed and optimized an EC-based protocol for processing formalin-fixed, paraffin-embedded samples and a machine-learning algorithm, Procrustes, to overcome batch effects across RNA-seq data obtained using different sample preparation protocols like EC-based or poly-A RNA-seq protocols. Applying Procrustes to samples processed using EC and poly-A RNA-seq protocols showed the expression of 61% of genes (N = 20,062) to correlate across both protocols (concordance correlation coefficient > 0.8, versus 26% before transformation by Procrustes), including 84% of cancer-specific and cancer microenvironment-related genes (versus 36% before applying Procrustes; N = 1,438). Benchmarking analyses also showed Procrustes to outperform other batch correction methods. Finally, we showed that Procrustes can project RNA-seq data for a single sample to a larger cohort of RNA-seq data. Future application of Procrustes will enable direct gene expression analysis for single tumor samples to support gene expression-based treatment decisions.
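
The agreement metric quoted in the abstract, the concordance correlation coefficient (CCC), can be illustrated with a short sketch. The code below uses the standard definition of Lin's CCC and is not the authors' implementation; the function names `concordance_cc` and `fraction_concordant`, the genes-by-samples array layout, and the assumption that samples are paired column by column across the two protocols are all hypothetical choices made for illustration. Only the 0.8 threshold comes from the abstract.

```python
import numpy as np


def concordance_cc(x, y):
    """Lin's concordance correlation coefficient for two paired vectors.

    Unlike Pearson correlation, CCC also penalizes shifts in mean and
    differences in scale, which is why it is sensitive to batch effects.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    # Population (biased) covariance and variances, as in Lin's definition.
    cov = ((x - mx) * (y - my)).mean()
    denom = x.var() + y.var() + (mx - my) ** 2
    return 2.0 * cov / denom if denom > 0 else float("nan")


def fraction_concordant(expr_a, expr_b, threshold=0.8):
    """Fraction of genes whose CCC across two protocols exceeds `threshold`.

    expr_a, expr_b: genes x samples arrays (hypothetical layout), with the
    same genes in the same rows and the same samples in the same columns.
    """
    ccc = np.array([concordance_cc(a, b) for a, b in zip(expr_a, expr_b)])
    # Genes with undefined CCC (NaN) compare as False and count as discordant.
    return float(np.mean(ccc > threshold))


# Toy data: gene 0 agrees exactly; gene 1 is shifted by a constant offset,
# so its Pearson correlation is 1 but its CCC falls below the threshold.
protocol_a = np.array([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
protocol_b = np.array([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
share = fraction_concordant(protocol_a, protocol_b)
```

In this toy input the shifted gene has a CCC of 4/7, so only one of the two genes passes the 0.8 cutoff. A constant offset between protocols is the simplest form of the batch effect that a correction method would need to remove before the per-gene CCC rises.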
