Pathway analysis in metabolomics: Recommendations for the use of over-representation analysis

Wieder, Cecilia; Frainay, Clément; Poupin, Nathalie; Rodríguez-Mier, Pablo; Vinson, Florence; Cooke, Juliette; Lai, Rachel Pj; Bundy, Jacob G.; Jourdan, Fabien; Ebbels, Timothy

Published in

Public Library of Science, PLoS Computational Biology, 17(9), p. e1009105, 2021

DOI: 10.1371/journal.pcbi.1009105



This paper is made freely available by the publisher.


Abstract

Over-representation analysis (ORA) is one of the most common pathway analysis approaches used for the functional interpretation of metabolomics datasets. Despite the widespread use of ORA in metabolomics, the community lacks guidelines detailing its best-practice use. Many factors have a pronounced impact on the results, but to date their effects have received little systematic attention. Using five publicly available datasets, we demonstrated that changes in parameters such as the background set, the method used to select differential metabolites, and the pathway database can result in profoundly different ORA results. The use of a non-assay-specific background set, for example, resulted in large numbers of false-positive pathways. Pathway database choice, evaluated using three of the most popular metabolic pathway databases (KEGG, Reactome, and BioCyc), led to vastly different results in both the number and function of significantly enriched pathways. Factors that are specific to metabolomics data, such as the reliability of compound identification and the chemical bias of different analytical platforms, also impacted ORA results. Simulated metabolite misidentification rates as low as 4% caused both the gain of false-positive pathways and the loss of truly significant pathways across all datasets. Our results have several practical implications for ORA users, as well as for those using alternative pathway analysis methods. We offer a set of recommendations for the use of ORA in metabolomics, alongside a set of minimal reporting guidelines, as a first step towards the standardisation of pathway analysis in metabolomics.
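To make the background-set effect concrete, here is a minimal sketch (not the authors' code) of ORA implemented as a one-sided hypergeometric test, a common formulation: for each pathway it asks how likely it is to see at least k of the n selected metabolites among the K pathway members when drawing from a background of N metabolites. It uses only the Python standard library. The metabolite IDs, pathway memberships, set sizes, and the use of Benjamini-Hochberg correction are hypothetical choices made for illustration.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K pathway members, n draws)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def ora(query, background, pathways):
    """One-sided ORA p-value per pathway.

    query:      set of differentially abundant metabolite IDs
    background: set of metabolite IDs that could have been detected
    pathways:   dict of pathway name -> set of member metabolite IDs
    """
    query = query & background          # only background metabolites can be selected
    N, n = len(background), len(query)
    pvals = {}
    for name, members in pathways.items():
        members = members & background  # restrict the pathway to measurable compounds
        if not members:
            continue                    # pathway cannot be tested with this background
        k = len(query & members)
        pvals[name] = hypergeom_sf(k, N, len(members), n)
    return pvals


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, keyed like the input dict."""
    ranked = sorted(pvals.items(), key=lambda kv: kv[1])
    m = len(ranked)
    adjusted, running_min = {}, 1.0
    for rank in range(m, 0, -1):        # walk from the largest p-value down
        name, p = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[name] = running_min
    return adjusted


# Hypothetical toy data: a 200-compound database, of which only 20 are measurable.
database = {f"m{i}" for i in range(200)}
assay = {f"m{i}" for i in range(20)}
pathways = {
    "pathway_A": {f"m{i}" for i in [0, 1, 2, 3, 4, 100, 101, 102, 103, 104]},
    "pathway_B": {f"m{i}" for i in [5, 6, 7, 8, 9, 105, 106, 107, 108, 109]},
}
query = {"m0", "m1", "m10", "m11"}      # 2 of the 4 selected metabolites are in pathway_A

for label, bg in [("assay-specific", assay), ("whole database", database)]:
    q = benjamini_hochberg(ora(query, bg, pathways))
    print(f"{label}: pathway_A q = {q['pathway_A']:.3f}")
```

With the assay-specific background, pathway_A occupies 5 of 20 measurable compounds, so 2 hits out of 4 is unremarkable (q ≈ 0.497). Against the whole-database background it occupies only 10 of 200 compounds, most of which could never have been detected, and the same 2 hits now look significant (q ≈ 0.026). This is a toy illustration of how a non-assay-specific background can produce false-positive pathways.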
