Metabolite Identification through Machine Learning — Tackling CASMI Challenge Using FingerID

Shen, Huibin; Zamboni, Nicola; Heinonen, Markus; Rousu, Juho

Published in

MDPI, Metabolites, 3(2), p. 484-505, 2013

DOI: 10.3390/metabo3020484

Journal article published in 2013 by Huibin Shen, Nicola Zamboni, Markus Heinonen, Juho Rousu

This paper is made freely available by the publisher.

Preprint: archiving allowed

Postprint: archiving allowed

Published version: archiving allowed

Abstract

Metabolite identification is a major bottleneck in metabolomics due to the number and diversity of the molecules. To alleviate this bottleneck, computational methods and tools are needed that reliably filter the set of candidates for further analysis by human experts. Recent efforts in assembling large public mass spectral databases such as MassBank have opened the door to a new genre of metabolite identification methods that rely on machine learning as the primary vehicle for identification. In this paper we describe the machine learning approach used in FingerID, its application to the CASMI challenges, and some results that were not part of our challenge submission. In short, FingerID learns to predict molecular fingerprints from a large collection of MS/MS spectra and uses the predicted fingerprints to retrieve and rank candidate molecules from a given large molecular database. Furthermore, we introduce a web server for FingerID, which was applied for the first time to the CASMI challenges. The challenge results show that the new machine learning framework produces competitive results on those challenge molecules that were found within the relatively restricted KEGG compound database. Additional experiments on the PubChem database confirm the feasibility of the approach even on a much larger database, although room for improvement remains.
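
The retrieve-and-rank step described in the abstract can be illustrated with a minimal sketch. It assumes the fingerprint has already been predicted from a spectrum and is given as a list of 0/1 bits. Tanimoto similarity is used here only as a simple stand-in for scoring; the abstract does not state which scoring function FingerID uses, and the function names, candidate names, and bit vectors below are all hypothetical.

```python
from typing import Dict, List, Tuple


def tanimoto(a: List[int], b: List[int]) -> float:
    """Tanimoto similarity between two equal-length binary fingerprints.

    Assumption: used as an illustrative score, not FingerID's own scoring.
    """
    both = sum(1 for x, y in zip(a, b) if x and y)    # bits set in both
    either = sum(1 for x, y in zip(a, b) if x or y)   # bits set in at least one
    return both / either if either else 0.0


def rank_candidates(
    predicted: List[int], candidates: Dict[str, List[int]]
) -> List[Tuple[str, float]]:
    """Score every candidate against the predicted fingerprint, best first."""
    scored = [(name, tanimoto(predicted, fp)) for name, fp in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 6-bit fingerprint predicted from an MS/MS spectrum.
predicted = [1, 0, 1, 1, 0, 0]

# Hypothetical candidate molecules with known fingerprints, standing in
# for entries retrieved from a molecular database.
candidates = {
    "candidate_A": [1, 0, 1, 1, 0, 0],
    "candidate_B": [1, 1, 0, 1, 0, 0],
    "candidate_C": [0, 0, 0, 0, 1, 1],
}

ranking = rank_candidates(predicted, candidates)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy setup the candidate whose fingerprint matches the prediction exactly is ranked first. A real pipeline would additionally need a trained predictor per fingerprint bit and a way to account for how reliable each predicted bit is.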
