Frontiers Media, Frontiers in Bioinformatics, (2), 2022
DOI: 10.3389/fbinf.2022.842964
Full text: Download
In natural products research, chemodiverse extracts coming from multiple organisms are explored for novel bioactive molecules, sometimes over extended periods. Samples are usually analyzed by liquid chromatography coupled with fragmentation mass spectrometry to acquire informative mass spectral ensembles. Such data is then exploited to establish relationships among analytes or samples (e.g., via molecular networking) and annotate metabolites. However, the comparison of samples profiled in different batches is challenging with current metabolomics methods since the experimental variation—changes in chromatographical or mass spectrometric conditions - hinders the direct comparison of the profiled samples. Here we introduce MEMO—MS2 BasEd SaMple VectOrization—a method allowing to cluster large amounts of chemodiverse samples based on their LC-MS/MS profiles in a retention time agnostic manner. This method is particularly suited for heterogeneous and chemodiverse sample sets. MEMO demonstrated similar clustering performance as state-of-the-art metrics considering fragmentation spectra. More importantly, such performance was achieved without the requirement of a prior feature alignment step and in a significantly shorter computational time. MEMO thus allows the comparison of vast ensembles of samples, even when analyzed over long periods of time, and on different chromatographic or mass spectrometry platforms. This new addition to the computational metabolomics toolbox should drastically expand the scope of large-scale comparative analysis.