Review and research on feature selection methods from NMR data in biological fluids

Semmar, Nabil; Canlet, Cecile; Delplanque, Bernadette; Le Ruyet, Pascale; Paris, Alain; Martin, Jean-Charles

Journal article published in 2014 by Nabil Semmar, Cecile Canlet, Bernadette Delplanque, Pascale Le Ruyet, Alain Paris, Jean-Charles Martin

Abstract

Metabolic pools of biological matrices can be extensively analyzed by NMR. The measured data consist of hundreds of NMR signals whose chemical shifts and intensities represent the types and levels of different metabolites, respectively. Relevant predictive NMR signals need to be extracted from this pool using variable selection methods. This paper presents both a review of and original research in this area of metabolomics. After reviewing the discriminant potential and statistical analysis of NMR data in biological fields, the paper presents an original approach for extracting a small number of NMR signals from one biological matrix A (BM-A) in order to predict metabolic levels in another biological matrix B (BM-B). First, the NMR dataset of BM-A was decomposed into several row-column homogeneous blocks using hierarchical cluster analysis (HCA). Then, each block was subjected to a complete set of jackknifed correspondence analyses (CA), each obtained by removing one individual (row) at a time. Each CA condensed the numerous NMR signals into a few principal components (PCs). The PCs computed from the (n - 1) active individuals were used as latent variables in a stepwise multi-linear regression to predict metabolic levels in BM-B. From the resulting regression model, the metabolite level of the left-out individual was predicted (for subsequent model validation). Across all the PC-based regression models resulting from the jackknifed CAs applied to all individuals, the most contributive NMR signals were identified by their highest absolute contributions to the PCs. Finally, these selected NMR signals (measured in BM-A) were used to build final population and sub-population regression models predicting metabolite levels in BM-B.
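
The jackknifed CA and regression steps described in the abstract can be illustrated with a minimal Python sketch for a single block. It is not the authors' implementation and rests on several assumptions: the HCA block decomposition is omitted, ordinary least squares on a fixed number of PCs stands in for the stepwise multi-linear regression, the function names `correspondence_analysis` and `jackknife_ca_regression` are hypothetical, and the data are synthetic random numbers used only to make the example run.

```python
import numpy as np

def correspondence_analysis(X, n_axes):
    """Plain CA of a strictly positive table X (rows = individuals, columns = NMR signals)."""
    P = X / X.sum()
    r = P.sum(axis=1)                                    # row masses
    c = P.sum(axis=0)                                    # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    V = Vt.T[:, :n_axes]
    row_coords = (U[:, :n_axes] * sv[:n_axes]) / np.sqrt(r)[:, None]  # principal coordinates (PCs)
    col_std = V / np.sqrt(c)[:, None]                    # standard column coordinates
    contrib = V ** 2                                     # absolute contribution of each signal per axis
    return row_coords, col_std, c, contrib

def jackknife_ca_regression(X, y, n_axes=2):
    """Leave one individual out, run CA on the rest, regress y on the PCs,
    predict the left-out individual, and accumulate signal contributions."""
    n, p = X.shape
    y_pred = np.empty(n)
    total_contrib = np.zeros(p)
    for i in range(n):
        keep = np.arange(n) != i
        F, col_std, c, contrib = correspondence_analysis(X[keep], n_axes)
        # Assumption: ordinary least squares replaces the stepwise regression.
        A = np.column_stack([np.ones(n - 1), F])
        beta, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
        # Project the left-out individual as a supplementary row.
        profile = X[i] / X[i].sum()
        f_out = (profile - c) @ col_std
        y_pred[i] = beta[0] + f_out @ beta[1:]
        total_contrib += contrib.sum(axis=1)
    return y_pred, total_contrib / n

# Synthetic stand-ins for BM-A signals (X) and a BM-B metabolite level (y).
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 10.0, size=(12, 30))
y = rng.normal(size=12)

y_pred, mean_contrib = jackknife_ca_regression(X, y, n_axes=2)
top_signals = np.argsort(mean_contrib)[::-1][:5]   # indices of the most contributive signals
print(top_signals)
```

In this sketch the signals are ranked by their absolute contributions averaged over all jackknife runs; the paper's own selection rule and the number of retained PCs may differ.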