
Published in

Elsevier, Analytica Chimica Acta, (813), p. 25-34, 2014

DOI: 10.1016/j.aca.2014.01.025

Iterative weighting of multiblock data in the orthogonal partial least squares framework

Journal article published in 2014 by Julien Boccard, Douglas N. Rutledge
This paper is available in a repository.

Preprint: archiving allowed
Postprint: archiving restricted
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

The integration of multiple data sources has become pivotal to the comprehensive assessment of complex systems. This new paradigm requires the ability to separate common and redundant information from specific and complementary information during the joint analysis of several data blocks. However, the problems inherent in analysing single tables are amplified when multiblock datasets are generated. Finding the relationships between data layers of increasing complexity is therefore a challenging task. In the present work, an algorithm, ComDim-OPLS, is proposed for the supervised analysis of multiblock data structures. It combines the interpretability of the orthogonal partial least squares (OPLS) framework with the ability of common component and specific weights analysis (CCSWA) to weight each data table individually, in order to capture its specificities and to handle the different sources of Y-orthogonal variation efficiently. Three applications illustrate the method. The first is a quantitative structure-activity relationship study that aims to predict the binding affinity of flavonoids toward P-glycoprotein from physicochemical properties. The second concerns the integration of several groups of sensory attributes for the overall quality assessment of a series of red wines. The third highlights the ability of the method to combine very large, heterogeneous data blocks from Omics experiments in systems biology. Results were compared with those of the reference multiblock partial least squares (MBPLS) method to assess the predictive ability and model interpretability of the proposed algorithm. In all cases, ComDim-OPLS proved to be a relevant data mining strategy for the simultaneous analysis of multiblock structures, accounting for the specific sources of variation in each dataset and balancing predictive and descriptive purposes.
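The block-weighting idea that the abstract borrows from CCSWA (also known as ComDim) can be illustrated with a minimal sketch of how a single common dimension and one weight ("salience") per block are found iteratively. This is only the unsupervised CCSWA iteration, not the authors' ComDim-OPLS algorithm. The supervised OPLS step, the Y-orthogonal components and the deflation are all omitted. The block preprocessing (column centering followed by scaling to unit Frobenius norm) is an assumption, and every function name below is hypothetical. Plain Python is used so the sketch has no dependencies.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def center_and_scale(block: Matrix) -> Matrix:
    """Column-center a block, then scale it to unit Frobenius norm (assumed preprocessing; block must not be constant)."""
    n, p = len(block), len(block[0])
    means = [sum(row[j] for row in block) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in block]
    norm = math.sqrt(sum(x * x for row in centered for x in row))
    return [[x / norm for x in row] for row in centered]


def cross_product(block: Matrix) -> Matrix:
    """Sample-space association matrix W_k = X_k X_k^T (n x n)."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in block] for ri in block]


def mat_vec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dominant_eigenvector(m: Matrix, tol: float = 1e-12, max_iter: int = 1000) -> Vector:
    """Power iteration; the start vector [1, 2, ..., n] is an arbitrary non-constant choice."""
    v = [float(i + 1) for i in range(len(m))]
    start_norm = math.sqrt(sum(x * x for x in v))
    v = [x / start_norm for x in v]
    for _ in range(max_iter):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return v
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v


def comdim_one_dimension(blocks: List[Matrix], tol: float = 1e-10,
                         max_iter: int = 100) -> Tuple[Vector, Vector]:
    """Return the common component q (unit norm) and the per-block saliences lambda_k."""
    ws = [cross_product(center_and_scale(b)) for b in blocks]
    n = len(ws[0])
    saliences = [1.0] * len(ws)  # start with equal block weights
    q: Vector = [0.0] * n
    for _ in range(max_iter):
        # Global matrix: salience-weighted sum of the block association matrices.
        w = [[sum(s * wk[i][j] for s, wk in zip(saliences, ws)) for j in range(n)]
             for i in range(n)]
        q = dominant_eigenvector(w)
        # Salience of block k: lambda_k = q^T W_k q.
        new = [sum(a * b for a, b in zip(q, mat_vec(wk, q))) for wk in ws]
        converged = max(abs(a - b) for a, b in zip(new, saliences)) < tol
        saliences = new
        if converged:
            break
    return q, saliences


# Toy data: blocks a and b share the pattern [1, -1, 1, -1]; block c follows [1, 1, -1, -1].
a = [[1, 2], [-1, -2], [1, 2], [-1, -2]]
b = [[3], [-3], [3], [-3]]
c = [[1], [1], [-1], [-1]]
q, sal = comdim_one_dimension([a, b, c])
print([round(s, 6) for s in sal])  # → [1.0, 1.0, 0.0]
```

In this toy case the two blocks that share a pattern receive saliences close to 1, while the block carrying an unrelated pattern is weighted close to 0 for this dimension. This per-block weighting is what lets a method of this kind account for block-specific structure. The sign of `q` is arbitrary, as with any eigenvector.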