Sparse kernel canonical correlation analysis for discovery of nonlinear interactions in high-dimensional data

Kosuke Yoshida, Junichiro Yoshimoto, Kenji Doya

Research output: Contribution to journal › Article › peer-review

29 Citations (Scopus)

Abstract

Background: Advances in high-throughput technologies in genomics, transcriptomics, and metabolomics have created demand for bioinformatics tools that integrate high-dimensional data from different sources. Canonical correlation analysis (CCA) is a statistical tool for finding linear associations between different types of information. Previous extensions of CCA designed to capture nonlinear associations, such as kernel CCA, did not allow feature selection or the extraction of multiple canonical components. Here we propose a novel method, two-stage kernel CCA (TSKCCA), to select appropriate kernels in the framework of multiple kernel learning.

Results: TSKCCA first selects relevant kernels based on the Hilbert-Schmidt independence criterion (HSIC) in the multiple kernel learning framework. Weights are then derived by non-negative matrix decomposition with L1 regularization. Using artificial datasets and nutrigenomic datasets, we show that TSKCCA can extract multiple, nonlinear associations among high-dimensional data and multiplicative interactions among variables.

Conclusions: TSKCCA can identify nonlinear associations among high-dimensional data more reliably than previous nonlinear CCA methods.
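
The HSIC criterion named in the abstract can be illustrated with a minimal sketch. This is not the authors' TSKCCA implementation: it only computes the standard biased empirical HSIC estimate between two Gaussian kernel matrices, and the function names (`gaussian_kernel`, `hsic`), the bandwidth `sigma=1.0`, and the toy data are assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(x, sigma=1.0):
    """Gaussian (RBF) kernel matrix for samples in the rows of x.

    sigma is an illustrative bandwidth, not a value taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    sq = np.sum(x ** 2, axis=1)
    # Squared Euclidean distances, clipped to guard against round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def hsic(K, L):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


# Toy data (hypothetical): y_dep depends on x nonlinearly, y_ind does not.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-2.0, 2.0, size=n)
y_dep = x ** 2 + 0.1 * rng.normal(size=n)
y_ind = rng.normal(size=n)

hsic_dep = hsic(gaussian_kernel(x), gaussian_kernel(y_dep))
hsic_ind = hsic(gaussian_kernel(x), gaussian_kernel(y_ind))
print(f"HSIC(x, x^2 + noise) = {hsic_dep:.4f}")
print(f"HSIC(x, independent) = {hsic_ind:.4f}")
```

In this toy setting the quadratic relationship should yield a clearly larger HSIC value than the independent pair. A kernel-selection stage could rank candidate kernels by such a dependence score, although the exact selection and weighting procedure used by TSKCCA is described only in the full article.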

Original language: English
Article number: 108
Journal: BMC Bioinformatics
Volume: 18
Issue number: 1
Publication status: Published - 14-02-2017
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics
