TY - JOUR
T1 - Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions
AU - Tokuda, Tomoki
AU - Yoshimoto, Junichiro
AU - Shimizu, Yu
AU - Okada, Go
AU - Takamura, Masahiro
AU - Okamoto, Yasumasa
AU - Yamawaki, Shigeto
AU - Doya, Kenji
N1 - Publisher Copyright:
© 2017 Tokuda et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2017/10
Y1 - 2017/10
N2 - We propose a novel method for multiple clustering, which is useful for analyzing high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition serves as feature selection for a particular clustering solution, screening out irrelevant features. To make our method applicable to high-dimensional data, a co-clustering structure is newly introduced for each view. Further, a key novelty of our method is that it simultaneously models different distribution families, such as Gaussian, Poisson, and multinomial distributions, in each cluster block, which widens its range of application to real data. We apply the proposed method to synthetic and real data, and show that it outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data.
AB - We propose a novel method for multiple clustering, which is useful for analyzing high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition serves as feature selection for a particular clustering solution, screening out irrelevant features. To make our method applicable to high-dimensional data, a co-clustering structure is newly introduced for each view. Further, a key novelty of our method is that it simultaneously models different distribution families, such as Gaussian, Poisson, and multinomial distributions, in each cluster block, which widens its range of application to real data. We apply the proposed method to synthetic and real data, and show that it outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data.
UR - http://www.scopus.com/inward/record.url?scp=85031783953&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85031783953&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0186566
DO - 10.1371/journal.pone.0186566
M3 - Article
C2 - 29049392
AN - SCOPUS:85031783953
SN - 1932-6203
VL - 12
JO - PLOS ONE
JF - PLOS ONE
IS - 10
M1 - e0186566
ER -