Establishment of a standardized system to perform population structure analyses with limited sample size or with different sets of SNP genotypes

Natsuhiko Kumasaka, Yumi Yamaguchi-Kabata, Atsushi Takahashi, Michiaki Kubo, Yusuke Nakamura, Naoyuki Kamatani

Research output: Contribution to journal (Article)

6 Citations (Scopus)


Recent studies have demonstrated that principal component analysis (PCA) can detect the presence of population mixture and admixture in a sample and thus can be used to correct for population stratification in genome-wide association studies (GWAS). We propose an approach complementary to PCA that compensates for its potential weaknesses, so that population structure analyses can be performed with limited numbers of subjects and single-nucleotide polymorphisms (SNPs). Our method first requires a PCA of the largest available reference sample from a population to standardize the system. Once the system is established, it can predict the principal components (PCs) of each individual drawn from the same population using a much smaller number of SNPs. This is made possible by introducing probabilistic PCA, under which the PCs are predicted within a rigorous probabilistic framework. A subsequent linear discriminant analysis then indicates from which ancestries or subpopulations a given individual is more likely to derive, in terms of posterior probabilities given the predicted PCs. A real-world prototype of the system for the Japanese population was developed based on 19,260 subjects. It illustrates the potential usefulness of the system as an aid in detecting population structure in validation samples and in correcting for population stratification in GWAS.
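The pipeline described in the abstract (reference PCA, probabilistic prediction of PCs from a SNP subset, then linear discriminant analysis) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the closed-form maximum-likelihood probabilistic PCA of Tipping and Bishop, a pooled-covariance LDA, unstandardized 0/1/2 genotype counts, and simulated data for two hypothetical subpopulations; all function names, sample sizes, and allele frequencies are made up for illustration.

```python
import numpy as np

def fit_ppca(X, q):
    """Closed-form ML probabilistic PCA on an (n x p) reference matrix."""
    n, p = X.shape
    mu = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    evals = s**2 / n                          # eigenvalues of the sample covariance
    sigma2 = evals[q:].sum() / (p - q)        # noise variance: mean discarded eigenvalue
    W = Vt[:q].T * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return mu, W, sigma2

def predict_pcs(x_obs, snp_idx, mu, W, sigma2):
    """Posterior mean of the latent PCs given only the SNPs in snp_idx."""
    W_o = W[snp_idx]                          # loadings of the observed SNPs
    M = W_o.T @ W_o + sigma2 * np.eye(W.shape[1])
    return np.linalg.solve(M, W_o.T @ (x_obs - mu[snp_idx]).T).T

def fit_lda(Z, labels):
    """Class means, pooled within-class covariance, and class priors."""
    classes = np.unique(labels)
    means = np.array([Z[labels == c].mean(axis=0) for c in classes])
    resid = Z - means[np.searchsorted(classes, labels)]
    cov = resid.T @ resid / (len(Z) - len(classes))
    priors = np.array([(labels == c).mean() for c in classes])
    return means, cov, priors

def lda_posterior(z, means, cov, priors):
    """Posterior probability of each class given predicted PCs z."""
    diff = z - means
    maha = np.einsum("kq,kq->k", diff @ np.linalg.inv(cov), diff)
    log_post = np.log(priors) - 0.5 * maha
    post = np.exp(log_post - log_post.max())  # subtract max for numerical stability
    return post / post.sum()

# Simulated reference sample: two hypothetical subpopulations (assumption).
rng = np.random.default_rng(0)
p, n_per = 500, 200
f = rng.uniform(0.2, 0.8, p)                  # baseline allele frequencies
d = rng.normal(0.0, 0.1, p)                   # subpopulation frequency shift
freqs = np.clip(np.array([f + d, f - d]), 0.05, 0.95)
labels = np.repeat([0, 1], n_per)
X_ref = rng.binomial(2, freqs[labels]).astype(float)

# Step 1: standardize the system on the reference sample.
mu, W, sigma2 = fit_ppca(X_ref, q=2)
Z_ref = predict_pcs(X_ref, np.arange(p), mu, W, sigma2)
means, cov, priors = fit_lda(Z_ref, labels)

# Step 2: a new individual from subpopulation 0, typed on only 100 SNPs.
snp_idx = rng.choice(p, size=100, replace=False)
x_new = rng.binomial(2, freqs[0, snp_idx]).astype(float)
z_new = predict_pcs(x_new, snp_idx, mu, W, sigma2)
post = lda_posterior(z_new, means, cov, priors)
print("predicted PCs:", z_new)
print("posterior probabilities:", post)
```

In this sketch the reference fit is done once, and each new individual only needs the loadings of the SNPs that were actually genotyped. Because the LDA is trained on scores from the full SNP set, posteriors for individuals typed on few SNPs may be overconfident; a fuller treatment would propagate the posterior variance of the predicted PCs.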

Original language: English
Pages (from-to): 525-533
Number of pages: 9
Journal: Journal of Human Genetics
Issue number: 8
Publication status: Published - 01-08-2010


All Science Journal Classification (ASJC) codes

  • Genetics
  • Genetics (clinical)
