Establishment of a standardized system to perform population structure analyses with limited sample size or with different sets of SNP genotypes

Natsuhiko Kumasaka, Yumi Yamaguchi-Kabata, Atsushi Takahashi, Michiaki Kubo, Yusuke Nakamura, Naoyuki Kamatani

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

Recent studies have demonstrated that principal component analysis (PCA) can detect population mixture and admixture in a sample and can therefore be used to correct for population stratification in genome-wide association studies (GWAS). We propose an approach complementary to PCA that compensates for its potential weaknesses, allowing population structure analyses to be performed with limited numbers of subjects and single-nucleotide polymorphisms (SNPs). Our method first requires a PCA of the largest available reference sample from a population to standardize the system. Once the system is established, it can perform PCA for each individual from the same population using a much smaller number of SNPs. This is made possible by probabilistic PCA, under which the principal components (PCs) are predicted within a rigorous probabilistic framework. A subsequent linear discriminant analysis indicates from which ancestries or subpopulations a given individual most likely derives, expressed as posterior probabilities given the predicted PCs. A real-world prototype of the system for the Japanese population, built from 19 260 subjects, illustrates its potential usefulness as an aid in detecting population structure in validation samples and in correcting population stratification in GWAS.
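The abstract names two computational steps: predicting PCs for a new individual from only the SNPs that individual was genotyped on, using a probabilistic PCA model fitted once to the reference sample, and turning those PCs into posterior subpopulation probabilities by linear discriminant analysis. The following is a minimal pure-Python sketch of the standard textbook forms of both steps (the Tipping and Bishop PPCA posterior mean, and Gaussian LDA with a shared covariance); it is not the authors' implementation. The function names, the toy loadings `W`, means `mu`, noise variance `sigma2`, class labels and class means are all hypothetical; in a real system they would come from the reference fit, with new genotypes coded and scaled exactly as in the reference.

```python
import math
from typing import Dict, List, Optional, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a @ x = b for a small square matrix (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [float(b[i])] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ppca_predict(genotypes: Sequence[Optional[float]], loadings: Matrix,
                 means: Sequence[float], sigma2: float) -> List[float]:
    """Posterior mean E[z | x_obs] of the latent PCs under probabilistic PCA.

    SNPs with genotype None are left out: under the PPCA model
    x = W z + mu + eps with isotropic noise, marginalising unobserved SNPs
    leaves the same model restricted to the observed rows of W and mu.
    """
    q = len(loadings[0])
    obs = [j for j, g in enumerate(genotypes) if g is not None]
    # M_o = W_o^T W_o + sigma2 * I_q  (only q x q, cheap to solve)
    m_o = [[sum(loadings[j][a] * loadings[j][b] for j in obs) + (sigma2 if a == b else 0.0)
            for b in range(q)] for a in range(q)]
    # W_o^T (x_o - mu_o)
    rhs = [sum(loadings[j][a] * (genotypes[j] - means[j]) for j in obs) for a in range(q)]
    return solve(m_o, rhs)  # posterior covariance would be sigma2 * M_o^{-1}


def lda_posteriors(z: Sequence[float], class_means: Dict[str, List[float]],
                   shared_cov: Matrix, priors: Dict[str, float]) -> Dict[str, float]:
    """P(class | z) for Gaussian classes sharing one covariance matrix (LDA)."""
    log_scores = {}
    for k, mk in class_means.items():
        d = [zi - mi for zi, mi in zip(z, mk)]
        maha = sum(di * si for di, si in zip(d, solve(shared_cov, d)))  # d^T S^{-1} d
        log_scores[k] = math.log(priors[k]) - 0.5 * maha
    top = max(log_scores.values())  # subtract the max before exponentiating for stability
    w = {k: math.exp(s - top) for k, s in log_scores.items()}
    total = sum(w.values())
    return {k: v / total for k, v in w.items()}


if __name__ == "__main__":
    # Toy, made-up reference fit: 4 SNPs, 2 PCs (not values from the paper).
    W = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    mu = [1.0, 1.0, 1.0, 1.0]
    z = ppca_predict([2.0, None, 2.0, 1.0], W, mu, sigma2=1.0)  # second SNP not genotyped
    post = lda_posteriors(z, {"A": [1.0, 0.0], "B": [-1.0, 0.0]},
                          [[1.0, 0.0], [0.0, 1.0]], {"A": 0.5, "B": 0.5})
    print(z, post)
```

In this sketch each individual only requires solving a q × q system built from the SNPs actually observed, so a single reference fit can, in principle, be applied to individuals genotyped on different SNP subsets. Because the classes share one covariance, the Gaussian normalising constants cancel and only the Mahalanobis terms and priors enter the posteriors.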

Original language: English
Pages (from-to): 525-533
Number of pages: 9
Journal: Journal of Human Genetics
Volume: 55
Issue number: 8
DOIs: 10.1038/jhg.2010.63
Publication status: Published - 01-08-2010

Fingerprint

Sample Size
Single Nucleotide Polymorphism
Principal Component Analysis
Genotype
Population
Genome-Wide Association Study
Discriminant Analysis

All Science Journal Classification (ASJC) codes

  • Genetics
  • Genetics (clinical)

Kumasaka, Natsuhiko ; Yamaguchi-Kabata, Yumi ; Takahashi, Atsushi ; Kubo, Michiaki ; Nakamura, Yusuke ; Kamatani, Naoyuki. / Establishment of a standardized system to perform population structure analyses with limited sample size or with different sets of SNP genotypes. In: Journal of Human Genetics. 2010 ; Vol. 55, No. 8. pp. 525-533.
@article{0cfc09b4927e4de1b66204d6c992a73a,
title = "Establishment of a standardized system to perform population structure analyses with limited sample size or with different sets of SNP genotypes",
abstract = "Recent studies have demonstrated that principal component analysis (PCA) can detect the presence of population mixture and admixture in a sample and thus can be used to correct population stratification in genome-wide association studies (GWAS). We propose a complementary approach to PCA that compensates for potential weaknesses associated with PCA, so that one can perform population structure analyses using limited numbers of subjects and single-nucleotide polymorphisms (SNPs). Our method first requires a PCA of the largest reference sample from a population to standardize the system. Once the system is established, it can perform PCA for each individual with a much smaller number of SNPs drawn from the same population. This is because of the introduction of the probabilistic PCA, so that the prediction of the principal components (PCs) is performed under a rigorous probabilistic framework. The subsequent linear discriminant analysis also helps to understand from which ancestries or subpopulations a given individual is more likely to derive, in terms of posterior probabilities given the predicted PCs. A real-world prototype of the system for the Japanese population is developed based on 19 260 subjects, which illustrates the potential usefulness of the system as an aid in the detection of population structures in validation samples, or to help with the correction of population stratification in GWAS.",
author = "Natsuhiko Kumasaka and Yumi Yamaguchi-Kabata and Atsushi Takahashi and Michiaki Kubo and Yusuke Nakamura and Naoyuki Kamatani",
year = "2010",
month = "8",
day = "1",
doi = "10.1038/jhg.2010.63",
language = "English",
volume = "55",
pages = "525--533",
journal = "Journal of Human Genetics",
issn = "1434-5161",
publisher = "Nature Publishing Group",
number = "8",

}

Establishment of a standardized system to perform population structure analyses with limited sample size or with different sets of SNP genotypes. / Kumasaka, Natsuhiko; Yamaguchi-Kabata, Yumi; Takahashi, Atsushi; Kubo, Michiaki; Nakamura, Yusuke; Kamatani, Naoyuki.

In: Journal of Human Genetics, Vol. 55, No. 8, 01.08.2010, p. 525-533.

TY - JOUR

T1 - Establishment of a standardized system to perform population structure analyses with limited sample size or with different sets of SNP genotypes

AU - Kumasaka, Natsuhiko

AU - Yamaguchi-Kabata, Yumi

AU - Takahashi, Atsushi

AU - Kubo, Michiaki

AU - Nakamura, Yusuke

AU - Kamatani, Naoyuki

PY - 2010/8/1

Y1 - 2010/8/1

N2 - Recent studies have demonstrated that principal component analysis (PCA) can detect the presence of population mixture and admixture in a sample and thus can be used to correct population stratification in genome-wide association studies (GWAS). We propose a complementary approach to PCA that compensates for potential weaknesses associated with PCA, so that one can perform population structure analyses using limited numbers of subjects and single-nucleotide polymorphisms (SNPs). Our method first requires a PCA of the largest reference sample from a population to standardize the system. Once the system is established, it can perform PCA for each individual with a much smaller number of SNPs drawn from the same population. This is because of the introduction of the probabilistic PCA, so that the prediction of the principal components (PCs) is performed under a rigorous probabilistic framework. The subsequent linear discriminant analysis also helps to understand from which ancestries or subpopulations a given individual is more likely to derive, in terms of posterior probabilities given the predicted PCs. A real-world prototype of the system for the Japanese population is developed based on 19 260 subjects, which illustrates the potential usefulness of the system as an aid in the detection of population structures in validation samples, or to help with the correction of population stratification in GWAS.

AB - Recent studies have demonstrated that principal component analysis (PCA) can detect the presence of population mixture and admixture in a sample and thus can be used to correct population stratification in genome-wide association studies (GWAS). We propose a complementary approach to PCA that compensates for potential weaknesses associated with PCA, so that one can perform population structure analyses using limited numbers of subjects and single-nucleotide polymorphisms (SNPs). Our method first requires a PCA of the largest reference sample from a population to standardize the system. Once the system is established, it can perform PCA for each individual with a much smaller number of SNPs drawn from the same population. This is because of the introduction of the probabilistic PCA, so that the prediction of the principal components (PCs) is performed under a rigorous probabilistic framework. The subsequent linear discriminant analysis also helps to understand from which ancestries or subpopulations a given individual is more likely to derive, in terms of posterior probabilities given the predicted PCs. A real-world prototype of the system for the Japanese population is developed based on 19 260 subjects, which illustrates the potential usefulness of the system as an aid in the detection of population structures in validation samples, or to help with the correction of population stratification in GWAS.

UR - http://www.scopus.com/inward/record.url?scp=77957563048&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77957563048&partnerID=8YFLogxK

U2 - 10.1038/jhg.2010.63

DO - 10.1038/jhg.2010.63

M3 - Article

C2 - 20555335

AN - SCOPUS:77957563048

VL - 55

SP - 525

EP - 533

JO - Journal of Human Genetics

JF - Journal of Human Genetics

SN - 1434-5161

IS - 8

ER -