PlatinumCNV

A Bayesian Gaussian mixture model for genotyping copy number polymorphisms using SNP array signal intensity data

Natsuhiko Kumasaka, Hironori Fujisawa, Naoya Hosono, Yukinori Okada, Atsushi Takahashi, Yusuke Nakamura, Michiaki Kubo, Naoyuki Kamatani

Research output: Contribution to journal (Article)

5 Citations (Scopus)

Abstract

We present a statistical model for allele-specific patterns of copy number polymorphisms (CNPs) in commercial single nucleotide polymorphism (SNP) array data. The model is based on the observation that, at each SNP locus, fluorescent signal intensities tend to cluster into clouds of samples sharing the same allele-specific copy number (ASCN) genotype. Because instrumental error blurs these clusters, the model allows cluster memberships to overlap, following a Bayesian Gaussian mixture model (GMM). The approach is flexible, accommodating both absolute scale differences and X/Y scale imbalances in the fluorescent signal intensities. The resulting model is also robust to unobserved ASCN genotypes, which can be problematic for ordinary GMMs. We illustrate the utility of the model by applying it to commercial SNP array intensity data obtained from the Illumina HumanHap 610K platform. We detected more than 4,000 allele-specific CNPs, although 99% of them showed rather simple allele-specific CNP patterns, with only a single aneuploid haplotype among the normal haplotypes. Genotyping accuracy was assessed by two approaches: quantitative PCR and replicated subjects. Both approaches gave a mean genotyping error rate of 1%. We also carried out a preliminary genome-wide association study of three hematological traits. The results suggest that the model could form the foundation for new, more effective statistical methods for mapping both disease genes and quantitative trait loci with genome-wide CNPs. The methods described in this work are implemented in a software package, PlatinumCNV, available on the Internet.
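
As a rough illustration of the overlapping-cluster idea in the abstract, the following Python sketch fits a one-dimensional Gaussian mixture by MAP-EM. It is not the PlatinumCNV model or its software: the one-dimensional "allelic fraction" signal, the fixed cluster spread SD, the prior cluster centers, the Dirichlet parameter alpha, the prior strength kappa, and all function names are assumptions made for this example. It shows soft (overlapping) cluster memberships, and how a prior keeps a cluster for an unobserved genotype at a sensible location instead of letting it drift.

```python
import math
import random

SD = 0.08  # assumed, fixed within-cluster spread (hypothetical value)

def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def responsibilities(x, weights, means, sd):
    """Soft membership of x in each cluster (values sum to 1)."""
    dens = [w * normal_pdf(x, m, sd) for w, m in zip(weights, means)]
    total = sum(dens)
    if total == 0.0:  # guard against numerical underflow
        return [1.0 / len(dens)] * len(dens)
    return [d / total for d in dens]

def fit_map_gmm(xs, prior_means, sd=SD, alpha=2.0, kappa=1.0, n_iter=50):
    """MAP-EM for a 1-D Gaussian mixture with fixed sd.

    Weights get a symmetric Dirichlet(alpha) prior; each mean gets a
    normal prior centered at prior_means[j] worth kappa pseudo-observations.
    """
    k = len(prior_means)
    n = len(xs)
    weights = [1.0 / k] * k
    means = list(prior_means)
    for _ in range(n_iter):
        # E-step: soft assignment of every sample to every cluster.
        resp = [responsibilities(x, weights, means, sd) for x in xs]
        counts = [sum(r[j] for r in resp) for j in range(k)]
        # M-step: posterior-mode updates that include the prior terms.
        weights = [(counts[j] + alpha - 1.0) / (n + k * (alpha - 1.0))
                   for j in range(k)]
        means = [(kappa * prior_means[j]
                  + sum(r[j] * x for r, x in zip(resp, xs)))
                 / (kappa + counts[j]) for j in range(k)]
    return weights, means

# Simulated data: two genotype clusters are observed, the third is not.
random.seed(0)
xs = ([random.gauss(0.05, 0.05) for _ in range(60)]
      + [random.gauss(0.50, 0.05) for _ in range(40)])
prior_means = [0.05, 0.50, 0.95]  # hypothetical expected cluster centers

weights, means = fit_map_gmm(xs, prior_means)
print("weights:", [round(w, 3) for w in weights])
print("means:  ", [round(m, 3) for m in means])
# A sample lying between two clusters receives a split membership.
print("memberships at 0.275:",
      [round(r, 3) for r in responsibilities(0.275, weights, means, SD)])
```

In this sketch the third cluster receives almost no data, so its mean stays near its prior center and its weight stays small; in a maximum-likelihood fit without priors, an empty component has no such anchor.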

Original language: English
Pages (from-to): 831-844
Number of pages: 14
Journal: Genetic Epidemiology
Volume: 35
Issue number: 8
DOI: https://doi.org/10.1002/gepi.20633
Publication status: Published - 1 December 2011
Externally published: Yes

Fingerprint

DNA Copy Number Variations
Single Nucleotide Polymorphism
Alleles
Haplotypes
Genotype
Quantitative Trait Loci
Genome-Wide Association Study
Aneuploidy
Statistical Models
Internet
Cluster Analysis
Software
Genome
Polymerase Chain Reaction

All Science Journal Classification (ASJC) codes

  • Epidemiology
  • Genetics (clinical)

Cite this

Kumasaka, N., Fujisawa, H., Hosono, N., Okada, Y., Takahashi, A., Nakamura, Y., ... Kamatani, N. (2011). PlatinumCNV: A Bayesian Gaussian mixture model for genotyping copy number polymorphisms using SNP array signal intensity data. Genetic Epidemiology, 35(8), 831-844. https://doi.org/10.1002/gepi.20633
@article{a2cb1cb5399247f5a580e3726a0078ac,
title = "PlatinumCNV: A Bayesian Gaussian mixture model for genotyping copy number polymorphisms using SNP array signal intensity data",
author = "Natsuhiko Kumasaka and Hironori Fujisawa and Naoya Hosono and Yukinori Okada and Atsushi Takahashi and Yusuke Nakamura and Michiaki Kubo and Naoyuki Kamatani",
year = "2011",
month = "12",
day = "1",
doi = "10.1002/gepi.20633",
language = "English",
volume = "35",
pages = "831--844",
journal = "Genetic Epidemiology",
issn = "0741-0395",
publisher = "Wiley-Liss Inc.",
number = "8",

}
