TY - JOUR
T1 - PlatinumCNV
T2 - A Bayesian Gaussian mixture model for genotyping copy number polymorphisms using SNP array signal intensity data
AU - Kumasaka, Natsuhiko
AU - Fujisawa, Hironori
AU - Hosono, Naoya
AU - Okada, Yukinori
AU - Takahashi, Atsushi
AU - Nakamura, Yusuke
AU - Kubo, Michiaki
AU - Kamatani, Naoyuki
PY - 2011/12
Y1 - 2011/12
AB - We present a statistical model for allele-specific patterns of copy number polymorphisms (CNPs) in commercial single nucleotide polymorphism (SNP) array data. This model is based on the observation that fluorescent signal intensities tend to cluster into clouds of similar allele-specific copy number (ASCN) genotypes at each SNP locus. To capture the tendency of this clustering to be made vague by instrumental errors, our model allows for cluster memberships to overlap each other, according to a Bayesian Gaussian mixture model (GMM). This approach is flexible, allowing for both absolute scale differences and X/Y scale imbalances of fluorescent signal intensities. The resulting model is also robust toward unobserved ASCN genotypes, which can be problematic for ordinary GMMs. We illustrated the utility of the model by applying it to commercial SNP array intensity data obtained from the Illumina HumanHap 610K platform. We retrieved more than 4,000 allele-specific CNPs, though 99% of them showed rather simple allele-specific CNP patterns with only a single aneuploid haplotype among the normal haplotypes. The genotyping accuracy was assessed by two approaches, quantitative PCR and replicated subjects. The results of both of these approaches demonstrated mean genotyping error rates of 1%. We demonstrated a preliminary genome-wide association study of three hematological traits. The result exhibited that it could form the foundation for new, more effective statistical methods for the mapping of both disease genes and quantitative trait loci with genome-wide CNPs. The methods described in this work are implemented in a software package, PlatinumCNV, available on the Internet.
KW - Allele-specific copy number
KW - Empirical Bayes estimation
KW - Genome-wide association study
KW - Oligonucleotide assay
KW - Quantitative PCR
UR - http://www.scopus.com/inward/record.url?scp=82455203887&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=82455203887&partnerID=8YFLogxK
U2 - 10.1002/gepi.20633
DO - 10.1002/gepi.20633
M3 - Article
C2 - 22125222
AN - SCOPUS:82455203887
SN - 0741-0395
VL - 35
SP - 831
EP - 844
JO - Genetic Epidemiology
JF - Genetic Epidemiology
IS - 8
ER -