Empirical evaluation of variant calling accuracy using ultra-deep whole-genome sequencing data

Toshihiro Kishikawa, Yukihide Momozawa, Takeshi Ozeki, Taisei Mushiroda, Hidenori Inohara, Yoichiro Kamatani, Michiaki Kubo, Yukinori Okada

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

In the design of whole-genome sequencing (WGS) studies, sequencing depth is a crucial parameter that determines both variant calling accuracy and study cost, yet no standard recommendations have been established. We empirically evaluated the variant calling accuracy of the WGS pipeline using ultra-deep WGS data (approximately 410×). We randomly sampled sequence reads to construct a series of simulated WGS datasets spanning a graded range of depths (n = 54; from 0.05× to 410×). We then evaluated the genotype concordance of each dataset against the SNP microarray data and against the WGS data that used all the sequence reads. In addition, we assessed the accuracy of HLA allele genotyping from the WGS data with multiple software tools (PHLAT, HLA-VBseq, HLA-HD, and SNP2HLA). Concordance rates increased with depth, and depths above 13.7× achieved concordance above 99%. In comparisons with the WGS data that used all the sequence reads, SNVs achieved >95% concordance at 17.6× depth, whereas indels showed only 60% concordance. For HLA allele genotyping, 13.7× depth gave sufficient accuracy, although performance varied among the software tools (the highest concordance, 96.9%, was observed with HLA-HD). Increasing the depth further yielded only limited improvement in HLA genotyping accuracy. These results suggest that a medium WGS depth (approximately 15×) achieves both accurate SNV calling and cost-effectiveness, whereas relatively higher depths are required for accurate indel calling.
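The two operations the abstract describes, random read sampling to a target depth and genotype concordance against a reference call set, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the in-memory read list, the genotype dictionaries, and the site identifiers are all hypothetical, and concordance is assumed here to be the share of sites genotyped in both sets whose genotypes agree.

```python
import random

FULL_DEPTH = 410.0  # approximate depth of the full dataset, as stated in the abstract


def sampling_fraction(target_depth, full_depth=FULL_DEPTH):
    """Fraction of reads to keep so the expected depth equals target_depth."""
    if not 0 < target_depth <= full_depth:
        raise ValueError("target_depth must be in (0, full_depth]")
    return target_depth / full_depth


def downsample(reads, fraction, seed=0):
    """Keep each read independently with probability `fraction` (assumed scheme)."""
    rng = random.Random(seed)
    return [read for read in reads if rng.random() < fraction]


def concordance(calls, truth):
    """Share of sites present in both call sets whose genotypes are identical."""
    shared = [site for site in calls if site in truth]
    if not shared:
        return float("nan")
    matches = sum(calls[site] == truth[site] for site in shared)
    return matches / len(shared)


if __name__ == "__main__":
    # Hypothetical genotypes keyed by made-up site identifiers.
    truth = {"site1": "0/1", "site2": "1/1", "site3": "0/0", "site4": "0/1"}
    calls = {"site1": "0/1", "site2": "1/1", "site3": "0/1"}
    print(sampling_fraction(13.7))    # fraction of reads for a 13.7x dataset
    print(concordance(calls, truth))  # 2 of the 3 shared sites agree
```

Sampling each read independently gives the target depth only in expectation; a real workflow would typically subsample alignment files with a dedicated tool and treat SNVs and indels separately, as the abstract does.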

Original language: English
Article number: 1784
Journal: Scientific reports
Volume: 9
Issue number: 1
DOIs: https://doi.org/10.1038/s41598-018-38346-0
Publication status: Published - 01-12-2019

Fingerprint

Genome
Software
Alleles
Cost-Benefit Analysis
Single Nucleotide Polymorphism
Genotype
Costs and Cost Analysis

All Science Journal Classification (ASJC) codes

  • General

Cite this

Kishikawa, T., Momozawa, Y., Ozeki, T., Mushiroda, T., Inohara, H., Kamatani, Y., ... Okada, Y. (2019). Empirical evaluation of variant calling accuracy using ultra-deep whole-genome sequencing data. Scientific reports, 9(1), [1784]. https://doi.org/10.1038/s41598-018-38346-0
Kishikawa, Toshihiro; Momozawa, Yukihide; Ozeki, Takeshi; Mushiroda, Taisei; Inohara, Hidenori; Kamatani, Yoichiro; Kubo, Michiaki; Okada, Yukinori. / Empirical evaluation of variant calling accuracy using ultra-deep whole-genome sequencing data. In: Scientific reports. 2019; Vol. 9, No. 1.
@article{ce36c2af7b864305bf4e8365d8d4e03c,
title = "Empirical evaluation of variant calling accuracy using ultra-deep whole-genome sequencing data",
abstract = "In the design of whole-genome sequencing (WGS) studies, sequencing depth is a crucial parameter to define variant calling accuracy and study cost, with no standard recommendations having been established. We empirically evaluated the variant calling accuracy of the WGS pipeline using ultra-deep WGS data (approximately 410×). We randomly sampled sequence reads and constructed a series of simulation WGS datasets with a variety of gradual depths (n = 54; from 0.05× to 410×). Next, we evaluated the genotype concordances of the WGS data with those in the SNP microarray data or the WGS data using all the sequence reads. In addition, we assessed the accuracy of HLA allele genotyping using the WGS data with multiple software tools (PHLAT, HLA-VBseq, HLA-HD, and SNP2HLA). The WGS data with higher depths showed higher concordance rates, and >13.7× depth achieved as high as >99{\%} of concordance. Comparisons with the WGS data using all the sequence reads showed that SNVs achieved >95{\%} of concordance at 17.6× depth, whereas indels showed only 60{\%} concordance. For the accuracy of HLA allele genotyping using the WGS data, 13.7× depth showed sufficient accuracy while performance heterogeneity among the software tools was observed (the highest concordance of 96.9{\%} was observed with HLA-HD). Improvement in HLA genotyping accuracy by further increasing the depths was limited. These results suggest a medium degree of the WGS depth setting (approximately 15×) to achieve both accurate SNV calling and cost-effectiveness, whereas relatively higher depths are required for accurate indel calling.",
author = "Toshihiro Kishikawa and Yukihide Momozawa and Takeshi Ozeki and Taisei Mushiroda and Hidenori Inohara and Yoichiro Kamatani and Michiaki Kubo and Yukinori Okada",
year = "2019",
month = "12",
day = "1",
doi = "10.1038/s41598-018-38346-0",
language = "English",
volume = "9",
journal = "Scientific Reports",
issn = "2045-2322",
publisher = "Nature Publishing Group",
number = "1",

}

Kishikawa, T, Momozawa, Y, Ozeki, T, Mushiroda, T, Inohara, H, Kamatani, Y, Kubo, M & Okada, Y 2019, 'Empirical evaluation of variant calling accuracy using ultra-deep whole-genome sequencing data', Scientific reports, vol. 9, no. 1, 1784. https://doi.org/10.1038/s41598-018-38346-0

Empirical evaluation of variant calling accuracy using ultra-deep whole-genome sequencing data. / Kishikawa, Toshihiro; Momozawa, Yukihide; Ozeki, Takeshi; Mushiroda, Taisei; Inohara, Hidenori; Kamatani, Yoichiro; Kubo, Michiaki; Okada, Yukinori.

In: Scientific reports, Vol. 9, No. 1, 1784, 01.12.2019.

TY - JOUR

T1 - Empirical evaluation of variant calling accuracy using ultra-deep whole-genome sequencing data

AU - Kishikawa, Toshihiro

AU - Momozawa, Yukihide

AU - Ozeki, Takeshi

AU - Mushiroda, Taisei

AU - Inohara, Hidenori

AU - Kamatani, Yoichiro

AU - Kubo, Michiaki

AU - Okada, Yukinori

PY - 2019/12/1

Y1 - 2019/12/1

AB - In the design of whole-genome sequencing (WGS) studies, sequencing depth is a crucial parameter to define variant calling accuracy and study cost, with no standard recommendations having been established. We empirically evaluated the variant calling accuracy of the WGS pipeline using ultra-deep WGS data (approximately 410×). We randomly sampled sequence reads and constructed a series of simulation WGS datasets with a variety of gradual depths (n = 54; from 0.05× to 410×). Next, we evaluated the genotype concordances of the WGS data with those in the SNP microarray data or the WGS data using all the sequence reads. In addition, we assessed the accuracy of HLA allele genotyping using the WGS data with multiple software tools (PHLAT, HLA-VBseq, HLA-HD, and SNP2HLA). The WGS data with higher depths showed higher concordance rates, and >13.7× depth achieved as high as >99% of concordance. Comparisons with the WGS data using all the sequence reads showed that SNVs achieved >95% of concordance at 17.6× depth, whereas indels showed only 60% concordance. For the accuracy of HLA allele genotyping using the WGS data, 13.7× depth showed sufficient accuracy while performance heterogeneity among the software tools was observed (the highest concordance of 96.9% was observed with HLA-HD). Improvement in HLA genotyping accuracy by further increasing the depths was limited. These results suggest a medium degree of the WGS depth setting (approximately 15×) to achieve both accurate SNV calling and cost-effectiveness, whereas relatively higher depths are required for accurate indel calling.

UR - http://www.scopus.com/inward/record.url?scp=85061242163&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85061242163&partnerID=8YFLogxK

U2 - 10.1038/s41598-018-38346-0

DO - 10.1038/s41598-018-38346-0

M3 - Article

C2 - 30741997

AN - SCOPUS:85061242163

VL - 9

JO - Scientific Reports

JF - Scientific Reports

SN - 2045-2322

IS - 1

M1 - 1784

ER -

Kishikawa T, Momozawa Y, Ozeki T, Mushiroda T, Inohara H, Kamatani Y et al. Empirical evaluation of variant calling accuracy using ultra-deep whole-genome sequencing data. Scientific reports. 2019 Dec 1;9(1). 1784. https://doi.org/10.1038/s41598-018-38346-0