Comprehensive evaluation of structural variation detection algorithms for whole genome sequencing

Shunichi Kosugi, Yukihide Momozawa, Xiaoxi Liu, Chikashi Terao, Michiaki Kubo, Yoichiro Kamatani

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

Background: Structural variations (SVs) or copy number variations (CNVs) strongly affect the functions of the genes encoded in the genome and are responsible for diverse human diseases. Although many existing SV detection algorithms can detect multiple types of SVs from whole genome sequencing (WGS) data, no single algorithm can call every type of SV with both high precision and high recall.

Results: We comprehensively evaluate the performance of 69 existing SV detection algorithms using multiple simulated and real WGS datasets. The results highlight a subset of algorithms that call SVs accurately for specific SV types and size ranges and that accurately determine the breakpoints, sizes, and genotypes of the SVs. We list the algorithms that perform well in each SV category; among these, GRIDSS, Lumpy, SVseq2, SoftSV, Manta, and Wham perform better in the deletion or duplication categories. To improve the accuracy of SV calling, we systematically evaluate the accuracy of overlapping calls between possible combinations of algorithms for every SV type and size range. The results demonstrate that both the precision and the recall of overlapping calls depend on the specific algorithms combined rather than on the combination of methods those algorithms use.

Conclusion: These results suggest that accurate SV calling requires careful selection of algorithms for each SV type and size range. Selecting specific pairs of algorithms for overlapping calls promises to improve SV detection accuracy effectively.
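The idea of scoring overlapping calls from a pair of algorithms can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the abstract: the `SV` record, the helper names, the 50% reciprocal-overlap matching rule, and the toy coordinates are all hypothetical. It only shows how the precision and recall of one caller, and of the calls shared by two callers, could be computed against a reference set.

```python
from typing import List, NamedTuple, Tuple


class SV(NamedTuple):
    """Hypothetical minimal SV record (not a format defined by the paper)."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    svtype: str  # e.g. "DEL" or "DUP"


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Smaller of the two overlap fractions; 0.0 if chromosome or type differ."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def matches(a: SV, b: SV, min_ro: float = 0.5) -> bool:
    # Assumed matching rule: at least 50% reciprocal overlap.
    return reciprocal_overlap(a, b) >= min_ro


def overlapping_calls(calls_a: List[SV], calls_b: List[SV],
                      min_ro: float = 0.5) -> List[SV]:
    """Calls from caller A that are supported by at least one call from caller B."""
    return [a for a in calls_a if any(matches(a, b, min_ro) for b in calls_b)]


def precision_recall(calls: List[SV], truth: List[SV],
                     min_ro: float = 0.5) -> Tuple[float, float]:
    """Precision = matched calls / all calls; recall = matched truth / all truth."""
    true_calls = sum(1 for c in calls if any(matches(c, t, min_ro) for t in truth))
    found = sum(1 for t in truth if any(matches(c, t, min_ro) for c in calls))
    precision = true_calls / len(calls) if calls else 0.0
    recall = found / len(truth) if truth else 0.0
    return precision, recall


# Toy data, invented purely for illustration.
truth = [SV("chr1", 1000, 2000, "DEL"),
         SV("chr1", 5000, 6000, "DEL"),
         SV("chr2", 300, 900, "DUP")]
caller_a = [SV("chr1", 1010, 1990, "DEL"),
            SV("chr1", 5000, 6100, "DEL"),
            SV("chr1", 8000, 8500, "DEL")]   # no matching truth SV
caller_b = [SV("chr1", 990, 2010, "DEL"),
            SV("chr2", 310, 890, "DUP"),
            SV("chr3", 100, 400, "DEL")]     # no matching truth SV

shared_calls = overlapping_calls(caller_a, caller_b)
print("caller A alone:", precision_recall(caller_a, truth))
print("caller B alone:", precision_recall(caller_b, truth))
print("A and B overlap:", precision_recall(shared_calls, truth))
```

In this toy data the shared calls drop each caller's unmatched call, so precision rises while recall falls. How that trade-off plays out for real data would depend on the particular pair of algorithms, which is the dependence the abstract reports.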

Original language: English
Article number: 117
Journal: Genome Biology
Volume: 20
Issue number: 1
DOI: 10.1186/s13059-019-1720-5
Publication status: Published - 3 June 2019

Fingerprint

genome
range size
detection
evaluation
human diseases
genotype
gene

All Science Journal Classification (ASJC) codes

  • Ecology, Evolution, Behavior and Systematics
  • Genetics
  • Cell Biology

Cite this

Kosugi, Shunichi ; Momozawa, Yukihide ; Liu, Xiaoxi ; Terao, Chikashi ; Kubo, Michiaki ; Kamatani, Yoichiro. / Comprehensive evaluation of structural variation detection algorithms for whole genome sequencing. In: Genome Biology. 2019 ; Vol. 20, No. 1.
@article{b8d82891d4344379ac89060b2ebdc435,
title = "Comprehensive evaluation of structural variation detection algorithms for whole genome sequencing",
author = "Shunichi Kosugi and Yukihide Momozawa and Xiaoxi Liu and Chikashi Terao and Michiaki Kubo and Yoichiro Kamatani",
year = "2019",
month = "6",
day = "3",
doi = "10.1186/s13059-019-1720-5",
language = "English",
volume = "20",
journal = "Genome Biology",
issn = "1474-7596",
publisher = "BioMed Central",
number = "1",
}

TY  - JOUR
T1  - Comprehensive evaluation of structural variation detection algorithms for whole genome sequencing
AU  - Kosugi, Shunichi
AU  - Momozawa, Yukihide
AU  - Liu, Xiaoxi
AU  - Terao, Chikashi
AU  - Kubo, Michiaki
AU  - Kamatani, Yoichiro
PY  - 2019/6/3
Y1  - 2019/6/3
UR  - http://www.scopus.com/inward/record.url?scp=85066823681&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85066823681&partnerID=8YFLogxK
U2  - 10.1186/s13059-019-1720-5
DO  - 10.1186/s13059-019-1720-5
M3  - Article
VL  - 20
JO  - Genome Biology
JF  - Genome Biology
SN  - 1474-7596
IS  - 1
M1  - 117
ER  -