TY - JOUR
T1 - MicroSEC filters sequence errors for formalin-fixed and paraffin-embedded samples
AU - Ikegami, Masachika
AU - Kohsaka, Shinji
AU - Hirose, Takeshi
AU - Ueno, Toshihide
AU - Inoue, Satoshi
AU - Kanomata, Naoki
AU - Yamauchi, Hideko
AU - Mori, Taisuke
AU - Sekine, Shigeki
AU - Inamoto, Yoshihiro
AU - Yatabe, Yasushi
AU - Kobayashi, Hiroshi
AU - Tanaka, Sakae
AU - Mano, Hiroyuki
N1 - Publisher Copyright: © 2021, The Author(s).
PY - 2021/12
Y1 - 2021/12
AB - The clinical sequencing of tumors is usually performed on formalin-fixed, paraffin-embedded samples and results in many sequencing errors. We identified that most of these errors are detected in chimeric reads caused by single-strand DNA molecules with microhomology. During the end-repair step of library preparation, mutations are introduced by the mis-annealing of two single-strand DNA molecules comprising homologous sequences. The mutated bases are distributed unevenly near the ends in the individual reads. Our filtering pipeline, MicroSEC, focuses on the uneven distribution of mutations in each read and removes the sequencing errors in formalin-fixed, paraffin-embedded samples without over-eliminating the mutations detected also in fresh frozen samples. Amplicon-based sequencing using 97 mutations confirmed that the sensitivity and specificity of MicroSEC were 97% (95% confidence interval: 82–100%) and 96% (95% confidence interval: 88–99%), respectively. Our pipeline will increase the reliability of the clinical sequencing and advance the cancer research using formalin-fixed, paraffin-embedded samples.
UR - http://www.scopus.com/inward/record.url?scp=85121398736&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85121398736&partnerID=8YFLogxK
U2 - 10.1038/s42003-021-02930-4
DO - 10.1038/s42003-021-02930-4
M3 - Article
C2 - 34912045
AN - SCOPUS:85121398736
SN - 2399-3642
VL - 4
JO - Communications Biology
JF - Communications Biology
IS - 1
M1 - 1396
ER -