TY - JOUR
T1 - Comparison among random forest, logistic regression, and existing clinical risk scores for predicting outcomes in patients with atrial fibrillation
T2 - A report from the J-RHYTHM registry
AU - Watanabe, Eiichi
AU - Noyama, Shunsuke
AU - Kiyono, Ken
AU - Inoue, Hiroshi
AU - Atarashi, Hirotsugu
AU - Okumura, Ken
AU - Yamashita, Takeshi
AU - Lip, Gregory Y.H.
AU - Kodani, Eitaro
AU - Origasa, Hideki
N1 - Publisher Copyright:
© 2021 The Authors. Clinical Cardiology published by Wiley Periodicals LLC.
PY - 2021/9
Y1 - 2021/9
AB - Background: Machine learning (ML) has emerged as a promising tool for risk stratification. However, few studies have applied ML to risk assessment of patients with atrial fibrillation (AF). Hypothesis: We aimed to compare the performance of random forest (RF), logistic regression (LR), and conventional risk schemes in predicting the outcomes of AF. Methods: We analyzed data from 7406 nonvalvular AF patients (median age 71 years, female 29.2%) who were enrolled in a nationwide AF registry (J-RHYTHM Registry) and followed for 2 years. The endpoints were thromboembolism, major bleeding, and all-cause mortality. Models were generated from potential predictors using an RF model, a stepwise LR model, and the thromboembolism (CHADS2 and CHA2DS2-VASc) and major bleeding (HAS-BLED, ORBIT, and ATRIA) scores. Results: For thromboembolism, the C-statistic of the RF model was significantly higher than that of the LR model (0.66 vs. 0.59, p = .03) or the CHA2DS2-VASc score (0.61, p < .01). For major bleeding, the C-statistic of the RF model was comparable to that of the LR model (0.69 vs. 0.66, p = .07) and higher than those of the HAS-BLED (0.61, p < .01) and ATRIA (0.62, p < .01) scores, but not the ORBIT score (0.67, p = .07). The C-statistic of the RF model for all-cause mortality was comparable to that of the LR model (0.78 vs. 0.79, p = .21). The calibration plot for the RF model was more closely aligned with the observed events for major bleeding and all-cause mortality. Conclusions: The RF model performed as well as or better than the LR model or existing clinical risk scores for predicting clinical outcomes of AF.
UR - http://www.scopus.com/inward/record.url?scp=85111398093&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85111398093&partnerID=8YFLogxK
U2 - 10.1002/clc.23688
DO - 10.1002/clc.23688
M3 - Article
C2 - 34318510
AN - SCOPUS:85111398093
SN - 0160-9289
VL - 44
SP - 1305
EP - 1315
JO - Clinical Cardiology
JF - Clinical Cardiology
IS - 9
ER -