Background: Machine learning (ML) has emerged as a promising tool for risk stratification. However, few studies have applied ML to risk assessment in patients with atrial fibrillation (AF).

Hypothesis: We aimed to compare the performance of random forest (RF), logistic regression (LR), and conventional risk scores in predicting outcomes in AF.

Methods: We analyzed data from 7406 patients with nonvalvular AF (median age 71 years; 29.2% female) who were enrolled in a nationwide AF registry (the J-RHYTHM Registry) and followed for 2 years. The endpoints were thromboembolism, major bleeding, and all-cause mortality. Prediction models were developed from candidate predictors using RF and stepwise LR, and were compared with existing thromboembolism scores (CHADS2 and CHA2DS2-VASc) and major bleeding scores (HAS-BLED, ORBIT, and ATRIA).

Results: For thromboembolism, the C-statistic of the RF model was significantly higher than that of the LR model (0.66 vs. 0.59, p = .03) and the CHA2DS2-VASc score (0.61, p < .01). For major bleeding, the C-statistic of the RF model was comparable to that of the LR model (0.69 vs. 0.66, p = .07) and higher than those of the HAS-BLED (0.61, p < .01) and ATRIA (0.62, p < .01) scores, but not significantly different from that of the ORBIT score (0.67, p = .07). For all-cause mortality, the C-statistic of the RF model was comparable to that of the LR model (0.78 vs. 0.79, p = .21). For major bleeding and all-cause mortality, the calibration plots of the RF model were more closely aligned with the observed event rates.

Conclusions: The RF model performed as well as or better than the LR model and existing clinical risk scores in predicting clinical outcomes in AF.
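The two performance measures reported above, discrimination (C-statistic) and calibration, can be illustrated with a minimal Python sketch that computes both from predicted risks and observed outcomes. This is not the authors' analysis code. It assumes each endpoint is treated as a binary outcome over the 2-year follow-up (the study may instead have used a time-to-event concordance index), and it does not reproduce the statistical tests used to compare C-statistics between models. The function names `c_statistic` and `calibration_table` are hypothetical.

```python
from typing import List, Sequence, Tuple


def c_statistic(risk: Sequence[float], event: Sequence[int]) -> float:
    """Concordance (C) statistic for a binary outcome.

    Assumption: the endpoint is binary (1 = event during follow-up, 0 = none).
    For every (event, non-event) pair, score 1 if the patient with the event
    has the higher predicted risk, 0.5 for a tie, 0 otherwise, and divide by
    the number of pairs. For a binary outcome this equals the ROC AUC.
    """
    cases = [r for r, e in zip(risk, event) if e == 1]
    controls = [r for r, e in zip(risk, event) if e == 0]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for rc in cases:
        for rn in controls:
            if rc > rn:
                concordant += 1.0
            elif rc == rn:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def calibration_table(
    risk: Sequence[float], event: Sequence[int], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Mean predicted risk vs. observed event rate in equal-size risk groups.

    Patients are sorted by predicted risk and split into n_bins groups; a
    well-calibrated model has mean predicted risk close to the observed rate
    in each group (the points a calibration plot draws).
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    pairs = sorted(zip(risk, event))
    n = len(pairs)
    table: List[Tuple[float, float, int]] = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # more bins than patients: skip empty groups
        mean_pred = sum(r for r, _ in chunk) / len(chunk)
        observed = sum(e for _, e in chunk) / len(chunk)
        table.append((mean_pred, observed, len(chunk)))
    return table
```

For example, with hypothetical predicted risks `[0.1, 0.4, 0.35, 0.8]` and outcomes `[0, 0, 1, 1]`, three of the four event/non-event pairs are ranked correctly, so `c_statistic` returns 0.75. The pairwise loop costs O(events × non-events), which keeps the definition easy to check but is slower than rank-based AUC methods on a large cohort.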
All Science Journal Classification (ASJC) codes
- Cardiology and Cardiovascular Medicine