TY - JOUR
T1 - Dataset dependency of low-density lipoprotein-cholesterol estimation by machine learning
AU - Ishida, Hidekazu
AU - Nagasawa, Hiroki
AU - Yamamoto, Yasuko
AU - Doi, Hiroki
AU - Saito, Midori
AU - Ishihara, Yuya
AU - Fujita, Takashi
AU - Ishida, Mariko
AU - Kato, Yohei
AU - Kikuchi, Ryosuke
AU - Matsunami, Hidetoshi
AU - Takemura, Masao
AU - Ito, Hiroyasu
AU - Saito, Kuniaki
N1 - Publisher Copyright: © The Author(s) 2023.
PY - 2023/11
Y1 - 2023/11
AB - Objectives: We evaluated the applicability of a machine learning–based low-density lipoprotein-cholesterol (LDL-C) estimation method and the influence of the characteristics of the training datasets. Methods: Three training datasets were used: health check-up participants at the Resource Center for Health Science (N = 2664), clinical patients at Gifu University Hospital (N = 7409), and clinical patients at Fujita Health University Hospital (N = 14,842). Nine different machine learning models were constructed through hyperparameter tuning and 10-fold cross-validation. A separate dataset of 3711 clinical patients at Fujita Health University Hospital was used as the test set to compare and validate the models against the Friedewald formula and the Martin method. Results: The models trained on the health check-up dataset produced coefficients of determination that were equal to or lower than those of the Martin method. In contrast, the coefficients of determination of several models trained on clinical patients exceeded those of the Martin method. The means of the differences and the convergences to the direct method were higher for the models trained on the clinical patients' datasets than for those trained on the health check-up participants' dataset. The models trained on the latter dataset tended to overestimate the LDL-C classification under the 2019 ESC/EAS Guideline. Conclusion: Although machine learning models provide a valuable method for LDL-C estimation, they should be trained on datasets with matched characteristics. The versatility of machine learning methods is another important consideration.
KW - Friedewald formula
KW - Low-density lipoprotein-cholesterol
KW - Martin method
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85163048095&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85163048095&partnerID=8YFLogxK
U2 - 10.1177/00045632231180408
DO - 10.1177/00045632231180408
M3 - Article
C2 - 37218090
AN - SCOPUS:85163048095
SN - 0004-5632
VL - 60
SP - 396
EP - 405
JO - Annals of Clinical Biochemistry
JF - Annals of Clinical Biochemistry
IS - 6
ER -