TY - JOUR
T1 - Assessing knowledge about medical physics in language-generative AI with large language model using the medical physicist exam
AU - Kadoya, Noriyuki
AU - Arai, Kazuhiro
AU - Tanaka, Shohei
AU - Kimura, Yuto
AU - Tozuka, Ryota
AU - Yasui, Keisuke
AU - Hayashi, Naoki
AU - Katsuta, Yoshiyuki
AU - Takahashi, Haruna
AU - Inoue, Koki
AU - Jingu, Keiichi
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Japanese Society of Radiological Technology and Japan Society of Medical Physics 2024.
PY - 2024/12
Y1 - 2024/12
AB - This study aimed to evaluate the performance of language-generative AI based on large language models in answering the Japanese medical physicist examination and to provide a benchmark of medical physics knowledge for such AI. We used questions from Japan’s 2018, 2019, 2020, 2021, and 2022 medical physicist board examinations, which covered various question types, including multiple-choice questions, and mainly focused on general medicine and medical physics. ChatGPT-3.5 and ChatGPT-4.0 (OpenAI) were used, and the AI-generated answers were compared with the correct answers. The average accuracy rates were 42.2 ± 2.5% (ChatGPT-3.5) and 72.7 ± 2.6% (ChatGPT-4), showing that ChatGPT-4 was more accurate than ChatGPT-3.5 in all categories (p < 0.05), except for radiation-related laws and recommendations/medical ethics. Even for the more accurate model, the accuracy rates remained below 60% in two categories: radiation metrology (55.6%) and radiation-related laws and recommendations/medical ethics (40.0%). These data provide a benchmark of ChatGPT’s knowledge of medical physics and can serve as basic data for developing various medical physics tools using ChatGPT (e.g., radiation therapy support tools with Japanese-language input).
KW - Artificial intelligence
KW - ChatGPT
KW - Examination
KW - Medical physicist
KW - Radiotherapy
UR - http://www.scopus.com/inward/record.url?scp=85203448346&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85203448346&partnerID=8YFLogxK
U2 - 10.1007/s12194-024-00838-2
DO - 10.1007/s12194-024-00838-2
M3 - Article
C2 - 39254919
AN - SCOPUS:85203448346
SN - 1865-0333
VL - 17
SP - 929
EP - 937
JO - Radiological Physics and Technology
JF - Radiological Physics and Technology
IS - 4
ER -