TY - JOUR
T1 - Interrater agreement and variability in visual reading of [18F] flutemetamol PET images
AU - BATON Study Group
AU - Takenaka, Akinori
AU - Nihashi, Takashi
AU - Sakurai, Keita
AU - Notomi, Keiji
AU - Ono, Hokuto
AU - Inui, Yoshitaka
AU - Ito, Shinji
AU - Arahata, Yutaka
AU - Takeda, Akinori
AU - Ishii, Kazunari
AU - Ishii, Kenji
AU - Ito, Kengo
AU - Toyama, Hiroshi
AU - Nakamura, Akinori
AU - Kato, Takashi
N1 - Publisher Copyright: © The Author(s) 2024.
PY - 2025/1
Y1 - 2025/1
N2 - Objective: The purpose of this study was to validate the concordance of visual ratings of [18F] flutemetamol amyloid positron emission tomography (PET) images and to investigate the correlation between the agreement of each rater and the Centiloid (CL) scale. Methods: A total of 192 participants, clinically classified as cognitively normal (CN) (n = 59), mild cognitive impairment (MCI) (n = 65), Alzheimer’s disease (AD) (n = 55), or non-AD dementia (n = 13), participated in this study. Three experts conducted visual ratings of the amyloid PET images for all 192 patients, assigning a confidence level to each rating on a three-point scale (certain, probable, or neither). The positive or negative determination of amyloid PET results was made by majority vote. The CL value was calculated using the CapAIBL pipeline. Results: Overall, 101 images were determined to be positive, and 91 images were negative. Of the 101 positive images, the three raters were in complete agreement for 92 images and in disagreement for 9 images. Of the 91 negative images, the three raters were in complete agreement for 75 images and in disagreement for 16 images. Interrater reliability among the three experts was particularly high, with both Fleiss’ kappa and Conger’s kappa measuring 0.83 (0.76–0.89). The CL values of the unanimous positive group were significantly greater than those of the other groups, whereas the CL values of the unanimous negative group were significantly lower than those of the other groups. Images with rater disagreement had intermediate CLs. In cases with a high confidence level, the positive or negative visual ratings were in almost complete agreement. However, as confidence levels decreased, experts’ visual ratings became more variable. The lower the confidence level was, the greater the number of cases with disagreement in the visual ratings. Conclusion: Three experts independently rated 192 amyloid PET images, achieving a high level of interrater agreement. However, in patients with intermediate amyloid accumulation, visual ratings varied. Therefore, determining positive and negative decisions in these patients should be performed with caution.
KW - Amyloid
KW - Centiloid scale
KW - Positron emission tomography
KW - Visual rating
UR - http://www.scopus.com/inward/record.url?scp=85204770100&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85204770100&partnerID=8YFLogxK
U2 - 10.1007/s12149-024-01977-7
DO - 10.1007/s12149-024-01977-7
M3 - Article
C2 - 39316332
AN - SCOPUS:85204770100
SN - 0914-7187
VL - 39
SP - 68
EP - 76
JO - Annals of Nuclear Medicine
JF - Annals of Nuclear Medicine
IS - 1
M1 - 103949
ER -