TY - JOUR
T1 - Classification of glomerular pathological findings using deep learning and nephrologist–AI collective intelligence approach
AU - Uchino, Eiichiro
AU - Suzuki, Kanata
AU - Sato, Noriaki
AU - Kojima, Ryosuke
AU - Tamada, Yoshinori
AU - Hiragi, Shusuke
AU - Yokoi, Hideki
AU - Yugami, Nobuhiro
AU - Minamiguchi, Sachiko
AU - Haga, Hironori
AU - Yanagita, Motoko
AU - Okuno, Yasushi
N1 - Publisher Copyright: © 2020
PY - 2020/9
Y1 - 2020/9
N2 - Background: Automated classification of glomerular pathological findings is potentially beneficial in establishing an efficient and objective diagnosis in renal pathology. While previous studies have verified the artificial intelligence (AI) models for the classification of global sclerosis and glomerular cell proliferation, there are several other glomerular pathological findings required for diagnosis, and the comprehensive models for the classification of these major findings have not yet been reported. Whether the cooperation between these AI models and clinicians improves diagnostic performance also remains unknown. Here, we developed AI models to classify glomerular images for major findings required for pathological diagnosis and investigated whether those models could improve the diagnostic performance of nephrologists. Methods: We used a dataset of 283 kidney biopsy cases comprising 15,888 glomerular images that were annotated by a total of 25 nephrologists. AI models to classify seven pathological findings: global sclerosis, segmental sclerosis, endocapillary proliferation, mesangial matrix accumulation, mesangial cell proliferation, crescent, and basement membrane structural changes, were constructed using deep learning by fine-tuning of InceptionV3 convolutional neural network. Subsequently, we compared the agreement to truth labels between majority decision among nephrologists with or without the AI model as a voter. Results: Our model for global sclerosis showed high performance (area under the curve: periodic acid-Schiff, 0.986; periodic acid methenamine silver, 0.983); the models for the other findings also showed performance close to those of nephrologists. By adding the AI model output to majority decision among nephrologists, out of the 14 constructed models, the results of the majority decision showed improvement in sensitivity for 10 models (four of them were statistically significant) and specificity for eight models (five significant). Conclusion: Our study showed a proof-of-concept for the classification of multiple glomerular findings in a comprehensive method of deep learning and suggested its potential effectiveness in improving diagnostic accuracy of clinicians.
KW - Artificial intelligence
KW - Collective intelligence
KW - Deep learning
KW - Renal pathology
UR - http://www.scopus.com/inward/record.url?scp=85087958098&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85087958098&partnerID=8YFLogxK
U2 - 10.1016/j.ijmedinf.2020.104231
DO - 10.1016/j.ijmedinf.2020.104231
M3 - Article
C2 - 32682317
AN - SCOPUS:85087958098
SN - 1386-5056
VL - 141
JO - International Journal of Medical Informatics
JF - International Journal of Medical Informatics
M1 - 104231
ER -