Evaluation of the performance of generative artificial intelligence in generating radiology reports
Objective To evaluate the performance of two generative artificial intelligence (AI) models in generating abdominal radiology reports, and to compare it with the performance of radiologists.

Methods The radiology reports of 300 patients who underwent abdominal CT and MRI at the Third Affiliated Hospital of Sun Yat-sen University from June 2023 to May 2024 were retrospectively studied. Two generative AI models, ERNIE 4.0 and Claude 3.5 Sonnet, were used to regenerate the radiology reports of these 300 patients. Five radiologists rated the comprehensiveness, accuracy, expressiveness, hallucinations, and acceptance without revision of the impressions on a five-point Likert scale. The Friedman test and the Nemenyi test were used to compare performance among the two models and the radiologists.

Results CT and MRI reports from 300 patients were evaluated. For comprehensiveness, Claude 3.5 Sonnet was on a par with radiologists, and both were superior to ERNIE 4.0 (scores of 4.86 ± 0.37 vs. 4.76 ± 0.46 vs. 4.40 ± 0.64; comparison between the first two, P = 0.200; comparison between each of the first two and the third, both P < 0.01). For accuracy, radiologists outperformed both ERNIE 4.0 and Claude 3.5 Sonnet (scores of 4.96 ± 0.22 vs. 4.66 ± 0.57 vs. 4.69 ± 0.57; comparison between the first and each of the latter two, both P < 0.01). For acceptance without revision, Claude 3.5 Sonnet was on a par with radiologists, and both were superior to ERNIE 4.0 (scores of 4.64 ± 0.53 vs. 4.69 ± 0.54 vs. 4.30 ± 0.59; comparison between the first two, P = 0.595; comparison between each of the first two and the third, both P < 0.01). Expressiveness and hallucination scores did not differ significantly among the three (all P > 0.05).

Conclusions Claude 3.5 Sonnet performs comparably to radiologists in generating radiology reports, indicating that advanced generative AI has the potential to assist radiologists, improve work efficiency, and reduce cognitive burden.
Keywords: Generative artificial intelligence; Natural language processing; Radiology report; Abdomen
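
As an illustration of the Friedman test named in the Methods, the following is a minimal sketch of how three paired sets of Likert ratings (for example radiologist, Claude 3.5 Sonnet, and ERNIE 4.0 scores for the same cases) could be compared. The abstract does not state which software or tie handling the authors used, so the function names, the tie correction, and the toy scores below are all assumptions made for illustration, not the study's method or data. The Nemenyi post hoc step is omitted because it requires the studentized range distribution.

```python
import math
from collections import Counter


def average_ranks(row):
    """Rank the values in one row (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def friedman_three_groups(scores):
    """Friedman test for exactly three paired groups.

    scores: list of rows, one row per case, each row holding three ratings.
    Returns (statistic, p_value) using the chi-square approximation.
    """
    k = 3
    n = len(scores)
    if n == 0 or any(len(row) != k for row in scores):
        raise ValueError("expected a non-empty list of rows with three ratings each")

    ranked = [average_ranks(row) for row in scores]
    rank_sums = [sum(r[j] for r in ranked) for j in range(k)]

    # Tie correction: sum of (t^3 - t) over every group of tied values in a row.
    ties = sum(t ** 3 - t for row in scores for t in Counter(row).values())
    correction = 1 - ties / (n * k * (k * k - 1))
    if correction == 0:
        # Every row is fully tied, so there is no difference to test.
        return 0.0, 1.0

    statistic = (
        12 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3 * n * (k + 1)
    ) / correction
    # With three groups there are 2 degrees of freedom, and the chi-square
    # survival function for df = 2 is exactly exp(-x / 2).
    p_value = math.exp(-statistic / 2)
    return statistic, p_value


# Hypothetical ratings (NOT data from the study): six cases, three raters each.
toy_scores = [
    [5, 5, 4],
    [5, 4, 4],
    [4, 5, 3],
    [5, 5, 4],
    [5, 4, 5],
    [4, 4, 3],
]
statistic, p_value = friedman_three_groups(toy_scores)
print(f"Friedman statistic = {statistic:.3f}, p = {p_value:.4f}")
```

A significant Friedman result only indicates that at least one group differs; a pairwise post hoc procedure such as the Nemenyi test is then needed to identify which pairs differ, as in the comparisons reported in the Results.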