Construction and Validation of a Differential Diagnosis Model for Mycobacterium avium-intracellulare Complex Pulmonary Disease and Primary Pulmonary Tuberculosis Based on CT Features and Machine Learning
Purpose: To construct and validate a machine learning-based diagnostic model for distinguishing Mycobacterium avium-intracellulare complex pulmonary disease (MAC-PD) from pulmonary tuberculosis (PTB) using chest CT images. Materials and Methods: Data from patients diagnosed with MAC-PD or PTB at Beijing Chest Hospital, Capital Medical University, between May 2021 and August 2022 were collected retrospectively as the training set. The prospective external validation set comprised patients at the First Affiliated Hospital of Henan University of Chinese Medicine between September 2022 and May 2023. Clinical and radiological data were analyzed, and multivariable logistic regression, random forest, and support vector machine (SVM) models were established and then externally validated on the validation set. The diagnostic performance of the models was evaluated using receiver operating characteristic (ROC) and precision-recall curves, and differences between the models' areas under the curve were compared using the DeLong test. Results: Age and hemoptysis rate differed significantly between the two groups (t=30.414, P<0.001; χ²=6.186, P=0.013). Cavity type and morphology also differed significantly (χ²=6.546, P=0.011; χ²=24.113, P<0.001), whereas the distribution and characteristics of cavitary lesions did not (P>0.05). The type and distribution of bronchiectasis differed significantly between the two groups (χ²=4.634, P=0.031; χ²=23.145, P<0.001). The SVM model showed better differential diagnostic performance than the logistic regression and random forest models: its area under the ROC curve, sensitivity, specificity, accuracy, positive predictive value, and negative predictive value were 0.960 (95% CI 0.935-0.985), 85.7%, 93.6%, 90.5%, 93.3%, and 88.0% in the training set and 0.885 (95% CI 0.803-0.967), 76.7%, 80.0%, 78.3%, 79.3%, and 77.4% in the external validation set, respectively. The precision-recall curve showed that the SVM model had high precision and low recall; that is, the model performed well. Conclusion: The machine learning-based models exhibit excellent diagnostic performance and can assist in differentiating MAC-PD from PTB.
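For readers unfamiliar with the reported measures, the sketch below shows how sensitivity, specificity, accuracy, positive and negative predictive value, and the area under the ROC curve can be computed from a classifier's output. It is an illustration only and not the authors' code: the function names `diagnostic_metrics` and `roc_auc`, the 0.5 decision threshold, the coding of MAC-PD as the positive class (1), and the toy labels and scores are all assumptions made for this example.

```python
import math
from typing import Dict, Sequence


def _ratio(num: float, den: float) -> float:
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else math.nan


def diagnostic_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "sensitivity": _ratio(tp, tp + fn),  # also called recall
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, len(pairs)),
        "ppv": _ratio(tp, tp + fp),          # also called precision
        "npv": _ratio(tn, tn + fn),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


# Toy data (hypothetical): 1 = MAC-PD, 0 = PTB; scores are model probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(diagnostic_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

The pairwise definition of the AUC used here is equivalent to the area under the empirical ROC curve; it is quadratic in the sample size, which is acceptable for cohorts of a few hundred patients but would normally be replaced by a rank-based computation for larger data sets.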