Machine learning study of characteristic variables and predictive models for patients with H. pylori infection
Objective To analyze the risk factors for H. pylori infection, establish a prediction model for patients with H. pylori infection, and provide a reference for the prevention and treatment of H. pylori infection.
Methods A total of 1,477 patients tested for H. pylori at Zhongshan Hospital of Traditional Chinese Medicine, Dongfeng People's Hospital of Zhongshan and Zhongshan South District Hospital from July 2021 to May 2022 were selected as study subjects. Based on the results of gastroscopy and the 14C and 13C breath tests, the tested population was divided into infected and non-infected groups. A questionnaire covering a total of 63 variables was administered, including basic conditions, clinical features, chronic underlying diseases, and lifestyle and dietary habits. After univariate analysis, multivariate analysis of H. pylori infection was performed with three machine learning models: Logistic regression, decision tree analysis, and Logistic regression with added interaction terms. The area under the ROC curve (AUC), sensitivity and specificity of the 3 models were compared to verify their accuracy, a prediction model of H. pylori infection was established, and the characteristic and risk factors were presented as a forest plot.
Results The AUC of Logistic regression was 0.7361, with a sensitivity of 0.7615 and a specificity of 0.6034. The AUC of the decision tree was 0.6528, with a sensitivity of 0.6801 and a specificity of 0.5773. The AUC of Logistic regression with the added interaction term was 0.7388, with a sensitivity of 0.7588 and a specificity of 0.6034. The multivariate Logistic regression with added interaction terms identified the following as characteristic factors of H. pylori infection: stomach bloating; bad breath and halitosis; cooking lunch at home; not using communal chopsticks at home but habitually using them when eating out; having infected family members living in the same household; using communal chopsticks only after the epidemic; living on floors 4 to 10; and having stomach bloating together with bad breath and halitosis.
Conclusion Stomach bloating, bad breath and halitosis, stomach bloating combined with bad breath and halitosis, cooking lunch at home, the floor lived on, whether communal chopsticks are used at home, whether the patient habitually uses communal chopsticks, and whether family members are infected with H. pylori are the characteristic factors of H. pylori infection. The Logistic regression model was used as the main model for variable screening. The AUC of the model improved after the interaction term was added, and the prediction model with the interaction term has good predictive ability for patients with H. pylori infection. It is easy to calculate, economical and convenient to use, and suitable for wider regional application.
Helicobacter pylori; Binary Logistic regression model; Decision tree; Forest plot; Interaction term
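The abstract compares the three models by AUC, sensitivity and specificity, and adds an interaction term for symptoms that occur together. The following is a minimal sketch of how these quantities can be computed, not the study's code. The function names, the toy labels and scores, and the 0.5 classification threshold are all assumptions made for illustration; none of them come from the study's data.

```python
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen infected case scores higher than a randomly chosen non-infected
    case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity (true positive rate) and specificity (true negative rate)
    when scores at or above `threshold` are classified as infected.
    The 0.5 default is an assumption, not a value reported in the abstract."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


def add_interaction(row: Sequence[int], i: int, j: int) -> List[int]:
    """Append the product of two binary indicators as an extra feature,
    for example bloating (column i) x halitosis (column j)."""
    return list(row) + [row[i] * row[j]]


# Toy data (hypothetical): 1 = infected, 0 = non-infected.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

print("AUC:", auc(labels, scores))
print("sensitivity, specificity:", sensitivity_specificity(labels, scores))
print("row with interaction:", add_interaction([1, 1, 0], 0, 1))
```

The interaction column equals 1 only when both indicators are present, which is how a Logistic regression can assign a separate coefficient to the combination of two symptoms in addition to their individual effects.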