The Automatic BCLC Staging Model for Hepatocellular Carcinoma
Objective To develop an automated Barcelona Clinic Liver Cancer (BCLC) staging system for hepatocellular carcinoma (HCC) based on a big data platform.

Methods The clinical data of HCC patients admitted to Mengchao Hepatobiliary Hospital of Fujian Medical University from January 2020 to December 2022 were collected. A standardized full-dimension dataset (700 dimensions per case) was constructed with an ETL (extract-transform-load) tool, and a total of 1 076 of these patients were selected. According to the 2016 BCLC staging standard, 12 related dimensions, including hepatic encephalopathy, ascites, total bilirubin, albumin, prothrombin time, tumor number, tumor diameter, portal vein tumor thrombus, extrahepatic metastasis and patient performance status, were extracted from the dataset. Techniques such as machine-learning-based natural language processing and the Python XGBoost (eXtreme Gradient Boosting) module were used to construct the automated BCLC staging model. A total of 191 HCC patients from January 2020 to December 2022 were randomly selected for testing on previous cases, and a total of 180 HCC patients from January 2020 to December 2022 were selected for testing on new cases. Two attending hepatobiliary surgeons manually reviewed the staging of the test cases to obtain a standard staging, which was used for correction. The accuracy and practicability of the model were evaluated, and the differences among automated staging, case record staging and standard staging were compared.

Results The automated BCLC staging model for HCC was successfully constructed with the big data methodology. The accuracy of the model was 93.33% in the validation set of 150 cases, indicating that the model was successfully established. In the test on previous cases, the accuracy of automated staging after correction against standard staging was 98.43%, with 3 cases staged incorrectly (1 case of stage 0 and 2 cases of stage A); the accuracy of case record staging was 96.33%, with 7 cases staged incorrectly (2 cases of stage 0 and 5 cases of stage A). In the test on new cases, the accuracy of automated staging after correction against standard staging was 95.56%, with 8 cases staged incorrectly (1 case of stage 0, 1 case of stage A, 4 cases of stage B, 2 cases of stage C and 0 cases of stage D); the accuracy of case record staging was 96.11%, with 7 cases staged incorrectly (2 cases of stage 0, 1 case of stage A, 2 cases of stage B, 2 cases of stage C and 0 cases of stage D).

Conclusion The automated BCLC staging system for HCC is efficient and accurate and is worthy of clinical promotion, although there is still room for improvement in data standardization.
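The test accuracies reported above can be cross-checked from the test-set sizes (191 previous cases, 180 new cases) and the error counts. The following is a minimal sketch, assuming accuracy is simply the share of correctly staged cases in each test set; the function name `accuracy_pct` is hypothetical and not taken from the study. Under that assumption, all four reported figures are reproduced to within 0.01 percentage point.

```python
def accuracy_pct(n_cases: int, n_wrong: int) -> float:
    """Percentage of correctly staged cases (assumed definition of accuracy)."""
    if n_cases <= 0 or not 0 <= n_wrong <= n_cases:
        raise ValueError("need n_cases > 0 and 0 <= n_wrong <= n_cases")
    return 100.0 * (n_cases - n_wrong) / n_cases


# (label, test-set size, wrongly staged cases, accuracy stated in the abstract)
reported = [
    ("previous cases, 3 errors", 191, 3, 98.43),
    ("previous cases, 7 errors", 191, 7, 96.33),
    ("new cases, 8 errors", 180, 8, 95.56),
    ("new cases, 7 errors", 180, 7, 96.11),
]

for label, n_cases, n_wrong, stated in reported:
    computed = accuracy_pct(n_cases, n_wrong)
    print(f"{label}: computed {computed:.4f}%, reported {stated:.2f}%")
```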
hepatocellular carcinoma; BCLC staging; big data; ETL tools; machine learning; natural language processing; XGBoost
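To illustrate how the extracted dimensions relate to a BCLC stage, the sketch below derives a Child-Pugh class from hepatic encephalopathy, ascites, total bilirubin, albumin and prothrombin time, and then applies a simplified rule set to tumor burden, vascular invasion, metastasis and performance status. This is a hand-written rule-based illustration, not the study's NLP and XGBoost model. The field names, units and cut-offs (commonly cited textbook Child-Pugh and BCLC thresholds) are assumptions and may differ from the 2016 standard as applied by the authors.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """Hypothetical structured record; field names and units are assumptions."""
    encephalopathy_grade: int      # 0 = none, 1-4 = clinical grade
    ascites: str                   # "none", "mild" or "moderate_severe"
    bilirubin_umol_l: float        # total bilirubin, umol/L
    albumin_g_l: float             # albumin, g/L
    pt_prolongation_s: float       # prothrombin time prolongation, seconds
    tumor_count: int
    max_diameter_cm: float         # diameter of the largest nodule
    portal_thrombus: bool
    extrahepatic_metastasis: bool
    performance_status: int        # ECOG 0-4


def child_pugh_score(c: Case) -> int:
    """Sum of five items, each scored 1-3 (commonly cited cut-offs)."""
    enceph = 1 if c.encephalopathy_grade == 0 else 2 if c.encephalopathy_grade <= 2 else 3
    ascites = {"none": 1, "mild": 2, "moderate_severe": 3}[c.ascites]
    bili = 1 if c.bilirubin_umol_l < 34 else 2 if c.bilirubin_umol_l <= 51 else 3
    alb = 1 if c.albumin_g_l > 35 else 2 if c.albumin_g_l >= 28 else 3
    pt = 1 if c.pt_prolongation_s < 4 else 2 if c.pt_prolongation_s <= 6 else 3
    return enceph + ascites + bili + alb + pt


def child_pugh_class(score: int) -> str:
    """Class A: 5-6 points, B: 7-9 points, C: 10-15 points."""
    return "A" if score <= 6 else "B" if score <= 9 else "C"


def bclc_stage(c: Case) -> str:
    """Simplified BCLC rules, checked from most to least advanced stage."""
    cp = child_pugh_class(child_pugh_score(c))
    if c.performance_status > 2 or cp == "C":
        return "D"
    if c.portal_thrombus or c.extrahepatic_metastasis or c.performance_status >= 1:
        return "C"
    if c.tumor_count == 1 and c.max_diameter_cm <= 2 and cp == "A":
        return "0"
    if c.tumor_count == 1 or (c.tumor_count <= 3 and c.max_diameter_cm <= 3):
        return "A"
    return "B"


example = Case(0, "none", 15.0, 40.0, 1.0, 3, 2.5, False, False, 0)
print(child_pugh_class(child_pugh_score(example)), bclc_stage(example))
```

A rule set like this is transparent but depends on every input already being structured; free-text findings such as ascites or portal vein tumor thrombus would first have to be extracted from the record, which is the role the abstract assigns to natural language processing.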