Objective To propose a novel network architecture, VUMix-Net, which combines a three-dimensional (3D) V-Net and a two-dimensional (2D) U-Net, for automatically delineating the clinical target volume (CTV) of esophageal cancer, and to evaluate its segmentation performance and application value.

Methods CT image data from 55 patients with upper esophageal cancer were collected, and their CTVs were uniformly delineated by a physician with 10 years of clinical experience in radiotherapy. Both 3D V-Net and VUMix-Net were applied to segment the CTVs. The Dice similarity coefficient (DSC) and the 95th percentile Hausdorff distance (95HD) were used as quantitative indicators to evaluate the automatic segmentation performance of the two models. CT image data of 10 randomly selected patients who had received radiotherapy for esophageal cancer and whose CTVs had been delineated manually were then processed with the VUMix-Net model to generate the corresponding automatic delineations. These results were clinically evaluated by 2 chief physicians under blinded conditions. The differences in scores between the artificial intelligence (AI) group and the ground truth (GT) group were compared, and the agreement between the two physicians' evaluations was analyzed.

Results In the automatic delineation of the CTV, the 3D-DSC and 95HD values of VUMix-Net were significantly better than those of V-Net (P = 0.006 and P < 0.001, respectively). The evaluations of the two physicians showed that the CTVs delineated by the VUMix-Net model met the requirements for clinical application. The differences between the AI and GT groups were not statistically significant for either physician A or physician B (P = 0.222 and P = 0.361, respectively), and there was no significant difference between physician A and physician B for either the AI group or the GT group (P = 0.638 and P = 0.761, respectively).

Conclusion The new convolutional neural network model, VUMix-Net, showed advantages over 3D V-Net in automatic CTV delineation and could automatically delineate esophageal cancer CTVs that met clinical requirements and were comparable in quality to manual delineation.
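The two quantitative indicators named in the Methods, DSC and 95HD, can be illustrated with the minimal sketch below. It is not the authors' evaluation code: the abstract does not state how surfaces were extracted, how voxel spacing was handled, or which 95HD convention was used. Here we assume masks given as sets of integer voxel coordinates, surface voxels defined by 6-connectivity, and 95HD taken as the 95th percentile of the pooled distances in both directions (some tools instead take the maximum of the two directed 95th percentiles). The `cube` helper and the toy masks are hypothetical test data.

```python
import math
from typing import List, Sequence, Set, Tuple

Voxel = Tuple[int, int, int]

# 6-connected neighbourhood used to decide whether a voxel lies on the surface.
_NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dice(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Volumetric DSC = 2|A ∩ B| / (|A| + |B|); returns 1.0 if both masks are empty."""
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def surface(mask: Set[Voxel]) -> Set[Voxel]:
    """Mask voxels that have at least one 6-neighbour outside the mask."""
    return {
        (x, y, z)
        for (x, y, z) in mask
        if any((x + dx, y + dy, z + dz) not in mask for dx, dy, dz in _NEIGHBOURS)
    }


def _percentile(values: List[float], q: float) -> float:
    """Percentile with linear interpolation between the closest ranks."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def hd95(a: Set[Voxel], b: Set[Voxel], spacing: Sequence[float] = (1.0, 1.0, 1.0)) -> float:
    """95th percentile of pooled symmetric surface distances, in spacing units (e.g. mm)."""
    if not a or not b:
        raise ValueError("HD95 is undefined for an empty mask")
    sa, sb = surface(a), surface(b)

    def dist(p: Voxel, r: Voxel) -> float:
        # Euclidean distance scaled by the physical voxel size along each axis.
        return math.sqrt(sum(((pi - ri) * s) ** 2 for pi, ri, s in zip(p, r, spacing)))

    # Brute force O(|Sa| * |Sb|): fine for a sketch, too slow for full CT volumes.
    d_ab = [min(dist(p, r) for r in sb) for p in sa]
    d_ba = [min(dist(r, p) for p in sa) for r in sb]
    return _percentile(d_ab + d_ba, 95.0)


def cube(x0: int, size: int) -> Set[Voxel]:
    """Hypothetical toy mask: a size^3 cube starting at (x0, 0, 0)."""
    return {(x, y, z) for x in range(x0, x0 + size) for y in range(size) for z in range(size)}


if __name__ == "__main__":
    gt, ai = cube(0, 4), cube(1, 4)  # "AI" contour shifted by one voxel along x
    print(dice(gt, ai))                           # 0.75
    print(hd95(gt, ai))                           # 1.0
    print(hd95(gt, ai, spacing=(2.5, 1.0, 1.0)))  # 2.5
```

The toy example shows why the two indicators are complementary: DSC rewards volumetric overlap, while 95HD reports boundary disagreement in physical units, so a larger slice thickness along one axis raises 95HD even when the overlap is unchanged. For real CT volumes, a distance-transform-based implementation would replace the brute-force search.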