To address the significant uncertainty that manual observation of microstructure images introduces into the quality inspection of carburized gears, an STN-Mask-RCNN model based on Mask-RCNN is proposed to segment retained austenite and martensite in carburized gears. The backbone feature extraction network of Mask-RCNN is replaced with Swin Transformer, an NAS-FPN module that combines FPN with neural architecture search is introduced, and the CBAM attention mechanism is added to the mask segmentation branch. The model is then compared with the DeepLabV3+ and U-Net models, and ablation experiments are performed to analyze how each module affects network performance. The experiments show that the proposed model has strong segmentation capability for retained austenite and martensite in carburized gears, with a mean pixel accuracy of 90.64%. Its overall performance is significantly better than that of the other model structures, and each module contributes to the improvement in model performance to varying degrees.
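The abstract does not define how mean pixel accuracy is computed. The sketch below assumes the common definition for semantic segmentation: the fraction of correctly labeled pixels is computed per ground-truth class and then averaged over the classes present. The function name `mean_pixel_accuracy`, the class indices (0 = background, 1 = retained austenite, 2 = martensite), and the toy label maps are illustrative assumptions, not taken from the paper.

```python
def mean_pixel_accuracy(truth, pred, num_classes):
    """Average of per-class pixel accuracies over classes present in `truth`.

    `truth` and `pred` are 2D lists of integer class labels of equal shape.
    """
    correct = [0] * num_classes  # correctly predicted pixels per true class
    total = [0] * num_classes    # ground-truth pixels per class
    for truth_row, pred_row in zip(truth, pred):
        for t, p in zip(truth_row, pred_row):
            total[t] += 1
            if t == p:
                correct[t] += 1
    # Skip classes that do not occur in the ground truth to avoid 0/0.
    per_class = [correct[c] / total[c] for c in range(num_classes) if total[c] > 0]
    if not per_class:
        raise ValueError("ground truth contains no pixels")
    return sum(per_class) / len(per_class)


# Hypothetical 3x4 label maps (0 = background, 1 = retained austenite, 2 = martensite).
truth = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [2, 2, 2, 1],
]
pred = [
    [0, 0, 1, 2],
    [0, 2, 2, 1],
    [2, 1, 2, 1],
]

mpa = mean_pixel_accuracy(truth, pred, num_classes=3)
print(f"mean pixel accuracy: {mpa:.4f}")
```

In this toy case the per-class accuracies are 3/3, 3/4, and 4/5, so the mean is 0.85. Because each class is weighted equally, a sparse phase such as retained austenite influences the score as much as the dominant phase, which is not the case for overall pixel accuracy.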