Occlusive clothing image segmentation based on context extraction and attention fusion
Objective Visual analysis of clothing attracts considerable attention, but conventional clothing parsing methods fail to capture rich information about clothing details due to factors such as complex backgrounds and mutual occlusion between garments. Therefore, a novel clothing image instance segmentation method is proposed to effectively extract and segment multi-pose and mutually occluded target clothing in complex scenes, supporting subsequent clothing analysis, retrieval, and related tasks, and better meeting the targeted needs of personalized clothing design, retrieval, and matching.

Method First, the output features of ResNet were optimized with a context extraction module (CEM) to enhance the recognition and extraction of feature representations of occluded clothing. Then, an attention mechanism with residual connections was introduced to adaptively capture the semantic inter-dependencies of occluded clothing images in the spatial and channel dimensions. Finally, the CIoU metric was adopted as the criterion for non-maximum suppression; by accounting for both the overlapping and non-overlapping regions of the predicted box and the ground-truth box, it selects the optimal target box that covers the occluded clothing to the fullest extent.

Results In qualitative comparison with Mask R-CNN, Mask Scoring R-CNN, and YOLACT, the proposed method showed stronger mask perception and inference ability, effectively decoupling the overlapping relationships between occluded clothing instances and producing more accurate segmentation results. In addition, average precision (AP) was used as the evaluation index for further quantitative analysis of the improved model: the segmentation accuracy averaged over different IoU thresholds reached 49.3%, which is 3.6% higher than that of the original model. Comparing the segmentation accuracy of each improved variant at different occlusion degrees showed that the baseline Mask R-CNN model had the lowest accuracy at every occlusion level, while with the CEM, attention module (AM), and CIoU optimizations, the accuracy of the improved model under minor occlusion (APL1), moderate occlusion (APL2), and severe occlusion (APL3) improved by 4.3%, 4.2%, and 4.8%, respectively, with the most significant gain for severely occluded clothing. Finally, the accuracy of the proposed method was compared with that of Mask R-CNN, Mask Scoring R-CNN, SOLOv1, and YOLACT. The overall accuracy of YOLACT for segmenting clothing at different occlusion degrees was slightly lower, Mask Scoring R-CNN was slightly more accurate than Mask R-CNN, and SOLOv1 achieved segmentation accuracy similar to Mask R-CNN. The proposed method was significantly more accurate than the other methods at all occlusion degrees; APL3, the accuracy for severely occluded clothing, improved the most, being 4.8% higher than Mask R-CNN and 4.2%-11.1% higher than the other models.

Conclusion By embedding the context extraction module, the attention mechanism module, and the CIoU computation strategy into the Mask R-CNN network, a novel clothing instance segmentation model is constructed with enhanced ability to recognize and extract clothing features. The semantic inter-dependencies between occluded clothing feature maps in the spatial and channel dimensions are captured, improving the segmentation accuracy for each garment, and the optimal target box is predicted for each clothing instance, improving the accuracy of segmenting occluded clothing instances. A series of comprehensive experiments proves the feasibility and effectiveness of the proposed method, providing a new direction for research on clothing image instance segmentation.
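To make the CIoU criterion used for non-maximum suppression concrete, the following is a minimal sketch of the standard CIoU computation (Complete IoU: the IoU term minus a center-distance penalty and an aspect-ratio penalty). The function name, the (x1, y1, x2, y2) box format, and the epsilon handling are illustrative assumptions; the exact integration into the paper's NMS pipeline is not reproduced here.

```python
import math


def ciou(box_a, box_b, eps=1e-9):
    """Complete IoU (CIoU) between two boxes in (x1, y1, x2, y2) format.

    CIoU = IoU - rho^2 / c^2 - alpha * v, where rho is the distance between
    the box centers, c is the diagonal of the smallest enclosing box, and v
    penalizes aspect-ratio mismatch. Unlike plain IoU, CIoU remains
    informative even when the two boxes do not overlap at all.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Overlap (IoU) term
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / (union + eps)

    # Center-distance term: squared distance between box centers over the
    # squared diagonal of the smallest box enclosing both boxes
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4.0
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
        + (max(ay2, by2) - min(ay1, by1)) ** 2

    # Aspect-ratio consistency term
    v = (4.0 / math.pi ** 2) * (
        math.atan((ax2 - ax1) / (ay2 - ay1 + eps))
        - math.atan((bx2 - bx1) / (by2 - by1 + eps))
    ) ** 2
    alpha = v / (1.0 - iou + v + eps)

    return iou - rho2 / (c2 + eps) - alpha * v
```

In a CIoU-based NMS, a lower-scoring candidate box is suppressed when its CIoU with an already-kept higher-scoring box exceeds a threshold; because CIoU also reflects center distance, nearby but non-overlapping duplicates of a partially occluded garment are handled more sensibly than with IoU alone.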