Video Object Segmentation Algorithm Based on Multi-scale Feature Enhancement and Global-Local Feature Aggregation
To address the insufficient multi-scale feature representation and the underuse of shallow features in memory-network algorithms, this paper proposes a Video Object Segmentation (VOS) algorithm based on multi-scale feature enhancement and global-local feature aggregation. First, a multi-scale feature enhancement module fuses feature information at different scales from the reference mask branch and the reference RGB branch to strengthen multi-scale feature representation. At the same time, a global-local feature aggregation module extracts features using convolutions with receptive fields of different sizes and adaptively fuses the features of global and local regions; this fusion better captures both the global features and the fine details of the target, improving segmentation accuracy. Finally, a cross-layer fusion module exploits the spatial details of shallow features to improve mask segmentation accuracy: fusing shallow features with deep features better captures the target's details and edge information. Experimental results show that on the public datasets DAVIS 2016, DAVIS 2017, and YouTube-VOS 2018, the overall performance of the proposed algorithm reaches 91.8%, 84.5%, and 83.0%, respectively, and the algorithm runs in real time on both single-object and multi-object segmentation tasks.
Video Object Segmentation (VOS); memory network; Siamese network; feature fusion; mask refinement
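
The global-local feature aggregation idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the box filters stand in for learned convolutions with small and large receptive fields, and the function names (`box_filter`, `aggregate_global_local`), the kernel sizes, and the per-channel sigmoid gate with weights `gate_w` are all assumptions introduced here for illustration.

```python
import numpy as np


def box_filter(feat, k):
    """Average each channel over a k x k window (k odd) with edge padding.

    Stand-in for a convolution whose receptive field is k x k.
    feat has shape (channels, height, width).
    """
    pad = k // 2
    padded = np.pad(feat, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    h, w = feat.shape[1:]
    out = np.zeros(feat.shape, dtype=float)
    # Sum the k*k shifted views of the padded map, then normalize.
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / (k * k)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def aggregate_global_local(feat, local_k=3, global_k=7, gate_w=None):
    """Adaptively fuse a small-receptive-field and a large-receptive-field branch.

    gate_w (hypothetical, shape (C, 2C)) maps pooled branch statistics to one
    gate value per channel; zero weights give an even 0.5 / 0.5 blend.
    """
    local = box_filter(feat, local_k)    # fine detail, small context
    glob = box_filter(feat, global_k)    # coarse structure, wide context
    c = feat.shape[0]
    if gate_w is None:
        gate_w = np.zeros((c, 2 * c))
    # Global average pooling of both branches forms the gate input.
    desc = np.concatenate([local.mean(axis=(1, 2)), glob.mean(axis=(1, 2))])
    gate = sigmoid(gate_w @ desc)[:, None, None]
    # Convex combination: the gate decides how much each branch contributes.
    return gate * local + (1.0 - gate) * glob
```

In a trained network the gate weights would be learned, so the balance between local detail and global context could differ per channel and per input; the sketch only shows the data flow of such a fusion.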