Occluded Video Instance Segmentation Method Based on Feature Fusion of Tracking and Detection in Time Sequence
Video instance segmentation is a visual task that has emerged in recent years; it extends image instance segmentation with temporal characteristics, aiming to segment objects in each frame while simultaneously tracking them across frames. With the rapid development of the mobile Internet and artificial intelligence, large amounts of video data are being generated. However, due to shooting angles, rapid motion, and partial occlusion, objects in videos often appear fragmented or blurred, which poses significant challenges for accurately segmenting targets from video data and for subsequent processing and analysis. A survey of the literature and practical experiments show that existing video instance segmentation methods perform poorly under occlusion. To address this issue, this paper proposes an improved occluded video instance segmentation algorithm that improves segmentation performance by fusing Transformer temporal features with tracking and detection. To enhance the network's ability to learn spatial position information, the algorithm introduces the time dimension into the Transformer network and exploits the interdependence and mutual reinforcement among object detection, tracking, and segmentation in videos. A fusion tracking module and a detection temporal feature module are proposed that effectively aggregate object tracking offsets across frames, improving segmentation performance in occluded environments. The effectiveness of the proposed method is verified through experiments on the OVIS and YouTube-VIS datasets. Compared with current benchmark methods, the proposed method achieves better segmentation accuracy, further demonstrating its superiority.
Video instance segmentation; Object detection; Object tracking; Temporal features; Occluded instance