Journal of Electronics & Information Technology, 2024, Vol. 46, Issue (11): 4198-4207. DOI: 10.11999/JEIT231394

Video Object Segmentation Algorithm Based on Multi-scale Feature Enhancement and Global-Local Feature Aggregation

侯志强 董佳乐 马素刚 王晨旭 杨小宝 王昀琛

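Author information

  • 1. School of Computer Science, Xi'an University of Posts and Telecommunications, Xi'an 710121; Shaanxi Provincial Laboratory of Network Data Analysis and Intelligent Processing, Xi'an University of Posts and Telecommunications, Xi'an 710121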

Abstract

To address the insufficient multi-scale feature representation and the underuse of shallow features in memory-network algorithms, this paper proposes a Video Object Segmentation (VOS) algorithm based on multi-scale feature enhancement and global-local feature aggregation. First, a multi-scale feature enhancement module fuses feature information at different scales from the reference mask branch and the reference RGB branch, strengthening the representation of multi-scale features. Second, a global-local feature aggregation module extracts features using convolutions with receptive fields of different sizes and adaptively fuses the features of global and local regions through a feature aggregation module; this fusion better captures both the global features and the fine details of the target, improving segmentation accuracy. Finally, a cross-layer fusion module uses the spatial detail in shallow features to improve the precision of the segmentation mask: fusing shallow features with deep features better captures the details and edges of the target. Experiments on the public datasets DAVIS2016, DAVIS2017, and YouTube-2018 show that the overall performance of the proposed algorithm reaches 91.8%, 84.5%, and 83.0%, respectively, and that it runs in real time on both single-object and multi-object segmentation tasks.
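The global-local aggregation idea in the abstract (branches with different receptive-field sizes, fused adaptively per position) can be illustrated with a toy sketch. This is not the authors' module: the abstract gives no layer details, so everything here is an assumption made for illustration. The names `box_filter` and `aggregate_global_local`, the mean filters standing in for learned convolutions, and the fixed sigmoid gate standing in for a learned fusion weight are all hypothetical.

```python
import math


def box_filter(feat, radius):
    """Mean over a (2*radius+1) x (2*radius+1) window, clipped at the borders.

    Stands in for a convolution whose receptive field grows with `radius`.
    """
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [
                feat[r][c]
                for r in range(max(0, i - radius), min(h, i + radius + 1))
                for c in range(max(0, j - radius), min(w, j + radius + 1))
            ]
            out[i][j] = sum(window) / len(window)
    return out


def aggregate_global_local(feat, local_radius=1, global_radius=3, gate_scale=4.0):
    """Fuse a small-receptive-field branch with a large-receptive-field branch.

    The gate is a fixed sigmoid of the local/context difference (an assumption;
    a real module would learn it). It lies in [0.5, 1), so positions where local
    detail departs from the surrounding context lean toward the local branch.
    """
    local = box_filter(feat, local_radius)      # fine detail
    context = box_filter(feat, global_radius)   # wider context
    fused = []
    for local_row, context_row in zip(local, context):
        fused_row = []
        for l, g in zip(local_row, context_row):
            gate = 1.0 / (1.0 + math.exp(-gate_scale * abs(l - g)))
            fused_row.append(gate * l + (1.0 - gate) * g)
        fused.append(fused_row)
    return fused


# Usage: a single bright pixel on a 7x7 single-channel feature map.
spike = [[0.0] * 7 for _ in range(7)]
spike[3][3] = 1.0
fused = aggregate_global_local(spike)
print(round(fused[3][3], 4))
```

Because the fused value is a convex combination of the two branches, it always lies between the local and the wide-context response, and the output keeps the input's spatial size.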

Key words

Video Object Segmentation (VOS) / Memory network / Siamese network / Feature fusion / Mask refinement

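Publication year

2024
Journal of Electronics & Information Technology
Institute of Electronics, Chinese Academy of Sciences; Department of Information Sciences, National Natural Science Foundation of China

Indexed in: CSTPCD, CSCD, Peking University Core Journals
Impact factor: 1.302
ISSN: 1009-5896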