Dual cross-attention Transformer network for few-shot image semantic segmentation
Few-shot semantic segmentation aims to segment novel classes with only a few annotated examples. To address the insufficient mining of semantic information in existing methods, a few-shot image semantic segmentation method based on a dual cross-attention network is proposed. The method adopts a Transformer structure and uses dual cross-attention modules to learn long-range dependencies between multi-scale query and support features along both the channel and spatial dimensions. First, a channel cross-attention module is proposed and combined with a position cross-attention module to form the dual cross-attention module. The channel cross-attention module learns the channel-wise semantic interrelationships between the query and support features, while the position cross-attention module captures the long-range contextual correlations between them. Then, stacking multiple dual cross-attention modules provides the query image with multi-scale interaction features containing rich semantic information. Finally, an auxiliary supervision loss is introduced, and the multi-scale interaction features are connected to the decoder via upsampling and residual connections to obtain accurate segmentation results for the novel classes. The proposed method achieves 69.9% (1-shot) and 72.4% (5-shot) mIoU on PASCAL-5^i, and 48.9% (1-shot) and 54.6% (5-shot) mIoU on COCO-20^i, attaining state-of-the-art segmentation performance compared with mainstream methods.
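For illustration, the sketch below shows one way a dual cross-attention block of the kind described in the abstract could be realized in PyTorch. It is a minimal sketch, not the authors' implementation: the module names (PositionCrossAttention, ChannelCrossAttention, DualCrossAttention), the 1x1 projections, the scaling factors, and the additive fusion of the two branches are all assumptions; only the overall idea, spatial and channel cross-attention between query and support features with residual connections, follows the abstract.

```python
import torch
import torch.nn as nn


class PositionCrossAttention(nn.Module):
    """Cross-attention over spatial positions: queries come from the
    query-image features, keys/values from the support features."""
    def __init__(self, dim):
        super().__init__()
        self.q_proj = nn.Conv2d(dim, dim, 1)  # 1x1 projections are an assumption
        self.k_proj = nn.Conv2d(dim, dim, 1)
        self.v_proj = nn.Conv2d(dim, dim, 1)
        self.scale = dim ** -0.5

    def forward(self, fq, fs):
        b, c, h, w = fq.shape
        q = self.q_proj(fq).flatten(2).transpose(1, 2)    # (B, HW, C)
        k = self.k_proj(fs).flatten(2)                    # (B, C, HW)
        v = self.v_proj(fs).flatten(2).transpose(1, 2)    # (B, HW, C)
        attn = torch.softmax(q @ k * self.scale, dim=-1)  # (B, HW, HW) affinity
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return fq + out  # residual connection


class ChannelCrossAttention(nn.Module):
    """Cross-attention over channels: a CxC affinity between query and
    support channel maps re-weights the support channels."""
    def forward(self, fq, fs):
        b, c, h, w = fq.shape
        q = fq.flatten(2)  # (B, C, HW)
        k = fs.flatten(2)  # (B, C, HW)
        # sqrt(HW) scaling is an assumed stabilization, not from the paper
        attn = torch.softmax(q @ k.transpose(1, 2) / (h * w) ** 0.5, dim=-1)
        out = (attn @ k).reshape(b, c, h, w)  # (B, C, C) @ (B, C, HW)
        return fq + out  # residual connection


class DualCrossAttention(nn.Module):
    """Combines the two branches; additive fusion is an assumption."""
    def __init__(self, dim):
        super().__init__()
        self.pos = PositionCrossAttention(dim)
        self.chn = ChannelCrossAttention()
        self.fuse = nn.Conv2d(dim, dim, 1)

    def forward(self, fq, fs):
        return self.fuse(self.pos(fq, fs) + self.chn(fq, fs))


# Usage example with hypothetical feature shapes:
fq = torch.randn(2, 256, 30, 30)  # query features
fs = torch.randn(2, 256, 30, 30)  # support features
print(DualCrossAttention(256)(fq, fs).shape)  # torch.Size([2, 256, 30, 30])
```

In the full model as described in the abstract, several such blocks would operate on features at different scales, and their outputs would be upsampled and residually connected into the decoder, with an auxiliary supervision loss attached per scale.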

Keywords: few-shot semantic segmentation; Transformer architecture; channel cross-attention; dual cross-attention; auxiliary loss

LIU Yu, GUO Yingchun, ZHU Ye, YU Ming

School of Electronic Information Engineering, Hebei University of Technology, Tianjin 300401, China

School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China

Funding: National Natural Science Foundation of China Youth Program (62102129); National Natural Science Foundation of China General Program (62276088); Natural Science Foundation of Hebei Province (F2021202030, F2019202381, F2019202464)

2024

Chinese Journal of Liquid Crystals and Displays
Sponsors: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences; Liquid Crystal Branch, China Optics and Optoelectronics Manufactures Association; Liquid Crystal Committee, Chinese Physical Society

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.964
ISSN: 1007-2780
Year, Volume (Issue): 2024, 39(11)