Dual cross-attention Transformer network for few-shot image semantic segmentation
Few-shot semantic segmentation aims to segment novel classes with only a few annotated examples. To address the insufficient mining of semantic information in existing methods, a few-shot image semantic segmentation method based on a dual cross-attention network is proposed. The method adopts a Transformer structure and uses dual cross-attention modules to learn long-range dependencies between multi-scale query and support features along both the channel and spatial dimensions. First, a channel cross-attention module is proposed and combined with a position cross-attention module to form the dual cross-attention module. The channel cross-attention module learns the channel-wise semantic interrelationships between the query and support features, while the position cross-attention module captures the long-range contextual correlations between them. Then, stacking multiple dual cross-attention modules provides the query image with multi-scale interaction features containing rich semantic information. Finally, an auxiliary supervision loss is introduced, and the multi-scale interaction features are connected to the decoder via upsampling and residual connections to obtain accurate segmentation results for the novel classes. The proposed method achieves 69.9% (1-shot) and 72.4% (5-shot) mIoU on PASCAL-5^i, and 48.9% (1-shot) and 54.6% (5-shot) mIoU on COCO-20^i, attaining state-of-the-art segmentation performance compared with mainstream methods.
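For illustration, the sketch below shows one way a dual cross-attention block of the kind described in the abstract could be realized in PyTorch. It is a minimal sketch, not the authors' implementation: the module names (PositionCrossAttention, ChannelCrossAttention, DualCrossAttention), the 1x1 projections, the scaling factors, and the additive fusion of the two branches are all assumptions; only the overall idea, spatial and channel cross-attention between query and support features with residual connections, follows the abstract.

```python
import torch
import torch.nn as nn


class PositionCrossAttention(nn.Module):
    """Cross-attention over spatial positions: queries come from the
    query-image features, keys/values from the support features."""
    def __init__(self, dim):
        super().__init__()
        self.q_proj = nn.Conv2d(dim, dim, 1)  # 1x1 projections are an assumption
        self.k_proj = nn.Conv2d(dim, dim, 1)
        self.v_proj = nn.Conv2d(dim, dim, 1)
        self.scale = dim ** -0.5

    def forward(self, fq, fs):
        b, c, h, w = fq.shape
        q = self.q_proj(fq).flatten(2).transpose(1, 2)    # (B, HW, C)
        k = self.k_proj(fs).flatten(2)                    # (B, C, HW)
        v = self.v_proj(fs).flatten(2).transpose(1, 2)    # (B, HW, C)
        attn = torch.softmax(q @ k * self.scale, dim=-1)  # (B, HW, HW) affinity
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return fq + out  # residual connection


class ChannelCrossAttention(nn.Module):
    """Cross-attention over channels: a CxC affinity between query and
    support channel maps re-weights the support channels."""
    def forward(self, fq, fs):
        b, c, h, w = fq.shape
        q = fq.flatten(2)  # (B, C, HW)
        k = fs.flatten(2)  # (B, C, HW)
        # sqrt(HW) scaling is an assumed stabilization, not from the paper
        attn = torch.softmax(q @ k.transpose(1, 2) / (h * w) ** 0.5, dim=-1)
        out = (attn @ k).reshape(b, c, h, w)  # (B, C, C) @ (B, C, HW)
        return fq + out  # residual connection


class DualCrossAttention(nn.Module):
    """Combines the two branches; additive fusion is an assumption."""
    def __init__(self, dim):
        super().__init__()
        self.pos = PositionCrossAttention(dim)
        self.chn = ChannelCrossAttention()
        self.fuse = nn.Conv2d(dim, dim, 1)

    def forward(self, fq, fs):
        return self.fuse(self.pos(fq, fs) + self.chn(fq, fs))


# Usage example with hypothetical feature shapes:
fq = torch.randn(2, 256, 30, 30)  # query features
fs = torch.randn(2, 256, 30, 30)  # support features
print(DualCrossAttention(256)(fq, fs).shape)  # torch.Size([2, 256, 30, 30])
```

In the full model as described in the abstract, several such blocks would operate on features at different scales, and their outputs would be upsampled and residually connected into the decoder, with an auxiliary supervision loss attached per scale.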

Keywords: few-shot semantic segmentation; Transformer architecture; channel cross-attention; dual cross-attention; auxiliary loss

LIU Yu, GUO Yingchun, ZHU Ye, YU Ming

School of Electronic Information Engineering, Hebei University of Technology, Tianjin 300401, China

School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China

Funding: National Natural Science Foundation of China Youth Program (62102129); National Natural Science Foundation of China General Program (62276088); Natural Science Foundation of Hebei Province (F2021202030, F2019202381, F2019202464)

2024

Chinese Journal of Liquid Crystals and Displays
Sponsors: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences; Liquid Crystal Branch, China Optics and Optoelectronics Manufactures Association; Liquid Crystal Committee, Chinese Physical Society

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.964
ISSN: 1007-2780
Year, Volume (Issue): 2024, 39(11)