

Multi-scale hand segmentation method based on attention mechanism
To address imprecise segmentation of hand edge details and the false and missed detection of small hand regions, a multi-scale hand segmentation method based on the attention mechanism is proposed. First, the Transformer module is redesigned and optimized with a window self-attention structure and a dual-branch feed-forward network (D-FFN): window self-attention integrates global and local dependency information, while the D-FFN suppresses interference from background information. Then, a multi-scale feature extraction module that combines strip pooling with a cascaded network is proposed to enlarge the receptive field and improve the accuracy and robustness of the hand segmentation model. Finally, an up-sampling decoder module based on the Triplet Attention mechanism is proposed, which adjusts the attention weights of the channel and spatial dimensions to separate target features from redundant background features. The proposed algorithm is tested on the public datasets GTEA (Georgia Tech Egocentric Activity) and EYTH (EgoYouTubeHands). Experimental results show that its mean intersection over union (MIoU) reaches 95.8% and 90.2% on the two datasets, respectively, which is 2.5% and 2.1% higher than the TransUnet algorithm. The method meets the requirements of hand image segmentation for stability and reliability, high precision, and strong anti-interference ability.
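For readers unfamiliar with the reported metric, the following is a minimal sketch of how a mean intersection over union can be computed for a pair of label masks. It is an illustration only, not the authors' evaluation code: the function name `mean_iou`, the two-class setup (0 for background, 1 for hand), and the choice to average over every class that appears in either mask are assumptions, since the abstract does not state how the averaging was done.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[Sequence[int]],
             target: Sequence[Sequence[int]],
             num_classes: int = 2) -> float:
    """Mean IoU over classes present in either mask (hypothetical helper).

    Assumes labels are integers in [0, num_classes), e.g. 0 = background
    and 1 = hand. This convention is an assumption, not taken from the paper.
    """
    intersection: List[int] = [0] * num_classes
    union: List[int] = [0] * num_classes

    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel belongs to both the predicted and true region of class p.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel lies in the predicted region of p and the true region of t.
                union[p] += 1
                union[t] += 1

    # Skip classes that occur in neither mask to avoid dividing by zero.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        raise ValueError("masks are empty")
    return sum(ious) / len(ious)


if __name__ == "__main__":
    pred = [[0, 0, 1, 1],
            [0, 1, 1, 1]]
    target = [[0, 0, 0, 1],
              [0, 1, 1, 1]]
    print(mean_iou(pred, target))
```

In this toy example one background pixel is mislabeled as hand, so the background IoU is 3/4 and the hand IoU is 4/5, and the mean of the two is the reported score for that mask pair.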

Keywords: hand segmentation; deep learning; TransUnet; feed-forward networks; atrous spatial pyramid pooling; triplet attention

Zhou Wenqing (周雯晴), Dai Sumin (代素敏), Wang Yangping (王阳萍), Wang Wenrun (王文润)


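School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China

Beijing Zhongdian Feihua Communication Co., Ltd. (北京中电飞华通信有限公司), Beijing 100700, China

Gansu Provincial Engineering Research Center for Artificial Intelligence and Graphics and Image Processing, Lanzhou 730070, Gansu, China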


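Funding: National Natural Science Foundation of China (62067006); National Natural Science Foundation of China (62367005); Gansu Provincial Intellectual Property Program (21ZSCQ013); Lanzhou Youth Science and Technology Talent Innovation Project (2023-QN-117); Youth Science Fund of Lanzhou Jiaotong University (2022012); Major Cultivation Project of University Research and Innovation Platforms (2024CXPT-17)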

2024

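Chinese Journal of Liquid Crystals and Displays
Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences; Liquid Crystal Branch of the China Optics and Optoelectronics Manufacturers Association; Liquid Crystal Branch of the Chinese Physical Society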


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.964
ISSN: 1007-2780
Year, volume (issue): 2024, 39(11)