
Continuous casting slab model positioning and measurement based on binocular vision and Transformer

To address the low efficiency and complex matching of traditional binocular vision detection algorithms, this paper proposes a method for positioning and measuring a continuous casting slab model based on binocular vision and Transformer. First, a calibrated parallel binocular camera captures left and right images of the continuous casting slab model, which are rectified and labeled to form the dataset. Then, with an improved Transunet* as the backbone, a neural network outputs the key point coordinates for the dataset. The network adopts a multi-scale U-shaped structure to offset the theoretical lower bound on Gaussian heatmap error caused by downsampling quantization. To remedy the tendency of convolutional neural networks to focus only on local features, a Transformer structure is added to strengthen information exchange within each channel, and an optimized loss function calculation is proposed to overcome the imbalance between positive and negative samples and to accelerate network convergence. Finally, the key point coordinates output by the network are reconstructed in three dimensions by binocular vision to complete the distance measurement. The results show that the proposed algorithm achieves higher key point detection accuracy than other neural network methods: compared with the second-best method, its root-mean-square error and normalized mean error are reduced by 17.24% and 18.58%, respectively. In three-dimensional ranging, its accuracy is clearly higher than that of traditional feature detection algorithms, meeting industrial requirements for high-precision measurement and positioning with little sensitivity to the environment.
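The final step described above, turning matched key points from the left and right images into 3D coordinates and a distance, can be illustrated with the textbook relations for a rectified parallel stereo rig. The following is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the focal length, baseline, principal point, and pixel coordinates are all hypothetical values chosen for illustration, and the key points are assumed to come from already rectified images.

```python
import numpy as np

def triangulate_rectified(pt_left, pt_right, focal_px, baseline, cx, cy):
    """Recover a 3D point in the left-camera frame from one matched key point
    pair of a rectified, parallel stereo rig (standard pinhole relations)."""
    (xl, yl), (xr, _) = pt_left, pt_right
    disparity = xl - xr  # horizontal shift between the two views, in pixels
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    z = focal_px * baseline / disparity   # depth along the optical axis
    x = (xl - cx) * z / focal_px          # lateral offset from the principal point
    y = (yl - cy) * z / focal_px          # vertical offset from the principal point
    return np.array([x, y, z])

def keypoint_distance(p_a, p_b):
    """Euclidean distance between two reconstructed 3D key points."""
    return float(np.linalg.norm(p_a - p_b))

# Hypothetical calibration: focal length in pixels, baseline in metres,
# principal point in pixels. None of these values come from the paper.
f, b, cx, cy = 1000.0, 0.1, 640.0, 360.0

# Hypothetical key point pairs (left image, right image) for two slab corners.
corner_a = triangulate_rectified((700.0, 360.0), (600.0, 360.0), f, b, cx, cy)
corner_b = triangulate_rectified((900.0, 360.0), (800.0, 360.0), f, b, cx, cy)

print(corner_a, corner_b, keypoint_distance(corner_a, corner_b))
```

Because depth is inversely proportional to disparity, a fixed pixel error in the detected key points produces a larger depth error for distant objects, which is one reason key point accuracy matters for the ranging result.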

binocular vision; Transformer; landmark detection; attention mechanism

李同谱、许四祥、施宇翔、杨利法


School of Mechanical Engineering, Anhui University of Technology, Ma'anshan 243032, Anhui, China


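National Natural Science Foundation of China (51374007); Key Natural Science Research Project of Higher Education Institutions of Anhui Province (KJ2020A0259)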

2024

Journal of Central South University (Science and Technology)
Central South University

CSTPCD; Peking University Core Journal
Impact factor: 0.938
ISSN: 1672-7207
Year, volume (issue): 2024, 55(4)