Journal of Zhejiang University (Engineering Science), 2024, Vol. 58, Issue 4: 684-695. DOI: 10.3785/j.issn.1008-973X.2024.04.004

Traffic scene perception algorithm with joint semantic segmentation and depth estimation

Original title: 联合语义分割和深度估计的交通场景感知算法

Fan Kang (范康) 1, Zhong Ming'en (钟铭恩) 1, Tan Jiawei (谭佳威) 2, Zhan Zehui (詹泽辉) 1, Feng Yan (冯妍) 1

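Author information

  • 1. Fujian Key Laboratory of Bus Advanced Design and Manufacture, Xiamen University of Technology, Xiamen 361024, Fujian, China
  • 2. School of Aerospace Engineering, Xiamen University, Xiamen 361102, Fujian, China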

Abstract

Inspired by the idea that features from different pixel-level visual tasks can guide and refine one another, a traffic scene perception algorithm for joint semantic segmentation and depth estimation was proposed based on multi-task learning theory. A bidirectional cross-task attention mechanism was proposed to explicitly model the global correlation between tasks, guiding the network to fully exploit the complementary pattern information between tasks. A multi-task Transformer was constructed to enhance the spatial global representation of task-specific features, implicitly model the cross-task global context, and promote the fusion of complementary pattern information between tasks. An encoder-decoder fusion upsampling module was designed to fuse the spatial detail information contained in the encoder and generate fine-grained, high-resolution task-specific features. Experimental results on the Cityscapes dataset showed that the proposed algorithm achieved a mean intersection over union (mIoU) of 79.2% for semantic segmentation, a root mean square error of 4.485 for depth estimation, and a mean relative error of 6.1% for distance estimation over five typical classes of traffic participants. Compared with existing mainstream algorithms, the proposed algorithm achieved better overall performance at lower computational complexity.
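The three figures reported in the abstract follow standard metric definitions, which can be sketched as below. This is a minimal illustration rather than the authors' evaluation code: the function names, the `ignore_index=255` convention, and the choice to skip pixels with non-positive ground-truth depth are assumptions made here, and the paper's exact protocol (for example, how distances to the five classes of traffic participants are sampled) may differ.

```python
import math


def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection over union across classes that occur in pred or target.

    pred and target are flat sequences of integer class labels.
    Predictions are assumed to lie in [0, num_classes).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel (assumed convention): not scored
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            union[p] += 1  # false positive for class p
            union[t] += 1  # false negative for class t
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


def rmse(pred_depth, true_depth):
    """Root mean square error over pixels with valid (positive) ground truth."""
    valid = [(p, t) for p, t in zip(pred_depth, true_depth) if t > 0]
    return math.sqrt(sum((p - t) ** 2 for p, t in valid) / len(valid))


def mean_relative_error(pred_dist, true_dist):
    """Mean of |prediction - truth| / truth over valid (positive) ground truth."""
    valid = [(p, t) for p, t in zip(pred_dist, true_dist) if t > 0]
    return sum(abs(p - t) / t for p, t in valid) / len(valid)


if __name__ == "__main__":
    # Toy data only; these values are not results from the paper.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 255], num_classes=2))
    print(rmse([4.0, 7.0], [1.0, 10.0]))
    print(mean_relative_error([9.0, 22.0], [10.0, 20.0]))
```

Multiplying `mean_iou` and `mean_relative_error` by 100 gives percentages comparable in form to the 79.2% and 6.1% quoted above.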

Key words

traffic environment perception/multi-task learning/semantic segmentation/depth estimation/Transformer

Funding

Natural Science Foundation of Fujian Province (2023J011439)

Natural Science Foundation of Fujian Province (2019J01859)

Publication year

2024
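Journal of Zhejiang University (Engineering Science)
Zhejiang University

Indexed in: CSTPCD, CSCD, Peking University Core Journals
Impact factor: 0.625
ISSN: 1008-973X
Number of references: 27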