Chinese Journal of Scientific Instrument, 2024, Vol. 45, Issue 10: 133-142. DOI: 10.19650/j.cnki.cjsi.J2412990

Bird's eye view generation based on recurrent cross-view transformation and multi-state feature fusion

Liu Mingjie¹, He Zhengyan¹, Chen Junsheng¹, Liu Ping¹, Piao Changhao¹

Author information

  • 1. School of Automation, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Abstract

Most bird's eye view (BEV) generation methods based on multiple perspective views struggle to extract associated features across semantically inconsistent states and to balance model performance against complexity. To address these problems, a BEV generation model based on a light-weight Transformer is proposed. The model uses an end-to-end, one-stage training strategy and establishes an association between dynamic vehicle and static road information in traffic scenes, which filters out noise in the generated BEV. On this basis, a Transformer-based recurrent cross-view transformation module for multi-scale features is designed. It uses an attention mechanism to perform positional encoding and representation learning on the input and captures the dependencies between different positions in the perspective view (PV) feature sequence, which improves the robustness of the BEV features. In addition, a multi-state BEV feature fusion module is designed to handle semantic inconsistency; it extracts the correlated information between static roads and dynamic vehicles and improves the accuracy of the generated BEV. Experiments on the NuScenes dataset show that the method achieves advanced BEV generation performance while keeping model complexity low, with semantic segmentation accuracies of 43.2% for dynamic vehicles and 82.0% for static roads.
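
As a rough illustration of the attention-based cross-view idea mentioned in the abstract, the following minimal sketch lets a set of BEV queries attend to position-encoded PV feature tokens. It is not the authors' module: the single-head form, the absence of learned projections and recurrence, the random inputs, and all names and sizes (`cross_view_attention`, `n_bev`, `n_pv`, `d`) are assumptions made only for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_attention(bev_queries, pv_features, pv_pos):
    """Single-head cross-attention: each BEV cell queries all PV tokens.

    bev_queries: (n_bev, d) one query vector per BEV grid cell
    pv_features: (n_pv, d)  flattened perspective-view feature tokens
    pv_pos:      (n_pv, d)  positional encoding added to the keys
    """
    d = bev_queries.shape[-1]
    keys = pv_features + pv_pos                  # position-encoded PV tokens
    scores = bev_queries @ keys.T / np.sqrt(d)   # (n_bev, n_pv) similarities
    weights = softmax(scores, axis=-1)           # each row sums to 1
    return weights @ pv_features, weights        # (n_bev, d) BEV features

# Hypothetical sizes and random inputs, for illustration only.
rng = np.random.default_rng(0)
n_bev, n_pv, d = 16, 24, 8
bev_queries = rng.normal(size=(n_bev, d))
pv_features = rng.normal(size=(n_pv, d))
pv_pos = rng.normal(size=(n_pv, d))

bev_features, weights = cross_view_attention(bev_queries, pv_features, pv_pos)
```

Each output row is a weighted average of PV tokens, so a BEV cell can draw on any image location; a multi-scale or recurrent variant would repeat this step over several feature resolutions.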

Key words

map-view transition / light-weight Transformer / bird's eye view / perspective view

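Publication year

2024

Journal

Chinese Journal of Scientific Instrument
China Instrument and Control Society
Indexed in: CSTPCD, CSCD, Peking University Core Journals
Impact factor: 2.372
ISSN: 0254-3087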