Objective With the rapid development of visual perception technology, autonomous driving can already be applied to simple scenarios. However, in real complex urban road applications, challenges remain, especially in sudden safety-critical scenarios such as abrupt lane changes by other vehicles, pedestrian intrusions, and the appearance of obstacles. Data for such critical scenarios follow a long-tail distribution in the real world, creating a technical bottleneck for data-driven risk perception in autonomous driving. This paper therefore proposes an enhanced risk perception method based on parallel vision. Method The method builds on the interactive ACP (artificial societies, computational experiments, parallel execution) theory and integrates descriptive, prescriptive, and predictive intelligence under the parallel vision framework to achieve vision-based enhanced risk perception. Specifically, based on descriptive and prescriptive learning, an improved diffusion model is introduced into the artificial image system, with a background-adaptive module and a feature fusion encoder; by controlling the specific positions where dangerous elements such as pedestrians are generated, the controllable generation of risk sequences for safety-critical scenarios is achieved. Second, a spatial-rule-based method extracts the spatial and interaction relationships between traffic entities to construct cognitive scene graphs. Finally, under the predictive learning framework, a new graph-model-based enhanced risk perception method is proposed that fuses a relational graph attention network with a Transformer encoder module to perform spatiotemporal modeling of scene graph sequences, ultimately realizing risk perception and prediction. Result To verify the effectiveness of the proposed method, comparative experiments were conducted against five mainstream risk perception methods on the MRSG-144 (mixed reality scene graph), IESG (interaction-enhanced scene graph), and 1043-carla-sg (1043-carla-scenegraph) datasets. The proposed method achieves F1-scores of 0.956, 0.944, and 0.916 on the three datasets, surpassing existing mainstream methods and achieving the best results. Conclusion This work is a practical application of parallel vision to risk perception for autonomous driving and is of great significance for improving the risk perception capability of autonomous driving in complex traffic scenarios and for ensuring the safety of autonomous driving systems.
Enhanced risk perception method based on parallel vision for autonomous vehicles in safety-critical scenarios
Objective With the rapid development of visual perception technology, autonomous driving can already be applied to simple scenarios. However, in real complex urban road applications, challenges remain, especially in safety-critical scenarios such as sudden lane changes by other vehicles, pedestrian intrusions, and the appearance of obstacles. First, most existing autonomous driving systems are still trained and evaluated predominantly on everyday natural scenes or heuristically generated adversarial scenes. Safety-critical scenarios, i.e., scenes in which vehicles are at risk of collision, especially scenes involving vulnerable road users such as pedestrians, play an important role in evaluating the safety performance of autonomous driving systems. However, such scenarios occur with low probability in the real world, and the corresponding data follow a long-tail distribution, creating a technical bottleneck for data-driven risk perception in autonomous driving. Second, creating new scenes with current scene generation methods, or with automatic virtual-simulation scene generation frameworks based on fixed rules, is difficult, and the generated driving scenes are often insufficiently realistic and lack diversity. By contrast, scene generation based on diffusion models not only fully exploits the characteristics of real data and fills gaps in the collected real data but also enables interpretable and controllable scene generation. In addition, system risk perception capability remains limited in safety-critical scenarios. For risk-aware safety assessment, traditional methods based on convolutional neural networks can extract simple features of each object in a scene but cannot obtain high-level semantic information, that is, the relationships between various
traffic entities. Obtaining such high-level information remains challenging because most potential risks are hidden at the semantic and behavioral levels. Risk assessment for autonomous driving based on traffic scene graphs has become a popular research topic in recent years. By constructing and analyzing traffic scene graphs and capturing the overall relationships and interactions in a traffic scene, potential risks can be effectively understood and predicted, providing highly accurate decision support for autonomous driving systems. From the visual-perception perspective of human drivers, different traffic entities pose different levels of risk to autonomous vehicles. However, risk perception methods based on traffic scene graphs generally use graph convolution to iteratively update the feature representation of each node, which ignores the importance of the different types of edges between nodes during message passing. Considering these challenges, this paper proposes a risk-enhanced perception framework based on parallel vision that automatically generates safety-critical scene data and exploits the importance of the different edge types between adjacent traffic entities. Method The method is based on the interactive ACP theory and integrates descriptive, prescriptive, and predictive intelligence under a parallel vision framework to achieve vision-based enhanced risk perception. Specifically, based on descriptive and prescriptive learning, a background-adaptive module and a feature fusion encoder are first introduced into the diffusion model structure, refining the boundary contours of generated pedestrians and improving image quality. The controllable generation of risk sequences in safety-critical scenarios is achieved by controlling the specific locations where dangerous elements, such as pedestrians, are generated. Second, a cognitive scene graph construction method based on spatial rules uses object detection to obtain the spatial
position of each entity in the scene. Based on the relative spatial positions and appropriate thresholds, the distance, orientation, and affiliation relationships between entities in a traffic scene are extracted; interaction relationships are extracted mainly from the changes in spatial information between traffic entities over time. Finally, under the predictive learning framework, a new graph-model-based enhanced risk perception method is proposed that integrates a relational graph attention network and a Transformer encoder module to perform spatiotemporal modeling of scene graph sequence data. The relational graph attention network (RGAT) introduces an attention mechanism that assigns different weights to different neighborhood relations and obtains node feature representations through weighted summation. The temporal Transformer encoder module models the temporal dynamics of the scene graph sequence, ultimately outputting risk-aware visual reasoning results. Result Experiments were conducted on three datasets (MRSG-144, IESG, and 1043-carla-sg) to compare the proposed method with five mainstream risk perception methods based on graph-structured data and to verify its effectiveness. The method achieved F1-scores of 0.956, 0.944, and 0.916 on the three datasets, surpassing the existing mainstream methods and achieving the best results. Ablation experiments further revealed the contribution of each module to model performance: introducing virtual scene data notably boosted the risk perception model, increasing accuracy, area under the curve, and F1-score by 0.4%, 1.1%, and 1.2%, respectively. Conclusion This article is a practical application of parallel vision in the field of autonomous driving risk perception, which holds considerable importance in enhancing the risk perception capabilities of autonomous vehicles
in complex traffic scenarios and ensuring the safety of autonomous driving systems.
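The spatial-rule-based construction of the cognitive scene graph described above can be sketched as follows. This is a minimal illustration only: the relation labels, the heading convention, and the distance thresholds (`NEAR_T`, `VISIBLE_T`) are hypothetical assumptions, not the thresholds or rules used in the paper.

```python
import math

# Hypothetical distance thresholds in metres; the paper's actual values are not given.
NEAR_T, VISIBLE_T = 10.0, 30.0

def distance_relation(ego_xy, other_xy):
    """Map the ego-to-entity Euclidean distance onto a coarse distance relation."""
    d = math.dist(ego_xy, other_xy)
    if d <= NEAR_T:
        return "near"
    if d <= VISIBLE_T:
        return "visible"
    return "far"

def orientation_relation(ego_xy, other_xy):
    """Classify an entity as left of / in front of / right of the ego vehicle
    from its bearing angle (ego assumed to be heading along +y)."""
    dx, dy = other_xy[0] - ego_xy[0], other_xy[1] - ego_xy[1]
    angle = math.degrees(math.atan2(dx, dy))  # 0 degrees = straight ahead
    if angle < -30:
        return "left_of"
    if angle > 30:
        return "right_of"
    return "in_front_of"

def build_edges(ego_xy, entities):
    """Return (subject, relation, object) triples for one scene-graph frame."""
    edges = []
    for name, xy in entities.items():
        edges.append(("ego", distance_relation(ego_xy, xy), name))
        edges.append(("ego", orientation_relation(ego_xy, xy), name))
    return edges
```

Interaction relations would then be derived by applying the same rules over consecutive frames and thresholding the change in distance or bearing over time, as the abstract describes.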
Keywords: autonomous driving; parallel vision; cognitive scene graph; diffusion generation; risk perception
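The core idea of the relation-aware attention step in the RGAT module, assigning different weights to different neighborhood relations and aggregating by weighted summation, can be sketched in NumPy. All names, the feature dimension, and the relation set here are illustrative assumptions; the paper's actual layer sizes and relation vocabulary are not specified in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4  # node feature dimension (illustrative)

# One projection matrix per edge type: the core idea of relation-aware
# graph attention is that different relations transform messages differently.
relations = ["near", "in_front_of", "interacts_with"]
W = {r: rng.normal(size=(D, D)) for r in relations}
a = rng.normal(size=2 * D)  # shared attention vector

def rgat_layer(x, edges):
    """One relation-aware attention step.
    x: (N, D) node features; edges: list of (src, relation, dst) triples."""
    scores, msgs = {}, {}
    for s, r, d in edges:
        m = x[s] @ W[r]                              # relation-specific message
        e = np.concatenate([x[d] @ W[r], m]) @ a     # unnormalized attention score
        scores.setdefault(d, []).append(np.exp(e))
        msgs.setdefault(d, []).append(m)
    out = x.copy()
    for d in scores:                                 # softmax-weighted aggregation
        w = np.array(scores[d]) / np.sum(scores[d])
        out[d] = np.tanh((w[:, None] * np.array(msgs[d])).sum(axis=0))
    return out
```

In the full method, the per-frame node features produced by such layers would then be fed to the temporal Transformer encoder to model the dynamics of the scene-graph sequence.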