Human-like decision-making for highway on-ramp merging based on the BC-MAAC algorithm


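Abstract (translated from Chinese): Multi-agent reinforcement learning algorithms for autonomous driving in complex environments make decisions that lack the intelligence humans display, and their reward functions are hard to design. To address both problems, this paper proposes a human-like decision-making scheme for highway on-ramp merging based on the BC-MAAC algorithm. BC-MAAC combines the idea of behavior cloning with the multi-actor-attention-critic (MAAC) algorithm. Expert policies are derived from multi-agent expert data collected on the Highway-env platform, and the KL divergence between the derived expert policy and the agent's current policy is used to shape the reward function and guide training. In addition, an action masking mechanism filters out unsafe or invalid actions at every step, which improves learning efficiency. Simulation results in two scenarios with different traffic densities show that the proposed algorithm outperforms the baseline algorithm overall and improves both the traffic efficiency and the safety of the vehicles. In easy mode, the proposed algorithm reaches a 100% success rate and improves average speed and average reward by at least 0.73% and 11.14%, respectively; in hard mode, it reaches a 93.40% success rate and improves average speed and average reward by at least 3.96% and 12.23%, respectively. By guiding connected autonomous vehicles with an expert reward function, BC-MAAC thus enables them to cooperate and complete the highway on-ramp merging task in a more human-like way.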

English title: Highway on-ramp merging human-like decision based on BC-MAAC algorithm

English abstract: To address the lack of human-like intelligence and the difficulty of designing reward functions in multi-agent reinforcement learning algorithms for autonomous driving in complex environments, this paper proposes a human-like decision-making scheme for highway on-ramp merging based on the BC-MAAC algorithm. The BC-MAAC algorithm combines the idea of behavior cloning with the multi-actor-attention-critic algorithm. It derives expert policies from multi-agent expert data collected on the Highway-env platform and uses the KL divergence between the derived expert policies and the current policies of the agents to shape the reward function, so as to guide the training process of the agents. At the same time, the algorithm applies an action masking mechanism that filters out unsafe or ineffective actions at each step to improve learning efficiency. Simulation results under two different traffic density scenarios show that the proposed algorithm outperforms the baseline algorithm overall, improving vehicle efficiency and safety. In the easy mode, the proposed algorithm achieves a 100% success rate and improves the average speed and the average reward by at least 0.73% and 11.14%, respectively. In the hard mode, the proposed algorithm achieves a 93.40% success rate and improves the average speed and the average reward by at least 3.96% and 12.23%, respectively. These results indicate that, by using the expert reward function, the BC-MAAC algorithm guides connected autonomous vehicles to complete the highway on-ramp merging task cooperatively and in a more human-like manner.
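The two mechanisms named in the abstract, KL-based reward shaping and action masking, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the direction of the KL divergence, how it enters the reward, or how masked actions are handled, so the choice of KL(expert || agent), the penalty weight `beta`, the function names, and the masked-softmax formulation are all assumptions made for illustration.

```python
import math


def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for two discrete distributions over the same action set.

    eps guards against log(0); terms with p_i == 0 contribute nothing.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def shaped_reward(env_reward, expert_probs, agent_probs, beta=0.1):
    """Environment reward minus a penalty for diverging from the expert.

    Assumption: the KL term is subtracted with a hypothetical weight beta;
    the abstract only says the KL divergence "shapes" the reward.
    """
    return env_reward - beta * kl_divergence(expert_probs, agent_probs)


def masked_softmax(logits, valid_mask):
    """Softmax restricted to valid actions; masked actions get probability 0."""
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")
    # Subtract the largest valid logit for numerical stability.
    top = max(l for l, ok in zip(logits, valid_mask) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, valid_mask)]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical three-action example: action 1 is masked as unsafe.
    agent = masked_softmax([1.0, 2.0, 3.0], [True, False, True])
    expert = [0.2, 0.0, 0.8]
    print(agent, shaped_reward(1.0, expert, agent))
```

Under these assumptions, an agent whose action distribution matches the expert's keeps its environment reward unchanged, while any mismatch lowers it in proportion to `beta`.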

English keywords:

connected autonomous vehicle; intelligent decision-making; highway on-ramp merging; behavior cloning; multi-agent reinforcement learning

Authors:

于镝, 张昌文, 熊双双, 刘朋友

Affiliation:

School of Automation, Beijing Information Science and Technology University, Beijing 100192

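Keywords (translated from Chinese):

connected autonomous vehicles; intelligent decision-making; highway on-ramp merging; behavior cloning; multi-agent reinforcement learning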

Publication year:

2025

DOI:

10.19734/j.issn.1001-3695.2024.06.0204

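Journal: Application Research of Computers (计算机应用研究)

Publisher: Sichuan Provincial Electronic Computer Application Research Center (四川省电子计算机应用研究中心)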

Indexing: Peking University Core Journals (北大核心)

Impact factor: 0.93

ISSN: 1001-3695

Year, volume (issue): 2025, 42(1)