Human-like decision-making for highway on-ramp merging based on the BC-MAAC algorithm
To address the lack of human-like intelligence and the difficulty of designing reward functions in multi-agent reinforcement learning algorithms for autonomous driving in complex environments, this paper proposes a human-like decision-making scheme for highway on-ramp merging based on the BC-MAAC algorithm. BC-MAAC combines the idea of behavior cloning with the multi-actor-attention-critic algorithm. It derives expert policies from multi-agent expert data collected on the Highway-env platform and uses the KL divergence between the derived expert policies and the agents' current policies to shape the reward function, thereby guiding the agents' training. In addition, the algorithm applies an action masking mechanism that filters out unsafe or ineffective actions at each step to improve learning efficiency. Simulation results under two traffic density scenarios show that the proposed algorithm outperforms the baseline algorithm overall, improving both vehicle efficiency and safety. In the easy mode, the proposed algorithm achieves a 100% success rate and improves the average speed and the average reward by at least 0.73% and 11.14%, respectively. In the hard mode, it achieves a 93.40% success rate and improves the average speed and the average reward by at least 3.96% and 12.23%, respectively. These results indicate that, by using the expert reward function, the BC-MAAC algorithm guides connected autonomous vehicles to complete the highway on-ramp merging task cooperatively and in a more human-like manner.
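The two mechanisms named in the abstract, KL-based reward shaping and action masking, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the weight `beta`, the direction of the KL term (expert relative to agent), the subtractive form of the shaping, and the five-action discrete space are all assumptions made for illustration.

```python
import math

def masked_softmax(logits, mask):
    """Softmax restricted to allowed actions; masked actions get probability 0."""
    allowed = [l for l, m in zip(logits, mask) if m]
    if not allowed:
        raise ValueError("at least one action must be allowed")
    peak = max(allowed)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )

def shaped_reward(env_reward, expert_probs, agent_probs, beta=0.1):
    """Assumed shaping: penalise the agent for deviating from the expert policy."""
    return env_reward - beta * kl_divergence(expert_probs, agent_probs)

# Hypothetical step with five discrete actions; the third is masked as unsafe.
logits = [1.0, 2.0, 0.5, 3.0, 0.0]
mask = [True, True, False, True, True]
agent_probs = masked_softmax(logits, mask)
expert_probs = [0.0, 0.1, 0.0, 0.8, 0.1]  # made-up expert distribution
reward = shaped_reward(1.0, expert_probs, agent_probs)
```

In this sketch the penalty vanishes when the agent's masked policy matches the expert distribution and grows as the two diverge, so the shaped reward pulls the agent toward expert-like behavior without a hand-tuned reward term for each driving objective.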