
Multi-agent Pursuit Decision-making Method Based on Hybrid Imitation Learning

Traditional imitation learning methods struggle to handle diverse expert trajectories. In particular, they have difficulty integrating fixed-modality expert data of uneven quality. To address these limitations, this paper combines multiple trajectories generative adversarial imitation learning (MT-GAIL) with temporal-difference error behavioral cloning (TD-BC) to build a hybrid imitation learning framework. The framework improves the model's ability to adapt to complex and varied expert strategies. It also makes the model more robust when it extracts useful information from low-quality data. The resulting model can be used directly in reinforcement learning. After only minor adjustment and fine-tuning, it yields a ready-to-use reinforcement learning model grounded in expert experience. The method was evaluated experimentally in a two-dimensional target pursuit scenario that combines dynamic and static elements, where it performed well. The results indicate that the proposed method effectively absorbs expert experience. It also provides a strong initial model with a high starting point for the subsequent reinforcement learning training stage.
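A minimal Python sketch of how an adversarial imitation term and a TD-error-weighted behavior cloning term might be combined follows. It rests on explicit assumptions. (i) It uses the standard GAIL surrogate reward r = −log(1 − D(s, a)). (ii) It uses a hypothetical TD-BC variant that down-weights expert transitions with a large one-step TD error δ by exp(−|δ|/τ). (iii) It mixes the two policy losses with a simple convex weight λ. The function names, τ, λ and the weighting form are illustrative assumptions, not the authors' formulation. The multi-trajectory discriminator of MT-GAIL is not modeled.

```python
import math
from typing import List, Sequence


def gail_reward(d_prob: float, eps: float = 1e-8) -> float:
    """Standard GAIL surrogate reward r = -log(1 - D(s, a)).

    d_prob is the discriminator's probability that (s, a) came from an expert;
    it is clipped away from 0 and 1 to keep the logarithm finite.
    """
    p = min(max(d_prob, eps), 1.0 - eps)
    return -math.log(1.0 - p)


def td_errors(rewards: Sequence[float], values: Sequence[float],
              next_values: Sequence[float], dones: Sequence[float],
              gamma: float = 0.99) -> List[float]:
    """One-step TD errors: delta = r + gamma * V(s') * (1 - done) - V(s)."""
    return [r + gamma * nv * (1.0 - done) - v
            for r, v, nv, done in zip(rewards, values, next_values, dones)]


def td_weighted_bc_loss(log_probs: Sequence[float], deltas: Sequence[float],
                        temperature: float = 1.0) -> float:
    """HYPOTHETICAL TD-weighted behavior cloning loss (an assumption, not the paper's).

    log_probs[i] is log pi(a_i | s_i) for the i-th expert transition.
    Transitions with a large |TD error| are treated as less reliable and get
    weight proportional to exp(-|delta| / temperature); the weights are
    normalised so the loss is a weighted mean of -log pi.
    """
    raw = [math.exp(-abs(delta) / temperature) for delta in deltas]
    total = sum(raw)
    return -sum(w / total * lp for w, lp in zip(raw, log_probs))


def hybrid_loss(adversarial_loss: float, bc_loss: float, lam: float = 0.5) -> float:
    """HYPOTHETICAL convex mix of the adversarial (GAIL) and BC policy losses."""
    return (1.0 - lam) * adversarial_loss + lam * bc_loss


if __name__ == "__main__":
    # Toy batch of three expert transitions (made-up numbers for illustration).
    deltas = td_errors([1.0, 0.0, 0.5], [0.5, 0.2, 0.4],
                       [0.6, 0.1, 0.0], [0.0, 0.0, 1.0], gamma=0.9)
    bc = td_weighted_bc_loss([math.log(0.8), math.log(0.3), math.log(0.6)], deltas)
    print(deltas, bc, hybrid_loss(adversarial_loss=1.2, bc_loss=bc, lam=0.3))
```

Exponential weighting keeps every weight positive and bounded. A noisy demonstration is therefore attenuated rather than discarded outright. A smaller τ filters low-quality data more aggressively.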

Keywords: Intelligent decision-making; Reinforcement learning; Behavior cloning; Generative adversarial imitation learning

Wang Yanning (王焱宁), Zhang Fengdi (张锋镝), Xiao Dengmin (肖登敏), Sun Zhongqi (孙中奇)


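Beijing Aerospace Automatic Control Institute, Beijing 100854, China

National Key Laboratory of Aerospace Intelligent Control Technology, Beijing 100854, China

CSSC Zhihai Innovation Research Institute Co., Ltd. (中船智海创新研究院有限公司), Beijing 100094, China

School of Automation, Beijing Institute of Technology, Beijing 100081, China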



2025

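Computer Science (计算机科学)
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)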


Peking University Core Journal (北大核心)
Impact factor: 0.944
ISSN: 1002-137X
Year, Volume (Issue): 2025, 52(1)