Computer Science (计算机科学), 2025, Vol. 52, Issue 1: 323-330. DOI: 10.11896/jsjkx.240800072

Multi-agent Pursuit Decision-making Method Based on Hybrid Imitation Learning

Wang Yanning¹, Zhang Fengdi¹, Xiao Dengmin², Sun Zhongqi³

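Author information

  • 1. Beijing Aerospace Automatic Control Institute, Beijing 100854; National Key Laboratory of Aerospace Intelligent Control Technology, Beijing 100854
  • 2. CSSC Zhihai Innovation Research Institute Co., Ltd. (中船智海创新研究院有限公司), Beijing 100094
  • 3. School of Automation, Beijing Institute of Technology, Beijing 100081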

Abstract

Traditional imitation learning methods handle diverse expert trajectories poorly; in particular, they struggle to integrate fixed-modality expert data of uneven quality. To address this, the paper combines multiple trajectories generative adversarial imitation learning (MT-GAIL) with temporal-difference error behavioral cloning (TD-BC) to build a hybrid imitation learning framework. The framework improves the model's adaptability to complex and varied expert strategies and its robustness in extracting useful information from low-quality data. The resulting model can be applied directly to reinforcement learning: after only minor adjustment and optimization, it yields a readily usable reinforcement learning model grounded in expert experience. The method is validated experimentally in a two-dimensional dynamic-static hybrid target pursuit scenario, where it performs well. The results indicate that the proposed method absorbs expert experience and provides an effective initial model with a high starting point for the subsequent reinforcement learning training phase.

Key words

Intelligent decision-making/Reinforcement learning/Behavior cloning/Generative adversarial imitation learning

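Publication year

2025
Computer Science (计算机科学)
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)

Peking University Core Journal
Impact factor: 0.944
ISSN: 1002-137X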