

Knowledge-Based and Data-Driven Integrating Design Methodology for Air Combat Strategy in Multi-Opponent Adversarial Game
The rapid development of artificial intelligence technology has given autonomous air combat strategies the potential to surpass human experts. Existing intelligent air combat strategies fall into two categories according to how they are driven: knowledge-based strategies, which depend heavily on the application scenario and on expert knowledge, and data-driven strategies, typified by reinforcement learning, which suffer from poor interpretability and weak generalization. Taking the multi-agent cooperative air combat scenario of the national Air Intelligence Game (AIG) competition as its setting, this study proposes a knowledge-based and data-driven integrating strategy design method for multi-opponent air combat. The knowledge-based part uses expert knowledge to design a parameterized and stylized knowledge-based artificial intelligence (AI) system, which generates high-quality offline data and initializes the strategy. The data-driven part employs graph attention networks to selectively represent information about teammates and opponents, improving training efficiency and convergence performance. Furthermore, a dynamic opponent matching mechanism is introduced for multi-opponent reinforcement learning training to further enhance strategy generalization. The proposed strategy achieved a statistical win rate of over 70% against 12 of the top 16 teams in the AIG. These teams all adopt the latest knowledge-based or data-driven methods, exhibit diverse styles, and have strong combat capabilities.
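The abstract does not say how the dynamic opponent matching mechanism picks opponents during training. As a hedged illustration only, the sketch below uses a prioritized-sampling rule in the spirit of prioritized fictitious self-play: opponents against which the learner's smoothed win rate is low are drawn more often. Everything here is an assumption rather than the authors' method: the class names `OpponentRecord` and `OpponentPool`, the `(1 - win_rate) ** power` weighting, the `power` exponent, and the idea that the pool holds stylized rule-based policies plus frozen learner snapshots (the opponent names are hypothetical).

```python
import random
from dataclasses import dataclass


@dataclass
class OpponentRecord:
    """Running match statistics of the learner against one opponent policy."""

    name: str
    wins: int = 0   # games the learner won against this opponent
    games: int = 0  # total games played against this opponent

    def win_rate(self) -> float:
        # Posterior mean under a Jeffreys Beta(0.5, 0.5) prior: an unseen
        # opponent starts at 0.5 and the estimate never reaches 0 or 1.
        return (self.wins + 0.5) / (self.games + 1.0)


class OpponentPool:
    """Hypothetical opponent pool that samples hard opponents more often."""

    def __init__(self, names, power=2.0, seed=None):
        self.records = {n: OpponentRecord(n) for n in names}
        self.power = power  # larger values focus more sharply on hard opponents
        self.rng = random.Random(seed)

    def add(self, name):
        """Register a new opponent, e.g. a frozen learner snapshot or a new rule-based style."""
        self.records.setdefault(name, OpponentRecord(name))

    def weights(self):
        """Unnormalized sampling weight (1 - win_rate) ** power for each opponent."""
        return {n: (1.0 - r.win_rate()) ** self.power for n, r in self.records.items()}

    def sample(self):
        """Draw one opponent name with probability proportional to its weight."""
        w = self.weights()
        names = list(w)
        return self.rng.choices(names, weights=[w[n] for n in names], k=1)[0]

    def update(self, name, learner_won):
        """Record the outcome of one finished game against `name`."""
        rec = self.records[name]
        rec.games += 1
        rec.wins += int(learner_won)


if __name__ == "__main__":
    # Hypothetical opponents: two rule-based styles and one learner snapshot.
    pool = OpponentPool(["rule_aggressive", "rule_defensive", "snapshot_0"], seed=0)
    for _ in range(10):
        pool.update("rule_aggressive", learner_won=True)
    pool.update("rule_defensive", learner_won=False)
    print(pool.weights())
    print(pool.sample())
```

In the demo, `rule_defensive` (one loss) receives the largest weight and `rule_aggressive` (ten wins) the smallest, so training time shifts toward the opponent the learner currently handles worst. Because the smoothed win rate stays strictly below 1, every weight remains positive and no opponent is ever dropped entirely; one motivation for this is to keep earlier styles in the training mix rather than overfitting to the hardest current opponent.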

Keywords: reinforcement learning; knowledge and data integrating; air combat; multi-opponent; generalization

Feng Jinyuan (冯锦元), Chen Min (陈敏), Li Junying (李俊影), Chen Jiale (陈加乐), Pu Zhiqiang (蒲志强), Chen Minjie (陈敏杰), Sun Fangyi (孙方义)


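Institute of Automation, Chinese Academy of Sciences, Beijing 100190

School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049

School of Computer Science, Nanjing University of Information Science and Technology, Nanjing, Jiangsu 210044

Huaru Research Institute, Beijing 100193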



2024

Acta Electronica Sinica
Chinese Institute of Electronics


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.237
ISSN: 0372-2112
Year, volume (issue): 2024, 52(11)