Hierarchical Reinforcement Learning Adversarial Algorithm Against Opponent with Fixed Offensive Strategy

Abstract: Based on the option-critic algorithm, a new adversarial algorithm named deterministic policy network with option architecture is proposed to improve an agent's performance against an opponent with a fixed offensive algorithm. An option network is introduced in the upper-level design; it generates an activation signal that selects between defensive and offensive strategies according to the current situation. The lower-level executive layer then determines the interactive action under the guidance of the activation signal, and the critic structure evaluates the value of the activation signal and the interactive action together. This method effectively relaxes the requirement of a semi-Markov decision process and simplifies the network structure by eliminating the termination probability layer. Experimental results show that the new algorithm switches cleanly between offensive and defensive strategies and acquires more reward from the environment than the classical deep deterministic policy gradient algorithm.
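The data flow described in the abstract can be sketched as a single forward pass: an upper-level option network turns the state into an activation signal over two strategies, a lower-level deterministic actor maps the state and signal to an action, and one critic scores the signal and action together, with no termination head. This is only an illustrative sketch, not the authors' implementation: the class name `HierarchicalAgent`, the layer sizes, the softmax form of the signal, the tanh activations, and the random untrained weights are all assumptions, and no training loop is shown.

```python
import math
import random

# Illustrative sizes only; the abstract does not specify any dimensions.
STATE_DIM, N_OPTIONS, ACTION_DIM, HIDDEN = 4, 2, 2, 8


def make_layer(n_in, n_out, rng):
    """Random weights (n_out x n_in) and zero bias, standing in for trained parameters."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    """Affine map y = W x + b on plain Python lists."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class HierarchicalAgent:
    """Hypothetical option network + deterministic actor + joint critic."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.option_net = make_layer(STATE_DIM, N_OPTIONS, rng)
        self.actor_hidden = make_layer(STATE_DIM + N_OPTIONS, HIDDEN, rng)
        self.actor_out = make_layer(HIDDEN, ACTION_DIM, rng)
        self.critic_hidden = make_layer(STATE_DIM + N_OPTIONS + ACTION_DIM, HIDDEN, rng)
        self.critic_out = make_layer(HIDDEN, 1, rng)

    def activation_signal(self, state):
        # Upper level: weights over (defensive, offensive); softmax is an assumption.
        return softmax(linear(self.option_net, state))

    def action(self, state, signal):
        # Lower level: deterministic action conditioned on state and signal.
        hidden = [math.tanh(v) for v in linear(self.actor_hidden, state + signal)]
        return [math.tanh(v) for v in linear(self.actor_out, hidden)]

    def q_value(self, state, signal, action):
        # Critic: one scalar value for the signal and the action together.
        hidden = [math.tanh(v) for v in linear(self.critic_hidden, state + signal + action)]
        return linear(self.critic_out, hidden)[0]

    def step(self, state):
        # The signal is recomputed at every step, so no termination head is needed.
        signal = self.activation_signal(state)
        act = self.action(state, signal)
        return signal, act, self.q_value(state, signal, act)


agent = HierarchicalAgent(seed=0)
signal, act, q = agent.step([0.1, -0.3, 0.5, 0.0])
```

Because `step` recomputes the signal from the current state each time, the sketch has no separate termination layer, which is one plausible reading of how the simplification described in the abstract could look in code.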

Keywords:

hierarchical reinforcement learning; fixed offensive strategy; option architecture; deterministic gradient policy

Authors:

Zhao Yingce (赵英策), Zhang Guanghao (张广浩), Xing Zhengyu (邢正宇), Li Jianxun (李建勋)

Affiliations:

Shenyang Aircraft Design and Research Institute, Shenyang 110031, China

School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

Funding:

National Natural Science Foundation of China; National Key Research and Development Program; Shanghai Commercial Aircraft System Engineering Joint Research Fund

Grant numbers:

61673265; 2020YFC1512203; CASEF-2022-Z05

Year of publication:

2024

DOI:

10.1007/s12204-023-2586-y

Journal of Shanghai Jiao Tong University (Science)

Shanghai Jiao Tong University

Impact factor: 0.151

ISSN: 1007-1172

Year, volume (issue): 2024, 29(3)