Journal of System Simulation, 2024, Vol. 36, Issue 1: 39-49. DOI: 10.16182/j.issn1004731x.joss.22-0886

Strategy Optimization Method of Multi-dimension Projection Based on Deep Reinforcement Learning

An Jing (安靖) 1, Si Guangya (司光亚) 2, Zhang Lei (张雷) 1

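Author information

  • 1. Joint Logistics College, National Defense University, Beijing 100858; Graduate School, National Defense University, Beijing 100091; Joint Operations College, National Defense University, Beijing 100091
  • 2. Joint Operations College, National Defense University, Beijing 100091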

Abstract

Building on the strong performance of deep reinforcement learning (DRL) in strategy optimization problems, this paper takes multi-dimension projection operations as its main research object and proposes a method for optimizing operational action strategies that couples a DRL framework with simulation experiments. After reviewing the current state of strategy optimization research, the paper analyzes and compares deep learning frameworks against the needs of the research problem and constructs a DRL multi-dimension projection strategy model based on the asynchronous advantage actor-critic (A3C) algorithm. Through simulation runs and distributed computing, the DRL model learns interactively with a "human-out-of-the-loop" simulation, yielding an optimized multi-dimension projection strategy and verifying the effectiveness of optimizing strategies through the coordinated use of the DRL framework and simulation experiments.

Key words

deep reinforcement learning (DRL) / simulation / strategy optimization / multi-dimension projection / asynchronous advantage actor-critic (A3C) algorithm

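Publication year: 2024
Journal: Journal of System Simulation (系统仿真学报), Beijing Simulation Center (北京仿真中心) and Chinese Association for System Simulation (中国系统仿真学会)
Indexed in: CSTPCD, CSCD, Peking University Core Journals (北大核心)
Impact factor: 0.551
ISSN: 1004-731X
Number of references: 11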