Strategy Optimization Method of Multi-dimension Projection Based on Deep Reinforcement Learning

Building on the strong performance of deep reinforcement learning (DRL) in strategy optimization problems, this paper takes multi-dimension projection operations as its main research object and proposes an operational strategy optimization method that couples a DRL framework with simulation experiments. After reviewing the state of strategy optimization research, the authors analyze and compare deep learning frameworks against the needs of the research problem and construct a DRL multi-dimension projection strategy model based on the asynchronous advantage actor-critic (A3C) algorithm. Through simulation and distributed computing, the DRL model learns interactively with a human-out-of-the-loop simulation, yielding an optimized multi-dimension projection strategy. The results verify the effectiveness of optimizing strategies by coupling a DRL framework with simulation experiments.
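The abstract names A3C but gives no implementation details. As a rough illustration only, the sketch below shows the n-step discounted return and advantage that each A3C worker typically computes from a short rollout before updating the shared actor and critic. It is not the authors' model: the function names, the discount factor, and the toy numbers are all assumptions made for this example.

```python
from typing import List


def n_step_returns(rewards: List[float], bootstrap_value: float,
                   gamma: float = 0.99) -> List[float]:
    """Discounted returns for one rollout, computed backward in time.

    bootstrap_value is the critic's estimate V(s_T) for the state after the
    last step (use 0.0 if the episode terminated). gamma is an assumed
    discount factor, not a value taken from the paper.
    """
    returns: List[float] = []
    running = bootstrap_value
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns: List[float], values: List[float]) -> List[float]:
    """Advantage A_t = R_t - V(s_t); it weights the actor's log-prob gradient."""
    return [g - v for g, v in zip(returns, values)]


# Toy rollout (hypothetical numbers) of three simulation steps.
rollout_rewards = [1.0, 0.0, 2.0]
critic_values = [1.0, 1.0, 2.0]
rollout_returns = n_step_returns(rollout_rewards, bootstrap_value=0.5, gamma=0.5)
rollout_advantages = advantages(rollout_returns, critic_values)
```

In a full A3C setup, several workers would each run their own copy of the simulation, compute these quantities locally, and apply gradients to shared parameters asynchronously. That matches the abstract's mention of distributed computing only in spirit.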

deep reinforcement learning (DRL); simulation; strategy optimization; multi-dimension projection; asynchronous advantage actor-critic (A3C) algorithm

An Jing (安靖), Si Guangya (司光亚), Zhang Lei (张雷)

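Joint Logistics College, National Defense University, Beijing 100858

Graduate School, National Defense University, Beijing 100091

Joint Operations College, National Defense University, Beijing 100091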

deep reinforcement learning; simulation-based wargaming; strategy optimization; multi-dimension projection; A3C algorithm

2024

Journal of System Simulation
Beijing Simulation Center; China Simulation Federation

CSTPCD; Peking University Core Journal
Impact factor: 0.551
ISSN: 1004-731X
Year, volume (issue): 2024, 36(1)