Strategy Optimization Method of Multi-dimension Projection Based on Deep Reinforcement Learning
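An Jing (安靖) 1, Si Guangya (司光亚) 2, Zhang Lei (张雷) 1
Author information
- 1. Joint Logistics College, National Defense University, Beijing 100858; Graduate School, National Defense University, Beijing 100091; Joint Operations College, National Defense University, Beijing 100091
- 2. Joint Operations College, National Defense University, Beijing 100091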
Abstract
Motivated by the strong performance of deep reinforcement learning (DRL) algorithms on strategy optimization problems, this paper takes multi-dimension projection operations as its main research object and proposes a method for optimizing operational action strategies in which a DRL framework works in coordination with simulation experiments. After reviewing the current state of strategy optimization research, the paper analyzes and compares deep learning frameworks against the research problem and constructs a DRL multi-dimension projection strategy model based on the asynchronous advantage actor-critic (A3C) algorithm. Through simulation and distributed computing, the DRL model learns interactively with a human-out-of-the-loop simulation and yields an optimized multi-dimension projection strategy, which verifies the effectiveness of optimizing strategies through coordination between the DRL framework and simulation experiments.
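The abstract names A3C but does not spell out its update. As a rough illustration only, the sketch below shows the generic n-step return and advantage computation that an A3C worker typically performs on a short rollout before updating the shared model. The function names, the reward values, and the discount factor are hypothetical and are not taken from the paper; the paper's actual model, reward design, and simulation interface are not described in this abstract.

```python
from typing import List, Sequence


def n_step_returns(rewards: Sequence[float],
                   bootstrap_value: float,
                   gamma: float = 0.99) -> List[float]:
    """Discounted returns R_t = r_t + gamma * R_{t+1} for one rollout.

    bootstrap_value is the critic's value estimate for the state that
    follows the last reward (use 0.0 if the rollout ended in a terminal
    state). All names here are illustrative assumptions.
    """
    returns: List[float] = []
    running = bootstrap_value
    # Walk the rollout backwards so each step can reuse the next return.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns: Sequence[float],
               values: Sequence[float]) -> List[float]:
    """Advantage A_t = R_t - V(s_t): how much better the rollout did
    than the critic predicted. The actor is pushed toward actions with
    positive advantage; the critic is regressed toward the returns."""
    if len(returns) != len(values):
        raise ValueError("returns and values must have the same length")
    return [ret - val for ret, val in zip(returns, values)]


if __name__ == "__main__":
    # Hypothetical three-step rollout from one simulation worker.
    rollout_rewards = [1.0, 0.0, 2.0]
    critic_values = [1.0, 1.0, 1.0]
    rets = n_step_returns(rollout_rewards, bootstrap_value=0.0, gamma=0.5)
    print(rets)
    print(advantages(rets, critic_values))
```

In a full A3C setup, several workers would each run such rollouts against their own simulation instance and apply the resulting gradients asynchronously to shared parameters; that part is omitted here.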
Key words
deep reinforcement learning (DRL) / simulation / strategy optimization / multi-dimension projection / asynchronous advantage actor-critic (A3C) algorithm
Publication year
2024