

Reinforcement Learning Approach with Environment-Adaptive Gaussian Noise Augmentation
The state-vector-input reinforcement learning approach is a fundamental research direction in the field of reinforcement learning with broad application prospects. However, the low data efficiency of current reinforcement learning methods leads to prolonged learning times, making them difficult to apply in real-world environments. To address this issue, an environment-adaptive Gaussian noise augmentation (EAGNA) method is proposed, which is integrated as a module into the soft actor-critic (SAC) and proximal policy optimization (PPO) methods. Based on the distribution range of each element in the state vector of the task environment, Gaussian noise with a different mean and standard deviation is added to each element for data augmentation. Across three state-vector-based control tasks in the OpenAI Gym benchmark, EAGNA achieved higher average returns than the original algorithms, enhancing data efficiency. Notably, in the Lunar Lander control task with complex state inputs, EAGNA outperformed the SAC and PPO methods by 30.52 and 26.09 in average return, respectively.
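
The abstract describes adding per-element Gaussian noise whose parameters are adapted to each state element's distribution range. A minimal NumPy sketch of that idea follows; the function name eagna_augment, the noise_scale hyperparameter, and the zero-mean, range-proportional noise are illustrative assumptions rather than the paper's exact formulation.

import numpy as np

def eagna_augment(state, low, high, noise_scale=0.01, rng=None):
    # Environment-adapted Gaussian noise augmentation (illustrative sketch).
    # Each state element receives Gaussian noise whose standard deviation is
    # proportional to that element's value range, so elements with wider
    # ranges get proportionally wider noise. Zero mean is assumed here,
    # although the paper also varies the mean per element.
    rng = np.random.default_rng() if rng is None else rng
    low = np.asarray(low, dtype=np.float64)
    high = np.asarray(high, dtype=np.float64)
    # Fall back to a unit range for unbounded elements (e.g. +/-inf bounds in
    # Gym observation spaces); an empirical range estimated from collected
    # states could be substituted in practice.
    span = np.where(np.isfinite(high - low), high - low, 1.0)
    sigma = noise_scale * span
    return state + rng.normal(loc=0.0, scale=sigma, size=np.shape(state))

# Example with a Gym-style environment:
#   env = gym.make("LunarLander-v2")
#   state, _ = env.reset()
#   aug = eagna_augment(state, env.observation_space.low, env.observation_space.high)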

Keywords: reinforcement learning; data augmentation; Gaussian noise; state vector input; environment adaptation

Authors: Zhu Leqian (朱乐乾), Pan Zhisong (潘志松)


Affiliation: College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, Jiangsu, China


Funding: National Natural Science Foundation of China (62076251)


Journal: Journal of Army Engineering University of PLA (陆军工程大学学报)
Publisher: Scientific Research Department, PLA University of Science and Technology


Impact factor: 0.556
ISSN: 2097-0730
Year, Volume (Issue): 2024, 3(2)