

Reinforcement Learning Approach with Environment-Adaptive Gaussian Noise Augmentation
The state-vector-input reinforcement learning approach is a fundamental research direction in the field of reinforcement learning with broad application prospects. However, the low data efficiency of current reinforcement learning methods leads to prolonged learning times, making them difficult to apply in real-world environments. To address this issue, an environment-adaptive Gaussian noise augmentation (EAGNA) method is proposed, which is integrated as a module into the soft actor-critic (SAC) and proximal policy optimization (PPO) methods. Based on the distribution range of each element in the state vector of the task environment, Gaussian noise with a different mean and standard deviation is added to each element for data augmentation. Across three state-vector-based control tasks in the OpenAI Gym benchmark, EAGNA achieved higher average returns than the original algorithms, enhancing data efficiency. Notably, in the Lunar Lander control task with complex state inputs, EAGNA outperformed the SAC and PPO methods by 30.52 and 26.09 in average return, respectively.
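
The abstract describes adding per-element Gaussian noise whose parameters are adapted to each state element's distribution range. A minimal NumPy sketch of that idea follows; the function name eagna_augment, the noise_scale hyperparameter, and the zero-mean, range-proportional noise are illustrative assumptions rather than the paper's exact formulation.

import numpy as np

def eagna_augment(state, low, high, noise_scale=0.01, rng=None):
    # Environment-adapted Gaussian noise augmentation (illustrative sketch).
    # Each state element receives Gaussian noise whose standard deviation is
    # proportional to that element's value range, so elements with wider
    # ranges get proportionally wider noise. Zero mean is assumed here,
    # although the paper also varies the mean per element.
    rng = np.random.default_rng() if rng is None else rng
    low = np.asarray(low, dtype=np.float64)
    high = np.asarray(high, dtype=np.float64)
    # Fall back to a unit range for unbounded elements (e.g. +/-inf bounds in
    # Gym observation spaces); an empirical range estimated from collected
    # states could be substituted in practice.
    span = np.where(np.isfinite(high - low), high - low, 1.0)
    sigma = noise_scale * span
    return state + rng.normal(loc=0.0, scale=sigma, size=np.shape(state))

# Example with a Gym-style environment:
#   env = gym.make("LunarLander-v2")
#   state, _ = env.reset()
#   aug = eagna_augment(state, env.observation_space.low, env.observation_space.high)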

Keywords: reinforcement learning; data augmentation; Gaussian noise; state vector input; environment adaptation

Authors: Zhu Leqian (朱乐乾), Pan Zhisong (潘志松)


Affiliation: College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, Jiangsu, China


Funding: National Natural Science Foundation of China (62076251)


Journal: Journal of Army Engineering University of PLA (陆军工程大学学报)
Publisher: Scientific Research Department, PLA University of Science and Technology


Impact factor: 0.556
ISSN: 2097-0730
Year, Volume (Issue): 2024, 3(2)