Adaptive uncertainty quantification for model-based offline reinforcement learning
Offline reinforcement learning (RL) can optimize agent policies directly from historical offline datasets, avoiding risky interactions with the online environment. It is a promising approach for robot manipulation, autonomous driving, intelligent recommendation, and other applications. Model-based offline RL first constructs a supervised environment model and then lets the agent interact with this model to optimize the policy. This approach has high sample efficiency and has been widely studied. However, the distributional shift between the offline dataset and the online environment can lead to out-of-distribution problems. Existing methods mainly rely on static metrics to measure the uncertainty of the environment model and therefore cannot adapt to the dynamic policy-optimization process. To address this problem, we propose a novel adaptive uncertainty quantification method. It estimates the uncertainty of each state and then applies a dynamic weight to the uncertainty quantification, achieving a better trade-off between conservatism and aggressiveness. Evaluations on multiple benchmarks validate the effectiveness of the algorithm, and ablation studies demonstrate the usefulness of the uncertainty measurements.
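
To make the idea concrete, the following is a minimal sketch of an uncertainty-penalized reward with a dynamically weighted penalty. The abstract does not specify the estimator or the weighting rule, so the ensemble-disagreement uncertainty, the linear penalty schedule, and all names (ensemble_uncertainty, dynamic_penalty_weight, base_weight) are illustrative assumptions, not the paper's algorithm.

    # Illustrative sketch only; assumes an ensemble of learned dynamics models,
    # each exposing a predict(state, action) -> next-state array method.
    import numpy as np

    def ensemble_uncertainty(models, state, action):
        """Per-state uncertainty as the disagreement among next-state
        predictions of the ensemble (a common heuristic, assumed here)."""
        preds = np.stack([m.predict(state, action) for m in models])  # (K, state_dim)
        return float(np.mean(np.std(preds, axis=0)))

    def dynamic_penalty_weight(base_weight, epoch, total_epochs):
        """Hypothetical schedule: start conservative (large penalty) and
        relax the weight as policy optimization progresses."""
        return base_weight * (1.0 - epoch / total_epochs) + 0.1 * base_weight

    def penalized_reward(reward, models, state, action, epoch, total_epochs,
                         base_weight=1.0):
        """Reward used for rollouts in the learned model: the raw reward
        minus the dynamically weighted uncertainty penalty."""
        u = ensemble_uncertainty(models, state, action)
        lam = dynamic_penalty_weight(base_weight, epoch, total_epochs)
        return reward - lam * u

In this sketch, a large penalty weight early in training keeps the policy close to the data (conservatism), and the decaying weight later allows it to exploit the model more aggressively; the actual adaptation rule in the paper may differ.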