Adaptive uncertainty quantification for model-based offline reinforcement learning
Offline reinforcement learning (RL) can optimize agent policies directly from historical offline datasets, avoiding risky interactions with the online environment. It is a promising approach for robot manipulation, autonomous driving, intelligent recommendation, and other applications. Model-based offline RL first constructs a supervised environment model and then lets the agent interact with this model to optimize the policy. This approach has high sample efficiency and has been widely studied. However, the distributional shift between the offline dataset and the online environment can lead to out-of-distribution problems. Existing methods mainly rely on static metrics to measure the uncertainty of the environment model and therefore cannot adapt to the dynamic policy-optimization process. To address this problem, we propose a novel adaptive uncertainty quantification method. It estimates the uncertainty of each state and then applies a dynamic weight to the uncertainty quantification, achieving a better trade-off between conservatism and aggressiveness. Evaluations on multiple benchmarks validate the effectiveness of the algorithm, and ablation studies demonstrate the usefulness of the uncertainty measurements.
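
To make the idea concrete, the following is a minimal sketch of an uncertainty-penalized reward with a dynamically weighted penalty. The abstract does not specify the estimator or the weighting rule, so the ensemble-disagreement uncertainty, the linear penalty schedule, and all names (ensemble_uncertainty, dynamic_penalty_weight, base_weight) are illustrative assumptions, not the paper's algorithm.

    # Illustrative sketch only; assumes an ensemble of learned dynamics models,
    # each exposing a predict(state, action) -> next-state array method.
    import numpy as np

    def ensemble_uncertainty(models, state, action):
        """Per-state uncertainty as the disagreement among next-state
        predictions of the ensemble (a common heuristic, assumed here)."""
        preds = np.stack([m.predict(state, action) for m in models])  # (K, state_dim)
        return float(np.mean(np.std(preds, axis=0)))

    def dynamic_penalty_weight(base_weight, epoch, total_epochs):
        """Hypothetical schedule: start conservative (large penalty) and
        relax the weight as policy optimization progresses."""
        return base_weight * (1.0 - epoch / total_epochs) + 0.1 * base_weight

    def penalized_reward(reward, models, state, action, epoch, total_epochs,
                         base_weight=1.0):
        """Reward used for rollouts in the learned model: the raw reward
        minus the dynamically weighted uncertainty penalty."""
        u = ensemble_uncertainty(models, state, action)
        lam = dynamic_penalty_weight(base_weight, epoch, total_epochs)
        return reward - lam * u

In this sketch, a large penalty weight early in training keeps the policy close to the data (conservatism), and the decaying weight later allows it to exploit the model more aggressively; the actual adaptation rule in the paper may differ.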