Offline Reinforcement Learning Algorithm Based on Selection of High-Quality Samples
To address the over-reliance of offline reinforcement learning algorithms on the quality of dataset samples, an offline reinforcement learning algorithm based on the selection of high-quality samples (SHS) is proposed. In the policy evaluation stage, higher update weights are assigned to samples with higher advantage values, and a policy entropy term is added to quickly identify high-probability, high-quality action samples within the data distribution, thereby selecting the more valuable action samples. In the policy optimization stage, SHS maximizes the normalized advantage function while constraining the policy to actions within the dataset. Consequently, high-quality samples can be exploited efficiently even when the overall sample quality of the dataset is low, improving the learning efficiency and performance of the policy. Experiments show that SHS performs well on D4RL offline datasets in the MuJoCo-Gym environment and successfully selects the more valuable samples, verifying its effectiveness.
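The abstract does not give the exact update rules, but the two stages it describes resemble advantage-weighted policy extraction with a behavior constraint. The following PyTorch fragment is a minimal sketch of how such losses could look; the networks (policy, q_net, v_net) and all coefficients (beta, alpha, bc_weight) are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of advantage-weighted sample selection with an entropy
# term (policy evaluation) and constrained advantage maximization (policy
# optimization). Hyperparameters and network interfaces are assumptions.
import torch
import torch.nn.functional as F

def shs_losses(policy, q_net, v_net, batch, beta=1.0, alpha=0.1, bc_weight=2.5):
    s, a = batch["obs"], batch["actions"]

    with torch.no_grad():
        # Advantage of the dataset action under the current critics.
        adv = (q_net(s, a) - v_net(s)).squeeze(-1)
        # Normalize advantages so weights are comparable across batches.
        adv_norm = (adv - adv.mean()) / (adv.std() + 1e-6)
        # Higher update weight for samples with higher (normalized) advantage.
        weight = torch.clamp(torch.exp(adv_norm / beta), max=100.0)

    dist = policy(s)  # e.g. a torch.distributions.Normal over actions
    log_prob = dist.log_prob(a).sum(-1)

    # Policy evaluation: advantage-weighted log-likelihood of dataset
    # actions, plus an entropy bonus so high-probability in-distribution
    # actions are identified quickly.
    eval_loss = -(weight * log_prob).mean() - alpha * dist.entropy().sum(-1).mean()

    # Policy optimization: maximize the advantage of the policy's own action
    # while a behavior-cloning term keeps it close to dataset actions.
    pi_a = dist.rsample()
    opt_loss = -q_net(s, pi_a).mean() + bc_weight * F.mse_loss(pi_a, a)
    return eval_loss, opt_loss
```

The exponential weighting concentrates gradient mass on high-advantage samples, which is one plausible reading of "assigning higher update weights"; the clamp guards against exploding weights when a few samples dominate.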