Weighted Double Q-Learning Algorithm Based on Softmax
As a branch of machine learning, reinforcement learning describes and solves the problem of an agent maximizing its return by learning a policy through interaction with the environment. Q-Learning, a classical model-free reinforcement learning method, suffers from maximization bias caused by overestimation and performs poorly when there is noise in the environment. The emergence of double Q-Learning (DQL) solves the overestimation problem, but at the same time introduces underestimation. To balance the overestimation and underestimation of the above algorithms, a weighted Q-Learning algorithm based on softmax is proposed, and, combined with DQL, a new weighted double Q-Learning algorithm based on softmax (WDQL-Softmax) is proposed. The algorithm constructs weighted double estimators that perform a softmax operation on the expected values of the samples to obtain weights. These weights are used to estimate the action value, effectively balancing overestimation and underestimation and making the estimated value closer to the theoretical value. Experimental results show that in discrete action spaces, compared with the Q-Learning, double Q-Learning, and weighted double Q-Learning algorithms, WDQL-Softmax achieves a faster convergence rate and a smaller error between the estimated value and the theoretical value.
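As a rough illustration of the idea described above, the sketch below shows one tabular update in which a softmax over the next state's action values yields a weight that blends the two estimators of double Q-Learning. The weighting rule here (using the softmax probability of the greedy action as the blend coefficient) is a hypothetical reading of the abstract, not the paper's exact formula; the table layout, function names, and the temperature parameter `tau` are likewise assumptions for illustration.

```python
import math

def softmax(values, tau=1.0):
    # Numerically stable softmax over a list of action values.
    m = max(values)
    exps = [math.exp((v - m) / tau) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def wdql_softmax_update(QA, QB, s, a, r, s_next,
                        alpha=0.1, gamma=0.95, tau=1.0):
    """One update of estimator QA, using QB as the second estimator.

    Hypothetical sketch: beta is the softmax weight of the greedy
    action; the published WDQL-Softmax weighting may differ.
    """
    # Greedy action under QA at the next state.
    a_star = max(range(len(QA[s_next])), key=lambda i: QA[s_next][i])
    # Softmax over QA's next-state values gives the blend weight.
    beta = softmax(QA[s_next], tau)[a_star]
    # Weighted target: beta -> 1 recovers the single-estimator
    # (overestimating) target, beta -> 0 the double-estimator
    # (underestimating) target, balancing the two biases.
    target = r + gamma * (beta * QA[s_next][a_star]
                          + (1.0 - beta) * QB[s_next][a_star])
    QA[s][a] += alpha * (target - QA[s][a])
```

In practice the roles of `QA` and `QB` would be swapped at random on each step, as in standard double Q-Learning, so that both tables are updated symmetrically.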