Novel Probability Distribution Update Strategy for Distributed Deep Q-Networks Based on Sigmoid Function
Building on the expected-value DQN, the distributed deep Q-network (Dist-DQN) can handle the stochastic reward problem in complex environments by extending the discrete action reward to a continuous support interval and continually updating the probability distribution over that support. The reward-probability distribution update strategy, a core component of any Dist-DQN implementation, significantly affects the learning efficiency of agents in the environment. To address this issue, a new probability distribution update strategy, Sig-Dist-DQN, is proposed. This strategy accounts for the strength of the correlation between subsets of the reward probability support, increasing the probability mass update rate for strongly correlated subsets while reducing it for weakly correlated subsets. In experiments conducted in an environment provided by OpenAI Gym, the exponential update and harmonic-series update strategies produce noticeably different training curves from run to run, whereas the training curves of the Sig-Dist-DQN strategy are highly stable. Compared with the exponential and harmonic-series update strategies, an agent applying Sig-Dist-DQN achieves significantly faster convergence and greater stability of the loss function during learning.
Distributed deep Q-network; continuation of reward intervals; probability distribution update; learning rate; training stability
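To make the idea concrete, the following minimal NumPy sketch shows one way such a sigmoid-gated mass update could look. It is an illustration under assumptions, not the paper's implementation: the function name sig_dist_update, the one-hot target distribution, and the parameters base_lr and steepness are all hypothetical. Atoms of the support close to the observed return (the "strongly correlated" subset) receive an update rate near base_lr, while distant ("weakly correlated") atoms receive a rate near zero.

```python
import numpy as np

def sigmoid(x):
    """Logistic function used to gate per-atom update rates."""
    return 1.0 / (1.0 + np.exp(-x))

def sig_dist_update(probs, support, observed_return, base_lr=0.5, steepness=4.0):
    """Sigmoid-gated distribution update (illustrative sketch only).

    probs           -- current probability masses over the support atoms
    support         -- fixed support atoms of the return distribution
    observed_return -- sampled return used as the update target
    base_lr         -- maximum per-atom update rate (hypothetical parameter)
    steepness       -- slope of the sigmoid gate (hypothetical parameter)
    """
    # Distance of each atom from the observed return, normalized to [0, 1].
    dist = np.abs(support - observed_return)
    dist = dist / (dist.max() + 1e-12)

    # Per-atom update rate: nearby atoms get a rate close to base_lr,
    # distant atoms a rate close to zero.
    rate = base_lr * sigmoid(steepness * (1.0 - 2.0 * dist))

    # One-hot target placing all mass on the atom nearest the observed return.
    target = np.zeros_like(probs)
    target[np.argmin(np.abs(support - observed_return))] = 1.0

    # Per-atom interpolation toward the target, then renormalization so the
    # result remains a valid probability distribution.
    new_probs = (1.0 - rate) * probs + rate * target
    return new_probs / new_probs.sum()

# Example: 51 atoms on [-10, 10] (a C51-style support), a uniform prior,
# updated toward an observed return of 2.5.
support = np.linspace(-10.0, 10.0, 51)
probs = np.full(51, 1.0 / 51)
probs = sig_dist_update(probs, support, observed_return=2.5)
```

The final renormalization is the key design choice in this sketch: because each atom moves at its own sigmoid-weighted rate, the interpolated masses no longer sum to one on their own, and dividing by the total restores a valid distribution after every update.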