Performance evaluation and improvement of deep Q network for lunar landing task
Reinforcement learning is now being applied in a growing variety of scenarios, many of which are based on the deep Q network (DQN) algorithm. However, the algorithm's performance is heavily influenced by multiple factors. In this paper, we take the lunar lander as a case study of how various hyper-parameters affect the performance of the DQN algorithm, and based on this analysis we tune the hyper-parameters to obtain a model with better performance. While the known DQN model achieves an average reward of 280+ over 100 test episodes, the model in this article can reach 290+. Its robustness is also tested and verified by introducing additional uncertainty tests into the original problem. In addition, to speed up the training process, imitation learning is incorporated into our model: demonstration data are obtained through a heuristic-function model guidance method, which accelerates training and improves performance. Simulation results demonstrate the effectiveness of this method.
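The imitation-learning idea described above, i.e. using a heuristic controller to generate demonstration data that seeds DQN training, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the `ReplayBuffer`, `heuristic_action`, and the toy one-dimensional lander dynamics are all hypothetical stand-ins for the actual environment and heuristic function.

```python
import random
from collections import deque

class ReplayBuffer:
    """Simple FIFO experience buffer shared by demonstrations and DQN rollouts."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

def heuristic_action(state):
    # Toy stand-in for a lander heuristic: fire the engine when falling fast.
    altitude, vertical_speed = state
    return 1 if vertical_speed < -0.5 else 0

def collect_demonstrations(buffer, episodes=10, steps=50, seed=0):
    """Pre-fill the replay buffer with heuristic-generated transitions."""
    rng = random.Random(seed)
    for _ in range(episodes):
        state = (rng.uniform(0.5, 1.0), 0.0)  # (altitude, vertical speed)
        for _ in range(steps):
            action = heuristic_action(state)
            alt, vel = state
            # Toy dynamics: gravity pulls down, the engine pushes up.
            vel += 0.1 if action == 1 else -0.1
            alt = max(0.0, alt + vel * 0.1)
            next_state = (alt, vel)
            reward = -abs(vel)  # placeholder: penalize high descent speed
            buffer.add((state, action, reward, next_state))
            state = next_state
            if alt == 0.0:
                break

buffer = ReplayBuffer(capacity=5000)
collect_demonstrations(buffer)
batch = buffer.sample(32)  # DQN training would draw mini-batches like this
print(len(batch))  # 32
```

A DQN trainer would then sample mini-batches from this pre-filled buffer from the very first update, so early gradient steps already reflect reasonable behavior instead of purely random exploration.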