In reinforcement learning, the agent encodes the state sequence and conditions action selection on historical information, typically by employing a recurrent neural network. Such traditional methods suffer from gradient problems such as vanishing and exploding gradients, and they also struggle with long sequences. The Transformer leverages self-attention to assimilate long-range information. However, the conventional Transformer is unstable and complex to train in reinforcement learning. The Gated Transformer-XL (GTrXL) improves Transformer training stability, but remains complex. To address these problems, in this article we propose a prob-sparse attention gated Transformer (PS-GTr) model, which introduces a prob-sparse attention mechanism on top of the identity map reordering and gating mechanism of GTrXL, reducing time and space complexity and further improving training efficiency. Experimental verification showed that PS-GTr achieves performance comparable to GTrXL on reinforcement learning tasks, with lower training time and memory usage.
Keywords: deep reinforcement learning; self-attention; prob-sparse attention
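
The prob-sparse attention mechanism mentioned above is not detailed in this section; the following is a minimal illustrative sketch, assuming the Informer-style formulation in which only the top-u queries, ranked by a max-minus-mean sparsity score estimated on a sampled subset of keys, attend over all keys, while the remaining queries fall back to the mean of the values. The function and parameter names here are illustrative and not taken from the paper.

```python
import math
import torch
import torch.nn.functional as F

def prob_sparse_attention(q, k, v, factor=5):
    """Illustrative prob-sparse attention (Informer-style sketch).

    q, k, v: tensors of shape (batch, length, d_model).
    Only the top-u "active" queries attend over all keys; the
    remaining "lazy" queries fall back to the mean of the values.
    """
    B, L, D = q.shape
    scale = D ** -0.5
    u = min(L, max(1, int(factor * math.log(L))))

    # Estimate each query's sparsity score on a random subset of keys,
    # so the measurement stays cheaper than full L x L attention.
    sample_idx = torch.randint(0, L, (u,))
    sampled = torch.matmul(q, k[:, sample_idx, :].transpose(-2, -1)) * scale
    sparsity = sampled.max(dim=-1).values - sampled.mean(dim=-1)  # (B, L)

    # Default output for lazy queries: mean of the values.
    out = v.mean(dim=1, keepdim=True).expand(B, L, D).clone()

    # Full softmax attention only for the u most active queries.
    top_idx = sparsity.topk(u, dim=-1).indices                 # (B, u)
    gather_idx = top_idx.unsqueeze(-1).expand(-1, -1, D)       # (B, u, D)
    q_top = torch.gather(q, 1, gather_idx)
    attn = F.softmax(torch.matmul(q_top, k.transpose(-2, -1)) * scale, dim=-1)
    out.scatter_(1, gather_idx, torch.matmul(attn, v))
    return out

# Toy usage: self-attention over a short sequence of per-timestep embeddings.
x = torch.randn(2, 64, 32)
y = prob_sparse_attention(x, x, x)
print(y.shape)  # torch.Size([2, 64, 32])
```

Because only about u = c * ln(L) queries receive full attention, the dominant cost drops from O(L^2) toward O(L log L) in sequence length, which is the source of the time and memory savings the abstract claims over GTrXL's dense attention.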