Attention-based Recurrent PPO Algorithm and Its Application
A proximal policy optimization model based on an attention mechanism and a recurrent neural network (ARPPO) is proposed to address the problems that deep reinforcement learning algorithms face in partially observable environments, such as insufficient information about the environment and random factors. The algorithm first processes the encoded environment images through convolutional layers, then highlights the key information in the state using an attention mechanism, then extracts the temporal characteristics of the data with an LSTM network, and finally learns the policy using PPO with an Actor-Critic structure. Ablation and comparative experiments on two exploration tasks were designed in the Gym-Minigrid environment. The experimental results show that ARPPO trains faster and more stably than A2C, PPO, and RPPO, and adapts better to unknown environments with random factors.
deep reinforcement learning; partially observable; attention mechanism; LSTM network; proximal policy optimization algorithm
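
Two of the steps named in the abstract, attention weighting of state features and the PPO policy update, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The abstract does not specify the form of the attention mechanism, so a softmax weighting over per-position scores is assumed here, and the function names (`softmax`, `attention_pool`, `ppo_clip_objective`) and the clipping range `eps=0.2` are hypothetical choices for illustration. The convolutional and LSTM stages are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, scores):
    """Assumed attention form: weight each feature vector by its
    softmax attention weight and sum the weighted vectors."""
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Mean of the standard PPO clipped surrogate term
    min(r * A, clip(r, 1 - eps, 1 + eps) * A).
    ratios: new-policy / old-policy action probabilities.
    advantages: advantage estimates (from the critic in an Actor-Critic setup)."""
    terms = []
    for r, a in zip(ratios, advantages):
        clipped = max(1.0 - eps, min(1.0 + eps, r))
        terms.append(min(r * a, clipped * a))
    return sum(terms) / len(terms)
```

In this sketch, a probability ratio of 1.5 with a positive advantage is clipped to 1.2, which limits how far a single update can move the policy. With a negative advantage the unclipped term is the smaller one and is kept, so the objective stays a pessimistic bound.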