HEURISTIC ACCELERATED DEEP Q NETWORK BASED ON COGNITIVE ACTION MODEL
As the state-action space expands or rewards become sparse in complex environments, it becomes increasingly difficult for a reinforcement learning agent to learn an optimal policy from scratch. To address this, a heuristic accelerated deep Q network based on a cognitive behavior model is proposed. It incorporates symbolic rules into the learning network and guides policy learning dynamically, thereby accelerating the agent's learning. The algorithm models heuristic knowledge as a BDI (belief-desire-intention)-based cognitive behavior model, which generates cognitive behavior knowledge to guide the agent's policy learning. A heuristic strategy network is designed to guide the agent's action selection online. Experiments in typical Gym environments and in the StarCraft Ⅱ environment show that the algorithm can dynamically extract effective cognitive behavior knowledge as the environment changes and, with the help of the heuristic strategy network, accelerate the convergence of the agent's policy.
Reinforcement learning; Cognitive behavior model; Heuristic accelerated deep Q network
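
The idea of letting symbolic rules bias action selection can be illustrated with a minimal sketch. It is not the proposed algorithm: it assumes the classic heuristically accelerated form, in which a heuristic bonus weighted by a factor is added to each Q-value before the greedy choice, and it replaces the heuristic strategy network with a hand-written rule table. The names `rule_heuristic`, `heuristic_action`, `xi`, `bonus`, and the belief `enemy_near` are hypothetical and chosen only for illustration.

```python
import random


def rule_heuristic(beliefs, rules, n_actions, bonus=1.0):
    """Turn symbolic rules into a per-action heuristic bonus.

    beliefs   : dict mapping a belief name to True/False (assumed format)
    rules     : list of (belief_name, action_index) pairs; a rule fires
                when its belief is True and then recommends its action
    n_actions : size of the discrete action space
    bonus     : value added for each rule that fires (hypothetical parameter)
    """
    h = [0.0] * n_actions
    for belief_name, action in rules:
        if beliefs.get(belief_name, False):
            h[action] += bonus
    return h


def heuristic_action(q_values, heuristic, xi=1.0, epsilon=0.0, rng=random):
    """Epsilon-greedy choice over Q-values plus a weighted heuristic bonus.

    q_values  : list of floats, one per action (e.g. a Q network's output)
    heuristic : list of floats of the same length
    xi        : weight of the heuristic term (assumption, not from the paper)
    epsilon   : probability of choosing a uniformly random action
    """
    if len(q_values) != len(heuristic):
        raise ValueError("q_values and heuristic must have the same length")
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    scores = [q + xi * h for q, h in zip(q_values, heuristic)]
    # Index of the highest combined score; ties go to the lowest index.
    return max(range(len(scores)), key=scores.__getitem__)


# Usage: a single rule recommends action 2 whenever "enemy_near" holds.
rules = [("enemy_near", 2)]
q = [0.1, 0.5, 0.2]
h = rule_heuristic({"enemy_near": True}, rules, n_actions=3)
action = heuristic_action(q, h, xi=1.0)
```

Without the heuristic term the greedy choice follows the Q-values alone. With it, a firing rule can override an immature Q estimate early in training, and its influence can be reduced by lowering `xi` as learning progresses.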