Option Discovery Method Based on Symbolic Knowledge
Hierarchical strategy learning based on options is a prominent approach in hierarchical reinforcement learning. Options represent temporal abstractions over primitive actions, and a set of options can be combined hierarchically to tackle complex reinforcement learning tasks. For option discovery, existing research has focused on extracting meaningful options from unstructured demonstration trajectories using supervised or unsupervised methods. However, supervised option discovery requires manual task decomposition and option policy definition, imposing a substantial additional burden, while options discovered by unsupervised methods often lack rich semantics, limiting their subsequent reuse. This paper therefore proposes a symbolic-knowledge-based option discovery method that only requires modeling the symbolic knowledge of the environment. The acquired knowledge can guide option discovery for various tasks in the environment and assign symbolic semantics to the discovered options, enabling their reuse in new tasks. The method decomposes option discovery into two stages: trajectory segmentation and behavior cloning. Trajectory segmentation extracts semantically meaningful segments from demonstration trajectories; to this end, a segmentation model is trained on the demonstrations by reinforcement learning, with symbolic knowledge used to define segmentation accuracy in the reward evaluation. Behavior cloning then trains options on the segmented data under supervision, so that each option imitates the behavior of its segment. The proposed method is evaluated through option discovery and option reuse experiments in multiple domain environments, covering both discrete and continuous spaces. In the option discovery experiments, the trajectory segmentation results show that the proposed method achieves higher segmentation accuracy than the baseline method, with an improvement of several percentage points in both discrete and continuous environments and a further improvement of 20% on complex environment tasks. The option reuse experiments demonstrate that options enriched with symbolic semantics adapt to new tasks faster than the baseline method, and that they converge well even on complex tasks that the baseline method fails to accomplish.
Keywords: Hierarchical reinforcement learning; Demonstration learning; Option discovery; Markov decision process
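To make the two-stage pipeline concrete, the sketch below illustrates how symbolic knowledge could define a segmentation reward and how each resulting segment could be behavior-cloned into a labeled option. This is a minimal illustration under stated assumptions, not the paper's implementation: the predicate set, the reward constants, and the helper names (symbolic_state, segment_reward, behavior_clone, discover_options) are all hypothetical, and the RL-trained segmentation policy that would propose the cut points is omitted; only the reward it would be trained against is shown.

```python
# Sketch: symbolic-knowledge-guided option discovery (hypothetical names throughout).
import numpy as np
import torch
import torch.nn as nn

def symbolic_state(obs, predicates):
    """Evaluate each symbolic predicate on a raw observation.
    `predicates` is an assumed list of callables obs -> bool."""
    return tuple(bool(p(obs)) for p in predicates)

def segment_reward(traj_obs, cut_points, predicates):
    """RL reward for a proposed segmentation: +1 for each cut placed where the
    symbolic state changes, -0.1 for each spurious cut (constants are assumptions)."""
    changes = {t for t in range(1, len(traj_obs))
               if symbolic_state(traj_obs[t], predicates)
               != symbolic_state(traj_obs[t - 1], predicates)}
    return sum(1.0 if c in changes else -0.1 for c in cut_points)

class OptionPolicy(nn.Module):
    """Small MLP option policy over discrete actions, trained by behavior cloning."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, obs):
        return self.net(obs)

def behavior_clone(segment, obs_dim, n_actions, epochs=200, lr=1e-3):
    """Fit one option policy to one (state, action) trajectory segment."""
    obs = torch.tensor(np.array([s for s, _ in segment]), dtype=torch.float32)
    act = torch.tensor([a for _, a in segment], dtype=torch.long)
    policy = OptionPolicy(obs_dim, n_actions)
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(policy(obs), act).backward()
        opt.step()
    return policy

def discover_options(trajectory, cut_points, predicates, obs_dim, n_actions):
    """Split a demonstration at the given cut points and clone one option per
    segment, labeling it with the symbolic transition (pre, post) it realizes."""
    bounds = [0] + sorted(cut_points) + [len(trajectory)]
    options = []
    for lo, hi in zip(bounds, bounds[1:]):
        segment = trajectory[lo:hi]
        label = (symbolic_state(segment[0][0], predicates),    # symbolic precondition
                 symbolic_state(segment[-1][0], predicates))   # symbolic effect
        options.append((label, behavior_clone(segment, obs_dim, n_actions)))
    return options
```

Under these assumptions, the (pre, post) symbolic label attached to each option is what enables reuse: when a new task requires a particular symbolic transition, the matching option can be retrieved and invoked directly instead of being learned from scratch.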