When several controllable events (control commands) are simultaneously eligible for execution, a supervisor in the framework of discrete-event systems (DESs) selects one of them at random. In practical applications such as traffic scheduling and robot path planning, however, directed control and numerical optimization must also be considered. This paper introduces an optimization mechanism to quantify the control cost and combines supervisory control theory (SCT) with reinforcement learning. A systematic procedure is proposed to synthesize the optimal directed supervisor of a DES based on reinforcement learning, so that the controlled system achieves the following three goals: (1) the control specifications concerning safety and liveness are not violated; (2) at most one controllable event can be executed at each state; (3) the cumulative cost of event execution from the initial state to a marked state is minimal. First, given the automaton models of the plant and the specifications, the target automaton model is obtained by the synchronous operation of these two models, and a cost function is defined that assigns an execution cost to each event in the target model. Second, the nonblocking and maximally permissive supervisor is synthesized by SCT. Finally, the supervisor is transformed into a Markov decision process, and the Q-learning algorithm is utilized to compute the optimal directed supervisor. Two applications are used to verify the effectiveness and correctness of the proposed method. The simulation results show that the proposed method realizes directed control of the system while minimizing the numerical cost of the directed supervisor.
Key words
discrete-event system/directed supervisor/reinforcement learning/optimal control/numerical optimization/traffic systems
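The final step of the procedure — treating the synthesized supervisor as a Markov decision process and running Q-learning over it so that exactly one controllable event is enabled per state at minimal cumulative cost — can be illustrated with a minimal tabular sketch. The automaton, event costs, and marked state below are hypothetical toy values, not taken from the paper; the update rule minimizes cost directly by using `min` in place of the usual `max`.

```python
import random

# Hypothetical supervisor automaton: state -> {event: (next_state, cost)}.
# States, events, and costs are illustrative only.
TRANS = {
    0: {"a": (1, 2.0), "b": (2, 1.0)},
    1: {"c": (3, 1.0)},
    2: {"d": (3, 3.0)},
}
MARKED = {3}  # marked (goal) state

def q_learning(episodes=2000, alpha=0.5, gamma=1.0, eps=0.2, seed=0):
    """Tabular Q-learning that minimizes cumulative event-execution cost."""
    rng = random.Random(seed)
    Q = {s: {e: 0.0 for e in acts} for s, acts in TRANS.items()}
    for _ in range(episodes):
        s = 0  # initial state
        while s not in MARKED:
            acts = list(TRANS[s])
            # Epsilon-greedy exploration; greedy action is the cheapest one.
            if rng.random() < eps:
                e = rng.choice(acts)
            else:
                e = min(acts, key=lambda a: Q[s][a])
            s2, cost = TRANS[s][e]
            # Cost-minimizing Bellman target (min instead of max).
            target = cost + (0.0 if s2 in MARKED else gamma * min(Q[s2].values()))
            Q[s][e] += alpha * (target - Q[s][e])
            s = s2
    # Directed supervisor: enable at most one controllable event per state.
    return {s: min(acts, key=lambda a: Q[s][a]) for s, acts in Q.items()}

policy = q_learning()
```

In this toy model the path a→c reaches the marked state with cumulative cost 3, while b→d costs 4, so the learned directed supervisor enables event `a` at the initial state even though `b` is locally cheaper — the point of optimizing the cumulative rather than the immediate cost.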