Adaptive critic technology has been widely employed to solve the optimal control problems of complicated nonlinear systems, but it has limitations in solving the infinite-horizon optimal problems of discrete-time nonlinear stochastic systems. In this paper, we establish a data-driven discounted optimal regulation method for discrete-time stochastic systems using adaptive critic technology. First, we investigate the infinite-horizon optimal problems with a discount factor for stochastic systems under a relaxed assumption. The developed stochastic Q-learning algorithm can optimize an initial admissible policy to the optimal one in a monotonically nonincreasing way. Based on the data-driven idea, the policy optimization of the stochastic Q-learning algorithm is executed without a dynamic model. Then, the stochastic Q-learning algorithm is implemented by utilizing actor-critic neural networks. Finally, two nonlinear benchmarks are given to demonstrate the overall performance of the developed stochastic Q-learning algorithm.
Key words
Adaptive critic design/data-driven/discrete-time systems/neural networks/Q-learning/stochastic optimal control