
Decoupling control method based on multi-agent deep reinforcement learning

[Objective] In modern industrial production, the decoupling control of complex nonlinear multi-input multi-output (MIMO) systems is of vital importance to the operation and optimization of the production process. [Methods] Based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm, this paper proposes a design scheme for the decoupling control of complex nonlinear MIMO systems, and its effectiveness is verified through decoupling-control simulations of a continuous stirred tank reaction process. [Results] The results show that the proposed scheme can simultaneously track and regulate the setpoints of two controlled variables in the continuous stirred tank reaction process, namely the reaction temperature and the product molar flow rate, and that under the same control objectives it achieves better stability and smaller steady-state control errors than both a single-agent scheme and a PID (proportional-integral-derivative) control scheme. [Conclusions] The simulation results indicate that, for the decoupling control of complex nonlinear MIMO systems, the multi-agent reinforcement learning algorithm can achieve decoupling control without relying on a process model while ensuring good control performance.
[Objective] Modern industrial processes typically couple nonlinear dynamics with multiple inputs and multiple outputs (MIMO). To ensure both the optimality and the simplicity of process control, all key process variables must be effectively decoupled in control. However, because of the difficulties of process modeling and system analysis, achieving such decoupling in MIMO systems remains a challenge.

[Methods] Intelligent optimization based on multi-agent reinforcement learning (MARL) can learn and optimize control policies without relying on any process knowledge. In addition, the control objectives can be set so that each agent is optimized independently and collaboratively. These features of MARL can be applied to the decoupling control of complex nonlinear MIMO processes. This paper proposes a decoupling control system design method based on the MARL algorithm. In this method, the classical loop-gain maximization principle is used to pair process inputs and outputs into the corresponding decoupled control loops. Each control loop is controlled and optimized by an agent with its own control objective and state feedback information, and a reward function is designed for each agent based on the objective of its control loop. Finally, the multi-agent deep deterministic policy gradient (MADDPG) algorithm is used to train the agents, yielding the optimal decoupling control strategy.

[Results] To demonstrate the effectiveness of the proposed design scheme, the reaction process in a continuous stirred tank reactor (CSTR) is modeled on the Aspen software platform, taking the hydrogenation of aniline to cyclohexylamine as an example. For this process, to guarantee product quality, the residence time of the reactants in the reactor and the reaction temperature must be regulated according to the optimal production conditions. In other words, two decoupled control loops need to be designed: one controls the reaction temperature in the reactor by adjusting the coolant flow in the jacket, and the other controls the residence time of the reactants by adjusting the feed flow rate. Because both the residence time and the reaction temperature directly affect product quality, this reaction process poses a decoupling control problem for a two-input two-output nonlinear system. Based on the Aspen model of the process, controllers are designed under three schemes, namely a PID control scheme, a single-agent reinforcement learning scheme, and the MARL scheme, and control system simulations are conducted on the same Aspen model. Experimental results show that, under the same control objectives and training times, compared with the single-agent reinforcement learning and PID control schemes, the average control error of the reaction temperature under MARL control is reduced by 79% and 53%, respectively, and the average control error in the molar concentration of the product is reduced by 22% and 76%, respectively. These simulation results illustrate that the proposed MARL design scheme can independently control and adjust the reaction temperature and the reaction residence time in the process, resulting in high control performance.

[Conclusions] This paper reformulates the decoupling control of complex nonlinear multi-input multi-output systems as a multi-agent self-optimization problem and develops the MADDPG algorithm to design the decoupling control system. Compared with other decoupling control methods, the proposed one does not depend on the process model and offers more design flexibility, thus ensuring better control performance and competently serving as an advanced decoupling control scheme based on artificial intelligence technology.
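The MADDPG structure described above can be illustrated with a minimal sketch: each agent's actor acts only on its own loop's observations (decentralized execution), while each agent's critic scores the joint observations and actions of all agents (centralized training). Plain linear maps stand in for the deep networks here, and the two reward functions mirror the abstract's two loops; all dimensions, setpoints (390 K, 2.5 kmol/h), and scales are illustrative assumptions, not values from the paper.

```python
import numpy as np

class MADDPGAgent:
    """Skeleton of one MADDPG agent: decentralized actor, centralized critic.
    Linear maps replace the deep networks purely for illustration."""

    def __init__(self, obs_dim, act_dim, joint_dim, rng):
        self.W_actor = 0.1 * rng.normal(size=(act_dim, obs_dim))
        self.w_critic = 0.1 * rng.normal(size=joint_dim)

    def act(self, obs):
        # Deterministic policy: each agent sees only its own loop's state.
        return np.tanh(self.W_actor @ obs)

    def q_value(self, joint):
        # Centralized critic: scores the joint observation-action vector.
        return float(self.w_critic @ joint)

# Per-loop tracking rewards (negative scaled absolute error; setpoints assumed).
def reward_temperature(T, T_sp=390.0):
    # Agent 1: reaction temperature loop, actuated by jacket coolant flow.
    return -abs(T - T_sp) / 10.0

def reward_flow(F, F_sp=2.5):
    # Agent 2: product molar flow / residence-time loop, actuated by feed rate.
    return -abs(F - F_sp) / 0.5

rng = np.random.default_rng(0)
obs_dims, act_dims = [3, 3], [1, 1]          # one agent per control loop
joint_dim = sum(obs_dims) + sum(act_dims)    # critics see all obs and actions
agents = [MADDPGAgent(o, a, joint_dim, rng) for o, a in zip(obs_dims, act_dims)]

observations = [rng.normal(size=d) for d in obs_dims]
actions = [ag.act(o) for ag, o in zip(agents, observations)]
joint = np.concatenate(observations + actions)
q_values = [ag.q_value(joint) for ag in agents]
```

During training, each critic would be regressed toward a temporal-difference target built from its agent's reward, and each actor updated along its critic's gradient; at deployment only the actors are kept, so each loop runs on local feedback alone.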

multi-agent reinforcement learning; decoupling control; deep deterministic policy gradient; continuous stirred tank reactor; nonlinear multi-input multi-output systems

XIAO Zhongyu, XIA Zhongsheng, HONG Wenjing, SHI Jia


College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, Fujian, China

Gulei Petrochemical Research Institute, Xiamen University, Zhangzhou 363123, Fujian, China


2024

Journal of Xiamen University (Natural Science)
Publisher: Xiamen University


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.449
ISSN:0438-0479
Year, volume (issue): 2024, 63(3)