
Design of an Adaptive Nuclear Reactor Power Controller Based on the Deep Deterministic Policy Gradient Algorithm

Nuclear power plants require numerous control systems to achieve effective control and safe operation. The reactor core is the key component in which the radioactive nuclear fuel generates heat, and reactor power control is directly related to the safety and economy of plant operation. To address the difficulty that a traditional PID controller has in accurately handling nonlinear power control over a wide power range, this study derived and established a reactor core model for a pressurized water reactor nuclear power plant and carried out power-control simulations with an adaptive controller that combines a policy-gradient-based deep reinforcement learning method with a PID controller. The simulation results show that, compared with the traditional PID controller, the designed adaptive power controller based on the deep deterministic policy gradient algorithm responds faster, achieves higher control accuracy and stability, and is highly robust; it can control the core power accurately and quickly and track load changes.
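The core model itself is not reproduced on this page. As an illustration only, the following is a minimal sketch of the kind of simulation environment such a controller could be trained against, assuming a standard one-delayed-group point-kinetics model with lumped fuel/coolant heat transfer and temperature reactivity feedback. The function name `core_step` and all coefficient values are assumptions, not the paper's actual model.

```python
# Minimal sketch of a reactor core model for power-control simulation.
# Assumption: one-delayed-group point kinetics with lumped fuel/coolant
# heat transfer and temperature reactivity feedback; all parameter values
# are illustrative, not taken from the paper.
import numpy as np

BETA, LAMBDA_GEN, LAMBDA_D = 0.0065, 1e-4, 0.08   # delayed fraction, generation time, decay constant
ALPHA_F, ALPHA_C = -3e-5, -2e-5                   # fuel / coolant reactivity coefficients (1/degC)

def core_step(state, rho_rod, dt=0.01):
    """Advance (relative power n, precursor c, fuel T, coolant T) by one explicit Euler step."""
    n, c, t_fuel, t_cool = state
    rho = rho_rod + ALPHA_F * (t_fuel - 600.0) + ALPHA_C * (t_cool - 300.0)   # total reactivity
    dn = ((rho - BETA) / LAMBDA_GEN) * n + LAMBDA_D * c      # neutron dynamics (point kinetics)
    dc = (BETA / LAMBDA_GEN) * n - LAMBDA_D * c              # delayed-neutron precursors
    dtf = 5.0 * n - 0.1 * (t_fuel - t_cool)                  # lumped fuel heat balance (illustrative)
    dtc = 0.1 * (t_fuel - t_cool) - 0.05 * (t_cool - 290.0)  # lumped coolant heat balance
    return np.array([n + dn * dt, c + dc * dt, t_fuel + dtf * dt, t_cool + dtc * dt])
```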
Design of Self-adaption Nuclear Reactor Power Controller Based on Deep Deterministic Policy Gradient Algorithm
Nuclear power plants require numerous control systems to achieve effective control and safe operation. The reactor core is the key component in which the radioactive nuclear fuel generates heat, and reactor power control is directly related to the safety and economy of plant operation, so optimizing the design of the reactor power controller is of great significance. In a conventional design, the control parameters of the PID (proportional-integral-derivative) controller are fixed in advance, which leaves room to improve its control performance. To address the difficulty that a traditional PID controller has in accurately handling nonlinear power control over a wide power range, this study derived and established a reactor core model for a pressurized water reactor nuclear power plant. The core model includes the heat transfer equations, the neutron dynamics equations, and the reactivity equation. An adaptive controller that combines policy-gradient-based deep reinforcement learning (the deep deterministic policy gradient algorithm) with a PID controller was used to simulate power control, and a reward function was constructed. The reward function represents the joint optimization of several control evaluation indices, such as response time, settling time, control accuracy, overshoot, and oscillation. By interacting with the core model in real time, the deep deterministic policy gradient algorithm learns a policy that optimizes the PID control parameters online. Several operating conditions with different power levels and different power-switching modes were tested. The simulation results show that, in the 100%FP-90%FP step power-reduction process (the training condition), the adaptive power controller designed with the deep deterministic policy gradient algorithm responds faster and achieves higher control accuracy and stability than the traditional PID controller. Under the test conditions of the 40%FP-30%FP step power reduction, 90%FP-100%FP step power increase, 30%FP-40%FP step power increase, 100%FP-30%FP linear power reduction, and 30%FP-100%FP linear power increase, the control performance of the designed adaptive controller is also significantly better than that of the traditional PID controller, which indicates that the controller designed by this method is highly robust and can accurately map the reactor power-variation information to the optimal PID control parameters. The proposed method can control the core power accurately and quickly and track load changes.
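To make the control scheme described above concrete, here is a sketch of how a DDPG agent could be wired to tune PID gains online against the core model: the agent observes the power-tracking state, outputs the PID gains at each step, the PID output drives rod reactivity, and the reward penalizes tracking error, overshoot, and oscillation. The agent interface (`agent.act`, `agent.store`, `agent.learn`), the function names, and the reward weights are assumptions for illustration; any DDPG implementation with continuous actions could fill in the agent.

```python
# Sketch of the adaptive control loop: a DDPG agent outputs PID gains each step,
# the PID controller drives rod reactivity in the core model, and the reward
# combines tracking-error, overshoot, and oscillation penalties.
# All names and weights here are illustrative, not the paper's implementation.
import numpy as np

def pid_output(err, err_int, err_prev, gains, dt=0.01):
    """PID control law; gains (kp, ki, kd) are supplied by the RL agent each step."""
    kp, ki, kd = gains
    return kp * err + ki * err_int + kd * (err - err_prev) / dt

def reward(err, err_prev, overshoot, dt=0.01):
    """Penalize tracking error, overshoot, and oscillation (rate of change of the error)."""
    return -(abs(err) + 0.5 * overshoot + 0.1 * abs(err - err_prev) / dt)

def run_episode(agent, core_step, target_power, n_steps=5000, dt=0.01):
    state = np.array([1.0, 812.5, 600.0, 300.0])        # relative power, precursors, fuel T, coolant T
    err_int, err_prev, peak = 0.0, 0.0, 0.0
    for _ in range(n_steps):
        err = target_power - state[0]
        obs = np.array([err, err_int, err_prev, state[0]])
        gains = agent.act(obs)                           # DDPG actor -> (kp, ki, kd)
        rho_rod = np.clip(pid_output(err, err_int, err_prev, gains, dt), -0.001, 0.001)
        next_state = core_step(state, rho_rod, dt)
        peak = max(peak, next_state[0] - target_power)   # running overshoot used in the reward
        r = reward(err, err_prev, max(peak, 0.0), dt)
        next_obs = np.array([target_power - next_state[0], err_int + err * dt, err, next_state[0]])
        agent.store(obs, gains, r, next_obs)             # replay buffer
        agent.learn()                                    # one actor/critic gradient step
        err_int, err_prev, state = err_int + err * dt, err, next_state
```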

power control; reinforcement learning; deep learning; self-adaption controller

刘永超、李桐、成以恒、王博、高璞珍、谭思超、田瑞峰


Heilongjiang Provincial Key Laboratory of Nuclear Power Plant Performance and Equipment, Harbin Engineering University, Harbin 150001, Heilongjiang, China

Key Laboratory of Nuclear Safety and Advanced Nuclear Energy Technology, Ministry of Industry and Information Technology, Harbin Engineering University, Harbin 150001, Heilongjiang, China


Funding: CNNC Leading Innovation Project (CNNC-LCKY-202245, CNNC-LCKY-202251); Fundamental Research Funds for the Central Universities (3072022JC2401)

2024

Atomic Energy Science and Technology
China Institute of Atomic Energy


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.372
ISSN:1000-6931
Year, Volume (Issue): 2024, 58(5)