Training effective deep reinforcement learning agents for real-time life-cycle production optimization
Full text source: NSTL
Life-cycle production optimization aims to obtain the optimal well control scheme at each control time step to maximize financial profit and hydrocarbon production. However, searching for the optimal policy with a limited number of simulation evaluations is a challenging task. In this paper, a novel production optimization method is presented, which maximizes the net present value (NPV) over the entire life cycle and achieves real-time adjustment of the well control scheme. The proposed method models life-cycle production optimization as a finite-horizon Markov decision process (MDP), in which the well control scheme can be viewed as a sequence of decisions. Soft actor-critic, a state-of-the-art model-free deep reinforcement learning (DRL) algorithm, is then used to train DRL agents that solve this MDP. The DRL agent strives to maximize both the long-term NPV reward and the randomness of the control scheme by training a stochastic policy that maps reservoir states to well control variables, together with an action-value function that estimates the objective value of the current policy. Since the trained policy is an explicit function, the DRL agent can adjust the well control scheme in real time under different reservoir states. Unlike most existing methods, which introduce task-specific sensitive parameters or construct complex supplementary structures, the DRL agent learns adaptively by executing goal-directed interactions with an uncertain reservoir environment and by exploiting accumulated well control experience, which is similar to the actual field well control mode. The key insight is that the DRL method can exploit gradient information (well-control experience) for higher sample efficiency. Simulation results on two reservoir models indicate that, compared with other optimization methods, the proposed method attains a higher NPV and achieves excellent performance in terms of oil displacement.
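To make the finite-horizon MDP framing concrete, the sketch below shows a toy version of the setup the abstract describes: states stand in for simplified reservoir pressures, actions are well control rates in [0, 1], and the reward is a toy per-step NPV term accumulated with a discount factor. All dynamics, cost coefficients, and the Gaussian stochastic policy here are illustrative assumptions, not the paper's reservoir simulator or the trained soft actor-critic policy.

```python
import numpy as np

class ToyReservoirMDP:
    """Hypothetical finite-horizon MDP: state = per-well pressure, action = control rate."""

    def __init__(self, n_wells=3, horizon=10):
        self.n_wells = n_wells
        self.horizon = horizon
        self.reset()

    def reset(self):
        self.t = 0
        self.state = np.ones(self.n_wells)  # normalized reservoir pressure per well
        return self.state.copy()

    def step(self, action):
        action = np.clip(action, 0.0, 1.0)           # well control variables in [0, 1]
        oil_rate = self.state * action               # produced oil scales with pressure
        self.state = self.state * (1.0 - 0.05 * action)  # pressure depletes as oil is produced
        reward = float(np.sum(oil_rate) - 0.2 * np.sum(action))  # toy NPV: revenue minus control cost
        self.t += 1
        return self.state.copy(), reward, self.t >= self.horizon

def gaussian_policy(state, rng):
    """Stochastic policy sketch: mean control rises as pressure falls, plus exploration noise."""
    mean = 1.0 - 0.5 * state
    return np.clip(mean + 0.1 * rng.standard_normal(state.shape), 0.0, 1.0)

def rollout(env, discount=0.95, seed=1):
    """Run one episode and accumulate the discounted toy NPV."""
    rng = np.random.default_rng(seed)
    state, npv, done, k = env.reset(), 0.0, False, 0
    while not done:
        action = gaussian_policy(state, rng)
        state, reward, done = env.step(action)
        npv += discount**k * reward
        k += 1
    return npv

env = ToyReservoirMDP()
print(f"discounted toy NPV over one episode: {rollout(env):.3f}")
```

In the actual method, the hand-coded `gaussian_policy` above would be replaced by a neural-network policy trained with soft actor-critic against a reservoir simulator; because that policy is an explicit function of the state, re-evaluating it at each control step is what enables real-time scheme adjustment.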
Keywords: Production optimization; Deep reinforcement learning; Optimal control; Goal-directed interaction; Model free
Kai Zhang, Zhongzheng Wang, Guodong Chen
Oil and Gas Development Engineering Institute,School of Petroleum Engineering,China University of Petroleum,Qingdao,China