Training effective deep reinforcement learning agents for real-time life-cycle production optimization
Full text source: NSTL
Life-cycle production optimization aims to obtain the optimal well control scheme at each control time step to maximize financial profit and hydrocarbon production. However, searching for the optimal policy with a limited number of simulation evaluations is a challenging task. In this paper, a novel production optimization method is presented, which maximizes the net present value (NPV) over the entire life cycle and achieves real-time adjustment of the well control scheme. The proposed method models life-cycle production optimization as a finite-horizon Markov decision process (MDP), in which the well control scheme can be viewed as a sequence of decisions. Soft actor-critic, a state-of-the-art model-free deep reinforcement learning (DRL) algorithm, is then used to train DRL agents that solve this MDP. The DRL agent strives to maximize both the long-term NPV reward and the randomness of the control scheme by training a stochastic policy that maps reservoir states to well control variables, together with an action-value function that estimates the objective value of the current policy. Since the trained policy is an explicit function, the DRL agent can adjust the well control scheme in real time under different reservoir states. Unlike most existing methods, which introduce task-specific sensitive parameters or construct complex supplementary structures, the DRL agent learns adaptively by executing goal-directed interactions with an uncertain reservoir environment and by exploiting accumulated well control experience, which is similar to the actual field well control mode. The key insight is that the DRL method can exploit gradient information (well-control experience) for higher sample efficiency. Simulation results on two reservoir models indicate that, compared with other optimization methods, the proposed method attains a higher NPV and achieves excellent performance in terms of oil displacement.
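To make the finite-horizon MDP framing concrete, the sketch below shows a toy version of the setup the abstract describes: states stand in for simplified reservoir pressures, actions are well control rates in [0, 1], and the reward is a toy per-step NPV term accumulated with a discount factor. All dynamics, cost coefficients, and the Gaussian stochastic policy here are illustrative assumptions, not the paper's reservoir simulator or the trained soft actor-critic policy.

```python
import numpy as np

class ToyReservoirMDP:
    """Hypothetical finite-horizon MDP: state = per-well pressure, action = control rate."""

    def __init__(self, n_wells=3, horizon=10):
        self.n_wells = n_wells
        self.horizon = horizon
        self.reset()

    def reset(self):
        self.t = 0
        self.state = np.ones(self.n_wells)  # normalized reservoir pressure per well
        return self.state.copy()

    def step(self, action):
        action = np.clip(action, 0.0, 1.0)           # well control variables in [0, 1]
        oil_rate = self.state * action               # produced oil scales with pressure
        self.state = self.state * (1.0 - 0.05 * action)  # pressure depletes as oil is produced
        reward = float(np.sum(oil_rate) - 0.2 * np.sum(action))  # toy NPV: revenue minus control cost
        self.t += 1
        return self.state.copy(), reward, self.t >= self.horizon

def gaussian_policy(state, rng):
    """Stochastic policy sketch: mean control rises as pressure falls, plus exploration noise."""
    mean = 1.0 - 0.5 * state
    return np.clip(mean + 0.1 * rng.standard_normal(state.shape), 0.0, 1.0)

def rollout(env, discount=0.95, seed=1):
    """Run one episode and accumulate the discounted toy NPV."""
    rng = np.random.default_rng(seed)
    state, npv, done, k = env.reset(), 0.0, False, 0
    while not done:
        action = gaussian_policy(state, rng)
        state, reward, done = env.step(action)
        npv += discount**k * reward
        k += 1
    return npv

env = ToyReservoirMDP()
print(f"discounted toy NPV over one episode: {rollout(env):.3f}")
```

In the actual method, the hand-coded `gaussian_policy` above would be replaced by a neural-network policy trained with soft actor-critic against a reservoir simulator; because that policy is an explicit function of the state, re-evaluating it at each control step is what enables real-time scheme adjustment.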
Keywords: Production optimization; Deep reinforcement learning; Optimal control; Goal-directed interaction; Model free
Kai Zhang, Zhongzheng Wang, Guodong Chen
Oil and Gas Development Engineering Institute,School of Petroleum Engineering,China University of Petroleum,Qingdao,China