Dynamic Treatment Regime Generation Model Combining Dead-Ends and Offline Supervised Actor-Critic
Reinforcement learning depends little on explicit mathematical models and can construct and optimize policies directly from experience, which makes it well suited to learning dynamic treatment regimes. However, existing studies still have the following problems: 1) risk is not considered when optimizing the policy, so the learned policy may recommend risky treatment actions; 2) the problem of distribution shift is ignored, so the learned policy can diverge sharply from the physician's policy; 3) the patient's historical observations and treatment history are ignored, so a good representation of the patient's state cannot be obtained and the optimal policy cannot be learned. To address these problems, DOSAC-DTR, a dynamic treatment regime generation model combining dead-ends and offline supervised actor-critic, is proposed. First, to account for the risk of the treatment actions recommended by the learned policy, the concept of dead-ends is integrated into the actor-critic framework. Second, to alleviate distribution shift, physician supervision is incorporated into the actor-critic framework, minimizing the gap between the learned policy and the physicians' policy while maximizing the expected return. Finally, to obtain a state representation that captures key patient historical information, an LSTM-based encoder-decoder is used to model the patient's historical observations and treatment history. Experiments show that DOSAC-DTR outperforms baseline approaches, achieving lower estimated mortality rates and higher Jaccard coefficients.
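To make the first two ideas concrete, the following is a minimal PyTorch sketch of an actor loss that combines dead-end masking with physician supervision. All names (`actor`, `critic`, `dead_end_q`, `lambda_sup`, the threshold value) are illustrative assumptions, not the paper's exact formulation: the actor maximizes the critic's expected Q-value, a cross-entropy term keeps it close to the clinician's logged actions, and actions whose estimated dead-end value is too negative are masked out before the policy distribution is formed.

```python
import torch
import torch.nn.functional as F

def actor_loss(actor, critic, dead_end_q, states, clin_actions,
               lambda_sup=1.0, dead_end_threshold=-0.8):
    """Illustrative offline supervised actor loss with dead-end masking.

    actor(states)      -> action logits,              shape (B, A)
    critic(states)     -> Q-value estimates,          shape (B, A)
    dead_end_q(states) -> dead-end values in [-1, 0], shape (B, A)
    clin_actions       -> clinician's logged actions, shape (B,), long
    """
    logits = actor(states)

    # Mask actions whose dead-end value falls below the (assumed)
    # threshold, i.e. actions estimated to lead to a dead-end with
    # high probability. At least one action per row must survive.
    mask = dead_end_q(states) >= dead_end_threshold
    masked_logits = logits.masked_fill(~mask, float('-inf'))
    probs = F.softmax(masked_logits, dim=-1)

    # RL term: maximize expected Q under the masked policy
    # (written as a quantity to minimize).
    q_values = critic(states)
    rl_term = -(probs * q_values).sum(dim=-1).mean()

    # Supervision term: cross-entropy to the clinician's action,
    # shrinking the gap between the learned and physician policies.
    # Computed on unmasked logits, assuming logged actions are safe.
    sup_term = F.cross_entropy(logits, clin_actions)

    return rl_term + lambda_sup * sup_term
```

Here `lambda_sup` trades off imitation of the clinician against return maximization; the paper's actual weighting scheme and dead-end estimator may differ from this sketch.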
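The third component, the LSTM-based encoder-decoder for state representation, could look roughly like the sketch below. The module structure and dimensions are assumptions: the encoder consumes the concatenated observation and treatment history, the decoder reconstructs the observation sequence so that the encoder is trained to retain clinically relevant history, and the final encoder hidden state serves as the state fed to the actor and critic.

```python
import torch
import torch.nn as nn

class HistoryEncoder(nn.Module):
    """Illustrative LSTM encoder-decoder that summarizes a patient's
    observation and treatment history into a fixed-size state."""

    def __init__(self, obs_dim, action_dim, hidden_dim=64):
        super().__init__()
        self.encoder = nn.LSTM(obs_dim + action_dim, hidden_dim,
                               batch_first=True)
        # The decoder reconstructs the observation sequence; its
        # reconstruction loss trains the encoder representation.
        self.decoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.readout = nn.Linear(hidden_dim, obs_dim)

    def forward(self, obs_seq, action_seq):
        # obs_seq: (B, T, obs_dim); action_seq: (B, T, action_dim)
        x = torch.cat([obs_seq, action_seq], dim=-1)
        enc_out, (h, _) = self.encoder(x)
        recon, _ = self.decoder(enc_out)
        pred_obs = self.readout(recon)
        # h[-1] (B, hidden_dim) is the state representation passed
        # to the actor and critic; pred_obs drives the recon loss,
        # e.g. F.mse_loss(pred_obs, obs_seq).
        return h[-1], pred_obs
```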