Dynamic Treatment Regime Generation Model Combining Dead-Ends and Offline Supervised Actor-Critic
Reinforcement learning depends little on explicit mathematical models and can construct and optimize policies directly from experience, which makes it well suited to learning dynamic treatment regimes. However, existing studies still have the following problems: 1) risk is not considered when optimizing the policy, so the learned policy may recommend risky treatment actions; 2) the problem of distribution shift is ignored, so the learned policy can diverge sharply from the physician's policy; 3) the patient's historical observations and treatment history are ignored, so a good representation of the patient's state cannot be obtained and the optimal policy cannot be learned. To address these problems, DOSAC-DTR, a dynamic treatment regime generation model combining dead-ends and offline supervised actor-critic, is proposed. First, to account for the risk of the treatment actions recommended by the learned policy, the concept of dead-ends is integrated into the actor-critic framework. Second, to alleviate distribution shift, physician supervision is incorporated into the actor-critic framework, minimizing the gap between the learned policy and the physicians' policy while maximizing the expected return. Finally, to obtain a state representation that captures key patient historical information, an LSTM-based encoder-decoder is used to model the patient's historical observations and treatment history. Experiments show that DOSAC-DTR outperforms baseline approaches, achieving lower estimated mortality rates and higher Jaccard coefficients.
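To make the first two ideas concrete, the following is a minimal PyTorch sketch of an actor loss that combines dead-end masking with physician supervision. All names (`actor`, `critic`, `dead_end_q`, `lambda_sup`, the threshold value) are illustrative assumptions, not the paper's exact formulation: the actor maximizes the critic's expected Q-value, a cross-entropy term keeps it close to the clinician's logged actions, and actions whose estimated dead-end value is too negative are masked out before the policy distribution is formed.

```python
import torch
import torch.nn.functional as F

def actor_loss(actor, critic, dead_end_q, states, clin_actions,
               lambda_sup=1.0, dead_end_threshold=-0.8):
    """Illustrative offline supervised actor loss with dead-end masking.

    actor(states)      -> action logits,              shape (B, A)
    critic(states)     -> Q-value estimates,          shape (B, A)
    dead_end_q(states) -> dead-end values in [-1, 0], shape (B, A)
    clin_actions       -> clinician's logged actions, shape (B,), long
    """
    logits = actor(states)

    # Mask actions whose dead-end value falls below the (assumed)
    # threshold, i.e. actions estimated to lead to a dead-end with
    # high probability. At least one action per row must survive.
    mask = dead_end_q(states) >= dead_end_threshold
    masked_logits = logits.masked_fill(~mask, float('-inf'))
    probs = F.softmax(masked_logits, dim=-1)

    # RL term: maximize expected Q under the masked policy
    # (written as a quantity to minimize).
    q_values = critic(states)
    rl_term = -(probs * q_values).sum(dim=-1).mean()

    # Supervision term: cross-entropy to the clinician's action,
    # shrinking the gap between the learned and physician policies.
    # Computed on unmasked logits, assuming logged actions are safe.
    sup_term = F.cross_entropy(logits, clin_actions)

    return rl_term + lambda_sup * sup_term
```

Here `lambda_sup` trades off imitation of the clinician against return maximization; the paper's actual weighting scheme and dead-end estimator may differ from this sketch.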
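The third component, the LSTM-based encoder-decoder for state representation, could look roughly like the sketch below. The module structure and dimensions are assumptions: the encoder consumes the concatenated observation and treatment history, the decoder reconstructs the observation sequence so that the encoder is trained to retain clinically relevant history, and the final encoder hidden state serves as the state fed to the actor and critic.

```python
import torch
import torch.nn as nn

class HistoryEncoder(nn.Module):
    """Illustrative LSTM encoder-decoder that summarizes a patient's
    observation and treatment history into a fixed-size state."""

    def __init__(self, obs_dim, action_dim, hidden_dim=64):
        super().__init__()
        self.encoder = nn.LSTM(obs_dim + action_dim, hidden_dim,
                               batch_first=True)
        # The decoder reconstructs the observation sequence; its
        # reconstruction loss trains the encoder representation.
        self.decoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.readout = nn.Linear(hidden_dim, obs_dim)

    def forward(self, obs_seq, action_seq):
        # obs_seq: (B, T, obs_dim); action_seq: (B, T, action_dim)
        x = torch.cat([obs_seq, action_seq], dim=-1)
        enc_out, (h, _) = self.encoder(x)
        recon, _ = self.decoder(enc_out)
        pred_obs = self.readout(recon)
        # h[-1] (B, hidden_dim) is the state representation passed
        # to the actor and critic; pred_obs drives the recon loss,
        # e.g. F.mse_loss(pred_obs, obs_seq).
        return h[-1], pred_obs
```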