
A Survey of Interpretability Research for Reinforcement Learning

Reinforcement learning, as a paradigm of machine learning, has attracted considerable attention for its powerful trial-and-error policy learning ability. With the integration of deep learning, reinforcement learning methods have achieved great success in many complex control tasks. However, deep reinforcement learning networks are black-box models, and the lack of safety, controllability, and comprehensibility caused by their missing interpretability limits the application of reinforcement learning in critical domains such as autonomous driving and intelligent healthcare. To address this problem, researchers have begun to study the interpretability of reinforcement learning. However, this line of research started relatively late and lacks a systematic summary of interpretability methods for multi-agent reinforcement learning; moreover, the definition of interpretability involves human subjectivity, which makes systematic interpretability research oriented to the reinforcement learning process difficult. This paper provides a comprehensive review and summary of current research on the interpretability of reinforcement learning. First, it defines the interpretability of reinforcement learning and summarizes relevant evaluation methods. Then, based on the Markov decision process, it divides existing work into four categories: action-level explanation, feature-level explanation, reward-level explanation, and policy-level explanation. Within each category, it analyzes policy explanation methods for both single-agent and multi-agent settings, pays particular attention to the human factors in interpretability research, and describes human-machine interactive explanation methods. Finally, it summarizes the challenges facing current research on the interpretability of reinforcement learning and offers an outlook on future research directions.
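For reference, the four categories named above can be read against the standard Markov decision process (MDP) formalism; the symbols below are generic textbook notation rather than the survey's own. An MDP is the tuple

$$ M = \langle S, A, P, R, \gamma \rangle, $$

with transition kernel $P(s' \mid s, a)$, reward function $R(s, a)$, discount factor $\gamma \in [0, 1)$, and a policy $\pi(a \mid s)$ mapping states to distributions over actions. Action-level explanations target the chosen action $a$, feature-level explanations target the state (feature) representation $s$, reward-level explanations target the reward signal $R$, and policy-level explanations target the learned policy $\pi$.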
A Survey of Interpretability Research Methods for Reinforcement Learning
Reinforcement learning, as a machine learning paradigm, is garnering increasing attention due to its robust trial-and-error learning capabilities. With the integration of deep learning, reinforcement learning methods have achieved remarkable success in complex real-world control tasks. However, the lack of interpretability in deep reinforcement learning networks, stemming from their black-box nature, raises concerns about safety, controllability, and comprehensibility. This limitation hampers the progress of reinforcement learning in critical domains such as autonomous driving and intelligent healthcare. To tackle this issue, researchers have undertaken extensive studies in the field of explainable reinforcement learning. Nevertheless, these studies are relatively recent and lack a systematic summary of explainable methods tailored to multi-agent reinforcement learning. Moreover, the definition of interpretability carries subjective elements, making it challenging to systematically categorize explainability research targeting the reinforcement learning process. This article provides a comprehensive review and synthesis of the current state of interpretability research in reinforcement learning.

The article first establishes a definition for the interpretability of reinforcement learning and outlines relevant evaluation methods. Subsequently, rooted in the Markov decision process, it categorizes interpretability methods into four classes: action-level explanation, feature-level explanation, reward-level explanation, and policy-level explanation. With respect to the action factor, action-level explanation methods are classified by the extent to which they explain an agent's decision-making actions, covering self-explanatory model construction, formalized explanation methods, and generative explanation methods. With respect to the state factor, feature-level explanation methods are classified by the importance of state features and the form of explanation, covering interaction-trajectory explanation methods and key-feature visualization methods. With respect to the reward factor, reward-level explanation methods are classified by the impact of reward feedback on policy effectiveness across different task scenarios, covering reward decomposition methods and reward shaping methods. With respect to the learned policy, policy-level explanation methods are further divided into policy decomposition methods and policy aggregation methods according to the hierarchical structure of policy explanations. Together, these four categories cover the entire Markov decision process of an agent, spanning action execution, state transitions, reward computation, and policy learning.

Furthermore, the description of each category traces the progression of interpretability methods from single-agent to multi-agent reinforcement learning, addressing the interpretability of more complex multi-agent environments; issues such as credit assignment, collaborative cooperation, and adversarial games are analyzed and discussed from the perspective of explainable methods, thereby filling the gap in interpretability research for multi-agent reinforcement learning. Additionally, in view of the subjective, human factors involved in interpretability, the paper summarizes and organizes human-machine interactive interpretability research, emphasizing the role of humans in the interpretation process. Finally, the article concludes by summarizing the current challenges in interpretability research for reinforcement learning and offering prospects for future research directions.
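To make the reward-level and multi-agent notions concrete, the following are standard textbook formulations of reward decomposition, potential-based reward shaping, and additive value decomposition for credit assignment; they are illustrative sketches of the method families named above, not the specific formulations of any surveyed work.

$$ R(s, a) = \sum_{c \in C} R_c(s, a), \qquad Q^{\pi}(s, a) = \sum_{c \in C} Q_c^{\pi}(s, a), $$

so each component $Q_c^{\pi}$ shows how one reward source contributes to the agent's preference for action $a$ (reward decomposition).

$$ R'(s, a, s') = R(s, a, s') + \gamma \, \Phi(s') - \Phi(s), $$

potential-based reward shaping, which preserves optimal policies while exposing the designer's potential function $\Phi$ as an explicit account of the added guidance.

$$ Q_{tot}(\boldsymbol{\tau}, \boldsymbol{a}) = \sum_{i=1}^{n} Q_i(\tau_i, a_i), $$

an additive value decomposition in which the joint action value is attributed to individual agents, a common starting point for explaining credit assignment in cooperative multi-agent reinforcement learning.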

reinforcement learning; interpretability; machine learning; artificial intelligence; Markov decision process

曹宏业、刘潇、董绍康、杨尚东、霍静、李文斌、高阳


National Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023

School of Automation and Electronic Information, Xiangtan University, Xiangtan, Hunan 411105

School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023


Supported by the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2021ZD0113303), the National Natural Science Foundation of China (62192783, 62276128, 62276142, 62206133), and the State Key Laboratory for Novel Software Technology at Nanjing University (KFKT2022B12)

2024

Chinese Journal of Computers (计算机学报)
Sponsors: China Computer Federation; Institute of Computing Technology, Chinese Academy of Sciences

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 3.18
ISSN: 0254-4164
Year, Volume (Issue): 2024, 47(8)