A Survey of Task-Oriented Dialogue Policies Based on Reinforcement Learning

Xu Kai (徐恺)¹, Wang Zhenyu (王振宇)¹, Wang Xu (王旭)¹, Qin Hua (秦华)¹, Long Yuxuan (龙宇轩)¹


1. School of Software Engineering, South China University of Technology, Guangzhou 510006

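Abstract

Dialogue systems play an important role in natural language processing, with good prospects for practical application and many directions worth studying. The dialogue policy is the core component of pipeline-based human-machine dialogue systems: it generates a response action from the current dialogue state and thereby guides response generation. Dialogue policy learning is usually modeled as a (semi-)Markov decision process and then solved with reinforcement learning. In recent years, studies that apply reinforcement learning algorithms to task-oriented dialogue policy have multiplied, yet related surveys are lacking. This paper therefore analyzes, categorizes, and summarizes reinforcement learning based task-oriented dialogue policies. First, it introduces the general models of each class of reinforcement learning and, following this classification, analyzes and summarizes the general approaches of existing dialogue policy learning and their open problems. Second, it examines in depth the theoretical models, research progress, and open problems of recent work in several research hotspots, including multi-domain, multimodal, multi-agent, and empathetic dialogue policies. Next, it introduces research related to dialogue policy, including user simulators, dialogue policy evaluation, dialogue policy platforms and datasets, and large language models in relation to dialogue policy. To address the shortcomings of existing research, it analyzes future research directions for dialogue policy from five different perspectives. Finally, it summarizes the whole paper and offers an outlook. The paper not only surveys task-oriented dialogue policies by reinforcement learning category but also classifies them from an application perspective, providing a comprehensive, multi-angle review that offers insights for future research on task-oriented dialogue policies.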

Abstract

The dialogue system holds a crucial position in natural language processing (NLP) as a key component for human-machine interaction. Dialogue systems are attracting growing attention in both academia and industry because they have practical real-world applications as well as significant research value. A pipeline-based human-computer dialogue system consists of four distinct modules (natural language understanding, dialogue state tracking, dialogue policy learning, and natural language generation), of which dialogue policy learning is the central one. In the pipeline framework, the dialogue policy selects suitable dialogue actions based on the dialogue states produced by the natural language understanding and dialogue state tracking modules. The selected actions then drive the natural language generation module to produce a coherent and complete response. Dialogue policy learning is commonly formulated as either a Markov decision process (MDP) or a semi-Markov decision process (SMDP), which is then solved as a sequential decision problem by means of reinforcement learning. In recent years, research on task-oriented dialogue policy learning with reinforcement learning has expanded rapidly. However, to the best of our knowledge, existing reviews of reinforcement learning based dialogue policy learning fall notably short in comprehensiveness and depth. This paper therefore focuses on task-oriented dialogue policy learning with reinforcement learning methods and provides a thorough analysis, categorization, and synthesis of this body of work. First, we classify the reinforcement learning algorithms commonly used in dialogue policy learning. Then, based on this classification, we analyze the general concept of dialogue policy learning and summarize the problems and limitations of existing methods. Furthermore, we examine current research directions and obstacles in several prominent areas, including multi-domain, multimodal, multi-agent, and empathetic dialogue policies. Next, we introduce further studies related to dialogue policy learning, covering user simulators, evaluation methods for dialogue policy learning, dialogue policy platforms, datasets for dialogue systems, and the interplay between large language models and dialogue policy learning. To address the deficiencies of current research, we analyze prospective research directions for dialogue policy learning from five distinct perspectives, spanning reinforcement learning techniques and various applications. Finally, we conclude the article and look ahead to the future of dialogue policy learning. This paper not only classifies and reviews task-oriented dialogue policy learning according to reinforcement learning algorithms but also categorizes it from different application perspectives, offering a multi-dimensional, comprehensive, and systematic synthesis. We believe it can provide valuable insights for future research on task-oriented dialogue policy learning and promote the development of human-machine dialogue systems.
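The abstract's core modelling claim, that dialogue policy learning is cast as an MDP and solved with reinforcement learning, can be illustrated with a minimal sketch. Everything below is a hypothetical toy setup assumed for illustration, not the survey's method: a two-slot booking task, a stochastic rule-based user simulator (`user_step`), a reward of -1 per turn, +20 for a successful booking and -10 for a premature one, and tabular Q-learning (`train_q_learning`, `greedy_policy`) as the RL solver. The state is the set of filled slots, and the actions are dialogue acts.

```python
import random
from collections import defaultdict

# Hypothetical toy task (assumption): collect two slots, then book.
SLOTS = ("area", "price")
ACTIONS = ("request_area", "request_price", "book")


def user_step(state, action, rng, answer_prob=0.9):
    """Assumed toy user simulator plus reward function.

    state is a tuple of booleans recording which slots are filled.
    Returns (next_state, reward, done).
    """
    if action == "book":
        # Booking ends the dialogue; it succeeds only if every slot is filled.
        return state, (20.0 if all(state) else -10.0), True
    slot_index = ACTIONS.index(action)  # request_area -> 0, request_price -> 1
    filled = list(state)
    if rng.random() < answer_prob:  # the simulated user usually answers
        filled[slot_index] = True
    return tuple(filled), -1.0, False  # -1 per turn favours short dialogues


def train_q_learning(episodes=5000, alpha=0.1, gamma=0.95, epsilon=0.1,
                     max_turns=10, seed=0):
    """Tabular Q-learning over the toy dialogue MDP."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(state, action)], initialised to 0
    for _ in range(episodes):
        state = (False,) * len(SLOTS)
        for _ in range(max_turns):
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = user_step(state, action, rng)
            # One-step target: r + gamma * max_a' Q(s', a'), no bootstrap at the end.
            target = reward
            if not done:
                target += gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q, state):
    """Pick the dialogue act with the highest learned Q-value."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


if __name__ == "__main__":
    q = train_q_learning()
    for s in [(False, False), (True, False), (False, True), (True, True)]:
        print(s, "->", greedy_policy(q, s))
```

Under these assumptions the learned greedy policy should request whichever slot is still missing and choose `book` only once both slots are filled. Tabular Q-learning is used here only because it is the smallest complete reinforcement learning example; methods at realistic scale typically replace the table with a function approximator such as a neural network.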


Keywords

dialogue policy/reinforcement learning/task-oriented dialogue systems/deep reinforcement learning/multi-domain/multimodal


Funding

Key-Area Research and Development Program of Guangdong Province (2021B0101190002)

Year of publication

2024

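Chinese Journal of Computers (计算机学报)

Sponsored by the China Computer Federation and the Institute of Computing Technology, Chinese Academy of Sciences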


Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 3.18

ISSN: 0254-4164

