Instruction Tuning of LLM for Unified Information Extraction in Mental Health Domain

Original Chinese title: 基于大型语言模型指令微调的心理健康领域联合信息抽取

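Chinese abstract (translated): Information extraction aims to extract key information from text. A language model's information extraction ability in the mental health domain reflects how well it understands natural-language information related to human mental health. Improving a language model's domain-specific information extraction ability can also provide an important knowledge source for AI mental health services. However, Chinese instruction datasets for mental health information extraction are currently very scarce, which limits the development of related research and applications. To address this problem, this paper prompts ChatGPT, under the guidance of psychology experts, to generate sample instances and, through designed generation instructions and data augmentation, constructs a unified information extraction instruction dataset for the mental health domain. The dataset contains 5,641 instances covering three basic extraction tasks (named entity recognition, relation extraction, and event extraction) and aims to fill the gap in Chinese instruction datasets for mental health information extraction. The instruction dataset is then used for parameter-efficient fine-tuning of large language models. Performance comparisons against baseline models and human evaluation show that, after effective instruction tuning, large language models can perform unified information extraction tasks in the mental health domain.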

English abstract: Information extraction aims to extract essential information from text. Information extraction ability in the mental health domain reflects how well a large language model (LLM) understands information related to human mental health. However, efforts to improve LLMs in the mental health domain are currently hindered by a severe shortage of Chinese instruction datasets. Under the guidance of psychologists, this paper prompts ChatGPT to generate sample instances and, through designed instruction generation and data augmentation, builds a unified instruction dataset of 5,641 instances for information extraction in the mental health field. The dataset covers three basic extraction tasks (named entity recognition, relation extraction, and event extraction) and aims to fill the gap in Chinese instruction datasets for mental health information extraction. Comparisons against baseline models and human evaluations show that, after parameter-efficient tuning on this dataset, an LLM can perform unified information extraction tasks in the mental health field.
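The abstracts describe a single instruction dataset that covers named entity recognition, relation extraction, and event extraction in one unified format. As a minimal sketch, assuming a simple instruction/input/output layout, one annotated sentence can be expanded into one instruction sample per task. The `Annotation` class, the `to_instruction_samples` helper, the prompt wording, and the label names below are all hypothetical illustrations, not the paper's actual schema, and the example sentence is made up.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field

# Hypothetical annotation record: field names and label sets are
# illustrative assumptions, not the dataset schema used in the paper.
@dataclass
class Annotation:
    text: str
    entities: list[tuple[str, str]] = field(default_factory=list)        # (span, entity type)
    relations: list[tuple[str, str, str]] = field(default_factory=list)  # (head, relation, tail)
    events: list[tuple[str, str]] = field(default_factory=list)          # (trigger, event type)

# Hypothetical task instructions, one per basic extraction task.
TASK_PROMPTS = {
    "ner": "Extract every entity in the text together with its type.",
    "re": "Extract every (head, relation, tail) triple in the text.",
    "ee": "Extract every event trigger in the text together with its event type.",
}

def to_instruction_samples(ann: Annotation) -> list[dict[str, str]]:
    """Expand one annotated text into one instruction sample per task.

    All samples share the same instruction/input/output layout, so a single
    generative model can be tuned on the three tasks jointly.
    """
    targets = {"ner": ann.entities, "re": ann.relations, "ee": ann.events}
    samples = []
    for task, prompt in TASK_PROMPTS.items():
        samples.append({
            "task": task,
            "instruction": prompt,
            "input": ann.text,
            # Serialize structured targets as JSON text for the model to generate.
            "output": json.dumps([list(t) for t in targets[task]], ensure_ascii=False),
        })
    return samples

# Usage with a made-up English sentence and made-up labels.
ann = Annotation(
    text="Alice has felt anxious since losing her job.",
    entities=[("Alice", "person"), ("anxious", "symptom")],
    relations=[("Alice", "has_symptom", "anxious")],
    events=[("losing", "job_loss")],
)
samples = to_instruction_samples(ann)
print(samples[1]["output"])  # [["Alice", "has_symptom", "anxious"]]
```

Serializing every target as JSON text gives all three tasks one output format, which is one common way to let a single generative model handle "unified" extraction; the format actually used by the authors may differ.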

English keywords:

information extraction; mental health; large language model; instruction tuning

Authors:

Cai Zijie (蔡子杰), Fang Hui (方荟), Liu Jianhua (刘建华), Xu Ge (徐戈), Long Yunfei (龙云飞)

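Affiliations:

School of Computer Science and Mathematics, Fujian University of Technology, Fuzhou, Fujian 350118, China

Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fuzhou, Fujian 350118, China

College of Computer and Big Data, Minjiang University, Fuzhou, Fujian 350108, China

Fujian Provincial Research Center for Mental Health Human-Computer Interaction Technology, Fuzhou, Fujian 350108, China

School of Computer Science and Electronic Engineering, University of Essex, Colchester CO4 3SQ, UK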

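Funding:

Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project; Natural Science Foundation of Fujian Province; Fujian Provincial Innovation Fund Project; Minjiang University Science and Technology Pre-research Project for Introduced Talents (two grants)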

Grant numbers:

2022ZD0116308; 2023J01349; 2022C0022; MJY23033; MJY21032

Publication year:

2024

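Journal of Chinese Information Processing (中文信息学报)

Sponsored by the Chinese Information Processing Society of China and the Institute of Software, Chinese Academy of Sciences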

Indexed in: CSTPCD; CHSSCD; Peking University Core Journals

Impact factor: 0.8

ISSN: 1003-0077

Year, volume (issue): 2024, 38(8)