Automatic Identification Method of Policy Irony Comments Based on Large Language Models

Policy irony comments are an extreme and sharp form of expression that the public uses when voicing opinions on public policies, and identifying them automatically and accurately is an important problem in policy opinion monitoring. Because little research exists on the automatic identification of policy irony comments and the problem poses multiple difficulties, this paper proposes a method for automatically identifying policy irony comments based on large language model frameworks. Specifically, identification models were built on the ChpoBERT (Chinese policy BERT), LLaMA-2, GPT-2, and StructBERT frameworks and compared. Starting from 111,628 valid policy comments crawled from Sina Weibo, the data were manually annotated to construct the first dataset of policy irony comments, providing data support for future research in this direction. Based on the presence or absence of topic labels, the data were further divided into two datasets, one with and one without topic labels, for model training and evaluation. The model built on ChpoBERT achieved the best precision, recall, and F1 score, followed by the model built on LLaMA-2; after fine-tuning, all of the models built on large-model frameworks performed reliably. As the first study of this problem, the models constructed here establish clear and comparable baselines for future research and provide an effective method for current policy opinion monitoring.
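The split into comments with and without topic labels, and the precision, recall, and F1 evaluation mentioned in the abstract, can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, the assumption that a topic label is text wrapped in a pair of `#` characters is ours, and irony is assumed to be encoded as the positive class `1`.

```python
import re

# Assumption: a topic label is any non-empty text enclosed in a pair of '#'.
TOPIC_LABEL = re.compile(r"#[^#]+#")


def split_by_topic_label(comments):
    """Partition comments into (with_label, without_label) lists."""
    with_label, without_label = [], []
    for text in comments:
        if TOPIC_LABEL.search(text):
            with_label.append(text)
        else:
            without_label.append(text)
    return with_label, without_label


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive (irony) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Each of the two subsets would then be used to train and evaluate a classifier separately, with `precision_recall_f1` applied to the gold labels and the model's predictions on the held-out portion.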

policy irony comments; large language models; automatic identification; public opinion on policies; policy informatics

霍朝光、尹卓、杨媛、杨万诚、茹润钰、霍帆帆

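School of Information Resource Management, Renmin University of China, Beijing 100872

Research Institute of Digital Humanities, Renmin University of China, Beijing 100872

Institute of Scientific and Technical Information of China, Beijing 100038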

2024

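Journal of the China Society for Scientific and Technical Information (情报学报)
Sponsors: China Society for Scientific and Technical Information; Institute of Scientific and Technical Information of China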

Indexed in: CSTPCD; CSSCI; CHSSCD; Peking University Core Journals (北大核心)
Impact factor: 1.296
ISSN: 1000-0135
Year, volume (issue): 2024, 43(12)