高技术通讯 (Chinese High Technology Letters), 2024, Vol. 34, Issue 5: 453-462. DOI: 10.3772/j.issn.1002-0470.2024.05.002

Sarcasm recognition based on multi-dimensional semantic features and hierarchical attention mechanism

宋留静 赵泽方 马宇翔 申罕骥 李俊

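Author information

  • 1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049 (宋留静, 赵泽方, 李俊)
  • 2. School of Computer and Information Engineering, Henan University, Kaifeng 475004 (马宇翔)
  • 3. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190 (申罕骥)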

Abstract

Sarcasm is a complex form of linguistic expression that plays an important role in everyday communication. With the rapid development of artificial intelligence and social networks, sarcasm recognition has become one of the hot research topics in natural language processing. Existing research on sarcasm recognition usually represents the features of sarcastic text from a single dimension, ignoring the subtle differences among these features and their relative importance. This paper treats sarcasm recognition as a text classification task. In the feature extraction stage, sarcastic text is given a multi-dimensional semantic representation built from its inconsistency features, affective features, syntactic (dependency) structure features and style features. In the feature fusion stage, because features from different dimensions contribute to, and correlate with, the overall representation to different degrees, a hierarchical attention mechanism is used to adjust how strongly each sarcasm-related linguistic feature influences the overall performance of the model. Experimental results show that the proposed model can extract latent semantic features of sarcastic text from multiple dimensions, and its performance improves markedly on the public datasets IAC, Tweets and Reddit.
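
The abstract describes the fusion stage only at a high level. As a rough illustration of what a two-level (hierarchical) attention fusion can look like, the sketch below first pools token-level vectors inside each feature view and then weights the four views against each other. It is not the authors' implementation: the dot-product scoring, the context vectors `token_context` and `view_context` (which a real model would learn), the function names and all vector values are hypothetical assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(vectors: List[Vector], context: Vector) -> Tuple[Vector, List[float]]:
    """Dot-product attention pooling: score each vector against a context
    vector, normalize with softmax, and return the weighted sum and weights."""
    weights = softmax([dot(v, context) for v in vectors])
    size = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(size)]
    return pooled, weights


def hierarchical_fusion(
    views: Dict[str, List[Vector]],
    token_context: Vector,
    view_context: Vector,
) -> Tuple[Vector, Dict[str, float]]:
    """Two-level attention fusion (illustrative assumption, not the paper's code).

    Level 1 pools the token-level vectors inside each feature view.
    Level 2 weights the pooled views against each other and sums them.
    """
    names = list(views)
    pooled_views = [attend(views[name], token_context)[0] for name in names]
    fused, view_weights = attend(pooled_views, view_context)
    return fused, dict(zip(names, view_weights))


# Toy 2-d vectors for the four views named in the abstract (made-up values).
# Hypothetical: the context vectors would be learned; here they are fixed by hand.
views = {
    "inconsistency": [[1.0, 2.0], [0.0, 0.0]],
    "affective": [[0.5, 0.1]],
    "syntactic": [[0.2, 0.3], [0.4, 0.0]],
    "style": [[0.0, 0.5]],
}
fused, weights = hierarchical_fusion(views, token_context=[1.0, 0.0], view_context=[0.0, 1.0])
print(fused, weights)
```

Because the view weights come from a softmax, they sum to 1 and can be read as each view's relative contribution to the fused vector; in a trained model the context vectors (and usually a projection layer before scoring) would be learned jointly with the classifier.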

Key words

sarcasm recognition/natural language processing/multi-dimensional semantic representation/hierarchical attention mechanism

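Funding

National Key Research and Development Program of China (2019YFB1405801)

Key International Cooperation Project of the Chinese Academy of Sciences (241711KYSB20180002)

Key Research, Development and Promotion Special Project of Henan Province (222102210040)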

Year of publication

2024
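高技术通讯 (Chinese High Technology Letters)
Institute of Scientific and Technical Information of China

CSTPCD; Peking University Core Journal (北大核心)
Impact factor: 0.19
ISSN: 1002-0470
Number of references: 29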