A Method for Identifying Chinese Fake Reviews with Cross-Domain Support

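[Objective] Building on a multi-domain dataset, this paper constructs CFEE, a cross-domain Chinese fake review identification model that extracts deep word-relation semantic information from review texts. It addresses two issues that traditional identification methods seldom consider: the differences among data from different domains in Chinese review texts, and the concealed nature of fake review data within a domain. [Methods] We propose 11 rules for building fake review datasets and use them to build a multi-domain dataset. We then construct the CFEE model to identify Chinese fake reviews across domains. Its main functions are: extracting deep semantic information from text with the ERNIE pre-trained model; identifying the concealment of reviews from the sentiment attributes of the review text; projecting text information onto the word-relation dimension with a convolutional neural network; and fusing features with a neural network to perform classification. [Results] On the multi-domain Chinese fake review dataset, CFEE achieves an F1 value of 91.52%. On single-domain datasets for mobile phones, food, clothing, and household appliances, its F1 values are 85.71%, 79.59%, 85.71%, and 85.00%, respectively, all significantly better than existing models. [Limitations] The manual annotation is subjective. [Conclusions] The proposed method can effectively identify Chinese fake reviews across domains.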
Support for Cross-Domain Methods of Identifying Fake Comments of Chinese
[Objective] This paper constructs a cross-domain Chinese fake review identification model (CFEE) for multi-domain datasets. It extracts the semantic information of the comment texts and addresses the problems of traditional recognition models. [Methods] First, we established 11 rules for constructing fake review datasets and created a multi-domain dataset. Then, we designed the CFEE model to identify Chinese fake comments across domains. The model extracted the deep semantic information with the ERNIE pre-training model and identified the hidden comments based on the texts' emotional attributes. Finally, it projected the text information to the word relation dimension with a convolutional neural network and realized classification based on features fused by a neural network. [Results] The CFEE model's F1 value reached 91.52% on the multi-domain Chinese fake comment dataset. The model's F1 values were 85.71%, 79.59%, 85.71%, and 85.00% on single-domain datasets for mobile phones, food, clothing, and household appliances, respectively. It outperformed the existing models significantly. [Limitations] There is subjectivity in the manual annotation. [Conclusions] The proposed method can effectively identify Chinese fake reviews across domains.
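
For readers unfamiliar with the metric reported in the abstract, F1 is the harmonic mean of precision and recall. The minimal Python sketch below computes it for one class from binary labels. The function name `f1_score_binary` and the toy labels are illustrative assumptions, not code or data from the paper, and the abstract does not say whether the reported values are per-class or averaged.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Return the F1 score for the positive class (here assumed to be 'fake review')."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # Precision and recall are both zero (or undefined); report 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels: 1 = fake review, 0 = genuine review.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
score = f1_score_binary(y_true, y_pred)
```

With these toy labels there are three true positives, one false positive, and one false negative, so precision and recall are both 0.75 and so is F1.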

Fake Comments; ERNIE Model; Cross-Domain Identification; Chinese Semantic; Emotional Score

谷岩、郑楷洪、胡勇军、宋益善、刘东屏

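School of Management, Guangzhou University, Guangzhou 510006

School of Data Science, The Chinese University of Hong Kong, Shenzhen 518000

Partner and Business Enablement Department, Amazon Web Services Greater China, Beijing 100015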

fake reviews; ERNIE model; cross-domain identification; Chinese semantics; sentiment score

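National Social Science Fund of China; National Key Research and Development Program of China; Ministry of Education Supply-Demand Matching Employment and Education Project, Key-Area University-Enterprise Cooperation Project (Phase II)

18BGL236; 2021YFB3301801; 20230103480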

2024

Data Analysis and Knowledge Discovery
National Science Library, Chinese Academy of Sciences

CSTPCD; CSSCI; CHSSCD; Peking University Core Journals; EI
Impact factor: 1.452
ISSN: 2096-3467
Year, volume (issue): 2024, 8(2)
  • 40