
Mining Common Quantitative Features and Cross-Linguistic Clustering of English and Russian Fake News

[Objective] This study mines the features that fake news shares across languages, to inform cross-language fake news detection. [Methods] Taking English and Russian as examples, we built a dataset and extracted quantitative features common to fake news in both languages at the word, sentence, readability, and sentiment levels. We then used these features in principal component analysis, K-means clustering, hierarchical clustering, and two-step clustering experiments. [Results] The 34 common quantitative features clustered real and fake news well across the two languages, and the 19 newly proposed features played the larger role. Fake news tends toward linguistic simplification and economy: it favors short sentences and simple collocations, is easier to understand, and contains fewer negative expressions. [Limitations] Because of the limits of the current dataset, no real and fake news samples on the same topic could be found for parallel testing. [Conclusions] Fake news in different languages does share language-independent features that can be used for automatic clustering, which offers a reference for cross-language fake news detection and identification research.
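The pipeline the abstract describes (language-independent features, then clustering) can be illustrated with a small sketch. It is not the authors' implementation: the three features below (mean sentence length, mean word length, type-token ratio) are illustrative stand-ins for the paper's 34 features, the K-means routine is a hand-rolled Lloyd's algorithm rather than whatever tool the study used, and the names `text_features`, `standardize`, and `kmeans` are hypothetical.

```python
import math
import re


def text_features(text):
    """Return (mean sentence length in words, mean word length in chars, type-token ratio)."""
    # Split on sentence-final punctuation; \w+ matches Latin and Cyrillic words alike.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    return (
        len(words) / len(sentences),
        sum(len(w) for w in words) / len(words),
        len(set(words)) / len(words),
    )


def standardize(rows):
    """Z-score each column so that no single feature dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds)) for row in rows]


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm, initialised deterministically from the first k points."""
    centroids = list(points[:k])
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


if __name__ == "__main__":
    # Toy inputs invented for illustration only; they are not data from the study.
    texts = [
        "Shock news. Read now. Big win.",
        "The committee published a detailed report describing the revised budget.",
        "Срочно. Читайте сейчас. Большая победа.",
        "Комитет опубликовал подробный отчёт с описанием пересмотренного бюджета.",
    ]
    features = standardize([text_features(t) for t in texts])
    print(kmeans(features, k=2))
```

Because every feature here is computed from surface counts, the same function applies to English and Russian text without language-specific resources, which is the property the abstract relies on. Readability and sentiment features would need additional, language-aware tooling and are left out of this sketch.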

Keywords: Fake News; Quantitative Analysis; Clustering

Yuan Wei (原伟), Liu Haitao (刘海涛)


School of Foreign Languages, National University of Defense Technology (国防科技大学外国语学院), Nanjing 210039

School of Foreign Languages, Zhejiang University (浙江大学外国语学院), Hangzhou 310058


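Funding: Major Project of the National Social Science Fund of China (20&ZD140); Major Project of the National Social Science Fund of China (20AZD130); Henan Province Philosophy and Social Science Planning Project (2021BYY024)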

2024

Data Analysis and Knowledge Discovery (数据分析与知识发现)
National Science Library, Chinese Academy of Sciences (中国科学院文献情报中心)


Indexed in: CSTPCD, CSSCI, CHSSCD, Peking University Core Journals (北大核心), EI
Impact factor: 1.452
ISSN: 2096-3467
Year, volume (issue): 2024, 8(7)