Corpus-Based Study on Word Features and Automatic Clustering of Russian Fake News (基于语料库的俄语虚假新闻词特征分析与自动聚类研究)

The automatic analysis and detection of fake news has considerable research value and practical significance. Using a purpose-built corpus of real and fake news, this paper examines how Russian fake news differs from real news at the level of word features. The results show that fake news tends to use more words, shorter words, simpler words, and monosyllabic words to convey information, so its texts are comparatively easy to read. Its word-frequency distribution follows Zipf's law, but goodness of fit differs little between real and fake news. Nouns, numerals, and proper nouns, which carry concrete factual information, occur less frequently than in real news. Fake news uses more two-word collocations and fewer multi-word collocations, showing a marked tendency toward economy of expression. Its language tends toward simplification and is more inclined to express suppositions and intentions than to state established facts. In total, 5 overall features, 14 diversity features (the published English abstract gives 13), 11 distribution features, 5 collocation features, and 17 morphological features show statistically significant differences from real news, and automatic clustering of real and fake news based on these indicators performs well.

YUAN Wei (原伟), LUO Weiping (罗卫萍)

School of Foreign Languages, National University of Defense Technology, Nanjing, Jiangsu 210039, China

corpus; Russian; fake news; word features; automatic clustering

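National Social Science Fund of China, Major Project (20&ZD140)
National Social Science Fund of China, Major Project (19ZDA317)
National Social Science Fund of China, Key Project (20AZD130)
National Social Science Fund of China (18BYY235)
Henan Provincial Philosophy and Social Science Planning Project (2021BYY024)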

2024

Journal of PLA University of Foreign Languages (解放军外国语学院学报)
PLA University of Foreign Languages (解放军外国语学院)

CSTPCD; CSSCI; CHSSCD; Peking University Core Journal (北大核心)
Impact factor: 1.175
ISSN: 1002-722X
Year, volume (issue): 2024, 47(3)