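Summary
To help the public identify fake news, this study uses artificial intelligence and advanced statistical techniques to detect fake news and, on that basis, explores the development of assessment and learning tools for students' media literacy. Using artificial intelligence, computational linguistics, and advanced statistics, it analyzes whether user or tweet attributes can distinguish true from fake news in 4,165 spell-corrected English tweets, each linked to one of 20 matched news stories (10 true, 10 fake). Tweets that used common words, negative emotion, higher emotional arousal, greater dominance, first-person singular pronouns, or third-person pronouns, or that came from users with more followers, were more likely to be true news; tweets that used second-person pronouns, began with a subjectless sentence (a bald start), or used hedges were more likely to be fake news. The results suggest that some universal predictors (such as pronouns, politeness markers, and follower counts) and topic-specific predictors (such as common words, emotions, and hedges) can effectively identify true and fake news. Finally, the study proposes an easy-to-understand media literacy dashboard that models the diffusion scope, speed, and shape of fake news to help students learn and assess their own media literacy.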
Abstract
To help the public identify fake news, research is being conducted on the use of artificial intelligence and advanced statistical techniques to detect fake news. This study uses artificial intelligence, computational linguistics, and advanced statistics to test whether user or tweet attributes can distinguish true from fake news in 4,165 spell-checked English tweets, each linked to one of 20 matched COVID-19 news stories (10 true, 10 fake). Tweets with common words, negative emotional valence, higher arousal, greater dominance, first-person singular pronouns, or third-person pronouns, or by users with more followers, were more likely to be true news. By contrast, tweets with second-person pronouns, bald starts, or hedges were more likely to be fake news. The results suggest that some universal predictors (pronouns, politeness, followers) and some topic-specific predictors (common words, emotions, hedges) can distinguish true from fake news. We propose a dashboard that models the diffusion scope, speed, and shape of fake news to help students learn and assess their media literacy.