Abstract
By a News Reporter-Staff News Editor at Robotics & Machine Learning Daily News – Research findings on artificial intelligence are discussed in a new report. According to news reporting originating from Hong Kong Polytechnic University by NewsRx correspondents, research stated, “Online platforms are experimenting with interventions such as content screening to moderate the effects of fake, biased, and incensing content.” The news journalists obtained a quote from the research from Hong Kong Polytechnic University: “Yet, online platforms face an operational challenge in implementing machine learning algorithms for managing online content due to the labeling problem, where labeled data used for model training are limited and costly to obtain. To address this issue, we propose a domain adaptive transfer learning via adversarial training approach to augment fake content detection with collective human intelligence. We first start with a source domain dataset containing deceptive and trustworthy general news constructed from a large collection of labeled news sources based on human judgments and opinions. We then extract discriminating linguistic features commonly found in source domain news using advanced deep learning models. We transfer these features associated with the source domain to augment fake content detection in three target domains: political news, financial news, and online reviews. We show that domain invariant linguistic features learned from a source domain with abundant labeled examples can effectively improve fake content detection in a target domain with very few or highly unbalanced labeled data.”
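The quoted abstract names adversarial training for domain adaptation but gives no implementation details. The following is a minimal sketch of one common way to realize that idea, gradient reversal between a label classifier and a domain classifier. It is an assumption, not the researchers' method: the linear feature extractor, the function names `adversarial_grads` and `train`, the parameter names, and the weight `lam` are all hypothetical stand-ins for the "advanced deep learning models" the report mentions.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def adversarial_grads(p, x, y, d, lam):
    """Per-example gradients for a toy domain-adversarial model (hypothetical).

    p   : dict with feature weights "w", label head "a", "b", domain head "c", "e"
    x   : list of input features
    y   : 1 = fake, 0 = trustworthy, None = unlabeled target example
    d   : 0 = source domain, 1 = target domain
    lam : strength of the gradient reversal (an assumed hyperparameter)
    """
    h = sum(wi * xi for wi, xi in zip(p["w"], x))  # shared scalar feature
    gd = sigmoid(p["c"] * h + p["e"]) - d          # d(domain loss)/d(domain logit)
    # The label loss only applies when a label is available.
    gy = 0.0 if y is None else sigmoid(p["a"] * h + p["b"]) - y
    # Gradient reversal: the feature extractor follows the label gradient
    # but moves against the domain gradient, pushing h to be domain invariant.
    gh = gy * p["a"] - lam * gd * p["c"]
    return {"w": [gh * xi for xi in x],
            "a": gy * h, "b": gy,
            "c": gd * h, "e": gd}


def train(examples, dim, lam=0.5, lr=0.1, epochs=50, seed=0):
    # Plain stochastic gradient descent over (x, y, d) triples.
    rng = random.Random(seed)
    p = {"w": [rng.uniform(-0.1, 0.1) for _ in range(dim)],
         "a": 1.0, "b": 0.0, "c": 1.0, "e": 0.0}
    for _ in range(epochs):
        for x, y, d in examples:
            g = adversarial_grads(p, x, y, d, lam)
            p["w"] = [wi - lr * gi for wi, gi in zip(p["w"], g["w"])]
            for k in ("a", "b", "c", "e"):
                p[k] -= lr * g[k]
    return p
```

In this sketch, labeled source examples are passed as `(x, y, 0)` and unlabeled target examples as `(x, None, 1)`. Setting `lam=0` reduces it to ordinary supervised training on the source domain, which makes the effect of the adversarial term easy to isolate.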