Mining Common Quantitative Features and Cross-Linguistic Clustering of English and Russian Fake News
[Objective] This study examines the features that fake news shares across languages, to provide a reference for cross-language fake news detection. [Methods] Using English and Russian as examples, we built datasets and extracted common quantitative features of fake news at the word, sentence, readability, and sentiment levels. We then used these features in principal component analysis, K-means clustering, hierarchical clustering, and second-order clustering experiments. [Results] The 34 common quantitative features performed well in cross-language clustering of real and fake news, and the 19 quantitative features proposed in this study played a more significant role. Fake news tends toward linguistic simplification and economy: it favors short sentences and simple collocations to convey information, which makes the text easier to understand, and it contains fewer negative expressions. [Limitations] Because of constraints in the current dataset, parallel testing with real and fake news on the same topic was not possible. [Conclusions] Fake news in different languages shares common, language-independent features that can be used for automatic clustering, which offers insights for research on cross-language fake news detection.
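The pipeline described in the methods (language-independent quantitative features, then unsupervised clustering) can be illustrated with a minimal sketch. It is not the study's implementation. The three features below (average sentence length, average word length, type-token ratio) are hypothetical stand-ins for the word- and sentence-level features, since the abstract does not list the actual 34. Readability and sentiment features are omitted, the four sample texts are invented, and the K-means routine is a plain Lloyd's iteration seeded with the first k rows, an assumption made only to keep the example deterministic.

```python
import re
from statistics import mean, pstdev

# Letter runs only; in Python 3 this matches Latin and Cyrillic alike.
WORD_RE = re.compile(r"[^\W\d_]+")
SENT_SPLIT_RE = re.compile(r"[.!?]+")


def extract_features(text):
    """Hypothetical features: [words per sentence, chars per word, type-token ratio]."""
    sentences = [s for s in SENT_SPLIT_RE.split(text) if WORD_RE.search(s)]
    words = [w.lower() for w in WORD_RE.findall(text)]
    if not sentences or not words:
        return [0.0, 0.0, 0.0]
    return [
        len(words) / len(sentences),
        float(mean(len(w) for w in words)),
        len(set(words)) / len(words),
    ]


def standardize(rows):
    """Z-score each column so that no single feature dominates the distance."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # constant column: avoid dividing by 0
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def kmeans(rows, k=2, iters=20):
    """Plain Lloyd's algorithm, seeded with the first k rows (an assumption)."""
    centers = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(row, centers[j])))
            for row in rows
        ]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centers[j] = [mean(col) for col in zip(*members)]
    return labels


# Invented sample texts: two with short sentences, two with long ones.
docs = [
    "Shock! Aliens land. Panic now. Run fast.",
    "The committee published a comprehensive assessment of regional "
    "infrastructure spending after extended deliberation among several "
    "participating ministries.",
    "Шок! Все врут. Беги скорее. Это конец.",
    "Министерство опубликовало подробный доклад о результатах многолетнего "
    "исследования региональной транспортной инфраструктуры после длительных "
    "консультаций.",
]

features = [extract_features(d) for d in docs]
labels = kmeans(standardize(features), k=2)
print(labels)  # [0, 1, 0, 1]
```

In this toy run the English and Russian short-sentence texts fall into one cluster and the two long-sentence texts into the other, regardless of language. A real experiment would use the full feature set and a library implementation of the clustering methods named above.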