Online rumors can disrupt people's thinking, psychology, and behavior, cause social unrest, and endanger public safety. The widespread use of social platforms such as Weibo amplifies the impact and harm of rumors, so rumor detection is of great significance to the orderly and healthy development of cyberspace. Current automatic rumor detection techniques focus mainly on the construction of detection models and the representation of input data, while little research addresses improving data quality as a way to improve detection performance. Based on this idea, this paper applies rough set theory to the incomplete rumor information system for knowledge acquisition and decision-making. In essence, to obtain high-quality data and improve rumor detection, rough set theory is used to address the uncertainty measurement, redundancy, and incompleteness of the incomplete rumor information system. First, the paper systematically summarizes the uncertainty measures in rough set theory, covering four measures (Shannon entropy, rough entropy, Liang entropy, and information granularity), and organizes and derives the consistent extension of these four measures from complete information systems to incomplete information systems. Building on these four measures, a knowledge reduction algorithm based on Maximum Correlation Minimum Redundancy (MCMR) is proposed. The method is based on entropy measurement and can jointly consider decision information and redundant noise. Experiments on eight data sets, including UCI and Weibo data, show that the proposed algorithm outperforms several baseline algorithms and effectively reduces the redundancy of the information system. In addition, this paper proposes an incomplete decision tree algorithm based on maximal consistent blocks. Experiments on data with different degrees of missingness show that the proposed algorithm effectively handles the incompleteness of the information system.
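To make the four uncertainty measures concrete, the following is a minimal sketch in Python. The abstract does not give the paper's formulas, so this assumes the commonly cited tolerance-class definitions from the rough set literature, in which a missing value matches any value. The names `MISSING`, `tolerance_class`, and `uncertainty_measures` are hypothetical and chosen for illustration only; the paper's own definitions may differ.

```python
import math

MISSING = None  # assumed marker for a missing attribute value


def tolerance_class(table, i, attrs):
    """Indices of objects indistinguishable from object i on attrs.

    A MISSING value is treated as compatible with any value (assumption).
    """
    return {
        j
        for j, row in enumerate(table)
        if all(
            table[i][a] is MISSING or row[a] is MISSING or table[i][a] == row[a]
            for a in attrs
        )
    }


def uncertainty_measures(table, attrs):
    """Four uncertainty measures computed from tolerance-class sizes.

    Assumed definitions, with n = |U| and s_i = |S(u_i)|:
      shannon     = -(1/n) * sum(log2(s_i / n))
      rough       =  (1/n) * sum(log2(s_i))
      liang       =  (1/n) * sum(1 - s_i / n)
      granulation =  (1/n^2) * sum(s_i)
    """
    n = len(table)
    sizes = [len(tolerance_class(table, i, attrs)) for i in range(n)]
    return {
        "shannon": -sum(math.log2(s / n) for s in sizes) / n,
        "rough": sum(math.log2(s) for s in sizes) / n,
        "liang": sum(1 - s / n for s in sizes) / n,
        "granulation": sum(sizes) / n**2,
    }


# A complete table: the tolerance classes form the partition {{0, 1}, {2, 3}}.
complete = [(0,), (0,), (1,), (1,)]
# The same table with the last value missing.
incomplete = [(0,), (0,), (1,), (MISSING,)]

print(uncertainty_measures(complete, [0]))
print(uncertainty_measures(incomplete, [0]))
```

On a complete table the tolerance classes coincide with equivalence classes, so under these assumed definitions each measure reduces to its usual partition-based form, which is the sense in which the extension is consistent. In both settings the Shannon and rough entropies sum to log2 of the number of objects, and the Liang entropy and granulation sum to 1.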
Key words
rumor detection/rough set/incomplete information system/maximum correlation minimum redundancy/maximal consistent blocks