Variable radius neighborhood rough set model based on hash bucket and clustering
The neighborhood rough set is a data analysis tool for handling uncertainty in machine learning and data mining. The size of the neighborhood granules in a neighborhood rough set model is often affected by the neighborhood radius. However, existing neighborhood rough set models usually ignore the distribution of the sample data and assign the same neighborhood radius to every sample, so the neighborhood granules cannot accurately characterize each sample. To address this problem, a variable radius neighborhood rough set model based on the distribution of the data is proposed. First, the dataset is clustered and the sample distribution of each cluster is analyzed using hash buckets; an appropriate neighborhood radius is then set for each sample, so that the information of each sample is described more accurately. Finally, the variable radius neighborhood rough set model is compared with popular neighborhood rough set models on eight datasets. Theoretical analysis and experimental results show that the proposed model achieves better learning performance.
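The pipeline outlined in the abstract (cluster, bucket, assign a per-sample radius, build the neighborhood granules) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the radius rule, so the rule used here (radius = base_radius / bucket occupancy, so that denser regions get smaller radii) is an assumption, as are the function names, the grid-cell hash, the parameters `base_radius` and `cell_width`, and the hand-written cluster labels that stand in for the output of a clustering step.

```python
import math
from collections import Counter


def bucket_key(sample, cell_width):
    """Hash a sample to a grid cell: one integer index per attribute."""
    return tuple(math.floor(v / cell_width) for v in sample)


def variable_radii(samples, labels, base_radius=0.3, cell_width=0.25):
    """Assign each sample a radius that shrinks as its bucket gets denser.

    Assumed rule (not taken from the paper): radius = base_radius / occupancy,
    where occupancy counts same-cluster samples that share the sample's bucket.
    """
    occupancy = Counter(
        (label, bucket_key(s, cell_width)) for s, label in zip(samples, labels)
    )
    return [
        base_radius / occupancy[(label, bucket_key(s, cell_width))]
        for s, label in zip(samples, labels)
    ]


def neighborhoods(samples, radii):
    """For each sample, list the indices within its own radius (Euclidean)."""
    return [
        [j for j, other in enumerate(samples) if math.dist(s, other) <= r]
        for s, r in zip(samples, radii)
    ]


# Toy data: three tightly packed samples and two sparse ones.
samples = [(0.10, 0.10), (0.12, 0.11), (0.15, 0.14), (0.80, 0.85), (0.60, 0.90)]
labels = [0, 0, 0, 1, 1]  # hypothetical labels from a prior clustering step

radii = variable_radii(samples, labels)
granules = neighborhoods(samples, radii)
```

In this toy example the three samples in the dense bucket receive a smaller radius than the two isolated samples, which is the qualitative behavior the abstract motivates; a fixed-radius model would use a single value for all five.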