Collaborative Filtering Algorithm Based on Reducing Data Sparsity

Collaborative filtering is a common algorithm in recommendation systems. Its core idea is to mine user preferences from historical data and to recommend by computing the nearest neighbors of similar objects. Real-world data, however, is usually severely sparse: users or items share too few co-rated items, so traditional similarity measures are computed inaccurately and recommendation accuracy suffers. The traditional Slope One algorithm is not very accurate on its own, but it is simple to implement and runs efficiently, so it can be used to pre-fill sparse data and thereby improve the accuracy of similarity computation. This paper therefore proposes a collaborative filtering algorithm based on reducing data sparsity that incorporates Slope One. First, hierarchical clustering is performed on the user rating data. The Weighted Slope One algorithm then predicts and fills in part of the missing entries of the high-sparsity dataset, which substantially reduces data sparsity and improves the accuracy of the Pearson similarity computation. Finally, an object attribute preference similarity is introduced and fused with the Pearson similarity. The algorithm is validated on the MovieLens 100K dataset; the results show a reduction in Mean Absolute Error (MAE), indicating that the proposed algorithm improves recommendation accuracy to some extent.
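The pre-filling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the clustering step and the attribute-preference fusion are omitted, every blank is filled rather than only a selected part, and the function names (`weighted_slope_one`, `prefill`, `pearson`, `mae`) and the three-user toy ratings are assumptions made for illustration only.

```python
from math import sqrt


def weighted_slope_one(ratings, user, item):
    """Predict ratings[user][item] with Weighted Slope One (None if no evidence)."""
    num, den = 0.0, 0
    for j, r_uj in ratings[user].items():
        if j == item:
            continue
        # Rating differences (item - j) over all users who rated both items.
        diffs = [r[item] - r[j] for r in ratings.values() if item in r and j in r]
        if diffs:
            dev = sum(diffs) / len(diffs)
            # Each deviation is weighted by the number of co-rating users.
            num += (dev + r_uj) * len(diffs)
            den += len(diffs)
    return num / den if den else None


def prefill(ratings, items):
    """Return a denser copy of ratings; blanks are filled where a prediction exists."""
    filled = {u: dict(r) for u, r in ratings.items()}
    for u in ratings:
        for i in items:
            if i not in ratings[u]:
                p = weighted_slope_one(ratings, u, i)  # uses original ratings only
                if p is not None:
                    filled[u][i] = p
    return filled


def pearson(ratings, u, v):
    """Pearson correlation over co-rated items; 0.0 when it is undefined."""
    common = [i for i in ratings[u] if i in ratings[v]]
    if len(common) < 2:
        return 0.0
    mu = sum(ratings[u][i] for i in common) / len(common)
    mv = sum(ratings[v][i] for i in common) / len(common)
    cov = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    su = sqrt(sum((ratings[u][i] - mu) ** 2 for i in common))
    sv = sqrt(sum((ratings[v][i] - mv) ** 2 for i in common))
    return cov / (su * sv) if su and sv else 0.0


def mae(predicted, actual):
    """Mean Absolute Error between two equal-length rating lists."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical toy data: three users, three items, two blanks.
ratings = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}
filled = prefill(ratings, ["A", "B", "C"])
```

In this toy data, `mark` and `lucy` share only item B, so `pearson(ratings, "mark", "lucy")` falls back to 0.0. After pre-filling, both users have three ratings and the correlation becomes computable, which is the effect the abstract attributes to reducing sparsity. The prediction for `lucy` on item A works out to 13/3, roughly 4.33.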

collaborative filtering; data sparsity; Weighted Slope One; Pearson similarity; object properties

XU Wentao (徐文涛), WANG Cheng (王诚)

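School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China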

National Natural Science Foundation of China

61801240

2024

Computer Technology and Development
Shaanxi Computer Society

CSTPCD
Impact factor: 0.621
ISSN:1673-629X
Year, volume (issue): 2024, 34(5)