Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2024, Vol. 44, Issue 1: 101-108. DOI: 10.14132/j.cnki.1673-5439.2024.01.011

A recommendation algorithm integrating data mining and rating prediction

Lin Xiaoxuan, Ji Yimu, Liu Shangdong, Li Lingjuan

Author information

  • 1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023

Abstract

To address the data sparsity, high similarity-computation cost, and limited accuracy of the traditional UserCF algorithm, a recommendation algorithm named DRR, which integrates data mining and rating prediction, is designed to improve recommendation accuracy, coverage, and time efficiency. First, PCA dimension reduction is applied to the user rating matrix, which is both very large and sparse. Second, the Canopy algorithm processes the reduced matrix to obtain the number of clusters K. The K-means algorithm then clusters users with cosine similarity as the distance measure, and the Apriori algorithm mines potential association rules between items within each cluster, from which an item association factor is calculated. Finally, the other users in the target user's cluster are taken as its neighbors, and the target user's rating of an item is predicted from historical ratings, cosine similarity, and the item association factor. This reduces the time spent searching for nearest neighbors while also surfacing long-tail items. Experiments on the MovieLens dataset and the Douban movie dataset show that, compared with the UserCF algorithm, a K-means clustering based collaborative filtering algorithm, and a spectral clustering based collaborative filtering algorithm, DRR improves accuracy, recall, F1 value, coverage, and time efficiency.
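Two steps of the pipeline described in the abstract can be illustrated with a minimal Python sketch: estimating the number of clusters K with Canopy, and predicting a rating from same-cluster neighbors weighted by cosine similarity. This is not the authors' implementation. The function names, the threshold `t2`, and the toy ratings are assumptions made for illustration. PCA and K-means are left out (the cluster assignment is supplied by hand), and the Apriori-based item association factor is omitted because the abstract does not give its formula.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def canopy_count(points, t2):
    """Estimate K by counting Canopy centers under cosine distance.

    Simplification (assumption): only the tight threshold t2 is used,
    because the loose threshold affects canopy membership, not the count.
    """
    remaining = list(range(len(points)))
    centers = []
    while remaining:
        center = remaining.pop(0)  # next unassigned point seeds a canopy
        centers.append(center)
        # Points closer than t2 to the center cannot seed another canopy.
        remaining = [i for i in remaining
                     if 1.0 - cosine(points[center], points[i]) >= t2]
    return len(centers)


def predict_rating(ratings, cluster_of, user, item):
    """Similarity-weighted average of same-cluster neighbors' ratings of `item`.

    A rating of 0 means "unrated". Returns 0.0 when no neighbor rated the item.
    """
    num = den = 0.0
    for other, vec in enumerate(ratings):
        if other == user or cluster_of[other] != cluster_of[user]:
            continue  # neighbors are restricted to the user's own cluster
        if vec[item] == 0:
            continue  # this neighbor has not rated the item
        sim = cosine(ratings[user], vec)
        num += sim * vec[item]
        den += abs(sim)
    return num / den if den else 0.0


# Toy data (hypothetical): rows are users, columns are items, 0 means unrated.
ratings = [
    [5, 4, 0, 1],
    [4, 5, 3, 1],
    [1, 0, 5, 4],
    [1, 1, 4, 5],
]
k = canopy_count(ratings, t2=0.3)
cluster_of = [0, 0, 1, 1]  # stand-in for the K-means assignment
print(k, predict_rating(ratings, cluster_of, user=0, item=2))
```

Restricting the neighbor search to one cluster is what reduces the cost relative to plain UserCF, which compares the target user against every other user. In the sketch the similarities are computed on raw ratings, whereas the abstract clusters users on the PCA-reduced matrix.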

Key words

dimension reduction/clustering/association rules/long tail items/rating prediction

Funding

National Key Research and Development Program (2020YFB2104002)

Key Research and Development Program of Jiangsu Province (BE2019740)

Publication year

2024
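Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition)
Nanjing University of Posts and Telecommunications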

Indexed in: CSTPCD, Peking University Core Journals
Impact factor: 0.486
ISSN: 1673-5439
Number of references: 16