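Abstract (translated from the Chinese)
Text is the primary medium of news. When analyzing sentiment words, traditional news recommendation algorithms based on sentiment lexicons usually ignore the sentiment of words outside the lexicon, so sentiment words are labeled incompletely, which leads to low prediction accuracy and poor ranking performance. To address these problems, this paper proposes a heuristic method for inferring the sentiment of unknown words and designs a corresponding news recommendation algorithm to verify its effectiveness. A tripartite graph model over headlines, sentiment words, and sentiment characters is constructed to diffuse the sentiment of lexicon words to individual characters, and the headline sentiment is obtained from the sentiment words and sentiment characters. First, a bag-of-words model extracts the topic features of each headline. Then, the sentiment similarity and topic similarity between headlines are computed and fused into a single comprehensive similarity measure. Next, news items with high similarity to the target news are selected as its neighbors. The algorithm predicts the hourly average click count of the target news from the hourly average click counts of its neighbors, treats this value as the predicted score of the target news, and finally ranks the scores to recommend news to users. Experiments on a real NetEase hot-list news dataset verify the feasibility and effectiveness of the method. Compared with other algorithms, the proposed algorithm improves the best accuracy in terms of mean absolute error by 2.2% to 3.4%, the best accuracy in terms of root mean square error by 2.3% to 2.9%, and the average normalized discounted cumulative gain score by 0.7% to 1.8%.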
Abstract
Traditional lexicon-based news recommendation algorithms often ignore the sentiment of words that fall outside the sentiment dictionary. This oversight leaves sentiment words incompletely labeled and leads to reduced prediction accuracy and poor ranking performance. To address these problems, this paper introduces a heuristic approach for inferring the sentiment of out-of-dictionary words and devises a news recommendation algorithm to verify its effectiveness. A tripartite graph model over headlines, sentiment words, and sentiment characters is constructed to propagate sentiment from the words in a sentiment dictionary to individual characters, and the headline sentiment is obtained from the sentiment words and sentiment characters. A bag-of-words model is then used to extract topic features from the headlines. The sentiment similarity and topic similarity between headlines are calculated and fused into a comprehensive similarity measure. News items with high similarity to the target news are selected as its neighbors. The algorithm predicts the hourly average click volume of the target news from the hourly average click volume of its neighbors and treats this value as the predicted score of the target news. Finally, the scores are ranked to recommend news to users. Validation on a real NetEase hot-list news dataset confirms the feasibility and effectiveness of the algorithm. Compared with other algorithms, the proposed algorithm improves the optimal accuracy in terms of mean absolute error (MAE) by 2.2% to 3.4% and root mean square error (RMSE) by 2.3% to 2.9%, and improves the mean normalized discounted cumulative gain (NDCG) score by 0.7% to 1.8%.
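The pipeline summarized above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the diffusion rule, the form of the sentiment vector, the fusion formula, or the prediction rule. The sketch assumes mean-based diffusion from words to characters, a two-component (positive, negative) headline sentiment vector, a linear fusion with a hypothetical weight alpha, and a similarity-weighted average over the k most similar neighbors. All function names, the toy lexicon, and the click counts are hypothetical, and headlines are assumed to be already segmented into words.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dict-like objects."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def diffuse_to_chars(lexicon):
    """Assumed diffusion rule: a character's score is the mean polarity
    of the lexicon words that contain it."""
    totals, counts = defaultdict(float), defaultdict(int)
    for word, score in lexicon.items():
        for ch in set(word):
            totals[ch] += score
            counts[ch] += 1
    return {ch: totals[ch] / counts[ch] for ch in totals}


def word_sentiment(word, lexicon, char_scores):
    """Use the lexicon score if available; otherwise average the scores
    of the word's known characters (0.0 if none are known)."""
    if word in lexicon:
        return lexicon[word]
    known = [char_scores[ch] for ch in word if ch in char_scores]
    return sum(known) / len(known) if known else 0.0


def headline_sentiment(tokens, lexicon, char_scores):
    """Assumed representation: accumulated positive and negative mass."""
    vec = {"pos": 0.0, "neg": 0.0}
    for tok in tokens:
        s = word_sentiment(tok, lexicon, char_scores)
        if s > 0:
            vec["pos"] += s
        elif s < 0:
            vec["neg"] -= s
    return vec


def fused_similarity(a, b, lexicon, char_scores, alpha=0.5):
    """Linear fusion of sentiment and bag-of-words topic similarity
    (alpha is a hypothetical weight)."""
    sent = cosine(headline_sentiment(a, lexicon, char_scores),
                  headline_sentiment(b, lexicon, char_scores))
    topic = cosine(Counter(a), Counter(b))
    return alpha * sent + (1 - alpha) * topic


def predict_clicks(target, history, lexicon, char_scores, k=2, alpha=0.5):
    """Similarity-weighted average of the hourly click counts of the
    k most similar headlines in history (a list of (tokens, clicks))."""
    scored = [(fused_similarity(target, toks, lexicon, char_scores, alpha), clicks)
              for toks, clicks in history]
    top = sorted(scored, key=lambda p: p[0], reverse=True)[:k]
    weight = sum(s for s, _ in top)
    if weight == 0:
        # No similar neighbor: fall back to the plain mean of the top-k clicks.
        return sum(c for _, c in top) / len(top) if top else 0.0
    return sum(s * c for s, c in top) / weight


# Toy data, invented for illustration only.
lexicon = {"喜悦": 1.0, "喜欢": 0.8, "悲伤": -1.0}
char_scores = diffuse_to_chars(lexicon)
history = [(["球队", "夺冠", "喜悦"], 120.0),
           (["事故", "悲伤"], 40.0),
           (["球队", "比赛"], 80.0)]
target = ["球队", "欢喜"]  # "欢喜" is not in the lexicon
print(predict_clicks(target, history, lexicon, char_scores))
```

In this toy run the out-of-lexicon word "欢喜" receives a positive score from its characters, so the target headline is matched to the positive headline about the same topic rather than to the negative one. Ranking candidate news by the predicted value then yields the recommendation list.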