Reinforcement Recommendation System Based on Causal Mechanism Constraint
Training reinforcement learning recommendation systems on historical data is attracting growing attention from researchers. However, historical data leads to incorrect state-action value estimation in reinforcement learning models, producing data biases such as popularity bias and selection bias. The cause is twofold: the distribution of the historical data is inconsistent with the distribution of the data that the reinforcement learning policy would collect, and the historical data itself carries bias. Causal mechanisms can resolve data bias while constraining the distribution of the data collected by the policy. This paper therefore proposes a reinforcement recommendation system based on causal mechanism constraints, comprising a causal mechanism constraint module and a comparison strategy module. The causal mechanism constraint module limits the sample space from which the recommendation policy can choose, reducing the discrepancy between the policy distribution and the data distribution, and it models the item popularity distribution as it changes over time to alleviate popularity bias. The comparison strategy module mitigates the impact of selection bias by balancing the importance of positive and negative samples. Experimental results on the real-world datasets Ciao and Epinions show that, compared with Deep Q-Network (DQN)-r, GAIL, SOFA, and other baselines, the proposed algorithm achieves better accuracy and diversity. Moreover, the model with the causal mechanism constraint module improves F-measure by 2% and 3% on the two datasets, respectively, over the model without it, further verifying the effectiveness of the module.
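The abstract outlines two mechanisms: constraining the sample (action) space available to the recommendation policy using the time-varying item popularity distribution, and reweighting positive and negative samples in the comparison strategy module. The Python sketch below is only a minimal illustration of these two ideas, not the authors' implementation; the function names (popularity_mask, constrained_policy_scores, weighted_contrastive_loss) and the specific quantile threshold and BPR-style pairwise loss are hypothetical choices made for the example.

```python
# Minimal, hypothetical sketch of the two ideas described in the abstract (not the authors' code):
# (1) constrain the policy's candidate action space using a time-dependent popularity
#     distribution, and (2) reweight positive/negative samples in a contrastive-style loss.
import numpy as np

def popularity_mask(item_counts_t: np.ndarray, quantile: float = 0.9) -> np.ndarray:
    """Keep items whose popularity at time t is at or below the given quantile.

    `item_counts_t` holds interaction counts per item in the current time window,
    so the mask changes as the popularity distribution drifts over time.
    """
    threshold = np.quantile(item_counts_t, quantile)
    return item_counts_t <= threshold  # boolean mask over the item catalogue

def constrained_policy_scores(q_values: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Restrict the policy's action space: disallowed items receive a -inf score."""
    return np.where(mask, q_values, -np.inf)

def weighted_contrastive_loss(pos_scores: np.ndarray,
                              neg_scores: np.ndarray,
                              neg_weight: float = 1.0) -> float:
    """BPR-style pairwise loss with an explicit weight that balances negatives
    against positives (one possible way to reduce bias toward observed items)."""
    diff = pos_scores[:, None] - neg_scores[None, :]      # all positive/negative pairs
    pair_loss = -np.log(1.0 / (1.0 + np.exp(-diff)))      # -log(sigmoid(diff))
    return float(neg_weight * pair_loss.mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_items = 1000
    item_counts_t = rng.poisson(lam=5.0, size=n_items)    # item popularity at time t
    q_values = rng.normal(size=n_items)                   # policy's value estimates

    mask = popularity_mask(item_counts_t, quantile=0.9)
    scores = constrained_policy_scores(q_values, mask)
    recommended = int(np.argmax(scores))                  # greedy action within the constrained space

    pos = rng.normal(loc=1.0, size=8)                     # scores of observed (positive) items
    neg = rng.normal(loc=0.0, size=32)                    # scores of sampled negatives
    print(recommended, weighted_contrastive_loss(pos, neg, neg_weight=0.5))
```

Masking high-popularity items is only one possible instantiation of a causal constraint on the action space, and the scalar weight on negatives is a stand-in for whatever balancing scheme the paper actually uses.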

recommendation system; reinforcement learning; causal mechanism; extrapolation error; data bias

Zhang Sili, Li Zijian, Cai Ruichu, Hao Zhifeng, Yan Yuguang


School of Computer Science, Guangdong University of Technology, Guangzhou 510006, Guangdong, China

College of Engineering, Shantou University, Shantou 515063, Guangdong, China


National Natural Science Foundation of China (61876043, 61976052, 62206061); National Science Fund for Excellent Young Scholars (62122022); Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2021ZD0111501)

2024

Computer Engineering
East China Institute of Computing Technology; Shanghai Computer Society


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.581
ISSN: 1000-3428
Year, Volume (Issue): 2024, 50(5)