Journal of Computer Research and Development, 2024, Vol. 61, Issue 5: 1290-1298. DOI: 10.7544/issn1000-1239.202220943

Multi-Granularity Based Feature Interaction Pruning Model for CTR Prediction

Bai Ting 1, Liu Xuanning 1, Wu Bin 1, Zhang Zibin 2, Xu Zhiyuan 2, Lin Kangyi 2

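Author information

  • 1. School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876
  • 2. Open Platform Foundation Department, WeChat Business Group, Guangzhou 510220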

Abstract

Learning effective high-order feature interactions is crucial for click-through rate (CTR) prediction in recommender systems. Existing methods that build high-order feature combinations by reassembling low-order ones, e.g., second-order feature interactions, must compute an interaction weight for every pairwise feature interaction, so their time complexity grows exponentially as the feature dimension increases. Deep neural network-based methods can be viewed as universal function approximators that can potentially learn all kinds of feature interactions; however, they have been shown to approximate low-order interactions, e.g., second-order or third-order feature interactions, inefficiently, which may reduce the accuracy of CTR prediction. To address these problems, we propose FeatNet, a multi-granularity feature interaction pruning network for CTR prediction. First, at the explicit feature granularity, FeatNet applies a threshold pruning operation to generate different feature subsets that contain meaningful feature combinations, which preserves the diversity of feature combinations and reduces the complexity of high-order feature interactions. Then, based on the pruned feature subsets, implicit high-order feature interactions are conducted at the granularity of feature elements, where a filter automatically removes invalid feature interactions. Extensive experiments on two real-world datasets show that FeatNet achieves the best CTR prediction performance.
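The first stage described in the abstract, threshold pruning at the feature level followed by interactions restricted to each pruned subset, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how feature scores are obtained or how interactions are computed, so the gate scores, the threshold value, the toy embeddings, and the use of an inner product as the interaction are all assumptions made for illustration. The element-level filtering stage is not modeled here.

```python
from itertools import combinations


def prune_features(scores, threshold):
    """Keep the names of features whose score reaches the threshold."""
    return [name for name, score in scores.items() if score >= threshold]


def pairwise_interactions(embeddings, kept):
    """Inner-product interaction for every pair of kept features.

    The inner product is an assumed stand-in for the interaction function.
    """
    result = {}
    for a, b in combinations(kept, 2):
        result[(a, b)] = sum(x * y for x, y in zip(embeddings[a], embeddings[b]))
    return result


# Hypothetical toy embeddings for four feature fields.
embeddings = {
    "user_id": [0.5, 1.0],
    "item_id": [1.0, 0.0],
    "hour": [0.0, 2.0],
    "device": [1.0, 1.0],
}

# Hypothetical gate scores: one score dictionary per generated subset.
gate_scores = [
    {"user_id": 0.9, "item_id": 0.8, "hour": 0.1, "device": 0.4},
    {"user_id": 0.2, "item_id": 0.7, "hour": 0.6, "device": 0.9},
]
THRESHOLD = 0.5  # assumed value

# Each score dictionary yields a different pruned subset of features.
subsets = [prune_features(scores, THRESHOLD) for scores in gate_scores]

# Interactions are computed only inside each pruned subset.
interactions = [pairwise_interactions(embeddings, kept) for kept in subsets]
```

With four features, exhaustive pairwise interaction would cover six pairs. In this toy setting the two pruned subsets cover one and three pairs respectively, which shows how pruning can lower interaction cost while different subsets still retain different feature combinations.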

Key words

CTR prediction/high-order feature interaction/multi-granularity/feature pruning/feature denoising


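Funding

National Natural Science Foundation of China (62102038)

National Natural Science Foundation of China (61972047)

Tencent WeChat Open Platform Project (S2021120)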

Publication year

2024
Journal of Computer Research and Development
Institute of Computing Technology, Chinese Academy of Sciences; China Computer Federation

Indexed in: CSTPCD, Peking University Core Journals
Impact factor: 2.649
ISSN: 1000-1239
Number of references: 37