Multi-modal information augmented model for micro-video recommendation

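A multi-modal augmented model (MMa4CTR) is proposed for click-through rate prediction on micro-videos. The model draws on the multi-modal data in user interactions with micro-videos to build user embedding representations and learn users' multi-modal interests. Features from different modalities are combined and crossed to explore the semantics shared across modalities. Two training strategies, automatic learning rate adjustment and validation interruption, are introduced to improve overall recommendation performance. To handle the computational cost of the growing volume of multi-modal data, a computationally efficient multi-layer perceptron is adopted. Performance comparison experiments and hyperparameter sensitivity experiments on the WeChat Video Channel and Douyin (TikTok) micro-video datasets show that MMa4CTR outperforms the baseline models while keeping computational overhead low. Ablation experiments on both datasets further confirm the importance and effectiveness of the micro-video modality cross module, the user multi-modal embedding layer, and the automatic learning rate adjustment and validation interruption strategies in improving recommendation performance.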
Multi-modal information augmented model for micro-video recommendation
A multi-modal augmented model for click-through rate prediction (MMa4CTR), tailored for micro-video recommendation, was proposed. Multi-modal data derived from user interactions with micro-videos were leveraged to construct user embedding representations and to capture users' interests across modalities. Features from different modalities were combined and crossed to reveal the semantics they share. Overall recommendation performance was improved through two training strategies: automatic learning rate adjustment and validation interruption. A computationally efficient multi-layer perceptron architecture was employed to address the computational demands of the growing volume of multi-modal data. Performance comparison experiments and hyperparameter sensitivity analyses on the WeChat Video Channel and TikTok datasets showed that MMa4CTR outperformed the baseline models while keeping computational overhead low. Ablation studies on both datasets further validated the significance and efficacy of the micro-video modality cross module, the user multi-modal embedding layer, and the automatic learning rate adjustment and validation interruption strategies in enhancing recommendation performance.
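
The abstract names the two training strategies but does not say how they are implemented. The sketch below shows one common way such strategies are realized, assuming that "automatic learning rate adjustment" means decaying the learning rate when a validation metric stops improving, and that "validation interruption" means stopping training after a longer plateau. The class `PlateauController`, the function `run`, and all parameter values are hypothetical illustrations, not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class PlateauController:
    """Hypothetical helper: decays the learning rate on a validation plateau
    and signals a stop once the plateau lasts long enough."""
    lr: float = 1e-3
    decay: float = 0.5       # assumed multiplicative decay factor
    lr_patience: int = 2     # stale epochs between learning rate decays
    stop_patience: int = 5   # stale epochs before training is interrupted
    best: float = float("-inf")
    stale: int = 0

    def step(self, val_metric: float) -> bool:
        """Record one epoch's validation metric (higher is better, e.g. AUC).
        Returns True when training should be interrupted."""
        if val_metric > self.best:
            # Improvement: remember it and reset the plateau counter.
            self.best = val_metric
            self.stale = 0
            return False
        self.stale += 1
        if self.stale % self.lr_patience == 0:
            # Plateau: shrink the learning rate.
            self.lr *= self.decay
        return self.stale >= self.stop_patience


def run(val_metrics):
    """Feed per-epoch validation metrics; return (last epoch run, controller)."""
    ctrl = PlateauController()
    for epoch, metric in enumerate(val_metrics, start=1):
        if ctrl.step(metric):
            return epoch, ctrl
    return len(val_metrics), ctrl


# Made-up validation scores, used only to exercise the controller.
stopped_at, ctrl = run([0.70, 0.72, 0.71, 0.71, 0.715, 0.71, 0.71, 0.70])
```

In a real training loop, `ctrl.lr` would be pushed into the optimizer after each call to `step`, and the model weights from the best epoch would typically be restored once training is interrupted.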

Keywords: recommender system; click-through rate; multi-modal; micro-video; machine learning

Huo Yufu (霍育福), Jin Beihong (金蓓弘), Liao Zhaoyi (廖肇翊)

School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049

Institute of Software, Chinese Academy of Sciences, Beijing 100190

Funding: National Natural Science Foundation of China (62072450)

2024

Journal of Zhejiang University (Engineering Science)
Zhejiang University

Indexed in: CSTPCD; PKU Core Journals (北大核心)
Impact factor: 0.625
ISSN:1008-973X
Year, volume (issue): 2024, 58(6)