Journal of Zhejiang University (Engineering Science), 2024, Vol. 58, Issue 6: 1142-1152. DOI: 10.3785/j.issn.1008-973X.2024.06.005

Multi-modal information augmented model for micro-video recommendation

Huo Yufu (霍育福) 1, Jin Beihong (金蓓弘) 2, Liao Zhaoyi (廖肇翊) 1

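Author information

  • 1. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049
  • 2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049; Institute of Software, Chinese Academy of Sciences, Beijing 100190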

Abstract

A multi-modal augmented model for click-through rate prediction (MMa4CTR) is proposed for micro-video recommendation. The model draws on the multi-modal data in user interactions with micro-videos to construct user embeddings and to learn users' multi-modal interests. Features from different modalities are combined and crossed to explore the semantics shared across modalities. Two training strategies, automatic learning rate adjustment and validation interruption, improve the overall recommendation performance. To meet the computational demands of the larger volume of multi-modal data, a computationally efficient multi-layer perceptron is used. Performance comparison experiments and hyperparameter sensitivity experiments on the WeChat Video Channel and TikTok datasets show that MMa4CTR outperforms the baseline models while keeping computational overhead low. Ablation studies on both datasets further confirm the importance and effectiveness of the micro-video modality cross module, the user multi-modal embedding layer, and the automatic learning rate adjustment and validation interruption strategies.
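The abstract names the two training strategies without describing how they work. The sketch below illustrates one common reading, and it is an assumption rather than the authors' implementation: the learning rate is reduced when the validation loss stops improving, and training is interrupted after a longer run of epochs without improvement. The class name `PlateauController`, the helper `run`, and all parameter names and default values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PlateauController:
    """Hypothetical helper: decays the learning rate on a validation plateau
    and signals a stop after too many epochs without improvement."""
    lr: float = 1e-3
    factor: float = 0.5       # assumed multiplier applied to lr on a plateau
    lr_patience: int = 2      # assumed: decay lr every this many stale epochs
    stop_patience: int = 5    # assumed: stop after this many stale epochs
    min_delta: float = 1e-4   # smallest decrease that counts as improvement
    best: float = float("inf")
    stale: int = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best - self.min_delta:
            # Improvement: remember it and reset the stale-epoch counter.
            self.best = val_loss
            self.stale = 0
            return False
        self.stale += 1
        if self.stale % self.lr_patience == 0:
            # Automatic learning rate adjustment (assumed form).
            self.lr *= self.factor
        # Validation interruption (assumed form).
        return self.stale >= self.stop_patience


def run(val_losses, controller):
    """Feed per-epoch validation losses; return the epoch at which training ends."""
    for epoch, loss in enumerate(val_losses, start=1):
        if controller.step(loss):
            return epoch
    return len(val_losses)


controller = PlateauController(lr=0.1, lr_patience=2, stop_patience=4)
last_epoch = run([0.70, 0.65, 0.66, 0.66, 0.67, 0.66, 0.66, 0.60], controller)
```

In a real training loop, `controller.lr` would be written back to the optimizer after each call to `step`, and the validation loss would come from a held-out split rather than a fixed list.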

Key words

recommender system/click-through rate/multi-modal/micro-video/machine learning


Funding

National Natural Science Foundation of China (62072450)

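Publication year

2024

Journal of Zhejiang University (Engineering Science)
Zhejiang University

Indexed in: CSTPCD, CSCD, PKU Core
Impact factor: 0.625
ISSN: 1008-973X
Number of references: 28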