Situational verb recognition with semantic-aware distillation
With the rapid development of computer vision, Situational Verb Recognition (SVR) remains a challenging image-understanding task, aimed at identifying semantically complex situations within images. A novel method that integrates the CLIP model with knowledge distillation is proposed: it leverages CLIP's powerful cross-modal capabilities to capture the subtle associations between images and verbs, and employs knowledge distillation to transfer these associations to the SVR task, thereby enhancing network performance. Experimental results on the standard SWiG dataset indicate that the method surpasses current state-of-the-art techniques while using the smallest parameter count. The primary contribution is an SVR framework that combines CLIP with semantic-aware knowledge distillation, validated through extensive experiments on public datasets that confirm the effectiveness of the method.
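To make the idea of CLIP-guided distillation concrete, the following is a minimal sketch, not the paper's actual implementation. It assumes the openai/CLIP package as a frozen teacher that scores each image against one text prompt per verb, and distills those similarity scores into a student verb classifier; the names StudentVerbNet, VERBS, the prompt template, and the weights alpha and T are all illustrative assumptions.

```python
# Hypothetical sketch of semantic-aware distillation from a CLIP teacher
# to a lightweight verb-recognition student (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen CLIP teacher: one text prompt per verb in a placeholder vocabulary.
teacher, preprocess = clip.load("ViT-B/32", device=device)
VERBS = ["riding", "jumping", "carrying"]  # placeholder verb vocabulary
prompts = clip.tokenize([f"a photo of a person {v}" for v in VERBS]).to(device)
with torch.no_grad():
    text_feats = F.normalize(teacher.encode_text(prompts), dim=-1)

class StudentVerbNet(nn.Module):
    """Lightweight student that maps a preprocessed image to verb logits."""
    def __init__(self, num_verbs):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.LazyLinear(512), nn.ReLU())
        self.head = nn.Linear(512, num_verbs)

    def forward(self, x):
        return self.head(self.backbone(x))

student = StudentVerbNet(len(VERBS)).to(device)

def distillation_loss(images, labels, T=4.0, alpha=0.5):
    """Cross-entropy on ground-truth verbs plus a KL term that pulls the
    student's verb distribution toward the CLIP image-text similarities."""
    logits_s = student(images)
    with torch.no_grad():
        img_feats = F.normalize(teacher.encode_image(images), dim=-1)
        logits_t = (100.0 * img_feats @ text_feats.T).float()  # teacher soft targets
    ce = F.cross_entropy(logits_s, labels)
    kd = F.kl_div(F.log_softmax(logits_s / T, dim=-1),
                  F.softmax(logits_t / T, dim=-1),
                  reduction="batchmean") * T * T
    return alpha * ce + (1 - alpha) * kd
```

In this sketch the temperature T softens both distributions so the student learns the teacher's relative verb rankings rather than only its top prediction, and alpha balances supervised and distilled signals; the paper's actual losses, backbone, and hyperparameters may differ.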