Audio object detection network with multimodal cross-level feature knowledge transfer

Abstract: Sound is an inherent property of objects and can provide valuable information for object detection. Current methods that localize targets solely by monitoring ambient sound have low robustness. To address this problem, a multimodal self-supervised object detection network with cross-level feature knowledge transfer is proposed. First, because the learning capacity between same-level features of the teacher and student networks is limited, a multi-teacher cross-level feature knowledge transfer loss based on attention fusion is designed. The student's deep and shallow features are fused through attention so that the corresponding intermediate-layer features of the teachers are learned more efficiently and more knowledge is extracted, while KL divergence is used to align the intermediate-layer features of the teacher and student networks. In addition, to address the lack of localization information, a localization distillation loss is added, which obtains more localization information by making the student's bounding-box distribution fit the teacher's bounding-box distribution. The network is trained on the Multimodal Audio-Visual Detection (MAVD) dataset. Compared with the baseline network, its mAP improves by 6.71% at an IoU of 0.5, by 14.36% at an IoU of 0.75, and by 10.32% on average. The experimental results demonstrate the superiority of the detection network.
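The two loss terms named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not describe the attention module or the box parameterization, so the sketch assumes (hypothetically) that each feature level is scored by its mean activation, and that each bounding-box edge is represented as logits over discretized offset bins. All function names are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def attention_fuse(shallow, deep):
    """Fuse two same-length student feature vectors with softmax weights.

    Assumption: each level is scored by its mean activation. The paper's
    actual attention module is not described in the abstract.
    """
    scores = [sum(shallow) / len(shallow), sum(deep) / len(deep)]
    w_shallow, w_deep = softmax(scores)
    return [w_shallow * s + w_deep * d for s, d in zip(shallow, deep)]


def feature_transfer_loss(student_shallow, student_deep, teacher_feature):
    """KL divergence from the teacher feature to the fused student feature.

    Both features are converted to distributions with a softmax first.
    """
    fused = attention_fuse(student_shallow, student_deep)
    return kl_divergence(softmax(teacher_feature), softmax(fused))


def localization_distillation_loss(student_edges, teacher_edges, temperature=1.0):
    """Mean KL(teacher || student) over the edges of one bounding box.

    Assumption: each edge (left, top, right, bottom) is a list of logits
    over discretized offset bins.
    """
    losses = [
        kl_divergence(softmax(t, temperature), softmax(s, temperature))
        for s, t in zip(student_edges, teacher_edges)
    ]
    return sum(losses) / len(losses)
```

Both losses are zero when the student matches the teacher and grow as the student's distributions drift away from the teacher's. In a real detector these terms would be computed on tensors and added to the detection loss with weighting coefficients, which the abstract does not specify.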

Keywords:

multimodal; knowledge distillation; object detection; self-supervised; deep learning

Authors:

Liu Shibei (刘诗蓓), Chen Ying (陈莹)

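Affiliation:

Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi, Jiangsu 214122, China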

Funding:

National Natural Science Foundation of China

Grant number:

62173160

Publication year:

2024

DOI:

10.37188/OPE.20243202.0237

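Optics and Precision Engineering

Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences; China Instrument and Control Society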

CSTPCD; Peking University Core Journal

Impact factor: 2.059

ISSN: 1004-924X

Year, volume (issue): 2024, 32(2)

Number of references: 31