
Audio object detection network with multimodal cross-level feature knowledge transfer

Sound is an inherent property of objects and can provide valuable information for object detection. Existing methods that localize targets solely by monitoring ambient sound have low robustness. To address this problem, a multimodal self-supervised object detection network based on cross-level feature knowledge transfer is proposed. First, because learning between same-level features of the teacher and student networks is limited in capacity, a multi-teacher cross-level feature knowledge transfer loss based on attention fusion is designed. The student's deep and shallow features are fused by attention so that the student learns the corresponding intermediate-layer features of the teachers more efficiently and extracts more knowledge, and KL divergence is used alongside this fusion to align the intermediate-layer features of the teacher and student networks. In addition, to address the lack of localization information, a localization distillation loss is added, in which the student's bounding-box distribution is fitted to the teacher's bounding-box distribution to obtain more localization information. The network is trained on the Multimodal Audio-Visual Detection (MAVD) dataset. Compared with the baseline network, its mAP improves by 6.71%, 14.36%, and 10.32% at IoU thresholds of 0.5 and 0.75 and on average, respectively. The experimental results demonstrate the superiority of the proposed detection network.
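The abstract names two loss terms without giving their formulas. The following is only a minimal sketch of how such terms are commonly written, not the authors' implementation. All function names are hypothetical, the attention scores (mean activations) stand in for a learned attention module, and representing each bounding-box edge as logits over discrete bins is an assumption made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def attention_fuse(shallow, deep):
    """Fuse a shallow and a deep student feature vector of equal length.

    Assumption: the two attention weights come from a softmax over the mean
    activation of each vector, a stand-in for a learned attention module.
    """
    scores = [sum(shallow) / len(shallow), sum(deep) / len(deep)]
    w_shallow, w_deep = softmax(scores)
    return [w_shallow * s + w_deep * d for s, d in zip(shallow, deep)]


def feature_transfer_loss(student_shallow, student_deep, teacher_feats, temperature=1.0):
    """Mean KL divergence between each teacher feature and the fused student feature."""
    fused = attention_fuse(student_shallow, student_deep)
    p_student = softmax(fused, temperature)
    losses = [kl_divergence(softmax(t, temperature), p_student) for t in teacher_feats]
    return sum(losses) / len(losses)


def localization_distillation_loss(student_edges, teacher_edges, temperature=1.0):
    """Mean KL(teacher || student) over the box edges.

    Assumption: each edge (left, top, right, bottom) is a list of logits
    over discrete position bins.
    """
    total = 0.0
    for s, t in zip(student_edges, teacher_edges):
        total += kl_divergence(softmax(t, temperature), softmax(s, temperature))
    return total / len(student_edges)
```

In this sketch both terms are zero when the student reproduces the teacher exactly and grow as the distributions diverge. How the paper weights the terms, and at which layers they are applied, is not stated in the abstract.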

multimodal; knowledge distillation; object detection; self-supervised; deep learning

Liu Shibei (刘诗蓓), Chen Ying (陈莹)


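Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi, Jiangsu 214122, China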


Supported by the National Natural Science Foundation of China (No. 62173160)

2024

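Optics and Precision Engineering
Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences; China Instrument and Control Society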


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 2.059
ISSN: 1004-924X
Year, volume (issue): 2024, 32(2)