Image-Based Cross-Domain Visual-Data Retrieval with Deep Semantic Correlation Learning

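Chinese abstract (translated): Image-based cross-domain retrieval of visual data aims to find cross-domain images and three-dimensional model data that are semantically consistent with, or similar in shape to, an input image. Its main challenge is handling the modal heterogeneity between cross-domain data. Existing methods construct a common feature space and apply domain-adaptation or deep metric learning algorithms to achieve domain alignment or semantic alignment of cross-domain features, but their effectiveness has been verified only on a single type of cross-domain retrieval task. This paper proposes a method based on deep semantic correlation learning that applies to multiple types of image-based cross-domain visual-data retrieval tasks. First, heterogeneous networks extract the initial visual features of the cross-domain data. Then, the initial features are mapped into a constructed common feature space to enable the subsequent domain alignment and semantic alignment. Finally, intra-domain discriminative learning, inter-domain consistency learning, and cross-domain correlation learning eliminate the heterogeneity between cross-domain features, explore the semantic correlation between them, and generate robust and unified feature representations for the retrieval task. Experimental results show that the method achieves mean average precision (mAP) values of 0.448, 0.689, and 0.874 on the TU-Berlin, IM2MN, and MI3DOR datasets, respectively, clearly outperforming the compared methods.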

English title: Image-Based Cross-Domain Visual-Data Retrieval with Deep Semantic Correlation Learning

English abstract: Image-based cross-domain retrieval of visual data is performed to identify cross-domain images and three-dimensional model data that are semantically consistent with or similar in appearance to an input image. In this task, the modal heterogeneity between cross-domain data must be addressed to achieve cross-domain correspondence between the query images and target objects. Existing methods achieve domain or semantic alignment of cross-domain features by constructing a common feature space and using a domain-adaptation or deep metric learning algorithm. The effectiveness of these methods has been verified only on a single type of cross-domain retrieval task. To address these issues, a method based on deep semantic correlation learning is proposed for multiple types of image-based cross-domain visual-data retrieval tasks. First, heterogeneous networks are used to extract the original visual features of cross-domain data. Subsequently, a common feature space is constructed to map the original features for subsequent domain and semantic alignments. Finally, intra-modal discrimination learning, inter-modal consistency learning, and cross-modal correlation learning are performed to eliminate the heterogeneity among cross-domain features, determine the semantic relevance among cross-domain data features, and generate robust and uniform feature representations for retrieval tasks. Experimental results show that the mean average precision (mAP) values of this method on the TU-Berlin, IM2MN, and MI3DOR datasets are 0.448, 0.689, and 0.874, respectively, significantly better than those of the compared methods.
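The abstract reports results as mean average precision (mAP). As a rough illustration of how such a score is usually computed for a retrieval task, the sketch below ranks a gallery by cosine similarity in a shared feature space and averages the per-query average precision. The abstract does not describe the paper's evaluation protocol, so the similarity measure, the function names, and the toy features and labels here are all assumptions made for illustration.

```python
import math


def average_precision(relevance):
    """Average precision for one query.

    `relevance` is the ranked result list as booleans (True = same class
    as the query). Precision is accumulated at each rank where a relevant
    item appears, then divided by the number of relevant items.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def cosine(u, v):
    """Cosine similarity between two feature vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_average_precision(query_feats, query_labels, gallery_feats, gallery_labels):
    """Mean of per-query AP, ranking the whole gallery for every query."""
    aps = []
    for q, q_label in zip(query_feats, query_labels):
        # Sort gallery indices from most to least similar to the query.
        ranked = sorted(
            range(len(gallery_feats)),
            key=lambda j: cosine(q, gallery_feats[j]),
            reverse=True,
        )
        aps.append(average_precision([gallery_labels[j] == q_label for j in ranked]))
    return sum(aps) / len(aps)


# Hypothetical toy data: 2-D features already mapped into a common space.
queries = [[1.0, 0.0], [0.0, 1.0]]
query_labels = [0, 1]
gallery = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2]]
gallery_labels = [0, 1, 0]
toy_map = mean_average_precision(queries, query_labels, gallery, gallery_labels)
```

In this toy setup every relevant gallery item is ranked ahead of every irrelevant one, so `toy_map` is the best possible score; real cross-domain features would yield lower values such as those reported in the abstract.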

English keywords:

cross-domain retrieval; feature alignment; domain alignment; sketch; real image; three-dimensional model; correlation learning

Authors:

Jiao Shichao (焦世超), Guan Ripeng (关日鹏), Kuang Liqun (况立群), Xiong Fengguang (熊风光), Han Xie (韩燮)

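Author affiliations:

School of Computer Science and Technology, North University of China, Taiyuan 030051, Shanxi, China

Shanxi Key Laboratory of Machine Vision and Virtual Reality, Taiyuan 030051, Shanxi, China

Shanxi Province's Vision Information Processing and Intelligent Robot Engineering Research Center, Taiyuan 030051, Shanxi, China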

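Chinese keywords (translated):

cross-domain retrieval; feature alignment; domain alignment; sketch; real image; three-dimensional model; correlation learning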

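Funding:

National Natural Science Foundation of China; National Natural Science Foundation of China; Shanxi Province Science and Technology Major Special Project ("Open Competition" project); Shanxi Province Science and Technology Achievement Transformation Guidance Special Project; Research Project for Returned Overseas Scholars of Shanxi Province; Shanxi Province Basic Research Program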

Project numbers:

62272426; 62106238; 202201150401021; 202104021301055; 2020-113; 202203021222027

Publication year:

2024

DOI:

10.19678/j.issn.1000-3428.0067501

Computer Engineering (计算机工程)

East China Institute of Computing Technology; Shanghai Computer Society

CSTPCD; Peking University Core Journal (北大核心)

Impact factor: 0.581

ISSN: 1000-3428

Year, volume (issue): 2024, 50(5)

Number of references: 49