Journal of Nanjing University (Natural Science), 2024, Vol. 60, Issue 4: 531-541. DOI: 10.13232/j.cnki.jnju.2024.04.001

Cluster frequency analysis and dual-perspective fusion embedding for multi-instance learning

YANG Mei¹, ZHANG Jingyu², MIN Fan¹, FANG Yu¹

Author information

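  • 1. School of Computer Science and Software Engineering, Southwest Petroleum University, Chengdu 610500; Institute of Artificial Intelligence, Southwest Petroleum University, Chengdu 610500; Machine Learning Research Center, Southwest Petroleum University, Chengdu 610500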
  • 2. School of Computer Science and Software Engineering, Southwest Petroleum University, Chengdu 610500

Abstract (Chinese version)

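In multi-instance learning (MIL), the training data are labeled bags, each composed of several unlabeled instances. Embedding-based methods address the bag representation problem by embedding each bag into a single vector. However, most existing methods ignore the relationship between instances and bags, so the representativeness of the selected instances is hard to guarantee. Moreover, single-perspective embedding methods cannot effectively extract the information that distinguishes positive bags from negative bags, which lowers the quality of the embedding vectors. This paper proposes cluster frequency analysis and dual-perspective fusion embedding for multi-instance learning (FADE). The cluster frequency analysis technique selects a subset of instances from the positive and negative subspaces as cluster centers of each subspace, clusters each subspace around these centers, computes a cluster frequency indicator, and takes the centers of the high-frequency clusters as the representative instance set of that subspace. The dual-perspective fusion embedding technique uses the representative instance sets of the positive and negative subspaces together with a difference embedding function to mine information from the positive and the negative perspective separately, then fuses the two into the final embedding vector. In comparative experiments against seven MIL algorithms on 29 datasets, FADE achieves higher overall classification accuracy than all seven, with a significant advantage on image datasets and good performance on text and web datasets.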
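The abstract does not define the cluster frequency indicator, so the Python sketch below fills that gap with an explicit assumption: a cluster's frequency is taken to be the fraction of the subspace's bags that contribute at least one instance to it, which is one plausible way to reflect the instance-bag relationship the abstract emphasizes. Instances are assigned to the nearest candidate center by Euclidean distance. The function name `cluster_frequency_prototypes` and the parameter `keep_ratio` are hypothetical and do not come from the paper.

```python
import math
from collections import defaultdict


def cluster_frequency_prototypes(instances, bag_ids, center_idx, keep_ratio=0.5):
    """Hypothetical sketch of cluster frequency analysis inside one subspace.

    instances:  list of feature vectors (tuples of floats) from one subspace
    bag_ids:    bag_ids[i] is the id of the bag that instances[i] came from
    center_idx: indices into `instances` already chosen as cluster centers
    keep_ratio: assumed fraction of centers kept as prototypes
    """
    # Assign each instance to its nearest center (Euclidean distance) and
    # record which bags feed each cluster.
    bags_in_cluster = defaultdict(set)
    for i, x in enumerate(instances):
        nearest = min(center_idx, key=lambda c: math.dist(x, instances[c]))
        bags_in_cluster[nearest].add(bag_ids[i])

    # Assumed frequency indicator: share of this subspace's bags that
    # contribute at least one instance to the cluster.
    n_bags = len(set(bag_ids))
    freq = {c: len(bags_in_cluster[c]) / n_bags for c in center_idx}

    # Keep the centers of the most frequent clusters as the prototype set.
    n_keep = max(1, round(keep_ratio * len(center_idx)))
    ranked = sorted(center_idx, key=lambda c: freq[c], reverse=True)
    return [instances[c] for c in ranked[:n_keep]], freq
```

For example, take five instances (0, 0), (0.1, 0), (5, 5), (0, 0.1), (5.1, 5) from bags 0, 1, 0, 2, 0, with candidate centers at indices 0 and 2. The cluster around (0, 0) covers all three bags (frequency 1.0), the cluster around (5, 5) covers only bag 0 (frequency 1/3), so `keep_ratio=0.5` keeps only (0, 0) as a prototype.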

Abstract

Multi-Instance Learning (MIL) uses labeled bags composed of multiple unlabeled instances as training data. Embedding-based methods address the bag representation problem by embedding each bag into a single vector. However, existing methods often focus on individual instances and overlook the relationship between instances and bags, which compromises the representativeness of the selected prototypes. Additionally, single-perspective embedding methods do not consider the differences between positive and negative bags, which yields low-quality embedding vectors. This paper proposes Cluster Frequency Analysis and Dual-Perspective Fusion Embedding for MIL (FADE). The cluster center selection technique uses the density peaks of instances to choose a certain proportion of instances from the positive and negative subspaces as cluster centers. The cluster frequency analysis technique clusters the instances within each subspace around these centers, calculates cluster frequency indicators, and selects the centers of high-frequency clusters to form the prototype instance set of that subspace. The dual-perspective fusion embedding technique uses the prototype instance sets of the positive and negative subspaces, along with a difference embedding function, to extract information from both perspectives and fuses the two sets of information into the final embedding vector. The algorithm is tested on 29 datasets and compared with seven MIL algorithms. Experimental results show that FADE achieves higher overall classification accuracy than the seven benchmark algorithms, excelling particularly on image datasets while also performing well on text and web datasets.
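The abstract names the steps but not their formulas, so the following Python sketch is only an illustration under stated assumptions. Center selection assumes the standard density-peak clustering score γ = ρ·δ, where ρ counts the other instances within a cutoff distance d_c and δ is the distance to the nearest instance of strictly higher density. The difference embedding function is not specified in the abstract; here it is assumed, hypothetically, to be the difference vector between each prototype and its nearest instance in the bag, and the positive and negative perspectives are assumed to be fused by concatenation. All function names and the parameters `ratio` and `d_c` are illustrative, not the authors' definitions.

```python
import math


def density_peak_centers(instances, ratio=0.2, d_c=1.0):
    """Return indices of the top `ratio` share of instances by gamma = rho * delta.

    Assumes the standard density-peak score; `ratio` and the cutoff `d_c`
    are hypothetical parameters, not values from the paper.
    """
    n = len(instances)
    dist = [[math.dist(a, b) for b in instances] for a in instances]
    # rho: number of other instances closer than the cutoff distance d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        # delta: distance to the nearest instance with strictly higher density;
        # instances with no denser neighbour get their largest distance instead.
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    gamma = [r * d for r, d in zip(rho, delta)]
    k = max(1, round(ratio * n))
    return sorted(range(n), key=lambda i: gamma[i], reverse=True)[:k]


def difference_embedding(bag, prototypes):
    """Hypothetical difference embedding: for each prototype p, append the
    components of x* - p, where x* is the bag instance nearest to p."""
    out = []
    for p in prototypes:
        nearest = min(bag, key=lambda x: math.dist(x, p))
        out.extend(xk - pk for xk, pk in zip(nearest, p))
    return out


def dual_perspective_embedding(bag, pos_prototypes, neg_prototypes):
    """Fuse the positive and negative perspectives (assumed: concatenation)."""
    return difference_embedding(bag, pos_prototypes) + difference_embedding(bag, neg_prototypes)
```

As a usage example, `density_peak_centers([(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)], ratio=0.25, d_c=0.5)` returns `[0]`, and embedding the bag `[(1.0, 2.0), (4.0, 4.0)]` against one positive prototype (0, 0) and one negative prototype (5, 5) gives `[1.0, 2.0, -1.0, -1.0]`. Concatenation keeps the two perspectives in separate coordinates, so a downstream bag-level classifier can weigh them independently.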

Key words

MIL/embedding method/cluster frequency/instance source/dual-perspective fusion

Funding

Nanchong City-Southwest Petroleum University City-University Science and Technology Strategic Cooperation Special Fund (23XNSYSX0084)

Nanchong City-Southwest Petroleum University City-University Science and Technology Strategic Cooperation Special Fund (23XNSYSX0062)

Open Project of the Key Laboratory of Oceanographic Big Data Mining and Application of Zhejiang Province (OBDMA202102)

National Natural Science Foundation of China (61976194)

Publication year

2024
Journal of Nanjing University (Natural Science)
Nanjing University

Indexed in: CSTPCD, CSCD, Peking University Core Journals
Impact factor: 0.756
ISSN:0469-5097
Number of references: 27