Cluster Frequency Analysis and Dual-Perspective Fusion Embedding for Multi-Instance Learning

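Abstract (translated from the Chinese): In multi-instance learning (MIL), the training data consist of labeled bags, each made up of several unlabeled instances. Embedding-based methods solve the bag representation problem by embedding each bag into a single vector. However, most existing methods ignore the relationship between instances and bags, which makes it hard to guarantee that the selected instances are representative. Moreover, single-perspective embedding methods cannot effectively extract the information that distinguishes positive bags from negative bags, so the resulting embedding vectors are of poor quality. This paper proposes cluster frequency analysis and dual-perspective fusion embedding for multi-instance learning (FADE). The cluster frequency analysis technique selects some instances from the positive and negative subspaces, respectively, as cluster centers, clusters each subspace around those centers, computes a cluster frequency indicator, and takes the centers of the high-frequency clusters as the representative instance set of the subspace. The dual-perspective fusion embedding technique uses the representative instance sets of the positive and negative subspaces, together with a difference embedding function, to mine information from the positive and negative perspectives separately, and fuses the two to obtain the final embedding vector. Comparative experiments against seven MIL algorithms on 29 datasets show that the classification accuracy of FADE is better overall than that of the seven baselines, with a significant advantage on image datasets and good performance on text and web datasets.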

English title: Cluster frequency analysis and dual-perspective fusion embedding for multi-instance learning

English abstract: Multi-Instance Learning (MIL) uses labeled bags composed of multiple unlabeled instances as training data. Embedding-based methods address the bag representation problem by embedding each bag into a single vector. However, existing methods often focus on individual instances and overlook the relationship between instances and bags, which compromises the representativeness of the prototypes. Additionally, single-perspective embedding methods do not consider the differences between positive and negative bags, resulting in low-quality embedding vectors. This paper proposes Cluster Frequency Analysis and Dual-Perspective Fusion Embedding for MIL (FADE). The cluster center selection technique uses the density peaks of instances to choose a certain proportion of instances from the positive and negative subspaces as cluster centers. The cluster frequency analysis technique clusters the instances within each subspace around the cluster centers, calculates cluster frequency indicators, and selects the centers of high-frequency clusters to form the prototype instance set of the subspace. The dual-perspective fusion embedding technique uses the prototype instance sets from the positive and negative subspaces, along with a difference embedding function, to extract information from both perspectives and fuse it into the final embedding vector. The algorithm is tested on 29 datasets and compared with seven MIL algorithms. Experimental results demonstrate that FADE achieves higher overall classification accuracy than the seven benchmark algorithms, particularly excelling on image datasets while performing well on text and web datasets.
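
The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas for the cluster frequency indicator, the difference embedding function, or the fusion step, so everything below is an assumption made for illustration and not the authors' method: the frequency of a cluster is taken to be the number of distinct bags that contribute instances to it, the difference embedding is the offset from each prototype to its nearest instance in the bag, and fusion is plain concatenation. The function names and the `keep_ratio` parameter are hypothetical, and the density-peak selection of cluster centers is skipped (centers are passed in).

```python
import numpy as np


def select_prototypes(instances, bag_ids, centers, keep_ratio=0.5):
    """Keep the centers of the most 'frequent' clusters in one subspace.

    Assumption: a cluster's frequency is the number of distinct bags
    that have at least one instance assigned to it.
    """
    # Assign every instance to its nearest center (Euclidean distance).
    dists = np.linalg.norm(instances[:, None, :] - centers[None, :, :], axis=2)
    assignment = dists.argmin(axis=1)
    freq = np.array([
        np.unique(bag_ids[assignment == k]).size for k in range(len(centers))
    ])
    # Keep a fixed share of the centers, highest frequency first.
    n_keep = max(1, int(round(keep_ratio * len(centers))))
    keep = np.argsort(-freq, kind="stable")[:n_keep]
    return centers[keep], freq


def difference_embedding(bag, prototypes):
    """Assumed difference embedding: for each prototype, the offset
    (nearest bag instance - prototype), flattened into one vector."""
    dists = np.linalg.norm(bag[:, None, :] - prototypes[None, :, :], axis=2)
    nearest = dists.argmin(axis=0)  # nearest bag instance per prototype
    return (bag[nearest] - prototypes).ravel()


def dual_embedding(bag, pos_prototypes, neg_prototypes):
    """Fuse the positive and negative perspectives by concatenation
    (one possible fusion; the abstract does not specify the operator)."""
    return np.concatenate([
        difference_embedding(bag, pos_prototypes),
        difference_embedding(bag, neg_prototypes),
    ])
```

Under these assumptions, a bag with instances of dimension d is mapped to a vector of length d × (number of positive prototypes + number of negative prototypes), which can then be passed to any standard single-vector classifier.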

English keywords:

MIL; embedding method; cluster frequency; instance source; dual-perspective fusion

Authors:

Yang Mei (杨梅), Zhang Jingyu (张靖宇), Min Fan (闵帆), Fang Yu (方宇)

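Author affiliations:

School of Computer and Software, Southwest Petroleum University, Chengdu 610500

Institute of Artificial Intelligence, Southwest Petroleum University, Chengdu 610500

Machine Learning Research Center, Southwest Petroleum University, Chengdu 610500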

Keywords:

multi-instance learning; embedding method; cluster frequency; instance source; dual-perspective fusion

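Funding:

Nanchong City and Southwest Petroleum University Science and Technology Strategic Cooperation Special Fund; Nanchong City and Southwest Petroleum University Science and Technology Strategic Cooperation Special Fund; Open Project of the Key Laboratory of Oceanographic Big Data Mining and Application of Zhejiang Province; National Natural Science Foundation of China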

Project numbers:

23XNSYSX0084; 23XNSYSX0062; OBDMA202102; 61976194

Publication year:

2024

DOI:

10.13232/j.cnki.jnju.2024.04.001

Journal of Nanjing University (Natural Science)

Nanjing University

CSTPCD; Peking University Core Journal

Impact factor: 0.756

ISSN: 0469-5097

Year, volume (issue): 2024, 60(4)