Cluster frequency analysis and dual-perspective fusion embedding for multi-instance learning
Multi-instance learning (MIL) uses labeled bags composed of multiple unlabeled instances as training data. Embedding-based methods address the bag representation problem by embedding each bag into a single vector. However, existing methods often focus on individual instances and overlook the relationship between instances and bags, which compromises the representativeness of the selected prototypes. In addition, single-perspective embedding methods do not account for the differences between positive and negative bags, which results in low-quality embedding vectors. This paper proposes Cluster Frequency Analysis and Dual-Perspective Fusion Embedding for MIL (FADE). The cluster center selection technique uses the density peaks of instances to choose a certain proportion of instances from the positive and negative subspaces as cluster centers. The cluster frequency analysis technique clusters the instances within each subspace around these centers, computes cluster frequency indicators, and selects the high-frequency cluster centers to form the prototype instance set of that subspace. The dual-perspective fusion embedding technique uses the prototype instance sets of the positive and negative subspaces, together with a difference embedding function, to extract information from both perspectives, and fuses the two sets of information into the final embedding vector. FADE is evaluated on 29 datasets and compared with seven MIL algorithms. Experimental results show that FADE achieves higher overall classification accuracy than the seven benchmark algorithms, performing particularly well on image datasets and also well on text and web datasets.
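The three stages of the pipeline (center selection, frequency-based prototype filtering, dual-perspective embedding) can be illustrated with a short NumPy sketch. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all assumptions: the positive and negative subspaces are the pooled instances of positive and negative bags; density peaks are scored with the classic local-density times distance-to-denser-point rule (rho * delta); the cluster frequency indicator is the fraction of bags that have at least one instance assigned to a center; the difference embedding records, for each prototype, the offset between the bag's nearest instance and that prototype; and fusion is plain concatenation. Function names and parameters (`ratio`, `keep`, `dc_quantile`) are hypothetical.

```python
import numpy as np


def density_peak_centers(X, ratio=0.2, dc_quantile=0.02):
    """Select ceil(ratio * n) instances with the largest density-peak score.

    Assumption: gamma = rho * delta, where rho is a Gaussian-kernel local density
    and delta is the distance to the nearest instance of higher density.
    """
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    pair = D[np.triu_indices(n, k=1)]
    dc = max(np.quantile(pair, dc_quantile), 1e-12) if pair.size else 1.0
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0  # subtract self-contribution
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        # The densest point(s) get their largest distance, as in the density-peak recipe.
        delta[i] = D[i, higher].min() if higher.any() else D[i].max()
    k = max(1, int(np.ceil(ratio * n)))
    return X[np.argsort(-(rho * delta))[:k]]


def high_frequency_prototypes(bags, centers, keep=0.5):
    """Keep the top `keep` fraction of centers by a (hypothetical) frequency indicator.

    Frequency of a center = fraction of bags with at least one instance whose
    nearest center is that center.
    """
    freq = np.zeros(len(centers))
    for bag in bags:
        d = np.linalg.norm(bag[:, None, :] - centers[None, :, :], axis=-1)
        freq[np.unique(d.argmin(axis=1))] += 1  # count each bag at most once per center
    freq /= len(bags)
    k = max(1, int(np.ceil(keep * len(centers))))
    return centers[np.argsort(-freq, kind="stable")[:k]]


def difference_embedding(bag, prototypes):
    """Hypothetical difference embedding: nearest bag instance minus each prototype."""
    d = np.linalg.norm(bag[:, None, :] - prototypes[None, :, :], axis=-1)  # (m, k)
    nearest = bag[d.argmin(axis=0)]  # (k, dim): closest instance per prototype
    return (nearest - prototypes).ravel()


def fade_embed(bag, pos_protos, neg_protos):
    """Fuse the positive- and negative-perspective embeddings by concatenation."""
    return np.concatenate([difference_embedding(bag, pos_protos),
                           difference_embedding(bag, neg_protos)])


# Toy data: 10 positive and 10 negative bags of 4-dimensional instances.
rng = np.random.default_rng(0)
pos_bags = [rng.normal(1.0, 1.0, size=(rng.integers(3, 8), 4)) for _ in range(10)]
neg_bags = [rng.normal(-1.0, 1.0, size=(rng.integers(3, 8), 4)) for _ in range(10)]
pos_inst, neg_inst = np.vstack(pos_bags), np.vstack(neg_bags)

pos_protos = high_frequency_prototypes(pos_bags, density_peak_centers(pos_inst))
neg_protos = high_frequency_prototypes(neg_bags, density_peak_centers(neg_inst))
vec = fade_embed(pos_bags[0], pos_protos, neg_protos)
```

Every bag maps to a vector of the same length, `(len(pos_protos) + len(neg_protos)) * dim`, so any standard vector classifier (for example an SVM) could be trained on the resulting embeddings. Concatenation is the simplest fusion choice; a weighted combination would be an equally plausible reading of "fuse".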