Missing Label Feature Selection via Global Similarity of Samples and Relief

Abstract: The original Relief model can analyze only complete single-label data, and existing improved variants do not consider the global similarity between samples. This paper therefore designs a missing-label feature selection method based on the global similarity of samples and the Relief model. First, to complete the missing labels, all samples are divided under each label into a missing set and a complete set. For each sample with a missing label, its nearest neighbor in the complete set is found by Euclidean distance. This gives a label completion strategy that fills in the missing labels. Second, to measure how similar samples are in the global space, the feature similarity between samples is computed with the cosine similarity function and the label similarity is computed from the degree of overlap of their label sets. The two are then combined into a global similarity between samples. Third, to determine the same-class and different-class neighbors of a target sample in a multi-label decision system, a same/different-class discrimination relationship is defined from the global similarity between the target sample and the remaining samples. Finally, a new feature weight iteration formula is constructed from the improved Relief model, yielding a missing-label feature selection algorithm based on sample global similarity and Relief. The classification performance of the proposed algorithm is analyzed and tested on 8 multi-label datasets, and the experimental results show that the algorithm is effective.
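The four steps in the abstract can be illustrated with a small plain-Python sketch. This is not the authors' algorithm, because the abstract does not give the exact formulas. The following choices are all assumptions:

- the Jaccard form of the label-overlap similarity;
- the blending weight `alpha`;
- the mean-similarity threshold used to split same-class from different-class neighbors;
- the classic Relief update (one nearest hit and one nearest miss per sample, features scaled to [0, 1]).

Every function name below is hypothetical.

```python
import math

# Sketch under stated assumptions; not the authors' reference code.
# X: list of feature rows (values assumed min-max scaled to [0, 1]).
# Y: list of label rows with entries 1 (relevant), 0 (irrelevant) or None (missing).


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def complete_labels(X, Y):
    """Step 1: for each label j, split samples into a missing set and a complete
    set, then copy label j from the Euclidean nearest neighbour in the complete set."""
    n, q = len(Y), len(Y[0])
    filled = [row[:] for row in Y]
    for j in range(q):
        complete = [i for i in range(n) if Y[i][j] is not None]
        for i in range(n):
            if Y[i][j] is None and complete:  # stays None if label j is never observed
                nn = min(complete, key=lambda c: euclidean(X[i], X[c]))
                filled[i][j] = Y[nn][j]
    return filled


def cosine(a, b):
    """Feature similarity: cosine of the two feature vectors (0 for a zero vector)."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na and nb else 0.0


def label_overlap(y1, y2):
    """Label similarity: assumed Jaccard overlap of the relevant-label sets."""
    s1 = {j for j, v in enumerate(y1) if v == 1}
    s2 = {j for j, v in enumerate(y2) if v == 1}
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 1.0


def global_similarity(X, Y, i, k, alpha=0.5):
    """Step 2: hypothetical convex blend of feature and label similarity."""
    return alpha * cosine(X[i], X[k]) + (1 - alpha) * label_overlap(Y[i], Y[k])


def relief_weights(X, Y, alpha=0.5):
    """Steps 3-4: classic Relief update with hits/misses chosen by global similarity."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for i in range(n):
        others = [k for k in range(n) if k != i]
        if not others:
            continue
        sims = {k: global_similarity(X, Y, i, k, alpha) for k in others}
        theta = sum(sims.values()) / len(sims)  # assumed same/different threshold
        same = [k for k in others if sims[k] >= theta]
        different = [k for k in others if sims[k] < theta]
        if not same or not different:
            continue
        hit = max(same, key=sims.get)        # most similar "same-class" sample
        miss = max(different, key=sims.get)  # most similar "different-class" sample
        for f in range(d):
            w[f] += (abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])) / n
    return w


def select_features(X, Y, top_k, alpha=0.5):
    """Complete the labels, weight the features, return the top_k feature indices."""
    w = relief_weights(X, complete_labels(X, Y), alpha)
    return sorted(range(len(w)), key=lambda f: w[f], reverse=True)[:top_k]


if __name__ == "__main__":
    X = [[0.0, 0.5], [0.1, 0.4], [0.9, 0.5], [1.0, 0.6]]
    Y = [[1, 0], [1, None], [0, 1], [None, 1]]
    print(complete_labels(X, Y))     # [[1, 0], [1, 0], [0, 1], [0, 1]]
    print(select_features(X, Y, 1))  # [0]
```

On this toy data, feature 0 separates the two label groups while feature 1 barely changes, so feature 0 receives the larger weight (0.8 versus 0.05). The hit/miss split by a similarity threshold is only one plausible reading of the "same/different-class discrimination relationship" described above.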

Keywords: feature selection; multi-label learning; missing label; Relief model

Authors:

Sun Lin (孙林), Feng Changwu (丰昌武), Chen Yusheng (陈雨生), Hu Yifei (胡一飞)

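Affiliations:

College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457

College of Computer and Information Engineering, Henan Normal University, Xinxiang, Henan 453007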

Funding:

National Natural Science Foundation of China

Grant No.:

62076089

Publication year:

2024

DOI:

10.16112/j.cnki.53-1223/n.2024.02.131

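Journal: Journal of Kunming University of Science and Technology (Natural Science)

Publisher: Kunming University of Science and Technology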

Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 0.516

ISSN: 1007-855X

Year, Volume (Issue): 2024, 49(2)

Number of references: 24