A fuzzy twin support vector machine based on feature-weighted hybrid affiliations

Lü Siyu (吕思雨)¹, Zhao Jia (赵嘉)¹, Wu Lieyang (吴烈阳)², Zhang Yiying (张翼英)³, Han Longzhe (韩龙哲)¹

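Author information

1. School of Information Engineering, Nanchang Institute of Technology, Nanchang 330099, Jiangxi, China; Nanchang Key Laboratory of IoT Perception and Collaborative Computing for Smart City, Nanchang Institute of Technology, Nanchang 330099, Jiangxi, China
2. Jiangxi Provincial Traffic Monitoring and Command Center, Nanchang 330099, Jiangxi, China
3. College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, China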

Abstract

The fuzzy twin support vector machine (FTSVM) ignores differences among features, so its kernel and distance computations cannot accurately reflect the similarity between samples. As a result, FTSVM struggles to classify high-dimensional data that contain many irrelevant or weakly relevant features, and its membership design does not effectively distinguish outliers or noise from valid samples. To address these problems, this paper proposes FM-FTSVM, a fuzzy twin support vector machine based on feature-weighted hybrid membership (affiliation). First, the information gain of each feature is calculated and each feature is assigned a weight according to its information gain, which reduces the influence of irrelevant or weakly relevant features and makes the method better suited to high-dimensional data classification. Then, a minimum enclosing ball is constructed for each class of samples to calculate a closeness-based feature-weighted membership, which is combined with a distance-based feature-weighted membership to obtain the feature-weighted hybrid membership. By jointly considering the feature-weighted Euclidean distance from each sample to its class center and the closeness among samples, the hybrid membership better identifies outliers and noisy data. Finally, a feature-weighted kernel function is incorporated to reduce the effect of irrelevant features on kernel and distance computations. Comparisons with competing algorithms on artificial, high-dimensional, and UCI datasets show that the proposed method has a clear advantage in distinguishing outliers and noise from valid samples and achieves better classification performance on high-dimensional datasets.
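The abstract gives no formulas, so the following is only a minimal sketch of three of the ingredients it names: information-gain feature weights, a feature-weighted Euclidean distance with a distance-based membership, and a feature-weighted kernel. Everything specific here is an assumption made for illustration and is not taken from the paper: the equal-width binning used to compute information gain, the normalization of weights to sum to 1, the linear membership form 1 - d / (r + delta), the RBF kernel, and all function and parameter names. The closeness-based membership built on the minimum enclosing ball is not sketched.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels, bins=4):
    """Information gain of one numeric feature after equal-width binning.

    The binning scheme is an assumption; the abstract does not say how
    continuous features are discretized.
    """
    lo, hi = min(column), max(column)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant feature
    binned = [min(int((v - lo) / width), bins - 1) for v in column]
    groups = {}
    for b, label in zip(binned, labels):
        groups.setdefault(b, []).append(label)
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def feature_weights(X, y, bins=4):
    """Per-feature information gains, normalized to sum to 1 (assumed scheme)."""
    gains = [information_gain([row[j] for row in X], y, bins) for j in range(len(X[0]))]
    total = sum(gains)
    if total == 0:
        return [1.0 / len(gains)] * len(gains)
    return [g / total for g in gains]


def weighted_distance(a, b, w):
    """Feature-weighted Euclidean distance between two samples."""
    return math.sqrt(sum(wj * (aj - bj) ** 2 for wj, aj, bj in zip(w, a, b)))


def distance_membership(X_class, w, delta=1e-3):
    """Linear distance-based membership for the samples of ONE class.

    Samples far from the weighted class center get values near 0.
    The form 1 - d / (r + delta) is an assumed, commonly used choice.
    """
    center = [sum(col) / len(X_class) for col in zip(*X_class)]
    dists = [weighted_distance(x, center, w) for x in X_class]
    radius = max(dists)
    return [1.0 - d / (radius + delta) for d in dists]


def weighted_rbf(a, b, w, gamma=1.0):
    """RBF kernel computed on the feature-weighted distance (assumed kernel)."""
    return math.exp(-gamma * weighted_distance(a, b, w) ** 2)


# Toy data: feature 0 separates the classes, feature 1 is unrelated to the label.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 4.0], [0.3, 2.0],
     [1.0, 5.0], [1.1, 1.0], [1.2, 4.0], [1.3, 2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w = feature_weights(X, y)
memberships = distance_membership(X[:4], w)
```

On this toy data the uninformative second feature receives zero weight, so it no longer affects the distance, the membership, or the kernel value, which is the effect the abstract attributes to feature weighting.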

Key words

fuzzy twin support vector machine/feature weighting/information gain/affinity/membership/high-dimensional data

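Funding

National Natural Science Foundation of China (52069014)

National Natural Science Foundation of China (61962036)

Jiangxi Provincial Key Research and Development Program (20192BBE50076)

Jiangxi Provincial Key Research and Development Program (20203BBGL73225)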

Publication year

2024

Journal of Nanchang Institute of Technology

Nanchang Institute of Technology

Impact factor: 0.272

ISSN: 1006-4869

Number of references: 35
