
Fuzzy Twin Support Vector Machine Based on Feature-Weighted Hybrid Membership

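The fuzzy twin support vector machine (FTSVM) ignores the differences among features, so its kernel-function and distance computations cannot accurately reflect the similarity between samples. As a result, FTSVM struggles to classify high-dimensional data that contain many irrelevant or weakly relevant features. In addition, its membership design does not effectively distinguish outliers or noise. To address these problems, a fuzzy twin support vector machine based on feature-weighted hybrid membership (FM-FTSVM) is proposed. First, the information gain of each feature is computed, and each feature is weighted according to the size of its information gain, which reduces the influence of irrelevant or weakly relevant features and makes the method better suited to high-dimensional data classification. Next, a minimum enclosing sphere is constructed for each class to compute a closeness-based feature-weighted membership, which is combined with a distance-based feature-weighted membership to obtain a feature-weighted hybrid membership. Because it jointly considers the feature-weighted Euclidean distance from a sample to its class center and the closeness among samples, the hybrid membership identifies outliers and noisy data better. Finally, a feature-weighted kernel function is incorporated to reduce the influence of irrelevant features on kernel and distance computations. Comparisons with competing algorithms on artificial, high-dimensional, and UCI datasets show that the proposed method has a clear advantage in distinguishing outliers and noise from valid samples and achieves better classification on high-dimensional datasets.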
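The abstract does not give the exact weighting formula, so the following is only a minimal sketch in plain Python (standard library only) of the first and last steps: per-feature information gain, turned into feature weights, which then enter a feature-weighted Euclidean distance and a feature-weighted RBF kernel. The equal-width binning of continuous features, the normalization w_k = IG_k / max_j IG_j, and the kernel form exp(-gamma * ||P(x - z)||^2) with P = diag(w) are assumptions chosen for illustration, not the paper's definitions; all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def discretize(column, bins=4):
    """ASSUMPTION: equal-width binning so information gain can be computed on continuous features."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0] * len(column)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in column]


def information_gain(column, labels):
    """IG(Y; X) = H(Y) - sum_v p(X = v) * H(Y | X = v) for a discrete feature X."""
    groups = defaultdict(list)
    for v, y in zip(column, labels):
        groups[v].append(y)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def feature_weights(X, y, bins=4):
    """HYPOTHETICAL normalization w_k = IG_k / max_j IG_j (most informative feature gets 1.0)."""
    gains = [information_gain(discretize([row[k] for row in X], bins), y)
             for k in range(len(X[0]))]
    top = max(gains)
    return [g / top if top > 0 else 1.0 for g in gains]


def weighted_distance(x, z, w):
    """Feature-weighted Euclidean distance ||P(x - z)|| with P = diag(w)."""
    return math.sqrt(sum((wk * (a - b)) ** 2 for wk, a, b in zip(w, x, z)))


def weighted_rbf(x, z, w, gamma=1.0):
    """Feature-weighted RBF kernel exp(-gamma * ||P(x - z)||^2)."""
    return math.exp(-gamma * weighted_distance(x, z, w) ** 2)


if __name__ == "__main__":
    X = [[0.0, 5.0], [0.2, 1.0], [1.0, 4.0], [1.2, 2.0]]
    y = [0, 0, 1, 1]
    w = feature_weights(X, y)
    print(w)  # [1.0, 0.5]
    print(weighted_rbf([0.0, 0.0], [1.0, 0.0], w), weighted_rbf([0.0, 0.0], [0.0, 1.0], w))
```

On this toy data the first feature separates the two classes perfectly and receives weight 1.0, while the second receives 0.5, so a unit difference along the second feature counts for less in both the distance and the kernel. A feature with zero information gain would drop out of the kernel entirely, which is the intended effect of down-weighting irrelevant features.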
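The hybrid membership step can be sketched as well, but only under explicit assumptions, because the abstract gives no formulas. In the sketch below, the distance-based part follows the classic fuzzy SVM idea of scoring a sample by its (here feature-weighted) distance to the class mean; the per-class minimum enclosing sphere is approximated with the Badoiu-Clarkson iteration; and the closeness score exp(-mean k-NN distance / R), the neighbour count k, and the mixing weight alpha are hypothetical choices for illustration, not the paper's definitions.

```python
import math


def weighted_distance(x, z, w):
    """Feature-weighted Euclidean distance ||P(x - z)|| with P = diag(w)."""
    return math.sqrt(sum((wk * (a - b)) ** 2 for wk, a, b in zip(w, x, z)))


def enclosing_ball(points, w, iters=200):
    """Approximate minimum enclosing ball in the weighted space (Badoiu-Clarkson iteration)."""
    center = list(points[0])
    for i in range(1, iters + 1):
        # Move the center a shrinking step toward the currently farthest point.
        far = max(points, key=lambda p: weighted_distance(p, center, w))
        center = [c + (f - c) / (i + 1) for c, f in zip(center, far)]
    radius = max(weighted_distance(p, center, w) for p in points)
    return center, radius


def distance_membership(points, w, delta=1e-6):
    """Distance-based membership: 1 - d_w(x, class mean) / (largest d_w + delta)."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(dim)]
    dists = [weighted_distance(p, mean, w) for p in points]
    r = max(dists)
    return [1.0 - d / (r + delta) for d in dists]


def closeness_membership(points, w, k=3):
    """HYPOTHETICAL closeness score: exp(-mean weighted k-NN distance / ball radius)."""
    if len(points) < 2:
        return [1.0] * len(points)
    k = min(k, len(points) - 1)
    _, radius = enclosing_ball(points, w)
    scores = []
    for i, p in enumerate(points):
        nearest = sorted(weighted_distance(p, q, w)
                         for j, q in enumerate(points) if j != i)[:k]
        scores.append(math.exp(-(sum(nearest) / k) / (radius + 1e-12)))
    return scores


def hybrid_membership(points, w, alpha=0.5, k=3):
    """HYPOTHETICAL hybrid: convex combination alpha * s_d + (1 - alpha) * s_t."""
    s_d = distance_membership(points, w)
    s_t = closeness_membership(points, w, k)
    return [alpha * a + (1 - alpha) * b for a, b in zip(s_d, s_t)]


if __name__ == "__main__":
    # Five tightly packed samples of one class plus a far-away outlier (index 5).
    cls = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05], [3.0, 3.0]]
    m = hybrid_membership(cls, w=[1.0, 1.0])
    print(min(range(len(m)), key=m.__getitem__))  # → 5
```

In the example the outlier at index 5 receives the smallest hybrid membership, so in an FTSVM, where memberships scale the slack penalties, it would influence the fitted hyperplane least. In practice the weight vector w would come from the information-gain weighting step described in the abstract rather than being fixed to ones.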

Keywords: fuzzy twin support vector machine; feature weighting; information gain; closeness (affinity); membership; high-dimensional data

Lü Siyu, Zhao Jia, Wu Lieyang, Zhang Yiying, Han Longzhe


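School of Information Engineering, Nanchang Institute of Technology, Nanchang, Jiangxi 330099

Nanchang Key Laboratory of IoT Perception and Collaborative Computing for Smart City, Nanchang Institute of Technology, Nanchang, Jiangxi 330099

Jiangxi Provincial Traffic Monitoring and Command Center, Nanchang, Jiangxi 330099

College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457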



Funding: National Natural Science Foundation of China (52069014, 61962036); Key Research and Development Program of Jiangxi Province (20192BBE50076, 20203BBGL73225)

2024

Journal of Nanchang Institute of Technology
Nanchang Institute of Technology

Impact factor: 0.272
ISSN: 1006-4869
Year, Volume (Issue): 2024, 43(1)
  • 35