
Robust Fuzzy C-Means Clustering Based on the Truncation Technique

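[Objective] Applying fuzzy C-means (FCM) directly to raw data makes the clustering result susceptible to noise and outliers. The usual remedy, which relaxes the fuzzy memberships or spatial relationships of the sample points, can only reduce, not completely eliminate, the influence of noise and outliers. To solve this problem, a truncation-based robust fuzzy C-means (TRFCM) clustering algorithm is proposed.

[Methods] TRFCM is obtained by introducing a truncation technique into the fuzzy local information C-means (FLICM) clustering model. The main ideas of the algorithm are: (1) use FLICM to learn the clustering structure of the data while preserving the local neighborhood structure of the sample points; (2) dynamically adjust the original data based on the FLICM clustering result so that it conforms to the desired clustering structure; (3) unify the learning of clustering-structure features and the adjustment of the original data (i.e., truncating part of the sample points) in a single optimization framework, thereby achieving joint optimization. TRFCM is compared with related algorithms from recent years to examine its parameter sensitivity, convergence, robustness and time efficiency.

[Results] The experiments comprise five parts: parameter sensitivity and convergence analysis, robustness testing, image segmentation, benchmark dataset experiments, and a comparison of computation time. In the parameter sensitivity and convergence analysis, TRFCM is insensitive to its parameters within a suitable range and obtains good clustering results in most cases; it converges within 20 iterations on every dataset. In the robustness test, TRFCM reaches an accuracy of 81.55%, 9.71 percentage points higher than FLICM, and its clustering result is closer to the true data distribution, which demonstrates its robustness to noise. In the image segmentation experiments, every compared algorithm partitions the images inaccurately to some degree: some segment the environment incompletely, assign different parts to the same class, or produce overlapping classes. TRFCM avoids all of these problems and obtains good clustering results. When the images are corrupted with Gaussian noise of mean 0 and variance 0.05, TRFCM suppresses the noise best. On the Banknote Authentication, Wine, COIL20, WarpPIE10P, Yale and USPS benchmark datasets, TRFCM outperforms the compared algorithms on all three evaluation metrics: ACC, NMI and purity. In the time-efficiency experiment, TRFCM obtains better clustering results than the compared algorithms at similar running times.

[Conclusions] Introducing the truncation technique into fuzzy clustering allows the original data to be adjusted dynamically, removing the interference of noise and outliers from the clustering process and thus retaining more data details that benefit clustering. Following the same idea, other classical fuzzy clustering models can be improved with the truncation technique, yielding a family of optimized algorithms and offering a new direction for future research.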
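The FLICM model that TRFCM builds on can be sketched in a few lines. This is a minimal illustration, not TRFCM itself: the abstract does not specify how the truncation step adjusts the data, so that step is omitted. The sketch assumes a 1-D signal whose spatial neighbors of position i are i-1 and i+1, and follows the update rules usually given for FLICM (a fuzzy factor G weighted by 1/(d_ij + 1)); the function name `flicm_1d` and its parameters are hypothetical.

```python
import random


def flicm_1d(x, c, m=2.0, iters=100, seed=0, eps=1e-12):
    """Hypothetical FLICM sketch on a 1-D signal (neighbors: i-1 and i+1)."""
    n = len(x)
    rng = random.Random(seed)
    # Random initial memberships u[k][i], normalized so each column sums to 1.
    u = [[rng.random() + eps for _ in range(n)] for _ in range(c)]
    for i in range(n):
        s = sum(u[k][i] for k in range(c))
        for k in range(c):
            u[k][i] /= s
    v = [0.0] * c
    for _ in range(iters):
        # Center update (same form as FCM): v_k = sum_i u_ki^m x_i / sum_i u_ki^m.
        for k in range(c):
            w = [u[k][i] ** m for i in range(n)]
            v[k] = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
        # Fuzzy factor: G_ki = sum_j 1/(d_ij + 1) * (1 - u_kj)^m * (x_j - v_k)^2
        # over the spatial neighbors j of i (assumed here to be i-1 and i+1).
        G = [[0.0] * n for _ in range(c)]
        for k in range(c):
            for i in range(n):
                for j in (i - 1, i + 1):
                    if 0 <= j < n:
                        d_ij = abs(i - j)
                        G[k][i] += (1.0 / (d_ij + 1)) * (1.0 - u[k][j]) ** m * (x[j] - v[k]) ** 2
        # Membership update: u_ki = 1 / sum_l (D_ki / D_li)^(1/(m-1)),
        # where D_ki = (x_i - v_k)^2 + G_ki; eps guards against division by zero.
        new_u = [[0.0] * n for _ in range(c)]
        for i in range(n):
            D = [(x[i] - v[k]) ** 2 + G[k][i] + eps for k in range(c)]
            for k in range(c):
                new_u[k][i] = 1.0 / sum((D[k] / D[l]) ** (1.0 / (m - 1)) for l in range(c))
        u = new_u
    # Hard labels: the cluster with the largest membership for each position.
    labels = [max(range(c), key=lambda k: u[k][i]) for i in range(n)]
    return v, u, labels
```

For example, `flicm_1d([0.0] * 5 + [10.0] * 5, c=2)` should place one center near 0 and one near 10 and label the two halves differently. Setting G to zero in the membership update recovers plain FCM, which is one way to see how FLICM adds local neighborhood information on top of FCM.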
Truncated robust fuzzy C-means clustering
[Objective] Fuzzy C-means (FCM) clusters the original data directly and is therefore sensitive to noise and outliers. The most widely used remedy mines the clustering structure of the data and relaxes the fuzzy memberships or local relationships of the sample points. Such relaxation, however, can only reduce, not completely eliminate, the effect of noise and outliers. To address this issue, we propose a novel clustering algorithm called truncated robust fuzzy C-means (TRFCM).

[Methods] TRFCM introduces a truncation technique into the fuzzy local information C-means (FLICM) model. The main idea of TRFCM is threefold: (1) FLICM preserves the local neighborhood structure of the sample points while the clustering structure of the data is learned; (2) based on the FLICM clustering result, the original data are adjusted dynamically to meet the desired clustering structure; (3) an optimization framework is constructed to integrate (1) and (2).

[Results] TRFCM is compared with algorithms developed in recent years. The comparative experiments fall into five categories: (1) parameter sensitivity and convergence analysis, (2) robustness, (3) image segmentation, (4) benchmark datasets and (5) computational time cost. For (1), TRFCM is insensitive to its parameters within an appropriate range and produces effective clustering results in most cases; it converges within 20 iterations on every dataset. For (2), TRFCM reaches an accuracy of 81.55%, 9.71 percentage points higher than FLICM, and its clustering results are closer to the true data distribution, demonstrating its robustness to noise. In (3), all compared algorithms segment the images inaccurately to some extent, and some suffer from problems such as incomplete segmentation of the environment, misclassification of different parts into the same class, and overlap among different classes. TRFCM, in contrast, avoids all these problems and produces satisfactory clustering results. To further compare the robustness of the algorithms to noise, we segment images corrupted with Gaussian noise of mean 0 and variance 0.05; TRFCM suppresses the noise interference best. In (4), clustering analysis is performed on the Banknote Authentication, Wine, COIL20, WarpPIE10P, Yale and USPS datasets. Each experiment is repeated with 10 random initializations to obtain the mean and standard deviation of three clustering metrics: ACC, NMI and purity. TRFCM outperforms the other compared algorithms on all three metrics, which suggests that, beyond image segmentation, TRFCM is also effective at partitioning real-world discrete datasets. In (5), TRFCM achieves better clustering results than the other algorithms at similar time costs.

[Conclusions] A fuzzy clustering algorithm called TRFCM is proposed. Built on FLICM, it uses the truncation technique to adjust the original data dynamically and remove the interference of noise and outliers during clustering, so that more details useful for clustering are retained. TRFCM performs well in the parameter sensitivity and convergence analysis, robustness test, image segmentation experiments, benchmark dataset experiments and computational time-cost experiment, indicating the effectiveness of the algorithm. Modifying other classical fuzzy clustering models with the truncation technique in a similar way is an important direction for future research.
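The three metrics reported on the benchmark datasets (ACC, NMI and purity) have standard definitions, and a minimal pure-Python sketch of them follows. The function names are hypothetical. ACC here uses brute-force matching over label permutations instead of the Hungarian algorithm normally used, so it only suits a handful of clusters. The abstract does not say which NMI normalization the authors used; dividing by sqrt(H(U) * H(V)) is an assumption.

```python
import math
from collections import Counter
from itertools import permutations


def purity(y_true, y_pred):
    # Each predicted cluster is credited with its most frequent true label.
    total = 0
    for p in set(y_pred):
        members = [t for t, q in zip(y_true, y_pred) if q == p]
        total += Counter(members).most_common(1)[0][1]
    return total / len(y_true)


def clustering_acc(y_true, y_pred):
    # Best one-to-one mapping between predicted clusters and true labels,
    # found by brute force over permutations (small numbers of clusters only).
    counts = Counter(zip(y_pred, y_true))
    preds, trues = sorted(set(y_pred)), sorted(set(y_true))
    best = 0
    if len(preds) <= len(trues):
        for perm in permutations(trues, len(preds)):
            best = max(best, sum(counts[(p, t)] for p, t in zip(preds, perm)))
    else:
        for perm in permutations(preds, len(trues)):
            best = max(best, sum(counts[(p, t)] for p, t in zip(perm, trues)))
    return best / len(y_true)


def nmi(y_true, y_pred):
    # Mutual information normalized by sqrt(H(true) * H(pred)) (assumed variant).
    n = len(y_true)
    pt, pp = Counter(y_true), Counter(y_pred)
    joint = Counter(zip(y_true, y_pred))
    mi = sum(c / n * math.log((c / n) / ((pt[t] / n) * (pp[p] / n)))
             for (t, p), c in joint.items())
    h_t = -sum(c / n * math.log(c / n) for c in pt.values())
    h_p = -sum(c / n * math.log(c / n) for c in pp.values())
    if h_t == 0.0 or h_p == 0.0:
        # Degenerate case: a single cluster on one or both sides.
        return 1.0 if h_t == h_p else 0.0
    return mi / math.sqrt(h_t * h_p)
```

With `y_true = [0, 0, 0, 1, 1, 1]` and `y_pred = [0, 0, 1, 1, 1, 1]`, both purity and ACC equal 5/6. To mirror the protocol in the abstract, these functions would be evaluated once per random initialization and the 10 scores summarized by their mean and standard deviation.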

Keywords: fuzzy C-means (FCM); robustness; truncation technique; image segmentation

Gao Yunlong, Chen Yanguang, Li Huidui, Shi Shuguang, Cao Chao


Pen-Tung Sah Institute of Micro-Nano Science and Technology, Xiamen University, Xiamen 361005, Fujian, China

School of Aerospace Engineering, Xiamen University, Xiamen 361102, Fujian, China

Third Institute of Oceanography, Ministry of Natural Resources, Xiamen 361005, Fujian, China


Funding: National Natural Science Foundation of China (42076058); Natural Science Foundation of Fujian Province (2022J01061)

2024

Journal of Xiamen University (Natural Science)
Xiamen University


CSTPCD; Peking University Core Journals
Impact factor: 0.449
ISSN:0438-0479
Year, volume (issue): 2024, 63(2)