Interval Function Type Clustering Method Under Generalized Distribution

Interval functional clustering is a method for analyzing continuous high-frequency data. Existing interval functional clustering methods assume a uniform distribution and therefore cannot fully exploit the distributional information inside each interval. Moreover, the uniform assumption does not match the actual distribution of many data sets, which leads to poor clustering performance and stability. To address these problems, this article takes the actual data distribution into account: using the mean and standard deviation of the original data, it improves the existing midpoint-radius method and proposes an interval functional clustering method based on generalized distribution. The method broadens the applicability of interval functional clustering. It describes the distribution within each interval more faithfully and makes fuller use of the intrinsic features of the data, which improves the validity and reasonableness of the clustering results. Using the Monte Carlo method, internal indices of clustering quality are computed to compare the proposed method with existing interval functional clustering under the uniform distribution; the results show that the proposed method performs better. Finally, the proposed method is applied to a cluster analysis of atmospheric pollutant concentrations in different cities, which verifies that it solves practical problems effectively and has clear advantages over existing methods.
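
The contrast between the midpoint-radius summary and the mean-standard deviation summary can be illustrated with a small sketch. The abstract does not give the paper's distance definition, so everything here is an assumption made for illustration: the function names, the discretization of each curve into time windows, and the plain Euclidean distance over the summary components are hypothetical and are not the authors' formulation.

```python
import numpy as np

def midpoint_radius(raw):
    """Summarize each window by (midpoint, radius) of its [min, max] interval.

    raw has shape (n_curves, n_windows, n_obs); result is (n_curves, n_windows, 2).
    """
    lo, hi = raw.min(axis=2), raw.max(axis=2)
    return np.stack([(lo + hi) / 2, (hi - lo) / 2], axis=2)

def mean_std(raw):
    """Summarize each window by (mean, standard deviation) of its raw observations.

    Uses the population standard deviation (NumPy default, ddof=0).
    """
    return np.stack([raw.mean(axis=2), raw.std(axis=2)], axis=2)

def pairwise_distance(summary):
    """Euclidean distance between curves over all windows and both components."""
    flat = summary.reshape(summary.shape[0], -1)
    diff = flat[:, None, :] - flat[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

# Two curves, one window each, with the same minimum and maximum
# but very different distributions inside the interval [0, 10].
raw = np.array([
    [[0.0, 0.0, 0.0, 0.0, 10.0]],     # mass concentrated near 0
    [[0.0, 10.0, 10.0, 10.0, 10.0]],  # mass concentrated near 10
])

d_mr = pairwise_distance(midpoint_radius(raw))  # blind to the inner distribution
d_ms = pairwise_distance(mean_std(raw))         # separates the two curves

print(d_mr[0, 1], d_ms[0, 1])
```

In this toy case both curves share the interval [0, 10], so the midpoint-radius summaries coincide and their distance is zero, while the means (2 and 8) differ and the mean-standard deviation distance does not vanish. A distance matrix of this kind could then be passed to any standard clustering routine.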

Interval functional data; mean-standard deviation distance; generalized distribution; clustering analysis

Sun Lirong (孙利荣), Jiang Chenkai (蒋晨锴), Tian Yinghua (田颖华), Guo Baocai (郭宝才)

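School of Statistics and Mathematics, Zhejiang Gongshang University, Hangzhou 310018

Collaborative Innovation Center of Statistical Data Engineering Technology and Application, Zhejiang Gongshang University, Hangzhou 310018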

Key Project of the National Social Science Fund of China

23ATJ009

2024

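Journal of Systems Science and Mathematical Sciences (系统科学与数学)
Academy of Mathematics and Systems Science, Chinese Academy of Sciences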

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.425
ISSN: 1000-0577
Year, volume (issue): 2024, 44(8)