Deep Learning-Based Density Clustering Simulation of Unstructured Big Data
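Hu Tao (胡涛) 1, Wang Zhongjie (王中杰) 1, Zhang Lianming (张连明) 2, Chen Xiaosuo (陈晓锁) 1
Author information
- 1. School of Electrical and Information Engineering, Hunan Institute of Traffic Engineering, Hengyang, Hunan 421001
- 2. College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan 410081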
Abstract
Conventional density clustering methods for unstructured big data are time-consuming and prone to assigning data densities incorrectly, which reduces clustering accuracy. This paper therefore proposes a fast density clustering method for unstructured big data based on deep learning. A data density function is used to compute the density value of each unstructured data item, and a proximity search technique is used to find the best center of each cluster. AlexNet is then used to build the learning framework for data clustering: data feature vectors are extracted by mapping, and pseudo labels obtained from the loss function serve as the basis for backpropagation. To improve the clustering speed and accuracy of the model, mini-batch gradient descent is introduced to optimize the parameters of the clustering model, achieving density clustering of unstructured big data. Experimental results show that the proposed method groups data of similar density tightly and keeps data of widely differing density far apart, yielding good density clustering results.
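The first two steps named in the abstract (computing a density value per data item, then searching neighbors for cluster centers) can be illustrated with a minimal NumPy sketch. The abstract does not state the density function or the search rule, so everything here is an assumption: a Gaussian-kernel local density, a density-peaks-style rule that scores each point by density times distance to the nearest denser point, and the hypothetical names `local_density`, `pick_centers`, `assign`, and `cutoff`. The AlexNet feature extraction and pseudo-label training are not covered.

```python
import numpy as np


def local_density(X, cutoff):
    """Assumed density function: Gaussian kernel summed over all other points."""
    # Pairwise Euclidean distances via broadcasting, shape (n, n).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Subtract 1.0 to remove each point's contribution to its own density.
    rho = np.exp(-(d / cutoff) ** 2).sum(axis=1) - 1.0
    return rho, d


def pick_centers(rho, d, k):
    """Assumed center rule: the k points with the largest rho * delta,
    where delta is the distance to the nearest point of higher density."""
    n = len(rho)
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        # The densest point has no denser neighbor; use its largest distance.
        delta[i] = d[i, higher].min() if higher.any() else d[i].max()
    return np.argsort(rho * delta)[-k:]


def assign(d, centers):
    """Label each point with the position (0..k-1) of its nearest center."""
    return np.argmin(d[:, centers], axis=1)
```

Scoring by density times separation favors points that are both dense and far from any denser point, which is one common way to keep two centers from landing in the same cluster; whether the paper uses this rule is not stated in the abstract.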
Key words
Deep learning/Unstructured big data/Data density/Pseudo label
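Funding
Teaching Reform Research Project of the Hunan Provincial Department of Education (HNJG-2021-1275)
Key Scientific Research Project of the Hunan Provincial Department of Education (22A0056)
Publication year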
2024