Unstructured Cloud Data Block Storage Method Based on Decision Tree Model

Wan Lei¹

Author information

1. Shanghai Posts & Telecommunications Designing Consulting Institute Co., Ltd., Beijing 100070

Abstract

To reduce the storage pressure of unstructured cloud data and improve storage capacity, a block storage method for unstructured cloud data based on a decision tree model is studied. The data are preprocessed through data cleaning, data selection, data transformation, and normalization, which reduces their dimensionality. A selective random feature analysis method is used to clarify the correlation between the distribution feature quantities of the correlation dimensions and the similarity of the preprocessed data; on this basis, features are extracted through sample expansion and density fusion. An improved decision tree algorithm performs fuzzy classification on the extracted feature set, each class of data is divided into blocks of the same size, and, through Vandermonde matrix encoding and decoding, the blocks are stored on multiple nodes with high fitness. Experimental results show that the method reaches an effective calculation ratio of 0.8, indicating good storage capability, and a mean compression factor of 6.7, which can significantly reduce the storage pressure of unstructured cloud data.
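
The abstract does not give the details of the encoding step, so the following is only a minimal sketch of how equal-size blocks can be encoded with a Vandermonde matrix and later recovered from a subset of storage nodes. It assumes a generic erasure code over the prime field GF(257), which is not necessarily the scheme used in the paper; the names `split_blocks`, `encode`, `decode`, and the parameters `k` (data blocks) and `n` (storage nodes) are hypothetical.

```python
P = 257  # assumed prime modulus; every byte value 0..255 is a field element


def split_blocks(data: bytes, k: int) -> list[list[int]]:
    """Zero-pad the data and split it into k blocks of equal length."""
    block_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(block_len * k, b"\x00")
    return [list(padded[i * block_len:(i + 1) * block_len]) for i in range(k)]


def vandermonde_row(x: int, k: int) -> list[int]:
    """Row [1, x, x^2, ..., x^(k-1)] of a Vandermonde matrix, reduced mod P."""
    return [pow(x, j, P) for j in range(k)]


def encode(blocks: list[list[int]], n: int) -> list[list[int]]:
    """Produce n coded blocks (one per node) as Vandermonde combinations."""
    k, block_len = len(blocks), len(blocks[0])
    rows = [vandermonde_row(x, k) for x in range(1, n + 1)]  # distinct x values
    return [
        [sum(row[j] * blocks[j][t] for j in range(k)) % P for t in range(block_len)]
        for row in rows
    ]


def solve_mod(a: list[list[int]], b: list[list[int]]) -> list[list[int]]:
    """Solve A X = B (mod P) by Gauss-Jordan elimination; A is k x k."""
    k = len(a)
    m = [a[i][:] + b[i][:] for i in range(k)]  # augmented matrix [A | B]
    for col in range(k):
        pivot = next(r for r in range(col, k) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        inv = pow(m[col][col], -1, P)  # modular inverse (Python 3.8+)
        m[col] = [(v * inv) % P for v in m[col]]
        for r in range(k):
            if r != col and m[r][col]:
                f = m[r][col]
                m[r] = [(u - f * v) % P for u, v in zip(m[r], m[col])]
    return [row[k:] for row in m]


def decode(coded: dict[int, list[int]], k: int, length: int) -> bytes:
    """Recover the original bytes from any k coded blocks, keyed by node index."""
    idx = sorted(coded)[:k]
    a = [vandermonde_row(i + 1, k) for i in idx]  # node i used x = i + 1
    blocks = solve_mod(a, [coded[i] for i in idx])
    flat = [v for block in blocks for v in block]
    return bytes(flat[:length])  # drop the zero padding


data = b"unstructured cloud data"
coded = encode(split_blocks(data, k=3), n=5)
surviving = {i: coded[i] for i in (1, 3, 4)}  # nodes 0 and 2 are unavailable
restored = decode(surviving, k=3, length=len(data))
```

Because any `k` rows of a Vandermonde matrix built from distinct points are linearly independent, the data can be rebuilt from any `k` of the `n` nodes in this sketch. Coded symbols can take the value 256, so a practical system would more likely work over GF(2^8) to keep each symbol within one byte.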

Key words

decision tree model/unstructured/cloud data/block storage/preprocessing/Vandermonde matrix

Publication year

2024

Microcomputer Applications

Shanghai Society of Microcomputer Applications

CSTPCD

Impact factor: 0.359

ISSN: 1007-757X
