Optimization method of approximate aggregate query based on variational auto-encoder

HUANG Longsen (黄龙森)¹, FANG Jun (房俊)¹, ZHOU Yunliang (周云亮)¹, GUO Zhicheng (郭志城)¹

Author information

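1. School of Information Science and Technology, North China University of Technology, Beijing 100144; Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data, North China University of Technology, Beijing 100144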
Abstract

Skewed data are unevenly distributed, which makes it difficult for traditional approximate aggregate query methods to sample them representatively. To address this problem, an approximate aggregate query method based on an optimized variational auto-encoder (VAE) was proposed, and its effect on the accuracy of approximate aggregate queries over skewed data was studied. In the preprocessing stage, the skewed data were stratified into groups, and the network structure and loss function of the VAE generative model were optimized to reduce the relative error of approximate aggregate queries. Experimental results show that, compared with the baseline methods, the proposed method achieves a smaller query relative error on skewed data, and its relative error rises more gently as the skewness coefficient increases.
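
The abstract does not say how the VAE loss function was modified. For orientation only, the following is a minimal sketch of the standard VAE objective that such an optimization would start from: the negative evidence lower bound, i.e. a reconstruction error plus the KL divergence between the encoder's diagonal Gaussian and a standard normal prior, together with the reparameterization step used to draw the latent vector. It assumes a Gaussian decoder (hence squared error) and plain Python lists in place of tensors, and omits the encoder and decoder networks. The function names and the `beta` weight (the β-VAE variant) are illustrative assumptions, not the paper's loss.

```python
import math


def reparameterize(mu, logvar, eps):
    """z = mu + sigma * eps, with sigma = exp(logvar / 2).

    eps is a sample from N(0, I), passed in so the function stays deterministic.
    """
    return [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, logvar, eps)]


def gaussian_kl(mu, logvar):
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I))."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def reconstruction_mse(x, x_hat):
    """Sum of squared errors (a Gaussian decoder's log-likelihood up to constants)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def vae_loss(x, x_hat, mu, logvar, beta=1.0):
    """Negative ELBO: reconstruction + beta * KL.

    beta = 1.0 gives the standard VAE; other values are the beta-VAE variant,
    included here only as an illustrative knob (hypothetical for this paper).
    """
    return reconstruction_mse(x, x_hat) + beta * gaussian_kl(mu, logvar)


if __name__ == "__main__":
    # x is a data row, x_hat the decoder output, (mu, logvar) the encoder output.
    print(vae_loss([1.0, 2.0], [0.0, 2.0], [1.0, 0.0], [0.0, 0.0]))
```

For example, `vae_loss([1.0, 2.0], [0.0, 2.0], [1.0, 0.0], [0.0, 0.0])` returns 1.5: a reconstruction term of 1.0 plus a KL term of 0.5.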

Key words

approximate query processing/skewed distribution/machine learning/variational auto-encoder/group sampling
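
The keywords pair skewed distributions with group sampling. To illustrate why grouping matters, the sketch below estimates SUM over a hypothetical skewed table in which ten rare rows carry most of the total, once from a single uniform random sample and once from a per-group (stratified) sample, and reports the relative error, the accuracy metric used in the abstract. It also computes the population skewness (the mean of cubed z-scores), one common definition of the skewness coefficient. The table, the 1% sampling rate, the per-group minimum `min_per_group`, and all function names are assumptions for illustration; the paper's own grouping rule and its VAE-based sample generation are not reproduced here.

```python
import random
import statistics
from collections import defaultdict


def skewness(values):
    """Population skewness: the mean of cubed z-scores."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


def relative_error(true_value, estimate):
    """|estimate - true| / |true|."""
    return abs(estimate - true_value) / abs(true_value)


def uniform_sample_sum(rows, rate, rng):
    """Estimate SUM(value) from one uniform random sample, scaled by N / k."""
    k = max(1, int(len(rows) * rate))
    sample = rng.sample(rows, k)
    return sum(v for _, v in sample) * len(rows) / k


def grouped_sample_sum(rows, rate, rng, min_per_group=5):
    """Stratified estimate: sample each group separately (at least
    min_per_group rows, or the whole group if it is smaller) and scale
    each group's partial sum by that group's N_g / k_g."""
    groups = defaultdict(list)
    for key, v in rows:
        groups[key].append(v)
    total = 0.0
    for values in groups.values():
        k = min(len(values), max(min_per_group, int(len(values) * rate)))
        total += sum(rng.sample(values, k)) * len(values) / k
    return total


# Hypothetical skewed table: 9,990 "common" rows of value 1
# and 10 "rare" rows of value 10,000.
ROWS = [("common", 1)] * 9_990 + [("rare", 10_000)] * 10

if __name__ == "__main__":
    rng = random.Random(0)
    true_sum = sum(v for _, v in ROWS)
    print("skewness:", round(skewness([v for _, v in ROWS]), 2))
    print("uniform relative error:",
          relative_error(true_sum, uniform_sample_sum(ROWS, 0.01, rng)))
    print("grouped relative error:",
          relative_error(true_sum, grouped_sample_sum(ROWS, 0.01, rng)))
```

With a 1% uniform sample (100 rows), the estimate is either far too low, when no rare row is drawn (the common case), or far too high, when a rare row is drawn and scaled up 100 times; its relative error never falls below about 0.91 on this table. The grouped estimator guarantees every group a few sampled rows and, because all values within a group are identical here, recovers the exact total.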

Funding

International (Regional) Cooperation and Exchange Program of the National Natural Science Foundation of China (62061136006)

Key Program of the National Natural Science Foundation of China (61832004)

Publication year

2024

Journal of Zhejiang University (Engineering Science)

Zhejiang University

CSTPCD, Peking University core journal

Impact factor: 0.625

ISSN: 1008-973X

Number of references: 26