A Spark-Based Configuration Optimization Technique

Spark is increasingly important in power-industry applications that must process massive data quickly, but its configuration parameter space is large and the relationships between parameters are complex, so manually tuning parameters from experience to reach the best performance is extremely difficult. This paper therefore proposes a configuration optimization method for Spark. The configuration parameters with the most active influence on Spark performance are selected, and a dataset is generated through MCMC sampling and a generative adversarial network (GAN). A performance model is then built through hierarchical modeling, and a particle swarm optimization (PSO) algorithm searches the parameter space for the application's best configuration. Experimental results show that the proposed method improves Spark performance by 25% on average compared with experience-based tuning.
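The final step of the abstract, searching the parameter space with PSO against a performance model, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three parameter names and their bounds, the `predicted_runtime` function (a simple quadratic stand-in for the trained hierarchical performance model), and the PSO coefficients `w`, `c1`, and `c2`.

```python
import random

# Hypothetical search space (assumption): (low, high) bounds for three
# illustrative tuning knobs. These are not the parameters used in the paper.
BOUNDS = {
    "executor_memory_gb": (1.0, 16.0),
    "executor_cores": (1.0, 8.0),
    "shuffle_partitions": (50.0, 1000.0),
}


def predicted_runtime(cfg):
    """Stand-in for a trained performance model (assumption).

    A smooth bowl whose minimum sits at 8 GB, 4 cores, 400 partitions.
    """
    mem, cores, parts = cfg
    return (
        ((mem - 8.0) / 15.0) ** 2
        + ((cores - 4.0) / 7.0) ** 2
        + ((parts - 400.0) / 950.0) ** 2
    )


def pso(objective, bounds, n_particles=30, n_iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (
                    w * vel[i][d]
                    + c1 * r1 * (pbest[i][d] - pos[i][d])
                    + c2 * r2 * (gbest[d] - pos[i][d])
                )
                # Move, then clamp the position back into the allowed range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


best_cfg, best_val = pso(predicted_runtime, list(BOUNDS.values()))
best_by_name = dict(zip(BOUNDS, best_cfg))
```

In a real setting the objective would be the learned performance model rather than a closed-form function, and integer-valued settings such as core counts would need rounding before being applied to a Spark job.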

Keywords: Spark; parameter configuration; MCMC algorithm; hierarchical modeling; PSO algorithm

Authors: 沈伍强, 沈桂泉, 许明杰, 杨春松, 王召

Information Center, Guangdong Power Grid Co., Ltd., Guangzhou, Guangdong 510000

NARI Technology Co., Ltd., Nanjing, Jiangsu 210000

Funding: Science and Technology Project of China Southern Power Grid (No. 037800KK52190012)

2024

Microcomputer Applications (微型电脑应用)
Published by the Shanghai Society of Microcomputer Applications (上海市微型电脑应用学会)

CSTPCD
Impact factor: 0.359
ISSN: 1007-757X
Year, volume (issue): 2024, 40(2)