Full-process parallel genetic algorithm for Bayesian network structure learning
To address the performance degradation of Bayesian network (BN) structure learning on massive data, a full-process parallel genetic algorithm (GA) for BN structure learning, SparkGA-BN, is proposed based on the Spark framework. SparkGA-BN comprises three parts: parallel calculation of mutual information, parallelization of the genetic operators, and parallelization of fitness evaluation. Mutual information is computed in parallel to reduce the search space. Before evolution, population information and selection information are broadcast so that the selection operation can be performed on the entire population. The selection and crossover operators share the selection information, which makes evolution more efficient and reduces disk write time. Intermediate data generated during the constraint and scoring stages are stored in memory to improve data reuse and overall execution efficiency. Experimental results show that the proposed algorithm outperforms the comparison algorithms in both execution efficiency and learning accuracy.
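As an illustration of how pairwise mutual information can shrink the search space, the following is a minimal sketch and not the SparkGA-BN implementation. It assumes discrete variables stored as in-memory columns, uses a local thread pool only as a stand-in for a distributed map over variable pairs, and the function names `mutual_information` and `candidate_edges`, the threshold value, and the toy data are all hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # joint counts of (x, y) values
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def candidate_edges(columns, threshold, workers=4):
    """Score every variable pair and keep the pairs with MI >= threshold.

    The thread pool is an assumption made for a self-contained example;
    it stands in for a distributed map over the list of pairs.
    """
    pairs = list(combinations(sorted(columns), 2))

    def score(pair):
        a, b = pair
        return mutual_information(columns[a], columns[b])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(score, pairs))
    return {pair: mi for pair, mi in zip(pairs, scores) if mi >= threshold}


# Hypothetical toy data: B copies A, while C is independent of both.
data = {
    "A": [0, 0, 1, 1, 0, 0, 1, 1],
    "B": [0, 0, 1, 1, 0, 0, 1, 1],
    "C": [0, 1, 0, 1, 0, 1, 0, 1],
}
edges = candidate_edges(data, threshold=0.1)
```

On this toy data only the pair (A, B) survives the threshold, so a subsequent search would need to consider one candidate edge instead of three. Because each pair is scored independently, the same mapping step is a natural fit for a partitioned collection of pairs in a framework such as Spark.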