Computational Materials Science, 2022, Vol. 20111. DOI: 10.1016/j.commatsci.2021.110899

Accurate prediction of band gap of materials using stacking machine learning model

Wang, Teng (1); Zhang, Kefei (1); The, Jesse (2); Yu, Hesheng (1)

Author information

  • 1. Minist Educ
  • 2. Univ Waterloo

Abstract

Machine-learning prediction of the band gap of semiconductor materials has progressed steadily in recent years, but its performance still needs further optimization. This work applies the stacking approach, which fuses the outputs of multiple baseline models, to further enhance the performance of band gap regression. Ten baseline models are optimized to predict the band gap of materials, and the outputs of the better-performing models are then used as the input features of the stacking model. Two datasets are used to demonstrate the effectiveness of the different models: a benchmark dataset containing 3,896 inorganic compounds with 136 dimensions, and a newly established complex database (E-AFLOW) containing 21,534 compounds with 206 dimensions. The stacking model trained on the E-AFLOW database is then applied to determine the band gaps of new compounds. Under 5-fold cross-validation, the stacking model achieves the highest R² value: 0.920 on the benchmark dataset and 0.917 on the E-AFLOW dataset. For the E-AFLOW dataset, the improvements in RMSE, MAE, MAPE, and R² of the stacking model relative to its GBDT, XGB, RF, and LGB input baseline models are 3.06%-17.54%, 8.12%-33.25%, 7.69%-33.33%, and 0.66%-4.44%, respectively. In real applications, the stacking model trained on the E-AFLOW dataset predicts the band gaps of 78.57% of new materials within ±8.00% of the observed measurements; the minimum deviation between the predicted and observed values is -0.02%, and the maximum is 14.27%. These results convincingly demonstrate the excellent performance of the stacking approach in band gap regression.
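The stacking idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses only two of the named baseline families (GBDT and RF), picks a Ridge meta-learner arbitrarily, and runs on synthetic data rather than the benchmark or E-AFLOW datasets. The helper names (`make_toy_data`, `build_stacking_model`, `cv_scores`, `improvement_pct`) are hypothetical, and the improvement formula is one plausible definition, since the abstract does not state the one used.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_predict


def make_toy_data(n_samples=200, n_features=20, seed=0):
    """Synthetic stand-in for a featurized band gap dataset (not real data)."""
    return make_regression(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=10,
        noise=10.0,
        random_state=seed,
    )


def build_stacking_model(seed=0):
    """Stack two baseline regressors; their out-of-fold predictions
    become the input features of the meta-learner (Ridge is an assumption)."""
    base_models = [
        ("gbdt", GradientBoostingRegressor(n_estimators=30, random_state=seed)),
        ("rf", RandomForestRegressor(n_estimators=30, random_state=seed)),
    ]
    return StackingRegressor(
        estimators=base_models,
        final_estimator=Ridge(alpha=1.0),
        cv=5,  # inner folds used to generate the meta-features
    )


def cv_scores(model, X, y, seed=0):
    """Score a model with 5-fold cross-validated predictions."""
    folds = KFold(n_splits=5, shuffle=True, random_state=seed)
    pred = cross_val_predict(model, X, y, cv=folds)
    return {
        "rmse": float(np.sqrt(mean_squared_error(y, pred))),
        "mae": float(mean_absolute_error(y, pred)),
        "r2": float(r2_score(y, pred)),
    }


def improvement_pct(baseline, stacked, higher_is_better=False):
    """Relative improvement in percent (assumed definition).
    Error metrics improve when they fall; R² improves when it rises."""
    change = stacked - baseline if higher_is_better else baseline - stacked
    return 100.0 * change / abs(baseline)


if __name__ == "__main__":
    X, y = make_toy_data()
    stacked = cv_scores(build_stacking_model(), X, y)
    baseline = cv_scores(RandomForestRegressor(n_estimators=30, random_state=0), X, y)
    print("stacking:", stacked)
    print("RF baseline:", baseline)
    print("RMSE improvement (%):", improvement_pct(baseline["rmse"], stacked["rmse"]))
```

On synthetic data the stacked model is not guaranteed to beat every baseline; the sketch only shows how the pieces fit together, with the inner `cv=5` keeping the meta-learner from being trained on predictions the baselines made for their own training samples.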

Key words

Machine learning/Stacking approach/Band gap/Regression/CRYSTALLOGRAPHY OPEN DATABASE/OPEN-ACCESS COLLECTION/CRYSTAL-STRUCTURE/NEURAL-NETWORKS/ENSEMBLE/SINGLE/IV/NA/FE/CO


Publication year

2022
Computational Materials Science


Indexed in: EI, SCI
ISSN: 0927-0256
Citations: 14
References: 81