Chinese Journal of Energetic Materials (含能材料), 2025, Vol. 33, Issue 1: 1-12. DOI: 10.11943/CJEM2024276

Machine Learning Assisted Property Prediction of Hydrocarbon Molecules and High Throughput Screening for Fuel

Hou Fang (侯放) 1, Qi Xiaoning (齐晓宁) 2, Liu Ruichen (刘睿宸) 1, Li Ling (李玲) 3, Wang Li (王莅) 4, Zhang Xiangwen (张香文) 4, Li Guozhu (李国柱) 4

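Author information

  • 1. School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072
  • 2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100190
  • 3. Chengde Vanadium Titanium New Materials Co., Ltd., Chengde, Hebei 067102
  • 4. School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072; Key Laboratory for Advanced Fuel and Chemical Propellant of Ministry of Education, Tianjin 300072; Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192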

Abstract

A large dataset linking molecular structure to multiple properties was constructed for 2899 hydrocarbon molecules with 1 to 50 carbon atoms via data collection, structure optimization, and quantum chemistry calculation. Seven properties were covered: melting point (Tm), boiling point (Tb), density (ρ), internal energy at 0 K (U0), internal energy at 298.15 K (U), enthalpy at 298.15 K (H), and Gibbs free energy at 298.15 K (G). With the Coulomb matrix representing molecular structure as the model input, four machine learning models were established: Decision Tree Regressor, Lasso CV (cross-validated least absolute shrinkage and selection operator regression), Ridge CV (cross-validated ridge regression), and XGBoost Regressor (extreme gradient boosting regression). A comparison of prediction accuracy shows that the XGBoost Regressor is better suited to the experimentally measured properties (melting point, boiling point, and density), whereas Ridge CV is better suited to the four energy properties obtained from theoretical calculation (U0, U, H, and G). The optimal combined machine learning model accurately predicts the properties of hydrocarbons with the same carbon number, of different hydrocarbon types, and of isomers. The optimal model was then used to calculate the densities of 34 experimentally synthesized high-density hydrocarbon fuels; the mean absolute error between the calculated and experimental values is 0.0290 g·cm⁻³. Next, the fuel properties of 319,893 hydrocarbon molecules in the open-source database GDB-13C were predicted, establishing a large database of hydrocarbon structures and fuel properties, and high-throughput screening identified 37 new candidate hydrocarbon fuel molecules with low freezing point and high density. As a proof of concept, key fuel properties of the screened molecules were further calculated by the group contribution method and the DFT method; their mass-based net heat of combustion and specific impulse are comparable to those of the typical fuels JP-10 and quadricyclane (QC).
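The abstract names the Coulomb matrix as the model input but does not say which variant, units, or atom ordering were used. As a rough illustration only, the sketch below follows the commonly used definition (diagonal 0.5·Z_i^2.4, off-diagonal Z_i·Z_j/|R_i − R_j|). The function name `coulomb_matrix`, the use of bohr for coordinates, and the toy H2 geometry are assumptions made for this example, not details taken from the paper.

```python
import math


def coulomb_matrix(charges, coords):
    """Build a Coulomb matrix as a list of lists.

    charges: nuclear charges Z_i (for hydrocarbons, 6 for C and 1 for H)
    coords:  (x, y, z) nuclear positions; assumed here to be in bohr
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal term: 0.5 * Z_i ** 2.4
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal term: Z_i * Z_j / distance between nuclei i and j
                distance = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


# Hypothetical toy input: two hydrogen nuclei placed 1.4 bohr apart
h2 = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)])
```

To feed molecules of different sizes to a regressor, such matrices are usually padded to a common size and flattened; whether and how the authors did this is not stated in the abstract.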

Key words

machine learning / hydrocarbon molecule / high-density hydrocarbon fuel / property prediction / high throughput screening

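Publication year

2025
Chinese Journal of Energetic Materials (含能材料)
China Academy of Engineering Physics

CSCD; Peking University Core Journals
Impact factor: 0.616
ISSN: 1006-9941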