Contribution Rate of Data Asset Value Evaluation Indicators Based on the DP-FS-BP Prediction Framework and the SHAP Algorithm

Data asset valuation is strategically important for developing data as a production factor. To clarify the contribution rate of data asset valuation indicators while balancing the accuracy and interpretability of machine learning models, a prediction framework combining data preprocessing and feature selection with a back propagation neural network (data preprocessing-feature selection-back propagation neural network, DP-FS-BP) is proposed, and the Shapley additive explanations (SHAP) algorithm is used to explain each indicator's contribution to the prediction model. Taking transaction block data collected from the Youe data network (优易数据网) as an example, data preprocessing and feature selection are first used to clean the data and select indicators. The processed data and the original data are then compared on linear regression, support vector machine (SVM), decision tree, k-nearest neighbors (KNN), random forest, XGBoost, and DP-FS-BP models in terms of the goodness of fit R², root mean squared error (RMSE), and mean absolute error (MAE). The results show that the DP-FS-BP model obtains the best prediction results, with a significant advantage in prediction accuracy over the other models. The SHAP algorithm is then used to explain the BP neural network model: the mean absolute SHAP values of scientific research technology and data sample size are 209.25 and 191.24, ranking first and second, respectively. Visualizing each feature's contribution to the output provides a decision-making basis for establishing a corresponding data asset value evaluation indicator system.
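The abstract relies on two computations: the regression metrics (R², RMSE, MAE) used to compare models, and the mean absolute SHAP value used to rank indicators. The sketch below illustrates both in plain Python under stated assumptions. It is not the authors' implementation: the toy linear model stands in for the trained BP network, the feature names and numbers are hypothetical, and Shapley values are computed exactly by enumerating feature subsets (feasible only for a handful of features) rather than with the SHAP library.

```python
import math
from itertools import combinations
from math import factorial


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE and MAE for paired lists of targets and predictions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


def shapley_values(predict, x, background):
    """Exact Shapley values for one sample x.

    Features outside a coalition are replaced by the values of each
    background sample, and the predictions are averaged.
    """
    m = len(x)

    def value(subset):
        total = 0.0
        for b in background:
            z = [x[j] if j in subset else b[j] for j in range(m)]
            total += predict(z)
        return total / len(background)

    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for k in range(m):  # coalition sizes 0 .. m-1
            weight = factorial(k) * factorial(m - k - 1) / factorial(m)
            for coalition in combinations(others, k):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_contribution(predict, samples, background):
    """Average |Shapley value| per feature over all samples."""
    m = len(samples[0])
    totals = [0.0] * m
    for x in samples:
        for i, p in enumerate(shapley_values(predict, x, background)):
            totals[i] += abs(p)
    return [t / len(samples) for t in totals]


# Hypothetical stand-in for a trained model (NOT the paper's BP network).
def toy_predict(z):
    return 3.0 * z[0] + 1.0 * z[1] + 0.0 * z[2]


# Hypothetical feature names and toy data, unrelated to the paper's dataset.
feature_names = ["research_technology", "sample_size", "unused_indicator"]
samples = [[1.0, 2.0, 0.0], [3.0, 0.0, 1.0], [5.0, 4.0, 2.0]]

scores = mean_abs_contribution(toy_predict, samples, background=samples)
ranking = sorted(zip(feature_names, scores), key=lambda pair: -pair[1])
for name, score in ranking:
    print(name, round(score, 4))
```

Ranking features by the mean of absolute Shapley values gives a global importance ordering of the kind reported in the abstract. For a real neural network with many indicators, exact enumeration grows exponentially with the number of features, so an approximate explainer would normally be used instead.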

Keywords: data preprocessing; feature selection; model interpretability; back propagation neural network; contribution rate

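Authors: Zhou Cuiping (周翠平), Li Shaobo (李少波), Zhang Yizong (张仪宗), Yuan Panliang (袁攀亮), Liao Zihao (廖子豪), Zhang Xingxing (张星星)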

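State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China

School of Mechanical Engineering, Guizhou University, Guiyang 550025, China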

2024

Science Technology and Engineering (科学技术与工程)
Chinese Society of Technology Economics (中国技术经济学会)

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.338
ISSN: 1671-1815
Year, volume (issue): 2024, 24(33)