
Identification and prediction of the causes of construction electric shock accidents by fusing LDA and CNN

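Construction electric shock accidents are sudden and have a high fatality rate. To support the investigation of their causes, 318 construction electric shock accident reports were first preprocessed, and key information was extracted using the Term Frequency-Inverse Document Frequency (TF-IDF) keyword algorithm and visualization techniques. Next, Latent Dirichlet Allocation (LDA) was used to extract cause topic terms, and corresponding cause topic labels were constructed from the key information. The Word2Vec model was then used to convert the "accident history" and "topic label" text into word vector matrices, which were fed into a Convolutional Neural Network (CNN) model; the CNN's predictive capability was used to predict accident causes. Finally, the predictive performance of the CNN model was compared with that of two other classic models. The experimental results show that the method can predict the likely causes of an accident fairly accurately before the investigation is complete. The accident cause repository built by the model can serve as a reference for accident prevention, and the model can serve as an effective aid to actual accident investigation.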
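The TF-IDF keyword weighting named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `tf_idf`, the unsmoothed `log(N / df)` weighting, and the toy English reports are all assumptions made for illustration, and the real study worked on tokenized Chinese accident reports.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: relative term frequency times log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already-tokenized toy reports (not taken from the paper's data).
reports = [
    ["worker", "touched", "exposed", "cable"],
    ["worker", "used", "damaged", "cable"],
    ["worker", "ignored", "grounding", "rule"],
]
scores = tf_idf(reports)
```

A term that appears in every report, such as "worker" here, receives a weight of zero, while terms specific to one report receive the highest weights, which is why TF-IDF is a common first step for surfacing report-specific keywords.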
Identifying and predicting the causes of construction electric shock accidents through integration of LDA and CNN
To enhance the investigation of accident causes, this paper proposes a model that integrates Latent Dirichlet Allocation (LDA), Word2Vec, and Convolutional Neural Networks (CNN). First, unstructured accident reports are preprocessed through data cleaning, word tokenization, and stop-word removal, and key information is extracted and visualized using Term Frequency-Inverse Document Frequency (TF-IDF). Next, the optimal number of topics (K) is determined from perplexity and coherence measures, and LDA is applied to cluster the accident reports into K topics related to accident causes. Labels for these topics are assigned by referencing both the key information extracted from the reports and the summaries of causes found in the relevant literature, which yields an accident cause label repository. Following this, "accident histories" are extracted from the reports using regular expressions, and the 318 accident records are split 9:1 into training and test sets. The "accident histories" in the training set, paired with their cause label values, are converted into a word vector matrix using Word2Vec to train the CNN model. The "accident histories" from the test set are then fed into the CNN model to predict the corresponding cause label value, and hence the predicted cause of the accident. Finally, to evaluate the CNN model, its output was compared with that of Support Vector Machine (SVM) and Naïve Bayes (NB) models using three major metrics. The CNN model achieved an accuracy of 0.81 and performed better overall than the other models. These findings suggest that the proposed model is well suited to this dataset and shows promise for recognizing and predicting accident causes; it may also be considered for other types of accidents in the future. Because the cause topic labels are extracted from accident reports and integrated into the prediction model, the model's predictions can serve as a supplementary tool. Combined with expert knowledge and on-site observations, these results can offer decision support for investigating and predicting accident causes, and they provide a reference for research and practical applications in construction safety.
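The 9:1 split and the model comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the abstract does not name its three metrics, so macro-averaged precision, recall, and F1 are assumed here alongside accuracy, and the helper names, the random seed, and the toy labels are hypothetical.

```python
import random


def split_9_to_1(records, seed=0):
    """Shuffle records and split them into 90% training and 10% test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * 0.9)
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true cause label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_precision_recall_f1(y_true, y_pred):
    """Per-label precision, recall, and F1, averaged with equal label weight."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# 318 record indices stand in for the accident reports.
train, test = split_9_to_1(range(318))

# Hypothetical cause label values for five test reports.
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
acc = accuracy(y_true, y_pred)
macro_p, macro_r, macro_f1 = macro_precision_recall_f1(y_true, y_pred)
```

With 318 records, this split leaves 286 reports for training and 32 for testing, so each test report accounts for roughly three percentage points of accuracy; that is worth keeping in mind when reading a single accuracy figure from a test set of this size.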

safety engineering; construction electrocution accidents; causes of accidents; Latent Dirichlet Allocation (LDA); Word2Vec model; Convolutional Neural Network (CNN)

李珏、潘悦、吴畅


School of Traffic and Transportation Engineering, Changsha University of Science and Technology, Changsha 410075, China

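Key Laboratory of Intelligent Construction and Operation and Maintenance Management of Transportation Infrastructure of Hunan Provincial Universities, Changsha University of Science and Technology, Changsha 410075, China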

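safety engineering; construction electric shock accidents; accident causes; Latent Dirichlet Allocation (LDA); Word2Vec model; Convolutional Neural Network (CNN)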

Natural Science Foundation of Hunan Province; Hunan Provincial Department of Education Project

2021JJ30744; 20K011

2024

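Journal of Safety and Environment
Beijing Institute of Technology; Chinese Society for Environmental Sciences; China Occupational Safety and Health Association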


CSTPCD; Peking University Core Journals
Impact factor: 0.943
ISSN:1009-6094
Year, volume (issue): 2024, 24(10)