Identifying and predicting the causes of construction electric shock accidents through integration of LDA and CNN
To enhance the investigation of accident causes, this paper proposes a comprehensive model that integrates Latent Dirichlet Allocation (LDA), Word2Vec, and Convolutional Neural Networks (CNN) to assist in identifying and analyzing accident causes. First, unstructured accident reports undergo preprocessing, including data cleaning, word tokenization, and stop-word removal, and key information is extracted and visualized using Term Frequency-Inverse Document Frequency (TF-IDF). Next, the optimal number of topics (K) is determined based on perplexity and coherence measures, and LDA is applied to cluster the accident reports into K distinct topics related to accident causes. Labels for these topics are assigned by referencing both the key information extracted from the reports and the summaries of causes found in the relevant literature, which establishes an accident cause label repository. Following this, "accident histories" are extracted from the reports using regular expressions. The 318 accident records are split into training and testing sets at a 9:1 ratio. The "accident histories" in the training set, paired with their corresponding cause labels, are converted into a word vector matrix using Word2Vec to train the CNN model. Subsequently, the "accident histories" from the test set are input into the CNN model to predict the corresponding cause label, thereby determining the predicted cause of each accident. Finally, to evaluate the effectiveness and accuracy of the CNN model, its output is compared with those of the Support Vector Machine (SVM) and Naïve Bayes (NB) models using three major metrics. The CNN model achieved an accuracy of 0.81, demonstrating superior overall performance compared to the other models. These findings suggest that the proposed model is well suited to this dataset and shows promise in recognizing and predicting accident causes. It may also be considered for application to other types of accidents in the future. By extracting cause topic labels from accident reports and integrating them into the accident cause prediction model, the model's predictions can serve as a supplementary tool. When combined with expert knowledge and on-site observations, these results can offer decision support for investigating and predicting accident causes. This integrated approach provides a valuable reference and foundation for research and practical applications in the field of construction safety.
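The 9:1 partition and the accuracy figure reported above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `split_records` and `accuracy`, the fixed random seed, the shuffling step, and the placeholder records are all assumptions made for illustration, and the abstract does not state the exact size of each subset.

```python
import random


def split_records(records, train_ratio=0.9, seed=0):
    """Shuffle the records and split them into training and testing lists.

    Hypothetical helper: the shuffling and the seed are assumptions,
    not details taken from the paper.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)  # rounds down
    return shuffled[:cut], shuffled[cut:]


def accuracy(true_labels, predicted_labels):
    """Fraction of records whose predicted cause label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Placeholder data: (accident history text, cause label index).
# The five label values are illustrative only.
records = [(f"accident history {i}", i % 5) for i in range(318)]
train_set, test_set = split_records(records)
```

Under these assumptions, rounding down at the cut point leaves 286 records for training and 32 for testing. The `accuracy` function would then be applied to the true and predicted cause labels of the test records.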