Text Information Extraction of Traffic Accident Disposal Process Based on RoBERTa-BiGRU-CRF
Chen Jiaona¹, Zhang Jing¹, Jin Yinli², Wang Peng¹
Author affiliations
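- 1. School of Electronic Engineering, Xi'an Shiyou University, Xi'an, Shaanxi 710065, China
- 2. School of Electronics and Control Engineering, Chang'an University, Xi'an, Shaanxi 710061, China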
Abstract
To address the insufficient extraction of disposal processes in existing traffic accident emergency information recognition, and to improve the accuracy of emergency disposal knowledge extraction, a method for extracting traffic accident disposal processes based on a pre-trained model and a hybrid deep learning network was proposed, targeting the complexity of natural-language descriptions in traffic accident texts. First, traffic accident disposal process entities were defined in five categories (accident attributes, disposal agencies, disposal measures, disposal effects, and task prediction), and the entity types were labeled with the BIO scheme. Then, word vectors generated by the pre-trained RoBERTa model were used as input, a BiGRU model was used for feature extraction, and a CRF model imposed conditional constraints on the label sequence to obtain the final entity types. Time-series fusion was then applied to the disposal process extraction results of the combined RoBERTa-BiGRU-CRF model, and a graph database was used to store and visualize the extracted knowledge. Finally, using highway traffic accident texts from Shaanxi Province as the sample dataset, the performance of different pre-trained models and deep learning networks was compared, the effectiveness of the RoBERTa-BiGRU-CRF model was demonstrated through ablation experiments, and the method was validated on a case study of one traffic accident. The results showed that the combined RoBERTa-BiGRU-CRF model achieved the best extraction performance, with an F1 score of 99.77%. The study shows that the proposed method can effectively extract the key elements of traffic accident emergency disposal processes from text, present the extraction results visually, and provide a reference for emergency disposal decision-making.
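The abstract names two concrete mechanics: BIO labeling of the five entity categories, and a CRF layer that constrains the label sequence produced from BiGRU features. The sketch below illustrates both under stated assumptions. The tag abbreviations (ACC, AGE, MEA, EFF, PRE), the toy sentence, and the hand-set emission scores standing in for BiGRU outputs are all hypothetical. The fixed 0/-inf transition rule is a simplification of the learned transition matrix a real CRF would use. This is not the authors' implementation.

```python
import math

# Hypothetical abbreviations for the five entity categories named in the abstract:
# accident attribute, disposal agency, disposal measure, disposal effect, task prediction.
ENTITY_TYPES = ["ACC", "AGE", "MEA", "EFF", "PRE"]
TAGS = ["O"] + [f"{p}-{t}" for t in ENTITY_TYPES for p in ("B", "I")]
NEG = -math.inf


def spans_to_bio(n_chars, spans):
    """Convert (start, end_exclusive, type) spans into character-level BIO tags."""
    tags = ["O"] * n_chars
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


def allowed(prev, cur):
    """BIO constraint: I-X may only follow B-X or I-X of the same type."""
    if cur.startswith("I-"):
        return prev in (f"B-{cur[2:]}", cur)
    return True


def viterbi(emissions, tags=TAGS):
    """Highest-scoring tag path; emissions is a list of {tag: score} dicts.

    Simplification (assumption): transitions score 0 when allowed and -inf
    when they break BIO, instead of a learned CRF transition matrix.
    """
    # A sequence may not start with an I- tag.
    score = {t: (NEG if t.startswith("I-") else emissions[0][t]) for t in tags}
    back = []
    for em in emissions[1:]:
        new_score, ptr = {}, {}
        for cur in tags:
            best_prev, best = None, NEG
            for prev in tags:
                if score[prev] == NEG or not allowed(prev, cur):
                    continue
                s = score[prev] + em[cur]
                if s > best:
                    best_prev, best = prev, s
            new_score[cur], ptr[cur] = best, best_prev
        score = new_score
        back.append(ptr)
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: score[t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def bio_to_spans(tags):
    """Collect (start, end_exclusive, type) spans from a valid BIO sequence."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):
        if tag == "O" or tag.startswith("B-"):
            if start is not None:
                spans.append((start, i, etype))
                start = None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return spans


# Hypothetical toy sentence: "交警" (traffic police, agency) + "封闭车道" (close the lane, measure).
text = "交警封闭车道"
gold = spans_to_bio(len(text), [(0, 2, "AGE"), (2, 6, "MEA")])

# Fake emission scores: +2 on each gold tag, plus a misleading +3 on I-AGE at position 0.
emissions = [{t: (2.0 if t == g else 0.0) for t in TAGS} for g in gold]
emissions[0]["I-AGE"] = 3.0

greedy = [max(em, key=em.get) for em in emissions]
decoded = viterbi(emissions)
print(greedy[0])              # I-AGE, an invalid sentence start
print(decoded == gold)        # True
print(bio_to_spans(decoded))  # [(0, 2, 'AGE'), (2, 6, 'MEA')]
```

Greedy per-character argmax picks I-AGE at the first position, which no valid BIO sequence can begin with. The Viterbi pass rejects that transition and recovers both spans. In a trained model the transition scores would be learned jointly with the BiGRU rather than fixed by hand.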
Key words
traffic safety/traffic accident/entity extraction/pre-trained model/deep learning/time-series fusion
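Funding
National Natural Science Foundation of China, Youth Science Fund Project (52002315)
National Key Research and Development Program of China (2019YFB1600700)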
Publication year
2024