Research on a RailBERT-based event extraction method for test cases of train control system onboard ATP

In laboratory testing of Automatic Train Protection (ATP) onboard equipment, test cases are numerous, highly complex, and dense with specialized train control terminology. Existing methods and models lack train control domain knowledge, so they struggle to interpret the context of test cases accurately and to generate detailed structured representations automatically. To address this problem, this paper proposes a test case event extraction method based on the Rail Bidirectional Encoder Representations from Transformers (RailBERT) model. First, a new-word discovery algorithm is used to expand the set of specialized train control terms and build a corpus. On this basis, a RailBERT model tailored to the railway train control domain is pre-trained with a Railway Whole Word Masking (RWWM) task to strengthen its understanding of domain-specific context. Then, an event extraction approach is proposed to automatically extract the expected results of onboard ATP test cases; predefined event types and event arguments are used to parse and represent the expected results comprehensively. Finally, RailBERT is combined with a Bidirectional Long Short-Term Memory (BiLSTM) network and a Conditional Random Field (CRF) to improve the model's ability to capture sequence information and the dependencies between labels, so that events are extracted from test cases more effectively. Experimental results show that the proposed model achieves an F1 score of 90.3% on the test case event extraction dataset. It extracts predefined events from test cases with reasonable accuracy and generates structured representations of their expected results, laying a foundation for automated testing.
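The whole word masking idea behind the RWWM pre-training task can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon entries, the forward-maximum-matching segmenter, the function names, and the 15% masking rate are all assumptions made for illustration, and the usual BERT 80/10/10 replacement scheme is omitted for brevity. The point it shows is that when a domain term is selected for masking, all of its characters are masked together rather than individually.

```python
import random

MASK = "[MASK]"


def segment(text, lexicon, max_len=8):
    """Forward maximum matching: take the longest lexicon term at each position.

    Characters not covered by any lexicon term become single-character words.
    (Hypothetical segmenter; the paper does not specify one.)
    """
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:
                words.append(piece)
                i += n
                break
    return words


def whole_word_mask(text, lexicon, mask_prob=0.15, rng=None):
    """Mask all characters of a selected word together, never part of a word.

    Returns (tokens, labels): labels hold the original character at masked
    positions and None elsewhere, as a masked-language-model target would.
    """
    rng = rng or random.Random()
    tokens, labels = [], []
    for word in segment(text, lexicon):
        masked = rng.random() < mask_prob  # one decision per whole word
        for ch in word:
            tokens.append(MASK if masked else ch)
            labels.append(ch if masked else None)
    return tokens, labels


if __name__ == "__main__":
    # Illustrative domain terms: "onboard equipment", "emergency braking".
    lexicon = {"车载设备", "紧急制动"}
    tokens, labels = whole_word_mask("车载设备输出紧急制动", lexicon,
                                     mask_prob=0.5, rng=random.Random(0))
    print(tokens)
```

In this sketch, a richer domain lexicon (for example, one expanded by new-word discovery) directly changes which character spans are treated as indivisible units during masking.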

Keywords: onboard equipment; test case; natural language processing; event extraction; pre-trained model

程烨 (Cheng Ye), 李开成 (Li Kaicheng), 魏国栋 (Wei Guodong)

School of Automation and Intelligence, Beijing Jiaotong University, Beijing 100044, China

2024

Journal of Beijing Jiaotong University
Beijing Jiaotong University

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.525
ISSN: 1673-0291
Year, volume (issue): 2024, 48(5)