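To address the low recognition accuracy and the difficulty of determining entity boundaries in entity recognition for shipping cargo mail, a recognition method combining deep learning with rule matching is proposed. The deep learning method adds character-level features of words to the BiLSTM-CRF (Bidirectional Long Short-Term Memory-Conditional Random Field) model and incorporates a multi-head attention mechanism to capture long-distance dependencies in the mail text; the rule matching method formulates rules based on the characteristics of domain entities to complete recognition. Based on the characteristics of cargo mail, the corpus is annotated and divided into five categories: cargo name, cargo weight, loading and discharging ports, laycan, and commission. Multiple groups of comparative experiments on a self-built corpus show that the proposed method achieves an F1 score of 79.3% for entity recognition in shipping cargo mail.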
NAMED ENTITY RECOGNITION OF SHIPPING CARGO MAIL BASED ON DEEP LEARNING AND RULES
To address problems such as low recognition accuracy and the difficulty of determining entity boundaries in named entity recognition of shipping cargo mail, this paper proposes a named entity recognition model based on deep learning and rules. Building on the BiLSTM-CRF (bidirectional long short-term memory-conditional random field) model, the deep learning method adds character-level features of words and incorporates a multi-head attention mechanism to capture long-distance dependencies in the text. The rule matching method defines rules according to the characteristics of domain entities to complete the recognition. According to the characteristics of shipping cargo mail, the corpus is annotated and divided into five categories: cargo name, quantity, loading and discharging ports, laycan, and commission. A series of comparative experiments is conducted on a self-built shipping cargo text corpus. The experimental results show that the proposed method reaches an F1 value of 79.3% for entity recognition in shipping cargo mail.
Named entity recognition; Shipping cargo mail; Multi-head attention mechanism; Character-level features; Rule matching
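
The abstract does not list the rules themselves, so the following is only a minimal sketch of how the rule matching component might look for three of the five categories (quantity, laycan, commission). The regular expressions, the label names, the function `match_rules`, and the sample message are all hypothetical illustrations, not the authors' actual rules or data.

```python
import re

# Hypothetical patterns for illustration only; the paper's real rules are not given.
MONTHS = r"(?:JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)"
RULES = {
    # A number followed by a weight/volume unit, e.g. "50,000 MT"
    "QUANTITY": re.compile(r"\b\d[\d,\.]*\s?(?:MTS|MT|KT|CBM)\b", re.IGNORECASE),
    # A day range followed by a month abbreviation, e.g. "10-15 MAY"
    "LAYCAN": re.compile(r"\b\d{1,2}\s?-\s?\d{1,2}\s?" + MONTHS + r"\b", re.IGNORECASE),
    # A percentage, e.g. "2.5 PCT" or "2.5%"
    "COMMISSION": re.compile(r"\b\d+(?:\.\d+)?\s?(?:PCT\b|%)", re.IGNORECASE),
}


def match_rules(text):
    """Return (label, surface, start, end) tuples sorted by start offset."""
    entities = []
    for label, pattern in RULES.items():
        for m in pattern.finditer(text):
            entities.append((label, m.group(0), m.start(), m.end()))
    return sorted(entities, key=lambda e: e[2])


if __name__ == "__main__":
    # Made-up sample line in the style of a cargo offer
    sample = "ABT 50,000 MT COAL, LAYCAN 10-15 MAY, COMM 2.5 PCT"
    for label, surface, start, end in match_rules(sample):
        print(label, repr(surface), start, end)
```

Returning character offsets alongside the label makes it straightforward to reconcile rule matches with spans predicted by a neural tagger, although how the two components are actually combined is not stated in the abstract.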