
Entity Recognition Model for Machining Process Planning Text

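To identify key information in unstructured process planning text efficiently, a named entity recognition model based on a machining-domain dictionary and a neural network is established. First, the machining-domain dictionary is combined with jieba word segmentation to annotate the dataset automatically; when process parameter information is annotated, each number and its identifying letters are treated as a single segmentation unit to strengthen subsequent feature extraction. Second, a bidirectional long short-term memory (BiLSTM) network extracts text features on top of word2vec word embeddings. Finally, a conditional random field (CRF) integrates contextual logic to raise the recognition accuracy of key process information. The model was evaluated on a dataset of 431 work-step descriptions. The results show precision, recall, and F1 of 90.20%, 93.88%, and 92.00%, respectively, which compares favorably with conventional models in the field, and the robustness of the model was verified on three different process planning datasets.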
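The first step, dictionary-driven automatic annotation in which a number and its marker letters are kept as one unit, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the dictionary entries, the entity labels (METHOD, FEATURE, EQUIPMENT, PARAM), the BIO tag scheme, and the parameter regular expression are all hypothetical, and a simple forward-maximum-matching segmenter stands in for jieba so that the example runs with only the Python standard library. With jieba itself, domain terms would typically be registered via `jieba.add_word` before segmentation.

```python
import re

# Hypothetical excerpt of a machining-domain dictionary (term -> entity type).
# The actual dictionary and label set used in the paper are not given here.
TECH_DICT = {
    "粗车": "METHOD",
    "精车": "METHOD",
    "外圆": "FEATURE",
    "端面": "FEATURE",
    "数控车床": "EQUIPMENT",
}
MAX_TERM_LEN = max(len(term) for term in TECH_DICT)

# Assumed parameter pattern: optional marker letters (e.g. "Ra", "φ", "M") fused
# with the number that follows, so "Ra3.2" stays one unit instead of "Ra" + "3.2".
PARAM_RE = re.compile(r"[A-Za-zφΦ]*\d+(?:\.\d+)?")


def annotate(text: str) -> list[tuple[str, str]]:
    """Segment `text` and attach BIO tags.

    Every entity produced here is a single unit, so it only ever gets a B- tag.
    """
    pairs: list[tuple[str, str]] = []
    i = 0
    while i < len(text):
        # 1) Parameter rule first: keep marker letters and digits together.
        m = PARAM_RE.match(text, i)
        if m:
            pairs.append((m.group(), "B-PARAM"))
            i = m.end()
            continue
        # 2) Forward maximum matching against the dictionary (longest term wins).
        for size in range(min(MAX_TERM_LEN, len(text) - i), 0, -1):
            term = text[i:i + size]
            if term in TECH_DICT:
                pairs.append((term, "B-" + TECH_DICT[term]))
                i += size
                break
        else:
            # 3) No parameter or dictionary hit: emit one character tagged O.
            pairs.append((text[i], "O"))
            i += 1
    return pairs


print(annotate("粗车外圆至φ50，表面粗糙度Ra3.2"))
```

In this sketch `φ50` and `Ra3.2` each come out as a single `B-PARAM` unit instead of being split into marker letters and digits, which is the behavior the abstract motivates for better downstream features.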
Named Entity Recognition Method for Process Planning Text
To recognize critical information in unstructured process planning text efficiently, a named entity recognition model based on a technology dictionary and a neural network is established. First, the technology dictionary and jieba word segmentation are combined to annotate the datasets automatically; in particular, when process parameter data are annotated, each number and its identification letters are treated as one unit, which enhances subsequent feature extraction. Second, a bidirectional long short-term memory (BiLSTM) network extracts features from the text on top of word2vec embeddings. Finally, a conditional random field (CRF) model synthesizes contextual logic to improve the recognition accuracy of critical process information. To verify the effectiveness of the proposed model, a dataset of 431 work steps is used. Experimental results show precision, recall, and F1 values of 90.20%, 93.88%, and 92.00%, respectively, which compare favorably with traditional models in the field. In addition, tests on three different process planning datasets show that the proposed model is highly robust.
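Because F1 is the harmonic mean of precision $P$ and recall $R$, $F_1 = 2PR/(P+R)$, the three reported figures can be cross-checked in a few lines. The helper `f1_score` below is a hypothetical illustration defined locally, not a library call.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, expressed as fractions.
p, r = 0.9020, 0.9388
print(f"F1 = {f1_score(p, r):.4f}")  # prints "F1 = 0.9200"
```

With P = 90.20% and R = 93.88% this gives about 92.00%, matching the reported F1, which is consistent with the first metric being precision.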

bidirectional long short-term memory network; conditional random field; named entity recognition; knowledge extraction

董含笑 (Dong Hanxiao), 李豫虎 (Li Yuhu), 乔立红 (Qiao Lihong), 黄志成 (Huang Zhicheng)


School of Mechanical Engineering and Automation, Beihang University, Beijing 100191


National Key Research and Development Program of China

2024

Journal of Computer-Aided Design & Computer Graphics
China Computer Federation


CSTPCD, Peking University Core Journals
Impact factor: 0.892
ISSN: 1003-9775
Year, volume (issue): 2024, 36(2)