Bidirectional Neural Probabilistic Transducer for Process Entity Recognition

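Abstract (translated from the Chinese): Process entity recognition aims to identify entities such as parts, materials, attributes, and attribute values in the texts that product manufacturing follows or produces. At present, entity recognition in process and similar domains mostly adds domain prior knowledge, such as dictionaries or regular rules, either to correct the output of a neural network model or to generate pre-recognition features that are fed into the model. However, these approaches do not model the prior knowledge and the neural network in a unified way, and adding domain knowledge does not reduce the training cost: large amounts of labeled data are still required. To address these problems, a bidirectional neural probabilistic transducer (Bi-NPT) for process entity recognition is proposed. Prior knowledge for process entity recognition is modeled as regular rules, and the rules are then converted into a parameterized probabilistic finite state transducer, so that the model carries entity recognition prior knowledge before training while remaining trainable. By training on labeled data, the model learns to recognize entities that the regular rules do not cover. Experimental results show that the untrained Bi-NPT performs on par with regular-rule entity recognition, which indicates that the untrained initial model already carries entity recognition knowledge. In few-shot settings, Bi-NPT outperforms PER, Template-based BART, and NNShot; with sufficient samples, Bi-NPT outperforms methods such as BiLSTM and TENER.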

English title: Bidirectional Neural Probabilistic Transducer for Process Text Entity Recognition

English abstract: Process text entity recognition aims to recognize entities such as parts, materials, attributes, and attribute values in texts generated by or associated with the manufacturing process of products. In most domain-specific entity recognition tasks, including the process domain, prior knowledge in the form of dictionaries or rules is used either to adjust the results of a neural network model or to generate pre-recognized features that are fed into the model. However, these methods do not integrate domain entity recognition knowledge with the neural network model. Furthermore, adding domain knowledge does not reduce the training cost of the model, which still needs a large amount of labeled data. To address these challenges, this paper proposes a bidirectional neural probabilistic transducer (Bi-NPT) for process text entity recognition. The approach models the domain-specific prior knowledge for process text entity recognition as regular rules and then converts these rules into a parameterized probabilistic finite state transducer. As a result, the model carries entity recognition prior knowledge before training while remaining trainable. By training on labeled data, the model acquires the ability to recognize entities not covered by the regular rules. Experimental results demonstrate that the proposed Bi-NPT, without training, performs comparably to regular rule-based entity recognition, suggesting that the untrained initial model already possesses entity recognition knowledge. Additionally, Bi-NPT outperforms PER, Template-based BART, and NNShot in few-shot scenarios, and BiLSTM and TENER in rich-resource scenarios.
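The idea of turning a regular rule into a trainable transducer can be illustrated with a toy sketch. The code below is not the Bi-NPT model from the paper: it is a minimal unidirectional weighted finite-state transducer, with unnormalized log-weights instead of probabilities and no neural component. The token classes, the rule "number followed by unit", the tag names (`B-VAL`, `I-VAL`), and the functions `token_class`, `init_weights`, and `decode` are all assumptions made for illustration. Arcs licensed by the rule start with a high weight, so decoding before any training reproduces the rule; because every arc has its own weight, training could later raise the weight of arcs the rule does not cover.

```python
import re
from itertools import product

TAGS = ["O", "B-VAL", "I-VAL"]       # hypothetical BIO tags for attribute values
STATES = [0, 1]                      # 0: outside a match, 1: just read a number
CLASSES = ["NUM", "UNIT", "OTHER"]   # hypothetical coarse input symbols


def token_class(tok):
    """Map a token to a coarse input symbol."""
    if re.fullmatch(r"\d+(\.\d+)?", tok):
        return "NUM"
    if tok in {"mm", "cm", "MPa"}:
        return "UNIT"
    return "OTHER"


def init_weights(rule_score=0.0, other_score=-5.0):
    """One log-weight per arc (state, input class, output tag, next state)."""
    w = {arc: other_score for arc in product(STATES, CLASSES, TAGS, STATES)}
    # Arcs licensed by the rule "NUM UNIT -> B-VAL I-VAL".
    w[(0, "NUM", "B-VAL", 1)] = rule_score
    w[(1, "UNIT", "I-VAL", 0)] = rule_score
    # Tokens outside any match are tagged O.
    w[(0, "OTHER", "O", 0)] = rule_score
    w[(0, "UNIT", "O", 0)] = rule_score
    # A number may also stay untagged, at a slightly lower weight than starting a match.
    w[(0, "NUM", "O", 0)] = rule_score - 1.0
    return w


def decode(tokens, w, end_penalty=-5.0):
    """Viterbi search for the highest-scoring tag sequence."""
    best = {0: (0.0, [])}  # state -> (score, tags so far)
    for tok in tokens:
        cls = token_class(tok)
        nxt = {}
        for s, (score, tags) in best.items():
            for tag, t in product(TAGS, STATES):
                cand = score + w[(s, cls, tag, t)]
                if t not in nxt or cand > nxt[t][0]:
                    nxt[t] = (cand, tags + [tag])
        best = nxt
    # Penalize paths that end inside an unfinished match (state 1).
    final = [(sc + (end_penalty if s == 1 else 0.0), tags)
             for s, (sc, tags) in best.items()]
    return max(final, key=lambda x: x[0])[1]


weights = init_weights()
print(decode(["hole", "diameter", "10", "mm"], weights))
print(decode(["10", "pieces"], weights))
```

With the initial weights, the first call tags "10 mm" as an entity and the second leaves "10 pieces" untagged, which is what the rule alone would do. Changing a single entry of `weights` (as gradient-based training would) changes the decoding, which is the sense in which such a model is both rule-initialized and trainable.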

English keywords:

Process text; Entity recognition; Regular rules; Probabilistic finite state transducer

Authors:

李瑞婷、王裴岩、王立帮、杨丹清忻

Affiliation:

School of Computer Science, Shenyang Aerospace University, Shenyang 110136

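Keywords (translated from the Chinese):

Process text; Entity recognition; Regular rules; Probabilistic finite state transducer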

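Funding:

Applied Basic Research Program of Liaoning Province; Research Project of the China National Committee for Terminology in Science and Technology; National Natural Science Foundation of China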

Project numbers:

2022JH2/101300248; YB2022015; U1908216

Publication year:

2024

DOI:

10.11896/jsjkx.230700206

Journal: Computer Science (计算机科学)

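Publisher: Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)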

Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 0.944

ISSN: 1002-137X

Year, volume (issue): 2024, 51(z1)

Number of references: 26