Computer Engineering and Design, 2024, Vol. 45, Issue 5: 1413-1419. DOI: 10.16208/j.issn1000-7024.2024.05.018

Patent efficacy phrase extraction based on multi-feature fusion

You Xindong, Zhao Ying, Liu Jiaqi, Lü Xueqiang

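Author information

  • 1. Key Laboratory of Internet Culture and Digital Dissemination, Beijing Information Science and Technology University, Beijing 100101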

Abstract

To improve the precision and recall of patent efficacy phrase extraction, and thereby support high-quality research such as patent layout (portfolio planning), a patent efficacy phrase extraction model that fuses multiple features is proposed. The model follows an overall BERT-BiLSTM-CRF framework: BERT vectorizes the text; radical, Wubi (five-stroke) code, and word-length-plus-part-of-speech features are fused with these vectors and fed into a BiLSTM or Transformer encoder; and a CRF decoder outputs the tag sequence for the input, from which the patent efficacy phrases are read off. Patent texts from the new energy vehicle domain are used as training data, and experiments are run with different combinations of features. The results show that the proposed model achieves clear gains in precision, recall, and F1 score, verifying the effectiveness of multi-feature fusion for the efficacy phrase extraction task.
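The abstract gives no implementation details, so the following is only a minimal Python sketch of the pipeline stages it names, under loudly stated assumptions. Feature fusion is assumed to be plain per-character concatenation (the abstract says only that the features are fused); the tag set is assumed to be a single-type B/I/O scheme; and the BiLSTM/Transformer encoder is replaced by hand-written toy emission scores. The helper names `fuse_features`, `viterbi`, `bio_to_spans`, and `prf1` are hypothetical and are not the authors' code. `viterbi` is standard Viterbi decoding for a linear-chain CRF, and `prf1` is exact-match span-level precision/recall/F1.

```python
from typing import Dict, List, Sequence, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def fuse_features(*per_char: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate aligned per-character feature vectors (ASSUMED fusion strategy)."""
    n = len(per_char[0])
    if any(len(f) != n for f in per_char):
        raise ValueError("all feature sequences need one vector per character")
    return [[x for feat in per_char for x in feat[i]] for i in range(n)]


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            tags: Sequence[str]) -> List[str]:
    """Highest-scoring tag path of a linear-chain CRF (unlisted transitions score 0)."""
    score = {t: emissions[0][t] for t in tags}  # best path score ending in each tag
    backpointers: List[Dict[str, str]] = []
    for emit in emissions[1:]:
        new_score: Dict[str, float] = {}
        bp: Dict[str, str] = {}
        for cur in tags:
            # Best previous tag given the previous scores plus the transition into `cur`.
            prev = max(tags, key=lambda p: score[p] + transitions.get((p, cur), 0.0))
            new_score[cur] = score[prev] + transitions.get((prev, cur), 0.0) + emit[cur]
            bp[cur] = prev
        score = new_score
        backpointers.append(bp)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: score[t])]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    return path[::-1]


def bio_to_spans(labels: Sequence[str]) -> List[Span]:
    """Turn B/I/O labels into phrase spans; an I with no open span starts a new one."""
    spans: List[Span] = []
    start = None
    for i, lab in enumerate(labels):
        if lab == "B" or (lab == "I" and start is None):
            if start is not None:
                spans.append((start, i))
            start = i
        elif lab == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans


def prf1(pred: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match span precision, recall, and F1."""
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


if __name__ == "__main__":
    text = "有效提高续航能力"  # "effectively improves driving range"
    tags = ("B", "I", "O")
    # Hypothetical 4-dim "BERT" vectors plus toy radical (2), Wubi (2), length+POS (3) features.
    fused = fuse_features([[0.0] * 4 for _ in text], [[1.0, 0.0] for _ in text],
                          [[0.0, 1.0] for _ in text], [[1.0, 0.0, 0.0] for _ in text])
    print(len(fused), len(fused[0]))  # 8 11
    # Toy emission scores standing in for the encoder output (hypothetical values).
    emissions = ([{"B": 0.0, "I": 0.0, "O": 2.0}] * 2
                 + [{"B": 1.0, "I": 1.5, "O": 0.0}]
                 + [{"B": 0.0, "I": 2.0, "O": 0.0}] * 5)
    transitions = {("O", "I"): -10.0}  # discourage I directly after O
    path = viterbi(emissions, transitions, tags)
    print(path)  # ['O', 'O', 'B', 'I', 'I', 'I', 'I', 'I']
    print([text[s:e] for s, e in bio_to_spans(path)])  # ['提高续航能力']
    print(prf1({(2, 8), (10, 14)}, {(2, 8), (20, 24)}))  # (0.5, 0.5, 0.5)
```

In the demo, a greedy per-character argmax would tag 提 as I immediately after an O; the O→I transition penalty makes Viterbi choose B instead. That label-consistency constraint across neighboring characters is what a CRF layer adds on top of the encoder's per-character scores.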

Key words

multi-feature fusion/patent efficacy phrase/deep learning/word extraction/BiLSTM model/CRF model/Word2vec model


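Funding

National Natural Science Foundation of China (62171043)

Beijing Natural Science Foundation (4212020)

National Language Commission project (ZDI145-10)

National Language Commission project (YB145-3)

National Defense Science and Technology Key Laboratory project (6412006200404)

Beijing Municipal Education Commission Scientific Research Program (KM202111232001)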

Publication year

2024
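
Journal: Computer Engineering and Design
Sponsor: The 706th Research Institute, Second Academy of China Aerospace Science and Industry Corporation
Indexing: CSTPCD; Peking University Core Journal
Impact factor: 0.617
ISSN: 1000-7024
Number of references: 15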