Journal of Intelligence (情报杂志), 2024, Vol. 43, Issue 6: 126-133, 144. DOI: 10.3969/j.issn.1002-1965.2024.06.018

面向技术识别的专利实体抽取——以类脑智能领域为例

Patent Entity Extraction for Technology Recognition: A Case Study of Brain-Inspired Intelligence

邢晓昭 苑朋彬 陈亮 任亮 余池

Author information

  • 1. Institute of Scientific and Technical Information of China, Beijing 100038

Abstract

[Research purpose] Patent entity extraction is the basis of technology recognition from patent texts. At present, patent entity extraction suffers from low automation and low accuracy. This study addresses these problems in two ways: by building a high-quality patent corpus for a specific field, and by applying an advanced algorithm model to patent entity extraction. [Research method] A fine-grained information system containing 13 entity types was defined, and the titles and abstracts of 921 patents in the field of brain-inspired intelligence were manually annotated according to it. A Bert-BiLSTM-CRF model, which integrates deep learning and machine learning, was then used to identify entities in brain-inspired intelligence patents. [Research conclusion] Overall, the model achieved an accuracy, recall, and F1 value of 0.8, and recognition performance differed across entity types. Several comparative experiments were designed to verify the model's performance. The results showed that fine-tuning the data and increasing the training scale could improve model performance, and that the model outperformed some classical models of the same period.
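The abstract reports recall and F1 for entity recognition but does not say how they were computed. As a rough illustration only, the sketch below shows one common convention for sequence-labelling output: BIO tags are converted to entity spans, and a predicted entity counts as correct only when its boundaries and type both match the gold annotation. The function names and the entity labels `TECH` and `COMP` are hypothetical; the 13 entity types used in the study are not listed here, and the authors' own evaluation procedure may differ.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (strict reading)."""
    spans: Set[Span] = set()
    start, label = None, None
    # The trailing "O" sentinel flushes an entity that ends on the last token.
    for i, tag in enumerate(tags + ["O"]):
        if tag.startswith("I-") and label == tag[2:]:
            continue  # still inside the current entity
        if label is not None:
            spans.add((start, i, label))  # close the open entity
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]  # open a new entity
        # An "I-" tag with no matching open entity is ignored.
    return spans


def entity_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 under exact span-and-type match."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical labels, used only for illustration.
    gold = ["B-TECH", "I-TECH", "O", "B-COMP", "O", "B-TECH"]
    pred = ["B-TECH", "I-TECH", "O", "B-COMP", "I-COMP", "O"]
    print(entity_prf(gold, pred))
```

Under this strict convention a prediction with the right type but a wrong boundary earns no credit, which is one reason scores can differ noticeably between entity types of different typical lengths.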

Key words

patent entity / patent text / patent mining / technology recognition / deep learning / machine learning / Bert-BiLSTM-CRF model

Funding

National Social Science Fund of China, Youth Project (21CTQ039)

Publication year

2024
Journal of Intelligence (情报杂志)
Shaanxi Provincial Institute of Scientific and Technical Information

Indexed in: CSTPCD, CSSCI, CHSSCD, Peking University Core Journals
Impact factor: 1.502
ISSN: 1002-1965