
High-Value Patent Recognition Based on Content Understanding and Indicators Integration

[Research purpose] Patent texts are highly descriptive, yet traditional high-value patent classification relies on computed indicators and does not take into account the detailed contextual information or the relatively long text sequences found in patent documents. This paper therefore identifies high-value patents by combining the content of the patent text with patent indicator features, through content understanding and indicator fusion. [Research method] It proposes a high-value patent recognition model based on content understanding and indicator fusion. First, BERT-BiLSTM extracts the contextual and sequential features of the patent text; next, the extracted text features are fused with the patent indicator features; finally, the XGBoost algorithm performs the high-value patent classification. [Research conclusion] In several groups of comparative experiments, the proposed method identified patents in the fields of basic electric elements and electric communication technology that had won the China Patent Award, granted by the China National Intellectual Property Administration, with a precision of 74.19%, a recall of 76.66%, and an F1 score of 75.40%. The results indicate that the method can effectively improve the classification accuracy of high-value patents.
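As a quick consistency check on the reported metrics, the F1 score can be recomputed from the precision and recall given in the abstract. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the function name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, converted from percentages to fractions.
precision = 0.7419
recall = 0.7666

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2%}")  # prints "F1 = 75.40%"
```

The recomputed value matches the reported 75.40%, which also confirms that the 74.19% figure is a precision rather than an overall accuracy.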

Keywords: high-value patents; patent identification; patent text; patent indicators; BERT-BiLSTM; XGBoost

Authors: Tang Heng (唐恒), Zhang Xingxing (张星星), Wang Manrong (汪满容)


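School of Intellectual Property, Jiangsu University, Zhenjiang 212013, China

Institute of Science and Technology Information, Jiangsu University, Zhenjiang 212013, China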


Funding: National Natural Science Foundation of China (71573108); National Key Research and Development Program of China (2019YFB1405200)

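Journal of Intelligence (情报杂志)
Shaanxi Institute of Scientific and Technical Information (陕西省科学技术信息研究所)

Indexed in: CSTPCD, CSSCI, CHSSCD, PKU Core (北大核心)
Impact factor: 1.502
ISSN: 1002-1965
Year, volume (issue): 2024, 43(4)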