Research on Patent Retrieval Strategy Based on ARSER Deep Matching Model

Abstract: Patent matching is a crucial component of patent retrieval. Its core task is to find, quickly and accurately, other patent documents that are similar to a query patent. Patent retrieval runs through every stage of the patent life cycle and underpins nearly all patent analysis tasks.

In earlier patent retrieval research, patent matching relied mainly on short texts such as titles and abstracts. Alternatively, it applied methods such as term frequency-inverse document frequency (TF-IDF) and word2vec to patent specifications, which yields only shallow semantic text matching. To make full use of the deep semantic information in patent specifications, this paper proposes ARSER, a deep semantic text matching model.

ARSER works in four steps:
1. It segments each patent specification into sentences.
2. Its encoding layer uses BERT to represent each sentence as a vector. Zero vectors are added as padding so that different specifications have representations of the same size.
3. For each sentence vector of one specification, an attention mechanism finds its best-matching vector in the other specification. The match between each sentence vector and its best-matching vector is computed as a local match.
4. The local matches are fused into the final matching result.

The study evaluates how effectively ARSER identifies prior-art documents in patent retrieval. It uses the dataset provided by CLEF-IP (Conference and Labs of the Evaluation Forum-Intellectual Property), an open evaluation platform for information retrieval in the intellectual property domain. ARSER is compared with the methods commonly used in earlier research on improving patent retrieval strategies: TF-IDF, latent Dirichlet allocation (LDA), and the neural language models word2vec and Doc2Vec. The experimental results show that ARSER outperforms the other baseline methods on the patent matching task. Compared with Doc2Vec, the best-performing baseline, ARSER improves recall by 3.97%.
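The matching pipeline summarized above can be illustrated with a small sketch. It covers sentence vectors, zero padding, an attention-based best match per sentence, and local matches fused into one score. This is a minimal illustration under stated assumptions, not the authors' implementation:

- The BERT encoder is replaced by precomputed toy vectors.
- Attention is plain dot-product softmax, with padding rows masked out.
- Each local match is a cosine similarity.
- Fusion is a simple mean over both matching directions.

The paper's attention and fusion layers may well be learned. All function names here (`arser_like_score`, `recall_at_k`, etc.) are hypothetical. The `recall_at_k` helper shows the kind of recall measure reported in the evaluation; the abstract does not state a cutoff, so `k` is an assumption.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    # Local match measure (assumption: cosine similarity, not necessarily the paper's choice).
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot(u, v) / (nu * nv)


def pad(sentences: List[Vector], max_len: int, dim: int) -> List[Vector]:
    # Truncate or zero-pad so every specification has exactly max_len sentence vectors.
    kept = [list(s) for s in sentences[:max_len]]
    return kept + [[0.0] * dim for _ in range(max_len - len(kept))]


def is_padding(v: Sequence[float]) -> bool:
    return all(x == 0.0 for x in v)


def attend(query: Vector, keys: List[Vector]) -> Vector:
    # Dot-product attention over the real (non-padding) sentences of the other
    # specification; the softmax-weighted sum serves as the query's best-matching vector.
    # Masking out padding rows is an assumption of this sketch.
    real = [k for k in keys if not is_padding(k)]
    if not real:
        return [0.0] * len(query)
    scores = [dot(query, k) for k in real]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * k[d] for w, k in zip(weights, real)) / total
            for d in range(len(query))]


def directional_score(a: List[Vector], b: List[Vector]) -> float:
    # Average local match of each real sentence in a against its attended match in b.
    local = [cosine(s, attend(s, b)) for s in a if not is_padding(s)]
    return sum(local) / len(local) if local else 0.0


def arser_like_score(a: List[Vector], b: List[Vector], max_len: int, dim: int) -> float:
    # Fusion (assumption): simple mean of the two matching directions.
    pa, pb = pad(a, max_len, dim), pad(b, max_len, dim)
    return 0.5 * (directional_score(pa, pb) + directional_score(pb, pa))


def recall_at_k(ranked_ids: List[str], relevant_ids: List[str], k: int) -> float:
    # Share of the known prior-art documents that appear in the top k results.
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(set(relevant_ids))


if __name__ == "__main__":
    # Toy 3-dimensional "sentence embeddings" standing in for BERT outputs (hypothetical data).
    query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    candidates: Dict[str, List[Vector]] = {
        "P1": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
        "P2": [[0.0, 0.0, 1.0]],
    }
    scores = {pid: arser_like_score(query, sents, max_len=2, dim=3)
              for pid, sents in candidates.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    print(ranking, recall_at_k(ranking, ["P1"], k=1))
```

Running the script prints `['P1', 'P2'] 1.0`. The candidate that shares sentence content with the query ranks first (score about 0.94), while the orthogonal one scores 0. Padding keeps every specification at a fixed shape, as batched tensors would require. Masking stops zero vectors from drawing attention weight, which they otherwise would, since a softmax assigns positive weight to a score of 0.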

Keywords:

patent matching; patent retrieval strategies; patent specification; retrieval performance; ARSER model; BERT model

Authors:

Shi Guoliang (施国良), Zhang Xiaoxiao (张笑笑), Chen Ting (陈挺), Wu Jing (吴静)

Affiliation:

Business School, Hohai University, Nanjing, Jiangsu 211100, China

Year of publication:

2024

DOI:

10.3969/j.issn.1000-7695.2024.22.017

Journal:

Science and Technology Management Research (科技管理研究)

Sponsoring organization: Guangdong Society for the Science of Science and S&T Management (广东省科学学与科技管理研究会)

Indexed in: CSTPCD, CHSSCD

Impact factor: 0.779

ISSN: 1000-7695

Year, volume (issue): 2024, 44(22)