Research on Patent Retrieval Strategy Based on ARSER Deep Matching Model
Shi Guoliang¹, Zhang Xiaoxiao¹, Chen Ting¹, Wu Jing¹
Author information
- 1. Business School, Hohai University, Nanjing, Jiangsu 211100, China
Abstract
Patent matching is a crucial component of patent retrieval: its core task is to find, quickly and accurately, other patent documents that are similar to a query patent. Patent retrieval runs through every stage of the patent life cycle and underlies nearly all patent analysis tasks. In previous patent retrieval research, patent matching relied mainly on short texts such as titles and abstracts, or applied methods such as term frequency-inverse document frequency (TF-IDF) and word2vec to patent specifications, which captures only shallow semantic similarity. To make full use of the deep semantic information in patent specifications, this paper proposes ARSER, a deep semantic text matching model. ARSER first segments each patent specification into sentences. Its encoding layer represents each sentence as a vector using BERT, and zero vectors are used as padding so that different specifications have representations of the same size. An attention mechanism then finds, for each sentence vector of one specification, its best-matching vector in the other specification. The match between each sentence vector and its best-matching vector is computed as a local match, and the local matches are fused into the final matching result. Using the dataset provided by CLEF-IP (Conference and Labs of the Evaluation Forum-Intellectual Property), an open evaluation platform for information retrieval in the intellectual property domain, this study evaluates how effectively ARSER identifies prior art documents in patent retrieval. ARSER is compared with methods commonly used in earlier research on improving patent retrieval strategies: TF-IDF, latent Dirichlet allocation (LDA), and neural network language models such as word2vec and Doc2Vec. The experimental results show that ARSER outperforms the other baseline methods on the patent matching task. Specifically, compared with Doc2Vec, the best-performing baseline, ARSER improves recall by 3.97%.
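The matching pipeline summarized in the abstract (zero-padded sentence vectors, attention-based best matches, local matches, fusion) can be illustrated with a short sketch. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation: sentence embeddings are taken as given (ARSER obtains them from BERT; here they are toy 3-d vectors), the "best-matching vector" is assumed to be a scaled dot-product attention-weighted sum of the other specification's sentence vectors, the local match is assumed to be cosine similarity, and fusion is assumed to be a plain average over both matching directions. All function names (`pad_sentences`, `aligned_vector`, `arser_like_score`, and so on) are made up for this example.

```python
import math
from typing import List

Vector = List[float]


def pad_sentences(sents: List[Vector], length: int, dim: int) -> List[Vector]:
    """Truncate or zero-pad a list of sentence vectors to exactly `length` rows."""
    padded = [list(v) for v in sents[:length]]
    padded += [[0.0] * dim for _ in range(length - len(padded))]
    return padded


def is_padding(v: Vector) -> bool:
    """Treat an all-zero row as padding (assumes real embeddings are never all zero)."""
    return all(x == 0.0 for x in v)


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def cosine(u: Vector, v: Vector) -> float:
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot(u, v) / (nu * nv)


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aligned_vector(query: Vector, keys: List[Vector]) -> Vector:
    """Soft 'best match' for `query`: attention-weighted sum of the non-padding keys.

    Weights are softmax(q . k / sqrt(d)), i.e. scaled dot-product attention
    (an assumption; the abstract does not specify the attention form).
    """
    real = [k for k in keys if not is_padding(k)]  # padding rows carry no content
    if not real:
        return [0.0] * len(query)
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in real])
    return [sum(w * k[d] for w, k in zip(weights, real)) for d in range(len(query))]


def directional_score(a: List[Vector], b: List[Vector]) -> float:
    """Average local match (cosine) of each real sentence in `a` with its match in `b`."""
    local = [cosine(q, aligned_vector(q, b)) for q in a if not is_padding(q)]
    return sum(local) / len(local) if local else 0.0


def arser_like_score(a: List[Vector], b: List[Vector]) -> float:
    """Fuse local matches from both directions by simple averaging (hypothetical fusion)."""
    return 0.5 * (directional_score(a, b) + directional_score(b, a))


if __name__ == "__main__":
    # Toy 3-d "sentence embeddings" standing in for BERT outputs.
    doc_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    doc_b = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]  # near-paraphrase of doc_a
    doc_c = [[0.0, 0.0, 1.0]]                    # unrelated content
    padded_a = pad_sentences(doc_a, 4, 3)        # zero-padded to 4 sentences
    # The near-paraphrase pair should score higher than the unrelated pair.
    print(arser_like_score(padded_a, doc_b), arser_like_score(padded_a, doc_c))
```

Skipping padding rows when averaging keeps a zero-padded specification and its unpadded original scoring identically; without that, every padded query row would add a cosine of 0 and drag the average down. A trained model would likely replace the fixed cosine-and-average steps with learned layers, which this sketch does not attempt.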
Key words
patent matching/patent retrieval strategies/patent specification/retrieval performance/ARSER model/BERT model
Publication year
2024