Research and Application of TRIZ-Oriented Patent Technology Triplet Extraction

Extracting Triplets of Technology Patents for TRIZ

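Abstract (translated from the Chinese): [Objective] To address the low accuracy and efficiency of automatic patent technology triplet extraction, this study investigates an extraction model that improves the accuracy of personalized, fine-grained, multi-dimensional deep extraction and semantic association. [Methods] For four technical topic dimensions (technical problem, solution, technical function, and technical effect), an extraction method based on the WeakLabel-Bert-BiGRU-CRF model is proposed, and the model is evaluated with metrics such as the macro average. [Results] Patents on graphene energy storage applications were used as the dataset. Experiments show that, compared with the Bert-BiGRU-CRF model, the proposed model achieves a macro average above 0.8 for triplet extraction, further reduces the data annotation workload, and yields better extraction results. [Limitations] The proposed model requires domain experts and patent intelligence analysts to annotate data jointly, and differences in annotation quality affect application performance. [Conclusions] A prototype system was built on the WeakLabel-Bert-BiGRU-CRF model to support further use and promotion of the patent technology triplet extraction method, which also has broad application prospects in knowledge mining from scientific and technical literature.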

English abstract: [Objective] This paper proposes a model for extracting patented technology triplets. It aims to improve the accuracy of personalized, fine-grained, multi-dimensional deep extraction and semantic association. [Methods] We constructed an extraction method based on the WeakLabel-Bert-BiGRU-CRF model for four technical dimensions: problems, solutions, functions, and effects. We evaluated the model using indicators such as the macro average. [Results] We tested the new model on patents in graphene energy storage applications. Compared with the Bert-BiGRU-CRF model, the proposed method achieved a macro average of over 0.8 for triplet extraction and reduced the workload of data annotation. [Limitations] The proposed model requires domain experts and patent analysts to take part in data annotation, and annotation quality affects application effectiveness. [Conclusions] The proposed model can effectively extract patent technology triplets and has broad application prospects in scientific literature knowledge mining.
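The abstract reports a macro average over the four technical dimensions without stating how it is computed. As a minimal sketch, assuming the usual definition (compute precision, recall, and F1 per dimension, then take the unweighted mean), the calculation could look like the following. The function names and the counts are hypothetical illustrations and are not taken from the paper.

```python
from typing import Dict, Tuple

# Hypothetical per-dimension counts: (true positives, false positives, false negatives).
Counts = Dict[str, Tuple[int, int, int]]


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for one dimension; 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_average(counts: Counts) -> Tuple[float, float, float]:
    """Unweighted mean of per-dimension precision, recall, and F1."""
    scores = [prf(*c) for c in counts.values()]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


# Made-up counts for the four dimensions named in the abstract.
example_counts: Counts = {
    "problem": (8, 2, 2),
    "solution": (6, 2, 4),
    "function": (9, 1, 3),
    "effect": (5, 5, 0),
}

macro_p, macro_r, macro_f1 = macro_average(example_counts)
print(f"macro P={macro_p:.4f} R={macro_r:.4f} F1={macro_f1:.4f}")
```

Because each dimension contributes equally regardless of how many instances it has, a macro average penalizes a model that does well only on the most frequent dimension.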

English keywords:

TRIZ; Triplet Extraction; Patented Technology; WeakLabel-Bert-BiGRU-CRF

Authors:

Liu Chunjiang (刘春江), Li Shuying (李姝影), Fang Shu (方曙), Hu Zhengyin (胡正银), Qian Li (钱力)

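Author affiliations:

Chengdu Library and Information Center, Chinese Academy of Sciences, Chengdu 610299

Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190

National Science Library, Chinese Academy of Sciences, Beijing 100190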

Keywords:

TRIZ; triplet extraction; patent technology; WeakLabel-Bert-BiGRU-CRF

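Funding:

National Social Science Fund of China; "Light of West China" Talent Program of the Chinese Academy of Sciences (2020); Youth Innovation Promotion Association of the Chinese Academy of Sciences

Project numbers:

19BTQ088; E1C0000201; 2022173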

Publication year:

2024

DOI:

10.11925/infotech.2096-3467.2023.0492

Data Analysis and Knowledge Discovery (数据分析与知识发现)

National Science Library, Chinese Academy of Sciences

CSTPCD; CSSCI; CHSSCD; PKU Core Journals (北大核心); EI

Impact factor: 1.452

ISSN: 2096-3467

Year, volume (issue): 2024, 8(6)