Research on "Technology-Knowledge" Related Entity Recognition and Evolution Analysis Based on Deep Semantic Understanding

[Objective/Significance] To understand "technology-knowledge" related entities from a fine-grained perspective and to construct an implementation scheme for identifying technology elements and knowledge elements in patent documents. [Methods/Processes] The traditional machine learning models HMM and CRF and the deep learning models BiLSTM-CRF, BERT-Softmax, BERT-CRF, and BERT-BiLSTM-CRF are trained on the task in order to determine the best-performing model for fine-grained technology and knowledge entity recognition. [Results/Conclusions] To validate the proposed theoretical framework for technology and knowledge entity recognition, texts from the publishing and printing field serve as the experimental scenario: 7,853 valid corpus sentences are randomly sampled from patent texts and 71,626 entities are annotated. Training identifies BERT-BiLSTM-CRF as the better-performing entity recognition model, with an overall F1 value of 0.82 for knowledge and technology entity recognition. In addition, the trained optimal model is applied to the first claim, claims, independent claims, and technical effects of 66,665 patent texts, identifying 4,769,296 knowledge-technology entity association pairs, and the technology evolution paths and the evolution patterns of the "technology-knowledge" association network structure are analyzed.

Keywords: Scientific and Technological Information; Subject Technology Correlation; Entity Recognition; BERT

Yang Jinqing (杨金庆), Li Jiaqi (李嘉琦), Yang Ruhan (杨儒汉), Luo Xingyu (罗星雨), Cheng Xiufeng (程秀峰)

School of Information Management, Central China Normal University, Wuhan 430079

Key Laboratory of Rich-Media Digital Publishing Content Organization and Knowledge Service, Beijing 100038

2024

Technology Intelligence Engineering (情报工程)

CSTPCD, CHSSCD
ISSN:
Year, Volume (Issue): 2024, 10(5)