Construction of a syntactic analysis map for the Yishui school through text mining and natural language processing

Zhao Hanqing (赵汉青)¹, Li Yuehan (李玥函)¹, Zou Xinyan (邹欣妍)¹

Author information

1. College of Traditional Chinese Medicine, Hebei University, Baoding, Hebei 071000, China

Abstract

Entity and relation extraction is an indispensable step in natural language processing tasks such as knowledge graph construction, question answering system design, and semantic analysis. Most information about the Yishui school of traditional Chinese medicine (TCM) is stored as unstructured classical Chinese text, so extracting key information from TCM texts is important for mining and studying TCM academic schools. To address this more efficiently, the study applies artificial intelligence methods: within a natural language processing framework, it builds a word segmentation and entity relation extraction model based on conditional random fields (CRF) to identify and extract entity relations from TCM texts. Key entity information in different ancient texts is extracted using term frequency-inverse document frequency (TF-IDF), a common weighting technique. A dependency parser based on an artificial neural network is then used to analyze individual passages of the ancient texts in depth, revealing the grammatical relations between entities and representing them as visual tree structures. This work lays the foundation for building a Yishui school knowledge graph and for applying artificial intelligence methods to research on TCM academic schools.
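
The TF-IDF weighting mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the authors' implementation, so everything here is an assumption for illustration: the function name `tfidf_keywords`, the unsmoothed variant idf = log(N / df), and the toy corpus of invented, already-segmented tokens (in the paper's pipeline, segmentation would come from the CRF model).

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Rank tokens in each document by TF-IDF.

    docs: dict mapping a document title to its list of segmented tokens.
    Returns a dict mapping each title to its top_k (token, score) pairs.
    Assumption: tf = count / document length, idf = log(N / df), no smoothing.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents does each token occur?
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    result = {}
    for title, tokens in docs.items():
        tf = Counter(tokens)
        total = len(tokens)
        scores = {
            tok: (count / total) * math.log(n_docs / df[tok])
            for tok, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[title] = ranked[:top_k]
    return result


# Hypothetical toy corpus (invented tokens, not taken from any real text).
corpus = {
    "book_a": ["spleen", "stomach", "deficiency", "spleen", "qi"],
    "book_b": ["kidney", "deficiency", "qi"],
    "book_c": ["liver", "deficiency", "qi", "blood"],
}

keywords = tfidf_keywords(corpus, top_k=2)
for title, ranked in keywords.items():
    print(title, ranked)
```

Tokens that occur in every document (here `deficiency` and `qi`) receive a score of zero under this variant, so each book's ranking is led by the terms that distinguish it from the others. That is the sense in which TF-IDF surfaces key entities per text.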

Key words

natural language processing/knowledge graph/Yishui school/syntactic analysis

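Funding

National Natural Science Foundation of China (82004503)

Science and Technology Research Project of Higher Education Institutions of Hebei Province (BJK2024108)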

Publication year

2024

Medical Research and Education (医学研究与教育)

Hebei University

Impact factor: 0.675

ISSN: 1674-490X
