Research on Knowledge Graph Construction for Q&A Content in Chinese Online Medical Communities

A Knowledge Graph Construction for Q&A Text in Chinese Online Medical Community

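Abstract (translated from Chinese): [Purpose/Significance] To effectively extract the medical knowledge contained in Q&A texts from online medical communities, this paper combines several deep learning methods into a purpose-built knowledge graph construction method. The method addresses the major challenges that the colloquial, noisy, and poorly standardized nature of these texts poses for knowledge extraction. [Method/Process] Using diabetes-related Q&A texts from xywy.com (寻医问药网) as the data source, and drawing on an analysis of community users' health needs, the paper defines entity and relation types suited to community texts. BERT-wwm is used for word embedding to handle polysemy, and a BiLSTM-CRF model performs entity recognition. For relation annotation, an entity mask scheme is designed to resolve relation overlap, after which a CNN-Attention model performs relation extraction. Finally, dictionary matching and entity name similarity are combined for entity alignment, and the resulting diabetes knowledge graph is stored and visualized in the Neo4j graph database. [Result/Conclusion] Experimental results show that these methods substantially improve knowledge extraction from online medical community Q&A texts and effectively convert unstructured community medical Q&A text into structured data, which helps advance community knowledge discovery and online intelligent health services.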

Abstract (published English version): [Purpose/Significance] This paper designs a knowledge graph construction method that combines several deep learning methods to facilitate knowledge extraction from colloquial, noisy, and poorly normalized online medical community Q&A texts. [Method/Process] The paper uses diabetes-related Q&A texts from xywy.com as the dataset and determines entity and relation categories through an analysis of the healthcare needs of community users. The BERT-wwm model is employed for word embedding to handle polysemy, followed by the BiLSTM-CRF model for entity recognition. When annotating the relations between entities, an entity mask is designed to avoid relation overlap, and the CNN-Attention model is adopted for relation extraction. Finally, structured data is obtained through entity alignment using dictionary matching and entity name similarity, and is stored and visualized using Neo4j. [Result/Conclusion] Experiments verify the effectiveness of these methods. The paper converts the medical knowledge in unstructured online medical community (OMC) text into structured data, which can promote community knowledge discovery and online intelligent health services.
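The entity alignment step described in the abstract (dictionary matching combined with entity name similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the synonym dictionary, the canonical entity list, the `align` function, the 0.6 threshold, and the choice of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical synonym dictionary: surface form -> canonical entity name.
SYNONYMS = {
    "二型糖尿病": "2型糖尿病",
    "II型糖尿病": "2型糖尿病",
}

# Hypothetical list of canonical entity names already in the graph.
CANONICAL = ["2型糖尿病", "二甲双胍", "胰岛素"]


def align(mention, threshold=0.6):
    """Map a recognized mention to a canonical entity name.

    Step 1: exact lookup in the synonym dictionary or canonical list.
    Step 2: fall back to string similarity against canonical names.
    If nothing reaches the (assumed) threshold, keep the mention as-is.
    """
    if mention in SYNONYMS:
        return SYNONYMS[mention]
    if mention in CANONICAL:
        return mention
    best_name, best_score = None, 0.0
    for name in CANONICAL:
        # ratio() is 2 * matches / total length, in [0, 1].
        score = SequenceMatcher(None, mention, name).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else mention


if __name__ == "__main__":
    for m in ["二型糖尿病", "二甲双胍片", "头痛"]:
        print(m, "->", align(m))
```

In this sketch the dictionary handles known colloquial variants, while the similarity fallback catches near-duplicates such as a drug name with a dosage-form suffix. A mention with no sufficiently similar canonical name is left unmerged and treated as a new entity.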

English keywords:

online medical community; knowledge graph; BERT; attention mechanism; deep learning

Authors:

席运江, 李曼, 邓雨珊, 廖晓, 邝云英

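Author affiliations:

School of Business Administration, South China University of Technology, Guangzhou 510641

School of Management, Guangzhou City University of Technology, Guangzhou 510800

School of Internet Finance and Information Engineering, Guangdong University of Finance, Guangzhou 510521

School of Information Engineering, Guangzhou Vocational University of Science and Technology, Guangzhou 510550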

Keywords (translated from Chinese):

online medical community; knowledge graph; BERT; attention mechanism; deep learning

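Funding:

National Natural Science Foundation of China; Guangdong Provincial Natural Science Foundation, Basic and Applied Basic Research Project

Project numbers:

72171090; 2023A1515011551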

Publication year:

2024

DOI:

10.13266/j.issn.0252-3116.2023.24.010

Library and Information Service (图书情报工作)

National Science Library, Chinese Academy of Sciences (中国科学院文献情报中心)

Indexed in: CSTPCD; CSSCI; CHSSCD; PKU Core (北大核心)

Impact factor: 2.203

ISSN: 0252-3116

Year, volume (issue): 2024, 68(4)

Number of references: 31