Construction of Medical Health Knowledge Map for UGC in Online Health Community: Taking Child Diarrheal Disease as an Example

Constructing a medical and health knowledge graph from the user-generated content (UGC) of online health communities, and extracting health knowledge that reflects users' latent needs, is important for improving information organization and retrieval in these communities and for supporting innovation in their knowledge services. This paper proposes LDA-BERT-BiLSTM-CRF, a combined entity recognition model for online health community UGC. First, an LDA topic model clusters the UGC by topic to derive entity types, and a BERT-BiLSTM-CRF model then performs named entity recognition over these fine-grained entity types. Next, an MC-BERT-CasRel model extracts overlapping triples from the UGC, and an SBERT model performs entity alignment. Finally, the knowledge graph is stored and visualized in the Neo4j graph database. Taking child diarrheal disease as the example, the proposed method yields a knowledge graph containing 939 entities and 3,224 relations. In comparison experiments against current mainstream models, the combined model LDA-BERT-BiLSTM-CRF and the relation extraction model MC-BERT-CasRel extract knowledge more accurately than traditional methods, and the resulting entity classification is more targeted.
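The final storage step can be illustrated with a minimal sketch, assuming each extracted triple carries a head entity, a relation, and a tail entity, each entity with a type. The `Triple` class, the `to_cypher` helper, and the sample labels and entities below are hypothetical and are not taken from the paper; only the Cypher `MERGE` clause itself is standard Neo4j syntax. Because `MERGE` matches an existing node before creating one, overlapping triples that share an entity end up attached to a single node.

```python
import re
from typing import NamedTuple


class Triple(NamedTuple):
    """One extracted (head, relation, tail) fact with entity types (hypothetical schema)."""
    head: str
    head_type: str
    relation: str
    tail: str
    tail_type: str


# Labels and relationship types cannot be passed as Cypher parameters,
# so they are interpolated into the query and must be validated first.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def to_cypher(triple: Triple) -> tuple[str, dict[str, str]]:
    """Build a parameterized Cypher MERGE query and its parameters for one triple."""
    for identifier in (triple.head_type, triple.relation, triple.tail_type):
        if not _IDENTIFIER.match(identifier):
            raise ValueError(f"unsafe label or relationship type: {identifier!r}")
    query = (
        f"MERGE (h:{triple.head_type} {{name: $head}}) "
        f"MERGE (t:{triple.tail_type} {{name: $tail}}) "
        f"MERGE (h)-[:{triple.relation}]->(t)"
    )
    return query, {"head": triple.head, "tail": triple.tail}


if __name__ == "__main__":
    # Illustrative triples only; they share the same head entity (overlapping triples).
    samples = [
        Triple("child diarrheal disease", "Disease", "HAS_SYMPTOM", "dehydration", "Symptom"),
        Triple("child diarrheal disease", "Disease", "TREATED_WITH", "oral rehydration salts", "Drug"),
    ]
    for sample in samples:
        query, params = to_cypher(sample)
        print(query, params)
```

Each query and parameter dictionary could then be handed to a Neo4j session; entity names travel as parameters, so free-text UGC never gets spliced into the query string.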

Knowledge Map Construction; Online Health Community; UGC; LDA; Knowledge Extraction

孟秋晴、郑铭瑞、田玥璐、刘逸品、王琼弟

School of Information, Guizhou University of Finance and Economics, Guiyang 550025

Software Institute, Nanjing University, Nanjing 210008

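Science and Technology Program of the Guizhou Provincial Department of Science and Technology (黔科合基础-ZK[2021]一般336); Youth Science and Technology Talent Growth Project of the Guizhou Provincial Department of Education (黔教合KY字[2022]192号)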

2024

Digital Library Forum (数字图书馆论坛)
Institute of Scientific and Technical Information of China (ISTIC); Beijing Wanfang Data Co., Ltd.

CSTPCD
Impact factor: 0.337
ISSN: 1673-2286
Year, volume (issue): 2024, 20(8)