A Fusion Method of Multimodal Course Resources Based on BERT
Most existing research aligns entities between the Baidu Baike and Interactive Encyclopedia knowledge communities using similarity-based methods in order to expand the data sources of a knowledge graph. However, these methods require manually set thresholds, and their language models cannot learn deep domain-specific semantic knowledge. To address these problems, this paper proposes an entity alignment method based on fine-tuned BERT (bidirectional encoder representations from transformers) to align the multimodal resources of Baidu Baike and Interactive Encyclopedia. First, the BERT model is fine-tuned on a downstream classification task to improve its ability to predict correct results. Second, to address the imbalanced ratio of positive to negative samples in the dataset, a negative sampling strategy is proposed to improve the model's accuracy and generalization; experiments show that classification performance improves significantly, with the AUC (area under the curve) increasing by 0.29. Finally, the optimized model is applied to the entity alignment task, where the output probabilities are used to rank candidates and predict the final aligned entity pairs; the results outperform similarity-based entity alignment methods, reaching an F1 score of 95.9%.
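The first step, fine-tuning BERT on a downstream pair-classification task, can be illustrated with a minimal sketch. The abstract does not specify an implementation, so the sketch below assumes the HuggingFace transformers API, the `bert-base-chinese` checkpoint, and a hypothetical list `train_pairs` of (description A, description B, label) triples built from the two encyclopedias; all of these are assumptions, not the authors' code.

```python
# Sketch: fine-tuning BERT as a binary classifier over entity-description
# pairs. Assumes HuggingFace transformers; "train_pairs" is hypothetical data.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=2)  # two classes: aligned / not aligned

def encode_pair(desc_a, desc_b):
    # Sentence-pair input: [CLS] desc_a [SEP] desc_b [SEP]
    return tokenizer(desc_a, desc_b, truncation=True,
                     padding="max_length", max_length=128,
                     return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for desc_a, desc_b, label in train_pairs:  # assumed (str, str, int) triples
    inputs = encode_pair(desc_a, desc_b)
    outputs = model(**inputs, labels=torch.tensor([label]))
    outputs.loss.backward()   # classification loss drives the fine-tuning
    optimizer.step()
    optimizer.zero_grad()
```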
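The negative sampling step can be sketched as follows. The paper gives no details beyond balancing positive and negative samples, so the function name, the 1:1 default ratio, and the uniform-random choice of mismatched partners are all illustrative assumptions.

```python
# Sketch: building a balanced training set by sampling negative pairs.
# The ratio and sampling scheme are assumptions, not the paper's exact method.
import random

def build_training_set(positive_pairs, all_entities, neg_ratio=1):
    # positive_pairs: known aligned (baidu_entity, hudong_entity) tuples
    aligned = set(positive_pairs)
    samples = [(a, b, 1) for a, b in positive_pairs]
    for a, _ in positive_pairs:
        for _ in range(neg_ratio):
            # Draw a mismatched partner, rejecting true alignments.
            b_neg = random.choice(all_entities)
            while (a, b_neg) in aligned:
                b_neg = random.choice(all_entities)
            samples.append((a, b_neg, 0))
    random.shuffle(samples)
    return samples
```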
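Finally, the alignment step ranks candidate pairs by the classifier's output probability. A minimal sketch, reusing the fine-tuned model and tokenizer from above; the candidate-generation step and the `top_k` cutoff are assumptions.

```python
# Sketch: scoring candidates with the fine-tuned classifier and keeping
# the pairs with the highest "aligned" probability.
import torch

@torch.no_grad()
def align(entity_desc, candidate_descs, model, tokenizer, top_k=1):
    model.eval()
    scores = []
    for cand in candidate_descs:
        inputs = tokenizer(entity_desc, cand, truncation=True,
                           padding="max_length", max_length=128,
                           return_tensors="pt")
        logits = model(**inputs).logits
        prob = torch.softmax(logits, dim=-1)[0, 1].item()  # P(aligned)
        scores.append((cand, prob))
    # Sort candidates by predicted alignment probability, highest first.
    return sorted(scores, key=lambda x: x[1], reverse=True)[:top_k]
```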
BERT; integration of curriculum resources; entity alignment; negative sampling