A Fusion Method of Multimodal Course Resources Based on BERT
Most existing research aligns entities between the Baidu Baike and Interactive Encyclopedia knowledge communities using similarity-based methods in order to expand the data sources of a knowledge graph. However, these methods require manually set thresholds, and their language models cannot learn deep domain-specific semantic knowledge. To address these problems, this paper proposes an entity alignment method based on fine-tuned BERT (bidirectional encoder representations from transformers) to align the multimodal resources of Baidu Baike and Interactive Encyclopedia. First, the BERT model is fine-tuned on a downstream classification task to improve its ability to predict correct results. Second, to address the imbalanced ratio of positive to negative samples in the dataset, a negative sampling strategy is proposed to improve the model's accuracy and generalization; experiments show that classification performance improves significantly, with the AUC (area under the curve) increasing by 0.29. Finally, the optimized model is applied to the entity alignment task, where the output probabilities are used to rank candidates and predict the final aligned entity pairs; the results outperform similarity-based entity alignment methods, reaching an F1 score of 95.9%.
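The first step, fine-tuning BERT on a downstream pair-classification task, can be illustrated with a minimal sketch. The abstract does not specify an implementation, so the sketch below assumes the HuggingFace transformers API, the `bert-base-chinese` checkpoint, and a hypothetical list `train_pairs` of (description A, description B, label) triples built from the two encyclopedias; all of these are assumptions, not the authors' code.

```python
# Sketch: fine-tuning BERT as a binary classifier over entity-description
# pairs. Assumes HuggingFace transformers; "train_pairs" is hypothetical data.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=2)  # two classes: aligned / not aligned

def encode_pair(desc_a, desc_b):
    # Sentence-pair input: [CLS] desc_a [SEP] desc_b [SEP]
    return tokenizer(desc_a, desc_b, truncation=True,
                     padding="max_length", max_length=128,
                     return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for desc_a, desc_b, label in train_pairs:  # assumed (str, str, int) triples
    inputs = encode_pair(desc_a, desc_b)
    outputs = model(**inputs, labels=torch.tensor([label]))
    outputs.loss.backward()   # classification loss drives the fine-tuning
    optimizer.step()
    optimizer.zero_grad()
```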
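The negative sampling step can be sketched as follows. The paper gives no details beyond balancing positive and negative samples, so the function name, the 1:1 default ratio, and the uniform-random choice of mismatched partners are all illustrative assumptions.

```python
# Sketch: building a balanced training set by sampling negative pairs.
# The ratio and sampling scheme are assumptions, not the paper's exact method.
import random

def build_training_set(positive_pairs, all_entities, neg_ratio=1):
    # positive_pairs: known aligned (baidu_entity, hudong_entity) tuples
    aligned = set(positive_pairs)
    samples = [(a, b, 1) for a, b in positive_pairs]
    for a, _ in positive_pairs:
        for _ in range(neg_ratio):
            # Draw a mismatched partner, rejecting true alignments.
            b_neg = random.choice(all_entities)
            while (a, b_neg) in aligned:
                b_neg = random.choice(all_entities)
            samples.append((a, b_neg, 0))
    random.shuffle(samples)
    return samples
```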
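Finally, the alignment step ranks candidate pairs by the classifier's output probability. A minimal sketch, reusing the fine-tuned model and tokenizer from above; the candidate-generation step and the `top_k` cutoff are assumptions.

```python
# Sketch: scoring candidates with the fine-tuned classifier and keeping
# the pairs with the highest "aligned" probability.
import torch

@torch.no_grad()
def align(entity_desc, candidate_descs, model, tokenizer, top_k=1):
    model.eval()
    scores = []
    for cand in candidate_descs:
        inputs = tokenizer(entity_desc, cand, truncation=True,
                           padding="max_length", max_length=128,
                           return_tensors="pt")
        logits = model(**inputs).logits
        prob = torch.softmax(logits, dim=-1)[0, 1].item()  # P(aligned)
        scores.append((cand, prob))
    # Sort candidates by predicted alignment probability, highest first.
    return sorted(scores, key=lambda x: x[1], reverse=True)[:top_k]
```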
BERT; integration of curriculum resources; entity alignment; negative sampling