Computer Engineering and Design, 2024, Vol. 45, Issue (9): 2690-2696. DOI: 10.16208/j.issn1000-7024.2024.09.018

Bayesian network and RoBERTa ensembles for text derivation relation mining

Zhuang Yuan¹, Weng Nianfeng², Li Jie¹
Author information

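  • 1. School of Computer Science and School of Cyberspace Security, Nanjing University of Information Science and Technology, Nanjing, Jiangsu 210044; The 63rd Research Institute, National University of Defense Technology, Nanjing, Jiangsu 210007
  • 2. The 63rd Research Institute, National University of Defense Technology, Nanjing, Jiangsu 210007; Laboratory for Big Data and Decision, National University of Defense Technology, Changsha, Hunan 410073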

Abstract

Tracing the source of false information is an important means of curbing its spread in social networks. Traditional data provenance methods mainly target structured data, so they struggle to determine accurately whether one text is derived from another. To address this problem, a text derivation relation mining method based on a Bayesian network and RoBERTa is proposed. First, a RoBERTa model is used to obtain text vectors. Second, a RoBERTa model makes a preliminary prediction of the derivation relation between texts, producing a classification label that indicates whether a derivation relation exists. Finally, a Bayesian network is built from the vector distance, text distance, time span, and text classification label to judge the derivation relation between texts. Experimental results show that the precision, recall, and F1 score of the proposed method are all higher than those of the comparison methods, verifying its effectiveness.
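The abstract describes a pipeline in which four pieces of evidence about a text pair (vector distance, text distance, time span, and the preliminary RoBERTa classification label) are combined in a Bayesian network. The network structure, distance measures, and discretization used in the paper are not stated here, so the following is only a minimal sketch under explicit assumptions: the embedding vectors and classifier label are taken as precomputed inputs (stand-ins for RoBERTa outputs), text distance is assumed to be normalized Levenshtein distance, the bin thresholds in the hypothetical `discretize` helper are made up, and the network is assumed to be naive-Bayes-shaped (the derivation node is the sole parent of each evidence node), with conditional probability tables fitted by Laplace-smoothed counting. The training pairs are toy data, not data from the paper.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """1 - cosine similarity; u and v stand in for RoBERTa text vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0  # treat a zero vector as maximally distant
    return 1.0 - dot / (nu * nv)


def normalized_edit_distance(s, t):
    """Levenshtein distance divided by the longer string length (assumed text distance)."""
    if not s and not t:
        return 0.0
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1] / max(len(s), len(t))


def discretize(pair):
    """Hypothetical helper: map raw pair features to discrete evidence states."""
    return (
        "near" if cosine_distance(pair["vec_a"], pair["vec_b"]) < 0.2 else "far",
        "near" if normalized_edit_distance(pair["text_a"], pair["text_b"]) < 0.5 else "far",
        "short" if abs(pair["t_b"] - pair["t_a"]) <= 24 else "long",  # time span in hours
        pair["clf_label"],  # 0/1 label from the preliminary classifier
    )


class NaiveBayesNet:
    """Assumed structure D -> {vector dist, text dist, time span, clf label}."""

    def fit(self, evidence, labels, alpha=1.0):
        self.alpha = alpha
        self.prior = Counter(labels)
        self.n = len(labels)
        self.counts = Counter()  # (evidence index, state, class) -> count
        self.states = [set() for _ in evidence[0]]
        for ev, y in zip(evidence, labels):
            for k, s in enumerate(ev):
                self.counts[(k, s, y)] += 1
                self.states[k].add(s)
        return self

    def posterior(self, ev):
        """P(D = 1 | evidence), by exact enumeration over both states of D."""
        scores = {}
        for d in (0, 1):
            logp = math.log((self.prior[d] + self.alpha) / (self.n + 2 * self.alpha))
            for k, s in enumerate(ev):
                num = self.counts[(k, s, d)] + self.alpha
                den = self.prior[d] + self.alpha * len(self.states[k])
                logp += math.log(num / den)
            scores[d] = logp
        m = max(scores.values())
        z = sum(math.exp(v - m) for v in scores.values())
        return math.exp(scores[1] - m) / z


# Toy training pairs: (discrete evidence, 1 = derived / 0 = not derived).
train = [
    (("near", "near", "short", 1), 1),
    (("near", "far", "short", 1), 1),
    (("near", "near", "long", 1), 1),
    (("far", "far", "long", 0), 0),
    (("far", "far", "short", 0), 0),
    (("near", "far", "long", 0), 0),
]
net = NaiveBayesNet().fit([e for e, _ in train], [y for _, y in train])

pair = {"vec_a": [0.9, 0.1, 0.3], "vec_b": [0.88, 0.12, 0.31],
        "text_a": "official notice issued today",
        "text_b": "official notice issued today!!",
        "t_a": 0, "t_b": 3, "clf_label": 1}
print(round(net.posterior(discretize(pair)), 3))  # → 0.973
```

With this assumed structure, inference reduces to multiplying one conditional probability per evidence node. A network with dependencies among the evidence nodes (for example, an edge between vector distance and text distance) would need joint conditional tables or a general-purpose inference routine instead.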

Key words

data provenance/text derivation/Bayesian network/pre-trained language model/derivation relation/text distance/probabilistic models


Funding

National Natural Science Foundation of China (61371196)

National Science and Technology Major Project (2015ZX01040201-003)

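Publication year

2024
Computer Engineering and Design
The 706 Institute, Second Academy of China Aerospace Science and Industry Corporation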

Indexed in: CSTPCD, Peking University Core Journals
Impact factor: 0.617
ISSN: 1000-7024
Number of references: 2