Bug Report Severity Prediction Based on Domain-Knowledge Fine-Tuning

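Chinese abstract: Effectively predicting the severity of bug reports is crucial for dispatching bug reports quickly and accurately and for helping developers discover and handle software defects in time. Mainstream existing approaches to bug report severity prediction, based either on traditional information retrieval or on general-purpose pre-trained models, achieve limited prediction performance because they ignore contextual semantics or the characteristics of bug reports. To address this, a bug report severity prediction method based on domain-knowledge fine-tuning is proposed. It uses the BERT pre-trained model, which takes the contextual semantics of text fully into account, and fine-tunes it on bug report data so that it learns the relevant domain knowledge. The fine-tuned BERT model extracts semantic features from bug reports, and a support vector machine is then used to build the severity prediction model. Experiments on a total of 15 projects selected from Mozilla, Eclipse, and Apache show that, compared with traditional information retrieval methods, the proposed method improves accuracy, recall, and F1 score by 4.5%–22.0%, 3.0%–22.0%, and 4.0%–22.0%, respectively; compared with the general BERT model, the fine-tuned BERT model improves accuracy, recall, and F1 score by 2.0%–5.1%, 1.9%–5.1%, and 1.8%–5.0%, respectively.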
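
The classification stage described above (semantic feature vectors in, severity level out) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature vectors are assumed to be precomputed by the fine-tuned BERT encoder and are taken as given, and the SVM is a linear one-vs-rest variant trained by plain subgradient descent on the hinge loss. The function names (`train_one_vs_rest`, `predict`) and hyperparameters (`lam`, `lr`, `epochs`) are hypothetical; the abstract does not give the paper's kernel, solver, or settings.

```python
from typing import Dict, List, Sequence, Tuple

# Sketch only: features are assumed to be precomputed (e.g. by the fine-tuned
# BERT encoder); lam, lr and epochs are illustrative, not the paper's values.
Vector = Sequence[float]
Model = Tuple[List[float], float]  # (weights, bias) of one binary linear SVM


def score(model: Model, x: Vector) -> float:
    """Signed decision value w.x + b of a binary linear SVM."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_binary_svm(
    xs: List[Vector],
    ys: List[int],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 500,
) -> Model:
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))) by full-batch
    subgradient descent; ys must contain +1 / -1 labels."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the L2 regulariser
        grad_b = 0.0
        for x, y in zip(xs, ys):
            if y * score((w, b), x) < 1.0:  # hinge term is active for this report
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def train_one_vs_rest(
    features: List[Vector],
    labels: List[str],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 500,
) -> Dict[str, Model]:
    """Train one binary SVM per severity level (that level vs. all others)."""
    models: Dict[str, Model] = {}
    for level in sorted(set(labels)):
        ys = [1 if lab == level else -1 for lab in labels]
        models[level] = train_binary_svm(features, ys, lam, lr, epochs)
    return models


def predict(models: Dict[str, Model], x: Vector) -> str:
    """Return the severity level whose classifier gives the largest decision value."""
    return max(models, key=lambda level: score(models[level], x))
```

`train_one_vs_rest(features, labels)` returns one `(weights, bias)` pair per severity level, and `predict(models, x)` picks the level with the largest decision value. One-vs-rest with an argmax over decision values is a common way to extend a binary SVM to several severity levels; in practice an established SVM library would replace the hand-written solver.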

English title: Bug Report Severity Prediction Based on Fine-tuned Embedding Model with Domain Knowledge

English abstract: Accurately predicting the severity of bug reports is crucial for assigning them efficiently and for helping developers detect and fix software bugs in a timely manner. However, existing severity prediction methods based on traditional information retrieval or general pre-trained models achieve limited prediction accuracy because they ignore contextual semantics or the characteristics of bug reports. To address this problem, this paper proposes a severity prediction method based on domain-knowledge fine-tuning. A BERT pre-trained model, which fully considers the semantic context of text, is fine-tuned on bug report data so that it learns the relevant domain knowledge. The fine-tuned BERT model is then used to extract semantic features of bug reports, and a support vector machine is employed to build the severity prediction model. Experimental results on 15 projects selected from Mozilla, Eclipse, and Apache show that, compared with traditional information retrieval methods, the proposed method improves accuracy, recall, and F1 score by 4.5%–22.0%, 3.0%–22.0%, and 4.0%–22.0%, respectively. Compared with the general BERT model, the fine-tuned BERT model improves accuracy, recall, and F1 score by 2.0%–5.1%, 1.9%–5.1%, and 1.8%–5.0%, respectively.
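
The reported gains are in accuracy, recall, and F1 score. A minimal sketch of how these could be computed for multi-level severity labels follows, assuming macro averaging (an unweighted mean over severity levels), which the abstract does not specify. The function name `severity_metrics` is hypothetical, and precision is computed as well because F1 depends on it.

```python
from collections import Counter
from typing import Dict, Sequence


def severity_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1 over severity levels.

    Macro averaging is an assumption; the abstract does not say how
    per-level scores are combined.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    levels = sorted(set(y_true) | set(y_pred))
    true_count = Counter(y_true)  # reports per actual severity level
    pred_count = Counter(y_pred)  # reports per predicted severity level
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives
    precision = recall = f1 = 0.0
    for level in levels:
        p = hits[level] / pred_count[level] if pred_count[level] else 0.0
        r = hits[level] / true_count[level] if true_count[level] else 0.0
        precision += p
        recall += r
        f1 += 2 * p * r / (p + r) if p + r else 0.0
    k = len(levels)
    return {
        "accuracy": sum(hits.values()) / len(y_true),
        "precision": precision / k,
        "recall": recall / k,
        "f1": f1 / k,
    }
```

For example, with true labels `["major", "major", "minor", "minor"]` and predictions `["major", "minor", "minor", "minor"]`, accuracy and macro recall are both 0.75.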

English keywords:

Word embedding; BERT; Pretrained model; Bug report; Fine-tuning; Severity prediction

Authors:

Chen Bingting (陈冰婷), Zou Weiqin (邹卫琴), Cai Biyu (蔡碧瑜), Liu Wenjie (刘文杰)

Affiliation:

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

Keywords:

word embedding; BERT; pre-trained model; bug report; fine-tuning; severity prediction

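Funding:

National Natural Science Foundation of China; Nanjing University of Aeronautics and Astronautics Forward-Looking Layout Special Research Project; Nanjing University of Aeronautics and Astronautics Talent Research Start-up Fund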

Grant number:

62002161

Publication year:

2024

DOI:

10.11896/jsjkx.230400068

Computer Science (计算机科学)

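Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)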

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.944

ISSN: 1002-137X

Year, volume (issue): 2024, 51(z1)

Number of references: 25