Bug Report Severity Prediction Based on Fine-tuned Embedding Model with Domain Knowledge
Accurately predicting the severity of bug reports is crucial for assigning them efficiently and for helping developers detect and fix software bugs in a timely manner. However, existing severity prediction methods based on traditional information retrieval or on general-purpose pre-trained models achieve limited prediction performance, because they ignore either the contextual semantics of the text or the characteristics of bug reports. To address this problem, this paper proposes a severity prediction method based on fine-tuning with domain knowledge. A pre-trained BERT model, which takes the contextual semantics of text fully into account, is fine-tuned on bug report data so that it learns the relevant domain knowledge. The fine-tuned BERT model is then used to extract semantic features from bug reports, and a support vector machine (SVM) is trained on these features to build the severity prediction model. Experiments on a total of 15 projects selected from Mozilla, Eclipse, and Apache show that, compared with traditional information retrieval methods, the proposed method improves accuracy, recall, and F1 score by 4.5% to 22.0%, 3.0% to 22.0%, and 4.0% to 22.0%, respectively. Compared with the general BERT model, the fine-tuned BERT model improves accuracy, recall, and F1 score by 2.0% to 5.1%, 1.9% to 5.1%, and 1.8% to 5.0%, respectively.
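The classification stage described above (fixed semantic features from a fine-tuned BERT model fed to an SVM, evaluated with accuracy, recall, and F1) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The BERT encoder is replaced by hand-made toy vectors; in practice these would be, for example, the fine-tuned model's pooled or [CLS] output. The SVM is a hypothetical one-vs-rest linear SVM trained with Pegasos-style subgradient descent. It is swapped in for whatever SVM library and kernel the paper actually used. The function names (`train_binary_svm`, `train_ovr`, `predict`, `macro_scores`) are illustrative only. The abstract also does not state how recall and F1 are averaged across severity levels, so macro averaging is assumed here.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for fine-tuned BERT output: one fixed-length vector per report.
Vector = Sequence[float]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_binary_svm(xs: List[Vector], ys: List[int],
                     lam: float = 0.01, epochs: int = 200) -> List[float]:
    """Linear SVM via Pegasos-style subgradient steps on the regularised hinge loss.

    ys must be +1/-1. A constant 1.0 feature is appended so the last weight acts
    as a bias (it is regularised too, which is acceptable for a sketch).
    """
    w = [0.0] * (len(xs[0]) + 1)
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):  # fixed cyclic order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)
            xb = list(x) + [1.0]
            active = y * dot(w, xb) < 1.0  # hinge loss is non-zero for this sample
            w = [(1.0 - eta * lam) * wi for wi in w]  # step on the L2 regulariser
            if active:
                w = [wi + eta * y * xi for wi, xi in zip(w, xb)]
    return w


def train_ovr(xs: List[Vector], labels: List[str], **kw) -> Dict[str, List[float]]:
    """One-vs-rest: one binary SVM per severity level."""
    return {c: train_binary_svm(xs, [1 if lab == c else -1 for lab in labels], **kw)
            for c in sorted(set(labels))}


def predict(models: Dict[str, List[float]], x: Vector) -> str:
    """Pick the severity level whose SVM gives the largest decision value."""
    xb = list(x) + [1.0]
    return max(models, key=lambda c: dot(models[c], xb))


def macro_scores(gold: List[str], pred: List[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1 over severity levels."""
    classes = sorted(set(gold) | set(pred))
    precs, recs, f1s = [], [], []
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precs.append(prec)
        recs.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    n = len(classes)
    return {
        "accuracy": sum(g == p for g, p in zip(gold, pred)) / len(gold),
        "precision": sum(precs) / n,
        "recall": sum(recs) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Toy 2-D vectors; real BERT-base embeddings have 768 dimensions.
    train_x = [[2.0, 2.0], [2.5, 1.5], [-2.0, -2.0], [-1.5, -2.5]]
    train_y = ["severe", "severe", "non-severe", "non-severe"]
    models = train_ovr(train_x, train_y)
    preds = [predict(models, x) for x in [[3.0, 2.0], [-2.0, -1.0]]]
    print(preds, macro_scores(["severe", "non-severe"], preds))
```

In this sketch, the encoder and the classifier are fully decoupled. The feature vectors are computed once and only the SVM is fit on them. As a result, replacing the toy vectors with real encoder outputs, or the hand-written SVM with a library classifier such as scikit-learn's `LinearSVC` or `SVC`, would not change the overall structure.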

Keywords: Word embedding; BERT; Pre-trained model; Bug report; Fine-tuning; Severity prediction

Authors: CHEN Bingting, ZOU Weiqin, CAI Biyu, LIU Wenjie

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

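Funding: National Natural Science Foundation of China; Forward-Looking Layout Special Research Project of Nanjing University of Aeronautics and Astronautics; Research Start-up Fund for Talents of Nanjing University of Aeronautics and Astronautics

Grant No.: 62002161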

Year: 2024

Journal: Computer Science
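Publisher: Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)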

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.944
ISSN:1002-137X
Year, volume (issue): 2024, 51(z1)