Buggy File Identification Based on Recommendation Lists (基于推荐列表的缺陷文件识别)

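Abstract (translated from the Chinese): Bug localization is a key step in bug fixing and also a tedious software activity. Existing static bug localization techniques usually treat localization as a retrieval task: for each bug report, they generate a recommendation list of suspicious files, ranked in descending order of each program entity's relevance to the bug. Developers must still review the list entry by entry to find the files that are truly buggy, which adds to the time and cost of localization. To address this problem, this paper proposes a solution. It first runs mainstream information-retrieval-based static bug localization techniques to obtain an initial recommendation list of suspicious files. It then proposes three categories of domain features based on the characteristics of the problem and builds a machine learning model on these features to identify the truly buggy source code files in the list. Experiments on 2,558 bugs from four open-source projects (ZooKeeper, OpenJPA, Tomcat, AspectJ) show that truly buggy files in the initial recommendation list can be predicted with an accuracy of 72.6% to 80.7%. The paper also examines the importance of the three feature subsets and of each individual feature in predicting truly buggy files, and finds that the features describing the relationship between the bug report and the source code are more important.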

English title: Buggy File Identification Based on Recommendation Lists

English abstract: Bug localization is a key step in bug fixing, but it is also a tedious software activity. Existing static bug localization techniques typically treat localization as a retrieval task, generating for each bug report a recommendation list of suspicious files in descending order of each program entity's relevance to the bug. However, developers still need to review the files manually one by one to find those that are truly buggy, which increases the time and cost of localization. To solve this problem, this paper proposes a solution. First, state-of-the-art information-retrieval-based (IR-based) bug localization techniques are run to obtain an initial recommendation list of suspicious files. Then, three categories of domain features are proposed according to the characteristics of the problem, and a machine learning model is built on these features to identify the truly buggy files in the list. Preliminary experiments verify that the proposed approach is reasonable and actionable in practice. Experiments on 2,558 bugs from four open-source projects (ZooKeeper, OpenJPA, Tomcat, AspectJ) show that the approach achieves a prediction accuracy of 72.6% to 80.7% for truly buggy files in the initial recommendation list. The paper also explores the importance of the three feature subsets and of each individual feature in predicting truly buggy files, and finds that the features describing the relationship between the bug report and the source code are more important.
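
The two-step approach described in the abstract can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: step 1 uses a plain TF-IDF cosine ranking as a stand-in for the IR-based localization techniques, and step 2 uses a small logistic regression as a stand-in for the machine learning model, which the abstract does not specify. The three features in `candidate_features`, all function names, and the toy data are hypothetical; the abstract does not name the paper's three feature categories.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per token list."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_files(report, files, top_k=10):
    """Step 1 (stand-in): rank files by similarity to the bug report, best first."""
    paths = list(files)
    vectors = tfidf_vectors([tokenize(report)] + [tokenize(files[p]) for p in paths])
    scored = [(p, cosine(vectors[0], v)) for p, v in zip(paths, vectors[1:])]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


def candidate_features(report, path, score, rank):
    """Hypothetical features for one (report, file) pair; not the paper's features."""
    class_name = path.rsplit("/", 1)[-1].split(".")[0].lower()
    mentioned = 1.0 if class_name in report.lower() else 0.0
    return [score, 1.0 / rank, mentioned]


def predict_proba(weights, x):
    """Probability that a candidate is truly buggy (weights[0] is the bias)."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, epochs=200, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for x, label in zip(X, y):
            error = predict_proba(weights, x) - label
            weights[0] -= lr * error
            for j, v in enumerate(x):
                weights[j + 1] -= lr * error * v
    return weights


def identify_buggy(report, files, weights, threshold=0.5):
    """Step 2: keep the ranked candidates the model predicts as truly buggy."""
    ranked = rank_files(report, files)
    return [
        path
        for rank, (path, score) in enumerate(ranked, start=1)
        if predict_proba(weights, candidate_features(report, path, score, rank)) >= threshold
    ]


# Toy data, invented for illustration only.
REPORT = "NullPointerException in SessionManager when session expires"
FILES = {
    "src/SessionManager.java": "class SessionManager session expire timeout null",
    "src/Logger.java": "class Logger write log message",
    "src/Parser.java": "class Parser parse token stream",
}
TRAIN_X = [[0.30, 1.0, 1.0], [0.02, 0.5, 0.0], [0.25, 0.5, 1.0], [0.00, 0.33, 0.0]]
TRAIN_Y = [1, 0, 1, 0]
```

Calling `train_logistic(TRAIN_X, TRAIN_Y)` and passing the resulting weights to `identify_buggy(REPORT, FILES, weights)` returns the subset of the ranked list that the model labels as truly buggy, so a developer would review only those files rather than every entry in the list.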

English keywords: Bug report; Bug localization; Machine learning; Information retrieval; Buggy files

Authors: Wang Zhaodan (王昭丹), Zou Weiqin (邹卫琴), Liu Wenjie (刘文杰)

Affiliation: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106

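Keywords (Chinese original): 缺陷报告 (bug report); 缺陷定位 (bug localization); 机器学习 (machine learning); 信息检索 (information retrieval); 缺陷文件 (buggy files)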

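Funding: National Natural Science Foundation of China; Nanjing University of Aeronautics and Astronautics Forward-Looking Layout Research Special Project; Nanjing University of Aeronautics and Astronautics Talent Research Start-up Fund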

Project number: 62002161

Publication year: 2024

DOI: 10.11896/jsjkx.230600088

Computer Science (计算机科学)

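Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)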

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.944

ISSN: 1002-137X

Year, volume (issue): 2024, 51(z1)

Number of references: 39