Bug Report Reformulation Method Based on Topic Consistency Maintenance and Pseudo-correlation Feedback Library Extension

To help developers locate software bugs faster, researchers have proposed a range of bug localization techniques based on text retrieval, which automatically recommend suspicious code files for the bug reports that users submit. Because users differ in professional expertise, the quality of bug reports is inconsistent, and some low-quality reports cannot be localized successfully. Reformulating low-quality bug reports to improve their localization results is a common remedy. Existing mainstream reformulation methods, which rely on query expansion and query reduction, are prone to low reformulation quality, caused either by inconsistent query topics before and after reformulation or by a poor-quality pseudo-correlation feedback library. To address this, this paper proposes a bug report reformulation method based on topic consistency maintenance and pseudo-correlation feedback library extension. The method consists of two stages: a query reduction stage that maintains topic consistency and a query expansion stage that extends the pseudo-correlation feedback library. The query reduction stage merges the bug report's summary (its concise problem description) with keywords extracted from the description text, which addresses the topic inconsistency problem. The query expansion stage combines several localization tools (Lucene, BugLocator, and Blizzard) to obtain the pseudo-correlation feedback library and extracts query expansion keywords from it, which addresses the low reformulation quality caused by poor existing feedback libraries. Finally, the outputs of the reduction and expansion stages are merged to form the reformulated query. Experiments on six Java projects show that, for low-quality bug reports that existing bug localization methods cannot localize within the top 10 recommended files, the proposed method localizes 21%~39% of them (Accuracy@10), with an MRR@10 of 10%~16%. Compared with existing reformulation techniques, the proposed method improves Accuracy@10 by 7%~32% and MRR@10 by 2%~13%.
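The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: every function name here is hypothetical, frequency-based term selection stands in for whatever keyword extractor the paper actually uses, and the ranked file lists are assumed to have been produced beforehand by tools such as Lucene, BugLocator, and Blizzard (none of their APIs is called). The two metric functions follow the standard definitions of Accuracy@k and MRR@k, which may differ in detail from the paper's evaluation setup.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not taken from the paper).
STOP_WORDS = {"the", "a", "an", "is", "in", "to", "of", "and", "when", "it", "on", "with"}

def tokenize(text):
    """Lowercase, split on non-letters, drop stop words and tokens shorter than 3 chars."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOP_WORDS and len(t) > 2]

def top_terms(tokens, k):
    """Return the k most frequent tokens (a stand-in for a real keyword extractor)."""
    return [term for term, _ in Counter(tokens).most_common(k)]

def reduce_query(summary, description, k=5):
    """Reduction stage: keep the summary terms and add top-k description keywords."""
    return tokenize(summary) + top_terms(tokenize(description), k)

def expand_query(ranked_lists, file_texts, top_n=2, k=5):
    """Expansion stage: pool the top-n files of every tool's ranking into one
    feedback library, then extract the k most frequent terms from that pool."""
    pool = []
    for ranking in ranked_lists:
        for path in ranking[:top_n]:
            if path not in pool:
                pool.append(path)
    tokens = [t for path in pool for t in tokenize(file_texts[path])]
    return top_terms(tokens, k)

def reformulate(summary, description, ranked_lists, file_texts):
    """Merge both stages and remove duplicate terms while preserving order."""
    merged = reduce_query(summary, description) + expand_query(ranked_lists, file_texts)
    return list(dict.fromkeys(merged))

def accuracy_at_k(first_hit_ranks, k=10):
    """Fraction of reports whose first buggy file is ranked within the top k.
    A rank of None means no buggy file was retrieved at all."""
    hits = sum(1 for r in first_hit_ranks if r is not None and r <= k)
    return hits / len(first_hit_ranks)

def mrr_at_k(first_hit_ranks, k=10):
    """Mean reciprocal rank, counting only first hits within the top k."""
    total = sum(1.0 / r for r in first_hit_ranks if r is not None and r <= k)
    return total / len(first_hit_ranks)

if __name__ == "__main__":
    # Hypothetical file contents and tool rankings, for illustration only.
    file_texts = {
        "Parser.java": "parser token stream parser",
        "Lexer.java": "lexer token",
        "Util.java": "util helper",
    }
    ranked_lists = [
        ["Parser.java", "Util.java"],   # e.g. ranking from tool 1
        ["Lexer.java", "Parser.java"],  # e.g. ranking from tool 2
        ["Parser.java", "Lexer.java"],  # e.g. ranking from tool 3
    ]
    query = reformulate("Crash in parser", "The parser crashes on empty input",
                        ranked_lists, file_texts)
    print(query)
    print(accuracy_at_k([1, 4, None, 12]), mrr_at_k([1, 4, None, 12]))
```

Pooling the top files from several rankings is the point of the expansion stage in this sketch: a file that only one tool ranks highly can still contribute expansion terms, so the feedback library depends less on any single tool's mistakes.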

Bug localization; Query reformulation; Query reduction; Query expansion; Pseudo-correlation feedback libraries; Quality of bug report

Liu Wenjie (刘文杰), Zou Weiqin (邹卫琴), Cai Biyu (蔡碧瑜), Chen Bingting (陈冰婷)

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

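National Natural Science Foundation of China; Forward-Looking Layout Research Special Fund of Nanjing University of Aeronautics and Astronautics

62002161, 62272225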

2024

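Computer Science (计算机科学)
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)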

CSTPCD; Peking University Core Journals
Impact factor: 0.944
ISSN:1002-137X
Year, volume (issue): 2024, 51(7)