Buggy File Identification Based on Recommendation Lists
Bug localization is a key step in bug fixing, but it is also a tedious software activity. Existing static bug localization techniques typically treat bug localization as a retrieval task, generating for each bug report a list of recommended files in descending order of each program entity's relevance to the bug. However, developers still need to review each file manually to find the ones that are actually buggy, which increases the time and cost of localization. To address this problem, this paper proposes an approach for identifying the truly buggy files in a recommendation list. First, state-of-the-art information-retrieval-based (IR-based) bug localization techniques are run to obtain an initial recommendation list of buggy files. Then, three kinds of domain features are proposed according to the characteristics of the problem, and a machine learning model is built on these features to identify the truly buggy files in the list. Preliminary experiments verify that the proposed approach is reasonable and actionable in practice. Experiments on four open-source projects (ZooKeeper, OpenJPA, Tomcat, and AspectJ) covering 2,558 bugs show that the approach achieves a prediction accuracy of 72.6% to 80.7% in identifying the buggy code files in the initial recommendation list. We also examine the three feature subsets and the importance of each feature in predicting the truly buggy files, and find that the features describing the relationship between the bug report and the source code are more important than the others.
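
The two-stage idea can be illustrated with a minimal sketch. It is not the paper's implementation and rests on several assumptions: a plain cosine-similarity ranker stands in for a state-of-the-art IR-based technique, the two features used here (similarity score and reciprocal rank) are hypothetical stand-ins for the proposed domain features, which the abstract does not enumerate, and logistic regression is an assumed choice of model. All file names and training values are invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_files(report, files):
    """Stage 1 (assumed ranker): (file name, score) pairs, most similar first."""
    query = Counter(tokenize(report))
    scored = [(name, cosine(query, Counter(tokenize(src))))
              for name, src in files.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def features(ranked):
    """Hypothetical features per candidate: [similarity score, reciprocal rank]."""
    return {name: [score, 1.0 / (i + 1)]
            for i, (name, score) in enumerate(ranked)}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, epochs=2000, lr=0.5):
    """Stage 2 (assumed model): logistic regression fitted by gradient descent."""
    w = [0.0] * (len(X[0]) + 1)  # the last entry is the bias term
    for _ in range(epochs):
        for x, label in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
            err = sigmoid(z) - label
            for j, xj in enumerate(x):
                w[j] -= lr * err * xj
            w[-1] -= lr * err
    return w


def predict(w, x):
    """Return True if the candidate file is predicted to be truly buggy."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    return sigmoid(z) >= 0.5


# Invented toy data: one bug report and three source files.
report = "NullPointerException when session expires in connection manager"
files = {
    "SessionManager.java": "class SessionManager session expires connection close",
    "Logger.java": "class Logger write log message",
    "Parser.java": "class Parser parse config",
}
ranked = rank_files(report, files)

# Invented training set: feature vectors of past candidates, labeled 1 if buggy.
X_train = [[0.9, 1.0], [0.1, 0.25], [0.8, 1.0], [0.05, 0.2]]
y_train = [1, 0, 1, 0]
weights = train_logistic(X_train, y_train)

for name, vec in features(ranked).items():
    print(name, "predicted buggy" if predict(weights, vec) else "predicted clean")
```

In this sketch the classifier only re-labels the candidates that the ranker already produced, which mirrors the abstract's framing: the recommendation list is the input, and the model's job is to spare the developer from reviewing every file in it.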