Automatic Naming of Source Code Based on Information Retrieval
Automatic naming of source code entails predicting a descriptive name that reflects the function of the code within a given method body. This practice can improve code readability and comprehension, thus enhancing software development efficiency. Traditional naming approaches rely on a single type of information, such as the lexical or syntactic information of the code, whereas deep learning-based naming approaches usually ignore similar examples in the corpus; both limitations reduce naming accuracy. To address these problems, this paper proposes an approach for automatic naming of source code based on information retrieval. The proposed approach uses a pre-trained model together with the Bidirectional Encoder Representations from Transformers (BERT)-whitening method to extract effective features of the input code and of the code in the corpus, and calculates the semantic similarity between them on the basis of the Euclidean distance. Subsequently, the corpus code snippets that rank highest in semantic similarity to the input code are selected to form a candidate library. The lexical and syntactic similarities between the input code and the candidate library code are calculated using the Jaccard index and the Longest Common Subsequence (LCS) method, respectively. Finally, the lexical and syntactic similarities are fused to identify the code snippet in the candidate library with the highest similarity to the input code, and the method name of that snippet is reused as the method name of the input code. Experimental results show that the F1 value of the proposed approach on the public Java-small dataset is 6.93 and 1.22 percentage points higher than that of the Vector Space Model (VSM) and the Code2Vec model, respectively, indicating excellent predictive performance.
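
The re-ranking stage described above (Jaccard index, LCS, fusion, name reuse) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: both similarities are computed over plain token sequences, the LCS length is normalized by the longer sequence, and the two scores are fused by a linear weight `alpha`; the abstract does not specify any of these. The function names and the candidate format (pairs of method name and token list) are hypothetical, and the candidate library is assumed to have already been retrieved by the semantic similarity stage.

```python
from typing import List, Sequence, Tuple


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard index of the two token sets (lexical similarity)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty snippets are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """LCS length normalized by the longer sequence (assumed normalization)."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


def fused_score(query: Sequence[str], candidate: Sequence[str],
                alpha: float = 0.5) -> float:
    """Linear fusion of the two similarities; alpha is an assumed weight."""
    return (alpha * jaccard(query, candidate)
            + (1.0 - alpha) * lcs_similarity(query, candidate))


def suggest_name(query: Sequence[str],
                 candidates: List[Tuple[str, List[str]]],
                 alpha: float = 0.5) -> str:
    """Reuse the method name of the best-scoring candidate snippet."""
    best_name, _ = max(candidates,
                       key=lambda c: fused_score(query, c[1], alpha))
    return best_name
```

A caller would tokenize the input method body, pass it as `query` together with the `(name, tokens)` pairs of the candidate library, and take the returned string as the suggested method name. Setting `alpha` closer to 1 weights token overlap more heavily, while values closer to 0 favor agreement in token order.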