Multi-label code smell detection method based on pre-trained model and BiLSTM-CNN
To improve the accuracy of multi-label code smell detection, a multi-label code smell detection method named DMSmell (Deep Multi-Smell), based on a pre-trained model and BiLSTM-CNN, was proposed. First, a static analysis tool was used to extract textual information and structural metric information from the source code, and two detection rules were applied to label the code smell instances. Second, the pre-trained CodeBERT model was used to generate word vectors for the textual information, and BiLSTM and CNN were used to extract deep features from the word vectors and from the structural metrics, respectively. Finally, multi-label code smell detection was performed by combining an attention mechanism with a multi-layer perceptron, and the performance of DMSmell was evaluated. The results show that DMSmell improves the accuracy of multi-label code smell detection to a certain extent: compared with the classifier-chain-based method, the exact match ratio improves by 1.36 percentage points, micro-recall by 2.45 percentage points, and micro-F1 by 1.1 percentage points. These results indicate that combining textual information with structural metric information, and using deep learning techniques for feature extraction and classification, can effectively improve the accuracy of code smell detection, providing an important reference for the research and application of multi-label code smell detection.
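The three reported metrics (exact match ratio, micro-recall, micro-F1) are computed over per-sample label vectors, with one 0/1 entry per smell type. The sketch below uses the standard textbook definitions and assumes these match the ones used in the evaluation. The 0.5 decision threshold in `binarize`, which turns per-label probabilities from a multi-layer perceptron into 0/1 predictions, is likewise an assumption for illustration, as is the toy data in the usage example.

```python
from typing import List, Sequence, Tuple

# One 0/1 vector per code element; one column per smell type (hypothetical layout).
Labels = Sequence[Sequence[int]]


def binarize(probs: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn per-label probabilities into 0/1 predictions (assumed threshold of 0.5)."""
    return [[1 if p >= threshold else 0 for p in row] for row in probs]


def exact_match_ratio(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of samples whose whole predicted label vector equals the true one."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


def micro_counts(y_true: Labels, y_pred: Labels) -> Tuple[int, int, int]:
    """Pool true positives, false positives and false negatives over all labels."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        for ti, pi in zip(t, p):
            if ti and pi:
                tp += 1
            elif pi and not ti:
                fp += 1
            elif ti and not pi:
                fn += 1
    return tp, fp, fn


def micro_recall(y_true: Labels, y_pred: Labels) -> float:
    tp, _, fn = micro_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def micro_f1(y_true: Labels, y_pred: Labels) -> float:
    # 2TP / (2TP + FP + FN) equals the harmonic mean of micro-precision and micro-recall.
    tp, fp, fn = micro_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy data: 4 code elements, 3 hypothetical smell types.
    y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
    y_pred = binarize([[0.9, 0.1, 0.7], [0.2, 0.8, 0.6], [0.9, 0.3, 0.1], [0.1, 0.2, 0.95]])
    print(exact_match_ratio(y_true, y_pred))  # 0.5
    print(micro_recall(y_true, y_pred), micro_f1(y_true, y_pred))
```

Exact match is the strictest of the three, because a single wrong label anywhere in a vector makes the whole sample count as a miss. The micro-averaged metrics pool counts across all smell types, so frequent smells weigh more heavily than rare ones.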