Multi-label code smell detection method based on pre-trained model and BiLSTM-CNN

To improve the accuracy of multi-label code smell detection, this paper proposes DMSmell (deep multi-smell), a multi-label code smell detection method based on a pre-trained model and BiLSTM-CNN. First, static analysis tools are used to extract textual information and structural metric information from the source code, and two detection rules are applied to label code smell instances. Second, the CodeBERT pre-trained model generates word vectors for the textual information, and BiLSTM and CNN extract deep features from the word vectors and the structural metric information, respectively. Finally, an attention mechanism is combined with a multi-layer perceptron to perform multi-label code smell detection, and the performance of DMSmell is evaluated. The results show that DMSmell improves the accuracy of multi-label code smell detection to a certain extent: compared with the classifier-chain-based method, the exact match ratio increases by 1.36 percentage points, the micro-recall by 2.45 percentage points, and the micro-F1 by 1.1 percentage points. These results indicate that combining textual information with structural metric information, and using deep learning for feature extraction and classification, can effectively improve the accuracy of code smell detection and provide an important reference for the research and application of multi-label code smell detection.
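The three reported metrics (exact match ratio, micro-recall, micro-F1) can be illustrated with a minimal pure-Python sketch. It assumes each sample carries a binary label vector with one position per code smell. The function names and the toy data below are hypothetical and are not taken from the paper or its dataset.

```python
from typing import List, Sequence, Tuple

Labels = Sequence[Sequence[int]]  # one 0/1 vector per sample, one slot per smell


def exact_match_ratio(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of samples whose whole predicted label vector is correct."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return hits / len(y_true)


def micro_counts(y_true: Labels, y_pred: Labels) -> Tuple[int, int, int]:
    """Pool true positives, false positives and false negatives over all labels."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    return tp, fp, fn


def micro_recall(y_true: Labels, y_pred: Labels) -> float:
    tp, _, fn = micro_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def micro_f1(y_true: Labels, y_pred: Labels) -> float:
    tp, fp, fn = micro_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 4 code samples, 3 smell labels each.
y_true: List[List[int]] = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred: List[List[int]] = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

if __name__ == "__main__":
    print(exact_match_ratio(y_true, y_pred))
    print(micro_recall(y_true, y_pred))
    print(micro_f1(y_true, y_pred))
```

Exact match ratio is the strictest of the three, since a sample counts only when every label is predicted correctly, whereas the micro-averaged scores pool individual label decisions across all samples.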

Keywords: software engineering; code smell; pre-trained model; multi-label classification; deep learning

Authors: Liu Haiyang, Zhang Yang, Tian Quanquan, Wang Xiaohong


School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, China


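Funding: National Natural Science Foundation of China (61440012); Natural Science Foundation of Hebei Province (F2023208001); Hebei Province Funding Project for Introduced Overseas Returnees (C20230358)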

2024

Hebei Journal of Industrial Science and Technology
Hebei University of Science and Technology


CSTPCD
Impact factor: 0.694
ISSN:1008-1534
Year, volume (issue): 2024, 41(5)