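Recurrent neural networks handle code sequence data well, and most patch generation models for software defect repair are built on them. However, patch generation models based on recurrent neural networks remain limited in handling long-distance dependencies in code sequences, and their repair success rate and repair efficiency are low. To address this problem, we propose SNRepair (Self-attention Neural machine translation based automatic software Repair), an automatic software defect repair method based on self-attention neural machine translation. First, to alleviate the out-of-vocabulary problem in source code, the dataset is preprocessed with subword segmentation. Second, to address long-distance dependencies in source code while making fuller use of local information, a Transformer patch generation model that incorporates local modeling is constructed. Then, automatic fault localization is used to locate the defective statements, and the parameter-optimized Transformer patch generation model generates candidate patches. Finally, test cases are run to validate the candidate patches. An experimental evaluation on the Defects4J benchmark, which contains 395 real Java software defects, shows that SNRepair achieves a higher repair success rate and repair efficiency than the compared methods.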
Self-Attention Neural Machine Translation for Automatic Software Repair
Recurrent neural networks are well suited to processing code sequences, and most patch generation models are implemented with them. However, recurrent neural network-based patch generation models still have limitations when dealing with long-distance dependencies in code sequences, and their repair success rate and repair efficiency are low. To address this issue, we present SNRepair, an automatic software fault repair method based on self-attention neural machine translation. First, subword tokenization is applied to preprocess the dataset and alleviate the out-of-vocabulary problem. Second, a Transformer patch generation model that integrates local modeling is constructed to handle long-distance dependencies in the source code and make better use of local information. Third, automatic fault localization is used to locate the likely fault position, and the Transformer patch generation model, after parameter optimization, is used to generate candidate patches. Finally, the candidate patches are validated by running test cases. On the 395 real Java software faults in Defects4J, the results show that SNRepair achieves a higher repair success rate and repair efficiency than the compared methods.
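To make the contrast between global self-attention and local modeling concrete, the following is a minimal sketch and not the SNRepair implementation. It assumes unprojected scaled dot-product self-attention (queries, keys, and values are all the raw token vectors) and models locality with a hard window mask. The abstract does not state how local modeling is integrated, so the `window` parameter and the function names here are purely illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats (may contain -inf)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, window=None):
    """Scaled dot-product self-attention with Q = K = V = x.

    x      : list of n token vectors, each a list of d floats.
    window : hypothetical locality parameter (an assumption of this sketch).
             If set, token i may only attend to tokens j with |i - j| <= window;
             if None, every token attends to every token.
    Returns (outputs, weights), where weights[i][j] is the attention
    that token i pays to token j.
    """
    n, d = len(x), len(x[0])
    outputs, weights = [], []
    for i in range(n):
        scores = []
        for j in range(n):
            if window is not None and abs(i - j) > window:
                scores.append(float("-inf"))  # masked out: outside the local window
            else:
                dot = sum(a * b for a, b in zip(x[i], x[j]))
                scores.append(dot / math.sqrt(d))
        w = softmax(scores)
        # Output for token i is the attention-weighted sum of all token vectors.
        out = [sum(w[j] * x[j][k] for j in range(n)) for k in range(d)]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


# Toy embeddings for a four-token code fragment (values are arbitrary).
tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]
global_out, global_w = self_attention(tokens)            # any token can reach any other
local_out, local_w = self_attention(tokens, window=1)    # only immediate neighbours
```

With `window=None`, the first token receives non-zero weight from the last one in a single step, which is the property that helps with long-distance dependencies. With `window=1`, that weight is exactly zero and attention concentrates on neighbouring tokens. A model that combines both views is one plausible reading of "integrating local modeling", offered here only as an assumption.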