MMCUP: Updating Code Comments Based on Multi-Modal Information
As software grows more complex, program comprehension has become increasingly important in software development. Code comments are among the most important documents for program comprehension, and high-quality comments are of great value for program maintenance. However, developers often neglect to update the corresponding comments after changing code. The resulting inconsistent comments not only cause confusion during software development and maintenance but also harm the robustness of the system. To address this problem, prior research has attempted to update comments automatically when the associated code changes. Although code carries abundant and explicit structural information, existing approaches often treat code as plain text and ignore this structure when updating comments, which leads to many failed updates. To address this issue, this paper proposes MMCUP (Multi-Modal Comment UPdating), a code comment updating approach that integrates multi-modal information. MMCUP uses three modalities of information: the old code comment sequence, the code edit sequence, and the AST difference sequence. First, the original comment is processed to construct the comment sequence. Then, the code edit sequence and the AST difference sequence are constructed from the code before and after the change. These sequences are combined with the old comment to form the input sequences fed into the model. Next, a Transformer encoder encodes each token of the input sequences separately, and during training the features of the different modalities are fused through a multi-head attention mechanism. Finally, the Transformer decoder decodes the fused multi-modal features to produce the updated comment. Experimental results show that MMCUP improves Accuracy by 5.8% over HatCUP and by 4.4% over HebCUP, and its Recall@5 is 3.6% higher than that of HatCUP, the previous best-performing approach. To determine whether the code edit sequence and the AST difference sequence used in MMCUP help improve comment updating, we conducted ablation experiments; the results show that each of them improves comment update performance. In addition, we analyzed how MMCUP performs in more complex code-change scenarios. On complex samples, MMCUP achieves the best performance among all compared comment updating approaches, which indicates that it can learn different comment update situations and cope with more complex scenarios. To further validate the effectiveness of our approach, we conducted a manual evaluation comparing MMCUP with HatCUP, which also showed that the comments updated by MMCUP better match developers' expectations. We also discuss the causes of MMCUP's failure cases and analyze threats to validity. In future work, we will further optimize the structural features extracted from code, for example by using control flow and data flow during feature extraction to increase the information available to the model. We also plan to explore additional modalities of information that could be integrated into the model to further improve its performance.
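The abstract does not say how the code edit sequence and the AST difference sequence are encoded. The Python sketch below shows one plausible way to build the three input modalities, and every concrete choice in it is an assumption: a regex tokenizer, `difflib` token alignment for the code edit sequence, and a node-type multiset difference over Python's `ast` module standing in for a real AST differencing tool (MMCUP's target language and differ are not stated here). The helper names `code_edit_sequence`, `ast_difference_sequence` and `build_input`, and the `<comment>`/`<edit>`/`<ast>` segment markers, are hypothetical and not MMCUP's implementation.

```python
# Hypothetical sketch of multi-modal input construction; NOT MMCUP's actual preprocessing.
import ast
import difflib
import re
from collections import Counter


def tokenize(source):
    """Split code into identifier/number tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", source)


def code_edit_sequence(old_tokens, new_tokens):
    """Align old and new code tokens; emit one (action, old, new) triple per position.

    Actions are difflib's opcode tags: 'equal', 'replace', 'delete', 'insert'.
    '<pad>' marks the side that has no token at that position.
    """
    edits = []
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_span, new_span = old_tokens[i1:i2], new_tokens[j1:j2]
        for k in range(max(len(old_span), len(new_span))):
            old_tok = old_span[k] if k < len(old_span) else "<pad>"
            new_tok = new_span[k] if k < len(new_span) else "<pad>"
            edits.append((tag, old_tok, new_tok))
    return edits


def ast_node_types(source):
    """Count AST node types in a piece of Python source."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


def ast_difference_sequence(old_source, new_source):
    """Simplified AST diff: node types removed ('-') and added ('+') by the change."""
    old_types, new_types = ast_node_types(old_source), ast_node_types(new_source)
    removed = sorted((old_types - new_types).elements())
    added = sorted((new_types - old_types).elements())
    return ["-" + name for name in removed] + ["+" + name for name in added]


def build_input(old_comment, old_source, new_source):
    """Concatenate the three modalities into one token sequence with segment markers."""
    comment_seq = old_comment.lower().split()
    edit_seq = [
        f"{tag}|{old}|{new}"
        for tag, old, new in code_edit_sequence(tokenize(old_source), tokenize(new_source))
    ]
    ast_seq = ast_difference_sequence(old_source, new_source)
    return ["<comment>", *comment_seq, "<edit>", *edit_seq, "<ast>", *ast_seq]


if __name__ == "__main__":
    old = "def f(a, b):\n    return a + b\n"
    new = "def f(a, b):\n    return a * b\n"
    print(ast_difference_sequence(old, new))  # ['-Add', '+Mult']
    print(build_input("Returns the sum of a and b", old, new))
```

Packing each edit into a single `action|old|new` token keeps the edit sequence aligned one-to-one with code positions; a learned model might instead embed the action, old token and new token separately and combine the embeddings before the Transformer encoder.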
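The abstract reports Accuracy and Recall@5 without defining them. The following is a minimal sketch that assumes the definitions commonly used for comment updating: Accuracy is the share of samples whose top-ranked updated comment exactly matches the reference, and Recall@k is the share whose reference appears among the top-k ranked candidates (for example, beam-search outputs). The whitespace-only normalization and the function names are assumptions for illustration.

```python
# Hypothetical metric sketch, assuming the common definitions:
#   Accuracy  = top-1 exact match
#   Recall@k  = reference comment appears among the top-k ranked candidates


def normalize(comment):
    """Collapse runs of whitespace so spacing differences are not counted as errors."""
    return " ".join(comment.split())


def recall_at_k(candidate_lists, references, k=5):
    """Fraction of samples whose reference comment is among the first k candidates."""
    if not references or len(candidate_lists) != len(references):
        raise ValueError("need a non-empty list with one candidate list per reference")
    hits = sum(
        normalize(ref) in {normalize(c) for c in cands[:k]}
        for cands, ref in zip(candidate_lists, references)
    )
    return hits / len(references)


def accuracy(candidate_lists, references):
    """Top-1 exact match, i.e. the special case k = 1."""
    return recall_at_k(candidate_lists, references, k=1)


if __name__ == "__main__":
    references = ["Returns the product of a and b", "Checks the input"]
    candidates = [
        ["Returns the product of a and b", "Returns the sum of a and b"],
        ["Validates the input", "Checks  the input"],
    ]
    print(accuracy(candidates, references))        # 0.5
    print(recall_at_k(candidates, references, 5))  # 1.0
```

Because Recall@5 credits any of the first five candidates, it is always at least as high as Accuracy on the same predictions.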
Keywords: code comment updating; program comprehension; code-comment co-evolution; deep learning; sequence-to-sequence model