A Code Comment Generation Model Fusing Multi-Structure Information

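Abstract (translated from Chinese): Code comments help developers understand what code does and how it is implemented. A code comment generation model automatically identifies the key information in code and generates relevant comments, improving readability and maintainability. Existing models usually represent code using only abstract syntax tree (AST) structure information, which limits the quality of the generated comments. This paper proposes a code comment generation model that fuses multi-structure information: in addition to the AST, it uses data flow graph structure information to represent code. The model encodes the AST sequence with a Transformer encoder to capture global information about the code, and extracts features from the data flow graph with a graph neural network to supply information such as the computational dependencies between variables. A cross-modal attention mechanism then fuses the AST and data flow features, and a Transformer decoder generates the corresponding comment. Experimental results show that, compared with six mainstream models, the proposed model achieves higher BLEU, METEOR, and ROUGE-L scores on Java and Python datasets, and that the generated comments are readable.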
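To make the phrase "computational dependencies between variables" concrete, the following is a minimal sketch of how data-flow edges might be pulled from Python source with the standard-library `ast` module. It is not the authors' extraction procedure, which the abstract does not describe. The function name `data_flow_edges` and the sample snippet are hypothetical, and only straight-line assignments to plain names are handled.

```python
import ast
from typing import List, Tuple


def data_flow_edges(source: str) -> List[Tuple[str, str]]:
    """Return (read_variable, assigned_variable) pairs for simple assignments.

    Simplifying assumptions (not from the paper): only `name = expression`
    statements are considered, control flow is ignored, and every name read
    on the right-hand side (including called function names) counts as a
    dependency of the assigned variable.
    """
    edges = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        # Variables written by this statement.
        targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
        # Variables read while computing the right-hand side.
        reads = [
            n.id
            for n in ast.walk(node.value)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)
        ]
        for target in targets:
            for read in reads:
                edges.append((read, target))
    return edges


snippet = "a = 1\nb = a + 2\nc = a * b\n"
edges = data_flow_edges(snippet)
print(edges)  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

Each edge says that the value of the second variable is computed from the first. A graph neural network would then operate on the nodes and edges of such a graph; that step is omitted here.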

English title: A code summarization generation model fusing multi-structure data

English abstract: Code summarization can help developers understand the function and implementation of code. A code summarization generation model can automatically identify the key information in code and generate relevant summaries, improving the readability and maintainability of the code. Existing code summarization generation models usually use only abstract syntax tree structure information to represent code, which results in low-quality generated summaries. To address this problem, this paper proposes a code summarization generation model that integrates multi-structure data. First, the model adds data flow graph structure information to the abstract syntax tree to represent code. Second, to capture the global information of the code, the model uses the Transformer encoder to encode the abstract syntax tree sequence. In addition, the model uses a graph neural network to extract features from the data flow graph and provide information such as the computational dependencies between variables. Finally, the model uses a cross-modal attention mechanism to fuse the abstract syntax tree and data flow features, and generates the corresponding summary through the Transformer decoder. The experimental results show that, compared with six mainstream models, the proposed model improves the BLEU, METEOR, and ROUGE-L scores on the Java and Python datasets, and that the generated summaries are also highly readable.
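The fusion step can be illustrated with plain scaled dot-product attention in which one feature set supplies the queries and the other supplies the keys and values. The sketch below assumes that the AST token features act as queries over the data-flow node features. The abstract does not state the direction, the learned projections, or the dimensions, so the function names and all numbers are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query row attends over all key rows.

    No learned projection matrices are applied; a trained model would
    normally project queries, keys, and values first.
    """
    scale = math.sqrt(len(keys[0]))
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value rows, one output dimension at a time.
        fused.append(
            [sum(w * v[d] for w, v in zip(weights, values))
             for d in range(len(values[0]))]
        )
    return fused


# Hypothetical features: 3 AST tokens and 2 data-flow nodes, dimension 2.
ast_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dfg_features = [[1.0, 0.0], [0.0, 1.0]]

fused = cross_attention(ast_features, dfg_features, dfg_features)
for row in fused:
    print([round(x, 3) for x in row])
```

The output has one row per AST token, each a mixture of data-flow node features weighted by similarity. In a full model this result would typically be combined with the original AST features before being passed to the decoder.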

English keywords:

code understanding; code summarization generation; graph neural network; multi-feature fusion; natural language processing

Authors:

余天赐 (Yu Tianci), 高尚 (Gao Shang)

Affiliation:

School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212100, Jiangsu, China

Keywords:

code understanding; code comment generation; graph neural network; multi-feature fusion; natural language processing

Funding:

National Natural Science Foundation of China (two grants)

Project numbers:

62176107; 62376109

Publication year:

2024

DOI:

10.3969/j.issn.1007-130X.2024.04.011

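计算机工程与科学 (Computer Engineering & Science)

College of Computer, National University of Defense Technology

CSTPCD; Peking University Core Journals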

Impact factor: 0.787

ISSN: 1007-130X

Year, volume (issue): 2024, 46(4)

Number of references: 22