Journal of Computer Research and Development, 2024, Vol. 61, Issue (2): 307-323. DOI: 10.7544/issn1000-1239.202330730

Class Summarization Generation Technology Based on Hierarchical Representation and Context Enhancement

Chen Haoling (陈豪伶), Yu Huiqun (虞慧群), Fan Guisheng (范贵生), Li Mingchen (李明辰), Huang Zijie (黄子杰)

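Author Information

  • 1. Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237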

Abstract

Code summaries are natural-language descriptions of source code, and high-quality summaries help developers understand programs more efficiently. In recent years, research on automatic code summarization has focused on generating summaries for method-level code snippets. However, in object-oriented languages such as Java, the class is the basic unit of a project. To address this gap, we propose HRCE (hierarchical representation and context enhancement), a class summarization method, and construct a class summarization dataset containing 358 992 <Java class, context, summary> entries. HRCE first applies a code simplification strategy that removes non-critical code from a class to shorten its length. It then models each level of the class hierarchy (the class signature, attributes, and methods) separately to capture both the semantic information and the hierarchical structure of the class. In addition, HRCE extracts the parent class's signature and summary from the project to characterize the context on which the class depends. Experiments show that the model built on hierarchical representation and context enhancement can represent both the semantics and the hierarchical structure of code, and can draw information from both inside and outside the target class. HRCE outperforms all baseline models on evaluation metrics including BLEU, METEOR, and ROUGE-L.
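The abstract describes HRCE's input only at a high level: a simplified class split into signature, attributes, and methods, plus the parent class's signature and summary as context. The Python sketch below shows one possible way to organize such an input so that each level can be encoded separately. It is not the authors' implementation. The `JavaClass`/`JavaMethod` types, the accessor-dropping heuristic, and the `max_body_tokens` truncation (standing in for the unspecified code simplification strategy) are all hypothetical, and tokenization is plain whitespace splitting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JavaMethod:
    # Hypothetical pre-parsed method: signature text plus body tokens.
    signature: str
    body_tokens: list[str]


@dataclass
class JavaClass:
    # Hypothetical pre-parsed class with optional parent-class context.
    signature: str
    attributes: list[str]
    methods: list[JavaMethod]
    parent_signature: Optional[str] = None
    parent_summary: Optional[str] = None


def is_trivial_accessor(method: JavaMethod, max_tokens: int = 5) -> bool:
    # Assumed heuristic: a short getX/setX/isX method counts as non-critical.
    name = method.signature.split("(")[0].split()[-1]
    for prefix in ("get", "set", "is"):
        if name.startswith(prefix) and name[len(prefix):][:1].isupper():
            return len(method.body_tokens) <= max_tokens
    return False


def simplify(java_class: JavaClass, max_body_tokens: int = 50) -> JavaClass:
    # Stand-in for the unspecified simplification step: drop trivial
    # accessors and truncate the remaining method bodies.
    kept = [
        JavaMethod(m.signature, m.body_tokens[:max_body_tokens])
        for m in java_class.methods
        if not is_trivial_accessor(m)
    ]
    return JavaClass(java_class.signature, java_class.attributes, kept,
                     java_class.parent_signature, java_class.parent_summary)


def build_hierarchical_input(java_class: JavaClass) -> dict[str, list]:
    # One token sequence per hierarchy level, so each level can be encoded
    # separately; the parent signature and summary form the context field.
    simple = simplify(java_class)
    return {
        "class_signature": simple.signature.split(),
        "attributes": [attr.split() for attr in simple.attributes],
        "methods": [m.signature.split() + m.body_tokens for m in simple.methods],
        "context": (simple.parent_signature or "").split()
        + (simple.parent_summary or "").split(),
    }


# Usage with a small, made-up class.
example = JavaClass(
    signature="public class Stack extends AbstractList",
    attributes=["private int size"],
    methods=[
        JavaMethod("public int getSize()", ["return", "size", ";"]),
        JavaMethod("public void push(Object o)",
                   ["items", "[", "size", "++", "]", "=", "o", ";"]),
    ],
    parent_signature="public abstract class AbstractList",
    parent_summary="Skeletal implementation of a list",
)
features = build_hierarchical_input(example)
print(len(features["methods"]))  # 1: getSize() is dropped as a trivial accessor
```

Keeping each level as its own sequence is what allows a model to encode the signature, attributes, and methods independently before combining them. How HRCE actually encodes and fuses these levels is not specified in this abstract.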

Key words

automatic code summarization / hierarchical representation / context enhancement / deep learning / class summarization


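Funding

National Natural Science Foundation of China (62372174)

National Natural Science Foundation of China (62276097)

Research Program of the National Engineering Laboratory for Big Data Distribution and Exchange Technologies ()

Shanghai Municipal Special Fund for Promoting High-Quality Development (2021-GYHLW-01007)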

Year of publication

2024
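
Journal of Computer Research and Development
Institute of Computing Technology, Chinese Academy of Sciences; China Computer Federation

CSTPCD; Peking University Core Journal
Impact factor: 2.649
ISSN: 1000-1239
Number of references: 1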