Prompt Learning Based Parameter-efficient Code Generation (基于提示学习的轻量化代码生成方法)

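Chinese abstract (translated): Automatic code generation is one of the effective ways to improve software development efficiency. Existing studies generally treat code generation as a sequence-to-sequence task, and fine-tuning large-scale pre-trained language models usually comes with a high computing cost. This paper proposes a lightweight code generation method based on prompt learning, Prompt Learning based Parameter-Efficient Code Generation (PPECG). The method queries a code corpus for the result most similar to the current requirement and uses it as a prompt to guide the pre-trained language model in generating code, while keeping the vast majority of the model's parameters fixed in order to reduce computing cost. To verify the effectiveness of PPECG, two code generation datasets are selected, CONCODE and Solidity4CG, and the BLEU, CodeBLEU, and Exact Match scores of the generated results are computed. The experimental results show that PPECG effectively reduces GPU memory consumption during fine-tuning and is close to, or even better than, the current SOTA methods on these metrics, so it completes the code generation task well.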

English title: Prompt Learning Based Parameter-efficient Code Generation

English abstract: Automatic code generation is one of the effective ways to improve the efficiency of software development. Existing research often regards code generation as a sequence-to-sequence task, and fine-tuning large-scale pre-trained language models is often accompanied by a high computing cost. This paper proposes a method called Prompt Learning based Parameter-Efficient Code Generation (PPECG). The method queries a code corpus for the result most similar to the current intent and uses it as a prompt to guide the pre-trained language model in generating code, while most of the model's parameters stay fixed to reduce computing cost. To verify the effectiveness of PPECG, two code generation datasets are selected, CONCODE and Solidity4CG, and the BLEU, CodeBLEU, and Exact Match values of the generated results are computed. Experimental results show that PPECG effectively reduces GPU memory cost during fine-tuning and is close to, or even better than, the current SOTA methods on these metrics, so it completes code generation tasks well.
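The retrieval step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how similarity is measured or how the prompt is laid out, so the token-level Jaccard similarity, the prompt template, the toy corpus, and every function name below are assumptions made only for illustration. Fixing most of the model's parameters is not shown.

```python
import re

# Hypothetical toy corpus of (requirement, code) pairs; not taken from
# CONCODE or Solidity4CG.
CORPUS = [
    ("return the sum of two integers",
     "int add(int a, int b) { return a + b; }"),
    ("check whether a string is empty",
     "boolean isEmpty(String s) { return s.length() == 0; }"),
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_most_similar(intent, corpus):
    """Return the (requirement, code) pair closest to the given intent."""
    query = tokenize(intent)
    return max(corpus, key=lambda pair: jaccard(query, tokenize(pair[0])))


def build_prompt(intent, corpus):
    """Place the retrieved example before the intent (assumed template)."""
    similar_intent, similar_code = retrieve_most_similar(intent, corpus)
    return (
        f"# Similar requirement: {similar_intent}\n"
        f"{similar_code}\n"
        f"# Requirement: {intent}\n"
    )


if __name__ == "__main__":
    print(build_prompt("compute the sum of two numbers", CORPUS))
```

In this sketch the resulting prompt string would be passed to a pre-trained language model as its input. A learned retriever or an embedding-based similarity could replace the Jaccard measure without changing the overall flow.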

English keywords:

Code generation; Prompt learning; Pre-trained language model; Information retrieval; Smart contract

Authors:

XU Yiran (徐一然), ZHOU Yu (周宇)

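Affiliations:

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016

Key Laboratory for Safety-Critical Software Development and Verification (Ministry of Industry and Information Technology), Nanjing University of Aeronautics and Astronautics, Nanjing 211100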

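Chinese keywords:

代码生成 (code generation); 提示学习 (prompt learning); 预训练语言模型 (pre-trained language model); 信息检索 (information retrieval); 智能合约 (smart contract)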

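Funding:

National Natural Science Foundation of China; National Defense Basic Scientific Research Program; Natural Science Foundation of Jiangsu Province

Grant numbers:

61972197; JCKY2022605C006; BK20201292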

Publication year:

2024

DOI:

10.11896/jsjkx.230400137

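Journal: Computer Science (计算机科学)

Publisher: Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)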

Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 0.944

ISSN: 1002-137X

Year, volume (issue): 2024, 51(6)

Number of references: 30