
Research Progress of Pre-trained Model in Software Engineering

In recent years, deep learning has achieved excellent performance in software engineering (SE) tasks. Such performance in practical tasks depends on large-scale training sets. Collecting and labeling these sets requires substantial resources and cost, which limits the wide application of deep learning techniques in practice. With the release of pre-trained models (PTMs) in deep learning, SE researchers have begun to pay attention to PTMs and to introduce them into SE tasks. PTMs have brought a qualitative leap to SE tasks and ushered intelligent software engineering into a new era. However, no study has yet distilled the successes, failures, and opportunities of pre-trained models in SE. This study clarifies the work in this cross-field (pre-trained models for software engineering, PTM4SE) by systematically reviewing current PTM4SE research. It first describes a framework for intelligent software engineering methods based on pre-trained models and then analyzes the pre-trained models commonly used in SE. It also introduces in detail the downstream SE tasks that use pre-trained models and compares and analyzes the performance of pre-trained model techniques on these tasks. Next, it presents the SE datasets commonly used for training and fine-tuning PTMs. Finally, it discusses the challenges and opportunities for PTM4SE. The collated PTMs and SE datasets are published at https://github.com/OpenSELab/PTM4SE.

Keywords: software repository mining; pre-trained model (PTM); programming language model

Lina Gong, Yiren Zhou, Yu Qiao, Shujuan Jiang, Mingqiang Wei, Zhiqiu Huang


College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China

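Key Laboratory of Software Development and Verification Technologies for High-Safety Systems, Ministry of Industry and Information Technology (Nanjing University of Aeronautics and Astronautics), Nanjing 211106, Jiangsu, China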

School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China


2025

Journal of Software
Institute of Software, Chinese Academy of Sciences; China Computer Federation


Peking University core journal
Impact factor: 2.833
ISSN: 1000-9825
Year, volume (issue): 2025, 36(1)