Advances in Deep Learning-Based Program Synthesis

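With the deepening of software engineering practice and the flourishing of open-source communities, deep learning-based program synthesis has attracted wide attention from both academia and industry. Deep learning-based program synthesis, also called intelligent program synthesis, aims to use deep learning techniques to automatically generate programs that satisfy user intent. Compared with traditional synthesis methods, which are limited in scalability and practicality, intelligent program synthesis is easy to extend and can learn and improve iteratively; it has quickly risen to prominence and become one of the research hotspots in software engineering. Recently, researchers have made notable progress in intelligent program synthesis; for example, GPT-4's performance on LeetCode is already comparable to that of humans. Meanwhile, industry has released several AI programming assistants, such as Copilot and Comate, aimed at relieving the productivity bottleneck in software development. This paper surveys research progress in intelligent program synthesis from several perspectives, including user intent understanding, program comprehension, model training, and model testing and evaluation, and reviews the field's results from recent years. It also discusses the challenges the field may face and looks ahead to future trends. This survey helps researchers gain a comprehensive view of the latest progress in intelligent program synthesis, and helps software developers quickly grasp its technical approaches and ideas to meet the needs of industrial practice.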
Advances in Deep Learning-Based Program Synthesis
With the rapid growth of software engineering practice and the thriving development of open-source communities, deep learning-based program synthesis has become a focal point of interest in both academia and industry. The field draws on a range of disciplines, including software engineering, deep learning, data mining, natural language processing, and programming languages. Deep learning-based program synthesis, also called intelligent program synthesis, uses deep learning techniques to extract knowledge from vast program repositories, with the goal of creating smart tools that improve the quality and productivity of computer programming. In contrast to traditional synthesis methods that rely on heuristic rules or expert systems, intelligent program synthesis has swiftly gained prominence because it is highly scalable and self-optimizing, and it has become a research focus in both the software engineering and artificial intelligence domains. The rapid advancement of pre-training techniques has led to the increasing adoption of large-scale language models in program synthesis, driving continuous progress in this domain. For example, GPT-4 has demonstrated human-comparable performance on platforms like LeetCode, while DeepMind's AlphaCode tackles competitive programming problems stated in natural language. At the same time, industry has introduced a series of AI programming assistants such as Copilot, Comate, and CodeWhisperer, which improve development efficiency and lower the learning curve in programming, thereby enabling broader participation in software development. To foster deeper research and wider application in this field, this paper systematically surveys the latest research progress in intelligent program synthesis from four perspectives: user intent understanding, program comprehension, model training, and model testing and evaluation, each with detailed subdivisions. User intent understanding aims to locate and understand user intentions swiftly and accurately by integrating contextual semantics and knowledge. The paper introduces methods for understanding user intent expressed in different forms, including input-output pairs, natural language, programs, and visual information. Program comprehension analyzes and extracts critical information from programs at various abstraction levels and from various perspectives, transforming it into forms that computers can process. The paper presents program comprehension methods based on text sequences, tree structures, and graph structures. Model training uses this information to generate new code, while model testing and evaluation verify and optimize the quality and performance of the generated code. The paper also examines challenges such as uneven dataset quality, low efficiency in user intent understanding and program comprehension, and issues of model interpretability and robustness. Furthermore, the paper anticipates future trends, including higher-quality datasets, more efficient methods for user intent understanding and program comprehension, more robust model architectures, and better application of these technologies in practical industrial settings. This survey helps the academic community gain a comprehensive understanding of the latest developments in intelligent program synthesis and helps software developers quickly master relevant technologies and strategies to meet industrial demands. Through continued exploration and innovation, intelligent program synthesis is expected to achieve further breakthroughs and to drive development across the software engineering domain, bringing greater efficiency and creativity to programming and development workflows.

intelligent software engineering; deep learning; program synthesis; program comprehension; user intent understanding

Gou Qianwen (苟倩文), Dong Yunwei (董云卫), Li Yongmin (李泳民)


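School of Computer Science, Northwestern Polytechnical University, Xi'an 710072

School of Software, Northwestern Polytechnical University, Xi'an 710072

Key Laboratory of High Confidence Software Technologies (Ministry of Education), School of Computer Science, Peking University, Beijing 100871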

intelligent software engineering; deep learning; program synthesis; program comprehension; user intent understanding

Major Program of the National Natural Science Foundation of China (62192733, 62192730)

2024

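Chinese Journal of Computers (计算机学报)
China Computer Federation; Institute of Computing Technology, Chinese Academy of Sciences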


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 3.18
ISSN: 0254-4164
Year, volume (issue): 2024, 47(11)