A Survey of Token-based Source Code Clone Detection Techniques

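Abstract (translated from Chinese): Code clones are textually or structurally similar code fragments produced when source code is reused, modified, or refactored during software development. Code cloning can raise development efficiency and save development costs. However, it can also propagate bugs and harm software stability and maintainability. Code clone detection has important research significance and practical value in plagiarism detection, vulnerability detection, copyright infringement, and related fields. Token-based (lexical) clone detection techniques can quickly detect Type 1-3 clones and can be extended to other programming languages, and they have been widely used in large-scale clone detection tasks. This paper reviews research on token-based clone detection over the past five years and divides the techniques into four categories according to the basic computation granularity of their similarity algorithms. It analyzes and summarizes more than ten technical characteristics and discusses their limitations and open challenges. Finally, in light of emerging technologies, it proposes possible future research directions for token-based clone detection.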

English title: Summary of Token-based Source Code Clone Detection Techniques

English abstract: Code cloning refers to the generation of similar or identical code during software development through the reuse, modification, and refactoring of source code. Code cloning can improve software development efficiency and reduce development costs. However, it can also harm the development and maintenance of software systems, for example by reducing stability and propagating software defects. Source code clone detection techniques have important research and application value in plagiarism detection, vulnerability detection, copyright infringement detection, and other fields. Some excellent detection tools and techniques have emerged, but detecting syntactic and semantic clones effectively and at scale remains challenging. Among these techniques, token-based (lexical) clone detection can quickly detect Type 1-3 clones and can be extended to other programming languages and to large-scale projects. It is therefore commonly used for clone detection in large-scale code databases. This paper reviews research on token-based clone detection over the past decade, analyzes and summarizes 16 selected studies along 10 characteristics, and finally proposes possible future research directions for token-based clone detection in light of new technological developments.
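The abstracts describe token-based detection only at a high level. The Python sketch below illustrates the general idea under stated assumptions. It is not the authors' method and does not reproduce any specific tool surveyed in the paper. It assumes Python input and uses a bag-of-tokens overlap measure similar in spirit to the one used by SourcererCC. Type-1 clones (identical apart from layout and comments) are absorbed by discarding whitespace and comment tokens. Type-2 clones (renamed identifiers or changed literals) are absorbed by mapping identifiers and literals to the placeholders `ID` and `LIT`. Type-3 clones (added or removed statements) are tolerated because the score degrades gradually rather than dropping to zero. The function names `normalized_tokens`, `bag_similarity`, and `is_clone` and the 0.7 threshold are hypothetical, chosen only for illustration.

```python
import io
import keyword
import tokenize
from collections import Counter

# Token types that carry no clone-relevant content (layout and comments).
_SKIP = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT,
    tokenize.ENCODING, tokenize.ENDMARKER,
}


def normalized_tokens(source: str) -> list[str]:
    """Lex Python source into a normalized token sequence.

    Comments, newlines and indentation are dropped (Type-1 differences);
    non-keyword identifiers become "ID" and number/string literals become
    "LIT" (Type-2 differences). Keywords and operators are kept verbatim.
    """
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            tokens.append("LIT")
        else:
            tokens.append(tok.string)
    return tokens


def bag_similarity(a: list[str], b: list[str]) -> float:
    """Size of the multiset intersection divided by the larger bag size.

    Token order is ignored, so inserted or deleted statements (Type-3
    differences) lower the score gradually instead of breaking the match.
    """
    overlap = sum((Counter(a) & Counter(b)).values())
    return overlap / max(len(a), len(b), 1)


def is_clone(src1: str, src2: str, threshold: float = 0.7) -> bool:
    """Report a clone pair when the bag similarity reaches the threshold."""
    return bag_similarity(normalized_tokens(src1), normalized_tokens(src2)) >= threshold


if __name__ == "__main__":
    original = "def add(a, b):\n    # sum two values\n    return a + b\n"
    renamed = "def plus(x, y):\n    return x + y\n"              # Type-2 variant
    gapped = "def add(a, b):\n    c = a + b\n    return c\n"      # Type-3 variant
    unrelated = "for i in range(10):\n    print(i)\n"
    print(is_clone(original, renamed))    # True  (similarity 1.0)
    print(is_clone(original, gapped))     # True  (similarity 12/15 = 0.8)
    print(is_clone(original, unrelated))  # False (similarity 7/12)
```

Ignoring token order keeps the comparison cheap and indexable, which is one reason bag-based measures scale to large corpora. The trade-off is that the measure cannot distinguish reordered code. Sequence-based alternatives (for example, longest-common-subsequence over the token stream) make the opposite trade-off.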

English keywords:

Software security; Source code clone detection; Code representation; Deep learning

Authors:

Liu Chunling (刘春玲), Qi Xuyan (戚旭衍), Tang Yonghe (唐永鹤), Sun Xuekai (孙雪凯), Li Qinghao (李晴浩), Zhang Yu (张雨)

Affiliation:

School of Cyberspace Security, Information Engineering University, Zhengzhou 450001, China

Funding:

Key Research and Development Program of Henan Province

Grant No.:

221111210300

Publication year:

2024

DOI:

10.11896/jsjkx.230400117

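Journal: Computer Science (计算机科学)

Publisher: Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)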

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.944

ISSN: 1002-137X

Year, volume (issue): 2024, 51(6)

Number of references: 48