Summary of Token-based Source Code Clone Detection Techniques
Code cloning refers to the creation of similar or identical code during software development through the reuse, modification, and refactoring of source code. Code cloning can improve development efficiency and reduce development costs, but it can also harm the development and maintenance of software systems, for example by reducing stability and propagating software defects. Clone detection techniques for source code have important research and application value in plagiarism detection, vulnerability detection, copyright infringement, and other fields. Although some excellent detection tools and techniques have emerged, detecting syntactic and semantic clones effectively and at large scale remains challenging. Among existing approaches, lexical-based (token-based) clone detection can quickly detect Type-1 to Type-3 clones and can be extended to other programming languages and to large-scale projects; it is therefore commonly used for clone detection in large-scale databases. This paper reviews the research status of lexical-based clone detection over the past decade, analyzes and summarizes 16 selected publications along 10 characteristics, and finally proposes possible future research directions for lexical-based clone detection in light of new technological developments.
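To make the lexical-based idea concrete, the following is a minimal sketch and not the method of any tool covered by this review. It assumes a simplified C-like token grammar, a hypothetical keyword set, and a bag-of-tokens overlap measure; the names `tokenize` and `overlap_similarity` are illustrative. Identifiers and literals are abstracted to placeholders so that renamed (Type-2) clones map to the same token sequence, while a similarity score below 1.0 leaves room for the added or removed statements typical of Type-3 clones. Comments are not stripped in this sketch.

```python
import re
from collections import Counter

# Hypothetical keyword set for a small C-like language (an assumption).
KEYWORDS = {"if", "else", "for", "while", "return", "int", "float"}

# Identifiers/keywords, numbers, double-quoted strings, or one punctuation char.
TOKEN_RE = re.compile(r'[A-Za-z_]\w*|\d+(?:\.\d+)?|"[^"\n]*"|[^\sA-Za-z_0-9]')


def tokenize(code):
    """Split code into tokens, abstracting identifiers and literals."""
    tokens = []
    for tok in TOKEN_RE.findall(code):
        if tok in KEYWORDS:
            tokens.append(tok)       # keywords are kept verbatim
        elif re.match(r"[A-Za-z_]", tok):
            tokens.append("ID")      # any identifier becomes ID
        elif tok[0].isdigit() or tok[0] == '"':
            tokens.append("LIT")     # any number or string becomes LIT
        else:
            tokens.append(tok)       # operators and punctuation
    return tokens


def overlap_similarity(code_a, code_b):
    """Share of tokens common to both fragments, relative to the larger one."""
    bag_a, bag_b = Counter(tokenize(code_a)), Counter(tokenize(code_b))
    shared = sum((bag_a & bag_b).values())  # multiset intersection size
    return shared / max(sum(bag_a.values()), sum(bag_b.values()), 1)


original = "int sum(int a, int b) { return a + b; }"
renamed = "int add(int x, int y) { return x + y; }"
unrelated = "while (n > 0) { n = n - 1; }"

print(overlap_similarity(original, renamed))    # renamed clone
print(overlap_similarity(original, unrelated))  # unrelated fragment
```

In practice a pair would be reported as a clone when its score exceeds a chosen threshold; the threshold value, like the overlap measure itself, is a design choice assumed here rather than taken from the reviewed literature.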