Cross-Language Code Plagiarism Detection Based on Program Flowchart and Graph Attention Network
Cross-language code plagiarism detection is widely used in fields such as software intellectual property protection and programming education. However, syntactic differences between programming languages reduce the apparent similarity between code fragments, which lowers the accuracy of plagiarism detection. This paper therefore proposes a cross-language code plagiarism detection approach based on program flowcharts and a graph attention network. First, the source code is converted into a program flowchart, and a graph attention network extracts its features as the representation of the code. Second, the code representations are compared line by line using a cross-matching method to obtain similarity feature vectors. Finally, the similarity feature vectors of the source code under test are combined, and the probability of plagiarism is computed by a fully connected neural network. Experimental results show that the proposed approach improves accuracy, recall, and F1 score over existing cross-language code plagiarism detection approaches. Compared with CLCDSA, which is based on attribute counting, and ASTLearner, which is based on abstract syntax trees, the F1 score increases by 11% and 16%, respectively.
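The three stages described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the node-type encoding, the undirected adjacency with self-loops, the single-head attention layer, the cosine-based cross-matching with mean/min pooling, and the randomly initialized (untrained) weights are all assumptions made for illustration, and every function name here is hypothetical.

```python
import numpy as np

def gat_layer(h, adj, W, a, slope=0.2):
    """One single-head graph attention layer.
    h: (n, d_in) node features; adj: (n, n) 0/1 adjacency including self-loops."""
    z = h @ W                                        # projected features, (n, d_out)
    d = z.shape[1]
    # e[i, j] = LeakyReLU(a^T [z_i || z_j]), split into source and target parts
    e = (z @ a[:d])[:, None] + (z @ a[d:])[None, :]
    e = np.where(e > 0, e, slope * e)
    e = np.where(adj > 0, e, -np.inf)                # attend only to neighbours
    e = e - e.max(axis=1, keepdims=True)             # numerically stable softmax
    att = np.exp(e)
    att = att / att.sum(axis=1, keepdims=True)
    return np.tanh(att @ z), att

def cross_match(za, zb):
    """Assumed cross-matching: cosine similarity between every node pair,
    best match in each direction, pooled into a fixed-length vector."""
    na = za / (np.linalg.norm(za, axis=1, keepdims=True) + 1e-9)
    nb = zb / (np.linalg.norm(zb, axis=1, keepdims=True) + 1e-9)
    sim = na @ nb.T                                  # (n_a, n_b)
    best_ab, best_ba = sim.max(axis=1), sim.max(axis=0)
    return np.array([best_ab.mean(), best_ab.min(),
                     best_ba.mean(), best_ba.min()])

def plagiarism_probability(feat, W1, b1, w2, b2):
    """Two-layer fully connected network with a sigmoid output."""
    hidden = np.maximum(0.0, feat @ W1 + b1)         # ReLU
    return 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))

NODE_TYPES = 4  # hypothetical encoding: 0=start, 1=process, 2=decision, 3=end

def one_hot(types):
    return np.eye(NODE_TYPES)[types]

def adjacency(n, edges):
    adj = np.eye(n)                                  # self-loops
    for s, t in edges:
        adj[s, t] = adj[t, s] = 1.0                  # edges treated as undirected
    return adj

rng = np.random.default_rng(0)
W, a = rng.normal(size=(NODE_TYPES, 8)), rng.normal(size=16)
W1, b1 = rng.normal(size=(4, 6)), np.zeros(6)
w2, b2 = rng.normal(size=6), 0.0

# Two toy flowcharts, e.g. the same routine written in two languages.
adj_a = adjacency(4, [(0, 1), (1, 2), (2, 3)])
adj_b = adjacency(5, [(0, 1), (1, 2), (2, 3), (3, 4)])
za, att_a = gat_layer(one_hot([0, 1, 2, 3]), adj_a, W, a)
zb, att_b = gat_layer(one_hot([0, 1, 1, 2, 3]), adj_b, W, a)

feat = cross_match(za, zb)
prob = plagiarism_probability(feat, W1, b1, w2, b2)
print("similarity features:", feat)
print("plagiarism probability (untrained weights):", prob)
```

Because the weights are random, the printed probability carries no meaning; in a trained system the attention parameters and the classifier would be learned jointly from labeled code pairs.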