Design and Implementation of Question Bank Plagiarism Detection System Based on Simhash Algorithm
The Simhash algorithm is a technique based on Locality-Sensitive Hashing (LSH), known for its rapid computation speed and high accuracy in plagiarism detection. This algorithm converts text features into binary codes and evaluates the similarity of texts by calculating the Hamming distance between these binary codes. In various fields such as text deduplication and duplicate document detection, the Simhash algorithm has demonstrated significant effectiveness. Therefore, applying the Simhash algorithm to question bank plagiarism detection is highly feasible and has practical application value.
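The two steps named above (turning text features into a binary code, then comparing codes by Hamming distance) can be illustrated with a minimal sketch. It is not the system's actual implementation: the word-level tokenizer, the use of term frequency as the feature weight, the truncated MD5 as the per-feature hash, and the 64-bit fingerprint length are all assumptions made for illustration.

```python
import hashlib
import re
from collections import Counter

BITS = 64  # assumed fingerprint length


def tokenize(text):
    # Hypothetical feature extractor: lowercase word tokens.
    return re.findall(r"\w+", text.lower())


def feature_hash(token):
    # Assumed per-feature hash: the first 8 bytes (64 bits) of MD5.
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(text):
    # Weight each feature by its term frequency (an assumption).
    weights = Counter(tokenize(text))
    totals = [0] * BITS
    for token, weight in weights.items():
        h = feature_hash(token)
        for i in range(BITS):
            # Add the weight where the hash bit is 1, subtract it where it is 0.
            if (h >> i) & 1:
                totals[i] += weight
            else:
                totals[i] -= weight
    # Each fingerprint bit is 1 when the weighted sum for that position is positive.
    fingerprint = 0
    for i, total in enumerate(totals):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    # Number of bit positions in which the two fingerprints differ.
    return bin(a ^ b).count("1")
```

Under these assumptions, two questions would be flagged as likely duplicates when `hamming_distance(simhash(q1), simhash(q2))` falls at or below a chosen threshold. The threshold is a tuning parameter that would need to be set against real question bank data.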
Simhash algorithm; Hamming distance; question bank plagiarism detection system; text similarity calculation; Hash function