
Design and Implementation of a Question Bank Plagiarism Detection System Based on the Simhash Algorithm

Simhash is a technique based on locality-sensitive hashing (LSH), known for fast computation and high accuracy in detecting duplicates. The algorithm converts the features of a text into a binary code and evaluates the similarity of two texts by calculating the Hamming distance between their codes. It has proven effective in fields such as text deduplication and duplicate document detection. Applying Simhash to question bank plagiarism detection is therefore both feasible and of practical value.
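The pipeline described in the abstract (text features, then a binary code, then a Hamming distance comparison) can be sketched as follows. This is a minimal illustration, not the system from the paper: the whitespace-style tokenizer, the MD5-based token hash, the term-frequency weights, the 64-bit fingerprint length, and the distance threshold of 3 are all assumptions chosen for the example. A real Chinese question bank would also need a word segmenter in place of the hypothetical `tokenize` helper.

```python
import hashlib
import re
from collections import Counter

BITS = 64  # assumed fingerprint length; the abstract does not specify one


def tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer: lowercase runs of word characters.

    Unsegmented Chinese text would come out as one long token here,
    so a real system would substitute a proper word segmenter.
    """
    return re.findall(r"\w+", text.lower())


def token_hash(token: str) -> int:
    """Map a token to a 64-bit integer using the first 8 bytes of its MD5 digest."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(text: str) -> int:
    """Compute a 64-bit Simhash fingerprint, weighting tokens by frequency."""
    weights = Counter(tokenize(text))
    totals = [0] * BITS
    for token, weight in weights.items():
        h = token_hash(token)
        for i in range(BITS):
            # A set bit votes +weight for position i, a clear bit votes -weight.
            totals[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i, total in enumerate(totals):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_duplicate(q1: str, q2: str, threshold: int = 3) -> bool:
    """Flag two questions as duplicates when their fingerprints are close.

    The threshold of 3 is an illustrative assumption, not a value from the paper.
    """
    return hamming_distance(simhash(q1), simhash(q2)) <= threshold
```

Calling `is_duplicate(question_a, question_b)` on two question texts compares their fingerprints. The threshold trades recall against false positives and would need tuning on real question bank data.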

Keywords: Simhash algorithm; Hamming distance; question bank plagiarism detection system; text similarity calculation; hash function

Xiong Liangyu (熊良钰), Deng Lundan (邓伦丹)


College of Science and Technology, Nanchang University, Gongqingcheng, Jiangxi


2024

Scientific and Technological Innovation (科学技术创新)
Heilongjiang Science Popularization Center (黑龙江省科普事业中心)


Impact factor: 0.842
ISSN: 1673-1328
Year, volume (issue): 2024, (9)