A Chinese Approximate String Matching Algorithm Based on Edit Distance (一种基于编辑距离的中文字符串近似匹配算法)
王昭 1, 薛晨浩 1, 裴卓雄 1
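Author information
- 1. Shanxi Branch, National Computer Network Emergency Response Technical Team/Coordination Center of China, Taiyuan, Shanxi 030012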
Abstract
Approximate string matching is an important research direction in pattern matching. For Chinese strings, an edit distance built on single-character operations cannot accurately measure the similarity that results from copying and cut-and-paste operations on substrings. To address this, the paper extends the traditional edit distance with substring shift and copy operations, and gives a computation method that performs a dynamic programming search on top of a greedy algorithm and can effectively compute the improved edit distance. Experimental results and analysis on a real-world dataset show that the approach is effective for text retrieval.
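For reference, the traditional character-level edit distance that the abstract takes as its baseline can be sketched as follows. This is only the standard Levenshtein dynamic program with unit costs, not the paper's improved distance with shift and copy operations; the function name `edit_distance` and the example strings are illustrative assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance: unit-cost insert, delete, substitute."""
    # prev[j] is the distance between the already-processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]
```

Python strings are sequences of Unicode code points, so each Chinese character counts as one unit here. Under these assumptions, `edit_distance("北京大学计算机系", "计算机系北京大学")` returns 8 for two 8-character strings that differ by a single block move, which illustrates the weakness of character-level operations that the abstract describes.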
Key words
string matching / approximate matching / dynamic programming algorithm / edit distance
Publication year
2024