Similarity Calculation and Deduplication Method for Geographical Name and Address Data Based on Phonetic-Shape Codes

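Abstract: Handling duplicate records is an important task in the governance of geographical name and address data. To address duplicate records in the geographical name and address database of Guangdong Province, this paper proposes a method for computing Chinese-character similarity based on phonetic-shape codes, describes the principle, workflow, and procedure of phonetic-shape-code-based deduplication of geographical names and addresses, and builds on these principles to develop geographical name and address deduplication software. Using the geographical name and address data of Liwan District as experimental data, the software computes the similarity of records in the Liwan District database and judges whether records are duplicates by combining deduplication rules with differences in distance, which resolves the duplication problem and ensures the accuracy of the database. The experimental results show that the software matches duplicate records to a high degree and that duplicate geographical name and address data can be effectively eliminated by a dual-driven method combining phonetic-shape codes and distance, offering a reliable solution for geographical name and address data governance in other regions.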
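The abstract does not spell out the phonetic-shape encoding or the similarity formula, so the following is only a sketch under stated assumptions. Each character's code is modeled as a hypothetical `SSC` record (pinyin initial, final, tone, glyph-structure class, stroke count); the per-field weights are invented for illustration; and a weighted edit distance, whose substitution cost is 1 minus the character similarity, lifts character similarity to whole names. None of these names or weights come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSC:
    """Hypothetical phonetic-shape code of one Chinese character.

    The field layout is an assumption for illustration, not the paper's encoding.
    """
    initial: str    # pinyin initial (声母), e.g. "zh"
    final: str      # pinyin final (韵母), e.g. "ang"
    tone: int       # tone number 1-4 (0 for neutral)
    structure: str  # glyph structure class, e.g. "LR" = left-right, "TB" = top-bottom
    strokes: int    # stroke count


# Hypothetical weights; they sum to 1.0 so a perfect match scores 1.0.
W_INITIAL, W_FINAL, W_TONE = 0.2, 0.2, 0.1   # phonetic part
W_STRUCTURE, W_STROKES = 0.2, 0.3            # shape part


def char_similarity(a: SSC, b: SSC) -> float:
    """Weighted field-by-field similarity of two character codes, in [0, 1]."""
    if a == b:
        return 1.0
    score = W_INITIAL * (a.initial == b.initial)
    score += W_FINAL * (a.final == b.final)
    score += W_TONE * (a.tone == b.tone)
    score += W_STRUCTURE * (a.structure == b.structure)
    # Stroke-count agreement decays linearly with the relative difference.
    diff = abs(a.strokes - b.strokes)
    score += W_STROKES * max(0.0, 1.0 - diff / max(a.strokes, b.strokes, 1))
    return score


def name_similarity(x: list[SSC], y: list[SSC]) -> float:
    """Edit-distance similarity of two names given as sequences of character codes.

    Insertion/deletion costs 1; substitution costs 1 - char_similarity, so
    near-homophones or look-alike characters are cheap to substitute.
    """
    n, m = len(x), len(y)
    if n == 0 and m == 0:
        return 1.0
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = float(i)
    for j in range(1, m + 1):
        d[0][j] = float(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 1.0 - char_similarity(x[i - 1], y[j - 1])
            d[i][j] = min(d[i - 1][j] + 1.0, d[i][j - 1] + 1.0, d[i - 1][j - 1] + sub)
    # The distance never exceeds max(n, m), so the result stays in [0, 1].
    return 1.0 - d[n][m] / max(n, m)
```

Under these invented weights, two codes that differ only in tone and by 2 strokes out of 12 score 0.2 + 0.2 + 0.2 + 0.3 × (10/12) = 0.85, and a one-character name pair built from them gets the same name similarity.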

English title: Similarity calculation and duplication method of geographical name and address data based on phonetic code

English abstract: The processing of duplicate data is an important task in the management of geographical name and address data. To address the problem of duplicate data in the geographical name and address database of Guangdong Province, this paper proposed a method to calculate Chinese character similarity based on phonetic codes and introduced the principle, process, and method of deduplicating geographical names and addresses based on phonetic codes. In addition, geographical name and address data deduplication software was developed on these principles. Taking the geographical name and address data of Liwan District as experimental data, the software calculated the similarity of records in the Liwan District database and judged duplication using deduplication rules together with differences in distance. As a result, it solved the problem of duplicate data in the geographical name and address database and ensured the accuracy of the database. The experimental results show that the software matches duplicate data with high accuracy and that the problem of duplicate geographical name and address data can be effectively solved by the dual-drive method of phonetic codes and distance, providing a reliable solution for the management of geographical names and addresses in other regions.
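The "dual-drive" rule pairs a name-similarity test with a distance test. Below is a minimal sketch of that idea, assuming each record carries WGS84 longitude/latitude and using two hypothetical thresholds (name similarity at least 0.8, distance at most 50 m); the abstract gives no threshold values. Python's `difflib.SequenceMatcher.ratio()` is swapped in as a stand-in for the paper's phonetic-code similarity, and `Place`, `find_duplicate_groups`, and the sample records are illustrative names and data, not the authors' software.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Place:
    pid: str     # record identifier
    name: str    # geographical name or address text
    lon: float   # longitude in degrees (WGS84 assumed)
    lat: float   # latitude in degrees


EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters


def haversine_m(a: Place, b: Place) -> float:
    """Great-circle distance between two records, in meters."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def name_sim(x: str, y: str) -> float:
    # Stand-in only: difflib's character-level ratio, NOT phonetic-code similarity.
    return SequenceMatcher(None, x, y).ratio()


def find_duplicate_groups(places: list[Place], sim_threshold: float = 0.8,
                          max_dist_m: float = 50.0) -> list[list[str]]:
    """Group records that are both similar in name AND close in space.

    Thresholds are hypothetical defaults. Pairs are compared exhaustively (O(n^2));
    a large dataset would need spatial blocking first.
    """
    parent = list(range(len(places)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(places)):
        for j in range(i + 1, len(places)):
            a, b = places[i], places[j]
            if name_sim(a.name, b.name) >= sim_threshold and haversine_m(a, b) <= max_dist_m:
                parent[find(i)] = find(j)  # merge the two duplicate clusters

    groups: dict[int, list[str]] = {}
    for i, p in enumerate(places):
        groups.setdefault(find(i), []).append(p.pid)
    return [g for g in groups.values() if len(g) > 1]
```

With four made-up records (two identical names about 11 m apart, the same name about 5.6 km away, and a different name at the same point), only the first pair is grouped: name similarity alone or proximity alone is not enough, which is the point of requiring both tests.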

English keywords: geographical name and address; phonetic code; similarity; distance; deduplication

Authors: Yan Haifeng (严海峰), Jian Zihong (简梓红), Jiang Xiuming (江秀明)

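Affiliations:

Guangdong Provincial Map Institute (广东省地图院), Guangzhou, Guangdong 510075

Guangdong Surveying and Mapping Engineering Co., Ltd. (广东省测绘工程有限公司), Guangzhou, Guangdong 510663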

Keywords: geographical name and address; phonetic-shape code; similarity; distance; deduplication

Funding: Guangdong Provincial Science and Technology Program (project no. 2021B1111610001)

Year of publication: 2024

DOI: 10.19580/j.cnki.1007-3000.2024.09.006

Journal: Beijing Surveying and Mapping (北京测绘)

Sponsors: Beijing Institute of Surveying and Mapping (北京市测绘设计研究院); Beijing Society of Surveying and Mapping (北京测绘学会)

Impact factor: 0.55

ISSN: 1007-3000

Year, volume (issue): 2024, 38(9)