Similarity calculation and deduplication method for geographical name and address data based on phonetic codes
Handling duplicate records is an important task in managing geographical name and address data. To address duplicate data in the geographical name and address database of Guangdong Province, this paper proposes a method for calculating Chinese character similarity based on phonetic codes and describes the principle, process, and method of deduplicating geographical names and addresses with these codes. Deduplication software for geographical name and address data was developed on these principles. Using the geographical name and address data of Liwan District as experimental data, the software calculated the similarity between records in the district's database and identified duplicates by applying a duplication rule together with the distance between records. The experimental results show that the software matches duplicate data with high accuracy, and that the dual-drive approach combining phonetic codes and distance effectively resolves duplicate geographical name and address data, which helps ensure the accuracy of the database and provides a reliable solution for managing geographical names and addresses in other regions.
geographical name and address; phonetic code; similarity; distance; deduplication
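
The dual-drive idea summarized above (phonetic similarity plus spatial distance) can be illustrated with a minimal sketch. This is not the paper's software: the tiny pinyin table, the use of Python's `difflib.SequenceMatcher` as the similarity measure, the haversine distance, the record layout `(name, lat, lon)`, and the thresholds `min_similarity` and `max_distance_m` are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher
from math import asin, cos, radians, sin, sqrt

# Hypothetical toy lookup table (toneless pinyin). A real system would need
# a full character-to-phonetic-code dictionary.
PINYIN = {
    "中": "zhong", "钟": "zhong", "山": "shan",
    "路": "lu", "璐": "lu", "八": "ba", "七": "qi",
}


def phonetic_code(text):
    """Map each character to its phonetic code; unknown characters pass through."""
    return [PINYIN.get(ch, ch) for ch in text]


def phonetic_similarity(a, b):
    """Similarity in [0, 1] between the phonetic code sequences of two names."""
    return SequenceMatcher(None, phonetic_code(a), phonetic_code(b)).ratio()


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points (spherical Earth)."""
    r = 6371000.0  # mean Earth radius in metres
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    h = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(h))


def is_duplicate(rec_a, rec_b, min_similarity=0.9, max_distance_m=50.0):
    """Flag two (name, lat, lon) records as duplicates only if they sound
    alike AND lie close together. Both thresholds are illustrative."""
    name_a, lat_a, lon_a = rec_a
    name_b, lat_b, lon_b = rec_b
    if phonetic_similarity(name_a, name_b) < min_similarity:
        return False
    return haversine_m(lat_a, lon_a, lat_b, lon_b) <= max_distance_m
```

With this sketch, `phonetic_similarity("中山八路", "钟山八璐")` is 1.0 because the two names are homophones, while `phonetic_similarity("中山八路", "中山七路")` is 0.75 because one of the four syllables differs. Requiring both conditions means that homophonic names far apart, or nearby records with clearly different names, are not merged.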