CFGT: A Lexicon-based Chinese Address Element Parsing Model
Address element parsing is a key step in geocoding and directly affects geocoding accuracy. Because Chinese address expressions are diverse and complex, two similar address texts may refer to completely different geographical locations. Traditional address element parsing based on dictionary matching handles ambiguous words poorly and therefore shows low recognition accuracy. This paper proposes CFGT (collaborative flat-graph Transformer), a lexicon-based Chinese address element parsing model that uses self-matched words, nearest contextual words, and other lexical information to enhance the character sequence representation of address text, effectively reducing the ambiguity of address expressions. Specifically, the model first constructs two collaboration graphs, flat-lattice and flat-shift, to capture the knowledge of self-matched words and nearest contextual words for address characters, and designs a fusion layer to implement collaboration between the graphs. Second, an improved relative position encoding further strengthens the enhancing effect of word information on the address character sequence. Finally, a Transformer and a conditional random field (CRF) are used to parse the address elements. Experiments are conducted on multiple public datasets, such as Weibo and Resume, as well as on the private dataset Address. The results show that CFGT outperforms previous Chinese address element parsing models and existing models in the field of Chinese named entity recognition.
Chinese address recognition; Lexicon enhancement; External information; Named entity recognition
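To make the notion of self-matched words concrete, the following is a minimal sketch of how lexicon matches might be attached to an address character sequence in a flat-lattice style, where every character and every matched word carries a head and a tail index. It is an illustration under stated assumptions, not the paper's implementation: the function names (`match_lexicon`, `flat_lattice`, `self_matched_words`), the `max_len` cutoff, and the toy lexicon are all hypothetical, and the flat-shift graph, fusion layer, and relative position encoding are not modeled here.

```python
from typing import Dict, List, Set, Tuple

# (token, head index, tail index); both indices are inclusive character offsets.
Span = Tuple[str, int, int]


def match_lexicon(text: str, lexicon: Set[str], max_len: int = 4) -> List[Span]:
    """Return every multi-character lexicon word found in text, with its span.

    max_len is a hypothetical cutoff on word length used only to bound the search.
    """
    spans: List[Span] = []
    for head in range(len(text)):
        # tail runs over candidate words of length 2 .. max_len starting at head
        for tail in range(head + 1, min(len(text), head + max_len)):
            word = text[head:tail + 1]
            if word in lexicon:
                spans.append((word, head, tail))
    return spans


def flat_lattice(text: str, lexicon: Set[str]) -> List[Span]:
    """Flatten characters and matched words into one token sequence.

    Characters come first (head == tail), followed by the matched words.
    """
    chars: List[Span] = [(ch, i, i) for i, ch in enumerate(text)]
    return chars + match_lexicon(text, lexicon)


def self_matched_words(text: str, lexicon: Set[str]) -> Dict[int, List[str]]:
    """Map each character position to the lexicon words that cover it."""
    covering: Dict[int, List[str]] = {i: [] for i in range(len(text))}
    for word, head, tail in match_lexicon(text, lexicon):
        for i in range(head, tail + 1):
            covering[i].append(word)
    return covering


if __name__ == "__main__":
    # Toy example (not from the paper): a well-known ambiguous string.
    text = "南京市长江大桥"
    lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}
    for token, head, tail in flat_lattice(text, lexicon):
        print(token, head, tail)
    print(self_matched_words(text, lexicon))
```

In this toy string, the character 市 is covered by both 南京市 and 市长, which is the kind of ambiguity that dictionary matching alone cannot resolve and that a model must settle from the surrounding context.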