Chunk-based Tibetan Dependency Parsing and Automatic Annotation Method
Dependency parsing is one of the core techniques in natural language processing. It determines the syntactic structure of a sentence by analyzing the dependency relationships between its words. Research on Tibetan dependency parsing currently faces challenges such as the difficulty of parsing long sentences and incomplete mappings in coarse-grained dependency conversion. To address these issues, this paper proposes a Tibetan dependency parsing and automatic annotation method based on chunks and fine-grained part-of-speech matching rules. The method first refines the Tibetan dependency syntax annotation system, then manually annotates datasets according to this system and extracts part-of-speech matching rules from them. It then improves the accuracy of long-sentence parsing through Tibetan sentence chunk recognition. Finally, a prototype system named TDParser is developed for automatic Tibetan dependency syntax annotation, and a Tibetan dependency treebank containing 7,335 dependency syntax entries is constructed. Experimental results verify the performance of TDParser and the effectiveness of the automatically annotated data.
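The general idea of combining chunking with part-of-speech matching rules can be illustrated with a minimal sketch. This is not TDParser itself: the tag names (`NOUN`, `CASE`, `ADJ`, `VERB`), the relation labels, the `RULES` table, the rule that a case particle closes a chunk, and the head-selection heuristic are all assumptions made up for illustration, and English glosses stand in for Tibetan words. The sketch splits a tagged sentence into chunks, attaches tokens inside each chunk using (dependent POS, head POS) rules, and then links chunk heads to the head of the final chunk, reflecting the verb-final order of Tibetan clauses.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]     # (word or gloss, POS tag)
Arc = Tuple[int, int, str]  # (dependent index, head index, label); head -1 marks the root

# Hypothetical (dependent POS, head POS) -> label rules, for illustration only.
RULES: Dict[Tuple[str, str], str] = {
    ("CASE", "NOUN"): "case",
    ("ADJ", "NOUN"): "amod",
    ("NOUN", "VERB"): "arg",
}


def split_chunks(tokens: List[Token], boundary_pos: str = "CASE") -> List[List[int]]:
    """Assumed chunking rule: a chunk ends after each token tagged boundary_pos."""
    chunks: List[List[int]] = []
    current: List[int] = []
    for i, (_, pos) in enumerate(tokens):
        current.append(i)
        if pos == boundary_pos:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


def chunk_head(tokens: List[Token], chunk: List[int]) -> int:
    """Assumed heuristic: prefer the last verb, then the last noun, else the last token."""
    for wanted in ("VERB", "NOUN"):
        for i in reversed(chunk):
            if tokens[i][1] == wanted:
                return i
    return chunk[-1]


def parse(tokens: List[Token]) -> List[Arc]:
    """Attach tokens inside each chunk by rule, then link chunk heads to the last one."""
    if not tokens:
        return []
    chunks = split_chunks(tokens)
    heads = [chunk_head(tokens, c) for c in chunks]
    arcs: List[Arc] = []
    for chunk, head in zip(chunks, heads):
        for i in chunk:
            if i == head:
                continue
            # Try the nearest tokens first; on equal distance, try the left one first.
            candidates = sorted((j for j in chunk if j != i),
                                key=lambda j: (abs(j - i), j))
            for j in candidates:
                label = RULES.get((tokens[i][1], tokens[j][1]))
                if label is not None:
                    arcs.append((i, j, label))
                    break
            else:
                # No rule matched inside the chunk: fall back to the chunk head.
                arcs.append((i, head, "dep"))
    root = heads[-1]  # the head of the final chunk is taken as the sentence root
    for head in heads[:-1]:
        label = RULES.get((tokens[head][1], tokens[root][1]), "dep")
        arcs.append((head, root, label))
    arcs.append((root, -1, "root"))
    return sorted(arcs)


# English glosses in an assumed Tibetan-like order: teacher-ERG book new read.
EXAMPLE: List[Token] = [
    ("teacher", "NOUN"), ("ERG", "CASE"),
    ("book", "NOUN"), ("new", "ADJ"), ("read", "VERB"),
]

if __name__ == "__main__":
    for dep, head, label in parse(EXAMPLE):
        head_word = "ROOT" if head < 0 else EXAMPLE[head][0]
        print(f"{EXAMPLE[dep][0]} -> {head_word} ({label})")
```

In this toy example the sentence is split into two chunks, so each rule lookup only has to consider a few nearby tokens instead of the whole sentence, which is the intuition behind using chunks to make long sentences easier to parse. A real rule set would be extracted from manually annotated data and would need safeguards (for example against cyclic attachments) that this sketch omits.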