A high-quality, standardized corpus is an important foundation for entity relationship extraction research and can improve the precision and recall of the extraction task. At present, Tibetan relationship extraction corpora are mostly built by traditional manual annotation and are limited to specific domains, which leads to low annotation efficiency and a relative lack of person-relationship data. This paper therefore constructs a Tibetan person-entity recognition corpus. By analyzing person-relationship features, entity-relationship categories, and their annotation specifications, and by constructing a trigger word dictionary for corpus back-labeling, it generates 15,400 annotated entity recognition samples and 8,000 annotated Tibetan person-relationship extraction samples. To verify the usability of the corpus, named entity recognition and relationship extraction experiments are carried out and analyzed statistically: the entity recognition F1 value reaches 67.2% and the relationship extraction F1 value reaches 66.2%, which shows that the corpus provides a data basis for subsequent research on Tibetan person relationship extraction.
Corpus; Character relationship extraction; Tibetan text; Trigger words