Study on Identification of Tibetan News Elements Based on RoBERTa-BiLSTM-CRF
News element recognition is the process of extracting key information entities such as time, location, people, organizations, and events from news texts, and it serves as the foundation for news content analysis. While significant progress has been made in Chinese news element recognition, few studies have addressed Tibetan news, and the existing element classification systems are rather coarse, making it difficult to comprehensively cover the key information in Tibetan news reports. In this paper, we therefore refine the element classification of Tibetan news into 10 categories. To address the challenges of Tibetan news texts, such as unclear word boundaries, numerous out-of-vocabulary words, and word polysemy, we propose a Tibetan news element recognition method based on RoBERTa-BiLSTM-CRF. The method first encodes Tibetan news texts with the RoBERTa pre-trained language model, then extracts features through a BiLSTM and a self-attention mechanism, and finally employs a conditional random field (CRF) for sequence labeling to complete the recognition and classification of news elements. Experiments on our self-built Tibetan news dataset demonstrate the effectiveness of this method, which achieves an F1 score of 88.8%.
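To illustrate the final sequence-labeling step, the following is a minimal sketch of Viterbi decoding, the standard way to pick the best tag sequence from a linear-chain CRF. It is not the authors' implementation: the BIO tag set (`O`, `B-LOC`, `I-LOC`), the emission scores (standing in for the per-token outputs of the RoBERTa, BiLSTM, and self-attention layers), and the transition scores are all hypothetical toy values chosen for illustration.

```python
from typing import List, Sequence

def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j]   : score of tag j at token position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    score = list(emissions[0])          # best score ending in each tag so far
    backpointers: List[List[int]] = []  # best previous tag for each (t, tag)

    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_tags):
            # Best previous tag i for reaching tag j at position t.
            best_i = max(range(num_tags),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j]
                             + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final tag.
    best = max(range(num_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path

# Hypothetical toy tag set and scores (assumptions, not from the paper).
TAGS = ["O", "B-LOC", "I-LOC"]
FORBIDDEN = -1e4  # large penalty: I-LOC may not directly follow O
transitions = [
    [0.0, 0.0, FORBIDDEN],  # from O
    [0.0, 0.0, 0.0],        # from B-LOC
    [0.0, 0.0, 0.0],        # from I-LOC
]
emissions = [
    [1.0, 0.8, 0.0],  # token 1: greedy choice would be O
    [0.0, 0.2, 1.5],  # token 2: greedy choice would be I-LOC
    [2.0, 0.0, 0.1],  # token 3: greedy choice would be O
]

decoded = [TAGS[i] for i in viterbi_decode(emissions, transitions)]
print(decoded)
```

In this toy case, picking the highest emission score per token independently would give `O, I-LOC, O`, which is not a valid BIO sequence. Decoding jointly with the transition scores instead selects `B-LOC, I-LOC, O`, which is the kind of label consistency a CRF layer is meant to contribute on top of the encoder's features.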