[Objective] This paper proposes an integrated model that incorporates radical information to address the low accuracy and efficiency of existing automatic word segmentation and part-of-speech tagging for Classical Chinese. [Methods] Based on over 70,000 Chinese characters and their radicals, we constructed a radical vector representation model, Radical2Vector. We combined this model with SikuRoBERTa to represent Classical Chinese texts, forming an integrated BiLSTM-CRF model as the main experimental framework. Additionally, we designed a dual-layer scheme for word segmentation and part-of-speech tagging. Finally, we conducted experiments on the Zuo Zhuan dataset. [Results] The model achieved an F1 score of 95.75% on the word segmentation task and 91.65% on the part-of-speech tagging task. These scores represent improvements of 8.71% and 13.88%, respectively, over the baseline model. [Limitations] The approach incorporates only a single radical for each character and does not utilize other components of the characters. [Conclusions] The proposed model successfully integrates radical information, effectively enhancing the textual representation of Classical Chinese. The model demonstrates exceptional performance in word segmentation and part-of-speech tagging tasks.
Word Segmentation; Part-of-Speech Tagging; Ancient Chinese Information Processing