New Findings from Nanjing Agricultural University Describe Advances in Machine Learning (Automatic Sentence Segmentation for Classical Chinese: the Spring and Autumn Annals As an Example)
Oxford Univ Press
Current study results on Machine Learning have been published. According to news reporting originating in Nanjing, People's Republic of China, by NewsRx journalists, research stated, “There exists no sentence boundary in most classical Chinese literature texts. Since it is difficult to read literature of this kind, experts in literature or linguistics would segment the sentence manually.”

Financial support for this research came from the National Office of Philosophy and Social Sciences.

The news reporters obtained a quote from the research from Nanjing Agricultural University: “This article explores the effectiveness of classical Chinese sentence segmentation method so as to provide a reference for classical Chinese punctuation. On the basis of the machine learning methods, we chose three components of machine learning, namely models, tagging schemes, and features, to compare the learning results. The models include conditional random field (CRF) models, long short term memory (LSTM) models, BiLSTM-CRF models, and three Bidirectional Encoder Representation from Transformers (BERT) models. There are five tagging schemes in this article and three features including the statistical feature, Guangyun, and Fanqie. Finally, the performance of the combined feature template is evaluated by ten-fold cross-validation on four classical Chinese texts in different genres. The SikuBERT model is proved to be the most effective model for sentence segmentation at present. Different tagging schemes and various features are introduced. The results show that 5-tag-J tagging schemes can improve performance. Statistical feature, as an important clue for classical Chinese sentence segmentation, is useful in related tasks, but Guangyun and Fanqie have little impact.”
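The quoted abstract treats sentence segmentation as character-level tagging and evaluates with ten-fold cross-validation. The following is a minimal Python sketch of that general idea, not the authors' implementation. It assumes a simple two-tag scheme ("E" for a character that ends a sentence, "O" otherwise), which is a stand-in chosen for illustration; the article's five tagging schemes, including 5-tag-J, are not defined in this brief and are not reproduced here. The function names and the sample sentences are hypothetical.

```python
from typing import List, Tuple


def sentences_to_tags(sentences: List[str]) -> List[Tuple[str, str]]:
    """Pair each character with a tag under an assumed two-tag scheme:
    'E' marks the last character of a sentence, 'O' marks all others."""
    pairs = []
    for sent in sentences:
        for i, ch in enumerate(sent):
            pairs.append((ch, "E" if i == len(sent) - 1 else "O"))
    return pairs


def tags_to_sentences(chars: List[str], tags: List[str]) -> List[str]:
    """Rebuild sentences from an unsegmented character stream and its tags."""
    sentences, current = [], []
    for ch, tag in zip(chars, tags):
        current.append(ch)
        if tag == "E":
            sentences.append("".join(current))
            current = []
    if current:  # trailing characters with no closing 'E' tag
        sentences.append("".join(current))
    return sentences


def k_fold_indices(n_items: int, k: int = 10) -> List[List[int]]:
    """Split item indices into k disjoint held-out folds (round-robin)."""
    return [list(range(i, n_items, k)) for i in range(k)]


if __name__ == "__main__":
    # Illustrative input only: two short sentences with punctuation removed.
    gold = ["元年春王正月", "三月公及邾儀父盟于蔑"]
    pairs = sentences_to_tags(gold)
    chars = [c for c, _ in pairs]
    tags = [t for _, t in pairs]
    print("".join(tags))
    print(tags_to_sentences(chars, tags))
    print(k_fold_indices(25, k=10)[0])
```

In a setup like the one described, a sequence model (for example a CRF or a BERT-style tagger) would predict the tag sequence from the unsegmented characters, and each of the ten folds would serve once as the held-out test set.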
Keywords: Nanjing, People's Republic of China, Asia, Cyborgs, Emerging Technologies, Machine Learning, Nanjing Agricultural University