Intelligent Completion of Ancient Texts Based on Pre-trained Language Models
[Objective] This paper proposes a new method for completing ancient texts based on pre-trained language models. It uses representations that pre-trained models produce at different semantic levels and for both simplified and traditional Chinese characters, and builds a mixture-of-experts system and a simplified-traditional Chinese fusion model on top of them. [Methods] We designed the mixture-of-experts model for transmitted texts and constructed the simplified-traditional Chinese character fusion model for excavated literature. We integrated and explored the models' capabilities in these different scenarios to improve their ability to complete ancient texts. [Results] We evaluated the new models on self-constructed datasets of transmitted and excavated texts. The models achieved accuracies of 70.14% and 57.13%, respectively, on the completion task. [Limitations] We only used natural language processing approaches. Future improvements could leverage multimodal techniques, combining computer vision with natural language processing to integrate image and semantic information and yield better results. [Conclusions] The proposed models achieve high accuracy on the constructed datasets of ancient literature, providing a competitive solution for completing ancient texts.
Digitization of Ancient Books; Pre-trained Language Models; Mixture-of-Experts Systems
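
As a rough illustration of the mixture-of-experts idea named in the abstract, the sketch below combines the candidate-character distributions of several experts into a single prediction for one missing character. The abstract does not describe the gating mechanism, the experts, or how their outputs are merged, so everything here is an assumption made for illustration: the function names (`softmax`, `mix_experts`, `complete_char`), the softmax gate, and the toy probabilities are hypothetical and are not the authors' implementation.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Turn raw gate scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_experts(expert_probs: List[Dict[str, float]],
                gate_scores: List[float]) -> Dict[str, float]:
    """Weighted sum of per-expert character distributions.

    expert_probs: one dict per expert, mapping candidate character
                  to that expert's probability (hypothetical inputs).
    gate_scores:  one raw score per expert (assumed softmax gate).
    """
    if len(expert_probs) != len(gate_scores):
        raise ValueError("need exactly one gate score per expert")
    weights = softmax(gate_scores)
    mixed: Dict[str, float] = {}
    for w, probs in zip(weights, expert_probs):
        for ch, p in probs.items():
            mixed[ch] = mixed.get(ch, 0.0) + w * p
    return mixed


def complete_char(expert_probs: List[Dict[str, float]],
                  gate_scores: List[float]) -> str:
    """Return the candidate character with the highest mixed probability."""
    mixed = mix_experts(expert_probs, gate_scores)
    return max(mixed, key=mixed.get)


if __name__ == "__main__":
    # Toy distributions from two hypothetical experts for one missing character.
    experts = [{"之": 0.6, "也": 0.4}, {"之": 0.2, "也": 0.8}]
    print(complete_char(experts, [0.0, 0.0]))   # equal gate weights
    print(complete_char(experts, [10.0, 0.0]))  # gate strongly favors expert 1
```

With equal gate scores the two experts are averaged; a larger score for one expert shifts the mixed distribution toward that expert's prediction. In a real system the gate scores would themselves be produced by a learned component rather than supplied by hand.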