Research on Unsupervised Automatic Discovery of Intertextuality Based on Large Language Models for Ancient Books
[Purpose/Significance] Pre-Qin classics are highly intertextual. To study them, this paper establishes an unsupervised process, based on large language models, for automatically discovering intertextuality. The process supports content analysis and textual research on ancient books and improves work efficiency. [Method/Process] The paper comparatively analyzed the technical routes of existing language models and introduced a contrastive learning framework to train intertextuality discovery models in an unsupervised manner. It then constructed an idiom origin tracing task to evaluate a series of models and selected the best-performing one. [Result/Conclusion] The idiom origin tracing results show that current chat LLMs still make a large number of factual errors. The ESimCSE-GujiRoBERTa model achieved the best results on the idiom origin tracing task. It also shows excellent semantic discrimination when recognizing intertextual citations in the classics of the pre-Qin masters. In addition, the intertextuality identification results for the "Chun Qiu San Zhuan" (the three commentaries on the Spring and Autumn Annals) indicate that automatic intertextuality discovery can offer a useful perspective for the textual research and collation of ancient books.
Keywords: ancient book intertextuality; language model; unsupervised learning; contrastive learning
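The abstract names a contrastive learning framework (ESimCSE) for unsupervised training and an idiom origin tracing task for evaluation, but it gives no implementation details. The following pure-Python sketch shows the general idea under stated assumptions and is not the authors' code. ESimCSE builds a positive pair by repeating some tokens of a sentence, and it trains with an in-batch InfoNCE loss over cosine similarities. Origin tracing is treated here as nearest-neighbour retrieval. The function names (`word_repetition`, `info_nce_loss`, `trace_origin`), the `dup_rate` of 0.32 and the temperature of 0.05 are illustrative assumptions. The short vectors stand in for sentence embeddings that a model such as GujiRoBERTa would produce. The momentum-queue negatives of the full ESimCSE method are omitted.

```python
import math
import random


def word_repetition(tokens, dup_rate=0.32, rng=None):
    """Simplified ESimCSE-style positive construction (assumed parameters):
    repeat a random subset of tokens so the positive changes length
    without changing meaning."""
    rng = rng or random.Random(0)
    if not tokens:
        return []
    max_dup = max(1, int(dup_rate * len(tokens)))
    n_dup = rng.randint(0, max_dup)
    dup_idx = set(rng.sample(range(len(tokens)), n_dup))
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i in dup_idx:
            out.append(tok)  # duplicated token
    return out


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def info_nce_loss(anchors, positives, temperature=0.05):
    """In-batch InfoNCE: positives[i] is the positive for anchors[i];
    every other positives[j] in the batch serves as a negative."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)  # subtract max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]  # -log softmax of the true pair
    return total / len(anchors)


def trace_origin(query_vec, candidate_vecs):
    """Hypothetical origin tracing: index of the most similar candidate source."""
    scores = [cosine(query_vec, c) for c in candidate_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    print(word_repetition(list("學而時習之"), rng=random.Random(42)))
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[0.9, 0.1], [0.1, 0.9]]  # each positive matches its anchor
    swapped = [[0.1, 0.9], [0.9, 0.1]]  # positives paired with the wrong anchor
    print(info_nce_loss(anchors, aligned) < info_nce_loss(anchors, swapped))  # True
    print(trace_origin([0.2, 0.8], [[1.0, 0.0], [0.0, 1.0]]))  # 1
```

The loss is low when each anchor is closest to its own augmented positive and high when the pairing is wrong. This is the property that lets contrastive training learn semantic discrimination without labelled intertextual pairs.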