A Study on Automatic Categorization of the Siku Quanshu Based on a Large Language Model
The surge of research interest in ancient books and the contemporary drive to revitalize them have placed higher demands on the automatic classification of ancient books. This study examines how well the Xunzi series of large language models for ancient books performs on this task. It applies current cutting-edge large language models to an input corpus covering 25 categories from the History and Classics sections of the Siku Quanshu. Comparison experiments against their base models show that the Xunzi models have a clear advantage in automatic ancient-book classification. Among them, Xunzi-Baichuan2-7B performs best, with an overall classification F1 score of 96.90%. In addition, experiments that vary the size of the training data show that Xunzi-Baichuan2-7B achieves classification results comparable to those of its base model while using only a small amount of training data. The automatic classification model proposed in this study, built on the Xunzi large language models for ancient books, can therefore classify ancient books efficiently at a fine-grained level. It also opens up a new approach to ancient-book classification in resource-constrained settings.
Siku Quanshu; Classification models; Xunzi large language model; Automatic text classification
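The abstract reports an overall F1 score of 96.90% across 25 categories. It does not say how the per-category scores are combined into that single figure. As a hedged illustration, the sketch below assumes single-label classification, where each text receives exactly one category. It computes per-category F1 and then both macro and micro averages. The function names and the three romanized category labels are hypothetical toy values chosen for this example. They are not the study's code or data.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} from parallel lists of gold and predicted labels."""
    labels = set(y_true) | set(y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1          # correct prediction for this category
        else:
            fp[pred] += 1          # predicted category was wrong
            fn[gold] += 1          # gold category was missed
    scores = {}
    for lab in labels:
        p_den = tp[lab] + fp[lab]
        r_den = tp[lab] + fn[lab]
        precision = tp[lab] / p_den if p_den else 0.0
        recall = tp[lab] / r_den if r_den else 0.0
        denom = precision + recall
        scores[lab] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-category F1 (every category counts equally)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def micro_f1(y_true, y_pred):
    """Micro-averaged F1; for single-label data this equals accuracy."""
    correct = sum(g == p for g, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical toy labels standing in for Siku Quanshu categories.
gold = ["Yi", "Shu", "Yi", "Zhengshi"]
pred = ["Yi", "Yi", "Yi", "Zhengshi"]

print(per_class_f1(gold, pred))
print(macro_f1(gold, pred), micro_f1(gold, pred))
```

The two averages can differ noticeably when some categories are small. Macro-F1 penalizes the missed "Shu" text as heavily as an error on a large category, while micro-F1 weights every text equally. With 25 categories of possibly uneven size, knowing which average is reported matters when comparing against other classification results.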