AIGC-Driven Research on Automatic Summarization of Ancient Classics: From Natural Language Understanding to Natural Language Generation
As a key task in natural language processing, automatic text summarization aims to compress long texts and alleviate information overload. Taking the biography corpus of the Twenty-four Histories as an example, this article explores feasible ways of automatically summarizing ancient texts with AIGC technology, using both extractive and generative approaches. It provides a reference for the creative transformation and innovative development of ancient classics resources, and helps to realize the value of ancient classics within the digital humanities. First, semantic representations are built with the GujiBERT, SikuBERT and BERT-ancient-Chinese models, and sentence importance is ranked with the LexRank algorithm. Second, summaries are generated with the GPT-3.5-turbo, GPT-4 and ChatGLM3 models, and fine-tuned versions of ChatGLM3 and GPT-3.5-turbo are developed. Finally, the extractive summaries are evaluated with information coverage rate and information diversity metrics, while the generated summaries are evaluated with the ROUGE and MAUVE metrics. The study shows that SikuBERT has the strongest ability in semantic representation and comprehension of ancient texts in the extractive summarization task. Common large language models each show distinctive strengths in automatic summarization of ancient classics, but they lack the ability to summarize the main ideas. The summarization capability of the GPT-3.5-turbo and ChatGLM3 models can be effectively improved by fine-tuning on small-sample datasets.
re-valuing ancient classics; automatic summarization; SikuBERT; large language models
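
The extractive step described in the abstract, ranking sentences by LexRank over model-derived semantic representations, can be illustrated with a minimal sketch. It assumes the sentence embeddings have already been computed (for example by SikuBERT) and are passed in as plain lists of floats. The function names, the damping factor of 0.85, and the similarity threshold of 0.1 are illustrative assumptions, not settings reported in the study.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def lexrank(embeddings, damping=0.85, threshold=0.1, iters=100, tol=1e-8):
    """Score sentences by damped power iteration on a similarity graph.

    `damping` and `threshold` are assumed values chosen for illustration.
    """
    n = len(embeddings)
    # Build the similarity graph; weak or negative edges are dropped.
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                s = cosine(embeddings[i], embeddings[j])
                sim[i][j] = s if s >= threshold else 0.0
    # Row-normalize into transition probabilities; an isolated sentence
    # falls back to a uniform row so every row still sums to 1.
    trans = []
    for row in sim:
        total = sum(row)
        trans.append([x / total for x in row] if total > 0 else [1.0 / n] * n)
    # Power iteration, as in PageRank.
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = [
            (1 - damping) / n
            + damping * sum(scores[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


def top_k(sentences, embeddings, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = lexrank(embeddings)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

In this sketch a sentence scores highly when it is similar to many other sentences that are themselves central, so the choice of embedding model directly shapes the ranking. That is one reason comparing GujiBERT, SikuBERT and BERT-ancient-Chinese under the same ranking procedure is informative.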