Performance Evaluation of Chinese General-Purpose Large Language Models in the Field of Humanities and Social Sciences
[Purpose/Significance] Focusing on the humanities and social sciences, this paper compares the performance of large language models in this field along two dimensions: basic knowledge and academic texts. It aims to provide a systematic large language model evaluation benchmark for the humanities and social sciences, and a reference for researchers in related fields. [Method/Process] Seven evaluation tasks related to the humanities and social sciences were designed, and corresponding indicators were selected for each. On this basis, current open-source, high-performance, general-purpose Chinese large language models were selected and run locally to complete the domain-specific tasks in question-and-answer form, and their performance in the humanities and social sciences was quantitatively evaluated using the selected indicators. [Result/Conclusion] The evaluation results show that, among the open-source models selected in this paper, Qwen performs best, followed by Baichuan2 and InternLM, and Atom performs worst, among both the base models and the dialog models. Moreover, in most cases the dialog model outperforms the corresponding base model.
humanities and social sciences; large model evaluation; domain knowledge; academic texts