Performance Evaluation of Chinese General-Purpose Large Models in the Humanities and Social Sciences

Zhao Zhixiao (赵志枭)¹, Hu Die (胡蝶)¹, Liu Chang (刘畅)¹, Shen Si (沈思)², Wang Dongbo (王东波)¹

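Author information

1. College of Information Management, Nanjing Agricultural University, Nanjing 210095; Research Center for Humanities and Social Computing, Nanjing Agricultural University, Nanjing 210095
2. School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094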

Abstract

[Purpose/Significance] Taking the humanities and social sciences as its starting point, this paper compares model performance in this field from two aspects: basic domain knowledge and academic texts. It aims to provide a systematic large language model evaluation benchmark for the humanities and social sciences, for reference by researchers in related fields. [Method/Process] Seven evaluation tasks related to the humanities and social sciences were designed and corresponding metrics were selected. On this basis, currently available open-source, high-performing general-purpose Chinese large language models were selected. The domain-specific tasks were completed in question-answering form by invoking local models, and the models' performance in the humanities and social sciences was quantitatively evaluated with the selected metrics. [Result/Conclusion] The evaluation results show that, among the selected open-source models, for both base models and dialogue models, Qwen performs best, followed closely by Baichuan2, then InternLM, with Atom performing worst. Moreover, in most cases, dialogue models outperform their base-model counterparts.

Keywords

humanities and social sciences/large model evaluation/domain knowledge/academic texts

Funding

Jiangsu Provincial Social Science Fund Post-funding Project (23HQBO63)

Publication year

2024

Library and Information Service (图书情报工作)

National Science Library, Chinese Academy of Sciences

Indexed in: CSTPCD, CSSCI, CHSSCD, Peking University Core Journals

Impact factor: 2.203

ISSN: 0252-3116

Number of references: 9
