Abstract
[Purpose/significance] This paper presents an experimental evaluation of the correctness of Chinese question answering by large language models, aiming to provide guidance for Chinese-speaking users of these models. [Method/process] Across six domains (science and technology, education, medicine, daily life, tourism and food, and philosophy and culture), three types of questions were designed: common-sense, professional, and open-ended, with 20 questions of each type per domain, for a total of 360 questions. The questions were posed to ChatGPT 3.5, Claude 1.0, and Wenxin Yiyan 2.1, and the correctness of each answer was evaluated manually. The evaluation results were then aggregated for a multi-faceted comparative analysis of correctness. [Result/conclusion] The experimental analysis indicates that the scale and quality of Chinese corpus data, as well as the parameter scale of the large language model, are important factors influencing the correctness of Chinese question answering by large language models.