A Comparative Experimental Study on the Accuracy of Chinese Question-Answering in Large Language Models: A Case Study of ChatGPT 3.5, Claude 1.0, and Wenxinyiyan 2.1

[Purpose/significance] This paper reports an experimental evaluation of the accuracy of Chinese question answering by large language models, aiming to provide guidance for Chinese-speaking users of such models. [Method/process] Across six fields (science and technology, education, medicine, daily life, travel and food, and philosophy and culture), three types of questions were designed for each field: common-sense, professional, and open-ended, with 20 questions per type, for a total of 360 questions. Each question was posed to ChatGPT 3.5, Claude 1.0, and Wenxinyiyan 2.1, and the correctness of each answer was then evaluated manually. Finally, the evaluation results were aggregated and compared from multiple perspectives. [Result/conclusion] The analysis indicates that the scale and quality of Chinese corpus data, together with the parameter scale of the model, are important factors affecting the accuracy of Chinese question answering in large language models.
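For readers who want to see the aggregation step concretely, the following is a minimal sketch, assuming the 360 manual correctness judgments per model are available as (model, domain, question type, correct) records; the field names and the correctness_rates helper are illustrative, not the authors' actual tooling.

```python
from collections import defaultdict

# Experimental design described in the abstract:
# 6 fields x 3 question types x 20 questions = 360 questions per model.
DOMAINS = ["science_tech", "education", "medicine", "daily_life",
           "travel_food", "philosophy_culture"]
QUESTION_TYPES = ["common_sense", "professional", "open_ended"]
MODELS = ["ChatGPT 3.5", "Claude 1.0", "Wenxinyiyan 2.1"]
QUESTIONS_PER_CELL = 20
assert len(DOMAINS) * len(QUESTION_TYPES) * QUESTIONS_PER_CELL == 360

def correctness_rates(ratings):
    """ratings: iterable of (model, domain, qtype, is_correct) tuples
    produced by the manual evaluation. Returns accuracy per model,
    per (model, domain), and per (model, question type)."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for model, domain, qtype, is_correct in ratings:
        for key in [(model,), (model, domain), (model, qtype)]:
            totals[key] += 1
            correct[key] += int(is_correct)
    return {key: correct[key] / totals[key] for key in totals}

# Example with a single dummy judgment; in the study each model would
# contribute 360 such records.
rates = correctness_rates([("ChatGPT 3.5", "medicine", "professional", True)])
print(rates[("ChatGPT 3.5",)])  # overall accuracy for that model
```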

large language model; Chinese question-answering; experimental study

唐明伟、陈宙、丁晗萱、朱翼、顾明辉、陈羽

School of Computer Science, Nanjing Audit University, Nanjing, Jiangsu 211815, China

2024

情报探索
福建省科技情报学会,福建省科技信息研究所

CHSSCD
Impact factor: 0.52
ISSN: 1005-8095
Year, Volume (Issue): 2024 (7)