A Comparative Experimental Study on the Accuracy of Chinese Question-Answering in Large Language Models: A Case Study of ChatGPT 3.5, Claude 1.0, and Wenxinyiyan 2.1
[Purpose/significance] This paper conducts an experimental evaluation of the accuracy of Chinese question-answering in large language models, aiming to provide guidance for Chinese users of these models. [Method/process] Focusing on six fields (science and technology, education, medicine, life, tourism and food, and philosophy and culture), the study designs three types of questions: common-sense, professional, and open-ended, with 20 questions for each question type in each field, for a total of 360 questions. The questions are posed to ChatGPT 3.5, Claude 1.0, and Wenxinyiyan 2.1 respectively, and the correctness of the answers is evaluated manually. Finally, the evaluation results are summarized, and answer accuracy is compared and analyzed from multiple perspectives. [Result/conclusion] The experimental analysis indicates that the scale and quality of Chinese corpus data and the parameter scale of a large language model are important factors influencing the accuracy of its Chinese question-answering.
large language model; Chinese question-answering; experimental study