The emergence of large language models (LLMs) has had a broad impact on natural language processing. Existing studies have shown that LLMs exhibit excellent zero-shot and few-shot capabilities on various downstream tasks, yet evaluation of their semantic analysis capabilities remains scarce. Therefore, based on the three subtasks of Chinese frame semantic analysis (frame identification, argument identification, and role identification), this paper evaluates the semantic analysis capabilities of three LLMs, ChatGPT, Gemini, and ChatGLM, on the CFN2.0 dataset under zero-shot and few-shot settings, and compares them with the current SOTA (state-of-the-art) model based on BERT (Bidirectional Encoder Representations from Transformers). On the frame identification task, the accuracy of the LLMs is only 0.04 lower than that of the SOTA model; on the argument identification and role identification tasks, however, the LLMs perform poorly, with F1 scores 0.13 and 0.39 below the SOTA model, respectively. These results indicate that while LLMs possess a certain degree of frame semantic analysis capability, further improving their semantic analysis ability remains a challenging task.
Evaluation of Chinese Frame Semantic Analysis Capabilities of Large Language Models
The emergence of large language models (LLMs) has had a widespread impact on natural language processing. Studies have shown that LLMs have excellent zero-shot and few-shot capabilities in various downstream tasks, but evaluation of the semantic analysis capabilities of LLMs is still lacking. Therefore, based on three subtasks in Chinese frame semantic analysis (frame identification, argument identification, and role identification), this paper evaluates the semantic analysis capabilities of three LLMs, namely ChatGPT, Gemini, and ChatGLM, on the CFN2.0 dataset under zero-shot and few-shot settings, and compares them with the current BERT-based SOTA model. In the frame identification task, the accuracy of the LLMs is only 0.04 lower than that of the SOTA model. However, in the argument identification and role identification tasks, the performance of the LLMs is suboptimal, with F1 scores 0.13 and 0.39 lower than those of the SOTA model, respectively. These results show that although LLMs possess certain frame semantic analysis capabilities, further improving the semantic analysis capabilities of LLMs remains a challenging task.
Keywords: large language model; frame identification; argument identification; role identification
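The abstract reports span-based F1 gaps for argument and role identification. As an illustration only (the paper's exact scoring script is not given here), a minimal sketch of the standard span-level F1 commonly used for such tasks, assuming gold and predicted arguments are represented as hashable span tuples:

```python
def span_f1(gold_spans, pred_spans):
    """Span-level precision/recall/F1: a span counts as correct
    only if it exactly matches a gold span."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)                       # exact-match true positives
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with gold spans [(0, 2), (3, 5)] and a single correct prediction [(0, 2)], precision is 1.0 and recall is 0.5, giving F1 of about 0.667. Whether the paper scores spans alone or span-plus-role pairs for role identification is an assumption left to the full text.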