A Multi-Modal Assessment Framework for Comparison of Specialized Deep Learning and General-Purpose Large Language Models

Abstract: Recent years have witnessed tremendous advancements in AI tools (e.g., ChatGPT, GPT-4, and Bard), driven by the growing power, reasoning, and efficiency of Large Language Models (LLMs). LLMs have been shown to excel in tasks ranging from poem writing and coding to essay generation and puzzle solving. Despite their proficiency in general queries, specialized tasks such as metaphor understanding and fake news detection often require finely tuned models, posing a comparison challenge with specialized Deep Learning (DL). We propose an assessment framework to compare task-specific intelligence with general-purpose LLMs on suicide and depression tendency identification. For this purpose, we trained two DL models on a suicide and depression detection dataset, followed by testing their performance on a test set. Afterward, the same test dataset is used to evaluate the performance of four LLMs (GPT-3.5, GPT-4, Google Bard, and MS Bing) using four classification metrics. The BERT-based DL model performed the best among all, with a testing accuracy of 94.61%, while GPT-4 was the runner-up with an accuracy of 92.5%. Results demonstrate that LLMs do not outperform the specialized DL models but are able to achieve comparable performance, making them a decent option for downstream tasks without specialized training. However, LLMs outperformed specialized models on the reduced dataset.
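
Illustrative note: the abstract mentions four classification metrics without naming them. The sketch below assumes they are accuracy, precision, recall, and F1 for a binary label (1 = at-risk text, 0 = other), which is a common choice but is not confirmed by this record. The function name `binary_metrics` and the label encoding are hypothetical, and the sample labels are made up for illustration, not taken from the paper.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for binary labels (0 or 1).

    Assumption: these are the four metrics meant by the abstract; the record
    itself does not list them.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up labels for illustration only.
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    preds = [1, 1, 0, 0, 0, 1, 1, 0]
    print(binary_metrics(truth, preds))
```

Applying the same function to the predictions of each DL model and each LLM on one shared test set is what makes the reported accuracies directly comparable.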

Keywords:

Deep learning; Chatbots; Testing; Depression; Large language models; Internet; Hands; Fake news; Big Data; Data models

Authors:

Mohammad Nadeem, Shahab Saquib Sohail, Dag Øivind Madsen, Ahmed Ibrahim Alzahrani, Javier Del Ser, Khan Muhammad

Author affiliations:

Department of Computer Science, Aligarh Muslim University, Aligarh, India

School of Computing Science and Engineering, VIT Bhopal University, Sehore, India

University of South-Eastern Norway, Notodden, Norway

Computer Science Department, Community College, King Saud University, Riyadh, Saudi Arabia

TECNALIA, Basque Research & Technology Alliance (BRTA), Derio, Spain; Department of Mathematics, University of the Basque Country (UPV/EHU), Leioa, Spain

Visual Analytics for Knowledge Laboratory (VIS2KNOW Lab), Department of Applied Artificial Intelligence, School of Convergence, College of Computing and Informatics, Sungkyunkwan University, Seoul, South Korea

Publication year:

2025

DOI:

10.1109/TBDATA.2025.3536937

IEEE Transactions on Big Data

ISSN:

Year, volume (issue): 2025, 11(3)

Number of references: 57