Application and Measurement Validity Evaluation of Generative Artificial Intelligence in Content Analysis
This study explores the application prospects and potential validity loss of generative artificial intelligence (AI) models such as GPT in content analysis research. Analyzing Chinese and English social media texts related to climate change, it systematically evaluates how GPT's measurement validity in coding three core concepts of journalism and communication studies (i.e., cognition, emotion, and stance) differs across several dimensions: language/dataset, prompt-tuning strategy, and GPT model version. It also examines the potential reasons behind these differences. Findings reveal that GPT tends to over-interpret textual content and shows a bias toward "neutral" texts. In multidimensional comparisons, no significant cross-linguistic/dataset differences were found, and GPT-4 shows higher measurement validity than GPT-3.5 in some categories. The study further shows that a prompt-tuned GPT model can improve coding accuracy to some extent, but that introducing more examples may lead to a certain degree of validity loss. Finally, the research finds that word- and semantic-level features of a text can affect GPT's measurement validity.
Keywords: GPT; Large Language Model; Content Analysis; Generative AI; Validity
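To make the coding setup concrete, below is a minimal sketch of how zero-shot versus few-shot (prompt-tuned) stance coding might be implemented with the OpenAI Python client. The prompt wording, label set, model names, and example texts are illustrative assumptions, not the study's actual coding instrument.

```python
# Minimal sketch of zero-shot vs. few-shot stance coding via the OpenAI
# Python client. Labels, prompt wording, and examples are hypothetical
# stand-ins for the study's coding scheme.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABELS = ["supportive", "neutral", "opposed"]  # assumed stance categories


def code_stance(text: str,
                examples: list[tuple[str, str]] | None = None,
                model: str = "gpt-4") -> str:
    """Ask the model to assign one stance label to a social media post."""
    messages = [{
        "role": "system",
        "content": ("You are a content-analysis coder. Classify the stance "
                    "of each post toward climate action as one of: "
                    f"{', '.join(LABELS)}. Reply with the label only."),
    }]
    # Few-shot variant: prepend labeled examples as prior conversation turns.
    for ex_text, ex_label in examples or []:
        messages.append({"role": "user", "content": ex_text})
        messages.append({"role": "assistant", "content": ex_label})
    messages.append({"role": "user", "content": text})
    resp = client.chat.completions.create(model=model,
                                          messages=messages,
                                          temperature=0)
    return resp.choices[0].message.content.strip().lower()


# Zero-shot call:
#   code_stance("Carbon taxes will wreck the economy.")
# Few-shot call (per the abstract, adding more examples does not
# monotonically improve validity):
#   code_stance("We need renewables now.",
#               examples=[("The IPCC report is alarmist.", "opposed")])
```

Comparing such zero-shot and few-shot runs against human-coded labels, per language and per model version, is one way the validity differences described in the abstract could be measured.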