
Application of GPT in Textual Analysis: An Introduction to a Community-Developed Command within Stata

In the era of digital intelligence, mining the value of textual data in depth is increasingly important, and the emergence of large language models offers a new opportunity to do so. Within the framework of empirical research empowered by artificial intelligence, this paper first analyzes the limitations of traditional text-analysis methods in the literature, such as dictionary-based approaches, text similarity, and supervised machine learning; it then discusses the advantages of large language models and their empowering role in empirical research, particularly in textual analysis. On this basis, using the Stata command chatgpt written by the authors, a series of cases demonstrates GPT's core advantages in improving the efficiency of text-data processing, optimizing the characterization capability of text-based indicators, enhancing their measurement accuracy, and enriching their informational content. The paper argues that large language models will greatly unleash the value of textual data and hold enormous application potential for empirical research based on textual analysis.
In the context of the "Credibility Revolution" over the past four decades, empirical research has emerged as the predominant research paradigm in modern economics. With the rapid advancement of information technology, textual data available for empirical analysis, such as listed-company annual reports, analyst research reports, news commentaries, social media, and earnings conference calls, are becoming increasingly abundant. This provides new perspectives on classic research questions in finance and accounting and offers new solutions to previously challenging problems. However, constrained by the limitations of traditional text-analysis techniques, existing text metrics have consistently struggled to achieve satisfactory performance in terms of result credibility, cost, and repeatability. The academic community continues to anticipate a more effective text-processing tool to facilitate the integration of textual information into empirical research.

As the latest significant technological achievement in the field of artificial intelligence, large language models have demonstrated powerful capabilities across a wide range of natural language processing tasks and have had a significant impact on the entire field of artificial intelligence. Unlike traditional natural language processing, large language models focus more on building a universal, intelligent, and smoothly interactive processing system. Recent applications such as ChatGPT and its API interfaces, which are based on OpenAI's large language model GPT (Generative Pre-trained Transformer), can generate responses based on patterns and statistical regularities observed during pre-training and can interact with conversations in context.

This study summarizes four application characteristics of large language models, namely their role as a "virtual agent," their role as a "knowledge repository," high indicator credibility, and low information-processing costs, demonstrating their empowering role in empirical research, particularly in textual analysis. Building on this foundation and using a self-written Stata command, chatgpt, which interacts with GPT from within Stata, we demonstrate through a series of cases how large language models can address the performance deficiencies of text metrics by improving text-data processing efficiency, enhancing the characterization capability of text indicators, increasing their measurement accuracy, and enriching their informational content. We use specifically designed cases, such as data cleaning, training-set labeling, and precise word segmentation, to illustrate the effects of integrating large language models into empirical research, and we further construct novel GPT-based measures of topic classification and tone. This approach significantly unleashes the value of textual data and advances empirical research based on textual analysis.

Large language models are currently in a rapid development phase. As societal demand and computational resources continue to grow and concentrate, they will accumulate data and experience at an accelerated pace, iterating toward more sophisticated versions. Their impact on empirical research therefore extends beyond the ecology of textual analysis, and novel research directions and perspectives will emerge in economics and management for researchers to explore. First, large language models should be fine-tuned. Given the specialized nature of textual content in fields such as finance and accounting, fine-tuning large language models on high-quality domain corpora can enhance their performance in those domains and is a necessary step to further promote academic research in textual analysis. Second, large language models should be applied in behavioral economics. Their "virtual agent" feature enables them to serve as virtual subjects, replacing human participants and significantly reducing experimental costs and difficulties in disciplines such as behavioral economics and economic psychology, thereby creating a broader space for academic exploration. Third, they should be used for economic forecasting. The ability of large language models to identify information in text extends beyond content processing to deep processing and the generation of incremental information based on their "knowledge repository" role. This study suggests that, when organically combined with existing models, this capability can inspire further expansion in areas such as asset pricing and macroeconomic forecasting.
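To make the kind of workflow described above concrete, the following is a minimal sketch of the request-and-parse cycle that a GPT-based text-analysis command wraps: a prompt and a piece of text are sent to OpenAI's chat-completion API and a classification label is read back. This is not the authors' chatgpt Stata command; the function name classify_tone, the model name, and the prompt wording are illustrative assumptions, and it assumes the official openai Python SDK (version 1.0 or later) with an API key in the OPENAI_API_KEY environment variable.

```python
# Illustrative sketch only, not the authors' implementation: label the tone of a
# disclosure sentence with a single GPT call. Assumes openai>=1.0 and OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def classify_tone(text: str, model: str = "gpt-4o-mini") -> str:
    """Ask the model to label a sentence as positive, negative, or neutral."""
    response = client.chat.completions.create(
        model=model,            # placeholder model name
        temperature=0,          # deterministic output aids replicability of the metric
        messages=[
            {"role": "system",
             "content": "You are a financial text analyst. Reply with exactly one "
                        "word: positive, negative, or neutral."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip().lower()


if __name__ == "__main__":
    print(classify_tone("本年度公司营业收入同比大幅增长,盈利能力显著提升。"))
```

Fixing the temperature at 0 is one way to keep the returned labels reproducible across runs, which speaks to the result-credibility and repeatability concerns raised above; in the paper's workflow the same cycle is issued from within Stata so that the labels can be merged directly into the estimation sample.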

Textual Analysis; GPT; Large Language Model; Natural Language Processing; Tone

Li Chuntao (李春涛), Yan Xuwen (闫续文), Zhang Xueren (张学人)


International Business School, Henan University

School of Finance, Zhongnan University of Economics and Law

Henan Yuneng Holdings Co., Ltd.

Economics and Management School, Wuhan University


Textual Analysis; GPT; Large Language Model; Natural Language Processing; Textual Tone

General Program of the National Natural Science Foundation of China

72072051

2024

The Journal of Quantitative & Technological Economics (数量经济技术经济研究)
Institute of Quantitative & Technological Economics


CSTPCD; CSSCI; CHSSCD; Peking University Core Journals (北大核心)
Impact Factor: 1.069
ISSN: 1000-3894
Year, Volume (Issue): 2024, 41(5)