Financial texts carry rich sentiment information, which is valuable for capturing fluctuations in financial market sentiment, supporting investor decision-making, and implementing financial risk management. However, sentiment annotation of financial texts requires extensive domain expertise, which makes manual annotation costly. This paper designs an automatic annotation strategy based on emoji-guided distant supervision: the sentiment connotations of emojis in financial texts are used to label sentiment polarity automatically, yielding a foundational labeled dataset. On this basis, a continual learning algorithm is employed to train a financial text sentiment classifier, which predicts sentiment for unlabeled data and generates pseudo-labeled samples that further augment the labeled dataset. The result is StockSentCN, a large-scale, automatically constructed Chinese financial sentiment analysis dataset containing over 9.23 million stock comments. In human evaluation, the Kappa consistency coefficient of the dataset reached 0.85 and the weighted average F1 score reached 90.34%, demonstrating the high quality and reliability of the constructed dataset. The dataset is publicly available at: https://github.com/lidayuls/StockSentCN/.
Chinese financial sentiment analysis; stock market sentiment; dataset construction; emojis; continual learning
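
The emoji-guided distant supervision step described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the emoji lexicon `EMOJI_POLARITY`, the two-class label set, the rule that a comment is labeled only when all of its emojis agree, and the function names `distant_label` and `build_seed_dataset`. None of these are taken from the paper, whose actual mapping and filtering rules may differ.

```python
from typing import List, Optional, Tuple

# Hypothetical emoji-to-polarity lexicon (assumption, not the paper's mapping).
EMOJI_POLARITY = {
    "😀": "positive",
    "🚀": "positive",
    "👍": "positive",
    "😭": "negative",
    "😡": "negative",
    "👎": "negative",
}


def distant_label(comment: str) -> Optional[str]:
    """Return a polarity label if all known emojis in the comment agree.

    Comments with no known emoji, or with emojis of conflicting polarity,
    get None and are left for a later stage (e.g. pseudo-labeling).
    """
    polarities = {EMOJI_POLARITY[ch] for ch in comment if ch in EMOJI_POLARITY}
    if len(polarities) == 1:
        return polarities.pop()
    return None


def build_seed_dataset(
    comments: List[str],
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split comments into an emoji-labeled seed set and an unlabeled pool."""
    labeled: List[Tuple[str, str]] = []
    unlabeled: List[str] = []
    for comment in comments:
        label = distant_label(comment)
        if label is None:
            unlabeled.append(comment)
        else:
            labeled.append((comment, label))
    return labeled, unlabeled
```

In this sketch, the unlabeled pool returned by `build_seed_dataset` corresponds to the data that a classifier trained on the seed set would later pseudo-label; that training loop is omitted because the abstract does not specify it.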