Abstract
Sentiment analysis of online text is an important method for studying online events and for understanding public attitudes and trends in public opinion. However, existing Chinese sentiment corpora are limited in their annotation schemes, sample sizes, and domain adaptability, which restricts their use in information processing. This study therefore focuses on comments about trending social events on Weibo and aims to build a high-frequency word sentiment corpus that has strong domain applicability and is annotated on affective dimensions by a large sample of raters. Online texts were collected with Python, then segmented and cleaned, and high-frequency words were selected from the result. A total of 63 valid participants rated the 661 selected high-frequency words on a 9-point scale along four affective dimensions: valence (pleasure), arousal, dominance, and approach-avoidance. The results of exploratory factor analysis show that the cumulative contribution rate of valence, arousal, and dominance exceeds 99%, indicating that the ratings capture the affective information of the words fairly comprehensively. In addition, factor analysis extracted two principal components: the first mainly reflects the degree to which readers experience positive or negative emotion and show the corresponding approach or avoidance tendency, and the second reflects the intensity of the emotion experienced and the degree to which it is under control. The corpus enriches the resources available for sentiment analysis of online text and can serve as an auxiliary tool for analyzing trending social events and forecasting public opinion, giving it both theoretical and practical value.
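The two data-preparation steps summarized in the abstract, selecting high-frequency words from segmented text and averaging participant ratings per word, can be illustrated with a minimal Python sketch. It is not the authors' code. The function names `top_frequent_words` and `mean_ratings`, the dimension labels, the tuple layout of a rating, and the 1 to 9 coding of the 9-point scale are all assumptions made for illustration, and the input is assumed to be already segmented, since the abstract does not name the segmentation tool.

```python
from collections import Counter, defaultdict
from statistics import mean

# Assumed labels for the four rated dimensions (hypothetical names).
DIMENSIONS = ("valence", "arousal", "dominance", "approach")


def top_frequent_words(tokens, stopwords, n):
    """Return the n most frequent tokens that are not stopwords.

    `tokens` is assumed to be the output of an earlier word-segmentation step.
    """
    counts = Counter(t for t in tokens if t not in stopwords)
    return [word for word, _ in counts.most_common(n)]


def mean_ratings(ratings):
    """Average ratings per word and dimension.

    `ratings` is an iterable of (participant_id, word, dimension, score)
    tuples; scores are assumed to be coded 1..9.
    """
    scores = defaultdict(list)
    for _participant, word, dimension, score in ratings:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension!r}")
        if not 1 <= score <= 9:
            raise ValueError(f"score outside the assumed 1..9 range: {score!r}")
        scores[(word, dimension)].append(score)

    # Collapse the per-participant scores into one mean per word and dimension.
    norms = defaultdict(dict)
    for (word, dimension), values in scores.items():
        norms[word][dimension] = mean(values)
    return dict(norms)


if __name__ == "__main__":
    tokens = ["支持", "的", "支持", "反对", "的", "支持", "反对", "调查"]
    print(top_frequent_words(tokens, stopwords={"的"}, n=2))

    ratings = [
        ("p1", "支持", "valence", 7),
        ("p2", "支持", "valence", 8),
        ("p1", "支持", "arousal", 5),
    ]
    print(mean_ratings(ratings))
```

Per-word means of this kind are the usual input to a subsequent principal component or factor analysis, which this sketch deliberately leaves out.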