On the Construction of a High-frequency Word Sentiment Corpus for Online Social Event Comments

Wang Yuhan¹, Liao Dan², Liang Guodong³

Author information

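1. School of Education Science, Guangdong Polytechnic Normal University, Guangzhou 510665, Guangdong, China
2. School of Design, South China University of Technology, Guangzhou 510006, Guangdong, China
3. School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510641, Guangdong, China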

Abstract

Sentiment analysis of online text is a key method for studying online events, understanding public attitudes, and tracking public opinion. However, existing Chinese sentiment corpora are limited in annotation scheme, sample size, and domain adaptability, which restricts their use in information processing. This study therefore focuses on comments on trending social events on Weibo and aims to build a high-frequency word sentiment corpus with strong domain applicability and large-sample annotation along affective dimensions. Online texts were collected with Python tools, segmented, and cleaned, and high-frequency words were then selected. Sixty-three valid participants rated the 661 selected high-frequency words on a 9-point scale along four affective dimensions: valence (pleasure), arousal, dominance, and approach-avoidance. The results show that the cumulative contribution rate of valence, arousal, and dominance exceeds 99%, indicating that the ratings capture the affective information of the words fairly comprehensively. In addition, factor analysis extracted two principal components: the first mainly reflects the degree to which readers feel positive or negative emotional experiences of their own and develop corresponding approach or avoidance tendencies, and the second reflects the intensity of the emotion experienced and the degree to which it is controlled. The corpus enriches the resources available for sentiment analysis of online text and can serve as an auxiliary tool for analyzing trending online social events and forecasting public opinion, giving it considerable theoretical and practical value.
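The word-selection and rating-aggregation steps described in the abstract could be sketched roughly as follows. This is a minimal illustration only, not the authors' code: it assumes comments have already been segmented into tokens, and the stop-word list, the `min_count` threshold, and the function names `high_frequency_words` and `mean_ratings` are all hypothetical, since the abstract does not state the cleaning rules or frequency cutoff actually used.

```python
from collections import Counter
from statistics import mean

# Hypothetical stop words; the source does not list its cleaning rules.
STOP_WORDS = {"的", "了", "是"}


def high_frequency_words(tokenized_comments, min_count=2):
    """Count tokens across all comments and keep those seen at least min_count times.

    min_count is an assumed threshold, not the cutoff used in the study.
    """
    counts = Counter(
        token
        for comment in tokenized_comments
        for token in comment
        if token.strip() and token not in STOP_WORDS
    )
    # most_common() sorts by descending count, keeping first-seen order for ties.
    return [(word, n) for word, n in counts.most_common() if n >= min_count]


def mean_ratings(ratings):
    """Average 9-point ratings per (word, dimension) across participants.

    ratings is an iterable of (word, dimension, score) tuples.
    """
    grouped = {}
    for word, dimension, score in ratings:
        if not 1 <= score <= 9:
            raise ValueError(f"score out of 9-point range: {score}")
        grouped.setdefault((word, dimension), []).append(score)
    return {key: mean(scores) for key, scores in grouped.items()}


# Toy, made-up data purely for illustration.
comments = [["支持", "的", "决定"], ["支持", "调查"], ["调查", "结果", "了"]]
frequent = high_frequency_words(comments)

toy_ratings = [
    ("支持", "valence", 7),
    ("支持", "valence", 9),
    ("支持", "arousal", 5),
]
averaged = mean_ratings(toy_ratings)
```

The per-word mean ratings on each dimension would then form the word-by-dimension matrix on which a factor or principal component analysis could be run.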

Key words

online comment text / high-frequency word / corpus / affective dimension

Publication year

2025

Journal of South China University of Technology (Social Science Edition)

South China University of Technology

Impact factor: 0.477

ISSN: 1009-055X
