Short Text Classification Research on Socialized Q&A Community Based on Keyword Expansion: Taking the Legal Q&A Community as an Example

[Research purpose] This study applies a keyword word-vector feature expansion method to short text classification in socialized Q&A communities, in order to address the sparse features and ambiguous semantics of short question texts and to improve the quality of information services in Q&A communities. [Research method] TF-IDF is combined with Word2vec to expand keyword features and enrich the semantic information of short texts. A CNN-BiLSTM-Attention model is constructed that combines the strengths of CNN for feature extraction, BiLSTM for capturing contextual information, and Attention for weight allocation. Using a dataset crawled from the "找法网" site (china.findlaw.cn) as an example, the CNN-BiLSTM-Attention model classifies short texts from the legal Q&A community after keyword word-vector feature expansion. [Research conclusion] An empirical study on 8 legal topics shows that classification performance is better after keyword expansion than before, and is best when the number of expanded keywords reaches 13. When the CNN-BiLSTM-Attention model classifies the expanded legal Q&A short texts, classification accuracy reaches 97.63%, which is on average 1.08% higher than that of several other classifiers.
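
The keyword expansion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`tfidf_keywords`, `expand_text`), the toy tokenized questions, and the hand-written three-dimensional vectors are all hypothetical, and the vectors merely stand in for embeddings that would come from a trained Word2vec model. The sketch ranks each text's tokens by TF-IDF, then appends the words whose vectors are most similar (by cosine similarity) to those keywords. The abstract reports 13 expansion words as the best setting; the toy example uses 2.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=2):
    """Return the top_k highest TF-IDF tokens for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    keywords = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            tok: (count / len(doc)) * math.log(n_docs / df[tok])
            for tok, count in tf.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:top_k])
    return keywords


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_text(doc, keywords, vectors, n_expand):
    """Append up to n_expand words most similar to the keywords."""
    best = {}
    for kw in keywords:
        if kw not in vectors:
            continue  # keyword has no embedding; skip it
        for word, vec in vectors.items():
            if word in doc:
                continue  # only add words not already in the text
            sim = cosine(vectors[kw], vec)
            best[word] = max(sim, best.get(word, -1.0))
    ranked = sorted(best, key=lambda w: (-best[w], w))
    return doc + ranked[:n_expand]


# Hypothetical toy data: three tokenized legal questions.
docs = [
    ["divorce", "property", "division"],
    ["traffic", "accident", "compensation"],
    ["labor", "contract", "compensation"],
]
# Hypothetical stand-in for vectors learned by Word2vec.
vectors = {
    "divorce": [1.0, 0.1, 0.0],
    "marriage": [0.9, 0.2, 0.0],
    "custody": [0.8, 0.3, 0.1],
    "accident": [0.0, 1.0, 0.1],
    "injury": [0.1, 0.9, 0.2],
    "contract": [0.0, 0.1, 1.0],
    "wages": [0.1, 0.0, 0.9],
}

keywords = tfidf_keywords(docs, top_k=2)
expanded = expand_text(docs[0], keywords[0], vectors, n_expand=2)
```

In a fuller pipeline the expanded token sequences, rather than the original short texts, would be embedded and passed to the classifier.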

keyword feature extension; socialized Q&A community; short text classification; deep learning; CNN-BiLSTM-Attention model

Zang Zhidong, Han Ting, Li Xiuxia

School of Social Development, Yangzhou University, Yangzhou 225009

School of Communication, Qufu Normal University, Rizhao 276826

2024

Journal of Intelligence (情报杂志)
Shaanxi Institute of Scientific and Technical Information

Indexed in: CSTPCD, CSSCI, CHSSCD, Peking University Core (北大核心)
Impact factor: 1.502
ISSN: 1002-1965
Year, volume (issue): 2024, 43(12)