Short Text Feature Extension and Classification Method Fusing Label Semantic Embedding and Graph Convolution

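Abstract (translated from Chinese): Short texts are brief, contain few keywords, and often make poor use of label information, so classification suffers from sparse features and ambiguous semantics. To address this, a graph convolutional network model that incorporates label semantic embedding is proposed. First, building on the traditional term frequency-inverse document frequency (TF-IDF) algorithm, a global term-frequency extraction algorithm is proposed that incorporates the inter-class and intra-class distribution of the texts in which each word occurs. Second, a label-embedding method maps each training text and its corresponding label into the same feature space; filtering and aggregation then extract the synonym embeddings that best highlight the text's category, and these serve as the embeddings of the document nodes in the text graph. Finally, the text graph is fed into a graph convolutional network, and the learned features are fused with the contextual features that a pre-trained model extracts from the text, improving both the quality of short text classification and the generalization ability of the whole model. The proposed model was compared with 14 baseline algorithms on four short text datasets: MR, web_snippets, R8 and R52. The results show that the proposed model outperforms the baselines, with better classification accuracy, recall and F1 score.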
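The abstract does not give the formula for the class-aware term weighting, so the following is only a minimal sketch of the general idea: scale a TF-IDF score by how widely a term is used inside the document's own class (intra-class) and by how few classes it appears in (inter-class). The function name `class_aware_tfidf` and both scaling factors are assumptions made for illustration, not the authors' definition.

```python
import math
from collections import Counter, defaultdict


def class_aware_tfidf(docs, labels):
    """Return one {term: weight} dict per tokenized document.

    weight = tf * idf * intra * inter, where (hypothetical choices):
      intra = share of documents in the document's class that contain the term
      inter = 1 / number of classes in which the term appears
    """
    n_docs = len(docs)
    class_sizes = Counter(labels)
    doc_freq = Counter()                    # documents containing each term
    class_doc_freq = defaultdict(Counter)   # per class: documents containing each term
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            doc_freq[term] += 1
            class_doc_freq[label][term] += 1

    weights = []
    for tokens, label in zip(docs, labels):
        counts = Counter(tokens)
        doc_weights = {}
        for term, count in counts.items():
            tf = count / len(tokens)
            # Smoothed IDF, so a term that occurs in every document keeps a positive weight.
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            intra = class_doc_freq[label][term] / class_sizes[label]
            classes_with_term = sum(
                1 for c in class_doc_freq if class_doc_freq[c][term] > 0
            )
            inter = 1.0 / classes_with_term
            doc_weights[term] = tf * idf * intra * inter
        weights.append(doc_weights)
    return weights


# Toy corpus: "good" occurs only in the "pos" class, "movie" in both classes.
docs = [["good", "movie"], ["good", "film"], ["bad", "movie"], ["bad", "plot"]]
labels = ["pos", "pos", "neg", "neg"]
weights = class_aware_tfidf(docs, labels)
```

In this toy corpus "good" and "movie" have the same plain TF-IDF score in the first document, but the class-aware factors rank "good" higher because it is concentrated in a single class.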

English title: Short Text Feature Extension and Classification Method Based on Semantic Embedding of Tags and Graph Convolution Network

English abstract: In short text classification, short text length, few keywords and underutilization of label information lead to sparse features and ambiguous semantics, which degrade classification performance. A graph convolution network model based on tag semantic embedding is proposed to address this problem. Firstly, building on TF-IDF, a new word frequency method is proposed that considers the inter-class and intra-class distribution of words in the global corpus. Then, through a label embedding method, each training text and its corresponding label are mapped into one feature space. After filtering and aggregation in this feature space, synonym embeddings that carry label information and better highlight the text category are obtained and used to represent the document nodes of the text graph. Finally, the text graph is input into the graph convolution neural network to learn new features. The learned features are fused with the features from the pre-training model, which improves the classification accuracy of short texts and the generalization ability of the whole model. Four short text datasets, MR, web_snippets, R8 and R52, are used to evaluate the proposed algorithm against fourteen benchmark models. The experimental results show that the proposed model is superior to the others in classification accuracy, recall and F1-score.
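The final step, graph convolution followed by feature fusion, can be sketched as follows. The abstract does not state the layer definition or the fusion operator, so this example assumes the widely used propagation rule ReLU(D^{-1/2}(A + I)D^{-1/2} H W) and assumes fusion by simple concatenation. The toy graph, the random matrices and the names `normalize_adjacency`, `gcn_layer` and `context_features` are hypothetical stand-ins; in particular, `context_features` only mimics the shape of what a pre-trained model would produce.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a symmetric adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops
    deg = a_hat.sum(axis=1)              # at least 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, features, weight):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    return np.maximum(a_norm @ features @ weight, 0.0)


rng = np.random.default_rng(0)

# Toy text graph with 4 nodes (documents and words would both be nodes).
adj = np.array(
    [[0, 1, 1, 0],
     [1, 0, 0, 1],
     [1, 0, 0, 0],
     [0, 1, 0, 0]],
    dtype=float,
)
node_features = rng.normal(size=(4, 8))   # initial node embeddings
w0 = rng.normal(size=(8, 5))              # layer weights (learned in practice)

graph_features = gcn_layer(normalize_adjacency(adj), node_features, w0)

# Hypothetical stand-in for contextual features from a pre-trained model.
context_features = rng.normal(size=(4, 6))

# Assumed fusion: concatenate the two views of each node.
fused = np.concatenate([graph_features, context_features], axis=1)
```

Concatenation keeps both feature sets intact and leaves the weighting between them to whatever classifier follows; other fusion schemes, such as a weighted sum of predictions, would fit the abstract's description equally well.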

English keywords:

short text; semantics of label; feature space; graph convolution network; pre-training model

Authors:

Zhang Ling (张灵), Li Rongzhen (李荣臻), Zheng Su (郑苏)


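Author affiliations:

School of Computers, Guangdong University of Technology, Guangzhou, Guangdong 510006

School of Education, Ningxia University, Yinchuan, Ningxia 750001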

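Keywords (translated from Chinese):

short text; label semantics; feature space; graph convolutional network; pre-trained model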

Funding:

Science and Technology Project of the Department of Transport of Guangdong Province

Project number:

科技-2016-02-030

Publication year:

2024

DOI:

10.12052/gdutxb.220132

Journal of Guangdong University of Technology

Guangdong University of Technology


Impact factor: 0.628

ISSN: 1007-7162

Year, volume (issue): 2024, 41(1)

Number of references: 6