
Semi-supervised website topic classification based on heterogeneous graph neural network

The rapid growth in the number of Internet websites makes it difficult for existing methods to classify website topics accurately. URL-based methods, for example, cannot handle topic information that is not reflected in the URL, while content-based methods are limited by data sparsity and by the difficulty of capturing semantic relationships. To address this, a semi-supervised website topic classification method based on a heterogeneous graph neural network, HGNN-SWT, is proposed. The method uses website text features to compensate for the limitations of URL-only features, and it models the sparse relationships between website texts and words with a heterogeneous graph, improving classification performance by exploiting the node and edge relationships in the graph. It also introduces a neighbor node sampling method based on random walks, which takes into account both the local features of nodes and the global graph structure, and proposes a feature fusion strategy that captures contextual relationships and feature interactions in website text data. Experiments on a self-built Chinaz Website dataset show that HGNN-SWT achieves higher accuracy in website topic classification than existing methods.
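The abstract does not spell out how the random-walk neighbor sampling works, so the following is only a minimal sketch under stated assumptions: a two-type graph of site and word nodes, a random walk with restart (a common choice in heterogeneous GNNs, not necessarily the one HGNN-SWT uses), and selection of the most frequently visited nodes per type. The function names, parameters, and example domains are hypothetical.

```python
import random
from collections import Counter, defaultdict


def build_site_word_graph(site_tokens):
    """Build an undirected graph with 'site' and 'word' nodes.

    Nodes are (type, name) tuples; an edge links a site to every word
    that occurs in its text. (Assumed graph schema, for illustration.)
    """
    adj = defaultdict(set)
    for site, tokens in site_tokens.items():
        for tok in tokens:
            adj[("site", site)].add(("word", tok))
            adj[("word", tok)].add(("site", site))
    # Sorted neighbor lists keep sampling reproducible for a fixed seed.
    return {node: sorted(nbrs) for node, nbrs in adj.items()}


def sample_neighbors(adj, start, walk_len=200, restart_p=0.5, top_k=3, seed=0):
    """Random walk with restart from `start`.

    Returns, for each node type, the `top_k` most frequently visited
    nodes other than `start`.
    """
    rng = random.Random(seed)
    visits = Counter()
    current = start
    for _ in range(walk_len):
        nbrs = adj.get(current, [])
        if not nbrs or rng.random() < restart_p:
            current = start  # jump back to the start node
            continue
        current = rng.choice(nbrs)
        if current != start:
            visits[current] += 1
    by_type = defaultdict(list)
    for node, _count in visits.most_common():
        if len(by_type[node[0]]) < top_k:
            by_type[node[0]].append(node)
    return dict(by_type)


# Toy data: made-up sites and tokens, not taken from the Chinaz Website dataset.
sites = {
    "news.example": ["politics", "economy", "sports"],
    "shop.example": ["price", "cart", "economy"],
    "blog.example": ["sports", "travel"],
}
adj = build_site_word_graph(sites)
neighbors = sample_neighbors(adj, ("site", "news.example"))
```

Frequent restarts keep the walk close to the start node (local features), while longer excursions reach sites that share words only indirectly (global structure). The restart probability trades one off against the other.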

website topic; heterogeneous graph neural network; semi-supervised; feature fusion

Wang Xiezhong (王谢中), Chen Xu (陈旭), Jing Yongjun (景永俊), Wang Shuyang (王叔洋)


School of Computer Science and Engineering, North Minzu University, Yinchuan, Ningxia 750000

School of Electrical and Information Engineering, North Minzu University, Yinchuan, Ningxia 750000


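Key Research and Development Program of Ningxia Hui Autonomous Region; Fundamental Research Funds for the Central Universities; North Minzu University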

2023BDE02017; 2022PT_S04

2024

Computer Engineering & Science (计算机工程与科学)
College of Computer, National University of Defense Technology


CSTPCD; Peking University Core Journal (北大核心)
Impact factor: 0.787
ISSN: 1007-130X
Year, volume (issue): 2024, 46(4)