A bidirectional adversarial neural topic model based on mixed von Mises-Fisher distributions

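Abstract (translated from the Chinese): As a text analysis tool, a topic model can automatically mine latent topics or semantic information from text data. However, existing topic models often assume inappropriate priors and have difficulty using external semantic knowledge to improve topic quality, which leads to insufficient topic coherence. To address these limitations, a bidirectional adversarial neural topic model based on mixed von Mises-Fisher (vMF) distributions is proposed. The model performs topic inference with an encoder. To bring external semantic knowledge into the topic modeling process, the generator network models each topic as a mixed vMF distribution in the word embedding space, and the discriminator network is trained to tell real samples from fake ones. Experiments on four public text corpora show that the proposed model achieves higher topic coherence than the baseline topic models. In addition, when the extracted topics are used for text clustering, the proposed model effectively improves clustering accuracy.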

English title: A bidirectional adversarial neural topic model based on mixed von Mises-Fisher distributions

English abstract: Topic models are text analysis tools that automatically mine latent topics or semantic information from textual data. However, existing topic models often assume inappropriate priors and struggle to leverage external semantic knowledge to enhance the quality of topics, resulting in insufficient topic coherence. To address these limitations, this paper proposes a bidirectional adversarial neural topic model based on mixed von Mises-Fisher (vMF) distributions. The model performs topic inference through an encoder. To introduce external semantic knowledge into the topic modeling process, the generator network models topics as mixed vMF distributions in the word embedding space, and a discriminator network is trained to distinguish between real and fake samples. Experimental results on four public text corpora show that the proposed model achieves higher topic coherence than the baseline topic models and effectively improves accuracy in text clustering experiments based on the extracted topics.
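As an illustration of what "modeling topics as vMF distributions in the word embedding space" can look like, the sketch below is a minimal NumPy example and not the authors' implementation. A vMF distribution on the unit sphere has density proportional to exp(κ μᵀx), with mean direction μ and concentration κ. The sketch assumes that each topic's word distribution is obtained by evaluating this density at the normalized word embeddings and renormalizing over the vocabulary, and that a document's word distribution is a mixture of the topic distributions weighted by its topic proportions. The function names, array shapes, and random data are all hypothetical.

```python
import numpy as np


def normalize_rows(x):
    """Project each row onto the unit sphere (vMF is defined on unit vectors)."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def vmf_topic_word(embeddings, means, kappas):
    """Return a (K, V) matrix of p(word | topic).

    Assumption: the vMF density exp(kappa * mu^T e_w) is evaluated at each
    word embedding e_w and renormalized over the vocabulary, so the
    continuous normalizing constant of the vMF cancels out.
    """
    e = normalize_rows(embeddings)           # (V, d) unit word embeddings
    mu = normalize_rows(means)               # (K, d) unit topic directions
    logits = kappas[:, None] * (mu @ e.T)    # (K, V) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def mixture_word_dist(theta, topic_word):
    """Mix the topic-word distributions with document-topic weights theta."""
    return theta @ topic_word


# Hypothetical toy data: 50 words, 8-dimensional embeddings, 3 topics.
rng = np.random.default_rng(0)
V, d, K = 50, 8, 3
embeddings = rng.normal(size=(V, d))
means = rng.normal(size=(K, d))
kappas = np.array([5.0, 10.0, 20.0])    # larger kappa -> more peaked topic

beta = vmf_topic_word(embeddings, means, kappas)
theta = np.array([0.2, 0.5, 0.3])       # topic proportions of one document
doc_word = mixture_word_dist(theta, beta)
```

In this sketch a larger concentration κ pulls a topic's probability mass toward the words whose embeddings lie closest to its mean direction, which is one way pretrained embeddings could carry external semantic knowledge into the topics.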

English keywords:

topic model; adversarial training; text mining; neural network; von Mises-Fisher (vMF) distribution

Authors:

王睿、王延安、李子昂、孙国梓

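Affiliations:

School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023

Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023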

Keywords (translated from the Chinese):

topic model; adversarial training; text mining; neural network; von Mises-Fisher distribution

Publication year:

2024

DOI:

10.14132/j.cnki.1673-5439.2024.06.009

Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition)

Nanjing University of Posts and Telecommunications

Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 0.486

ISSN: 1673-5439

Year, volume (issue): 2024, 44(6)