A bidirectional adversarial neural topic model based on mixed von Mises-Fisher distributions

Topic models are text-analysis tools that automatically uncover latent topics, or semantic information, in text data. However, existing topic models often assume inappropriate priors and have difficulty exploiting external semantic knowledge to improve topic quality, which leads to insufficient topic coherence. To address these limitations, this paper proposes a bidirectional adversarial neural topic model based on mixed von Mises-Fisher (vMF) distributions. The model performs topic inference with an encoder. To bring external semantic knowledge into the topic modeling process, the generator network models topics as a mixture of vMF distributions in the word embedding space, and a discriminator network is trained to distinguish real samples from fake ones. Experimental results on four public text corpora show that the proposed model achieves higher topic coherence than the baseline topic models. In addition, in text clustering experiments based on the extracted topics, the proposed model effectively improves clustering accuracy.
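To make the generator's role more concrete, the following is a minimal, hypothetical Python sketch of how a mixture of vMF topics over word embeddings could turn a document-topic vector into a word distribution. It rests on assumptions the abstract does not state: each topic k is a single vMF component on the unit sphere with mean direction μ_k and concentration κ_k; the topic-word probability is the vMF density of the normalized embedding e_w renormalized over the vocabulary, p(w | k) = exp(κ_k μ_kᵀ e_w) / Σ_v exp(κ_k μ_kᵀ e_v); and the document's word distribution is the θ-weighted mixture Σ_k θ_k p(w | k). The function names and toy values are illustrative only, and the encoder, discriminator and adversarial training are omitted.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Project a vector onto the unit sphere, where vMF distributions live."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def topic_word_distribution(mu: Sequence[float], kappa: float,
                            embeddings: Sequence[Sequence[float]]) -> Vector:
    """Assumed form: p(w | topic) proportional to exp(kappa * mu^T e_w).

    The vMF normalizer C_d(kappa) is identical for every word, so it cancels
    when the densities are renormalized over a finite vocabulary.
    """
    mu_unit = normalize(mu)
    logits = [kappa * dot(mu_unit, normalize(e)) for e in embeddings]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(logit - m) for logit in logits]
    z = sum(exps)
    return [x / z for x in exps]


def document_word_distribution(theta: Sequence[float],
                               mus: Sequence[Sequence[float]],
                               kappas: Sequence[float],
                               embeddings: Sequence[Sequence[float]]) -> Vector:
    """Mixture over topics: p(w | theta) = sum_k theta_k * p(w | topic k)."""
    out = [0.0] * len(embeddings)
    for weight, mu, kappa in zip(theta, mus, kappas):
        phi = topic_word_distribution(mu, kappa, embeddings)
        for i, prob in enumerate(phi):
            out[i] += weight * prob
    return out


# Toy usage: 2-d "embeddings" for a 4-word vocabulary (illustrative values only).
embeddings = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]]
mus = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical topic mean directions
kappas = [5.0, 5.0]              # hypothetical concentrations
theta = [0.7, 0.3]               # document-topic proportions, e.g. a prior sample
p = document_word_distribution(theta, mus, kappas, embeddings)
print([round(x, 3) for x in p])
```

Renormalizing over the vocabulary means the Bessel-function normalizer of the vMF density never has to be evaluated, and a larger κ_k concentrates topic k on the words whose embeddings point closest to μ_k. Because the topics live in the same space as pretrained word embeddings, semantically related words naturally receive similar probabilities, which is one plausible route for the external semantic knowledge mentioned in the abstract.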

Keywords: topic model; adversarial training; text mining; neural network; von Mises-Fisher (vMF) distribution

Authors: Wang Rui (王睿), Wang Yan'an (王延安), Li Zi'ang (李子昂), Sun Guozi (孙国梓)

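School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023, China

Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023, China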

2024

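Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition)
Nanjing University of Posts and Telecommunications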

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.486
ISSN: 1673-5439
Year, volume (issue): 2024, 44(6)