The goal of short text clustering is to discover the semantic classes of the data from distances in the representation space. Traditional text representation models produce high-dimensional, sparse features for short texts, and BERT-based multi-feature short text clustering has received little research attention. To address both problems, this paper proposes BCCA, a BERT-based dual-feature short text clustering model. First, BERT is used to obtain word vector representations. Second, a CNN strengthens the extraction of local features, and a context-aware self-attention network strengthens the extraction of global features. Finally, to further improve clustering, the text representation module is trained jointly with the clustering module so that the text representation and the clustering are optimized together. Experiments on three datasets verify the model's performance, and the results show that the proposed model achieves 82.8% accuracy on the SearchSnippets dataset.
Key words
short text clustering/dual features/context awareness/BERT/CNN
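The pipeline described in the abstract can be pictured with a minimal NumPy sketch. It is not the BCCA implementation. The sketch rests on several assumptions. BERT output is replaced by random token embeddings. The CNN and attention weights are random and untrained. The joint clustering objective is assumed to be DEC-style: a Student's t soft assignment with a KL loss against a sharpened target distribution, which the abstract does not specify. Clustering accuracy is assumed to be the usual best one-to-one mapping from clusters to labels. All function names are hypothetical.

```python
import itertools
import numpy as np

def conv1d_local(x, w):
    """Local features (CNN branch): 1-D convolution over the token axis with
    valid padding, ReLU, then max-pooling over time. x: (seq_len, d), w: (k, d, f)."""
    k = w.shape[0]
    windows = np.stack([x[i:i + k] for i in range(x.shape[0] - k + 1)])  # (T, k, d)
    feats = np.maximum(np.einsum("tkd,kdf->tf", windows, w), 0.0)          # (T, f), ReLU
    return feats.max(axis=0)                                               # (f,)

def self_attention_global(x, wq, wk, wv):
    """Global features (attention branch): scaled dot-product self-attention
    over all tokens, mean-pooled into one vector."""
    q, kmat, v = x @ wq, x @ wk, x @ wv
    scores = q @ kmat.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability for softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # row-wise softmax
    return (attn @ v).mean(axis=0)                # (h,)

def soft_assign(z, centers, alpha=1.0):
    """Assumed DEC-style Student's t soft assignment q_ij of text i to cluster j."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened target p_ij proportional to q_ij^2 / f_j, where f_j = sum_i q_ij."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_clustering_loss(p, q):
    """KL(P || Q), the loss that would be backpropagated in joint training."""
    return float(np.sum(p * np.log(p / q)))

def cluster_accuracy(y_true, y_pred, k):
    """Best accuracy over all one-to-one mappings of k clusters to k labels
    (brute force, fine only for small k)."""
    best = 0
    for perm in itertools.permutations(range(k)):
        hits = sum(perm[p] == t for p, t in zip(y_pred, y_true))
        best = max(best, hits)
    return best / len(y_true)

rng = np.random.default_rng(0)
n_texts, seq_len, d, n_clusters = 6, 8, 16, 2
# Hypothetical stand-in for BERT token embeddings (random, not real BERT output).
tokens = rng.normal(size=(n_texts, seq_len, d))
w_conv = rng.normal(scale=0.1, size=(3, d, 8))
wq, wk, wv = (rng.normal(scale=0.1, size=(d, 8)) for _ in range(3))

# Dual-feature representation: concatenate local (CNN) and global (attention) vectors.
z = np.stack([
    np.concatenate([conv1d_local(t, w_conv), self_attention_global(t, wq, wk, wv)])
    for t in tokens
])
centers = z[:n_clusters].copy()   # naive center initialisation for the sketch
q = soft_assign(z, centers)
p = target_distribution(q)
loss = kl_clustering_loss(p, q)
y_pred = q.argmax(axis=1)
```

In real joint training, the gradient of the KL loss would flow into the encoder weights and the cluster centers, so representation and clustering are updated together. Here everything is computed once with fixed random weights, only to show how the pieces connect.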