
Topic Evolution Research Based on LDA-BERT Similarity Measure Model

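To address the problem that the LDA topic model ignores semantic associations in the text when extracting topics, this paper proposes a similarity measure model based on LDA-BERT. First, text feature words are extracted by combining the TF-IDF and TextRank methods, and text topics are mined with the LDA topic model. Second, BERT embeddings are combined with the topic-to-topic-word probability distribution built by the LDA topic model, so that each topic is represented as a vector at the word-granularity level. Finally, cosine similarity is used to compute the similarity between topics. On top of the similarity measure model, a vector similarity index is constructed to analyze the associations between research topics in the literature, and a knowledge map of topic evolution is drawn. An empirical study in the field of smart libraries finds that the topic similarities computed with the LDA-BERT model are more accurate than those computed with the LDA topic model alone and are more consistent with the actual situation.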
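The feature-word step can be illustrated with a small sketch. The abstract does not say how TF-IDF and TextRank are combined, so this assumes, hypothetically, that each word's two scores are max-normalized and averaged with a weight `alpha`. Tokens are assumed to be already segmented (Chinese text would first pass through a word segmenter), and all function names here are illustrative, not the authors' code.

```python
import math
from collections import Counter, defaultdict


def tfidf_scores(docs):
    """Smoothed TF-IDF for every word of every tokenized document."""
    n_docs = len(docs)
    df = Counter(w for doc in docs for w in set(doc))  # document frequency
    result = []
    for doc in docs:
        tf = Counter(doc)
        result.append({
            w: (c / len(doc)) * (math.log((1 + n_docs) / (1 + df[w])) + 1)
            for w, c in tf.items()
        })
    return result


def textrank_scores(doc, window=2, damping=0.85, iterations=50):
    """TextRank on an undirected co-occurrence graph of one document.

    Words co-occurring inside a sliding window of `window` tokens are linked.
    """
    neighbors = defaultdict(set)
    for i, w in enumerate(doc):
        for j in range(i + 1, min(i + window, len(doc))):
            if doc[j] != w:
                neighbors[w].add(doc[j])
                neighbors[doc[j]].add(w)
    score = {w: 1.0 for w in set(doc)}
    for _ in range(iterations):
        # PageRank-style update: each neighbor shares its score evenly.
        score = {
            w: (1 - damping)
            + damping * sum(score[v] / len(neighbors[v]) for v in neighbors[w])
            for w in score
        }
    return score


def combined_keywords(docs, doc_index, top_k=5, alpha=0.5):
    """Rank words of docs[doc_index] by a hypothetical TF-IDF/TextRank blend."""
    def normalize(d):
        top = max(d.values())
        return {w: v / top for w, v in d.items()}

    tfidf = normalize(tfidf_scores(docs)[doc_index])
    tr = normalize(textrank_scores(docs[doc_index]))
    combined = {w: alpha * tfidf[w] + (1 - alpha) * tr[w] for w in tfidf}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(combined, key=lambda w: (-combined[w], w))[:top_k]


docs = [
    ["smart", "library", "service", "library", "data"],
    ["smart", "library", "reading", "space"],
    ["data", "mining", "topic", "model", "data"],
]
print(combined_keywords(docs, 0, top_k=3))
```

Max-normalization puts both scores on a 0 to 1 scale before blending, since raw TF-IDF and TextRank values live on different ranges; the selected feature words would then be the input vocabulary for LDA.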
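The topic-vector and similarity steps can be sketched as follows, assuming that each topic vector is the p(word | topic)-weighted average of precomputed BERT embeddings of the topic's top words; the abstract does not give the exact formula. Toy 3-dimensional vectors stand in for real BERT embeddings (which would have to be produced by a BERT model, for example through a library such as Hugging Face Transformers, outside this sketch), and `evolution_links` with its 0.5 threshold is a hypothetical way of turning similarities between topics of consecutive periods into edges of an evolution graph.

```python
import math


def topic_vector(topic_word_probs, embeddings):
    """Weighted average of word embeddings, weights = p(word | topic).

    Words without an embedding are skipped; the remaining weights are
    renormalized by their total.
    """
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    total = 0.0
    for word, p in topic_word_probs.items():
        if word not in embeddings:
            continue
        total += p
        for k, x in enumerate(embeddings[word]):
            vec[k] += p * x
    return [x / total for x in vec] if total > 0 else vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def evolution_links(prev_topics, next_topics, embeddings, threshold=0.5):
    """Hypothetical rule: link topic i (period t) to topic j (period t+1)
    when their cosine similarity reaches `threshold`."""
    next_vecs = [topic_vector(t, embeddings) for t in next_topics]
    links = []
    for i, topic in enumerate(prev_topics):
        v = topic_vector(topic, embeddings)
        for j, w in enumerate(next_vecs):
            sim = cosine(v, w)
            if sim >= threshold:
                links.append((i, j, sim))
    return links


# Toy 3-d vectors standing in for real BERT embeddings (illustrative only).
emb = {
    "library": [1.0, 0.0, 0.0],
    "service": [0.8, 0.2, 0.0],
    "data": [0.0, 1.0, 0.0],
    "mining": [0.0, 0.9, 0.1],
    "reading": [0.1, 0.0, 1.0],
}
# Hypothetical LDA topic-word distributions for two time periods.
prev = [{"library": 0.6, "service": 0.4}, {"data": 0.7, "mining": 0.3}]
nxt = [{"library": 0.5, "reading": 0.5}, {"data": 0.5, "mining": 0.5}]

for i, j, sim in evolution_links(prev, nxt, emb):
    print(f"period-1 topic {i} -> period-2 topic {j}: {sim:.3f}")
```

Dividing by the weight total does not change the cosine similarity, but it keeps each topic vector on the same scale as the word embeddings it averages.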

Similarity measure; LDA-BERT model; LDA model; BERT model; Theme evolution

海骏林峰, 严素梅, 陈荣, 李建霞


Institute of Science and Technology Information, East China University of Science and Technology, Shanghai 200237


2024

Library Work and Study (图书馆工作与研究)
Tianjin Library, Tianjin Library Society, Tianjin Children's Library


CHSSCD, Peking University Core Journal (北大核心)
Impact factor: 1.326
ISSN: 1005-6610
Year, volume (issue): 2024, (1)