Research on Keyword Extraction Algorithm Based on LDA and TF-IDF

Su Jingqiong (苏婧琼)¹, Su Yanqiong (苏艳琼)²


Author information

1. Jinzhong College of Information, Jinzhong, Shanxi 030800
2. Shanxi University, Taiyuan, Shanxi 030000

Abstract

In natural language processing, the key to helping users find the documents they are interested in within a massive collection of text files, in the shortest possible time, is extracting the keywords of each document. Whether an article is long or short, these few keywords usually reveal the main theme behind the entire text. This article introduces the application of the LDA topic model and the TF-IDF algorithm to keyword extraction and compares the two. The results show that both methods achieve good results in keyword extraction.
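The abstract does not give the paper's exact TF-IDF formulation, so the following is only a minimal sketch of TF-IDF keyword extraction under stated assumptions: documents are already tokenized (for Chinese text, word segmentation and stop-word removal are assumed done beforehand), and IDF uses the smoothed variant log((1 + N) / (1 + df)) + 1, which is one common choice and not necessarily the authors'. Each term is scored by its frequency in a document times its rarity across the corpus, and the top-scoring terms are kept as keywords. The function name `tfidf_keywords` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Rank the terms of each pre-tokenized document by TF-IDF and keep the top_k.

    docs: list of documents, each a list of tokens (tokenization is assumed done).
    Returns one list of (term, score) pairs per document, highest score first.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    keywords = []
    for doc in docs:
        if not doc:  # an empty document has no keywords
            keywords.append([])
            continue
        counts = Counter(doc)
        scores = {}
        for term, count in counts.items():
            tf = count / len(doc)  # term frequency, normalized by document length
            # Smoothed IDF (assumed variant): terms found in every document score lowest.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            scores[term] = tf * idf
        # Highest score first; ties broken alphabetically so the output is deterministic.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        keywords.append(ranked[:top_k])
    return keywords


# Hypothetical toy corpus of already-tokenized documents.
docs = [
    ["lda", "topic", "model", "topic", "keyword"],
    ["tfidf", "keyword", "weight", "weight", "term"],
    ["keyword", "extraction", "document", "document", "topic"],
]

for i, kws in enumerate(tfidf_keywords(docs, top_k=2)):
    print(i, [term for term, _ in kws])
```

In this toy corpus "keyword" occurs in every document, so its IDF is the smallest possible and it never reaches the top of any list, while terms that are frequent in one document but rare elsewhere ("topic", "weight", "document") rank first. The LDA side of the comparison would instead infer per-document topic mixtures and take high-probability words from the dominant topics; that is not sketched here.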

Keywords

LDA topic model/TF-IDF algorithm/Keyword extraction


Funding

2022 Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi (2022L665)

Publication year

2024

Changjiang Information & Communications (长江信息通信)

Hubei Communications Services Company


Impact factor: 0.338

ISSN: 2096-9759

Number of references: 7
