Keyword Extraction Method Based on Spark GraphX Framework

The TextRank algorithm constructs a graph from the positional relationships between the words of a text and applies a graph-ranking algorithm to compute the weight of each word. The computation requires a large number of iterations, so the running time becomes considerable when the data set is large. To address this problem, a keyword extraction method based on Spark GraphX is proposed. Using the distributed graph-computing framework provided by Spark GraphX, the text graph data is stored across different nodes, and keyword extraction is carried out efficiently. Experiments show that the proposed method not only has a short computation time but also extracts keywords that are very close to the manually annotated results, which indicates that the method is reasonable and feasible.
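The graph construction and iterative ranking described in the abstract can be illustrated with a minimal single-machine sketch. This is not the paper's Spark GraphX implementation: it is plain Python, it uses the standard unweighted TextRank update, and the function names, the co-occurrence `window`, the damping factor of 0.85, and the stopping tolerance are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_cooccurrence_graph(words, window=2):
    """Link each word to the words that follow it within `window` positions."""
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                # Undirected edge: record it in both directions.
                neighbors[word].add(other)
                neighbors[other].add(word)
    return neighbors


def textrank(neighbors, damping=0.85, iterations=50, tol=1e-6):
    """Repeat the unweighted TextRank update until the scores settle."""
    if not neighbors:
        return {}
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word in neighbors:
            # Each neighbor shares its score equally among its own neighbors.
            incoming = sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            updated[word] = (1.0 - damping) + damping * incoming
        change = max(abs(updated[w] - scores[w]) for w in scores)
        scores = updated
        if change < tol:
            break
    return scores


def top_keywords(words, k=3, window=2):
    """Return the k highest-scoring words (ties broken alphabetically)."""
    scores = textrank(build_cooccurrence_graph(words, window))
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

The inner loop over every word on every iteration is the cost that grows with the size of the text graph, which is the step the abstract proposes to distribute across nodes.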

Spark GraphX; keyword extraction; graph sorting; word weight

CHENG Chuanpeng (程传鹏)

School of Computer Science, Zhongyuan University of Technology, Zhengzhou 450007, China

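Supported by the Science and Technology Research Project of the Henan Provincial Department of Science and Technology (172102210594) and the Key Scientific Research Project of Henan Higher Education Institutions (17A520066)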

2019

Journal of Chinese Computer Systems (小型微型计算机系统)
Shenyang Institute of Computing Technology, Chinese Academy of Sciences

Indexed in: CSTPCD; CSCD; Peking University Core Journals
Impact factor: 0.564
ISSN:1000-1220
Year, volume (issue): 2019, 40(2)