Keyword Extraction Method Based on the Spark GraphX Framework

Abstract: The TextRank algorithm builds a graph from the positional relationships between the words of a text and computes word weights with a graph-ranking algorithm. The computation requires a large number of iterations, so its running time becomes considerable when the data set is large. To address this problem, a keyword extraction method based on Spark GraphX is proposed. Using the distributed graph-computing framework provided by Spark GraphX, the text graph data is stored across different nodes and keyword extraction is carried out efficiently. Experiments show that the proposed method not only has a short computation time but also extracts keywords that are very close to manually annotated results, which indicates that the method is reasonable and feasible.
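The iterative computation that the abstract describes can be illustrated with a minimal single-machine sketch of TextRank. This is not the paper's Spark GraphX implementation: it is plain Python, and the function names, the co-occurrence window size, the damping factor of 0.85, and the convergence tolerance are all assumptions chosen for illustration. It only shows why the cost grows with graph size, since every iteration visits every word and all of its neighbours.

```python
from collections import defaultdict


def build_cooccurrence_graph(words, window=2):
    """Link each word to the distinct words that follow it within `window` positions.

    The window size is an illustrative assumption, not a value from the paper.
    """
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    return neighbors


def textrank(neighbors, damping=0.85, max_iter=100, tol=1e-6):
    """Iterate the unweighted TextRank update until scores stop changing."""
    if not neighbors:
        return {}
    scores = {w: 1.0 for w in neighbors}
    for _ in range(max_iter):
        new_scores = {}
        for w in neighbors:
            # Each neighbour u shares its score equally among its own neighbours.
            rank_sum = sum(scores[u] / len(neighbors[u]) for u in neighbors[w])
            new_scores[w] = (1 - damping) + damping * rank_sum
        delta = max(abs(new_scores[w] - scores[w]) for w in neighbors)
        scores = new_scores
        if delta < tol:
            break
    return scores


def top_keywords(words, k=3, window=2):
    """Return the k highest-scoring words (ties broken alphabetically)."""
    scores = textrank(build_cooccurrence_graph(words, window))
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

Calling `top_keywords(tokens, k=5)` on a list of already segmented and filtered tokens returns the five highest-ranked words. In a distributed setting the per-word update inside the loop is the part that would be spread across nodes; how the paper maps it onto GraphX is not stated in the abstract.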

Keywords: Spark GraphX; keyword extraction; graph sorting; word weight

Author: Cheng Chuanpeng (程传鹏)

Affiliation: School of Computer Science, Zhongyuan University of Technology, Zhengzhou 450007

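Funding: Science and Technology Research Project of the Henan Provincial Department of Science and Technology (No. 172102210594); Key Scientific Research Project of Henan Higher Education Institutions (No. 17A520066)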

Publication year: 2019

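Journal: Journal of Chinese Computer Systems (小型微型计算机系统)

Shenyang Institute of Computing Technology, Chinese Academy of Sciences

Indexed in: CSTPCD, CSCD, PKU Core (北大核心)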

Impact factor: 0.564

ISSN: 1000-1220

Year, volume (issue): 2019, 40(2)

Citations: 2
References: 6