Research on Apache Spark, a Big Data Processing Model

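Abstract (translated from Chinese): Apache Spark is a currently popular big data processing model that is fast, general-purpose, and simple to use. Spark is an in-memory computing framework proposed to address the inefficiency of MapReduce in applications such as iterative machine learning algorithms and interactive data mining. It retains the scalability, fault tolerance, and compatibility of MapReduce while making up for MapReduce's shortcomings in these applications. Because it uses memory-based cluster computing, Spark can be up to 100 times faster than MapReduce on these workloads. The paper introduces Spark's basic concepts, components, and deployment modes, analyzes its core abstractions and programming model, and gives related programming examples.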

English title: Research on Apache Spark for Big Data Processing

English abstract: Apache Spark is currently a popular model for large-scale data processing that is fast, general, and easy to use. Compared with the MapReduce computing framework, Spark is efficient for iterative machine learning algorithms and interactive data mining applications while retaining the compatibility, scalability, and fault tolerance of MapReduce. With its in-memory computing, Spark is up to 100x faster than Hadoop MapReduce when data fits in memory. This paper presents the basic concepts, components, and deployment modes of Spark, introduces its internal abstraction and programming model, and gives programming examples.

English keywords:

Spark; Hadoop; MapReduce; Big Data; Data Analysis

Author:

黎文阳

Author affiliation:

College of Computer Science, Sichuan University, Chengdu 610065

Keywords (translated from Chinese):

Spark; Hadoop; MapReduce; big data; data analysis

Publication year:

2015

DOI:

10.3969/j.issn.1007-1423.2015.08.013

现代计算机(普及版) (Modern Computer, Popular Edition)

中山大学 (Sun Yat-sen University)

Impact factor: 0.202

ISSN: 1007-1423

Year, volume (issue): 2015, (3)

Citations: 31
References: 8