
Research on Apache Spark for Big Data Processing

Apache Spark is a widely used big data processing model that is fast, general-purpose, and easy to use. Spark was proposed as an in-memory computing framework to address the inefficiency of MapReduce in applications such as iterative machine learning algorithms and interactive data mining. It retains the scalability, fault tolerance, and compatibility of MapReduce while making up for MapReduce's shortcomings in these applications. Because it performs cluster computing in memory, Spark can run such applications up to 100 times faster than Hadoop MapReduce. This paper presents the basic concepts, components, and deployment modes of Spark, analyzes its core abstractions and programming model, and gives related programming examples.

Keywords: Spark; Hadoop; MapReduce; Big Data; Data Analysis

Li Wenyang (黎文阳)


College of Computer Science, Sichuan University, Chengdu 610065


2015

Modern Computer (Popular Edition)
Sun Yat-sen University


Impact factor: 0.202
ISSN:1007-1423
Year, volume (issue): 2015.(3)