Modern Computer (Popular Edition), 2015, Issue (3): 55-60. DOI: 10.3969/j.issn.1007-1423.2015.08.013

Research on Apache Spark for Big Data Processing

Li Wenyang (黎文阳)

Author information

  • 1. College of Computer Science, Sichuan University, Chengdu 610065

Abstract

Apache Spark is a currently popular big data processing model that is fast, general-purpose, and easy to use. Spark is an in-memory computing framework proposed to address the inefficiency of MapReduce in applications such as iterative machine learning algorithms and interactive data mining. It retains the scalability, fault tolerance, and compatibility of MapReduce while making up for MapReduce's shortcomings in these applications. Because it uses memory-based cluster computing, Spark can run up to 100 times faster than Hadoop MapReduce on these workloads when the data is held in memory. This paper introduces the basic concepts, components, and deployment modes of Spark, analyzes its core abstraction and programming model, and gives related programming examples.

Keywords

Spark/Hadoop/MapReduce/Big Data/Data Analysis

Publication year

2015

Modern Computer (Popular Edition)
Sun Yat-sen University

Impact factor: 0.202
ISSN: 1007-1423
Citations: 31
References: 8