MapReduce vs. Spark
- kumarnitinkarn
- Sep 9, 2019
Readers with no exposure to MapReduce can skip this article and start with the next one.
(1) Performance in terms of execution speed :
Spark is often quoted as running around 10x faster than MapReduce, although the actual gain depends on the workload and on how much of the data fits in memory.
(2) Mode :
Spark can be used for batch, interactive, and streaming processing.
MapReduce can process data only in batch mode.
(3) Resource requirements :
Spark performs computation in memory, which results in faster performance, but this comes at the cost of needing more primary memory (RAM).
MapReduce reads from and writes to disk between processing steps, and this disk I/O degrades performance.
Both can run on commodity hardware.
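The difference between the two approaches can be sketched in plain Python. This is only an illustration and uses neither Hadoop nor Spark: the stage functions (`square`, `add_one`) and the helpers `run_with_disk` and `run_in_memory` are made-up names, and JSON files in a temporary directory stand in for intermediate storage.

```python
import json
import os
import tempfile


def square(values):
    """Hypothetical first processing stage."""
    return [v * v for v in values]


def add_one(values):
    """Hypothetical second processing stage."""
    return [v + 1 for v in values]


STAGES = [square, add_one]


def run_with_disk(values, stages, workdir):
    """Disk-based style: every stage reads its input from disk and writes its output back."""
    path = os.path.join(workdir, "stage_0.json")
    with open(path, "w") as f:
        json.dump(values, f)
    disk_writes = 1
    for i, stage in enumerate(stages, start=1):
        with open(path) as f:
            data = json.load(f)
        path = os.path.join(workdir, f"stage_{i}.json")
        with open(path, "w") as f:
            json.dump(stage(data), f)
        disk_writes += 1
    with open(path) as f:
        return json.load(f), disk_writes


def run_in_memory(values, stages):
    """In-memory style: intermediate results stay in RAM between stages."""
    data = values
    for stage in stages:
        data = stage(data)
    return data


with tempfile.TemporaryDirectory() as workdir:
    disk_result, disk_writes = run_with_disk([1, 2, 3], STAGES, workdir)
memory_result = run_in_memory([1, 2, 3], STAGES)

print(disk_result, memory_result, disk_writes)
```

Both runs produce the same result; the disk-based version simply pays for a file write and a file read around every stage, while the in-memory version needs enough RAM to hold the intermediate data.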
(4) Ease of use :
Spark applications can be written in Java, Python, or Scala, and Spark also includes Spark SQL for users coming from traditional RDBMSs.
MapReduce requires developers who understand Java, which makes it harder for many programmers to adopt.
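To give a feel for the difference in programming style, here is a plain-Python sketch of a word count. It does not use Hadoop or Spark; the function names `map_phase`, `shuffle_phase` and `reduce_phase` and the sample input are invented for illustration. The first version spells out the map, shuffle and reduce phases by hand, while the second is a single chained expression, closer in spirit to the concise style that higher-level APIs encourage.

```python
from collections import Counter, defaultdict

# Made-up sample input
lines = ["spark is fast", "mapreduce is batch", "spark is easy"]


def map_phase(lines):
    """Emit a (word, 1) pair for every word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def shuffle_phase(pairs):
    """Group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


# Explicit three-phase version
explicit_counts = reduce_phase(shuffle_phase(map_phase(lines)))

# Concise single-expression version
chained_counts = dict(Counter(word for line in lines for word in line.split()))

print(explicit_counts == chained_counts)
```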
(5) Compatibility :
Spark can be easily integrated with HDFS, S3, or other file systems, and with many NoSQL databases as well.
MapReduce also has many storage options, but its default and most promoted storage is HDFS.
(6) Security :
Spark is a bit bare at the moment when it comes to security. Authentication is supported via a shared secret, the web UI can be secured via javax servlet filters, and event logging is included. Spark can run on YARN and use HDFS, which means that it can also enjoy Kerberos authentication, HDFS file permissions and encryption between nodes.
Hadoop MapReduce can enjoy all Hadoop security benefits and integrate with Hadoop security projects, like Knox Gateway and Sentry. Project Rhino, which aims to improve Hadoop’s security, only mentions Spark with regard to adding Sentry support. Otherwise, Spark developers will have to improve Spark security themselves.
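The Spark-side options mentioned in this section (shared-secret authentication, servlet filters for the web UI, event logging) are set through configuration properties. The sketch below collects the relevant property names in a Python dictionary and renders them in the key-value layout used by `spark-defaults.conf`. The values are placeholders: `<shared-secret>` and the filter class `com.example.MyAuthFilter` are assumptions, and the helper `render_spark_defaults` is a made-up name. Which properties apply depends on the Spark version and deployment mode, so the Spark security documentation should be checked before using any of this.

```python
# Placeholder values; replace them for a real deployment.
security_settings = {
    "spark.authenticate": "true",                     # enable shared-secret authentication
    "spark.authenticate.secret": "<shared-secret>",   # placeholder secret
    "spark.ui.filters": "com.example.MyAuthFilter",   # hypothetical servlet filter class
    "spark.eventLog.enabled": "true",                 # turn on event logging
}


def render_spark_defaults(settings):
    """Render settings as 'key value' lines, one property per line."""
    return "\n".join(f"{key} {value}" for key, value in settings.items())


conf_text = render_spark_defaults(security_settings)
print(conf_text)
```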
Verdict :
Spark has excellent performance and is highly cost-effective thanks to in-memory data processing. It’s compatible with all of Hadoop’s data sources and file formats, and thanks to friendly APIs that are available in several languages, it also has a gentler learning curve. Spark even includes graph processing and machine-learning capabilities.