SPARK KNACK

Naturally Curious

Deployment Modes in Spark

Deployment mode determines where the driver program runs. Note: The Spark driver is the program that declares the transformations...

RDD Actions with examples

A transformation creates an RDD from one or more other RDDs, but the result is not computed until we trigger an action. When we trigger an action,...

Transformations - Other important ones

So far we have studied two frequently used transformations, map() and flatMap(). Below are some other important transformations that one...

Partitions - Internals of Spark

In one line: a partition is the unit of parallelism in Spark. When we apply a transformation to an RDD, the transformation is...

Creating RDD - Basics

There are two ways to create an RDD. By using parallelize(): You can create an RDD from an existing collection using parallelize()...

RDD - Fundamental Data Structure of Spark

A Resilient Distributed Dataset (RDD) is a collection of elements partitioned across the nodes of the cluster that can be operated...

MapReduce V/S Spark

Those who have no exposure to MapReduce can skip this article and start from the next one. (1) Performance in terms of execution...

Why Spark?

Apache Spark is an open-source, general-purpose distributed computing engine used for processing and analyzing large amounts of data...

©2019 by Spark knack. Proudly created with Wix.com
