
SPARK KNACK

Naturally Curious


SparkContext v/s SparkSession - Deep Dive

Prior to Spark 2.0, SparkContext was the entry point of any Spark application and was used to access all Spark features. The Spark driver...

Spark Performance Tuning II : persist() and cache()

When we persist an RDD, each node stores in memory any partitions of it that it computes and reuses them in other actions on that dataset....

Deployment Modes in Spark

The deployment mode determines where the driver program runs. Note: The Spark driver is the program that declares the transformations...

RDD Actions with examples

A transformation creates an RDD from one or more other RDDs, but the result is not computed until we trigger an action. When we trigger an action,...

Transformations - Other important ones

So far we have studied two frequently used transformations, map() and flatMap(). Below are some other important transformations that one...

RDD Transformations - map() v/s flatMap()

We have already studied transformations and their basics. In this section we will study two very important and frequently...

Partitions - Internals of Spark

In one line: a partition is the unit of parallelism in Spark. When we apply a transformation to an RDD, the transformation is...

Creating RDD - Basics

There are two ways to create an RDD: By using parallelize(): You can create an RDD from an existing collection using parallelize()...

RDD - Fundamental Data Structure of Spark

A Resilient Distributed Dataset (RDD) is a collection of elements partitioned across the nodes of the cluster that can be operated...

MapReduce V/S Spark

Those who have no exposure to MapReduce can skip this article and start from the next one. (1) Performance in terms of execution...

Why Spark?

Apache Spark is an open-source, general-purpose distributed computing engine used for processing and analyzing large amounts of data....


©2019 by Spark knack.
