
Spark Context/Spark Session: 'The entry point to Spark programming'

  • kumarnitinkarn
  • Sep 11, 2019
  • 2 min read

Before diving deep into SparkContext and SparkSession, understand the basic difference between the two: SparkSession is the unified entry point of a Spark application from Spark 2.0 onwards.


Now let's start:


Spark Context:


A SparkContext represents the connection to a Spark cluster and can be used to create RDDs, accumulators, and broadcast variables on that cluster (don't worry, we will study accumulators and broadcast variables in later posts).


In order to create a SparkContext, you should first create a SparkConf. The SparkConf stores the configuration parameters that your Spark driver application will pass to the SparkContext. Some of these parameters define properties of your Spark driver application, and some are used by Spark to allocate resources on the cluster, such as the number, memory size, and cores of the executors running on the worker nodes. setAppName() gives your Spark driver application a name so you can identify it in the Spark or YARN UI.


Let's see how to create a SparkConf:


import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("FirstSparkDriverApp")      // name shown in the Spark/YARN UI
  .setMaster("spark://master:7077")       // cluster manager to connect to
  .set("spark.executor.memory", "4g")     // memory to allocate per executor


Now that we have a SparkConf, we can use it to create a SparkContext:


import org.apache.spark.SparkContext

val sc = new SparkContext(conf)
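
To make the earlier claim concrete, here is a minimal sketch of the three things a SparkContext can create, assuming the sc built above; the values are purely illustrative:

val rdd = sc.parallelize(Seq(1, 2, 3, 4, 5))        // an RDD from a local collection
val sum = sc.longAccumulator("sum")                 // an accumulator (covered in a later post)
val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))  // a broadcast variable (covered in a later post)
rdd.foreach(x => sum.add(x))                        // executors update the accumulator
println(sum.value)                                  // the driver reads the result: 15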


Those who are interested in the API side of how SparkContext is implemented can continue reading:


Constructors of SparkContext:


SparkContext()

Create a SparkContext that loads settings from system properties


SparkContext(SparkConf config)


SparkContext(String master, String appName, SparkConf conf)

Alternative constructor that allows setting common Spark properties directly


SparkContext(String master, String appName, String sparkHome, scala.collection.Seq<String> jars, scala.collection.Map<String,String> environment)

Alternative constructor that allows setting common Spark properties directly
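
For example, the third constructor lets you pass the master and application name directly instead of calling setMaster()/setAppName() on the conf. A minimal sketch (the local[2] master is just for illustration):

import org.apache.spark.{SparkConf, SparkContext}

val sc2 = new SparkContext("local[2]", "FirstSparkDriverApp", new SparkConf())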


For all the methods provided by SparkContext, refer to the Spark documentation (you can skip this in the initial stages of learning).



Spark Session:


From Spark 2.0 onwards, we get SparkSession packaged with the underlying Spark functionality.

Let's see how easy it is to create a SparkSession without having to create a SparkConf first (unlike SparkContext).


import org.apache.spark.sql.SparkSession

// Create a SparkSession. No need to create a SparkContext;
// you automatically get it as part of the SparkSession.
val spark = SparkSession
  .builder()                              // obtain a builder object
  .appName("SparkSessionZipsExample")     // unique name of your application
  .config("spark.executor.memory", "4g")  // set all the configurations here
  .enableHiveSupport()                    // tell the session to support Hive
  .getOrCreate()                          // get an existing session or create a new one

Now you have "spark" object to play around with the spark functionality.


Let's dive a bit deeper into the API ;)


In environments where a SparkSession is already created (like the REPL, notebooks, etc.), use the builder to get the existing session:

SparkSession.builder().getOrCreate()


To create a new session (as already explained above):

SparkSession.builder

.master("local")

.appName("Word Count")

.config("spark.some.config.option", "some-value")

.getOrCreate()


Once created, a SparkSession allows you to create a DataFrame (based on an RDD or a Scala Seq), create a Dataset, access the Spark SQL services, execute a SQL query, load a table, etc.
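
To make that concrete, here is a minimal sketch using the spark object from above; the zips table and its columns are made-up sample data:

import spark.implicits._   // enables .toDF / .toDS on Scala collections

val zipsDF = Seq(("94105", "CA"), ("10001", "NY")).toDF("zip", "state")   // a DataFrame from a Scala Seq
val zipsDS = Seq(("94105", "CA"), ("10001", "NY")).toDS()                 // a Dataset from the same Seq
zipsDF.createOrReplaceTempView("zips")                                    // register it for SQL
spark.sql("SELECT state, count(*) AS n FROM zips GROUP BY state").show()  // execute a SQL query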



Spark Context versus Spark Session:


Spark Session ----> (SparkContext + StreamingContext + SqlContext)


> Spark Session is a unified version of Spark Context, so there is no need to create separate contexts in order to use the SQL, Hive, and Streaming APIs.



 
 
 
