Deployment Modes in Spark
- kumarnitinkarn
- Jan 9, 2020
- 1 min read
The deployment mode specifies where the driver program of a Spark job runs.
Note: The Spark driver is the program that declares the transformations and actions on RDDs of data and submits those requests to the master. Its location is independent of the master and worker nodes.
There are two options:
The driver program can run on a worker node inside the cluster, which is known as Spark cluster mode. Alternatively, the driver program can run on an external client (the machine from which the job is submitted), which is known as Spark client mode.
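As a rough illustration, the deployment mode is usually chosen with the --deploy-mode option of spark-submit, which accepts client or cluster and defaults to client. The sketch below only assembles the command line and does not launch anything. The helper name build_spark_submit, the file name my_job.py and the yarn master are assumptions made for this example.

```python
import shlex

# The two values accepted by spark-submit's --deploy-mode option.
VALID_DEPLOY_MODES = ("client", "cluster")


def build_spark_submit(app_path, master_url, deploy_mode="client"):
    """Return the spark-submit argument list for the given deploy mode.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    if deploy_mode not in VALID_DEPLOY_MODES:
        raise ValueError(
            f"deploy_mode must be one of {VALID_DEPLOY_MODES}, got {deploy_mode!r}"
        )
    return [
        "spark-submit",
        "--master", master_url,        # cluster manager, e.g. "yarn"
        "--deploy-mode", deploy_mode,  # where the driver will run
        app_path,                      # placeholder application file
    ]


if __name__ == "__main__":
    # Print one command line per mode so the difference is visible.
    for mode in VALID_DEPLOY_MODES:
        print(shlex.join(build_spark_submit("my_job.py", "yarn", mode)))
```

The only difference between the two printed commands is the value after --deploy-mode; the application and the master stay the same.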
Spark Client Mode
Here, the driver program of the Spark job runs on the machine from which the job is submitted.
When to use client mode:
Use client mode when the job-submitting machine is within or near the “spark infrastructure”. In that case, moving data between the “spark infrastructure” and the “driver” to generate the final result does not incur high network latency.
Spark Cluster Mode
Here, the driver program of the Spark job does not run on the local machine from which the job is submitted. Instead, it runs on one of the worker nodes in the cluster.
When to use cluster mode:
Use cluster mode when the job-submitting machine is remote from the “spark infrastructure”. Because the “driver” component then runs within the “spark infrastructure”, this reduces data movement between the job-submitting machine and the “spark infrastructure”.
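The rule of thumb from the two "when to use" notes can be written down as a tiny helper. This is only a sketch of the guidance in this post, not a Spark API: the function name choose_deploy_mode and the boolean flag submitter_near_cluster are hypothetical, and deciding what counts as "near" is left to the reader.

```python
def choose_deploy_mode(submitter_near_cluster: bool) -> str:
    """Pick a deploy mode following the rule of thumb in this post.

    submitter_near_cluster is a hypothetical flag: True when the
    job-submitting machine is within or near the Spark infrastructure.
    """
    # Near the cluster: the driver can stay on the submitting machine.
    # Remote from the cluster: run the driver on a worker node instead.
    return "client" if submitter_near_cluster else "cluster"


if __name__ == "__main__":
    print(choose_deploy_mode(True))
    print(choose_deploy_mode(False))
```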
Happy Learning !!