What happens when you submit a Spark application?

Joydip Nath
Jul 3, 2022

When we submit a Spark application using the spark-submit script:

spark-submit --master <Spark master URL> --driver-memory 1g --executor-memory 1g --executor-cores 2 --py-files file1.py,file2.py,file3.zip,file4.egg wordByExample.py
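For quick local testing, the same application can be run without any cluster manager by pointing --master at local mode. This is a hedged variant of the command above: the file names and app script are carried over from it, and local[2] simply means two worker threads.

```shell
# Run the same PySpark app on the local machine with 2 worker threads;
# no cluster manager is involved, so driver and executors run in one JVM.
spark-submit \
  --master local[2] \
  --driver-memory 1g \
  --py-files file1.py,file2.py,file3.zip,file4.egg \
  wordByExample.py
```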
Spark Architecture

Step-by-step process:

  1. It creates a Driver Program on one of the edge nodes or data nodes:
    Edge nodes are the nodes from which we submit our Spark application in real time.
    Functionality of the Driver Program
    The driver process runs your main() function and sits on a node in the cluster. The driver process is absolutely essential: it is the heart of a Spark application, maintains all relevant information during the lifetime of the application, and is responsible for the following:
    (a) Maintaining information about the Spark application.
    (b) Responding to a user's program or input.
    (c) Analyzing, distributing, and scheduling work across the executors.
    (d) Creating the lineage, the logical plan, and the physical plan.
  2. The Driver Program creates a SparkContext, which is the entry point for connecting to a cluster manager, e.g. YARN or Mesos.
  3. The Driver then negotiates with the cluster manager for the resources to launch the executors.
    The executors are responsible for actually carrying out the work that the driver assigns them. This means that each executor is responsible for only two things:
    (a) Executing code assigned to it by the driver.
    (b) Reporting the state of the computation on that executor back to the driver node.
  4. The Driver Program divides the submitted code into jobs, stages, and tasks. This is called DAG preparation (the logical and physical DAGs).
  5. Once the physical plan is created, the Driver distributes and schedules the tasks on the executors.
  6. The Driver Program monitors the tasks running on the executors until the job completes.
  7. Upon completing their tasks, the executors send the results back to the Driver Program.
  8. After the job completes, or upon a call to stop(), the Driver Program frees the allocated resources.
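The lazy DAG construction in step 4 can be sketched in plain Python. This is a toy illustration, not Spark's actual classes: transformations (map, filter) only record lineage, and nothing executes until an action (collect) walks the recorded plan.

```python
# Toy sketch of lazy evaluation and lineage (illustrative, not Spark's API).
class ToyRDD:
    def __init__(self, data, lineage=None):
        self.data = data
        # Recorded transformations: this list plays the role of the logical plan.
        self.lineage = lineage or []

    def map(self, fn):
        # Lazy: append the step to the lineage, do no work yet.
        return ToyRDD(self.data, self.lineage + [("map", fn)])

    def filter(self, pred):
        return ToyRDD(self.data, self.lineage + [("filter", pred)])

    def collect(self):
        # Action: only now is the recorded plan executed over the data.
        out = self.data
        for kind, fn in self.lineage:
            if kind == "map":
                out = [fn(x) for x in out]
            else:  # "filter"
                out = [x for x in out if fn(x)]
        return out

pipeline = ToyRDD([1, 2, 3, 4, 5]).map(lambda x: x * 10).filter(lambda x: x > 20)
print(len(pipeline.lineage))  # two steps recorded, nothing executed yet
print(pipeline.collect())     # [30, 40, 50]
```

In real Spark the driver additionally splits this plan at shuffle boundaries into stages and ships the resulting tasks to executors; this sketch only shows the "record now, run on action" idea.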
Spark Architecture (image: bigdatainterview.com)

Reference

[1] Bill Chambers and Matei Zaharia, Spark: The Definitive Guide: Big Data Processing Made Simple, O'Reilly Media.
