This blog post digs into the different cluster manager modes in which you can run your Spark application.
Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in your main program, which is called the driver program. Specifically, to run on a cluster, the SparkContext can connect to several types of cluster managers, which allocate resources across applications. Once the connection is established, Spark acquires executors on the nodes in the cluster; these are the processes that run computations and store data for your application. Next, it sends your application code (defined by JAR or Python files passed to SparkContext) to the executors. Finally, the SparkContext sends tasks to the executors to run.
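To make this concrete, here is a minimal sketch of a driver program in Scala. The object name, app name, and master URL are illustrative assumptions, not part of any specific application; the point is only to show where the driver declares which cluster manager to connect to and how work gets distributed to executors.

```scala
import org.apache.spark.sql.SparkSession

object ClusterModeSketch {
  def main(args: Array[String]): Unit = {
    // The SparkSession wraps the SparkContext, which is the driver's entry point.
    // The master URL tells the driver which cluster manager to contact:
    // "local[*]" for local testing, "yarn", or "spark://host:7077" for standalone.
    val spark = SparkSession.builder()
      .appName("cluster-mode-sketch")   // illustrative name
      .master("local[*]")               // replace with your cluster manager URL
      .getOrCreate()

    val sc = spark.sparkContext

    // A trivial job: the driver splits this computation into tasks,
    // ships them to executors, and collects the result back.
    val evens = sc.parallelize(1 to 1000000).filter(_ % 2 == 0).count()
    println(s"Even numbers: $evens")

    spark.stop()
  }
}
```

In practice the master URL is usually not hard-coded; it is supplied at launch time, for example via `spark-submit --master yarn`, so the same application code can run against different cluster managers.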