This is the third article of a four-part series about Apache Spark on YARN. Apache Spark allows developers to run multiple tasks in parallel across machines in a cluster, or across multiple cores on a desktop. A partition, or split, is a logical chunk of a distributed data set. Apache Spark builds a Directed Acyclic Graph (DAG) with jobs, stages, and tasks for the submitted application. The number