Skipped Stages Explanation

"Skipped Stages," as they're called in the Spark Admin UI, happen when the output of a Stage has already been computed and is still stored in the local work/ directory. Spark reuses that stored output instead of running the Stage again.

This usually happens when a Job is re-run with the same partition-to-Executor mapping as the previous run.

The stored output is not guaranteed to still be there, however, so whether a Stage is skipped is not deterministic.
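
The behavior described above can be sketched in plain Python. This is a toy model only, not Spark's scheduler code or API: `work_dir`, `run_stage`, and `run_job` are hypothetical names made up for illustration, and a dictionary stands in for the local work/ directory.

```python
# Toy model of Stage skipping. This is NOT Spark code; every name here is
# hypothetical. `work_dir` stands in for the Executor-local work/ directory.

def run_stage(stage_id, compute, work_dir, log):
    """Return the Stage output, reusing a stored copy when one exists."""
    if stage_id in work_dir:
        log.append((stage_id, "skipped"))
        return work_dir[stage_id]
    output = compute()
    work_dir[stage_id] = output  # keep the output for later runs
    log.append((stage_id, "ran"))
    return output


def run_job(work_dir):
    """Two-Stage job: Stage 0 counts words, Stage 1 totals the counts."""
    log = []
    words = ["a", "b", "a", "c", "b", "a"]

    def stage0():
        counts = {}
        for w in words:
            counts[w] = counts.get(w, 0) + 1
        return counts

    counts = run_stage(0, stage0, work_dir, log)
    total = sum(counts.values())  # the final Stage always runs in this model
    log.append((1, "ran"))
    return total, log


work_dir = {}
first = run_job(work_dir)   # nothing stored yet, so Stage 0 runs
second = run_job(work_dir)  # Stage 0 output is found, so Stage 0 is skipped
work_dir.clear()            # the stored output is gone, for whatever reason
third = run_job(work_dir)   # Stage 0 has to run again
```

In this model the second run skips Stage 0 only because its output happens to still be in `work_dir`. Once that output disappears, the same Job runs every Stage again, which is why skipping cannot be relied on.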

 
