1- Catalyst Optimizer:

2- Cost-Based Optimizer:

3- Code Optimization:

1️⃣ Shuffle Optimization

Shuffles are expensive because they involve:

disk I/O
data serialization
network I/O

Common shuffle triggers:

join
groupBy
distinct
orderBy
repartition
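
To show why operations like these force data movement, here is a minimal pure-Python sketch of hash partitioning by key. It is not Spark code and does not reflect Spark's internal implementation. The names `target_partition`, `shuffle_by_key`, and `group_sum`, the sample records, and the use of CRC32 as the hash are all assumptions made for illustration. The idea is that a per-key aggregation is only correct once every record with the same key sits in the same partition, so records must be moved between partitions first.

```python
import zlib
from collections import defaultdict


def target_partition(key: str, num_partitions: int) -> int:
    # CRC32 is used here only because it is stable across runs;
    # it is an illustrative choice, not what Spark uses.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle_by_key(partitions, num_partitions):
    """Redistribute (key, value) records so equal keys share a partition.

    Returns the new partitions and the number of records that changed
    partition, a rough stand-in for data sent over the network.
    """
    shuffled = [[] for _ in range(num_partitions)]
    moved = 0
    for source_index, records in enumerate(partitions):
        for key, value in records:
            dest = target_partition(key, num_partitions)
            if dest != source_index:
                moved += 1
            shuffled[dest].append((key, value))
    return shuffled, moved


def group_sum(partitions):
    """Sum values per key, one partition at a time.

    This is only correct after a shuffle: because no key spans two
    partitions, merging the per-partition results never overwrites a total.
    """
    totals = {}
    for records in partitions:
        local = defaultdict(int)
        for key, value in records:
            local[key] += value
        totals.update(local)
    return totals


# Hypothetical input: every key is spread across two partitions.
input_partitions = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5), ("c", 6)],
]

shuffled, moved = shuffle_by_key(input_partitions, num_partitions=3)
totals = group_sum(shuffled)
print("records moved:", moved)
print("totals:", totals)
```

Because each key in the sample starts out in two different partitions, at least one record per key has to move before the sums can be computed. In a real cluster, each moved record is serialized, written out, and fetched by another executor.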

Optimization techniques:

2️⃣ Partition Optimization