Data Locality in Hadoop/HDFS:

image.png
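
Since the figure itself is not reproduced here, the idea can be sketched in code. Data locality means the scheduler tries to run a task on a node that already stores a replica of the HDFS block it reads, then falls back to a node in the same rack, then to any free node. The function `pick_node`, the node names, and the rack map below are hypothetical illustrations, not Hadoop APIs.

```python
def pick_node(replica_nodes, free_nodes, rack_of):
    """Choose a node for a task that reads one HDFS block (illustrative only).

    replica_nodes: nodes holding a replica of the block
    free_nodes:    nodes with a free task slot, in scheduler order
    rack_of:       mapping node -> rack id
    """
    if not free_nodes:
        raise ValueError("no free nodes available")

    # 1. Node-local: the data is already on the node's disk.
    for node in free_nodes:
        if node in replica_nodes:
            return node, "node-local"

    # 2. Rack-local: the data crosses only the rack switch.
    replica_racks = {rack_of[n] for n in replica_nodes}
    for node in free_nodes:
        if rack_of[node] in replica_racks:
            return node, "rack-local"

    # 3. Off-rack: the data must cross racks.
    return free_nodes[0], "off-rack"


# Hypothetical cluster: four nodes on three racks.
rack_of = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r3"}
replicas = ["n1", "n3"]

print(pick_node(replicas, ["n3", "n4"], rack_of))
print(pick_node(replicas, ["n4", "n2"], rack_of))
print(pick_node(replicas, ["n4"], rack_of))
```

The real schedulers weigh more factors (queue capacity, delay scheduling, container sizes); this sketch shows only the preference order.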


GROUP BY under the hood:

  1. Stage #1:

    image.png

  2. Stage #2:

    image.png
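
The two stages can be sketched in code, assuming they correspond to the usual MapReduce split: stage 1 aggregates each input split locally on the mapper, and stage 2 shuffles the partial results by key and merges them on the reducers. This is an in-memory illustration of `SELECT key, SUM(value) ... GROUP BY key`; every function name here is hypothetical, and the figures may show a different breakdown.

```python
from collections import defaultdict


def map_stage(split):
    """Stage 1 (assumed): partial aggregation of one input split."""
    partial = defaultdict(int)
    for key, value in split:
        partial[key] += value
    return dict(partial)


def partition(key, num_reducers):
    """Deterministically assign a string key to a reducer."""
    return sum(key.encode("utf-8")) % num_reducers


def shuffle(partials, num_reducers):
    """Route every partial result to the reducer that owns its key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for partial in partials:
        for key, value in partial.items():
            buckets[partition(key, num_reducers)][key].append(value)
    return buckets


def reduce_stage(bucket):
    """Stage 2 (assumed): merge the partial sums for each key."""
    return {key: sum(values) for key, values in bucket.items()}


def group_by_sum(splits, num_reducers=2):
    partials = [map_stage(split) for split in splits]
    result = {}
    for bucket in shuffle(partials, num_reducers):
        result.update(reduce_stage(bucket))
    return result


# Two input splits, as if stored in two HDFS blocks.
splits = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 4), ("c", 5)],
]
print(group_by_sum(splits))
```

Aggregating before the shuffle means each mapper sends at most one record per key over the network, which is why the work is split into two stages.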


JOIN under the hood:

image.png
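
One common way a join runs on MapReduce is the reduce-side (repartition) join: mappers tag each row with its source table and emit it under the join key, the shuffle brings rows with equal keys to the same reducer, and the reducer pairs left rows with right rows. The sketch below assumes that strategy; the figure may instead show a map-side (broadcast) join. The names `map_tag` and `reduce_join` and the sample tables are hypothetical.

```python
from collections import defaultdict


def map_tag(rows, tag, key_index):
    """Map side: emit (join_key, (source_tag, row)) for every row."""
    for row in rows:
        yield row[key_index], (tag, row)


def reduce_join(tagged_pairs):
    """Reduce side: group by key, then pair left rows with right rows."""
    groups = defaultdict(lambda: {"L": [], "R": []})
    for key, (tag, row) in tagged_pairs:
        groups[key][tag].append(row)

    joined = []
    for key in sorted(groups):
        # Inner join: keys missing from either side produce no output.
        for left in groups[key]["L"]:
            for right in groups[key]["R"]:
                joined.append(left + right)
    return joined


users = [(1, "ann"), (2, "bob"), (3, "cy")]
orders = [(1, "book"), (1, "pen"), (3, "cup"), (4, "hat")]

# Concatenating the two mapper outputs stands in for the shuffle.
tagged = list(map_tag(users, "L", 0)) + list(map_tag(orders, "R", 0))
result = reduce_join(tagged)
for row in result:
    print(row)
```

User 2 has no orders and order key 4 has no user, so neither appears in the inner-join output.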