In MapReduce, say we have 100 iterations: for each iteration, the intermediate data of that specific iteration is read from and written to disk, which massively slows down the overall process.
In Spark, no disk reads or writes happen in between: disk is only touched to load the input and to write the final result once all the iterations are complete.
How is that done? By keeping all the intermediate data of the 100 iterations in memory instead of on disk, which makes it much faster than hitting the disk after every iteration.
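Here is a minimal PySpark sketch of that idea (the job itself is hypothetical, just for illustration): the input is read once, `cache()` pins it in executor memory, and the 100 iterations then reuse it without writing any intermediate results back to disk.

```python
from operator import add
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("in-memory-iterations").getOrCreate()
sc = spark.sparkContext

# Read/generate the input once; cache() keeps the partitions in executor memory.
data = sc.parallelize(range(1_000_000)).cache()

total = 0
for i in range(100):
    # Every iteration scans the cached partitions straight from memory;
    # no intermediate results are written back to disk between iterations.
    total += data.map(lambda x: x * (i + 1)).reduce(add)

print(total)
spark.stop()
```

In an equivalent MapReduce chain, each of those 100 steps would be a separate job materializing its output to the distributed file system before the next step could start.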
Spark also works great with streaming data, not just batch data, which is why it is called a Unified Analytics engine.
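As a small sketch of that unified API, the Structured Streaming snippet below (assuming a local Spark install) uses the built-in `rate` test source, which just emits timestamped rows, and applies the same DataFrame operations you would use on a batch job.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("streaming-demo").getOrCreate()

# "rate" is a built-in test source that generates rows continuously.
stream = (spark.readStream
          .format("rate")
          .option("rowsPerSecond", 5)
          .load())

# The same DataFrame API used for batch data works on the stream.
counts = stream.groupBy().count()

query = (counts.writeStream
         .outputMode("complete")   # re-emit the full aggregate on each trigger
         .format("console")
         .start())

query.awaitTermination(timeout=30)  # let the demo run for ~30 seconds
spark.stop()
```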