Page 87 - Multicloud Workshop - Prework
MapReduce

Distributed Computing
• A computing task is parallelized by distributing data onto multiple worker nodes
• The dataset is too large to be stored on a single physical node
• Data is stored local to the compute process

MapReduce
MapReduce is a programming model for processing and generating large data sets with a parallel, distributed algorithm on a cluster. In the first step, each worker node applies the map() function to its local data, producing output data. The output data is then reshuffled so that all data belonging to one key is located on the same worker node. Finally, the worker nodes process each group of output data per key in parallel (the reduce step).
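The flow described above can be sketched in a single Python process. This is a minimal illustration, not the API of Hadoop or any other MapReduce framework. It makes several assumptions:
• Threads from `concurrent.futures.ThreadPoolExecutor` stand in for worker nodes.
• Word counting is the example task.
• The names `partition`, `map_fn`, `shuffle`, `reduce_fn` and `map_reduce` are hypothetical helpers chosen for illustration.
• A key is assigned to a worker by a hash of the key modulo the number of workers.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Pair = Tuple[str, int]


def partition(records: List[str], num_workers: int) -> List[List[str]]:
    """Hypothetical helper: split the dataset into one chunk per worker (round-robin)."""
    return [records[i::num_workers] for i in range(num_workers)]


def map_fn(line: str) -> List[Pair]:
    """Hypothetical map(): emit (word, 1) for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_fn(key: str, values: List[int]) -> Pair:
    """Hypothetical reduce(): combine all values collected for one key."""
    return key, sum(values)


def run_map_phase(local_chunk: List[str]) -> List[Pair]:
    # Each simulated worker maps only the records stored locally to it.
    output: List[Pair] = []
    for line in local_chunk:
        output.extend(map_fn(line))
    return output


def shuffle(mapped: List[List[Pair]], num_workers: int) -> List[Dict[str, List[int]]]:
    # Route every (key, value) pair to the worker that owns the key, so that
    # all values belonging to one key end up on the same worker.
    # crc32 is used instead of hash() so the key-to-worker assignment is stable across processes.
    buckets: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_workers)]
    for worker_output in mapped:
        for key, value in worker_output:
            owner = zlib.crc32(key.encode("utf-8")) % num_workers
            buckets[owner][key].append(value)
    return buckets


def run_reduce_phase(bucket: Dict[str, List[int]]) -> List[Pair]:
    # Each simulated worker reduces every key group it received.
    return [reduce_fn(key, values) for key, values in bucket.items()]


def map_reduce(records: List[str], num_workers: int = 3) -> Dict[str, int]:
    chunks = partition(records, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        mapped = list(pool.map(run_map_phase, chunks))     # map step, in parallel
        buckets = shuffle(mapped, num_workers)             # reshuffle by key
        reduced = list(pool.map(run_reduce_phase, buckets))  # per-key step, in parallel
    result: Dict[str, int] = {}
    for pairs in reduced:
        result.update(pairs)  # keys are disjoint across workers after the shuffle
    return result


if __name__ == "__main__":
    counts = map_reduce(["the quick brown fox", "the lazy dog", "The fox"])
    print(dict(sorted(counts.items())))
    # {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

The shuffle step is what makes the per-key phase parallel. Every key is owned by exactly one worker, so workers never need to merge partial results for the same key. On a real cluster, the shuffle moves data over the network, and the map step runs where the data is already stored.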
© 2016 Engage ESM All Rights Reserved