Page 87 - Multicloud Workshop

Page 87 - Multicloud Workshop - Prework


MapReduce

Distributed Computing

• A computing task is parallelized by distributing data onto multiple worker nodes
• The dataset cannot be stored on a single physical node
• Data is stored local to the compute process

MapReduce is a programming model for processing and generating large data sets with a parallel, distributed algorithm on a cluster. In the first step, a worker node applies the map() function to its local data, producing output data. Then the output data is reshuffled so that all data belonging to one key is located on the same worker node. Now the worker nodes can process each group of output data, per key, in parallel.
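
The three steps described on this slide (map on local data, reshuffle by key, process each key group) can be illustrated with a minimal single-process sketch in Python. This is not a distributed implementation: the list `chunks` stands in for data held on separate worker nodes, and the names `map_words`, `shuffle`, `reduce_counts`, and `word_count` are hypothetical helpers chosen for this word-count example, not part of any MapReduce framework.

```python
from collections import defaultdict


def map_words(chunk):
    """Map step: emit a (word, 1) pair for every word in one local chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_outputs):
    """Shuffle step: collect all values that belong to the same key."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return dict(groups)


def reduce_counts(key, values):
    """Reduce step: process one key group independently of the others."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk plays the role of the data local to one worker node.
    mapped = [map_words(chunk) for chunk in chunks]
    grouped = shuffle(mapped)
    # On a real cluster, each key group could be reduced in parallel.
    return dict(reduce_counts(key, values) for key, values in grouped.items())


chunks = ["the quick brown fox", "the lazy dog", "the fox"]
counts = word_count(chunks)
print(counts)
```

Because each call to `reduce_counts` only sees the values for a single key, the key groups have no dependencies on one another, which is what allows the final step to run in parallel across worker nodes.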
