Extract Information: MapReduce

Distributed Computing: MapReduce

A computing task is parallelized by distributing data onto multiple worker nodes.
The dataset cannot be stored on a single physical node.
Data is stored local to the compute process.
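
To make the map and reduce steps concrete, here is a minimal single-process sketch of the MapReduce pattern applied to word counting. It is an illustration under stated assumptions, not a distributed implementation: each inner list in `partitions` stands in for the data held locally by one worker node, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def map_phase(partition):
    """Emit (word, 1) pairs for one partition of text lines.

    In a real cluster this would run on the worker that stores the partition,
    so the raw data never has to leave that node.
    """
    return [(word.lower(), 1) for line in partition for word in line.split()]


def shuffle(mapped_partitions):
    """Group all emitted values by key across the mapped partitions."""
    groups = defaultdict(list)
    for pairs in mapped_partitions:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values collected for each key into one result per key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(partitions):
    """Run map, shuffle, and reduce in sequence (simulated in one process)."""
    mapped = [map_phase(partition) for partition in partitions]
    return reduce_phase(shuffle(mapped))


# Hypothetical sample data: two partitions, as if held by two worker nodes.
partitions = [
    ["the quick brown fox", "the lazy dog"],
    ["the dog barks"],
]
counts = word_count(partitions)
print(counts)
```

Because `map_phase` reads only its own partition, the map calls are independent of one another and could run in parallel on separate nodes; only the much smaller key/value pairs need to move during the shuffle.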