Wednesday, April 6, 2016

MapReduce. What is it?

MapReduce is composed of two main steps, Map and Reduce, with an intermediate Shuffle step between them:

Map step (input files)
The Map step goes through the unformatted input data and generates a series of key-value pairs.
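
As a rough illustration, a map function for a word count might look like the sketch below. The word-count task and the name map_words are assumptions made for this example; they do not come from any particular MapReduce framework.

```python
def map_words(line):
    """Emit a (word, 1) key-value pair for every word in one line of raw text."""
    return [(word.lower(), 1) for word in line.split()]


# Each word becomes a key; the value 1 means "seen once".
pairs = map_words("the cat saw the dog")
print(pairs)
```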

Shuffle, the intermediate step (processing the data from the input files)
All of the values for a given key are collated into separate piles (values that share a key go into the same folder).
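
A minimal sketch of this grouping, assuming the key-value pairs fit in memory as a list of tuples. The function name shuffle and the dictionary of "folders" are illustrative only; a real cluster would move the data between machines.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group values by key, giving one list (one "folder") per key."""
    folders = defaultdict(list)
    for key, value in pairs:
        folders[key].append(value)
    return dict(folders)


# Both values for "the" end up in the same folder.
folders = shuffle([("the", 1), ("cat", 1), ("the", 1)])
print(folders)
```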

Reduce step (output files)
A reducer node computes a tally for each key from all of the data values in that key's folder.
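
The tally can be sketched as follows, assuming each key's folder is a list of numbers and the tally is a simple sum. The name reduce_counts is hypothetical, and other reduce operations (maximum, average, and so on) are equally possible.

```python
def reduce_counts(folders):
    """Sum the values in each key's folder to get one tally per key."""
    return {key: sum(values) for key, values in folders.items()}


# Two values in the "the" folder give a tally of 2.
totals = reduce_counts({"the": [1, 1], "cat": [1]})
print(totals)
```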

