Initially, I planned to write at least one blog post per week; however, work pressure and, of course, laziness kept me away from blogging.
As Intel co-founder Gordon Moore observed, the number of transistors that can be placed on an integrated circuit increases exponentially, doubling approximately every two years. Ref: Moore’s Law
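As a rough worked example of what “doubling every two years” means (an idealized curve; real chips do not track it exactly), let N_0 be today’s transistor count and t the number of years from now:

```latex
N(t) \approx N_0 \cdot 2^{t/2}
\qquad\Longrightarrow\qquad
N(10) \approx N_0 \cdot 2^{5} = 32\,N_0
```

So ten years of Moore’s Law gives roughly 32 times as many transistors.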
In fact, this post’s title should really say Quad Core; we can even buy 8-core desktops, although they are not yet widely adopted. Do you think that just by running our application on a dual-core processor, it will run twice as fast as on a single-core processor? If your answer is “Yes”, your assumption is wrong.
We are afflicted by a variation of Parkinson’s Law: “Software expands to nullify the performance gain from the processor.” Ten years ago we had to save our files manually and frequently and spell-check our documents ourselves; now we don’t have to, because the software does both in the background, using up the extra power that faster processors and Hyper-Threading provide.
Most of us are accustomed to writing code that can use only one processor, no matter how many cores our desktop has. To be precise, we don’t know how to write software that uses all the cores effectively: writing multi-threaded programs is really hard, and writing multi-threaded programs that scale across multiple cores is even harder.
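To give a feel for why, here is a minimal Python sketch (the function `count_with_threads` and its parameters are made up for illustration) in which several threads update one shared counter. Even this trivial task needs a lock, because `counter += 1` is a read-modify-write that two threads can interleave, losing an update.

```python
import threading


def count_with_threads(num_threads: int, increments: int) -> int:
    """Let several threads bump one shared counter and return the final total."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            # `counter += 1` reads, adds and writes back in separate steps;
            # without the lock two threads could read the same old value
            # and one of the increments would be lost.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every thread has finished
    return counter


print(count_with_threads(4, 10_000))  # prints 40000
```

Notice the trade-off: the lock makes the result correct, but it also forces the threads to take turns, so adding cores does not make this loop any faster. Getting correctness and scalability at the same time is the hard part. (In CPython, the global interpreter lock also keeps pure-Python threads from executing bytecode in parallel.)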
MapReduce is a programming model implemented by Google to facilitate processing of huge data sets. The name MapReduce comes from the map and reduce functions of LISP, a functional programming language, and this paradigm differs from the style most of us use, Imperative Programming.
Users specify a Map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a Reduce function that merges all intermediate values associated with the same intermediate key. Oops, isn’t it looking weird? Hold on for a while; I will try to explain it in detail.
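For readers who like types, Google’s MapReduce paper summarizes the two user-supplied functions roughly like this (k stands for a key and v for a value; the subscripts mark that input keys/values and intermediate keys/values can come from different domains):

```latex
\begin{aligned}
\textit{map}    &: (k_1, v_1) \rightarrow \mathrm{list}(k_2, v_2) \\
\textit{reduce} &: (k_2, \mathrm{list}(v_2)) \rightarrow \mathrm{list}(v_2)
\end{aligned}
```

In other words, reduce sees one intermediate key together with every value emitted for it, and typically produces zero or one output value.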

- Input files are split and assigned to different workers (a worker can be a different computer or a different core in the same computer).
- The Map phase takes a series of key/value pairs, processes each one, and generates zero or more intermediate key/value pairs.
- The intermediate pairs are grouped by key, and the Reduce phase calls the application’s reduce function for each key; it iterates through the values associated with that key and outputs zero or more key/value pairs.
- The output from the Reduce calls is merged into one file to produce the results.
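The steps above can be simulated in a single process. The following Python sketch uses word counting as the job (a common illustration, not the employee example that follows); `map_words`, `reduce_counts`, `map_reduce` and `num_workers` are hypothetical names, and the “workers” run one after another here rather than on separate machines or cores. It also spells out the grouping (shuffle) step that happens between the two phases.

```python
from collections import defaultdict
from itertools import chain
from typing import Callable, Iterable, Iterator


def map_words(doc_name: str, text: str) -> Iterator[tuple[str, int]]:
    """Map: one input pair (document name, contents) -> zero or more (word, 1) pairs."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: list[int]) -> Iterator[tuple[str, int]]:
    """Reduce: one intermediate key plus all of its values -> output pairs."""
    yield word, sum(counts)


def map_reduce(inputs: Iterable[tuple[str, str]],
               mapper: Callable[[str, str], Iterable[tuple[str, int]]],
               reducer: Callable[[str, list[int]], Iterable[tuple[str, int]]],
               num_workers: int = 2) -> dict[str, int]:
    items = list(inputs)
    # 1. Split the input round-robin; each split would go to a different worker.
    splits = [items[i::num_workers] for i in range(num_workers)]
    # 2. Map phase: each worker maps every pair in its own split independently.
    intermediate = [pair
                    for split in splits
                    for key, value in split
                    for pair in mapper(key, value)]
    # 3. Shuffle: group all intermediate values by their key.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # 4. Reduce phase, then 5. merge every reducer's output into one result.
    return dict(chain.from_iterable(reducer(key, values)
                                    for key, values in sorted(groups.items())))


docs = [("a.txt", "the quick brown fox"),
        ("b.txt", "the lazy dog"),
        ("c.txt", "The end")]
print(map_reduce(docs, map_words, reduce_counts))
# prints {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

Because each split is mapped without looking at any other split, step 2 is the part that can be spread across cores or machines; only the grouping step needs to see all the intermediate pairs.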
I am sure this is still not very clear to everyone, so let me explain it with one simple example. Take an employee database: at some point we all get a salary hike.
For simplicity, let us assume that all employees are in the same grade and get the same hike. If you examine the problem statement carefully, do you see a pattern?
Yes: we perform the same task, increasing the salary, on every record no matter how many records the database holds, and no record depends on any other. So it doesn’t matter in what order we process them; we can run forward or backward. Now if we split the input into two halves and assign them to two CPUs/cores, the operation runs (ideally) twice as fast. Imagine solving this problem on a cluster of computers: that will be a whole lot faster than doing it on a single CPU. This is what MapReduce is all about. In fact, a lot of real-world problems can be expressed in this pattern.
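Here is a minimal Python sketch of the salary example, assuming each employee is a dict with a `"salary"` field and the hike is a flat percentage (`apply_hike`, `hike_chunk`, `hike_all` and the sample records are hypothetical). It is a map-only job: the input is split into halves, each half is processed by its own worker, and the results are simply concatenated.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_hike(record: dict, percent: float) -> dict:
    """Return a copy of one employee record with the salary raised by `percent`."""
    # Every record gets the same operation and depends on no other record,
    # so records can be processed in any order and on any worker.
    return {**record, "salary": round(record["salary"] * (1 + percent / 100), 2)}


def hike_chunk(chunk: list[dict], percent: float) -> list[dict]:
    """The work one worker does: hike every record in its chunk."""
    return [apply_hike(record, percent) for record in chunk]


def hike_all(records: list[dict], percent: float, workers: int = 2) -> list[dict]:
    """Split the records into `workers` contiguous chunks (two halves by default)
    and hike each chunk on its own worker thread. `workers` must be >= 1."""
    size = max(1, -(-len(records) // workers))  # ceiling division, at least 1
    chunks = [records[i:i + size] for i in range(0, len(records), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(hike_chunk, chunks, [percent] * len(chunks)))
    # Merge step: concatenate the processed chunks in their original order.
    return [record for chunk in results for record in chunk]


employees = [{"name": "Asha", "salary": 1000},
             {"name": "Bala", "salary": 2000},
             {"name": "Chitra", "salary": 3000}]
print(hike_all(employees, 10))
# prints [{'name': 'Asha', 'salary': 1100.0}, {'name': 'Bala', 'salary': 2200.0}, {'name': 'Chitra', 'salary': 3300.0}]
```

This sketch uses `ThreadPoolExecutor` to show the split, not the speedup: in CPython the global interpreter lock keeps threads from running this CPU-bound Python code in parallel. Swapping in `ProcessPoolExecutor` (with the script’s entry point guarded by `if __name__ == "__main__":`) or separate machines is what would actually bring the work closer to twice as fast.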
Google’s MapReduce is written in C & C++ and is used extensively across Google’s applications. A database of roughly 1.2 lakh (120,000) records, like the one in the example above, is trivial, whereas Google processes petabytes of information, and of course the whole system is fault tolerant. I would say that this is a key differentiator that helps Google maintain its number one position ahead of Microsoft and Yahoo.
For those who are interested in further reading, please direct your browser to Map Reduce. For J2EE folks, there is a Java implementation of the MapReduce programming model named Hadoop, which is being aggressively promoted by Yahoo; further information is available at
Hadoop Tutorial.
Author is Saravanan R.