Google Processing 20,000 Terabytes A Day, And Growing

A recent white paper by Google engineers puts numbers on the massive amount of computation Google does every day to index the Web, process search results, and serve ads, among other things. As of last September, Google was processing 20,000 terabytes (20 petabytes) of data a day. This large-scale computing capability is a big part of Google’s competitive advantage over Yahoo, Microsoft, and everyone else.
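As a quick sanity check on those figures, the arithmetic below converts the daily volume into petabytes and into an average per-second rate. It assumes decimal (SI) units and a load spread evenly over 24 hours. Neither assumption comes from the white paper, so treat the per-second number as a rough illustration only.

```python
# Decimal (SI) storage units, matching "20,000 terabytes (20 petabytes)".
TB = 10**12  # bytes in a terabyte
PB = 10**15  # bytes in a petabyte
GB = 10**9   # bytes in a gigabyte
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_in_petabytes(terabytes_per_day: float) -> float:
    """Convert a daily volume expressed in terabytes into petabytes."""
    return terabytes_per_day * TB / PB


def average_rate_gb_per_second(terabytes_per_day: float) -> float:
    """Average throughput in GB/s, assuming (hypothetically) an even load over the day."""
    return terabytes_per_day * TB / SECONDS_PER_DAY / GB


daily_tb = 20_000
print(daily_volume_in_petabytes(daily_tb))             # 20.0
print(round(average_rate_gb_per_second(daily_tb), 1))  # 231.5
```

In other words, 20 petabytes a day works out to roughly 230 gigabytes every second, around the clock.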

Niall Kennedy reports a breakdown of how Google’s large-scale computing has grown, and estimates that the hardware cost of each large-scale computing job (a MapReduce job) is about $1 million. The number of such jobs grew nearly an order of magnitude (10X) between 2004 and 2006, and then by another order of magnitude a year and a half later. See the chart below:

[Chart: growth in Google’s MapReduce jobs over time]
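For readers unfamiliar with the term, a MapReduce job splits work into a map step that emits key/value pairs and a reduce step that combines all values sharing the same key. The toy, single-process sketch below illustrates that model with the classic word-count example. The function names are hypothetical, and the sketch says nothing about how Google’s distributed implementation actually schedules work across its clusters.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Run a toy, in-memory MapReduce over the given records."""
    # Map phase: each record yields zero or more (key, value) pairs,
    # which are grouped by key (standing in for the "shuffle" step).
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce phase: combine all values collected for each key.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated word in the line."""
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word: str, counts: List[int]) -> int:
    """Sum the per-occurrence counts for a single word."""
    return sum(counts)


docs = ["the web is big", "the index is bigger"]
print(map_reduce(docs, word_count_mapper, word_count_reducer))
# {'the': 2, 'web': 1, 'is': 2, 'big': 1, 'index': 1, 'bigger': 1}
```

The appeal of the model is that the map and reduce functions hold no shared state, so a real system can run many copies of them in parallel on different machines.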