By Mark Ward
BBC News Online technology correspondent
If anyone knows how to get their webpage to top Google's search results it is Matt Cutts.
Google: Best known for its search page
Mr Cutts is one of a team at Google who help webmasters and website creators tweak their pages to ensure they are properly indexed by the search engine.
But ironically, says Mr Cutts, he does not have an extensive personal web presence that can take advantage of this insider knowledge.
All he has is a few pages dating from his college days that he says he does not regularly update.
Though, it must be said, they do appear top of any search for the name "Matt Cutts" on Google.
Search here
Mr Cutts says that Google works hard to ensure that most of the problems that webmasters encounter can be solved automatically via its help pages or using the tools it provides.
Given the huge number of webpages out there in cyberspace (Google indexes more than 4.2 billion), it is the only approach that will work.
"We have a philosophy of trying to develop things scalably," Mr Cutts told BBC News Online.
It can do this because of the huge technical resources that Google has built up since it started.
In 2003 Google spent $173m on its data centres and is expecting to spend about $250m in 2004.
It is not all work at Google
Although Google's senior technology folk have filed papers about how it does what it does, it has been reluctant to say just how many servers it owns and operates.
Estimates of how many machines it has in its data centres range from 10,000 to 80,000.
This concentration of computer power could be addressing more than 6,000 terabytes of data.
In contrast to most other net firms, Google does not rely on these machines being reliable, and all are based around cheap and easy-to-replace PC chips.
"The model of having a lot of machines and have them fail is a very powerful one," says Mr Cutts. "You have a small team replacing hard drives and it never affects the index."
Instead, he says, Google uses software to keep its search system reliable.
Google used to update its web index every month which, because it caused results to jump around a little, was dubbed the Google Dance.
But not anymore, says Mr Cutts.
"Within the last year we have improved our way of processing and indexing the web," he says. "You are not going to see Google dances."
"Now we crawl a percentage of the web every day," he says, "so after a relatively small time frame we hit every page."
Bombs away
Google does not just have one copy of the entire web; it has several, to help with reliability and ensure results are returned quickly.
Gmail is Google's webmail service
Also, says Mr Cutts, there are quite a few Googlers, as staff are called, who keep an eye on its web index and make sure it is accurate.
Even the software at the heart of the search engine is regularly tweaked to ensure that results are relevant.
"We work on algorithmic solutions to scalably handle problems," he says. "We look at ways not just solutions for particular incidents but entire classes of problems."
"You do not have to worry about the bias of the computer. It's a fair and equitable way to tackle it."
Attempts to catch out the indexing system and force pages to the top of the returned results, called Google bombs, only work on a very small scale, says Mr Cutts.
Even blogs, which tend to refer to each other a lot, do not trouble the indexing system.
"Blogs are not so much of a problem," says Mr Cutts. "They show up less often than you expect."
In some respects, running the search system is just a preparation for everything else Google wants to do.
"Once you have thousands of machines with all these capabilities it's a lot of fun to see what else you can do with them," he says.