Tuesday, June 17, 2008

On Google I/O, San Francisco

Back sometime around 1998 Mr. Brin and Mr. Page apparently told Mr. Bechtolsheim: "we would like to download the internet." Mr. Bechtolsheim laughed and signed the first check, for $100,000, to buy some servers for Google.

I remember that back then Sun was big, and anyone who could afford it would buy some Sun servers. But $100k didn't buy a lot of Suns in those days.

So I assume they got the idea of building Google on consumer motherboards and hard drives, with the first Google FS on top of Linux and MySQL. It had to be cheap, so Windows and Oracle were out of the question. This decision eventually led to three distinct approaches, contrary to what datacenter people and admins of mission-critical hardware would have advised in those days:

1. The software was free, no license fees attached.

2. Failure (of hardware) is acknowledged as something that happens regularly.

3. Redundancy is not bad after all (despite what you learned about normalization in college).

As we all know, automating the handling of failure without disrupting service led to the Google cloud, which today could be the largest (rumor has it Google handles about 200 petabytes of data today) and most robust database in the world.
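To make that principle concrete, here is a minimal sketch in Python of what "failure is routine, redundancy handles it" means in code. The server names, replica count and the fetch helper are all invented for illustration; in reality replica placement would come from a metadata service, not a hard-coded list.

    import random

    # Hypothetical replica list: the same block of data lives on several cheap machines.
    REPLICAS = ["chunkserver-a", "chunkserver-b", "chunkserver-c"]

    def read_block(block_id, fetch):
        # fetch(server, block_id) is an assumed helper that raises ConnectionError
        # when a machine is down; any single healthy copy is enough.
        for server in random.sample(REPLICAS, len(REPLICAS)):
            try:
                return fetch(server, block_id)
            except ConnectionError:
                continue  # hardware failure is expected, just try the next replica
        raise RuntimeError("all replicas failed for block %r" % block_id)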

On top of that they parallelized the stack with BigTable and MapReduce, making it also the fastest database on the web. According to Jeff Dean, around 1,000 servers are hit in parallel whenever they receive a query. One half of those servers looks up the links, the other half looks up the documents and assembles the result with text snippets based on that query; the first 10 hits are returned in a quarter of a second.
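To picture that fan-out, here is a rough scatter-gather sketch in Python. The shard objects, their lookup and snippet methods and the scoring are all made up for illustration; the point is only the pattern: scatter the query to every shard in parallel, merge the partial results, and build snippets just for the winners.

    from concurrent.futures import ThreadPoolExecutor
    import heapq

    def search(query, index_shards, doc_shards, top_k=10):
        with ThreadPoolExecutor(max_workers=64) as pool:
            # Each index shard scores its slice of the web and returns (score, doc_id) pairs.
            partials = list(pool.map(lambda shard: shard.lookup(query), index_shards))
            best = heapq.nlargest(top_k, (hit for part in partials for hit in part))
            # Document shards hold the page text; fetch snippets only for the top hits.
            snippets = list(pool.map(
                lambda hit: doc_shards[hit[1] % len(doc_shards)].snippet(hit[1], query),
                best))
        return list(zip(best, snippets))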

So what do you do if you have the largest, fastest and most robust database in the world? You apply the same principles that made you successful with hardware in the first place to software.

These are:

1. It's free (open source). Google has a marginal cost of zero for additional query execution and hard drive space.

2. Redundancy – there will be a lot of projects around the Google offerings (the APIs, App Engine and Google Apps) that basically do the same thing.

3. Failure – with a lot of those open source projects being abandoned after a while, only a few will make it, but you only need a few big ones like Google Earth or Gmail.

Any project based on Google open source makes Google stronger.

According to an analysis by Don Dodge of Microsoft and Bradley Horowitz of AltaVista (http://dondodge.typepad.com/the_next_big_thing/2008/06/social-networks-1-rule-or-the-community-pyramid.html), on a network-effect participation site only about one percent of the visitors actively contribute over a longer time. 10% chip in a little effort, like commenting, and the vast majority only consumes. That's all you need to have a globally successful web service. According to him, those numbers are consistent across Wikipedia, Facebook and others.

If you assume that every Google developer is a member of the 1% keeping the important projects alive, the open source community would be the 10%, with a spillover into the 1%. That is a leverage of 1 to 100 for Google, at zero cost, to fuel the ad engine:

16,000 developers at Google's core

160,000 developers working with it

1,600,000 consumers.
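In code, that back-of-the-envelope pyramid is nothing more than two multiplications; the 1:10:10 multipliers are the assumption, and the 16,000 is the only input.

    core = 16000                 # the ~1%: developers at Google keeping the key projects alive
    community = core * 10        # the ~10%: outside developers building on the offerings
    consumers = community * 10   # everyone else, who only consumes
    print(core, community, consumers, consumers // core)  # 16000 160000 1600000 100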

Google needs to fuel the ad engine, so anything that serves ads will do.

You have two options to extend your reach. The first is to build new channels for users to consume the offerings; that is what happens when you add mobile (Android), offline capabilities and translation.

They also extend their reach within each channel, with Earth, Apps, the APIs and all those Google Labs contenders. Now that Google has opened all the APIs for read/write access, the community will do the permutations of coupling each service with every other service. The growth is exponential.
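A quick illustration of why that coupling explodes; the number of services is invented here, only the combinatorics matter. With n open APIs there are n·(n−1)/2 two-service mashups, and 2^n − n − 1 combinations of two or more services, which really does grow exponentially.

    from math import comb

    services = 20                               # say 20 open read/write APIs (illustrative)
    pairs = comb(services, 2)                   # two-service mashups
    all_mashups = 2 ** services - services - 1  # every combination of two or more services
    print(pairs, all_mashups)                   # 190 1048555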

So if that bet wins, we can only imagine where Google is headed. If they can keep their logistics up and keep HDD latency competitive with the upcoming flash drives, I predict that the first functional AI on earth will rely on Google. So Mr. Kurzweil scores again.
