2009-09-16

Google and reCAPTCHA: expanding Google's Cloud Brain


You've heard of Google Translate, right?

You know how it works? It learns:
"Our system takes a different approach: we feed the computer billions of words of text, both monolingual text in the target language, and aligned text consisting of examples of human translations between the languages. We then apply statistical learning techniques to build a translation model. We've achieved very good results in research evaluations."
Heard of GOOG-411? Why would Google get into the 411 business? So it can learn. Read:
"So we need a lot of people talking, saying things so that we can ultimately train off of that. ... So 1-800-GOOG-411 is about that: Getting a bunch of different speech samples so that when you call up or we're trying to get the voice out of video, we can do it with high accuracy."
So Google bought reCAPTCHA. Why? I'm betting it's for the same reason.

It's not about finding one transcription problem with one blurry word and fixing it. It's about learning how to read any and all blurry words, so that humans need to intervene less and less.

This is pure brilliance on Google's part: humans providing input--doing valuable work--and getting a valuable service.

And Mountain View silently reaps massive amounts of intelligence.

Google is throwing a lot of code over the fence--Java libraries, Chrome, Android--mocking the business model of traditional software companies.

And they don't care. Because in the end, they have a world-class, next-to-impossible-to-replicate cloud brain.

And it's getting smarter with every search, every new web page, every GOOG-411 call, and now, every new "You're a human, right?" verification.

Cool...and a bit scary.