3 Good Reasons Why Google Has Lost It On The Ajax Thing


by Jaimie Sirovich & Yehuda Katz

Last month, Google proposed a rather technical solution to tackle the well-known AJAX crawlability problem. In short, we think Google has lost its mind. Their solution is very complex, and this is borne out not only by reading it as a developer but also by examining readers' responses.

First, we will look at the comments left by readers on the Webmaster Central blog. Comments are classified (anonymously) into three categories: Coherent, Confused, and Really Confused. The statistics are somewhat subjective, but the result does not bode well for the idea at all.

AJAX Proposal Confusion

Coherent — 10%

Confused — 50%

Really Confused — 40%

Even Google classified the blog entry itself as “Webmaster level: Advanced.” And it’s all downhill from there. There are three fundamental problems with the proposal:

1. It requires *you* to install something (a “headless” browser) on your server.

This is a lot of computation thrown at you in one short line of a spec. Shared web hosting at $4.95/month simply can't sustain it.

Their suggestion to use an application called HtmlUnit implies that Java must be running in some form. That is not reasonable. We're not aware of a browser emulator written in a language that a typical shared host can readily install. Ruby developers have tried, and the result is non-trivial at best. We won't comment on the low-end hosting ecosystem for PHP and ASP.NET except to say that it's frequently even more oversold and under-supported.

This state of ridiculousness is compounded by the fact that Google could run the headless browser itself far more easily; it has several football fields full of computers with which to reduce the problem to a science.

2. Any document that refers to “escaping” is probably going to confuse everyone.

Most programmers don’t even escape things properly. Can we expect otherwise intelligent non-developers to get it right? Many, or even most, security problems stem from sloppy or mistaken approaches to escaping things. Oh, Google, the optimism!

Even if the headless browser abstracts ‘_escaped_fragment_’ away to some degree, and debugging problems merely go from frustrating to impossible, it is entirely unclear to us how one can intercept the ‘_escaped_fragment_’ requests without a layered, more complex application design. Let’s face it — most of the internet is thrown together and poorly implemented. It works anyway. This won’t work.
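
To make that interception point concrete, here is a minimal sketch of our own (a bare Node.js handler; the port, markup, and script names are invented for illustration and are not part of Google's spec) showing what a server has to do: the crawler rewrites a URL like http://example.com/#!page=2 into http://example.com/?_escaped_fragment_=page=2, and the server must notice that parameter and return a static HTML snapshot instead of the usual AJAX shell.

    // Hypothetical sketch: detect the _escaped_fragment_ parameter and serve a
    // pre-rendered snapshot. Only the parameter name comes from Google's
    // proposal; everything else here is invented for illustration.
    var http = require('http');
    var url = require('url');

    http.createServer(function (req, res) {
      var query = url.parse(req.url, true).query;
      var fragment = query._escaped_fragment_;

      res.writeHead(200, { 'Content-Type': 'text/html' });
      if (fragment !== undefined) {
        // Crawler request: http://example.com/#!page=2 arrives here as
        // /?_escaped_fragment_=page=2 and needs full, static HTML.
        res.end('<html><body>Snapshot for state: ' + fragment + '</body></html>');
      } else {
        // Normal browser request: serve the empty AJAX shell and let
        // client-side script build the page from the #! fragment.
        res.end('<html><body><div id="app"></div>' +
                '<script src="/app.js"></script></body></html>');
      }
    }).listen(8080);

Even this toy version needs a second, crawler-only rendering path; in a real application that path is either a headless browser or a duplicate set of server-side templates, which is exactly the extra layer we are objecting to.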

In a nutshell, this complicates design and frustrates non-advanced developers.

3. Even with this implementation, non-trivial modifications are necessary.

So implementing and handling things like _escaped_fragment_ may not be easy. Furthermore, what does _escaped_fragment_ even mean to a non-developer?

It is possible to create a Rails plugin to accomplish what Google is proposing, but it strikes us as a waste. Since most PHP applications are based on disparate or non-existent frameworks, the adaptation would be even more difficult for PHP developers. We cannot comment on ASP.NET, but the likelihood is that it will be non-trivial as well.

OK. That’s enough. The core issue here is that Google is asking the wider world of internet developers to solve a problem for Google. Google could choose to spider the internet and treat “#!” as a flag to crawl such links with its own headless browser, but telling us to do it is unreasonable and non-viable.

We’re firm believers that technology should solve problems in the human space, not create them. If the application is designed well, and unobtrusive JavaScript is applied with a bit of elbow grease, all of this can be avoided — the headless browser is beheaded, and the URLs are not subject to mutilation.

This all seems to take a hard problem out of the hands of really great developers with lots of machines and put it into the hands of a huge number of lower-quality developers with weak machines. It does not scale well.

Lastly, most of this is already possible with some forethought and good design. JavaScript with jQuery (or similar) allows the programmer to “Ajaxify” a non-Ajax application in most cases. Thinking about URLs ahead of time as a developer is not a bad idea, either. In any case, if a programmer can understand what a headless browser is doing, what escaping is, and so on, he can do it right now, with the same talent, in this non-proprietary way. There are plenty of opportunities to create poor designs and traps for spiders with either pattern regardless.
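
As a rough illustration of that unobtrusive approach (the a.ajaxify selector and #content container below are our own invented names, not anything standard), the links are plain, crawlable URLs, and jQuery merely upgrades them for visitors who have JavaScript:

    // Sketch of progressive enhancement with jQuery: every link works as a
    // normal page load for spiders and script-less users; JavaScript, when
    // present, intercepts the click and fetches the same URL via AJAX.
    $(document).ready(function () {
      $('a.ajaxify').click(function (event) {
        event.preventDefault();                      // keep JS users on the page
        var href = $(this).attr('href');             // the same URL a spider follows
        $('#content').load(href + ' #content > *');  // swap in just the content region
      });
    });

No #! URLs, no snapshots, no headless browser; the spider and the browser follow the same addresses.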

If the application cannot be designed this way, then at a certain point it becomes a “real” application and probably should not be searchable anyway.

This is a very bad idea, Google. As technocrats, you must think these things all make sense, but you must keep the little people in mind. The internet is filled with the world’s technology proletariat. Implementing this will probably have long-lasting effects much like the “SEO” of 2001: those who get it right (and they will be few) will be visible, and the others will not be. As tech-savvy marketers, we may actually like that prospect, but it’s not in the interest of Google, or of the internet as a whole.

———————————————

Yehuda Katz is a member of the Ruby on Rails core team and lead developer of the Merb project. He is a member of the jQuery core team and a core contributor to DataMapper. He contributes to many open source projects, like Rubinius and Johnson, and works on some he created himself, like Thor. He is the author of jQuery in Action, a Ruby core developer at Engine Yard, and can be found at http://yehudakatz.com.

Jaimie Sirovich is a search marketing consultant. Officially Jaimie is a computer programmer, but he claims to enjoy marketing much more. At present, Jaimie is focused on helping clients sell everywhere, and achieve multi-channel integration with major websites such as eBay, Amazon, and even Craigslist. He is the author of Search Engine Optimization with PHP and can be found at http://www.seoegghead.com.


4 Comments

  1. Another and better alternative to Google's approach: ItsNat. With ItsNat you develop a Single Page Interface (AJAX-intensive) application and, almost automatically, the same application is page-based when JavaScript is disabled or ignored (which is how search engine crawlers see your site). Take a look: The Single Page Interface Manifesto (http://itsnat.sourceforge.net/php/spim/spi_manifesto_en.php), a Single Page Interface web site with ItsNat (http://itsnat.sourceforge.net/index.php?_page=support.tutorial.spi_site), and an SPI web site online demo (http://www.innowhere.com:8080/spitut/).

  2. I think you have a bunch of misconceptions. Google does not require you to have a headless browser; that is just a suggestion if you don't want to write HTML for each request. The way Google does it is actually really simple, not confusing, and really the only way it can be done, since browsers and spiders don't send the text after # to servers. You just get the text after #! turned into an _escaped_fragment_ GET var... You parse it and add meta if you need to. I think Facebook might do that too... or they might be just plainly putting it in as a GET var key...

  3. It's a lot of redundant work to do so, however. Anything involving 'escaping' tends to cause problems for low-level programmers who don't know what a grammar is. There is no misconception on that. I predict Google AJAX will go the way of the dodo pretty soon, and if you do a quick Google search you'll see I'm not the only one who thinks so. I just said it a year or two ago.

  4. Justin

    A low-level programmer is just an advanced webmaster waiting to happen.