Yesterday CNET reported that Dell's future laptop specifications were publicly accessible through Google. Dell learned the hard way, but you don't have to: there is a very easy way to keep the Googlebot from crawling and indexing parts of your website. We've all heard of the robots.txt file, but do you actually use it? If your server holds pages you don't want search engines to index, a robots.txt file with instructions for spiders is the standard way to keep well-behaved crawlers out. Here is what you do:
1. Create a file called robots.txt.
2. There are just two directives you need to know: “User-agent” and “Disallow”. The “User-agent” line specifies which spiders the rule applies to, and the “Disallow” line tells those spiders which directories not to crawl. Here is what the content of your robots.txt file looks like if you block all spiders from the entire site:
User-agent: *
Disallow: /
And here is how it looks if you block just the Googlebot from crawling your images folder:
User-agent: Googlebot
Disallow: /images/
You can also list several directories under a single “User-agent” line, for example to keep all spiders out of both your forum and your images:
User-agent: *
Disallow: /forum/
Disallow: /images/
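If you want to sanity-check your rules before uploading the file, Python's standard urllib.robotparser module can evaluate them for you. This is just a quick sketch: the rules string mirrors the last example above, and example.com is a placeholder domain.

```python
# Sanity-check robots.txt rules with Python's standard library.
from urllib.robotparser import RobotFileParser

# The same rules as the example above (example.com is a placeholder).
rules = """\
User-agent: *
Disallow: /forum/
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Disallowed directories are reported as not fetchable...
print(parser.can_fetch("Googlebot", "https://example.com/images/logo.png"))  # False
# ...while everything else remains crawlable.
print(parser.can_fetch("Googlebot", "https://example.com/index.html"))  # True
```

This is the same parser well-behaved Python crawlers use, so if can_fetch returns False for a URL, a polite spider reading your file should skip it.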
3. Save the file and upload it to your web server’s root directory, so it is reachable at yoursite.com/robots.txt.
That’s it. One caveat: robots.txt is only a request. Reputable spiders like the Googlebot will honor it, but it doesn’t actually secure anything, and it publicly lists the very paths you’d rather hide. For truly sensitive data, use real access controls on the server; for everything else, this will keep your pages out of the search indexes.