Googlebot questions answered
Google’s webmaster blog posted the following the other day about how Googlebot crawls websites and how it uses robots.txt files:
If my site is down for maintenance, how can I tell Googlebot to come back later rather than to index the “down for maintenance” page?
You should configure your server to return a status of 503 (Service Unavailable) rather than 200 (OK). That lets Googlebot know to try the pages again later.
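As a rough illustration of the 503 advice, here is a minimal sketch of a stand-in maintenance server built on Python's standard `http.server` module. The names `MaintenanceHandler`, `make_server` and `RETRY_AFTER_SECONDS` are made up for this example, and the `Retry-After` header is an optional extra from the HTTP standard that the answer above does not mention. A real site would more likely set this up in its web server configuration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption for this sketch: suggest that clients retry in one hour.
RETRY_AFTER_SECONDS = 3600

BODY = b"Down for maintenance. Please try again later.\n"


class MaintenanceHandler(BaseHTTPRequestHandler):
    """Answers every GET or HEAD request with 503 instead of 200."""

    def _send_maintenance_headers(self):
        self.send_response(503)  # 503 Service Unavailable
        self.send_header("Retry-After", str(RETRY_AFTER_SECONDS))
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_maintenance_headers()
        self.wfile.write(BODY)

    def do_HEAD(self):
        # HEAD gets the same status and headers, but no body.
        self._send_maintenance_headers()


def make_server(host="127.0.0.1", port=8080):
    """Builds (but does not start) the maintenance server."""
    return HTTPServer((host, port), MaintenanceHandler)
```

Calling `make_server().serve_forever()` would start it on port 8080. Any crawler that requests a page then sees the 503 status rather than a "successful" maintenance page.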
What should I do if Googlebot is crawling my site too much?
You can contact us — we’ll work with you to make sure we don’t overwhelm your server’s bandwidth. We’re experimenting with a feature in our webmaster tools for you to provide input on your crawl rate, and have gotten great feedback so far, so we hope to offer it to everyone soon.
Is it better to use the meta robots tag or a robots.txt file?
Googlebot obeys either, but meta tags apply to single pages only. If you have a number of pages you want to exclude from crawling, you can structure your site in such a way that you can easily use a robots.txt file to block those pages (for instance, put the pages into a single directory).
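To show what "meta tags apply to single pages only" means in practice, here is a small sketch that reads the robots meta tag out of one page's HTML using Python's standard `html.parser`. The helper names and the sample page are invented for this example, and it is not how Googlebot itself parses pages. The point is only that the instruction lives inside each individual page, whereas a robots.txt rule can cover a whole directory.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> (or "googlebot") tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attr.get("content") or ""
            for directive in content.split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def robots_meta_directives(html):
    """Returns the set of robots meta directives found in one HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page used only for illustration.
SAMPLE_PAGE = """
<html>
  <head>
    <title>Private page</title>
    <meta name="robots" content="noindex, nofollow">
  </head>
  <body>Nothing to see here.</body>
</html>
"""

print(sorted(robots_meta_directives(SAMPLE_PAGE)))
```

Each page you want to exclude needs its own tag like this, which is why grouping such pages in one directory and blocking that directory in robots.txt is less work.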
If my robots.txt file contains a directive for all bots as well as a specific directive for Googlebot, how does Googlebot interpret the line addressed to all bots?
If your robots.txt file contains a generic or weak directive plus a directive specifically for Googlebot, Googlebot obeys the lines specifically directed at it.
For instance, for this robots.txt file:
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/
Googlebot will crawl everything in the site other than pages in the cgi-bin directory.
For this robots.txt file:
User-agent: *
Disallow: /
Googlebot won’t crawl any pages of the site.
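The two robots.txt files above can be tried out with Python's standard `urllib.robotparser` module. This is only a sketch: Python's parser is not Googlebot's own implementation, though it gives the same answers for these two files. The helper `allowed`, the bot name "OtherBot" and the sample paths are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# First example: a rule for all bots plus a specific rule for Googlebot.
ROBOTS_MIXED = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/
"""

# Second example: only the rule for all bots.
ROBOTS_BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt, user_agent, path):
    """Returns True if user_agent may fetch path under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Googlebot follows its own section: everything except /cgi-bin/ is open.
print(allowed(ROBOTS_MIXED, "Googlebot", "/index.html"))
print(allowed(ROBOTS_MIXED, "Googlebot", "/cgi-bin/search"))

# Any other bot falls back to the "*" section and is blocked everywhere.
print(allowed(ROBOTS_MIXED, "OtherBot", "/index.html"))

# With no Googlebot section, the "*" section applies to Googlebot too.
print(allowed(ROBOTS_BLOCK_ALL, "Googlebot", "/index.html"))
```

A quick check like this before uploading a new robots.txt can catch the case where a catch-all rule blocks more than you meant it to.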