AbanteCart Community

AbanteCart Development => Customization help => Topic started by: junkyard on October 31, 2013, 05:10:42 PM

Title: Googlebot controversy
Post by: junkyard on October 31, 2013, 05:10:42 PM
Googlebot was detected generating a rather high, excessive load on the site.
It appears to crawl several thousand image files and hundreds of the cart's PHP pages many times a day:

http://www.google.com/bot.html
"For most sites, Googlebot shouldn't access your site more than once every few seconds on average. However, due to
network delays, it's possible that the rate will appear to be slightly higher over short periods.In general, Googlebot
should download only one copy of each page at a time. If you see that Googlebot is downloading a page multiple times,
it's probably because the crawler was stopped and restarted. "

We think we have to block it from crawling certain files/pages using the robots.txt approach:
https://support.google.com/webmasters/answer/93708
https://support.google.com/webmasters/answer/156449

Is there any recommendation on which of the cart's directories could be blocked from crawling without
harming the ability of products to be found on Google?   Thank you
Title: Re: Googlebot controversy
Post by: abantecart on November 01, 2013, 07:36:40 PM
You do not need to expose much of the cart for search engines to crawl.

Only index.php in the main web directory, plus the image and resources directories, should be open to search engines.
In some cases you may also want to open the extensions directory, if an extension provides web resources. A robots.txt sketch along these lines follows.
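
As a rough illustration only, a robots.txt like the one below blocks the cart's internal directories while leaving the storefront pages, product images and resources crawlable. The directory names (admin, core, system, download, extensions) are assumptions based on a typical AbanteCart install in the web root; compare them against your own web root and adjust before using. The /extensions/*/image/ line is just an example of how to re-open part of a blocked directory if an extension serves public images.

# Example robots.txt for an AbanteCart store installed in the web root.
# Directory names are assumptions; check your own install and adjust.
User-agent: *
# Internal code and data directories that never need to be crawled
Disallow: /admin/
Disallow: /core/
Disallow: /system/
Disallow: /download/
# Block extensions by default, but allow their public images (example pattern)
Disallow: /extensions/
Allow: /extensions/*/image/
# /index.php, /image/ and /resources/ are not disallowed, so they stay crawlable.

Note that Allow rules and wildcards are honored by Googlebot but not by every crawler, and that a blanket "Disallow: /" would also block SEO-friendly product URLs, so only the internal directories should be disallowed.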