
Author Topic: Googlebot controversy  (Read 5168 times)

Offline junkyard

Googlebot controversy
« on: October 31, 2013, 05:10:42 PM »
Googlebot was detected creating a rather high, excessive load on the site.
It appears to crawl several thousand image files and hundreds of the cart's PHP pages many times a day:

http://www.google.com/bot.html
"For most sites, Googlebot shouldn't access your site more than once every few seconds on average. However, due to
network delays, it's possible that the rate will appear to be slightly higher over short periods.In general, Googlebot
should download only one copy of each page at a time. If you see that Googlebot is downloading a page multiple times,
it's probably because the crawler was stopped and restarted. "

We think we have to block it from crawling certain files/pages using the robots.txt approach:
https://support.google.com/webmasters/answer/93708
https://support.google.com/webmasters/answer/156449

Is there any recommendation as to which of the cart's directories can be blocked from crawling without
harming the ability of products to be found on the web via Google?   Thank you

Offline abantecart

Re: Googlebot controversy
« Reply #1 on: November 01, 2013, 07:36:40 PM »
You do not need to expose much of the site for search engines to crawl.

Only index.php in the main web directory, plus the image and resources directories, should be open to search engines.
In some cases you might also want to open the extensions directory if an extension serves web resources.
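For reference, a minimal allowlist-style robots.txt following this advice might look like the sketch below. The paths are assumptions based on a default AbanteCart layout installed in the web root, so verify them against your own installation before deploying:

  # Hypothetical sketch only -- directory names assume a default AbanteCart root install
  User-agent: *
  Disallow: /
  Allow: /$
  Allow: /index.php
  Allow: /image/
  Allow: /resources/
  # Uncomment if an extension serves public web resources:
  # Allow: /extensions/

Google resolves conflicting rules by the most specific (longest) matching path, so the Allow lines take precedence over the blanket Disallow for those locations; some other crawlers ignore Allow entirely, so test the file with Google's robots.txt tester before relying on it.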

 
