www.inspired-designco.com
My site is pretty simple, 4 "levels".
Level 1 = Home page = Category List
Level 2 = Product Listing
Level 3 = Product Page
Level 4 = Cart/Checkout/etc
On Level1 Home page, you choose a Category which takes you to Level2 Product Listing where you choose a Product which takes you to Level3 Product Page.
So the url for any product page should be sitename/listname/product, or "level1/level2/level3".
There are no Product links on the Level1 Home Category page. You cannot jump from Level1 to Level3.
So there should be no such thing as a level1/level3 url. My AC site code should never generate a level1/level3 url.
I have just learned that googlebot indexed all of my product pages as level1/level3 urls.
How did I learn this?
I asked google support why the googlebot was only indexing thumbnail images but no fullsize product images from my site.
The support team explored my site and told me that the first problem they found was that none of my product pages were being indexed because all of my product pages were duplicates of pages that the bot already indexed.
Let's use aprons for example.
Go to home page. This will be the level1 url:
https://www.inspired-designco.com/index.php?rt=index/home
Choose the Apron category. This will take you to the proper level1/level2 url:
https://www.inspired-designco.com/better-bib-linen-chefs-apron
Then choose Navy-colored apron. This will take you to the proper level1/level2/level3 url:
https://www.inspired-designco.com/better-bib-linen-chefs-apron/better-bib-linen-chefs-apron-navy
Now, go to this link:
https://www.inspired-designco.com/better-bib-linen-chefs-apron-navy
You cannot access this url from any link or page on my site.
The only way to access this url is to manually enter it into the address bar, or click the convenient hot link I provided above.
This is a level1/level3 url.
It should not exist.
Google tells me that it indexed all of my product pages as this type of level1/level3 url.
Google tells me that the reason I cannot find any of my expected level1/level2/level3 Product Page urls indexed on Google is because the bot considers those L1/L2/L3 urls to be duplicates of the L1/L3 urls that it already indexed, and that this "canonical problem may be contributing to the absence of indexed fullsize images which only appear on your Level3 pages".
They are correct about the fullsize image location: the only place to see fullsize product images is on a product page.
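Since Google describes this as a canonical problem, one thing worth checking is which url each version of a product page declares in its rel="canonical" link tag. Below is a minimal Python sketch of that check, assuming the page html has already been saved or downloaded. The find_canonical helper and the sample html are made up for illustration; what an AC product page actually emits is not confirmed here.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical url declared in the html, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page source, NOT copied from the real site.
sample_html = """
<html><head>
<link rel="canonical" href="https://www.inspired-designco.com/better-bib-linen-chefs-apron-navy" />
</head><body></body></html>
"""
print(find_canonical(sample_html))
```

If the level1/level2/level3 page and the level1/level3 page both declare the short level1/level3 url as canonical, that would be consistent with what Google support reported, but that is a guess until the real page source is checked.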
I checked my xml sitemap generated by
https://xmlsitemapgenerator.org/sitemap-generator.aspx.
To my surprise, all of the product page urls in my sitemap are of the incorrect type of level1/level3 urls.
So I used a different sitemap generator to create another sitemap:
www.xml-sitemaps.com
This generator output three types of Product Page urls! They are:
Correct type: level1/level2/level3 url.
Incorrect type: level1/level3 url.
Strange new incorrect type: level1/new-weird-level2/level3 url.
The strange new url is
https://www.inspired-designco.com/id-inspired-design-co/proper-stuff-pillow-herringbone
I have no idea how the sitemap crawler came up with the "id-inspired-design-co" part (the red part). But the link actually works!
Even stranger, Google has actually indexed that page!!
Enter this into your google search bar: "site:https://www.inspired-designco.com/id-inspired-design-co/proper-stuff-pillow-herringbone"
You see? That url has actually been indexed by Google!
There is no way for the site to create that url, and yet robots find it and index it instead of the desired and predictable L1/L2/L3 url that should be created during normal navigation around the site.
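To sort a list of crawled urls into the types described above, a rough Python sketch like the following could help. It assumes the category slugs are typed in by hand from the home page (only the apron category is listed here), and the classify function and its labels are invented for this example. A one-segment url that is not a category could also be an ordinary content page, so the labels are only a first pass.

```python
from urllib.parse import urlparse

# Assumption: category slugs copied by hand from the Level1 home page.
CATEGORY_SLUGS = {"better-bib-linen-chefs-apron"}


def classify(url, categories=CATEGORY_SLUGS):
    """Label a url by how its path relates to the known category slugs."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments or segments == ["index.php"]:
        return "home"
    if len(segments) == 1:
        # A lone slug is either a category listing or a product with no category.
        return "listing" if segments[0] in categories else "product-without-category"
    if len(segments) == 2:
        # Two slugs: the first should be a known category.
        return "product" if segments[0] in categories else "product-under-unknown-prefix"
    return "other"


urls = [
    "https://www.inspired-designco.com/better-bib-linen-chefs-apron/better-bib-linen-chefs-apron-navy",
    "https://www.inspired-designco.com/better-bib-linen-chefs-apron-navy",
    "https://www.inspired-designco.com/id-inspired-design-co/proper-stuff-pillow-herringbone",
]
for url in urls:
    print(classify(url), url)
```

With only the apron category listed, the three example urls come out as "product", "product-without-category" and "product-under-unknown-prefix", matching the correct, incorrect and strange types.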
Solutions:
I think it's possible to manually clean up a sitemap so that the map only features the desired L1/L2/L3 type of url for Product Pages, then submit the map, ask google to recrawl the site, and hope that the duplicate content / canonical / weird urls problem vanishes.
The clean-up will be a labor-intensive and time-consuming process, even for my small site. Impossible for a large site.
And there is no guarantee that it will work.
A sitemap crawler identified and defined strange urls that should not exist.
Even if I clean up the xml sitemap by hand, there is no guarantee that the google bot won't find and index the same strange urls.
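The hand clean-up could probably be scripted. Here is a minimal Python sketch, assuming a standard sitemaps.org xml file and a hand-made set of category slugs. The clean_sitemap and keep names are invented for this example. It keeps only the site root, category listings and category/product urls, so anything else (including an index.php home url) is dropped and would need to be added back if wanted. It does nothing about what the bot discovers on its own.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", NS)  # write the output without an ns0: prefix


def keep(url, categories):
    """True for the site root, a category listing, or a category/product url."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments:
        return True
    return segments[0] in categories and len(segments) <= 2


def clean_sitemap(xml_text, categories):
    """Return the sitemap xml with every unwanted <url> entry removed."""
    root = ET.fromstring(xml_text)
    for entry in list(root.findall(f"{{{NS}}}url")):
        loc = entry.find(f"{{{NS}}}loc")
        if loc is None or not keep((loc.text or "").strip(), categories):
            root.remove(entry)
    return ET.tostring(root, encoding="unicode")
```

Usage would be something like reading the generated sitemap file into a string, calling clean_sitemap(text, {"better-bib-linen-chefs-apron", ...}) with the full list of category slugs, and writing the result back out before submitting it.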
So that leads to my questions.
Why is a bot crawling my basic AC site and coming up with urls that would otherwise never be created?
There is no way for a site visitor doing normal navigation to create a level1/level3 url. You have to go through level2 to see a level3 url.
Likewise, there is no way for a site visitor doing normal navigation to create a level1/new-weird-level2/level3 url like the id-inspired-design-co one.
Did I do something wrong when I built this AC site?
Is there some button or feature I need to set to avoid having all these different and undesired pathways to a Product Page?
Has anyone else encountered strange urls discovered by spiders?
What can I do to the back or front end so that bots do not discover strange, undesired urls?
Thx.