In the last few days I’ve encountered a surprising number of clients and even SEOs who don’t fully understand XML sitemaps, so I’m here to clear up some things.
Let’s say you half read a blog post somewhere that said “if your site doesn’t have an XML sitemap, your site will never be indexed and you will be poor, miserable and die lonely”. So, you got your developer or SEO to make an XML sitemap for your website, or maybe you did it yourself with a free tool (because you’re cheap). All giddy and excited, you submit your sitemap through Google Webmaster Tools and wait for the magical day for Google to crawl it. Like Xmas morning, you creep down the stairs, log into GWT and start to cry because you see a report that looks like this:
“Only 262 pages indexed!” you scream. “Why does Googlez hate me? Imma fire my SEO and kick a baby!”
In a fevered response, you (or your SEO) go line by line through your sitemap.xml file to make sure there are no broken links or malformed URLs (good for you!), but you can’t find anything. So instead, you resign yourself to being poor, miserable and dying lonely.
Well.. here’s something you may not have considered..
All URLs in a sitemap.xml file must return a 200 OK response
I find myself constantly amused by the number of XML sitemaps I come across that contain URLs that either 404 or redirect with a 301 or 302. What’s even more amusing is when I find URLs that have been disallowed via robots.txt.
So, to help you all understand why the URLs in your XML sitemap may not be indexing fully, I’ve made some easy-to-follow pictures! Why? Because I know how much you hate reading.
URLs in XML Sitemap returning 404 Not Found responses
URLs in XML Sitemap returning 301 or 302 redirect responses
URLs in XML Sitemap disallowed via Robots.txt
URLs in XML Sitemap returning 200 OK responses
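If you'd rather not go line by line through the file yourself, the four cases above can be checked with a few lines of Python. This is only a rough sketch, not an official tool: the function names (`sitemap_urls`, `fetch_status`, `audit`) are made up for this example, it assumes a plain `urlset` sitemap that uses the standard sitemaps.org namespace, it assumes your server answers HEAD requests the same way it answers GET, and it uses "Googlebot" as the user agent for the robots.txt check.

```python
import urllib.error
import urllib.request
import urllib.robotparser
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text):
    """Return every <loc> value found in a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from silently following 301/302 responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


def fetch_status(url):
    """Return the raw HTTP status code for a URL (assumes HEAD is supported)."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 301, 302, 404 and friends end up here


def audit(xml_text, robots_txt, get_status=fetch_status, user_agent="Googlebot"):
    """Label each sitemap URL as ok, blocked by robots.txt, or non-200."""
    robots = urllib.robotparser.RobotFileParser()
    robots.parse(robots_txt.splitlines())
    report = {}
    for url in sitemap_urls(xml_text):
        if not robots.can_fetch(user_agent, url):
            report[url] = "disallowed by robots.txt"
            continue
        status = get_status(url)
        report[url] = "ok" if status == 200 else f"returns {status}"
    return report
```

To use it, read your sitemap.xml and robots.txt into strings and call `audit(sitemap_text, robots_text)`. Anything in the result that isn't "ok" is a URL that probably shouldn't be in your sitemap in the first place.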