Google Webmaster Tools: Huge discrepancy between submitted and indexed pages
A sitemap is a file that contains a list of the pages found on a website. Webmasters submit their sitemap to Google and other search engines to inform them about the organization of their website's content. While the main purpose of submitting a sitemap is to help search engine crawlers like Googlebot crawl a website more intelligently, doing so also gives web pages a higher chance of getting indexed. However, submitting a sitemap through Google Webmaster Tools does not guarantee that all the URLs in it will get indexed.
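For reference, a bare-bones XML sitemap looks roughly like the sketch below; the URLs and dates are placeholders, not taken from any real site:

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Minimal sitemap: one <url> entry per page you want crawled -->
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://example.com/first-post.html</loc>
        <lastmod>2017-05-01</lastmod>
      </url>
      <url>
        <loc>https://example.com/second-post.html</loc>
        <lastmod>2017-05-08</lastmod>
      </url>
    </urlset>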
Days after a sitemap is submitted, Google will begin to show the number of pages it has managed to index so far. But what if there is a huge difference between the number of submitted and indexed pages? It is important to know that indexed pages are not permanent; their number may change over time. Depending on how a website and its content are managed, the number of indexed pages may gradually increase or steadily drop.
Here are some possible reasons why Google is not indexing your posts
These are common issues affecting indexation that are often missed when troubleshooting. In most cases, addressing the issues listed in this article will fix the indexation problem reported in Google Webmaster Tools.
1. There's an issue with duplicate content
Duplication happens when web crawlers find an exact copy of a particular piece of content. Duplicate content may give Google the impression that a page is spam, poorly optimized, plagiarized, or simply low quality. As a result, it will not get indexed.

A.) The source of the problem may be the website theme. Some websites run into content duplication because of pagination, labels, and other website elements that are not properly configured. When web crawlers see the same content under more than one URL, they may refuse to index it. Be careful when tweaking your website's code and make sure it has no errors.
For Blogspot users on the Emporio theme, avoid implementing numbered pagination. In my experience, it caused content duplication and a huge drop in my indexed pages on Google Webmaster Tools. I did a lot of troubleshooting and asked for help on the Google forum, but it took me a long time to find out that this was the reason my indexed pages were decreasing. Just keep the "More posts" button for page navigation.
B.) Web crawlers found duplicates on external sources. This usually happens because you or another webmaster has posted copied content. While extremely rare, it can also be a matter of coincidence. Remember, Google puts as strong an emphasis on originality as it does on quality.
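One common way to handle duplicate URLs created by pagination or labels, mentioned here only as an illustration and not something the steps above require, is a canonical link tag that tells crawlers which URL is the preferred version of the page. The address below is a placeholder:

    <!-- Placed inside the <head> of the duplicate page -->
    <link rel="canonical" href="https://example.com/original-post.html" />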
2. It does not give a 200 (OK) response
In other words, the content is not accessible, probably because it was deleted, reverted to a draft, the URL was changed, or the server returned an error. Make sure the URLs in your sitemap are accessible and the pages load properly.
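A quick way to verify this is to request each URL from your sitemap and confirm that it returns a 200 status code. The sketch below uses only Python's standard library and a hard-coded placeholder list of URLs; it is an illustration, not a complete tool:

    # Print the HTTP status code of each URL in a list.
    # Replace the placeholder URLs with the ones from your sitemap.
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError, URLError

    urls = [
        "https://example.com/first-post.html",
        "https://example.com/second-post.html",
    ]

    for url in urls:
        try:
            # A HEAD request is enough to see the status code without downloading the page.
            response = urlopen(Request(url, method="HEAD"), timeout=10)
            print(url, response.status)        # 200 means the page is accessible
        except HTTPError as error:
            print(url, error.code)             # e.g. 404 (deleted) or 500 (server error)
        except URLError as error:
            print(url, "unreachable:", error.reason)

Any URL that does not come back with a 200 is worth fixing or removing from the sitemap.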
3. Some pages are blocked by robots.txt
If the number of indexed posts starts dropping on Google Webmaster Tools, it can be due to recent changes made to the website's settings. Although uncommon, it is still possible, especially for newbie webmasters, to unknowingly mess up the website's robots.txt. Attention to detail is imperative when working on robots.txt because a single mistake can prevent Google from finding the website's pages. Always review the website's robots.txt and make sure it is configured the way you intend.
Note: You can check whether your robots.txt is blocking a specific page on your website using the "Fetch and Render" function in Google Webmaster Tools. For beginners, stick to the default robots.txt as much as possible (especially if you're using Google's Blogspot) to avoid complications.
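For comparison, the default robots.txt that Blogspot generates looks roughly like this (the blog address is a placeholder); note that it only blocks the /search and label pages, not your posts:

    User-agent: Mediapartners-Google
    Disallow:

    User-agent: *
    Disallow: /search
    Allow: /

    Sitemap: https://example.blogspot.com/sitemap.xml

If your robots.txt disallows more than this, double-check that it is not blocking the posts you submitted in your sitemap.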
4. "Noindex” Meta Robots Tag
The meta robots tag can be configured for each post. Its purpose is to let publishers choose specific indexing settings on a per-post basis. Review your posts' meta tags because you might have accidentally tagged some of them as "noindex".
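The tag itself sits in a page's <head>. If a post contains a line like the first one below, Google will leave it out of the index; removing the tag or using the second form makes the post indexable again:

    <!-- Keeps the page OUT of Google's index -->
    <meta name="robots" content="noindex">

    <!-- Default behaviour: the page may be indexed and its links followed -->
    <meta name="robots" content="index, follow">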
5. They are "orphaned" pages
If there are no links to your content, either on your site or on an external site, Google will not crawl it. Orphaned pages do not have any internal links pointing to them and can only be accessed through their URLs. Because of their "dead-end" nature, Google does not index this kind of page.
Implementing features such as a "related posts" section to lead your audience to other content will help connect the pages on your website. Just make sure every page is linked from at least one other page to avoid orphaned pages.
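As a simple illustration, a "related posts" block is nothing more than a set of internal links at the end of each post; the URLs below are placeholders:

    <!-- Internal links give crawlers a path to pages that would otherwise be orphaned -->
    <aside>
      <h3>Related posts</h3>
      <ul>
        <li><a href="/2017/04/fixing-duplicate-content.html">Fixing duplicate content</a></li>
        <li><a href="/2017/05/robots-txt-basics.html">Robots.txt basics</a></li>
      </ul>
    </aside>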
6. The content failed to pass Google's quality standards
As harsh as it may seem, this is how Google separates quality content from rubbish. On one of my websites, I used to create posts with nothing but a generic title and an embedded video. Then I would submit my sitemap to Google with high hopes of having them indexed. It was a bit late when I realized that Google would have a hard time understanding what my posts were about. I had to edit approximately 150 posts because of this shortcoming on my part. Focus on making high-quality content and Google will love you for it. For Google, content is king, but you have to make sure Google can understand your posts.