The ‘Crawled – currently not indexed’ status is one of the most common, and most important, statuses you will see in Google Search Console (GSC). Simply put, it indicates that Google has crawled certain pages of your website but has decided, at least for now, that they are not good enough to be indexed.
To find out whether any of your pages have the ‘Crawled – currently not indexed’ status, open the ‘Excluded’ section of the Index Coverage report in your GSC account.
What’s the issue, you ask? It’s this: if your content isn’t indexed, it won’t appear in the search results. That raises the question: how much of the money and effort you put into your website is being wasted?
Here’s an example: imagine an e-commerce company with 10%-15% of its product pages crawled but not indexed because Google deems them unworthy of its search results (we’ll examine the possible causes below). Think of the potential revenue lost. Most companies without technical SEO expertise aren’t even aware they have such problems.
Frustratingly, the Google developer documentation leaves much to be desired. Google states the following: “The page was crawled by Google, but not indexed. It may or may not be indexed in the future; no need to resubmit this URL for crawling”. Thanks, Google! *facepalm*.
So if you’ve noticed this status in the ‘Excluded’ section of your Index Coverage report, now is the time to roll up your sleeves and get to work. In this article, we explore the reasons Google may decide your content isn’t good enough to show to users, and what you can do to fix it.
Is it a bug?
First of all, make sure the status is accurate. Even Google makes mistakes from time to time. There have been many instances where SEOs have found pages flagged with this status that were actually indexed and showing up in the SERPs.
According to John Mueller, a Webmaster Trends Analyst at Google, this can be caused by a delay between the time Google crawls a page and the time that page works its way through the indexing queue.
Either way, the first step is to test the page. In the report, click a URL and select the ‘Inspect URL’ option (the magnifying glass icon) to view the page’s current status. You’ll also find information such as whether the URL was found in a sitemap, the last crawl date, the canonical status and more.
If the status is still ‘Crawled – currently not indexed’, run a Google search for a short passage of text from the page wrapped in quotation marks (an exact-match search) to check whether the page is indexed. If it is, the page should appear in the search results. If not, more investigation is needed.
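If you need to check more than a handful of URLs, the same inspection data can also be pulled programmatically through the Search Console URL Inspection API. The API is called with an OAuth-authorised POST to `https://searchconsole.googleapis.com/v1/urlInspection/index:inspect`. The sketch below only builds the request body and summarises a response dictionary, so it leaves authentication and the HTTP call out. The field names follow our reading of the API reference and should be checked there. The helper names and the sample response are hypothetical.

```python
def build_inspection_request(page_url, property_url):
    """Request body for the URL Inspection API's index:inspect method."""
    return {"inspectionUrl": page_url, "siteUrl": property_url}


def summarise_inspection(response):
    """Pick the indexing-related fields out of an inspection response dict."""
    status = response.get("inspectionResult", {}).get("indexStatusResult", {})
    return {
        "verdict": status.get("verdict"),
        "coverage_state": status.get("coverageState"),
        "last_crawl_time": status.get("lastCrawlTime"),
        "google_canonical": status.get("googleCanonical"),
        "user_canonical": status.get("userCanonical"),
        "sitemaps": status.get("sitemap", []),
        # Assumption: a "PASS" verdict corresponds to "URL is on Google".
        "is_indexed": status.get("verdict") == "PASS",
    }


# Hypothetical response, trimmed to the fields used above.
SAMPLE_RESPONSE = {
    "inspectionResult": {
        "indexStatusResult": {
            "verdict": "NEUTRAL",
            "coverageState": "Crawled - currently not indexed",
            "lastCrawlTime": "2021-05-01T10:00:00Z",
            "googleCanonical": "https://www.example.com/page/",
            "userCanonical": "https://www.example.com/page/",
            "sitemap": ["https://www.example.com/sitemap.xml"],
        }
    }
}

summary = summarise_inspection(SAMPLE_RESPONSE)
print(summary["coverage_state"], summary["is_indexed"])
```

Comparing `google_canonical` with `user_canonical` is a quick way to spot pages where Google has picked a different canonical from the one you declared.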
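Building the exact-match search by hand is easy enough, but a tiny helper keeps the quoting consistent when you spot-check several pages. In this sketch, the function name and the optional `site:` restriction are our own additions. The `site:` operator simply narrows the results to your domain.

```python
from typing import Optional
from urllib.parse import urlencode


def exact_match_search_url(snippet: str, domain: Optional[str] = None) -> str:
    """Return a Google search URL that looks for `snippet` as an exact phrase."""
    # Remove stray double quotes so they can't break the phrase match.
    phrase = '"' + snippet.replace('"', "").strip() + '"'
    # Optionally restrict the results to one site with the site: operator.
    query = f"site:{domain} {phrase}" if domain else phrase
    return "https://www.google.com/search?" + urlencode({"q": query})


print(exact_match_search_url("Soft cotton tee with a slim fit", "example.com"))
```

Pick a sentence that is unique to the page. Boilerplate such as a footer or shipping notice will match many URLs.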
Lack of quality content
Ask yourself, does this page provide value for my users? If not, Google is probably thinking the same.
If the page is “thin” – meaning it has little content – then you have discovered the problem. Remember that Google’s number one goal (aside from making money, of course) is to display the most relevant search results. If your page doesn’t offer any real value, you’re in trouble.
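A rough word count can help you triage which affected pages might be thin. Google does not publish a word-count threshold, so the `THIN_THRESHOLD` value below is an arbitrary, hypothetical cut-off. The count also includes navigation and footer text, so treat it as a first filter, not a verdict on quality.

```python
from html.parser import HTMLParser

THIN_THRESHOLD = 250  # hypothetical cut-off; tune it for your own templates


class _VisibleText(HTMLParser):
    """Collects text that sits outside <script>, <style> and <noscript>."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def word_count(html: str) -> int:
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks).split())


def is_thin(html: str, threshold: int = THIN_THRESHOLD) -> bool:
    return word_count(html) < threshold
```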
Additionally, if the content on the page is too similar to other pages on your website, Google might not display it in the search results.
The best solution for both situations is adding more content or tweaking the existing copy to make it more original and valuable for users.
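To find pages that are too similar to each other, one common technique is to compare overlapping word sequences (shingles) with Jaccard similarity. This is our choice of method for illustration; Google does not disclose how it detects near-duplicates. The shingle size `k=3` and any threshold you apply are assumptions.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word sequences in `text` (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 3) -> float:
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    if not union:
        return 0.0  # both texts are shorter than k words: too short to compare
    return len(sa & sb) / len(union)


red = "Red slim-fit cotton t-shirt in medium"
blue = "Blue slim-fit cotton t-shirt in medium"
print(round(jaccard_similarity(red, blue), 2))
```

Product variants that differ by a single word score highly. That is a good sign they either need more distinctive copy or should be consolidated.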
Duplicate content
If Google identifies a page as a duplicate of another page on your site, it will more than likely leave it out of the index. Generally speaking, you should avoid having too much duplicate content on your website, as it makes it harder for Google to decide which version of a page to index and show.
To fix the problem, add original content to the page. If you don’t want to do that, you can always add a canonical tag pointing to the original page and remove the duplicate from your sitemap.
As a side note, I’ve seen instances where a staging website was indexed by mistake and showed up in search results, which can cause duplicate content issues. If this happens, ask your developers to apply a noindex directive to the staging website and put it behind a login to prevent bots from accessing and indexing it.
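Once a canonical tag is in place, it is worth checking that each duplicate page actually points where you intended. This minimal sketch extracts the `rel="canonical"` link from a page's HTML and resolves relative URLs against the page URL. The function and sample URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> in the page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if "canonical" in rels and attributes.get("href"):
            self.canonical = attributes["href"]


def find_canonical(html, page_url):
    """Return the absolute canonical URL declared in `html`, or None."""
    finder = _CanonicalFinder()
    finder.feed(html)
    finder.close()
    return urljoin(page_url, finder.canonical) if finder.canonical else None
```

Compare the result with the URL you expect. A missing or self-referencing canonical on a page you meant to consolidate is the thing to look for.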
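A noindex directive can be sent either as an `X-Robots-Tag` HTTP header or as a robots meta tag in the HTML. This sketch checks both for a staging page you have already fetched, with the response headers as a plain dictionary and the body as a string. The function name is hypothetical. It is a simple substring check, not a full parser for every directive syntax.

```python
from html.parser import HTMLParser


class _RobotsMeta(HTMLParser):
    """Collects the content of <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() in {"robots", "googlebot"}:
            self.directives.append((attributes.get("content") or "").lower())


def has_noindex(headers, html):
    """True if the response headers or the HTML carry a noindex directive."""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    parser = _RobotsMeta()
    parser.feed(html)
    parser.close()
    return any("noindex" in directive for directive in parser.directives)
```

Run this before the login goes up. Once the site sits behind a login, bots receive the login response and never see the page itself.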
RSS feed URLs
RSS feeds are old school. When did you last subscribe to a blog using its RSS feed? 2007, perhaps? The problem with popular CMS platforms like WordPress is that they automatically create ‘/feed’ versions of your post URLs.
On the one hand, Google is smart enough these days to ignore ‘/feed’ URLs, so they shouldn’t directly impact your website’s SEO.
On the other hand, ‘/feed’ URLs give spammers an easy way to scrape your website, which can lead to low-quality directory links pointing at your site or other malicious activity. Ideally, you want to stop feed URLs from being created in the first place by installing a plugin, and add a robots.txt directive that stops crawlers from crawling ‘/feed’ URLs.
Pagination
Pagination can be a real issue for SEO if implemented incorrectly. Start by checking which paginated pages are showing up in the ‘Crawled – currently not indexed’ report. For example, if they are paginated Author pages, I generally wouldn’t worry about them too much unless featuring Author pages is critical for your business. You can always add a robots meta tag to noindex those pages and then, if you want bots to stop crawling author pages, add a directive to your robots.txt file.
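Google's robots.txt matching supports `*` wildcards and a trailing `$` anchor. The most specific (longest) matching rule wins, and on a tie the allow rule wins. Python's standard-library `urllib.robotparser` only does plain prefix matching, so this sketch implements that matching itself to test hypothetical feed rules. The rules correspond to a robots.txt group such as `User-agent: *`, `Disallow: /feed/`, `Disallow: /*/feed/`. Adjust them to your own site.

```python
import re

# Hypothetical rules for the "User-agent: *" group, as (type, path pattern).
ROBOTS_RULES = [
    ("disallow", "/feed/"),
    ("disallow", "/*/feed/"),
]


def _to_regex(pattern):
    """Translate a robots.txt path pattern into an anchored regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """Longest matching rule wins; on equal length, allow beats disallow."""
    best = None  # (pattern length, is_allow)
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty Disallow matches nothing
        if _to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]
```

Testing a few real paths from your site this way helps make sure a wildcard rule doesn't accidentally block pages such as ‘/feedback/’.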
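To see which kinds of paginated pages dominate the report, you can group the affected URLs (for example, from the report's export) by pattern. The WordPress-style patterns below are hypothetical and need adjusting to your own URL structure. Author pagination is listed first so it is not counted as generic pagination.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical WordPress-style patterns, checked in order.
PAGINATION_PATTERNS = [
    ("author pagination", re.compile(r"^/author/[^/]+/page/\d+/?$")),
    ("other pagination", re.compile(r"/page/\d+/?$")),
]


def classify(url):
    """Label a URL by the first pagination pattern its path matches."""
    path = urlparse(url).path
    for label, pattern in PAGINATION_PATTERNS:
        if pattern.search(path):
            return label
    return "not paginated"


def summarise_report(urls):
    return Counter(classify(url) for url in urls)


REPORT_URLS = [
    "https://www.example.com/author/jane/page/2/",
    "https://www.example.com/blog/page/3/",
    "https://www.example.com/category/news/page/2",
    "https://www.example.com/about/",
]
print(summarise_report(REPORT_URLS))
```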
Out of stock products
You should check the availability of product pages on your website if you run an e-commerce shop. Occasionally, Google can decide to deindex unavailable products to keep SERPs as relevant as possible for users.
In this scenario, first check whether the products really are unavailable or out of stock. If they aren’t, request indexing for the product page through your website’s GSC account, and make sure the page is included in your shop’s product sitemap.
PDFs and other files
If you find a PDF, Excel, Word or PowerPoint file URL in the report, consider whether it should be indexed at all. That’s entirely up to you and your website’s objectives.
You can run the following query in Google Search (without the surrounding quotation marks) to check whether Google has indexed any files from your website:
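To confirm that in-stock product pages are listed in your product sitemap, you can compare a list of product URLs against the sitemap's `<loc>` entries. This sketch parses a standard sitemaps.org `urlset` document. The function names and sample URLs are hypothetical, and sitemap index files, which point to other sitemaps, are not followed.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> URL listed in a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def missing_from_sitemap(product_urls, xml_text):
    """Product URLs that do not appear in the sitemap, in their original order."""
    listed = sitemap_urls(xml_text)
    return [url for url in product_urls if url not in listed]


SAMPLE_SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://shop.example.com/p/red-tee</loc></url>"
    "</urlset>"
)
print(missing_from_sitemap(
    ["https://shop.example.com/p/red-tee", "https://shop.example.com/p/blue-tee"],
    SAMPLE_SITEMAP,
))
```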
“site:yourdomain.com filetype:pdf OR filetype:xls OR filetype:xlsx OR filetype:doc OR filetype:docx OR filetype:svg OR filetype:txt OR filetype:ppt OR filetype:pptx”
If any files appear, decide whether they should be indexed. Hopefully, none of them will be an internal board report or your company’s quarterly financials. If they are, remove the sensitive information as soon as possible.
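If you have exported the affected URLs from the report, you can also pick out the file URLs directly by extension. This sketch uses the same file types as the search query above. The extension set and the function name are our own, so extend them as needed.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

DOCUMENT_EXTENSIONS = {
    ".pdf", ".xls", ".xlsx", ".doc", ".docx", ".svg", ".txt", ".ppt", ".pptx",
}


def document_urls(urls):
    """Keep only URLs whose path ends in one of DOCUMENT_EXTENSIONS."""
    found = []
    for url in urls:
        # Ignore the query string and compare the extension case-insensitively.
        suffix = PurePosixPath(urlparse(url).path).suffix.lower()
        if suffix in DOCUMENT_EXTENSIONS:
            found.append(url)
    return found
```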
URLs with query strings
Several features of a website can generate URLs with query strings. The most common are internal search, faceted navigation and pagination.
For instance, a clothing e-commerce site may let users filter products using a faceted menu. In other words, if I’m looking for a t-shirt, I can use the faceted menu to narrow down the products on the page. By selecting red, medium and slim-fit, I may add parameters to the URL’s query string.
It’s easy to imagine that the parameters can create millions of URL variations, depending on how many filters are in the faceted menu. From a technical SEO perspective, that’s not a good thing.
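The arithmetic behind that explosion is simple. If each facet can be left unset or set to exactly one value, a facet with n values contributes n + 1 choices. The total number of URL variations is then the product of those choices across all facets. Multi-select filters and different parameter orders push the number higher still. The facet names and values below are hypothetical.

```python
from itertools import product
from math import prod
from urllib.parse import urlencode

# Hypothetical facets for a t-shirt category page.
FACETS = {
    "colour": ["red", "blue", "black"],
    "size": ["s", "m", "l", "xl"],
    "fit": ["slim", "regular"],
}


def count_variations(facets):
    """Each facet is either unset or set to one value: (n + 1) choices each."""
    return prod(len(values) + 1 for values in facets.values())


def facet_urls(base, facets):
    """Yield every URL the faceted menu could produce (single-select facets)."""
    names = list(facets)
    for combo in product(*[[None] + values for values in facets.values()]):
        params = {name: value for name, value in zip(names, combo) if value is not None}
        yield base + ("?" + urlencode(params) if params else "")


print(count_variations(FACETS))  # (3 + 1) * (4 + 1) * (2 + 1)
```

Three small facets already produce 60 crawlable URLs for a single category. Multiply that across a full catalogue and the scale of the problem becomes clear.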
Therefore, it is crucial to identify which query string URLs appear in the ‘Crawled – currently not indexed’ report. In general, it is wise to have a plan of action for dealing with query string URLs rather than letting Google decide for you. After all, if Google starts indexing query string URLs on your website, it can waste your crawl budget and cause substantial index bloat.
Conclusion
In this article, you learned why it is essential to find out whether Google is crawling your pages but not indexing them.
As soon as you find this status in your GSC Index Coverage report, identify which pages are affected and whether they should be indexed. Then follow the steps outlined above to find the root of the problem and resolve it.
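A quick way to start that plan is to find out which parameters generate most of the affected URLs and which pages they hang off. This sketch groups query string URLs by path and counts how many URLs use each parameter name. The function names and sample URLs are hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import parse_qsl, urlparse


def query_string_urls_by_path(urls):
    """Group URLs that carry a query string by their path."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlparse(url)
        if parts.query:
            groups[parts.path].append(url)
    return dict(groups)


def urls_per_parameter(urls):
    """Count how many URLs use each query parameter name (once per URL)."""
    counts = Counter()
    for url in urls:
        query = urlparse(url).query
        counts.update({name for name, _ in parse_qsl(query, keep_blank_values=True)})
    return counts


REPORT_URLS = [
    "https://shop.example.com/t-shirts?colour=red&size=m",
    "https://shop.example.com/t-shirts?colour=blue",
    "https://shop.example.com/search?q=tee",
    "https://shop.example.com/about/",
]
print(urls_per_parameter(REPORT_URLS).most_common())
```

Parameters that dominate the count, such as facet filters or internal search, are usually the first candidates for a consistent rule, whether that is a canonical tag, noindex or a robots.txt directive.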
Furthermore, this issue highlights the importance of technical SEO. By building a website on a solid foundation, you can maximise the traffic you receive from search engines.