Let’s talk about the ‘Crawled – currently not indexed’ status

The ‘Crawled – currently not indexed’ status is one of the most important and most frequently seen statuses in Google Search Console (GSC). Simply put, it indicates that Google has crawled certain pages of your website but has decided they are not good enough to be indexed.

To determine whether your website has any pages with the ‘Crawled – currently not indexed’ status, check the ‘Excluded’ section of the Index Coverage report in your GSC account.

What’s the issue, you ask? It’s this: if your content isn’t indexed, it won’t appear in the search results. Which raises the question: how much of the money and energy you spend on your website is being wasted?

Here’s an example: an e-commerce company has 10%-15% of its product pages crawled but not indexed because Googlebot deems them unworthy of its search results (we’ll examine the possible causes below). Imagine the potential revenue lost. Most companies without technical SEO expertise aren’t even aware they have this problem.

Frustratingly, the Google developer documentation leaves much to be desired. Google states the following: “The page was crawled by Google, but not indexed. It may or may not be indexed in the future; no need to resubmit this URL for crawling”. Thanks, Google! *facepalm*.

So if you’ve noticed this status in the ‘Excluded’ section of your Index Coverage report, now is the time to roll up your sleeves and get to work. In this article, we will explore the different reasons why Google thinks your content isn’t good enough to be shown to users and what you can do to fix it.

Let’s begin.

Is it a bug?

First of all, make sure the status is accurate. After all, even Google can make mistakes from time to time. There have been many instances where SEOs have discovered that certain pages have been flagged with the status error but are actually indexed and showing up in the SERPs.

According to John Mueller, a Webmaster Trends Analyst at Google, this could be an issue related to the delay between the time Google crawls a page and the time that information filters down through the indexing queue and into the report.

Regardless of the cause, the first step is to test the page. You can view the current status of a page by clicking its URL and selecting the ‘Inspect URL’ option (the magnifying glass). You’ll also find information such as whether the URL was found in the sitemap, the last crawl date, the canonical status and more.

If the status remains ‘Crawled – currently not indexed’, you can perform a Google search using the quotation marks operator with a snippet of text from your page to see whether the page is indexed. If it is, the page should appear in the search results. If not, more investigation is needed.
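For example, copy a distinctive sentence from the page word for word and search for it wrapped in quotation marks; the phrase below is just a placeholder:

"a distinctive sentence copied word for word from your page"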

Lack of quality content

Ask yourself, does this page provide value for my users? If not, Google is probably thinking the same.

If the page is “thin” – meaning it has little content – then you have discovered the problem. Remember that Google’s number one goal (aside from making money, of course) is to display the most relevant search results. If your page doesn’t offer any real value, you’re in trouble.

Additionally, if the content on the page is too similar to other pages on your website, Google might not display it in the search results.

The best solution for both situations is adding more content or tweaking the existing copy to make it more original and valuable for users.

Duplicate content

Googlebot will more than likely shelve a page in the no-index bin if it identifies its content as a duplicate of another page on your site. Generally speaking, you should avoid having too much duplicate content on your website, as it can negatively affect your search performance.

To fix the problem, add original content to the page. If you do not want to add original content, you can always add a canonical tag referencing the original content source and remove the page from your sitemap.
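For reference, the canonical tag is a single line in the <head> of the duplicate page pointing at the original; the URL below is just a placeholder:

<link rel="canonical" href="https://www.yourdomain.com/original-page/" />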

As a side note, I’ve seen instances where a staging website is indexed by mistake and shows up in search results. This can potentially cause duplicate content issues. In such an event, ask your developers to set a noindex directive on the staging website and add a login to prevent bots from accessing and indexing it.
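If you want to give your developers something concrete, the noindex directive can be added as a meta tag on every staging page or, more conveniently, as an HTTP header across the whole staging site. A rough sketch, assuming an Apache server with mod_headers enabled:

<meta name="robots" content="noindex">

or, site-wide via the server configuration:

Header set X-Robots-Tag "noindex"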

JavaScript rendering

JavaScript issues can be tricky to diagnose. Even if you understand the code and how a crawler works, it’s still hard to determine exactly how Google is rendering your page and how much JavaScript render time is available to your website.

Generally, you should ensure that critical components and content are rendered within five seconds. Problems are likely if your page is very resource-intensive, contains a large number of render-blocking scripts, or needs to make multiple API requests. In essence, you may not be able to render essential parts of your website before the crawler times out, which is also known as partial rendering. If you don’t have the technical know-how, work with a technical SEO specialist and a developer to identify how you can reduce the JavaScript loaded on the page.
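As a simple illustration of dealing with render-blocking scripts, any JavaScript that isn’t needed for the initial render can usually be deferred; the file path below is just a placeholder:

<!-- Render-blocking: the browser pauses parsing to fetch and execute this -->
<script src="/assets/app.js"></script>

<!-- Deferred: downloaded in parallel and executed only after the HTML is parsed -->
<script src="/assets/app.js" defer></script>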

Assuming the JavaScript is rendered client-side, you can disable JavaScript in your browser settings or install one of the many popular Chrome extensions to see what the page looks like without it. Are there any stark differences? If, for example, the navigation or large blocks of content are missing, it could be that Google isn’t rendering the page properly.

You can also use dedicated tools to test JavaScript on your website. Onely’s WWJD (What Would JavaScript Do?) is one such tool; it compares the initial HTML source code with the rendered DOM.
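If you’re comfortable with Node.js, you can run a similar comparison yourself. The sketch below assumes Node 18+ (for the built-in fetch) and the puppeteer package; the URL is a placeholder for one of your affected pages:

const puppeteer = require('puppeteer');

(async () => {
  const url = 'https://www.yourdomain.com/affected-page/'; // replace with a 'Crawled – currently not indexed' URL

  // The raw HTML a crawler sees before any JavaScript runs
  const rawHtml = await (await fetch(url)).text();

  // The DOM after headless Chrome has executed the page's JavaScript
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle0' });
  const renderedHtml = await page.content();
  await browser.close();

  // A big gap between the two (or key content only present in the rendered DOM)
  // suggests the page leans heavily on client-side rendering
  console.log('Raw HTML length:', rawHtml.length);
  console.log('Rendered DOM length:', renderedHtml.length);
})();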

Additionally, you can use Google’s PageSpeed Insights and the Network tab in Chrome Developer Tools to understand which JavaScript resources are being loaded. The waterfall view gives you a clear indication of loading priorities and file/script sizes. There are many ways to fix JavaScript issues, but you’ll find that your developer is your best friend when it comes to resolving them.

RSS feed URLs

RSS feeds are old school. How long has it been since you subscribed to a blog using its RSS feed? Maybe 2007? The problem with popular CMS platforms like WordPress is that they duplicate your posts with the ‘/feed’ suffix at the end of the URL. 

On the one hand, Google is smart enough these days to ignore ‘/feed’ URLs, so they shouldn’t directly impact your website’s SEO.

At the same time, ‘/feed’ URLs give spammers an easy opportunity to crawl your website, scrape your content, or point low-quality directory links at it. Ideally, you want to prevent feed URLs from being created in the first place (for example, via a plugin) and set up a robots.txt directive to stop crawlers from accessing ‘/feed’ URLs.
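A minimal robots.txt sketch for that last step might look like this; adjust the patterns to match how your CMS actually generates feed URLs:

User-agent: *
Disallow: /feed/
Disallow: /*/feed/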

Paginated URLs

Pagination can be a real issue for SEO if implemented incorrectly. Most importantly, check which paginated pages are showing up in the ‘Crawled – currently not indexed’ report. If it’s author pagination pages, for example, I generally wouldn’t worry too much unless featuring author pages is critical for your business. If you want bots to stop crawling them, you can set a robots meta tag to noindex the pages and then, once they have dropped out of the index, add a disallow directive to your robots.txt file.
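As a rough sketch (the /author/ path is just an example and should match your own URL structure), the meta tag on the author pages would be:

<meta name="robots" content="noindex, follow">

and the later robots.txt directive:

User-agent: *
Disallow: /author/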

Above all, you want to ensure that your website’s pagination is accessible to web crawlers through standard <a href=""> HTML links, not overly reliant on JavaScript rendering. And please, for the love of everything good in this world, avoid using infinite scrolling. Search bots don’t like it.
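For clarity, crawlable pagination really is as simple as plain anchor links; the paths below are placeholders:

<nav>
  <a href="/blog/">1</a>
  <a href="/blog/page/2/">2</a>
  <a href="/blog/page/3/">3</a>
</nav>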

Out of stock products

You should check the availability of product pages on your website if you run an e-commerce shop. Occasionally, Google can decide to deindex unavailable products to keep SERPs as relevant as possible for users.

In this scenario, check whether the products really are unavailable or out of stock. If they aren’t, try submitting an indexing request for the product page through your website’s GSC account. In addition, make sure the page is included in your shop’s product sitemap.
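For reference, a minimal product sitemap containing that page would look like this; the URL is a placeholder:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.yourshop.com/products/example-product/</loc>
  </url>
</urlset>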

Document files

If you find a PDF, Excel, Word or PowerPoint file URL in the status report, consider whether it should be indexed. That’s entirely up to you and your website’s objectives.

You can use the following query on Google Search (minus the quotation marks) to check if your website has any files indexed by Google:

“site:yourdomain.com filetype:pdf OR filetype:xls OR filetype:xlsx OR filetype:doc OR filetype:docx OR filetype:svg OR filetype:txt OR filetype:ppt OR filetype:pptx”

If any files appear, you must decide whether they should be indexed. Hopefully, it won’t be an internal board report or your company’s financials for the quarter. Either way, you should remove sensitive information as soon as possible.
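If you decide the files shouldn’t be indexed, one common approach is the X-Robots-Tag response header, since you can’t add a robots meta tag to a PDF or spreadsheet. A rough sketch for an Apache .htaccess file, assuming mod_headers is enabled:

<FilesMatch "\.(pdf|xls|xlsx|doc|docx|ppt|pptx)$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>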

URLs with query strings

Different implementations of your website can result in URLs that have query strings. Most commonly, these are internal search, faceted navigation, or pagination.

For instance, a clothing e-commerce site may allow users to filter products using a faceted menu. In other words, if I’m looking for a t-shirt, I can use the faceted menu to narrow down the products on the page. By selecting red, medium and slim-fit, the following parameters may be added to the URL:

https://www.clothingshop.com/t-shirts?colour=red&size=medium&fit=slim-fit

It’s easy to imagine that the parameters can create millions of URL variations, depending on how many filters are in the faceted menu. From a technical SEO perspective, that’s not a good thing.

Therefore, it is crucial to identify which query string URLs appear in the ‘Crawled – currently not indexed’ report. In general, it is wise to have a plan of action for dealing with query string URLs rather than allowing Google to decide for you. After all, if Google decides to index your parameterised URLs, it will eat into your crawl budget and cause substantial index bloat.
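One common starting point (the right choice depends on your site) is a canonical tag on the filtered URLs pointing back at the clean category page, for example:

<!-- On https://www.clothingshop.com/t-shirts?colour=red&size=medium&fit=slim-fit -->
<link rel="canonical" href="https://www.clothingshop.com/t-shirts" />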

Summary

In this article, you learned that it is essential to determine whether pages on your website are being crawled by Google but not indexed.

As soon as you find the status error in your GSC Index Coverage section, it is crucial that you identify what pages are affected and whether they need to be re-indexed. Follow the steps outlined above to identify the root of the problem and resolve it.

Furthermore, this issue highlights the importance of technical SEO. By building a website on a solid foundation, you can maximise the traffic you receive from search engines.