Cloudflare is a popular web infrastructure and website security provider that offers content delivery network services, DDoS mitigation, and distributed DNS services. However, after enabling certain Cloudflare security features, webmasters sometimes find that Googlebot is blocked from crawling their site. Google Search Console may then send a notification like the one shown in the image below. This is a significant issue because it can affect the site’s visibility on Google Search.
When you run a live test in Google Search Console on an affected URL, you will see an error like the one in the image below.
There are several ways to address this issue, but many of them involve compromising on security. In this blog post, I’ll first discuss a more secure method that fixes the Googlebot blocked issue without removing any security measures. Then, I’ll summarize the common solutions that rely on disabling Cloudflare security features.
A Secure Method to Fix Googlebot Blocked by Cloudflare:
Cloudflare’s firewall rules, part of its Web Application Firewall (WAF), are designed to protect websites. However, some of these rules might inadvertently block Googlebot. Free Cloudflare users can add up to 5 custom rules, while users on paid plans also have access to Cloudflare managed rules, which can be reviewed and adjusted under “Managed rules” in “Security > WAF”. For WordPress users, there’s also a specific “Cloudflare WordPress” ruleset that can be adjusted. You could simply turn off the rules that might be causing the issue, but here is a better method that does not compromise on security:
- Navigate to the Cloudflare Events Tab: First, go to the Events tab under Security in the Cloudflare dashboard.
- Locate Google’s Crawler Event: Filter the results, for example by a user agent containing the word “Google”. This will help you identify the events in which Google’s crawler was challenged or blocked.
- Identify the ASN Line: Once you’ve located the event, find the ASN (Autonomous System Number) listed in it and note which rule triggered the action. You will need both in the next steps.
- Edit the WAF Rule: Head over to the WAF (Web Application Firewall) page and locate the rule that issues the JS challenge or Captcha challenge. Click to edit this rule.
- Add a New Condition: In the rule editor, add a new “and” condition. For the field, select the AS number option (labeled “AS Num” in the rule builder), choose the “does not equal” operator, and enter the ASN you noted from the event. The rule then keeps challenging everyone else but no longer matches requests from Google’s network. Save your changes.
- Test with Google Search Console: Finally, return to Google Search Console and run a live test on the URL to verify that Google can now crawl it without any issues.
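If you prefer the expression editor to the visual rule builder, the extra condition can be sketched as a small helper that wraps an existing rule expression. This is a minimal illustration, not Cloudflare’s own tooling: the function name `exclude_asn`, the sample rule, and the field name `ip.src.asnum` are assumptions (older rules may use `ip.geoip.asnum`), and the ASN 15169 should be confirmed against what your own Events log shows.

```python
# Assumption: AS15169 is the network reported for Googlebot in the Events log.
GOOGLE_ASN = 15169


def exclude_asn(expression: str, asn: int, field: str = "ip.src.asnum") -> str:
    """Wrap a rule expression so it no longer matches requests from the given ASN.

    The field name is an assumption; check the expression preview in the
    Cloudflare rule editor for the name your account actually uses.
    """
    return f"({expression}) and ({field} ne {asn})"


if __name__ == "__main__":
    # Hypothetical challenge rule protecting a login page.
    rule = 'http.request.uri.path contains "/wp-login.php"'
    print(exclude_asn(rule, GOOGLE_ASN))
```

Keep in mind that an ASN exemption applies to every request coming from that network, not only to Googlebot, so it is worth scoping it to the single rule that was blocking the crawler.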
If the above method did not fix the issue, or if you do not have a custom rule in your Cloudflare WAF, you need to find out what else might be blocking Googlebot’s access to the site. Try pausing Cloudflare for the site and run the URL inspection live test in Google Search Console again. If Google is still blocked even with Cloudflare paused, the cause lies somewhere other than Cloudflare.
If the Problem Is Not Caused by a Custom WAF Rule
While the above method ensures that you don’t have to compromise on security, the issue might not be caused by a custom WAF rule at all. If that is the case and pausing Cloudflare fixes the issue, try the following solutions, which involve tweaking or disabling certain Cloudflare security features.
Solutions Based on Removing Cloudflare Securities:
- Disable Bot Fight Mode: Cloudflare’s Bot Fight Mode is designed to block unknown and potentially malicious bots. However, it can sometimes block legitimate bots like Googlebot. To prevent this, navigate to the “Security > Bots” section and turn off the “Bot Fight Mode” option.
- Super Bot Fight Mode for Paid Plans: For those on Pro and higher Cloudflare plans, there’s an additional “Super Bot Fight Mode” option. In the “Security > Bots” section, click on the “Configure Super Bot Fight Mode” link. Ideally, Cloudflare should recognize Googlebot as a verified bot and allow access by default. If you find that Googlebot is still being blocked, change the setting for the “Definitely automated” item to “Allow”.
- Check IP/User Agent Blocking: Some users manually block specific user agents, IP addresses, or even entire countries on Cloudflare. Since Googlebot uses a range of IP addresses, some of these might have been blocked inadvertently. To review and adjust these settings, go to “Security > WAF” and click on the “Tools” tab. If you find specific Google IPs or user agents being blocked, allow them so that Googlebot isn’t blocked.
Google publishes a list of its crawlers’ user agents and a list of its crawlers’ IP addresses, which you can use as a reference when reviewing these rules.
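To compare a blocked IP address from your Cloudflare rules against Google’s published crawler ranges, a short script can do the lookup. The following is a minimal sketch: it assumes the ranges arrive as a JSON object with a `prefixes` list whose entries carry an `ipv4Prefix` or `ipv6Prefix` key, and the sample ranges below are illustrative only, not a current copy of Google’s list. The function names are hypothetical.

```python
import ipaddress


def load_prefixes(ranges: dict) -> list:
    """Turn a {"prefixes": [...]} document into a list of network objects.

    Assumes each entry has either an "ipv4Prefix" or an "ipv6Prefix" key.
    """
    networks = []
    for entry in ranges.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def is_listed(ip: str, networks: list) -> bool:
    """Return True if the IP address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


if __name__ == "__main__":
    # Illustrative sample only; load the real published list in practice.
    sample = {
        "prefixes": [
            {"ipv4Prefix": "66.249.64.0/27"},
            {"ipv6Prefix": "2001:4860:4801:10::/64"},
        ]
    }
    networks = load_prefixes(sample)
    for ip in ("66.249.64.5", "203.0.113.7"):
        print(ip, is_listed(ip, networks))
```

An address that falls inside the published ranges and appears in one of your block rules is a good candidate for an allow entry; an address that merely claims a Googlebot user agent but falls outside the ranges is probably not Google.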
Cloudflare’s primary goal is to protect websites from malicious bots. However, given the vast number of bots online, legitimate bots such as search engine crawlers occasionally get blocked as false positives. Real users can solve challenges like captchas, but search engine bots cannot. Therefore, it’s essential to be cautious when adjusting Cloudflare settings and to monitor site traffic to ensure no unforeseen issues arise.
With the combination of the above methods, you should be able to make an informed decision on how to best address the Googlebot blocked issue on Cloudflare without compromising your website’s security.