Cloudflare Just Made AI Bots Pay to Read the Web. What It Means for Your Business Website
Cloudflare just changed the default rules of the open web. Under its new policy, the company blocks known AI crawlers by default on new sites it protects, and offers a pay-per-crawl system that lets website owners charge AI companies for access to their content. Cloudflare sits in front of close to one in five websites, so this is not a niche setting. If you run a business website, you now have a real say in whether AI tools read, train on, or resell your content.
What did Cloudflare actually announce?
Two things. First, for new domains that sign up, known AI crawler bots are blocked unless the owner chooses to allow them. Second, a marketplace where AI companies can pay publishers to crawl their pages instead of taking the content for free. The backdrop is a running fight between the companies that build AI models and the people who make the content those models are trained on. News publishers, forums, and small sites have spent two years watching their words get vacuumed up with nothing in return. Cloudflare, as the pipe a lot of that traffic runs through, is putting a tollbooth on the road.
Why does this matter if you run a small business?
Two reasons, pulling in opposite directions.
On the protective side, your website is an asset you paid to build. Product descriptions, guides, pricing pages, the FAQ you wrote at 11pm. AI crawlers copy that to train models or to answer questions inside tools like ChatGPT, often without ever sending a visitor back to you. Blocking or charging for that access is a way to stop giving away work for free.
On the visibility side, there is a catch. More and more people now ask an AI assistant a question instead of typing into a search box. If your site is invisible to those systems, you can vanish from the answers they give. That is the new version of not showing up in search. So the question is not simply block or allow. It is which bots, and for what.
Which AI bots are crawling your site right now?
Most crawling comes from a short list of named bots you can actually see in your server logs. GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot, and others each identify themselves with a user agent string. Google-Extended is a slightly different case: it is a robots.txt token that controls whether Google may use your content for AI training, not a separate crawler, so it will not appear in your logs under that name. There are two rough categories worth separating. Training crawlers pull your content to build future models. Retrieval crawlers fetch a page in real time to answer a live user question, and often cite you. You may feel very differently about those two. Blocking the trainer while allowing the one that sends you traffic and credit is a reasonable stance.
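To make the trainer versus retriever split concrete, here is a minimal Python sketch that sorts a user agent string into one of those two buckets. The bot names come from public vendor documentation, but the category assigned to each one is an assumption for illustration, and the function name classify_user_agent is hypothetical. Check each vendor's current documentation before relying on any of it.

```python
# Illustrative mapping only. The category for each bot is an assumption;
# verify names and purposes against each vendor's current documentation.
AI_BOT_CATEGORIES = {
    "GPTBot": "training",
    "ClaudeBot": "training",
    "ChatGPT-User": "retrieval",
    "PerplexityBot": "retrieval",
}


def classify_user_agent(user_agent: str) -> str:
    """Return 'training', 'retrieval', or 'other' for a user agent string."""
    ua = user_agent.lower()
    for token, category in AI_BOT_CATEGORIES.items():
        # Case-insensitive substring match on the bot's name.
        if token.lower() in ua:
            return category
    return "other"


sample = "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"
print(classify_user_agent(sample))
```

Substring matching is crude but it is enough for a first look, because these bots put their name in the user agent string. It does not prove a request is genuine, since anyone can send a fake user agent.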
Should you block AI crawlers or let them in?
It depends on how you make money from your site.
If your content is the product (a media site, a paid research library, original guides you sell or gate), leaning toward blocking or charging makes sense. You are protecting inventory.
If your website is a shopfront that exists to bring you customers (most local and service businesses), being quoted by an AI assistant is closer to free advertising. You probably want the retrieval bots in, so that when someone asks who does X near me, your business is in the answer.
Most owners land in the middle: allow the assistants that cite and refer, discourage the ones that only train. The point is that this is now a decision you get to make on purpose, not a default that happens to you.
How do you actually control AI access to your site?
You have a few levers, from crude to precise.
The oldest is robots.txt, a plain text file at the root of your site that lists which bots may crawl what. It is simple and free, but it is a request, not a wall. Well-behaved bots honor it. Others ignore it.
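If you want to see how a well-behaved bot would read such a file, Python's standard library includes a robots.txt parser. The sketch below feeds it a sample file and asks whether a given bot may fetch a given page. The rules, the example.com URLs, and the helper name can_fetch are all made up for illustration. They are not a recommended policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /admin/
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Ask the standard-library parser whether this bot may fetch this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(can_fetch("GPTBot", "https://example.com/guides/"))
print(can_fetch("PerplexityBot", "https://example.com/guides/"))
```

This only tells you what a compliant crawler should do. A bot that ignores robots.txt never runs this check, which is why the file is a request and not a wall.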
The stronger lever is at the network layer. If your site runs behind Cloudflare or a similar service, you can enforce rules that actually block or meter bot traffic, which is what the new policy automates. Some content platforms and hosts now expose a simple toggle for this too.
The honest answer is that most small teams set this once and never revisit it. That is the real risk, not the technology.
How would a pragmatic automation shop think about this?
Skip the philosophy and treat it as an operations task with three steps.
First, look before you decide. Pull a week of your server or Cloudflare logs and see which AI bots are actually hitting you and how often. Most owners are guessing. The log tells the truth.
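As a rough sketch of that first step, the Python below counts hits per AI bot in a web server access log. It assumes the common "combined" log format, where the user agent is the last quoted field on each line. Your host or Cloudflare export may use a different layout. The bot list is the short one named in this article, and the function name count_ai_bot_hits and the sample log lines are invented for illustration.

```python
import re
from collections import Counter

# Bots named in this article; extend the list with whatever your logs show.
AI_BOT_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot"]

# Assumes combined log format: the user agent is the last quoted field.
USER_AGENT_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_ai_bot_hits(log_lines):
    """Count requests per AI bot across an iterable of access log lines."""
    counts = Counter()
    for line in log_lines:
        match = USER_AGENT_PATTERN.search(line)
        if not match:
            continue  # skip lines that do not end in a quoted field
        user_agent = match.group(1).lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in user_agent:
                counts[token] += 1
                break
    return counts


# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jul/2025:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '203.0.113.9 - - [01/Jul/2025:10:00:05 +0000] "GET /faq HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '198.51.100.7 - - [01/Jul/2025:10:01:00 +0000] "GET / HTTP/1.1" 200 1500 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0"',
    '203.0.113.5 - - [01/Jul/2025:10:02:00 +0000] "GET /guides HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
]

for bot, hits in count_ai_bot_hits(SAMPLE_LOG).most_common():
    print(bot, hits)
```

To run it on a real file, pass an open file handle instead of the sample list, since the function accepts any iterable of lines.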
Second, split the decision by intent, not by fear. Allow the crawlers that send traffic or cite you, restrict the ones that only harvest. One rule for trainers, another for retrievers.
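One way to keep that split honest is to write the policy down once and generate the rules from it. The Python sketch below turns a two-line policy (one decision for trainers, one for retrievers) into robots.txt text. The category given to each bot is an assumption for illustration, and the function name build_robots_txt is hypothetical. Remember that robots.txt is only honored by bots that choose to respect it.

```python
# One decision per category, as described above.
POLICY = {"training": "block", "retrieval": "allow"}

# Illustrative categories only; confirm each bot's purpose with its vendor.
BOTS = {
    "GPTBot": "training",
    "ClaudeBot": "training",
    "PerplexityBot": "retrieval",
}


def build_robots_txt(bots: dict, policy: dict) -> str:
    """Render one robots.txt group per bot from a category-level policy."""
    groups = []
    for name in sorted(bots):
        decision = policy[bots[name]]
        rule = "Disallow: /" if decision == "block" else "Allow: /"
        groups.append(f"User-agent: {name}\n{rule}")
    # Blank line between groups, single trailing newline at the end.
    return "\n\n".join(groups) + "\n"


print(build_robots_txt(BOTS, POLICY))
```

When a new bot turns up in your logs, you add one line to the bot list and regenerate the file. The policy itself stays at two lines.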
Third, make it a standing check, not a one-time panic. Bots change names, new ones appear, the AI answer engines keep growing. A quarterly ten-minute review of your logs and rules keeps you in control. That is the whole game with automation generally: the boring, repeatable check is where the value is, not the dramatic one-off project.
You do not need Cloudflare's marketplace or a lawyer to start. You need to know who is reading your site, and to decide, on purpose, who you want there.
Want this handled for you?
Odyssey builds AI-powered automation for Australian businesses. We map the workflow, build the system, and keep it running.