Why I Built on Cloudflare Top-to-Bottom

30-second version

A typical AI product today runs on three or four clouds: Vercel/Netlify for frontend, AWS or Render for backend, OpenAI for the model, Supabase or Postgres-somewhere for storage. I run all of it on Cloudflare: Workers for compute, D1 for SQL, R2 for blobs, KV for cache, Vectorize for embeddings, AI Gateway for routing, Pages for static, Durable Objects for state. By my rough estimate the bill is about 1/5 of the AWS equivalent, and the deploy story collapses from “it depends” to “wrangler deploy.”

Why I converged on this

I tried the multi-cloud version first. The architecture diagram for a modest AI product looked like this:

Vercel for frontend
Railway for backend
OpenAI for inference
Supabase for Postgres + auth
Pinecone for vector search
S3 for files
Sentry for monitoring

Seven services, seven bills, seven failure modes, seven sets of secrets. Each was a “best of breed” choice; taken together, they were incoherent. I felt three problems within a month:

Latency cost between clouds was real. Frontend in one region, backend in another, vector search in a third — every API call accumulated round-trips.
The mental model was a tax. Each service had its own dashboard, own pricing model, own retention quirks. Onboarding a collaborator took a week.
Bills compound non-linearly. Each service has a free tier, but the moment you cross a threshold on any one of them, that service jumps to paid pricing, and seven services means seven such cliffs. And the egress between them was the silent killer.

I rebuilt on Cloudflare top-to-bottom. The architecture diagram became:

Pages for frontend
Workers for backend
AI Gateway → OpenAI/Claude for inference
D1 for SQL
R2 for files
KV for cache
Vectorize for embeddings
Durable Objects for state
Workers Logs for monitoring

One bill. One CLI (wrangler). One dashboard. Latency between components drops because calls between them stay on Cloudflare’s network instead of crossing the public internet between providers.
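Every box in that diagram shows up in code the same way: as a binding on the Worker’s `env`. Here is a minimal cache-aside sketch, assuming a hypothetical `users` table in D1 bound as `DB` and a KV namespace bound as `CACHE` (both names would come from your wrangler config). The interfaces are hand-written stand-ins for the real `D1Database` and `KVNamespace` types from `@cloudflare/workers-types`, trimmed to the calls used, so the sketch also runs outside Workers.

```typescript
// Hand-written structural subsets of the Workers binding types, limited to
// the calls this sketch makes. The real D1Database / KVNamespace bindings
// provide these methods.
interface D1StatementLike {
  bind(...values: unknown[]): D1StatementLike;
  first<T>(): Promise<T | null>;
}
interface D1Like {
  prepare(query: string): D1StatementLike;
}
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

// Hypothetical binding names; they must match the wrangler config.
interface Env {
  DB: D1Like;
  CACHE: KVLike;
}

interface User {
  id: string;
  name: string;
}

// Cache-aside: try KV first, fall back to D1, then populate KV for next time.
async function getUser(env: Env, id: string): Promise<User | null> {
  const cacheKey = `user:${id}`;
  const cached = await env.CACHE.get(cacheKey);
  if (cached !== null) return JSON.parse(cached) as User;

  const row = await env.DB.prepare("SELECT id, name FROM users WHERE id = ?")
    .bind(id)
    .first<User>();
  if (row === null) return null;

  // KV's minimum expirationTtl is 60 seconds; 300 is an arbitrary choice here.
  await env.CACHE.put(cacheKey, JSON.stringify(row), { expirationTtl: 300 });
  return row;
}
```

KV is eventually consistent and this sketch never invalidates, so the TTL is what bounds how stale a cached user can get.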

Six reasons this adds up

1. Compute lives at the edge, not in a region. Workers runs in hundreds of cities. A user in Tokyo typically hits a Worker in Tokyo, which talks to D1 (which supports global read replicas, though writes still go to a single primary) and Vectorize over Cloudflare’s own network. Compute is geographically distributed by default. The AWS equivalent requires you to pick regions, configure read replicas, set up CloudFront, and then explain why the API Gateway is in us-east-1 anyway.

2. The runtime is small enough to actually understand. Workers is V8 isolates with a constrained API surface. You cannot run native binaries; you have a small set of bindings (KV, R2, D1, Queues, Vectorize). The constraints are restrictive, and they are also the reason it scales. Cold starts are measured in milliseconds rather than the hundreds of milliseconds or more a Lambda container can take, because starting an isolate inside an already-running process is far cheaper than booting a container and loading node_modules.

3. Egress is free between Cloudflare services. This is the part that breaks the AWS pricing model. R2 charges no egress fees. Workers calling D1 or KV pay no data-transfer charge. The “lots of small services talking to each other” architecture, which is expensive on AWS because of inter-service data transfer, costs only the per-request and per-operation fees on Cloudflare.

4. Deployments are atomic and near-instant. wrangler deploy builds your Worker and pushes it globally in seconds. No CI pipeline, no container registry, no rolling deploy required. If you need staging, you deploy a separate environment (wrangler deploy --env staging). If you need to roll back, wrangler rollback returns you to a previous version.

5. The auth story is finally bearable. Workers + KV + a JWT library is enough for most apps. For more, use Cloudflare Access for internal apps and better-auth (or a similar library) for end users. None of this requires Auth0 or Cognito or rolling your own.

6. AI Gateway is the under-marketed killer feature. All my LLM calls go through AI Gateway. It gives me caching, retries, fallback to alternative providers, cost tracking per route, and per-feature analytics — for free, with one binding. The amount of code I deleted when I added this was embarrassing.
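Reason 1 is easy to observe for yourself. Below is a minimal module-syntax Worker that reports which Cloudflare data center served the request, using the runtime-provided `request.cf.colo` field (the IATA airport code of that data center). The JSON response shape is an arbitrary choice for illustration.

```typescript
// The Workers runtime attaches request.cf; its `colo` field is the IATA
// airport code of the data center that handled the request. Outside the
// Workers runtime (e.g. in Node) `cf` is absent, hence the fallback.
type EdgeRequest = Request & { cf?: { colo?: string } };

const worker = {
  async fetch(request: EdgeRequest): Promise<Response> {
    const colo = request.cf?.colo ?? "unknown";
    const body = { colo, path: new URL(request.url).pathname };
    return new Response(JSON.stringify(body), {
      headers: { "content-type": "application/json" },
    });
  },
};

// Module-syntax Worker: the runtime calls worker.fetch for each request.
export default worker;
```

Deployed and requested from two different continents, it should typically report two different codes, with no region configuration anywhere.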
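For reason 5, here is a minimal sketch of the “Workers + KV + a JWT library” setup. To stay self-contained it verifies an HS256 token with the Web Crypto API (available in Workers and in Node 20+) instead of a specific JWT library, and it treats a KV namespace as a revocation list keyed by the token’s `jti`. The `revoked:` key prefix and the claim handling are assumptions; a production setup would more likely use a maintained library such as `jose`.

```typescript
// Structural subset of a KV namespace: only the read this sketch needs.
interface RevocationStore {
  get(key: string): Promise<string | null>;
}

interface Claims {
  sub: string;
  exp: number; // expiry, seconds since the Unix epoch
  jti?: string; // token ID, used here as the revocation key
}

// Decode base64url (JWT's alphabet, padding stripped) into raw bytes.
function base64UrlToBytes(s: string): Uint8Array {
  const b64 = s.replace(/-/g, "+").replace(/_/g, "/");
  const padded = b64.padEnd(Math.ceil(b64.length / 4) * 4, "=");
  return Uint8Array.from(atob(padded), (c) => c.charCodeAt(0));
}

function decodeJson<T>(segment: string): T {
  return JSON.parse(new TextDecoder().decode(base64UrlToBytes(segment))) as T;
}

// Returns the claims if the token is a validly signed, unexpired, unrevoked
// HS256 JWT; otherwise null.
async function verifyHs256(
  token: string,
  secret: string,
  revoked: RevocationStore,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): Promise<Claims | null> {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [headerSeg, payloadSeg, sigSeg] = parts;
  try {
    // Pin the algorithm so a token cannot choose its own (e.g. "none").
    if (decodeJson<{ alg?: string }>(headerSeg).alg !== "HS256") return null;

    const key = await crypto.subtle.importKey(
      "raw",
      new TextEncoder().encode(secret),
      { name: "HMAC", hash: "SHA-256" },
      false,
      ["verify"],
    );
    // Let Web Crypto check the signature rather than comparing strings by hand.
    const valid = await crypto.subtle.verify(
      "HMAC",
      key,
      base64UrlToBytes(sigSeg),
      new TextEncoder().encode(`${headerSeg}.${payloadSeg}`),
    );
    if (!valid) return null;

    const claims = decodeJson<Claims>(payloadSeg);
    if (typeof claims.exp !== "number" || claims.exp <= nowSeconds) return null;
    if (claims.jti && (await revoked.get(`revoked:${claims.jti}`)) !== null) return null;
    return claims;
  } catch {
    return null; // malformed base64/JSON or a failed lookup: reject
  }
}
```

Revoking a session is then a single KV write of `revoked:<jti>`, ideally with an `expirationTtl` no shorter than the token’s remaining lifetime.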
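For reason 6, routing an OpenAI call through AI Gateway is mostly a matter of changing the base URL. The sketch below assumes the gateway’s OpenAI-compatible endpoint pattern (`https://gateway.ai.cloudflare.com/v1/{account}/{gateway}/openai/...`); the `cf-aig-cache-ttl` and `cf-aig-metadata` header names are taken from the AI Gateway docs as I understand them and are worth double-checking. The model name and the `feature` label are placeholders, and `fetchFn` is injectable only so the sketch can be exercised without network access.

```typescript
// Minimal fetch signature so the sketch can be tested without the network.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface ChatResponse {
  choices: { message: { content: string } }[];
}

// Per-gateway base URL; account and gateway IDs come from the dashboard.
function gatewayBase(accountId: string, gatewayId: string): string {
  return `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}`;
}

async function chatViaGateway(
  opts: { accountId: string; gatewayId: string; apiKey: string; prompt: string; feature: string },
  fetchFn: FetchLike = fetch,
): Promise<string> {
  const url = `${gatewayBase(opts.accountId, opts.gatewayId)}/openai/chat/completions`;
  const res = await fetchFn(url, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      authorization: `Bearer ${opts.apiKey}`,
      // Assumed header: gateway-side cache TTL in seconds.
      "cf-aig-cache-ttl": "3600",
      // Assumed header: custom metadata attached to the gateway log entry,
      // which is one way to slice analytics per feature.
      "cf-aig-metadata": JSON.stringify({ feature: opts.feature }),
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // placeholder model name
      messages: [{ role: "user", content: opts.prompt }],
    }),
  });
  if (!res.ok) throw new Error(`gateway returned ${res.status}`);
  const data = (await res.json()) as ChatResponse;
  return data.choices[0]?.message.content ?? "";
}
```

The request body and response are plain OpenAI chat-completions JSON; the gateway sits in the middle, which is why existing provider code barely changes.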

What you give up

Three real constraints:

1. No long-running compute. Workers on the free plan get 10 ms of CPU time per request; on the paid plan the default CPU limit is 30 seconds, configurable up to 5 minutes. For anything longer (large model inference, video processing, long batch jobs), you need a different service. I use Modal or Replicate for those. Cloudflare is for the edge logic and storage; the heavy ML still happens elsewhere.

2. No native binaries. WASM is supported, but if your dependency is a C++ library with no JS bindings and no WASM build, you are out of luck. The list of things this excludes is shorter than you’d think, but it’s worth checking up front.

3. The ecosystem is younger. AWS has been around for about 20 years and has an answer for every weird requirement. Cloudflare has been serious about its developer platform for about 5 years and is missing some niche services (managed search beyond SQLite’s FTS5 in D1, managed ML training, a full-fledged graph DB). For 90% of products this is fine. For the other 10%, you pay the AWS tax.
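For limitation 1, one way to keep long jobs off the Worker is to accept the request at the edge and hand the heavy part to a queue. In this sketch (my swap-in pattern, not necessarily how the author wires up Modal or Replicate) the Worker records a status entry in KV and pushes the job onto a Cloudflare Queue; a separate consumer would then call the external service. The binding names `JOBS` and `STATUS`, the `job:` key prefix, and the job shape are all hypothetical.

```typescript
// Structural subsets of the Queue and KV bindings used below.
interface QueueLike<T> {
  send(message: T): Promise<void>;
}
interface StatusStore {
  put(key: string, value: string): Promise<void>;
}

// Hypothetical job shape: points at an input already stored in R2.
interface HeavyJob {
  jobId: string;
  inputKey: string;
}

// Hypothetical binding names.
interface Env {
  JOBS: QueueLike<HeavyJob>;
  STATUS: StatusStore;
}

// Runs at the edge and returns quickly: record the job, enqueue it, and let
// a separate consumer do the long-running work outside the Worker's CPU limit.
async function submitJob(env: Env, inputKey: string): Promise<{ jobId: string; status: "queued" }> {
  const jobId = crypto.randomUUID();
  await env.STATUS.put(`job:${jobId}`, JSON.stringify({ status: "queued", inputKey }));
  await env.JOBS.send({ jobId, inputKey });
  return { jobId, status: "queued" };
}
```

Writing the status before enqueueing means a failed `send` leaves a “queued” record with no job behind it; a real version would catch that and mark the record as failed.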
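For limitation 2, the escape hatch is compiling the native dependency to WASM. As I understand the Workers runtime, it refuses to compile raw WASM bytes at request time, so the module has to arrive precompiled via a static import; the helper below therefore takes a `WebAssembly.Module` rather than bytes. The `add` export is a stand-in for whatever a real compiled library would expose.

```typescript
// Shape of the exports we expect from this particular module. WebAssembly
// does not check this for us; it is an assumption about the module.
interface AdderExports {
  add(a: number, b: number): number;
}

// Instantiate an already-compiled module with no imports. In a Worker, the
// module comes from a static `.wasm` import rather than runtime compilation.
async function loadAdder(module: WebAssembly.Module): Promise<AdderExports> {
  const instance = await WebAssembly.instantiate(module, {});
  return instance.exports as unknown as AdderExports;
}
```

In a Worker bundled by wrangler, `import adderModule from "./add.wasm"` should yield a `WebAssembly.Module`, after which `(await loadAdder(adderModule)).add(2, 3)` calls into the compiled code.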

The two cases where I wouldn’t pick Cloudflare

Heavy ML training pipelines. If your product is training a model rather than calling one, Cloudflare’s compute model isn’t built for that. Use whichever cloud has the GPU/TPU pricing you need.
Strict regulatory data residency. Workers run at the edge by design, which means your data can be processed in any of hundreds of cities. For HIPAA, GDPR-strict, or government workloads with data-residency requirements, you need a more confined deployment model.

For everything else — and that’s most AI products — start on Cloudflare. You can always add a service in another cloud for the specific case it solves; you don’t have to start there.

How the industry actually allocates clouds

The dominant 2026 stacks for AI products: (1) Vercel + AWS + OpenAI + Pinecone, with high developer ergonomics and a multi-cloud bill; (2) AWS all-in, with Bedrock for inference, OpenSearch for vectors, and Lambda for compute, expensive but enterprise-grade; (3) Cloudflare top-to-bottom, what I describe here, niche but growing fast. The Cloudflare path is the only one where edge-native architecture is the default, not an optimization. The trade-off is the GPU/training story, which is why heavy-ML teams add Modal or Replicate. For agent-style workloads where inference is the bottleneck and storage is the cost, Cloudflare is the cleanest single-cloud answer in 2026. Cross-reference: Agent Framework Landscape for where compute fits in the agent stack.

How I would pitch this in an interview

“Why did you pick this stack?” — a softball that most candidates miss by listing services. The better answer:

I optimized for two metrics: time from idea to deploy, and bill at 10× scale. Both pointed to Cloudflare top-to-bottom. The deploy story is wrangler deploy. The bill is dominated by AI inference, not infra. The constraints (no long-running compute, no native binaries) cost me less than the AWS multi-region complexity would have. When I need heavy ML, I add Modal — but the edge logic and storage all stay on Cloudflare.

That answer is opinionated, acknowledges trade-offs, and names specific constraints. The candidate who can name what they gave up is more trustworthy than the one who can’t.