Why I Built on Cloudflare Top-to-Bottom
Most AI products run on AWS or Vercel + a database somewhere. I run every layer on Cloudflare — Workers, D1, R2, KV, Vectorize, AI Gateway, Pages. Six reasons it adds up to less code and lower bills, and the two cases where I would not.
30-second version
A typical AI product today runs on three or four clouds: Vercel/Netlify for frontend, AWS or Render for backend, OpenAI for the model, Supabase or Postgres-somewhere for storage. I run all of it on Cloudflare — Workers for compute, D1 for SQL, R2 for blobs, KV for cache, Vectorize for embeddings, AI Gateway for routing, Pages for static, Durable Objects for state. The bill is roughly 1/5 of the AWS equivalent (rough estimate, see breakdown below) and the deploy story collapses from “it depends” to “wrangler deploy.”
Why I converged on this
I tried the multi-cloud version first. The architecture diagram for a modest AI product looked like this:
- Vercel for frontend
- Railway for backend
- OpenAI for inference
- Supabase for Postgres + auth
- Pinecone for vector search
- S3 for files
- Sentry for monitoring
Seven services, seven bills, seven failure modes, seven sets of secrets. Each was a “best of breed” choice that, taken together, was incoherent. Three problems I felt within a month:
- Latency cost between clouds was real. Frontend in one region, backend in another, vector search in a third — every API call accumulated round-trips.
- The mental model was a tax. Each service had its own dashboard, own pricing model, own retention quirks. Onboarding a collaborator took a week.
- Bills compound non-linearly. Each service has a free tier, but the moment you cross any threshold on any of them, the bill grows. And the egress between them was the silent killer.
I rebuilt on Cloudflare top-to-bottom. The architecture diagram became:
- Pages for frontend
- Workers for backend
- AI Gateway → OpenAI/Claude for inference
- D1 for SQL
- R2 for files
- KV for cache
- Vectorize for embeddings
- Durable Objects for state
- Workers Logs for monitoring
One bill. One CLI (wrangler). One dashboard. Latency between components is low because they all run inside Cloudflare's network rather than across separate clouds.
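As a rough sketch of how these pieces meet in one Worker, the handler below checks KV before falling back to D1. The binding names `CACHE` and `DB`, the `notes` table, and the `getNoteTitle` helper are hypothetical. The two `...Like` interfaces are trimmed stand-ins for the binding types in `@cloudflare/workers-types`, included only so the snippet is self-contained.

```typescript
// Trimmed stand-ins for the KV and D1 binding types (assumption: only the
// methods used below are declared; the real types expose more).
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}
interface D1Like {
  prepare(sql: string): {
    bind(...values: unknown[]): { first<T>(): Promise<T | null> };
  };
}

// Hypothetical binding names; in a real project they come from the wrangler config.
interface Env {
  CACHE: KVLike;
  DB: D1Like;
}

// Read-through cache: try KV first, then D1, then populate KV for next time.
async function getNoteTitle(env: Env, id: string): Promise<string | null> {
  const cacheKey = `note:${id}`;
  const cached = await env.CACHE.get(cacheKey);
  if (cached !== null) return cached;

  const row = await env.DB
    .prepare("SELECT title FROM notes WHERE id = ?")
    .bind(id)
    .first<{ title: string }>();
  if (row === null) return null;

  // Cache for five minutes (an arbitrary illustrative TTL).
  await env.CACHE.put(cacheKey, row.title, { expirationTtl: 300 });
  return row.title;
}

// Module-style Worker entry point; a real Worker module would make this
// object its default export.
const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    const id = new URL(request.url).searchParams.get("id");
    if (id === null) return new Response("missing id", { status: 400 });
    const title = await getNoteTitle(env, id);
    return title === null
      ? new Response("not found", { status: 404 })
      : new Response(title);
  },
};
```

The point of the sketch is that cache and database are both just properties on `env`: no connection strings, no client setup, and no second set of credentials.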
Six reasons this adds up
1. Compute lives at the edge, not in a region. Workers runs in hundreds of cities. A user in Tokyo hits a Worker in Tokyo, which talks to D1 (which supports global read replication) and Vectorize (also edge-deployed). The whole stack is geographically distributed by default. The AWS equivalent requires you to pick regions, configure read replicas, set up CloudFront, and then explain why the API Gateway is in us-east-1 anyway.
2. The runtime is small enough to actually understand. Workers runs on V8 isolates with a constrained API surface. You cannot run native binaries, and you get a small set of bindings (KV, R2, D1, Queues, Vectorize). Those constraints are restrictive, and they are also the reason it scales. There is no “cold start” the way Lambda has cold starts, because an isolate spins up inside an already-running process instead of booting a container and loading a large node_modules tree.
3. Egress is free between Cloudflare services. This is the part that breaks the AWS pricing model. R2 has zero egress. Workers calling D1 or KV has zero egress. The “lots of small services talking to each other” architecture, which is expensive on AWS because of inter-service data transfer, is free on Cloudflare.
4. Deployments are atomic and instant. wrangler deploy builds your Worker and pushes it globally in seconds. No CI pipeline, no container registry, no rolling deploy. If you need staging, you push to a different name. If you need to roll back, you redeploy the previous artifact.
5. The auth story is finally bearable. Workers + KV + a JWT library is enough for most apps. For more, use Cloudflare Access for internal apps and better-auth (or any JWT-based library) for end users. None of this requires Auth0 or Cognito or rolling your own.
6. AI Gateway is the under-marketed killer feature. All my LLM calls go through AI Gateway. It gives me caching, retries, fallback to alternative providers, cost tracking per route, and per-feature analytics — for free, with one binding. The amount of code I deleted when I added this was embarrassing.
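To make reason 6 concrete, here is a minimal sketch of sending an OpenAI-style chat request through AI Gateway by URL. The account ID, gateway name, model name, and helper names are hypothetical placeholders. The URL shape follows the `gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider}` pattern, and the sketch uses plain `fetch` rather than a binding, so treat it as one possible wiring and not as my exact setup.

```typescript
// Hypothetical message shape for an OpenAI-style chat completion call.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface GatewayOptions {
  accountId: string; // placeholder: your Cloudflare account ID
  gatewayId: string; // placeholder: the gateway name you created
  apiKey: string;    // provider API key, e.g. read from a Worker secret
  model: string;     // placeholder model name
}

// Build the gateway URL for a provider endpoint, tolerating a leading slash.
function gatewayUrl(accountId: string, gatewayId: string, provider: string, path: string): string {
  const cleanPath = path.replace(/^\/+/, "");
  return `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/${provider}/${cleanPath}`;
}

// Assemble the request separately from sending it, so it is easy to inspect.
function buildChatRequest(opts: GatewayOptions, messages: ChatMessage[]) {
  return {
    url: gatewayUrl(opts.accountId, opts.gatewayId, "openai", "chat/completions"),
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${opts.apiKey}`,
      },
      body: JSON.stringify({ model: opts.model, messages }),
    },
  };
}

// Send the request through the gateway instead of calling the provider directly.
async function chat(opts: GatewayOptions, messages: ChatMessage[]): Promise<unknown> {
  const { url, init } = buildChatRequest(opts, messages);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`AI Gateway request failed with status ${res.status}`);
  return res.json();
}
```

The application code only changes the base URL. Caching, retries, and fallback are then handled on the gateway side instead of in hand-written wrappers, which is where the deleted code came from.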
What you give up
Three real constraints:
1. No long-running compute. Workers cap CPU time per request: 10 ms on the free plan, and 30 seconds by default on the paid plan (configurable up to 5 minutes). For anything longer (large model inference, video processing, long batch jobs), you need a different service. I use Modal or Replicate for those. Cloudflare is for the edge logic and storage; the heavy ML still happens elsewhere.
2. No native binaries. WASM is supported, but if your dependency is a C++ library with no JS bindings, you are out of luck. The list of things this excludes is shorter than you’d think but worth checking up front.
3. The ecosystem is younger. AWS has been around for roughly two decades and has an answer for every weird requirement. Cloudflare has been serious about its developer platform for about 5 years and is missing some niche services (managed search beyond FTS5, managed ML training, full-fledged graph DB). For 90% of products this is fine. For the other 10%, you pay the AWS tax.
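One way to live with the first two constraints is to decide up front which jobs stay in the Worker and which get handed to an external runner. The sketch below is an illustration under stated assumptions: the `Job` shape, the `planJob` helper, and the budget value are all hypothetical, and the CPU estimate would have to come from your own measurements.

```typescript
// Where a job should run, plus the reason when it has to leave the Worker.
type Plan =
  | { where: "worker" }
  | { where: "external"; reason: string };

// Hypothetical job description; estimatedCpuMs is your own measured guess.
interface Job {
  kind: string;
  estimatedCpuMs: number;
  needsNativeBinary: boolean;
}

// Route a job based on the two hard constraints: native code and CPU budget.
function planJob(job: Job, cpuBudgetMs: number): Plan {
  if (job.needsNativeBinary) {
    return { where: "external", reason: "native binary required" };
  }
  if (job.estimatedCpuMs > cpuBudgetMs) {
    return {
      where: "external",
      reason: `estimated ${job.estimatedCpuMs} ms exceeds budget of ${cpuBudgetMs} ms`,
    };
  }
  return { where: "worker" };
}

// Example: a 30-second budget, chosen here purely for illustration.
const budgetMs = 30_000;
const resizePlan = planJob({ kind: "thumbnail", estimatedCpuMs: 120, needsNativeBinary: false }, budgetMs);
const transcodePlan = planJob({ kind: "video-transcode", estimatedCpuMs: 600_000, needsNativeBinary: true }, budgetMs);
```

In my setup, anything that comes back as `external` would go to Modal or Replicate, while the Worker keeps the request handling and storage.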
The two cases where I wouldn’t pick Cloudflare
- Heavy ML training pipelines. If your product is training a model rather than calling one, Cloudflare’s compute model isn’t built for that. Use whichever cloud has the GPU/TPU pricing you need.
- Strict regulatory data residency. Workers run at the edge by design, which means your data can be processed in any of hundreds of cities. For HIPAA, GDPR-strict, or government workloads with data-residency requirements, you need a more confined deployment model.
For everything else — and that’s most AI products — start on Cloudflare. You can always add a service in another cloud for the specific case it solves; you don’t have to start there.
How the industry actually allocates clouds
The dominant 2026 stacks for AI products: (1) Vercel + AWS + OpenAI + Pinecone (high developer ergonomics, multi-cloud bill); (2) AWS all-in (Bedrock for inference, OpenSearch for vectors, Lambda for compute; expensive but enterprise-grade); (3) Cloudflare top-to-bottom (what I describe here; niche but converging fast). The Cloudflare path is the only one where edge-native architecture is the default, not an optimization. The trade-off is the GPU/training story, which is why heavy-ML teams add Modal or Replicate. For agent-style workloads where inference is the bottleneck and storage is the cost, Cloudflare is the cleanest single-cloud answer in 2026. Cross-reference: Agent Framework Landscape for where compute fits in the agent stack.
How I would pitch this in an interview
“Why did you pick this stack?” — a softball that most candidates miss by listing services. The better answer:
“I optimized for two metrics: time from idea to deploy, and bill at 10× scale. Both pointed to Cloudflare top-to-bottom. The deploy story is wrangler deploy. The bill is dominated by AI inference, not infra. The constraints (no long-running compute, no native binaries) cost me less than the AWS multi-region complexity would have. When I need heavy ML, I add Modal, but the edge logic and storage all stay on Cloudflare.”
That answer is opinionated, admits trade-offs, and names specific constraints. The candidate who can name what they gave up is more trustworthy than the one who can’t.