Cloudach is now in public beta
Today we're opening Cloudach to everyone. Any developer can sign up, deploy a model, and hit an OpenAI-compatible API endpoint in under 60 seconds — no waitlist, no credit card, no GPU setup.
Why we built this
Every developer we talked to during private beta had the same frustration: running open-source LLMs in production is unreasonably hard. You either pay a managed API tax to a closed-model provider, or you spend days wrangling CUDA drivers, vLLM configs, and auto-scaling logic before you can ship anything.
Cloudach is the third path. You get the economics and privacy of self-hosted open-source models with the operational simplicity of a managed API. No GPU ops. No infrastructure overhead. Just a deploy command and an endpoint.
What's in the beta
Everything you need to run LLMs in production:
- 40+ models available — Llama 3 (8B, 70B), Mistral 7B, Mixtral 8×7B, Qwen 2, Phi-3, Code Llama, and more
- OpenAI-compatible API — drop-in replacement for the OpenAI SDK. Change one line of code
- Sub-100ms TTFT on our A100 fleet for 8B-class models
- Auto-scaling — scale to zero when idle, scale up instantly on traffic
- Usage dashboard — token counts, latency histograms, cost breakdown per model
- Free tier — 1M tokens/month free, no credit card required
How it works
The deploy flow takes about 45 seconds:
If you already use the OpenAI SDK, you just change the base URL and your API key. Everything else — streaming, function calling, embeddings — works identically.
Private beta learnings
Over 500 developers used Cloudach in private beta. A few things we learned:
- The most-deployed model is Llama 3 8B — it hits the right balance of quality and cost for most chat and completion workloads
- 60% of users switched from the OpenAI API specifically for cost reasons. Average savings reported: ~70%
- The second biggest reason was data privacy — many teams can't send customer data to third-party APIs. Cloudach processes nothing and stores nothing
- Most common feedback: "I expected this to be harder." That's the goal
What's coming next
We're working on fine-tuning support (bring your own LoRA adapter), embeddings endpoints for RAG pipelines, and a private VPC deployment option for enterprise teams. If any of those are blockers for you, reach out — we prioritize based on what the community is building.
Sign up free → and deploy your first model in 60 seconds.