← Back to blog
ProductApr 5, 2026

Cloudach is now in public beta

Today we're opening Cloudach to everyone. Any developer can sign up, deploy a model, and hit an OpenAI-compatible API endpoint in under 60 seconds — no waitlist, no credit card, no GPU setup.

Why we built this

Every developer we talked to during private beta had the same frustration: running open-source LLMs in production is unreasonably hard. You either pay a managed API tax to a closed-model provider, or you spend days wrangling CUDA drivers, vLLM configs, and auto-scaling logic before you can ship anything.

Cloudach is the third path. You get the economics and privacy of self-hosted open-source models with the operational simplicity of a managed API. No GPU ops. No infrastructure overhead. Just a deploy command and an endpoint.

What's in the beta

Everything you need to run LLMs in production:

  • 40+ models available — Llama 3 (8B, 70B), Mistral 7B, Mixtral 8×7B, Qwen 2, Phi-3, Code Llama, and more
  • OpenAI-compatible API — drop-in replacement for the OpenAI SDK. Change one line of code
  • Sub-100ms TTFT on our A100 fleet for 8B-class models
  • Auto-scaling — scale to zero when idle, scale up instantly on traffic
  • Usage dashboard — token counts, latency histograms, cost breakdown per model
  • Free tier — 1M tokens/month free, no credit card required

How it works

The deploy flow takes about 45 seconds:

# 1. Install the CLI
npm install -g cloudach
# 2. Deploy any model
cloudach deploy --model meta-llama/Llama-3-8B-Instruct
# 3. Call it like OpenAI
✓ Live → api.cloudach.com/your-endpoint

If you already use the OpenAI SDK, you just change the base URL and your API key. Everything else — streaming, function calling, embeddings — works identically.

Private beta learnings

Over 500 developers used Cloudach in private beta. A few things we learned:

  • The most-deployed model is Llama 3 8B — it hits the right balance of quality and cost for most chat and completion workloads
  • 60% of users switched from the OpenAI API specifically for cost reasons. Average savings reported: ~70%
  • The second biggest reason was data privacy — many teams can't send customer data to third-party APIs. Cloudach processes nothing and stores nothing
  • Most common feedback: "I expected this to be harder." That's the goal

What's coming next

We're working on fine-tuning support (bring your own LoRA adapter), embeddings endpoints for RAG pipelines, and a private VPC deployment option for enterprise teams. If any of those are blockers for you, reach out — we prioritize based on what the community is building.

Sign up free → and deploy your first model in 60 seconds.