From the team
New models on Cloudach: Llama 3.1, Command R+, and DBRX
Four new models are now available — Llama 3.1 8B and 70B with 128K context, Cohere Command R+ for RAG and agents, and Databricks DBRX for coding and reasoning.
Fine-tune Llama 3 on your own data with Cloudach
A practical guide to fine-tuning Llama 3 with LoRA on Cloudach: why fine-tuning beats prompting for domain tasks, how LoRA works under the hood, and how to get from raw data to a deployed model.
How to choose the right open-source LLM
A practical decision framework for picking the right open-source LLM, with a decision tree, use-case matrix, benchmark comparisons, and cost tradeoffs for Mistral, Llama 3, Mixtral, and more.
How we achieve sub-100ms TTFT on Llama 3 with vLLM
A deep dive into our inference stack: continuous batching, FlashAttention, and tensor parallelism tuning.
Cloudach is now in public beta
We're opening up the platform to all developers. Deploy your first model free — no credit card required.
Building an OpenAI-compatible API gateway from scratch
How we designed our API layer to be a drop-in replacement for the OpenAI SDK with any open-source model.