API Documentation

Cloudach is an OpenAI-compatible LLM API. Point any OpenAI SDK at the Cloudach base URL, use your Cloudach API key, and choose a Cloudach model ID; no other code changes are needed.

Quickstart

Get from zero to your first API call in under 5 minutes.

Step 1 — Sign up

Go to app.cloudach.com/signup
Enter your email and create a password
You are now logged in and ready

Step 2 — Create an API key

Open the API Keys page in your dashboard
Click Create new key and give it a name (e.g. my-first-key)
Copy and store the key — it is shown only once. Format: sk-cloudach-...

Step 3 — Make your first call

curl https://api.cloudach.com/v1/chat/completions \
  -H "Authorization: Bearer sk-cloudach-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3-8b",
    "messages": [{"role": "user", "content": "Hello from Cloudach!"}]
  }'
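If you have neither curl nor an SDK handy, you can assemble the same request with Python's standard library. This is a minimal sketch. The helper name `build_chat_request` is hypothetical, and the function only builds the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://api.cloudach.com/v1"

def build_chat_request(api_key: str, content: str, model: str = "llama3-8b") -> urllib.request.Request:
    """Build (but do not send) the same POST /v1/chat/completions call as the curl example."""
    body = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # same Bearer header as the curl call
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass it to `urllib.request.urlopen` and decode the JSON body with `json.load`.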

Time to first token: ~1 second once your key is in hand.

Authentication

All requests require an Authorization header with your API key as a Bearer token.

Authorization: Bearer sk-cloudach-YOUR_KEY

API key properties

Prefix: sk-cloudach-
Stored as a SHA-256 hash server-side — the raw key is never recoverable after creation
Can be revoked from the dashboard at any time
Multiple keys per account are supported (one per integration recommended)
Auth cache TTL: 60 seconds, so a revoked key can keep authenticating for up to 60 s until cached results expire

Auth errors

// 401 — missing Authorization header
{"error": {"message": "Missing credentials. Include 'Authorization: Bearer <api-key>'.", "type": "invalid_request_error", "code": "missing_credentials"}}

// 401 — invalid or revoked key
{"error": {"message": "Invalid or revoked API key.", "type": "authentication_error", "code": "invalid_api_key"}}

Endpoints

Base URL: https://api.cloudach.com/v1 (the paths in the table below are relative to the host, https://api.cloudach.com)

Endpoint	Method	Description	Auth
`/v1/chat/completions`	POST	Chat messages (streaming supported)	Required
`/v1/completions`	POST	Text completions (legacy format)	Required
`/v1/models`	GET	List available models	Required
`/v1/models/{model_id}`	GET	Get a specific model	Required
`/health`	GET	Health check	None

POST /v1/chat/completions

OpenAI-compatible chat endpoint. Supports streaming via Server-Sent Events.

Request body

{
  "model": "llama3-8b",           // required — see /v1/models for available models
  "messages": [                   // required — non-empty array
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user",   "content": "What is 2 + 2?"}
  ],
  "stream": false,                // optional — true for SSE streaming
  "temperature": 0.7,             // optional — 0.0–2.0 (default 1.0)
  "max_tokens": 512               // optional — max completion tokens
}
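A client-side pre-flight check can catch the most common 400 errors before a request leaves your process. This is a minimal sketch that encodes only the rules stated on this page: `model` and a non-empty `messages` array are required, roles are system, user, or assistant, and temperature falls in [0.0, 2.0]. The function name is hypothetical, and the server remains the final authority.

```python
VALID_ROLES = {"system", "user", "assistant"}

def validate_chat_request(body: dict) -> list:
    """Return a list of problems found in a chat completion request body (empty if none)."""
    problems = []
    if not body.get("model"):
        problems.append("model is required")
    messages = body.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages must be a non-empty array")
    else:
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict) or msg.get("role") not in VALID_ROLES:
                problems.append(f"messages[{i}].role must be system, user, or assistant")
            elif not isinstance(msg.get("content"), str):
                problems.append(f"messages[{i}].content must be a string")
    temperature = body.get("temperature", 1.0)  # documented default
    if not isinstance(temperature, (int, float)) or not 0.0 <= temperature <= 2.0:
        problems.append("temperature must be between 0.0 and 2.0")
    return problems
```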

Non-streaming response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1712345678,
  "model": "llama3-8b",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "The answer is 4."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 22, "completion_tokens": 8, "total_tokens": 30}
}

Streaming (SSE)

Set "stream": true. The response is a stream of data: ... lines, each carrying one JSON chunk, and it ends with data: [DONE]. Token usage arrives on the last content chunk.

data: {"id":"chatcmpl-abc","choices":[{"delta":{"role":"assistant","content":""},"index":0}]}

data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"The "},"index":0}]}

data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"answer is 4."},"index":0}],"usage":{"prompt_tokens":22,"completion_tokens":8,"total_tokens":30}}

data: [DONE]
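If you read the stream without an SDK, you can reassemble the reply from raw event lines like the ones above. This is a minimal sketch. It assumes you have already read the lines from the HTTP response and decoded them to text, and the helper name `collect_stream` is hypothetical.

```python
import json

def collect_stream(lines):
    """Reassemble the assistant message and final usage from raw SSE lines."""
    parts, usage = [], None
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # blank separator lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel, not JSON
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
        if chunk.get("usage"):
            usage = chunk["usage"]  # present on the last content chunk
    return "".join(parts), usage
```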

POST /v1/completions

Legacy text completion endpoint (OpenAI format). Use chat completions for new integrations.

// Request
{
  "model": "llama3-8b",
  "prompt": "The capital of France is",
  "max_tokens": 20
}

// Response
{
  "id": "cmpl-abc123",
  "object": "text_completion",
  "model": "llama3-8b",
  "choices": [{"text": " Paris.", "index": 0, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 7, "completion_tokens": 3, "total_tokens": 10}
}

GET /v1/models

// Response
{
  "object": "list",
  "data": [
    {"id": "llama3-8b",  "object": "model", "owned_by": "cloudach"},
    {"id": "mistral-7b", "object": "model", "owned_by": "cloudach"}
  ]
}

Available models

Model ID	Context	Best for
`llama3-8b`	8K	Fast chat, Q&A, summarization
`llama3-70b`	8K	Complex reasoning, analysis
`llama31-8b`	128K	Long-context chat, fast inference
`llama31-70b`	128K	State-of-the-art open model, long context
`mistral-7b`	32K	Long context, code, EU-hosted
`mixtral-8x7b`	32K	Best accuracy, complex tasks
`command-r-plus`	128K	RAG, tool use, multi-step agents
`dbrx`	32K	Coding, reasoning, MoE efficiency
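To avoid a 400 context_length_exceeded, check that prompt tokens plus max_tokens fit the model's window before you send the request. This is a minimal sketch under two assumptions. First, the K values in the table are read as powers of two (8K = 8,192, which matches the error example further down; 32K = 32,768; 128K = 131,072). Second, you already have a prompt token count from your own tokenizer. The helper names are hypothetical.

```python
# Context windows from the table above; exact token counts are an assumption
# (8K read as 8,192, 32K as 32,768, 128K as 131,072).
CONTEXT_WINDOW = {
    "llama3-8b": 8_192,
    "llama3-70b": 8_192,
    "llama31-8b": 131_072,
    "llama31-70b": 131_072,
    "mistral-7b": 32_768,
    "mixtral-8x7b": 32_768,
    "command-r-plus": 131_072,
    "dbrx": 32_768,
}

def fits_context(model: str, prompt_tokens: int, max_tokens: int) -> bool:
    """True if the prompt plus the requested completion fit in the model's window."""
    return prompt_tokens + max_tokens <= CONTEXT_WINDOW[model]

def smallest_fitting_model(candidates, prompt_tokens, max_tokens):
    """Pick the candidate with the smallest window that still fits, or None."""
    fitting = [m for m in candidates if fits_context(m, prompt_tokens, max_tokens)]
    return min(fitting, key=CONTEXT_WINDOW.__getitem__, default=None)
```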

Rate Limits

Rate limits apply per API key. The requests-per-minute (RPM) limit uses a rolling 60-second window; the tokens-per-day (TPD) limit resets at midnight UTC.

Limits by plan

Plan	Requests / min (RPM)	Tokens / day (TPD)	Notes
`Free`	60	1,000,000	Default on sign-up
`Pro`	600	10,000,000	Available after plan upgrade
`Enterprise`	Custom	Custom	Contact sales@cloudach.com
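To stay under your plan's RPM ceiling instead of only reacting to 429s, a client can track its own request timestamps over a rolling 60-second window. This is a minimal single-process sketch, and the class name is hypothetical. It only approximates the server's window, because clocks and in-flight requests differ, so keep handling 429 responses as well.

```python
import time
from collections import deque

class RollingWindowLimiter:
    """Client-side limiter: at most `limit` requests in any `window`-second span."""

    def __init__(self, limit: int, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent = deque()  # timestamps of requests still inside the window

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()  # drop timestamps that have left the window
        if len(self.sent) < self.limit:
            return 0.0
        return self.window - (now - self.sent[0])

    def record(self) -> None:
        """Call immediately before sending a request."""
        self.sent.append(self.clock())
```

Typical use: call `time.sleep(limiter.wait_time())`, then `limiter.record()`, then send the request.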

Per-key overrides

You can set a custom rate_limit_rpm on individual API keys from the dashboard. This is useful for restricting keys used in untrusted environments or for raising limits on high-throughput integrations.

Rate-limit response headers

Every API response includes these headers so you can track your usage proactively:

Header	Example value	Meaning
`X-RateLimit-Limit-Requests`	60	Your RPM ceiling
`X-RateLimit-Remaining-Requests`	42	Requests left in this 60-second window
`X-RateLimit-Reset-Requests`	2026-04-14T12:01:00Z	UTC timestamp when the window resets
`X-RateLimit-Limit-Tokens`	1000000	Your daily token ceiling
`X-RateLimit-Remaining-Tokens`	987432	Tokens left today
`X-RateLimit-Reset-Tokens`	2026-04-15T00:00:00Z	UTC timestamp of next daily reset
`Retry-After`	60	Seconds to wait before retrying (only on 429 responses)
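Every response carries these headers, so a client can read them to pace itself. This is a minimal sketch that converts them to integers and UTC datetimes. It assumes the timestamp format shown in the examples (ISO 8601 with a trailing Z) and any mapping with `.items()`, such as a dict of response headers. The helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

def parse_rate_limit_headers(headers) -> dict:
    """Convert Cloudach rate-limit headers into ints and UTC datetimes."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive

    def as_time(value):
        # fromisoformat() on Python < 3.11 rejects a trailing "Z", so rewrite it as +00:00
        return datetime.fromisoformat(value.replace("Z", "+00:00")) if value else None

    return {
        "requests_limit": int(h["x-ratelimit-limit-requests"]),
        "requests_remaining": int(h["x-ratelimit-remaining-requests"]),
        "requests_reset": as_time(h.get("x-ratelimit-reset-requests")),
        "tokens_limit": int(h["x-ratelimit-limit-tokens"]),
        "tokens_remaining": int(h["x-ratelimit-remaining-tokens"]),
        "tokens_reset": as_time(h.get("x-ratelimit-reset-tokens")),
    }

def seconds_until(reset: datetime, now: Optional[datetime] = None) -> float:
    """Seconds from `now` (default: current UTC time) until `reset`, never negative."""
    now = now or datetime.now(timezone.utc)
    return max(0.0, (reset - now).total_seconds())
```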

429 error responses

Use the type field to distinguish RPM from TPD errors:

// RPM exceeded — type: "requests"
{"error": {"message": "Rate limit exceeded: 60 requests per minute.", "type": "requests", "code": "rate_limit_exceeded"}}

// TPD exceeded — type: "tokens"
{"error": {"message": "You have exceeded your daily token limit of 1,000,000 tokens. Tokens reset at midnight UTC.", "type": "tokens", "code": "rate_limit_exceeded"}}

Handling 429s — exponential backoff

Always respect the Retry-After header when present. Fall back to exponential backoff (1 s → 2 s → 4 s → 8 s) when the header is absent.

Python

import time
from openai import OpenAI, APIStatusError

# max_retries=0 turns off the SDK's built-in retries so this loop is the only retry layer
client = OpenAI(api_key="sk-cloudach-YOUR_KEY", base_url="https://api.cloudach.com/v1", max_retries=0)

RETRYABLE = {429, 500, 502, 503}

def chat_with_backoff(messages, model="llama3-8b", max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except APIStatusError as e:
            if e.status_code not in RETRYABLE or attempt == max_retries - 1:
                raise
            if e.status_code == 429 and getattr(e, "type", None) == "tokens":
                raise  # daily token cap: retrying before midnight UTC will not help
            # Prefer the server's Retry-After; otherwise back off 1 s, 2 s, 4 s, 8 s
            retry_after = e.response.headers.get("Retry-After")
            wait = float(retry_after) if retry_after else 2 ** attempt
            print(f"Attempt {attempt + 1} failed ({e.status_code}). Retrying in {wait}s...")
            time.sleep(wait)

Node.js

import OpenAI from "openai";

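// maxRetries: 0 turns off the SDK's built-in retries so this loop is the only retry layer
const client = new OpenAI({ apiKey: "sk-cloudach-YOUR_KEY", baseURL: "https://api.cloudach.com/v1", maxRetries: 0 });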

const RETRYABLE = new Set([429, 500, 502, 503]);

async function chatWithBackoff(messages, model = "llama3-8b", maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.chat.completions.create({ model, messages });
    } catch (err) {
      if (!(err instanceof OpenAI.APIError) || !RETRYABLE.has(err.status) || attempt === maxRetries - 1) {
        throw err;
      }
      const retryAfter = err.headers?.["retry-after"];
      const wait = retryAfter ? parseFloat(retryAfter) * 1000 : Math.pow(2, attempt) * 1000;
      console.log(`Attempt ${attempt + 1} failed (${err.status}). Retrying in ${wait}ms...`);
      await new Promise((r) => setTimeout(r, wait));
    }
  }
}

Need higher limits? Contact sales to discuss enterprise quotas.

Error Codes

All errors follow the OpenAI error schema: {"error": {"message": "...", "type": "...", "code": "...", "param": "..."}}. param is only included when the error is tied to a specific request field.

Error reference

HTTP	code	type	Cause	Fix
`400`	invalid_request	invalid_request_error	Malformed JSON body	Validate JSON; set Content-Type: application/json
`400`	missing_required_param	invalid_request_error	model or messages missing	Include both model and messages in every request
`400`	invalid_param_value	invalid_request_error	temperature out of [0,2], empty messages, etc.	Validate parameter values before sending
`400`	context_length_exceeded	invalid_request_error	Prompt + max_tokens exceeds model context	Trim history or switch to a larger-context model
`401`	missing_credentials	invalid_request_error	No Authorization header	Add Authorization: Bearer <key>
`401`	invalid_api_key	authentication_error	Key is wrong, expired, or revoked	Check or rotate key in the dashboard
`403`	insufficient_quota	permission_error	Monthly token cap reached	Upgrade plan or wait for reset
`404`	model_not_found	invalid_request_error	Model ID not recognised	Call GET /v1/models for valid IDs
`404`	not_found	invalid_request_error	Route does not exist	Check base URL and path
`413`	request_too_large	invalid_request_error	Body > 1 MB	Chunk large payloads
`429`	rate_limit_exceeded	requests	RPM limit hit	Wait Retry-After seconds; use exponential backoff
`429`	rate_limit_exceeded	tokens	Daily token limit hit	Wait until midnight UTC for reset
`500`	internal_server_error	api_error	Unexpected server fault	Retry with backoff; contact support if persistent
`502`	model_backend_unavailable	api_error	Inference backend down or overloaded	Retry with exponential backoff
`503`	service_unavailable	api_error	Maintenance window	Check status.cloudach.com

Example error responses

400 — Context length exceeded

{
  "error": {
    "message": "This model's maximum context length is 8192 tokens, but your request has 9500 tokens (8100 prompt + 1400 max_tokens). Shorten your messages or reduce max_tokens.",
    "type": "invalid_request_error",
    "code": "context_length_exceeded"
  }
}

401 — Invalid key

{
  "error": {
    "message": "Invalid or revoked API key.",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}

404 — Model not found

{
  "error": {
    "message": "The model 'gpt-4' does not exist or you do not have access to it.",
    "type": "invalid_request_error",
    "code": "model_not_found",
    "param": "model"
  }
}

429 — Rate limited

// RPM exceeded (type: "requests")
{"error": {"message": "Rate limit exceeded: 60 requests per minute.", "type": "requests", "code": "rate_limit_exceeded"}}

// Daily token cap (type: "tokens")
{"error": {"message": "You have exceeded your daily token limit of 1,000,000 tokens. Tokens reset at midnight UTC.", "type": "tokens", "code": "rate_limit_exceeded"}}

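Do NOT retry 400, 401, 403, 404, or 413 errors. They point to a problem with the request, credentials, or quota, not a transient fault. Retry 429 (type "requests"), 500, 502, and 503 with exponential backoff. A 429 with type "tokens" keeps failing until the daily reset at midnight UTC, so short backoff will not help.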
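You can capture these retry rules in one predicate and share it across every call site. This is a minimal sketch, and the helper names are hypothetical. The error type comes from the `type` field of the error body. Treating a daily-token 429 as non-retryable follows from the error table, which says that limit only clears at midnight UTC.

```python
from typing import Optional

RETRYABLE_STATUS = {429, 500, 502, 503}

def error_type(body: dict) -> Optional[str]:
    """Pull the type field out of an {"error": {...}} response body."""
    return (body.get("error") or {}).get("type")

def should_retry(status: int, err_type: Optional[str] = None) -> bool:
    """Retry transient faults; never retry request, auth, or quota errors."""
    if status == 429 and err_type == "tokens":
        return False  # daily token cap: only the midnight UTC reset clears it
    return status in RETRYABLE_STATUS
```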

Streaming error handling

Errors in streaming responses fall into two categories:

Pre-stream errors — the request fails before any data: events are sent. You receive a normal HTTP error response (non-200 status, JSON body). Handle identically to non-streaming errors.
Mid-stream errors — the backend fails after the stream has started. The data: sequence is cut short; the final event is an error object instead of data: [DONE].

Mid-stream error event

data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"The answer is"},"index":0}]}

data: {"error": {"message": "Stream interrupted by server.", "type": "api_error", "code": "stream_error"}}

// Connection closes — [DONE] is NOT sent
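When you read the raw stream yourself, the termination rule above is straightforward to enforce. A clean stream ends with data: [DONE]. An interrupted one ends with a data: line whose JSON has an `error` key, or it simply stops. This is a minimal sketch, and both the generator and the `StreamInterrupted` exception are hypothetical names.

```python
import json

class StreamInterrupted(Exception):
    """Raised when a stream ends with an error event or without [DONE]."""

def iter_content(lines):
    """Yield content deltas from raw SSE lines; raise StreamInterrupted on an abnormal end."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # normal termination
        event = json.loads(payload)
        if "error" in event:
            err = event["error"]
            raise StreamInterrupted(f'{err.get("code")}: {err.get("message")}')
        for choice in event.get("choices", []):
            yield choice.get("delta", {}).get("content") or ""
    raise StreamInterrupted("connection closed without [DONE]")
```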

Python — streaming with error handling

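from openai import OpenAI, APIConnectionError, APIError, APIStatusError

client = OpenAI(api_key="sk-cloudach-YOUR_KEY", base_url="https://api.cloudach.com/v1")

collected = []
try:
    stream = client.chat.completions.create(
        model="llama3-8b",
        messages=[{"role": "user", "content": "Tell me a story."}],
        stream=True,
    )
    for chunk in stream:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        delta = chunk.choices[0].delta.content or ""
        collected.append(delta)
        print(delta, end="", flush=True)
# APIStatusError and APIConnectionError subclass APIError, so catch them first
except APIStatusError as e:
    # Pre-stream error: non-200 HTTP response with a JSON error body
    print(f"\nRequest failed ({e.status_code}): {e.message}")
    # Retry the full request if e.status_code in {429, 500, 502, 503}
except APIConnectionError as e:
    # Could not connect, or the request timed out (no HTTP response)
    print(f"\nConnection error: {e}")
except APIError as e:
    # Mid-stream error event: the SDK raises a plain APIError (no status code)
    # when a data: line carries an "error" object; retry the full request
    print(f"\nStream interrupted: {e.message}")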

Node.js — streaming with error handling

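import OpenAI from "openai";

const client = new OpenAI({ apiKey: "sk-cloudach-YOUR_KEY", baseURL: "https://api.cloudach.com/v1" });

const collected = [];
try {
  const stream = await client.chat.completions.create({
    model: "llama3-8b",
    messages: [{ role: "user", content: "Tell me a story." }],
    stream: true,
  });
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content ?? "";
    collected.push(delta);
    process.stdout.write(delta);
  }
} catch (err) {
  // APIConnectionError extends APIError, so check it first
  if (err instanceof OpenAI.APIConnectionError) {
    // Could not connect, or the request timed out (no HTTP response)
    console.error(`\nConnection error: ${err.message}`);
  } else if (err instanceof OpenAI.APIError) {
    // err.status is set for pre-stream HTTP errors and undefined when the
    // SDK throws on a mid-stream error event
    console.error(`\nStream error (${err.status ?? "mid-stream"}): ${err.message}`);
    // Retry if err.status is in [429, 500, 502, 503] or the stream was cut short
  } else {
    throw err; // e.g. the connection dropping while the body is being read
  }
}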

Network-level interruptions (TCP resets, proxy timeouts) surface as connection errors from the HTTP client, not as API error JSON. Always wrap stream consumption in a try/catch and implement a retry strategy for the full request.

Webhooks

Webhooks let you receive real-time HTTP POST notifications when events happen in your Cloudach account. Register an endpoint URL in the dashboard and subscribe to the event types you care about.

Event types

Event	When it fires
`usage.threshold`	Cumulative spend for the billing period crosses a threshold
`api_key.created`	A new API key is created
`api_key.revoked`	An API key is revoked
`request.failed`	An API request returns a 4xx or 5xx status code

Verifying signatures

Every delivery includes an X-Cloudach-Signature header formatted as sha256=<hex>. To verify it, compute HMAC-SHA256 over the raw request body (before any JSON parsing) with your webhook signing secret, then compare the result with the header using a constant-time comparison. Reject any request whose signature does not match.

# Python
import hmac, hashlib

def verify(secret: str, body: bytes, header: str) -> bool:
    expected = "sha256=" + hmac.new(
        secret.encode(), body, hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, header)

// Node.js
const crypto = require('crypto');

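function verify(secret, rawBody, header) {
  const expected = Buffer.from(
    'sha256=' + crypto.createHmac('sha256', secret).update(rawBody).digest('hex')
  );
  const received = Buffer.from(header || '');
  // timingSafeEqual throws on a length mismatch, so compare lengths first
  return expected.length === received.length &&
    crypto.timingSafeEqual(expected, received);
}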

Payload structure

{
  "id": "evt_01abc...",
  "event": "api_key.created",
  "created": 1713100800,
  "data": { ... }
}

Retry policy

A non-2xx response or a timeout (10 s) triggers up to 3 retries with exponential backoff (0.5 s → 1 s → 2 s). Delivery history is available in the Webhooks dashboard.
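Because failed deliveries are retried, the same event can arrive more than once, for example when your endpoint processed it but took longer than 10 s to respond. Handlers should therefore be idempotent and should respond quickly. This is a minimal, framework-agnostic sketch that verifies the signature, skips event IDs it has already processed, and dispatches on `event`. The in-memory `seen_event_ids` set, the handler table, and the choice of 401 for a bad signature are illustrative assumptions. In production, use durable storage for processed IDs.

```python
import hashlib
import hmac
import json

seen_event_ids = set()  # illustrative; use a durable store in production

def on_api_key_revoked(data):
    print("key revoked:", data)  # hypothetical handler

HANDLERS = {"api_key.revoked": on_api_key_revoked}  # hypothetical handler table

def handle_delivery(secret: str, raw_body: bytes, signature_header: str) -> int:
    """Process one webhook delivery and return the HTTP status to respond with."""
    expected = "sha256=" + hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature_header or ""):
        return 401  # signature mismatch: reject
    event = json.loads(raw_body)  # parse only after verifying the raw bytes
    if event["id"] in seen_event_ids:
        return 200  # duplicate delivery (a retry): acknowledge without reprocessing
    handler = HANDLERS.get(event["event"])
    if handler:
        handler(event["data"])
    seen_event_ids.add(event["id"])
    return 200
```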

SDK Reference

Cloudach is drop-in compatible with any OpenAI SDK. Set base_url and api_key, and use a Cloudach model ID. All request and response shapes are identical.

SDK quickstart guides

Python (pip install openai): install, configure, and make a first call in 5 lines

Node.js (npm install openai): ESM, CommonJS, and TypeScript setup

SDK	Install	base_url / baseURL
`Python openai ≥ 1.0`	pip install openai	https://api.cloudach.com/v1
`Node.js openai ≥ 4.0`	npm install openai	https://api.cloudach.com/v1
`LangChain (Python)`	pip install langchain-openai	base_url argument to ChatOpenAI (or the OPENAI_API_BASE env var)
`LiteLLM`	pip install litellm	api_base config
`Direct HTTP / curl`	(none)	https://api.cloudach.com/v1, with an Authorization: Bearer header

chat.completions.create — parameters

Parameter	Type	Required	Default	Description
`model`	string	Yes	—	Model ID. See /v1/models for available IDs.
`messages`	array	Yes	—	Non-empty array of {role, content} objects. Roles: system, user, assistant.
`stream`	boolean	No	false	If true, response is SSE stream of delta chunks ending with data: [DONE].
`temperature`	number	No	1.0	Sampling temperature. 0.0 = deterministic, 2.0 = very random.
`max_tokens`	number	No	model max	Maximum tokens to generate. Caps completion length.
`top_p`	number	No	1.0	Nucleus sampling. Alternative to temperature. Use one, not both.
`n`	number	No	1	Number of completions to generate. Higher values multiply token cost.
`stop`	string or array	No	null	Stop sequence(s). Generation halts when one is produced.
`presence_penalty`	number	No	0.0	-2.0 to 2.0. Positive values penalise repeated topics.
`frequency_penalty`	number	No	0.0	-2.0 to 2.0. Positive values penalise repeated tokens.
`user`	string	No	—	Stable end-user identifier for abuse monitoring.

chat.completions.create — response (non-streaming)

{
  "id": "chatcmpl-abc123",           // unique completion ID
  "object": "chat.completion",
  "created": 1712345678,             // Unix timestamp
  "model": "llama3-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help?"
      },
      "finish_reason": "stop"        // "stop" | "length" | "content_filter"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 7,
    "total_tokens": 25
  }
}

chat.completions.create — streaming chunk

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1712345678,
  "model": "llama3-8b",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant",  // only in first chunk
        "content": "Hello"   // may be "" or null in the first and last chunks
      },
      "finish_reason": null   // "stop" | "length" in final chunk
    }
  ],
  // usage only in the last content chunk (before [DONE])
  "usage": { "prompt_tokens": 18, "completion_tokens": 1, "total_tokens": 19 }
}

models.list — response

// GET /v1/models
{
  "object": "list",
  "data": [
    {
      "id": "llama3-8b",
      "object": "model",
      "created": 1712000000,
      "owned_by": "cloudach"
    }
    // ... more models
  ]
}

Python — full client reference

import asyncio
from openai import OpenAI, AsyncOpenAI

# Sync client
client = OpenAI(
    base_url="https://api.cloudach.com/v1",
    api_key="sk-cloudach-YOUR_KEY",
    timeout=60.0,       # request timeout in seconds (SDK default: 600)
    max_retries=2,      # automatic retries on connection errors, 429, and 5xx (default: 2)
)

# Async client (asyncio / FastAPI)
async_client = AsyncOpenAI(
    base_url="https://api.cloudach.com/v1",
    api_key="sk-cloudach-YOUR_KEY",
)

messages = [{"role": "user", "content": "Hello!"}]

# Chat completion (sync)
response = client.chat.completions.create(model="llama3-8b", messages=messages)
text = response.choices[0].message.content
usage = response.usage  # .prompt_tokens, .completion_tokens, .total_tokens

# Chat completion (async): await is only valid inside a coroutine
async def main():
    response = await async_client.chat.completions.create(model="llama3-8b", messages=messages)
    print(response.choices[0].message.content)

asyncio.run(main())

# List models
models = client.models.list()
for m in models.data:
    print(m.id)

# Retrieve a model
model = client.models.retrieve("llama3-8b")

Node.js — full client reference

import OpenAI from "openai";
// Top-level await requires an ES module (.mjs, or "type": "module" in package.json)

const client = new OpenAI({
  baseURL: "https://api.cloudach.com/v1",
  apiKey: "sk-cloudach-YOUR_KEY",
  timeout: 60_000,    // ms (SDK default: 600000, i.e. 10 minutes)
  maxRetries: 2,      // automatic retries on connection errors, 429, and 5xx (default: 2)
});

const messages = [{ role: "user", content: "Hello!" }];

// Chat completion
const response = await client.chat.completions.create({ model: "llama3-8b", messages });
const text = response.choices[0].message.content;
const usage = response.usage; // .prompt_tokens, .completion_tokens, .total_tokens (snake_case, as in the JSON)

// Streaming
const stream = await client.chat.completions.create({
  model: "llama3-8b", messages, stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

// List models
const models = await client.models.list();
for (const m of models.data) console.log(m.id);

Full guides: Python quickstart · Node.js quickstart · Streaming guide · Migrate from OpenAI

Integrations

Use Cloudach with popular LLM frameworks. Integration is drop-in: change the base URL, the API key, and the model ID, and nothing else.

LangChain (Python)

Use Cloudach as a ChatOpenAI provider. Covers basic chat, streaming, and LCEL chains.

LlamaIndex (Python)

Use Cloudach as the LLM backend in LlamaIndex. Covers completions, chat, streaming, and RAG pipelines.

Fine-Tuning

Fine-tuning lets you adapt a base model to your domain, tone, or task using your own labelled examples. Cloudach exposes fine-tuning through a simple REST API and serves the resulting LoRA adapters on top of vLLM with the same sub-100ms latency as base models.

Quickstart

After you prepare a JSONL dataset (format below), the workflow has four steps: upload → create job → poll → infer.

# 1. Upload your dataset
curl https://api.cloudach.com/v1/fine-tuning/datasets \
  -H "Authorization: Bearer $CLOUDACH_API_KEY" \
  -F "file=@training_data.jsonl" \
  -F "purpose=fine-tune"
# → {"id": "ds-8f3a2b1c", ...}

# 2. Create a LoRA fine-tuning job
curl https://api.cloudach.com/v1/fine-tuning/jobs \
  -H "Authorization: Bearer $CLOUDACH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3-8b",
    "training_file": "ds-8f3a2b1c",
    "method": {"type": "lora", "lora": {"rank": 16, "alpha": 32}},
    "hyperparameters": {"n_epochs": 3},
    "suffix": "my-model"
  }'
# → {"id": "ftjob-a1b2c3", "status": "queued"}

# 3. Poll until succeeded
curl https://api.cloudach.com/v1/fine-tuning/jobs/ftjob-a1b2c3 \
  -H "Authorization: Bearer $CLOUDACH_API_KEY"
# → {"status": "succeeded", "fine_tuned_model": "llama3-8b:ft:my-model:ftjob-a1b2c3"}

# 4. Infer — use fine_tuned_model as the model ID
curl https://api.cloudach.com/v1/chat/completions \
  -H "Authorization: Bearer $CLOUDACH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3-8b:ft:my-model:ftjob-a1b2c3", "messages": [{"role": "user", "content": "How do I reset my password?"}]}'
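You can automate step 3 with a small polling loop. This is a minimal sketch that takes the status lookup as a function, so it works with any HTTP client. `get_job` is a hypothetical callable that returns the job JSON from GET /v1/fine-tuning/jobs/{id}. Only `succeeded` is confirmed above; the other terminal states (`failed`, `cancelled`) are assumptions.

```python
import time

TERMINAL = {"succeeded", "failed", "cancelled"}  # only "succeeded" is confirmed by the docs

def wait_for_job(get_job, job_id, poll_every=30.0, timeout=4 * 3600, sleep=time.sleep):
    """Poll get_job(job_id) until the job reaches a terminal status; return the final job dict."""
    waited = 0.0
    while True:
        job = get_job(job_id)
        if job.get("status") in TERMINAL:
            return job
        if waited >= timeout:
            raise TimeoutError(f"{job_id} still {job.get('status')!r} after {waited:.0f}s")
        sleep(poll_every)
        waited += poll_every
```

On success, pass `job["fine_tuned_model"]` as the model ID in step 4.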

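Dataset format: each line of your .jsonl file must be a single JSON object with a messages array (identical to the OpenAI fine-tuning format). The example below is wrapped for readability, but in the file each example must occupy one line. Minimum 100 examples.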

{"messages": [
  {"role": "system", "content": "You are a helpful support agent for Acme Corp."},
  {"role": "user",   "content": "How do I reset my password?"},
  {"role": "assistant", "content": "Go to Settings → Security → Reset Password. You'll receive a link within 2 minutes."}
]}
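You can write and sanity-check a dataset before uploading it with a few lines of Python. This is a minimal sketch based on the rules stated above: one JSON object per line, a `messages` array of role/content objects, and at least 100 examples. The check that each example contains at least one assistant message is an assumption, on the reasoning that without one there is no target to learn from. The helper names are hypothetical.

```python
import json

VALID_ROLES = {"system", "user", "assistant"}
MIN_EXAMPLES = 100

def write_jsonl(path, examples):
    """Write one compact JSON object per line, as the .jsonl format requires."""
    with open(path, "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")

def validate_jsonl(path):
    """Return a list of problems found in a fine-tuning dataset file (empty if it looks valid)."""
    problems, count = [], 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                messages = json.loads(line)["messages"]
            except (json.JSONDecodeError, KeyError, TypeError):
                problems.append(f"line {lineno}: not a JSON object with a messages array")
                continue
            count += 1
            if not isinstance(messages, list) or not all(
                isinstance(m, dict) and m.get("role") in VALID_ROLES and isinstance(m.get("content"), str)
                for m in messages
            ):
                problems.append(f"line {lineno}: every message needs a valid role and string content")
            elif not any(m["role"] == "assistant" for m in messages):
                problems.append(f"line {lineno}: no assistant message to learn from")
    if count < MIN_EXAMPLES:
        problems.append(f"only {count} examples; at least {MIN_EXAMPLES} are required")
    return problems
```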

LoRA adapters

Cloudach uses vLLM with multi-adapter LoRA support. LoRA trains lightweight adapter weights (≈ 0.1–1% of model size) rather than updating the full model. Key properties:

Adapter loading adds < 50 ms on the first request; once an adapter is loaded, later requests have no additional loading overhead
Multiple adapters for the same base model share one GPU replica — you pay base model rates, not a new GPU per fine-tune
Adapters can be downloaded for self-hosted vLLM deployments

Parameter	Default	Description
`method.type`	lora	Training method: lora or full (full is available for llama3-8b, llama31-8b, and mistral-7b only)
`lora.rank`	16	Adapter capacity: 8/16/32/64 (70B and Mixtral models: 8/16/32). Higher = more expressive, higher cost
`lora.alpha`	2 × rank	Scaling factor. Usually set to 2× rank
`lora.target_modules`	q_proj, v_proj	Weight matrices to train
`n_epochs`	3	Training passes over the dataset
`batch_size`	16	Examples per gradient step

Supported base models

Model	Method	LoRA rank options
`llama3-8b`	Full fine-tune + LoRA	8, 16, 32, 64
`llama3-70b`	LoRA only	8, 16, 32
`llama31-8b`	Full fine-tune + LoRA	8, 16, 32, 64
`llama31-70b`	LoRA only	8, 16, 32
`mistral-7b`	Full fine-tune + LoRA	8, 16, 32, 64
`mixtral-8x7b`	LoRA only	8, 16, 32
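You can check the constraints in the two tables above before creating a job: rank options per base model, full fine-tuning on only some models, and alpha defaulting to 2 × rank. This is a minimal sketch that builds the request body used in step 2 of the quickstart. The helper name is hypothetical, the `{"type": "full"}` shape for full fine-tuning is an assumption, and the server's own validation is authoritative.

```python
# Rank options and full-fine-tune support, transcribed from the table above.
SUPPORTED = {
    "llama3-8b":    {"full": True,  "ranks": {8, 16, 32, 64}},
    "llama3-70b":   {"full": False, "ranks": {8, 16, 32}},
    "llama31-8b":   {"full": True,  "ranks": {8, 16, 32, 64}},
    "llama31-70b":  {"full": False, "ranks": {8, 16, 32}},
    "mistral-7b":   {"full": True,  "ranks": {8, 16, 32, 64}},
    "mixtral-8x7b": {"full": False, "ranks": {8, 16, 32}},
}

def build_job(model, training_file, method="lora", rank=16, alpha=None, n_epochs=3, suffix=None):
    """Return a POST /v1/fine-tuning/jobs body, raising ValueError for unsupported combinations."""
    spec = SUPPORTED.get(model)
    if spec is None:
        raise ValueError(f"{model} is not a supported base model")
    body = {"model": model, "training_file": training_file, "hyperparameters": {"n_epochs": n_epochs}}
    if method == "full":
        if not spec["full"]:
            raise ValueError(f"{model} supports LoRA only")
        body["method"] = {"type": "full"}  # assumed shape for full fine-tuning
    elif method == "lora":
        if rank not in spec["ranks"]:
            raise ValueError(f"rank {rank} not offered for {model}; choose from {sorted(spec['ranks'])}")
        lora_alpha = alpha if alpha is not None else 2 * rank  # documented default: 2 x rank
        body["method"] = {"type": "lora", "lora": {"rank": rank, "alpha": lora_alpha}}
    else:
        raise ValueError("method must be 'lora' or 'full'")
    if suffix:
        body["suffix"] = suffix
    return body
```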

Full reference: Fine-Tuning API Reference — all endpoints, parameters, error codes, and pricing. See also the step-by-step tutorial and the Data Preparation Guide.

Tutorials

Step-by-step guides for common use cases.

Python SDK quickstart (Beginner, Python)

Install, configure, and make your first chat completion in Python. 5 minutes.

Node.js SDK quickstart (Beginner, Node.js)

ESM, CommonJS, and TypeScript setup. First call in under 5 minutes.

Migrate from OpenAI to Cloudach in 2 minutes (Beginner)

Change base_url and api_key. Keep your existing OpenAI SDK and code unchanged.

Streaming guide (Intermediate)

How SSE works, collecting chunks, error handling, async Python, Next.js API routes, and React UI patterns.

Build a customer support bot with Llama 3 (Intermediate)

Stream responses, handle context across turns, and deploy to production.

Fine-tune Llama 3 on your own data (Beginner)

Prepare a JSONL dataset, launch a LoRA job, monitor training, and run inference on your custom model. End-to-end in 30 minutes.

API Playground

Try the API directly from your browser — no terminal needed. Fill in the inputs, click Run, and watch the response stream in real time. The code panel stays in sync as you type.

Inputs: API key (create one in your dashboard; it is pre-filled if you are already signed in), model, system prompt, user message, temperature (default 0.7), and max tokens. The generated code below updates live as you change the inputs:

curl https://api.cloudach.com/v1/chat/completions \
  -H "Authorization: Bearer sk-cloudach-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3-8b",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Hello! Tell me about yourself."
        }
    ],
    "temperature": 0.7,
    "max_tokens": 256,
    "stream": true
  }'

FAQ

Is Cloudach fully OpenAI-compatible?

Yes. Cloudach implements the OpenAI REST API spec for chat completions, text completions, and model listing. Any SDK or tool that targets OpenAI works with Cloudach by changing the base URL and API key.

Do I need to change my code to switch from OpenAI?

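Only configuration. Set base_url to https://api.cloudach.com/v1, swap in your Cloudach API key, and replace OpenAI model names (such as gpt-4) with a Cloudach model ID such as llama3-8b. All request and response shapes are identical.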

What are the current rate limits?

On the Free plan, each API key gets 60 requests per minute and 1,000,000 tokens per day. Pro raises this to 600 RPM and 10,000,000 tokens per day. Contact sales@cloudach.com for enterprise limits.

What happens when I exceed the token quota?

Requests return a 429 with code rate_limit_exceeded and type "tokens", plus a Retry-After header. Tokens reset at midnight UTC. You can upgrade or purchase additional token packs from the dashboard.

Can I use Cloudach in production?

Yes. Cloudach is production-ready with 99.9% uptime SLA on paid plans, sub-100ms median TTFT, and autoscaling infrastructure. See the Status page for live metrics.

Is my data private?

Cloudach does not log prompt or completion content. Request metadata (token counts, model, timestamp) is stored for billing. Your data is never used to train models.

Which models are available?

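All models in the Available models table are served today: llama3-8b, llama3-70b, llama31-8b, llama31-70b, mistral-7b, mixtral-8x7b, command-r-plus, and dbrx. Call GET /v1/models for the current list. New models are announced on the blog.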

support@cloudach.com sales@cloudach.com Dashboard Changelog RSS