API Documentation
Cloudach is an OpenAI-compatible LLM API. Point your existing OpenAI client at the Cloudach base URL and swap in your Cloudach API key; no other code changes are needed.
Quickstart
Get from zero to your first API call in under 5 minutes.
Step 1 — Sign up
- Go to app.cloudach.com/signup
- Enter your email and create a password
- You are now logged in and ready
Step 2 — Create an API key
- Open the API Keys page in your dashboard
- Click Create new key and give it a name (e.g. my-first-key)
- Copy and store the key; it is shown only once. Format: sk-cloudach-...
Step 3 — Make your first call
curl https://api.cloudach.com/v1/chat/completions \
-H "Authorization: Bearer sk-cloudach-YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3-8b",
"messages": [{"role": "user", "content": "Hello from Cloudach!"}]
}'
Authentication
All requests require an Authorization header with your API key as a Bearer token.
Authorization: Bearer sk-cloudach-YOUR_KEY
API key properties
- Prefix: sk-cloudach-
- Stored as a SHA-256 hash server-side; the raw key cannot be recovered after creation
- Can be revoked instantly from the dashboard
- Multiple keys per account are supported (one per integration recommended)
- Auth cache TTL: 60 seconds — revocation propagates within 60s
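As a small illustration of the properties above, a client can check the documented sk-cloudach- prefix before sending a request and avoid printing the raw key, since it cannot be recovered later. This is only a sketch: the CLOUDACH_API_KEY environment variable name is borrowed from the fine-tuning examples, and the helper names auth_headers and mask_key are hypothetical.

```python
import os

KEY_PREFIX = "sk-cloudach-"  # documented key prefix


def auth_headers(api_key: str) -> dict:
    """Build the Authorization header, rejecting obviously malformed keys."""
    if not api_key.startswith(KEY_PREFIX) or len(api_key) == len(KEY_PREFIX):
        raise ValueError(f"API key must start with {KEY_PREFIX!r}")
    return {"Authorization": f"Bearer {api_key}"}


def mask_key(api_key: str, visible: int = 4) -> str:
    """Return a log-safe form of the key showing only its last few characters."""
    return f"{KEY_PREFIX}...{api_key[-visible:]}"


if __name__ == "__main__":
    # Hypothetical usage: read the key from the environment instead of hard-coding it.
    key = os.environ.get("CLOUDACH_API_KEY", "sk-cloudach-YOUR_KEY")
    print(auth_headers(key), mask_key(key))
```

Keeping one key per integration, as recommended above, means a masked suffix in logs is usually enough to tell which key made a request.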
Auth errors
// 401 — missing Authorization header
{"error": {"message": "Missing credentials. Include 'Authorization: Bearer <api-key>'.", "type": "invalid_request_error"}}
// 401 — invalid or revoked key
{"error": {"message": "Invalid or revoked API key.", "type": "authentication_error"}}
Endpoints
Base URL: https://api.cloudach.com/v1
| Endpoint | Method | Description | Auth |
|---|---|---|---|
| /v1/chat/completions | POST | Chat messages (streaming supported) | Required |
| /v1/completions | POST | Text completions (legacy format) | Required |
| /v1/models | GET | List available models | Required |
| /v1/models/{model_id} | GET | Get a specific model | Required |
| /health | GET | Health check | None |
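For readers calling the API without an SDK, the authenticated endpoints in the table can be reached with plain HTTP. The sketch below only prepares requests with the Python standard library and does not send them; build_request is a hypothetical helper name, and the request body mirrors the Quickstart example.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.cloudach.com/v1"


def build_request(method: str, path: str, api_key: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Prepare (but do not send) a request for one of the /v1 endpoints."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


chat_req = build_request(
    "POST",
    "/chat/completions",
    "sk-cloudach-YOUR_KEY",
    {"model": "llama3-8b", "messages": [{"role": "user", "content": "Hello from Cloudach!"}]},
)
models_req = build_request("GET", "/models", "sk-cloudach-YOUR_KEY")

# Sending is left to the caller, for example:
#   with urllib.request.urlopen(chat_req, timeout=60) as resp:
#       print(json.load(resp))
```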
POST /v1/chat/completions
OpenAI-compatible chat endpoint. Supports streaming via Server-Sent Events.
Request body
{
  "model": "llama3-8b",       // required: see /v1/models for available models
  "messages": [               // required: non-empty array
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 2?"}
  ],
  "stream": false,            // optional: true for SSE streaming
  "temperature": 0.7,         // optional: 0.0–2.0 (default 1.0)
  "max_tokens": 512           // optional: max completion tokens
}
Non-streaming response
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1712345678,
  "model": "llama3-8b",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "The answer is 4."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 22, "completion_tokens": 8, "total_tokens": 30}
}
Streaming (SSE)
Set "stream": true. Response is a stream of data: ... lines, ending with data: [DONE].
data: {"id":"chatcmpl-abc","choices":[{"delta":{"role":"assistant","content":""},"index":0}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"The "},"index":0}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"answer is 4."},"index":0}],"usage":{"prompt_tokens":22,"completion_tokens":8,"total_tokens":30}}
data: [DONE]
POST /v1/completions
Legacy text completion endpoint (OpenAI format). Use chat completions for new integrations.
// Request
{
  "model": "llama3-8b",
  "prompt": "The capital of France is",
  "max_tokens": 20
}
// Response
{
  "id": "cmpl-abc123",
  "object": "text_completion",
  "model": "llama3-8b",
  "choices": [{"text": " Paris.", "index": 0, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 7, "completion_tokens": 3, "total_tokens": 10}
}
GET /v1/models
// Response
{
  "object": "list",
  "data": [
    {"id": "llama3-8b", "object": "model", "owned_by": "cloudach"},
    {"id": "mistral-7b", "object": "model", "owned_by": "cloudach"}
  ]
}
Available models
| Model ID | Context | Best for |
|---|---|---|
| llama3-8b | 8K | Fast chat, Q&A, summarization |
| llama3-70b | 8K | Complex reasoning, analysis |
| llama31-8b | 128K | Long-context chat, fast inference |
| llama31-70b | 128K | State-of-the-art open model, long context |
| mistral-7b | 32K | Long context, code, EU-hosted |
| mixtral-8x7b | 32K | Best accuracy, complex tasks |
| command-r-plus | 128K | RAG, tool use, multi-step agents |
| dbrx | 32K | Coding, reasoning, MoE efficiency |
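The Context column matters because the prompt and max_tokens together must fit inside the model's window, otherwise the request is rejected with context_length_exceeded. The sketch below checks this before sending. It assumes that "K" in the table means 1024 tokens; only the 8192-token figure for llama3-8b is confirmed by the error example in this document, and the function names are hypothetical.

```python
# Context windows in tokens. Assumption: "K" in the table means 1024 tokens.
CONTEXT_WINDOW = {
    "llama3-8b": 8 * 1024,
    "llama3-70b": 8 * 1024,
    "llama31-8b": 128 * 1024,
    "llama31-70b": 128 * 1024,
    "mistral-7b": 32 * 1024,
    "mixtral-8x7b": 32 * 1024,
    "command-r-plus": 128 * 1024,
    "dbrx": 32 * 1024,
}


def fits_context(model: str, prompt_tokens: int, max_tokens: int) -> bool:
    """True when prompt_tokens + max_tokens does not exceed the model's window."""
    return prompt_tokens + max_tokens <= CONTEXT_WINDOW[model]


def max_completion_budget(model: str, prompt_tokens: int) -> int:
    """Largest max_tokens that keeps prompt + completion inside the window."""
    return max(CONTEXT_WINDOW[model] - prompt_tokens, 0)
```

With an 8100-token prompt, llama3-8b leaves room for only 92 completion tokens under this assumption, so a request with max_tokens of 1400 would need a shorter prompt or a larger-context model such as llama31-8b.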
Rate Limits
Rate limits apply per API key. All limits reset on a rolling window (RPM) or at midnight UTC (TPD).
Limits by plan
| Plan | Requests / min (RPM) | Tokens / day (TPD) | Notes |
|---|---|---|---|
| Free | 60 | 1,000,000 | Default on sign-up |
| Pro | 600 | 10,000,000 | Available after plan upgrade |
| Enterprise | Custom | Custom | Contact sales@cloudach.com |
Per-key overrides
You can set a custom rate_limit_rpm on individual API keys from the dashboard. Useful for restricting keys used in untrusted environments or increasing limits for high-throughput integrations.
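Because RPM limits apply per key over a rolling window, a client can throttle itself to stay under its limit instead of waiting for a 429. The following is a minimal client-side sketch, not a description of how the server counts requests; the class name SlidingWindowLimiter and the 60-second window default are assumptions.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard that allows at most `rpm` calls in any window."""

    def __init__(self, rpm: int, window: float = 60.0, clock=time.monotonic):
        self.rpm = rpm
        self.window = window
        self.clock = clock    # injectable so the limiter can be tested without sleeping
        self.sent = deque()   # timestamps of requests still inside the window

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()  # drop timestamps that have left the window
        if len(self.sent) < self.rpm:
            return 0.0
        return self.window - (now - self.sent[0])

    def record(self) -> None:
        """Call once for every request actually sent."""
        self.sent.append(self.clock())
```

A caller would sleep for wait_time() seconds, send the request, then call record(). Setting rpm slightly below the key's real limit leaves headroom for clock differences between client and server.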
Rate-limit response headers
Every API response includes these headers so you can track your usage proactively:
| Header | Example value | Meaning |
|---|---|---|
| X-RateLimit-Limit-Requests | 60 | Your RPM ceiling |
| X-RateLimit-Remaining-Requests | 42 | Requests left in this 60-second window |
| X-RateLimit-Reset-Requests | 2026-04-14T12:01:00Z | UTC timestamp when the window resets |
| X-RateLimit-Limit-Tokens | 1000000 | Your daily token ceiling |
| X-RateLimit-Remaining-Tokens | 987432 | Tokens left today |
| X-RateLimit-Reset-Tokens | 2026-04-15T00:00:00Z | UTC timestamp of next daily reset |
| Retry-After | 60 | Seconds to wait before retrying (only on 429 responses) |
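To track usage proactively, these headers can be parsed into a small structure after each response. The sketch below works on any mapping of header names to values and uses only the header names and formats listed in the table; RateLimitStatus, parse_rate_limit_headers, and seconds_until are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    remaining_requests: int
    remaining_tokens: int
    requests_reset: datetime
    tokens_reset: datetime
    retry_after: Optional[float]  # only present on 429 responses


def _parse_utc(value: str) -> datetime:
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    """Read the documented X-RateLimit-* headers from a response."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    retry_after = h.get("retry-after")
    return RateLimitStatus(
        remaining_requests=int(h["x-ratelimit-remaining-requests"]),
        remaining_tokens=int(h["x-ratelimit-remaining-tokens"]),
        requests_reset=_parse_utc(h["x-ratelimit-reset-requests"]),
        tokens_reset=_parse_utc(h["x-ratelimit-reset-tokens"]),
        retry_after=float(retry_after) if retry_after is not None else None,
    )


def seconds_until(reset: datetime, now: Optional[datetime] = None) -> float:
    """Seconds from `now` (default: current UTC time) until a reset timestamp."""
    now = now or datetime.now(timezone.utc)
    return max((reset - now).total_seconds(), 0.0)
```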
429 error responses
Use the type field to distinguish RPM from TPD errors:
// RPM exceeded — type: "requests"
{"error": {"message": "Rate limit exceeded: 60 requests per minute.", "type": "requests", "code": "rate_limit_exceeded"}}
// TPD exceeded — type: "tokens"
{"error": {"message": "You have exceeded your daily token limit of 1,000,000 tokens. Tokens reset at midnight UTC.", "type": "tokens", "code": "rate_limit_exceeded"}}
Handling 429s — exponential backoff
Always respect the Retry-After header when present. Fall back to exponential backoff (1 s → 2 s → 4 s → 8 s) when the header is absent.
Python
import time
from openai import OpenAI, APIStatusError

client = OpenAI(
    api_key="sk-cloudach-YOUR_KEY",
    base_url="https://api.cloudach.com/v1",
    max_retries=0,  # disable the SDK's built-in retries so this loop controls backoff
)
RETRYABLE = {429, 500, 502, 503}

def chat_with_backoff(messages, model="llama3-8b", max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except APIStatusError as e:
            # Give up on non-retryable statuses or when attempts are exhausted
            if e.status_code not in RETRYABLE or attempt == max_retries - 1:
                raise
            # Prefer the server's Retry-After; otherwise back off 1 s, 2 s, 4 s, 8 s
            retry_after = e.response.headers.get("Retry-After")
            wait = float(retry_after) if retry_after else 2 ** attempt
            print(f"Attempt {attempt + 1} failed ({e.status_code}). Retrying in {wait}s...")
            time.sleep(wait)
Node.js
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "sk-cloudach-YOUR_KEY",
  baseURL: "https://api.cloudach.com/v1",
  maxRetries: 0, // disable the SDK's built-in retries so this loop controls backoff
});
const RETRYABLE = new Set([429, 500, 502, 503]);

async function chatWithBackoff(messages, model = "llama3-8b", maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.chat.completions.create({ model, messages });
    } catch (err) {
      if (!(err instanceof OpenAI.APIError) || !RETRYABLE.has(err.status) || attempt === maxRetries - 1) {
        throw err;
      }
      // Depending on the SDK version, err.headers is a plain object or a Headers instance
      const retryAfter = err.headers?.get?.("retry-after") ?? err.headers?.["retry-after"];
      const wait = retryAfter ? parseFloat(retryAfter) * 1000 : Math.pow(2, attempt) * 1000;
      console.log(`Attempt ${attempt + 1} failed (${err.status}). Retrying in ${wait}ms...`);
      await new Promise((r) => setTimeout(r, wait));
    }
  }
}
Need higher limits? Contact sales to discuss enterprise quotas.
Error Codes
All errors follow the OpenAI error schema: {"error": {"message": "...", "type": "...", "code": "...", "param": "..."}}. param is only included when the error is tied to a specific request field.
Error reference
| HTTP | code | type | Cause | Fix |
|---|---|---|---|---|
| 400 | invalid_request | invalid_request_error | Malformed JSON body | Validate JSON; set Content-Type: application/json |
| 400 | missing_required_param | invalid_request_error | model or messages missing | Include both model and messages in every request |
| 400 | invalid_param_value | invalid_request_error | temperature out of [0,2], empty messages, etc. | Validate parameter values before sending |
| 400 | context_length_exceeded | invalid_request_error | Prompt + max_tokens exceeds model context | Trim history or switch to a larger-context model |
| 401 | missing_credentials | invalid_request_error | No Authorization header | Add Authorization: Bearer <key> |
| 401 | invalid_api_key | authentication_error | Key is wrong, expired, or revoked | Check or rotate key in the dashboard |
| 403 | insufficient_quota | permission_error | Monthly token cap reached | Upgrade plan or wait for reset |
| 404 | model_not_found | invalid_request_error | Model ID not recognised | Call GET /v1/models for valid IDs |
| 404 | not_found | invalid_request_error | Route does not exist | Check base URL and path |
| 413 | request_too_large | invalid_request_error | Body > 1 MB | Chunk large payloads |
| 429 | rate_limit_exceeded | requests | RPM limit hit | Wait Retry-After seconds; use exponential backoff |
| 429 | rate_limit_exceeded | tokens | Daily token limit hit | Wait until midnight UTC for reset |
| 500 | internal_server_error | api_error | Unexpected server fault | Retry with backoff; contact support if persistent |
| 502 | model_backend_unavailable | api_error | Inference backend down or overloaded | Retry with exponential backoff |
| 503 | service_unavailable | api_error | Maintenance window | Check status.cloudach.com |
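One way to act on this table in client code is to reduce each error to a coarse action. The sketch below is one possible grouping under the statuses and type values listed above; the function name classify_error and the action labels are hypothetical, and callers may want finer distinctions.

```python
RETRYABLE_STATUS = {429, 500, 502, 503}


def classify_error(status: int, body: dict) -> str:
    """Map an error response to a coarse action label.

    Returns one of: 'retry', 'wait_for_daily_reset', 'fix_credentials',
    'quota_exhausted', 'fix_request'.
    """
    error = body.get("error", {})
    if status == 429:
        # The type field separates RPM ("requests") from daily token ("tokens") limits.
        return "wait_for_daily_reset" if error.get("type") == "tokens" else "retry"
    if status in RETRYABLE_STATUS:
        return "retry"
    if status == 401:
        return "fix_credentials"
    if status == 403:
        return "quota_exhausted"
    return "fix_request"  # remaining 4xx rows: the request itself must change
```

Only the 'retry' label is a candidate for automatic backoff; the others need a change to the request, the key, or the plan before another attempt can succeed.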
Example error responses
400 — Context length exceeded
{
  "error": {
    "message": "This model's maximum context length is 8192 tokens, but your request has 9500 tokens (8100 prompt + 1400 max_tokens). Shorten your messages or reduce max_tokens.",
    "type": "invalid_request_error",
    "code": "context_length_exceeded"
  }
}
401 — Invalid key
{
  "error": {
    "message": "Invalid or revoked API key.",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}
404 — Model not found
{
  "error": {
    "message": "The model 'gpt-4' does not exist or you do not have access to it.",
    "type": "invalid_request_error",
    "code": "model_not_found",
    "param": "model"
  }
}
429 — Rate limited
// RPM exceeded (type: "requests")
{"error": {"message": "Rate limit exceeded: 60 requests per minute.", "type": "requests", "code": "rate_limit_exceeded"}}
// Daily token cap (type: "tokens")
{"error": {"message": "You have exceeded your daily token limit of 1,000,000 tokens. Tokens reset at midnight UTC.", "type": "tokens", "code": "rate_limit_exceeded"}}
Streaming error handling
Errors in streaming responses fall into two categories:
- Pre-stream errors: the request fails before any data: events are sent. You receive a normal HTTP error response (non-200 status, JSON body). Handle identically to non-streaming errors.
- Mid-stream errors: the backend fails after the stream has started. The data: sequence is cut short; the final event is an error object instead of data: [DONE].
Mid-stream error event
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"The answer is"},"index":0}]}
data: {"error": {"message": "Stream interrupted by server.", "type": "api_error", "code": "stream_error"}}
// Connection closes — [DONE] is NOT sent
Python — streaming with error handling
from openai import OpenAI, APIError, APIStatusError

client = OpenAI(api_key="sk-cloudach-YOUR_KEY", base_url="https://api.cloudach.com/v1")
collected = []
try:
    stream = client.chat.completions.create(
        model="llama3-8b",
        messages=[{"role": "user", "content": "Tell me a story."}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        collected.append(delta)
        print(delta, end="", flush=True)
except APIStatusError as e:
    # Pre-stream HTTP error (non-200 status with a JSON error body)
    print(f"\nStream error ({e.status_code}): {e.message}")
    # Retry the full request if e.status_code in {429, 500, 502, 503}
except APIError as e:
    # Mid-stream error event or connection failure; the SDK raises these without a status code
    print(f"\nStream interrupted: {e.message}")
    # collected holds the partial output; retry the full request if needed
Node.js — streaming with error handling
import OpenAI from "openai";

const client = new OpenAI({ apiKey: "sk-cloudach-YOUR_KEY", baseURL: "https://api.cloudach.com/v1" });
const collected = [];
try {
  const stream = await client.chat.completions.create({
    model: "llama3-8b",
    messages: [{ role: "user", content: "Tell me a story." }],
    stream: true,
  });
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content ?? "";
    collected.push(delta);
    process.stdout.write(delta);
  }
} catch (err) {
  if (err instanceof OpenAI.APIError) {
    console.error(`\nStream error (${err.status}): ${err.message}`);
    // Retry if err.status is in [429, 500, 502, 503]
  } else {
    throw err; // Network-level error (TCP reset, proxy timeout)
  }
}
Network-level interruptions (TCP resets, proxy timeouts) surface as connection errors from the HTTP client, not as API error JSON. Always wrap stream consumption in a try/catch and implement a retry strategy for the full request.
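For clients that read the raw SSE stream without an SDK, the event shapes described in this section can be handled roughly as follows. This is a sketch that operates on already-decoded text lines, so it runs without a network connection; it assumes a single choice per chunk (n = 1), and the names collect_sse and StreamInterrupted are hypothetical.

```python
import json
from typing import Iterable


class StreamInterrupted(Exception):
    """Raised when the stream ends with an error event or without [DONE]."""

    def __init__(self, message: str, partial_text: str):
        super().__init__(message)
        self.partial_text = partial_text  # content received before the failure


def collect_sse(lines: Iterable[str]) -> str:
    """Concatenate delta content from raw SSE lines until data: [DONE]."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return "".join(parts)
        event = json.loads(payload)
        if "error" in event:
            # Mid-stream error event: surface it together with the partial output.
            raise StreamInterrupted(event["error"].get("message", "stream error"), "".join(parts))
        for choice in event.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    # The connection closed without [DONE]: treat it as an interrupted stream.
    raise StreamInterrupted("stream ended without [DONE]", "".join(parts))
```

Keeping the partial text on the exception lets the caller decide whether to show what arrived so far or discard it and retry the full request.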
Webhooks
Webhooks let you receive real-time HTTP POST notifications when events happen in your Cloudach account. Register an endpoint URL in the dashboard and subscribe to the event types you care about.
Event types
| Event | When it fires |
|---|---|
| usage.threshold | Cumulative spend for the billing period crosses a threshold |
| api_key.created | A new API key is created |
| api_key.revoked | An API key is revoked |
| request.failed | An API request returns a 4xx or 5xx status code |
Verifying signatures
Every delivery includes an X-Cloudach-Signature header formatted as sha256=<hex>. Verify it by computing HMAC-SHA256(secret, rawBody) with your webhook signing secret and comparing the result. Reject requests where signatures do not match.
# Python
import hashlib
import hmac

def verify(secret: str, body: bytes, header: str) -> bool:
    expected = "sha256=" + hmac.new(
        secret.encode(), body, hashlib.sha256
    ).hexdigest()
    # Compare as bytes so a non-ASCII header cannot raise TypeError
    return hmac.compare_digest(expected.encode(), header.encode())
// Node.js
const crypto = require('crypto');

function verify(secret, rawBody, header) {
  const expected = Buffer.from(
    'sha256=' + crypto.createHmac('sha256', secret).update(rawBody).digest('hex')
  );
  const received = Buffer.from(header ?? '');
  // timingSafeEqual throws on a length mismatch, so compare lengths first
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}
Payload structure
{
  "id": "evt_01abc...",
  "event": "api_key.created",
  "created": 1713100800,
  "data": { ... }
}
Retry policy
Non-2xx responses or timeouts (10 s) trigger up to 3 retries with exponential back-off (0.5 s → 1 s → 2 s). View delivery history in the Webhooks dashboard.
SDK Reference
Cloudach is drop-in compatible with any OpenAI SDK. Change two values: base_url and api_key. All request/response shapes are identical.
SDK quickstart guides
| SDK | Install | base_url / baseURL |
|---|---|---|
| Python openai ≥ 1.0 | pip install openai | https://api.cloudach.com/v1 |
| Node.js openai ≥ 4.0 | npm install openai | https://api.cloudach.com/v1 |
| LangChain (Python) | pip install langchain-openai | base_url parameter or OPENAI_API_BASE env var |
| LiteLLM | pip install litellm | api_base config |
| Direct HTTP / curl | — | Authorization: Bearer header |
chat.completions.create — parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | — | Model ID. See /v1/models for available IDs. |
| messages | array | Yes | — | Non-empty array of {role, content} objects. Roles: system, user, assistant. |
| stream | boolean | No | false | If true, response is SSE stream of delta chunks ending with data: [DONE]. |
| temperature | number | No | 1.0 | Sampling temperature. 0.0 = deterministic, 2.0 = very random. |
| max_tokens | number | No | model max | Maximum tokens to generate. Caps completion length. |
| top_p | number | No | 1.0 | Nucleus sampling. Alternative to temperature. Use one, not both. |
| n | number | No | 1 | Number of completions to generate. Higher values multiply token cost. |
| stop | string or array | No | null | Stop sequence(s). Generation halts when one is produced. |
| presence_penalty | number | No | 0.0 | -2.0 to 2.0. Positive values penalise repeated topics. |
| frequency_penalty | number | No | 0.0 | -2.0 to 2.0. Positive values penalise repeated tokens. |
| user | string | No | — | Stable end-user identifier for abuse monitoring. |
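The constraints in this table can be checked locally before a request is sent, which avoids a round trip that would end in a 400. The sketch below covers only the ranges and requirements stated in the table; validate_chat_params is a hypothetical helper, and the server may apply further checks.

```python
VALID_ROLES = {"system", "user", "assistant"}


def validate_chat_params(params: dict) -> list:
    """Return a list of problems found; an empty list means the request looks valid."""
    problems = []
    if not isinstance(params.get("model"), str) or not params["model"]:
        problems.append("model is required")
    messages = params.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages must be a non-empty array")
    else:
        for i, message in enumerate(messages):
            if message.get("role") not in VALID_ROLES:
                problems.append(f"messages[{i}].role must be one of {sorted(VALID_ROLES)}")
    # Omitted parameters fall back to the documented defaults.
    if not 0.0 <= params.get("temperature", 1.0) <= 2.0:
        problems.append("temperature must be between 0.0 and 2.0")
    for name in ("presence_penalty", "frequency_penalty"):
        if not -2.0 <= params.get(name, 0.0) <= 2.0:
            problems.append(f"{name} must be between -2.0 and 2.0")
    return problems
```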
chat.completions.create — response (non-streaming)
{
  "id": "chatcmpl-abc123",        // unique completion ID
  "object": "chat.completion",
  "created": 1712345678,          // Unix timestamp
  "model": "llama3-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help?"
      },
      "finish_reason": "stop"     // "stop" | "length" | "content_filter"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 7,
    "total_tokens": 25
  }
}
chat.completions.create — streaming chunk
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1712345678,
  "model": "llama3-8b",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant",      // only in first chunk
        "content": "Hello"        // null in first and last chunks
      },
      "finish_reason": null       // "stop" | "length" in final chunk
    }
  ],
  // usage only in the last content chunk (before [DONE])
  "usage": { "prompt_tokens": 18, "completion_tokens": 1, "total_tokens": 19 }
}
models.list — response
// GET /v1/models
{
  "object": "list",
  "data": [
    {
      "id": "llama3-8b",
      "object": "model",
      "created": 1712000000,
      "owned_by": "cloudach"
    }
    // ... more models
  ]
}
Python — full client reference
from openai import OpenAI, AsyncOpenAI

# Sync client
client = OpenAI(
    base_url="https://api.cloudach.com/v1",
    api_key="sk-cloudach-YOUR_KEY",
    timeout=60.0,   # request timeout in seconds (SDK default: 600)
    max_retries=2,  # automatic retries on 429/5xx (default: 2)
)

# Async client (asyncio / FastAPI)
async_client = AsyncOpenAI(
    base_url="https://api.cloudach.com/v1",
    api_key="sk-cloudach-YOUR_KEY",
)

# Chat completion (sync)
response = client.chat.completions.create(model="llama3-8b", messages=[...])
text = response.choices[0].message.content
usage = response.usage  # .prompt_tokens, .completion_tokens, .total_tokens

# Chat completion (async); must run inside an async function
response = await async_client.chat.completions.create(model="llama3-8b", messages=[...])

# List models
models = client.models.list()
for m in models.data:
    print(m.id)

# Retrieve a model
model = client.models.retrieve("llama3-8b")
Node.js — full client reference
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.cloudach.com/v1",
  apiKey: "sk-cloudach-YOUR_KEY",
  timeout: 60_000, // ms (SDK default: 600000)
  maxRetries: 2,   // automatic retries on 429/5xx (default: 2)
});

// Chat completion
const response = await client.chat.completions.create({ model: "llama3-8b", messages: [...] });
const text = response.choices[0].message.content;
const usage = response.usage; // .prompt_tokens, .completion_tokens, .total_tokens

// Streaming
const stream = await client.chat.completions.create({
  model: "llama3-8b", messages: [...], stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

// List models
const models = await client.models.list();
for (const m of models.data) console.log(m.id);
Full guides: Python quickstart · Node.js quickstart · Streaming guide · Migrate from OpenAI
Integrations
Use Cloudach with popular LLM frameworks. Drop-in compatible — change the base URL and API key, nothing else.
Fine-Tuning
Fine-tuning lets you adapt a base model to your domain, tone, or task using your own labelled examples. Cloudach exposes fine-tuning through a simple REST API and serves the resulting LoRA adapters on top of vLLM with the same sub-100ms latency as base models.
Quickstart
The workflow has four steps: prepare a JSONL dataset → upload → create job → infer.
# 1. Upload your dataset
curl https://api.cloudach.com/v1/fine-tuning/datasets \
-H "Authorization: Bearer $CLOUDACH_API_KEY" \
-F "file=@training_data.jsonl" \
-F "purpose=fine-tune"
# → {"id": "ds-8f3a2b1c", ...}
# 2. Create a LoRA fine-tuning job
curl https://api.cloudach.com/v1/fine-tuning/jobs \
-H "Authorization: Bearer $CLOUDACH_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3-8b",
"training_file": "ds-8f3a2b1c",
"method": {"type": "lora", "lora": {"rank": 16, "alpha": 32}},
"hyperparameters": {"n_epochs": 3},
"suffix": "my-model"
}'
# → {"id": "ftjob-a1b2c3", "status": "queued"}
# 3. Poll until succeeded
curl https://api.cloudach.com/v1/fine-tuning/jobs/ftjob-a1b2c3 \
-H "Authorization: Bearer $CLOUDACH_API_KEY"
# → {"status": "succeeded", "fine_tuned_model": "llama3-8b:ft:my-model:ftjob-a1b2c3"}
# 4. Infer — use fine_tuned_model as the model ID
curl https://api.cloudach.com/v1/chat/completions \
-H "Authorization: Bearer $CLOUDACH_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "llama3-8b:ft:my-model:ftjob-a1b2c3", "messages": [...]}'
Dataset format: each line of your .jsonl file must be a JSON object with a messages array (identical to the OpenAI fine-tuning format). Minimum 100 examples.
{"messages": [
  {"role": "system", "content": "You are a helpful support agent for Acme Corp."},
  {"role": "user", "content": "How do I reset my password?"},
  {"role": "assistant", "content": "Go to Settings → Security → Reset Password. You'll receive a link within 2 minutes."}
]}
LoRA adapters
Cloudach uses vLLM with multi-adapter LoRA support. LoRA trains lightweight adapter weights (≈ 0.1–1% of model size) rather than updating the full model. Key properties:
- Adapter loading adds < 50 ms on first request; warm requests have zero overhead
- Multiple adapters for the same base model share one GPU replica — you pay base model rates, not a new GPU per fine-tune
- Adapters can be downloaded for self-hosted vLLM deployments
| Parameter | Default | Description |
|---|---|---|
| method.type | lora | Training method: lora or full (7B and 8B models only) |
| lora.rank | 16 | Adapter capacity: 8, 16, 32, or 64. Higher = more expressive, higher cost |
| lora.alpha | 2 × rank | Scaling factor. Usually set to 2 × rank |
| lora.target_modules | q_proj, v_proj | Weight matrices to train |
| n_epochs | 3 | Training passes over the dataset |
| batch_size | 16 | Examples per gradient step |
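The defaults in this table can be applied when assembling the job request, so callers only pass what they want to change. The sketch below builds the same body shape as the Quickstart's create-job call and derives alpha as 2 × rank when it is not given; build_job_request is a hypothetical helper, and it does not check which ranks a particular base model supports.

```python
from typing import Optional


def build_job_request(model: str, training_file: str, suffix: str,
                      rank: int = 16, alpha: Optional[int] = None,
                      n_epochs: int = 3) -> dict:
    """Build a LoRA fine-tuning job body, filling in the documented defaults."""
    if rank not in (8, 16, 32, 64):
        raise ValueError("rank must be one of 8, 16, 32, 64")
    return {
        "model": model,
        "training_file": training_file,
        "method": {
            "type": "lora",
            # alpha defaults to twice the rank, as stated in the parameter table
            "lora": {"rank": rank, "alpha": alpha if alpha is not None else 2 * rank},
        },
        "hyperparameters": {"n_epochs": n_epochs},
        "suffix": suffix,
    }
```

Called as build_job_request("llama3-8b", "ds-8f3a2b1c", "my-model"), this reproduces the request body used in the Quickstart.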
Supported base models
| Model | Method | LoRA rank options |
|---|---|---|
| llama3-8b | Full fine-tune + LoRA | 8, 16, 32, 64 |
| llama3-70b | LoRA only | 8, 16, 32 |
| llama31-8b | Full fine-tune + LoRA | 8, 16, 32, 64 |
| llama31-70b | LoRA only | 8, 16, 32 |
| mistral-7b | Full fine-tune + LoRA | 8, 16, 32, 64 |
| mixtral-8x7b | LoRA only | 8, 16, 32 |
Full reference: Fine-Tuning API Reference — all endpoints, parameters, error codes, and pricing. See also the step-by-step tutorial and the Data Preparation Guide.
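Before uploading, the dataset rules from the Quickstart (one JSON object per line, a messages array, at least 100 examples) can be checked locally. The sketch below is an assumption-laden pre-flight check rather than the service's own validation: it assumes the chat roles system, user, and assistant are the only valid ones, ignores blank lines, and uses the hypothetical name validate_jsonl.

```python
import json

MIN_EXAMPLES = 100  # documented minimum dataset size
ROLES = {"system", "user", "assistant"}  # assumption: same roles as chat requests


def validate_jsonl(lines) -> list:
    """Return (line_number, problem) tuples; line 0 marks a dataset-level problem."""
    problems = []
    count = 0
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines such as a trailing newline
        count += 1
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((number, "missing or empty messages array"))
        elif any(
            not isinstance(m, dict)
            or m.get("role") not in ROLES
            or not isinstance(m.get("content"), str)
            for m in messages
        ):
            problems.append((number, "each message needs a valid role and string content"))
    if count < MIN_EXAMPLES:
        problems.append((0, f"only {count} examples; at least {MIN_EXAMPLES} required"))
    return problems
```

A typical call would be validate_jsonl(open("training_data.jsonl", encoding="utf-8")), with an empty result meaning the file passed these basic checks.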
Tutorials
Step-by-step guides for common use cases.
- Install, configure, and make your first chat completion in Python. 5 minutes.
- ESM, CommonJS, and TypeScript setup. First call in under 5 minutes.
- Change base_url and api_key. Keep your existing OpenAI SDK and code unchanged.
- How SSE works, collecting chunks, error handling, async Python, Next.js API routes, and React UI patterns.
- Stream responses, handle context across turns, and deploy to production.
- Prepare a JSONL dataset, launch a LoRA job, monitor training, and run inference on your custom model. End-to-end in 30 minutes.
API Playground
Try the API directly from your browser — no terminal needed. Fill in the inputs, click Run, and watch the response stream in real time. The code panel stays in sync as you type.
Create a key in your dashboard — it is pre-filled if you are already signed in.
Generated Code
Updates live as you change inputs.
curl https://api.cloudach.com/v1/chat/completions \
-H "Authorization: Bearer sk-cloudach-YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3-8b",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Hello! Tell me about yourself."
}
],
"temperature": 0.7,
"max_tokens": 256,
"stream": true
}'
FAQ
Is Cloudach fully OpenAI-compatible?
Yes. Cloudach implements the OpenAI REST API spec for chat completions, text completions, and model listing. Any SDK or tool that targets OpenAI works with Cloudach by changing the base URL and API key.
Do I need to change my code to switch from OpenAI?
No, beyond client configuration: set base_url to https://api.cloudach.com/v1 and swap your API key. All request/response shapes are identical.
What are the current rate limits?
60 requests per minute and 1,000,000 tokens per day per API key on the free tier. Contact sales@cloudach.com for enterprise limits.
What happens when I exceed the token quota?
Requests will return a 429 with code rate_limit_exceeded and a Retry-After header. Tokens reset at midnight UTC. You can upgrade or purchase additional token packs from the dashboard.
Can I use Cloudach in production?
Yes. Cloudach is production-ready with 99.9% uptime SLA on paid plans, sub-100ms median TTFT, and autoscaling infrastructure. See the Status page for live metrics.
Is my data private?
Cloudach does not log prompt or completion content. Request metadata (token counts, model, timestamp) is stored for billing. Your data is never used to train models.
Which models are available?
Llama 3 and Llama 3.1 (8B and 70B), Mistral 7B, Mixtral 8×7B, Command R+, and DBRX are listed in the Available models table; call GET /v1/models for the current list. New models are announced on the blog.