Beginner · ~5 min · Python
Python SDK Quickstart
Cloudach is OpenAI-compatible. You use the standard openai Python package — just point it at Cloudach's base URL. No new SDK to learn.
1. Install
Install the OpenAI Python SDK (v1.0 or later):
pip install openai
If you're using a virtual environment or conda, activate it first.
2. Configure
Create the client with your Cloudach API key and base URL. Store your key in an environment variable — never hard-code it in source files.
import os
from openai import OpenAI
client = OpenAI(
    base_url="https://api.cloudach.com/v1",
    api_key=os.environ["CLOUDACH_API_KEY"],
)
Your key looks like sk-cloudach-.... Get one from the Dashboard → API Keys page.
3. First call
Make your first chat completion — 5 lines of logic:
import os
from openai import OpenAI
client = OpenAI(
    base_url="https://api.cloudach.com/v1",
    api_key=os.environ["CLOUDACH_API_KEY"],
)
response = client.chat.completions.create(
    model="llama3-8b",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
# → "The capital of France is Paris."
print(f"Tokens used: {response.usage.total_tokens}")
Run it:
CLOUDACH_API_KEY=sk-cloudach-... python your_script.py
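If the variable is not set, `os.environ["CLOUDACH_API_KEY"]` fails with a bare `KeyError`. The following is a minimal sketch of a friendlier check. The helper name `load_api_key` is hypothetical (it is not part of the openai package or of Cloudach), and the prefix check assumes keys always start with `sk-cloudach-`, as in the example in step 2.

```python
import os


def load_api_key(env=None, name="CLOUDACH_API_KEY"):
    """Return the API key from the environment, or raise a readable error.

    Hypothetical helper for illustration; not part of any SDK.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Export it in your shell before running the script."
        )
    # Assumption: Cloudach keys start with "sk-cloudach-", as shown in step 2.
    if not key.startswith("sk-cloudach-"):
        raise RuntimeError(f"{name} does not look like a Cloudach key.")
    return key
```

To use it, pass `api_key=load_api_key()` when constructing the `OpenAI` client instead of indexing `os.environ` directly.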
4. Add a system prompt
Use the system role to give the model a persona or set of instructions. It always comes first in the messages array.
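For conversations with more than one turn, the same rule applies: the system message leads and the earlier turns follow. A small sketch of one way to assemble such a list is shown here. The helper `build_messages` is hypothetical and only builds plain dicts; it makes no API call.

```python
def build_messages(system_prompt, history, user_input):
    """Return a messages list with the system prompt first.

    `history` is a list of earlier {"role", "content"} dicts (user and
    assistant turns). Hypothetical helper; not part of the openai package.
    """
    messages = [{"role": "system", "content": system_prompt}]
    # Drop stray system entries from history so a single system message leads.
    messages.extend(m for m in history if m["role"] != "system")
    messages.append({"role": "user", "content": user_input})
    return messages
```

The result can be passed directly as the `messages` argument of `client.chat.completions.create`.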
response = client.chat.completions.create(
    model="llama3-8b",
    messages=[
        {"role": "system", "content": "You are a concise technical assistant. Reply in plain text only."},
        {"role": "user", "content": "Explain what a REST API is in one sentence."},
    ],
)
print(response.choices[0].message.content)
5. Key parameters
The most useful parameters for chat.completions.create:
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | — | Required. Model ID, e.g. "llama3-8b", "mixtral-8x7b" |
| messages | list | — | Required. List of {"role", "content"} dicts |
| temperature | float | 1.0 | Randomness: 0.0 = deterministic, 2.0 = very random |
| max_tokens | int | model max | Hard cap on response length in tokens |
| stream | bool | False | Set True to receive tokens as they are generated |
| top_p | float | 1.0 | Nucleus sampling threshold (alternative to temperature) |
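Out-of-range values are easier to catch before the request is sent. The sketch below builds the optional keyword arguments and leaves out anything not set, so the defaults in the table apply. The helper `sampling_kwargs` is hypothetical. The temperature range comes from the table; the `top_p` range of (0, 1] and the positive `max_tokens` check are assumptions based on common OpenAI-compatible behavior, not confirmed Cloudach limits.

```python
def sampling_kwargs(temperature=None, top_p=None, max_tokens=None):
    """Build optional keyword arguments, omitting anything left as None.

    Hypothetical helper for illustration; not part of the openai package.
    """
    kwargs = {}
    if temperature is not None:
        # Range taken from the parameter table: 0.0 to 2.0.
        if not 0.0 <= temperature <= 2.0:
            raise ValueError("temperature must be between 0.0 and 2.0")
        kwargs["temperature"] = temperature
    if top_p is not None:
        # Assumption: top_p is a probability mass in (0, 1].
        if not 0.0 < top_p <= 1.0:
            raise ValueError("top_p must be greater than 0.0 and at most 1.0")
        kwargs["top_p"] = top_p
    if max_tokens is not None:
        # Assumption: the cap must be a positive integer.
        if max_tokens < 1:
            raise ValueError("max_tokens must be at least 1")
        kwargs["max_tokens"] = max_tokens
    return kwargs
```

The result can be unpacked into the call, for example `client.chat.completions.create(model=..., messages=..., **sampling_kwargs(temperature=0.8, max_tokens=100))`.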
# Example with optional parameters
response = client.chat.completions.create(
    model="llama3-70b",
    messages=[{"role": "user", "content": "Write a haiku about distributed systems."}],
    temperature=0.8,
    max_tokens=100,
)
6. Streaming
Set stream=True to receive tokens as they are generated. The response becomes an iterator of chunks instead of a single object.
stream = client.chat.completions.create(
    model="llama3-8b",
    messages=[{"role": "user", "content": "Count from 1 to 5 slowly."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()  # newline after stream ends
Streaming dramatically improves perceived latency: users see the first token in ~1s instead of waiting for the full response. Use it in any interactive UI.
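An application often needs the complete reply after streaming it, for example to store it in the conversation history. The sketch below prints each piece as it arrives and returns the joined text. Both `collect_stream` and `fake_chunk` are hypothetical names; `fake_chunk` only mimics the attribute shape used in the loop above so the helper can be tried offline. Skipping chunks with an empty `choices` list is a defensive assumption, not documented Cloudach behavior.

```python
from types import SimpleNamespace


def collect_stream(chunks, echo=True):
    """Print streamed text as it arrives and return the full reply as one string."""
    parts = []
    for chunk in chunks:
        # Assumption: some chunks may carry no choices; skip them defensively.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # content may be None or empty on some chunks
            parts.append(delta)
            if echo:
                print(delta, end="", flush=True)
    if echo:
        print()  # newline after the stream ends
    return "".join(parts)


def fake_chunk(text):
    """Offline stand-in with the same attribute shape as a streamed chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])
```

With a real client, pass the object returned by `client.chat.completions.create(..., stream=True)` as `chunks`.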