Beginner · ~5 min · Python

Python SDK Quickstart

Cloudach is OpenAI-compatible: use the standard openai Python package and point it at Cloudach's base URL. There is no new SDK to learn.

1. Install

Install the OpenAI Python SDK (v1.0 or later):

pip install "openai>=1.0"

If you're using a virtual environment or conda, activate it first.
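To confirm that the installed package meets the v1.0 minimum, a check along these lines may help. It is a minimal sketch: `meets_minimum` and `installed_openai_version` are hypothetical helper names, and only the leading major and minor numbers of the version string are compared.

```python
import re
from importlib import metadata


def meets_minimum(version, minimum=(1, 0)):
    """Return True if a version string such as '1.35.0' is at least `minimum`."""
    parts = []
    for piece in version.split(".")[:2]:
        match = re.match(r"\d+", piece)  # leading digits only, so '0rc1' -> 0
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 2:
        parts.append(0)  # treat '2' as '2.0'
    return tuple(parts) >= minimum


def installed_openai_version():
    """Return the installed openai version string, or None if it is missing."""
    try:
        return metadata.version("openai")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_openai_version()
    if found is None:
        print("openai is not installed; run: pip install openai")
    elif meets_minimum(found):
        print(f"openai {found} is new enough")
    else:
        print(f"openai {found} is older than 1.0; run: pip install --upgrade openai")
```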

2. Configure

Create the client with your Cloudach API key and base URL. Store your key in an environment variable — never hard-code it in source files.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cloudach.com/v1",
    api_key=os.environ["CLOUDACH_API_KEY"],
)

Your key looks like sk-cloudach-.... Get one from the Dashboard → API Keys page.
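Reading the key with `os.environ[...]` raises a bare `KeyError` when the variable is missing. The sketch below shows one way to fail with a clearer message instead. `load_api_key` is a hypothetical helper, and the prefix check assumes every key starts with the `sk-cloudach-` prefix shown above.

```python
import os

KEY_PREFIX = "sk-cloudach-"  # assumption: all keys share the prefix shown above


def load_api_key(env=None, var_name="CLOUDACH_API_KEY"):
    """Return the API key from the environment, or raise a readable error."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it before running.")
    if not key.startswith(KEY_PREFIX):
        raise RuntimeError(f"{var_name} does not start with {KEY_PREFIX!r}; check the value.")
    return key
```

The result can then be passed as `api_key=load_api_key()` when creating the client.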

3. First call

Make your first chat completion. After the client setup, it takes a single API call:

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cloudach.com/v1",
    api_key=os.environ["CLOUDACH_API_KEY"],
)

response = client.chat.completions.create(
    model="llama3-8b",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

print(response.choices[0].message.content)
# → e.g. "The capital of France is Paris." (exact wording varies)
print(f"Tokens used: {response.usage.total_tokens}")

Run it:

CLOUDACH_API_KEY=sk-cloudach-... python your_script.py
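Once the script works, the call can be wrapped in a small function so the rest of your code deals only with text and token counts. This is a sketch rather than part of the SDK: `ask` is a hypothetical helper, it takes the client from step 2 as an argument, and the fallback for a missing `usage` field is a defensive assumption.

```python
def ask(client, prompt, model="llama3-8b"):
    """Send one user message and return (reply_text, total_tokens)."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    text = response.choices[0].message.content or ""
    # Defensive: fall back to None if the response carries no usage data.
    tokens = response.usage.total_tokens if response.usage else None
    return text, tokens
```

A call then looks like `text, tokens = ask(client, "What is the capital of France?")`.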

4. Add a system prompt

Use the system role to give the model a persona or a set of instructions. Place it first in the messages list.

# Reuses the client created in step 2
response = client.chat.completions.create(
    model="llama3-8b",
    messages=[
        {"role": "system", "content": "You are a concise technical assistant. Reply in plain text only."},
        {"role": "user",   "content": "Explain what a REST API is in one sentence."},
    ],
)

print(response.choices[0].message.content)
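For multi-turn conversations, the same ordering applies: the system message goes first, followed by the earlier turns, then the new user message. The sketch below shows one way to assemble that list. `build_messages` is a hypothetical helper, and it assumes earlier model replies are sent back with the `assistant` role, as in the standard OpenAI chat format.

```python
def build_messages(system_prompt, history, user_input):
    """Return a messages list with the system prompt first.

    `history` is a list of (user_text, assistant_text) pairs from earlier turns.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": user_input})
    return messages


messages = build_messages(
    "You are a concise technical assistant. Reply in plain text only.",
    [("What is REST?", "An architectural style for HTTP APIs.")],
    "Give one example of a REST verb.",
)
print([m["role"] for m in messages])
# → ['system', 'user', 'assistant', 'user']
```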

5. Key parameters

The most useful parameters for chat.completions.create:

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | - | Required. Model ID, e.g. "llama3-8b", "mixtral-8x7b" |
| messages | list | - | Required. List of {"role", "content"} dicts |
| temperature | float | 1.0 | Randomness: 0.0 = deterministic, 2.0 = very random |
| max_tokens | int | model max | Hard cap on response length in tokens |
| stream | bool | False | Set True to receive tokens as they are generated |
| top_p | float | 1.0 | Nucleus sampling threshold (alternative to temperature) |

# Example with optional parameters (reuses the client from step 2)
response = client.chat.completions.create(
    model="llama3-70b",
    messages=[{"role": "user", "content": "Write a haiku about distributed systems."}],
    temperature=0.8,
    max_tokens=100,
)
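If these values come from user input or a config file, it can be worth checking them before sending the request. The following is a minimal sketch: `sampling_kwargs` is a hypothetical helper, the temperature range comes from the table above, and the 0.0 to 1.0 bound on `top_p` is an assumption based on how nucleus sampling is usually defined.

```python
def sampling_kwargs(temperature=None, top_p=None, max_tokens=None):
    """Validate optional parameters and return only the ones that were set."""
    kwargs = {}
    if temperature is not None:
        if not 0.0 <= temperature <= 2.0:
            raise ValueError("temperature must be between 0.0 and 2.0")
        kwargs["temperature"] = temperature
    if top_p is not None:
        # Assumed range; the parameter table does not state one.
        if not 0.0 <= top_p <= 1.0:
            raise ValueError("top_p must be between 0.0 and 1.0")
        kwargs["top_p"] = top_p
    if max_tokens is not None:
        if max_tokens < 1:
            raise ValueError("max_tokens must be a positive integer")
        kwargs["max_tokens"] = max_tokens
    return kwargs


print(sampling_kwargs(temperature=0.8, max_tokens=100))
# → {'temperature': 0.8, 'max_tokens': 100}
```

The returned dict can be unpacked into the call, for example `client.chat.completions.create(model="llama3-70b", messages=messages, **sampling_kwargs(temperature=0.8))`. Parameters left as `None` are omitted, so the server-side defaults apply.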

6. Streaming

Set stream=True to receive tokens as they are generated. The response becomes an iterator of chunks instead of a single object.

# Reuses the client created in step 2
stream = client.chat.completions.create(
    model="llama3-8b",
    messages=[{"role": "user", "content": "Count from 1 to 5 slowly."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

print()  # newline after stream ends

Streaming improves perceived latency: users see the first token in about 1 s instead of waiting for the full response. Use it in any interactive UI.
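Interactive code often needs the complete reply as well, for example to store it in a conversation history. The sketch below prints each delta as it arrives and returns the joined text. `collect_stream` is a hypothetical helper, and skipping chunks with an empty `choices` list is a defensive assumption about what a stream may contain.

```python
def collect_stream(stream, echo=True):
    """Print streamed deltas as they arrive and return the full reply text."""
    pieces = []
    for chunk in stream:
        # Defensive: skip chunks that carry no choices at all.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            pieces.append(delta)
            if echo:
                print(delta, end="", flush=True)
    if echo:
        print()  # newline after the stream ends
    return "".join(pieces)
```

Pass it the object returned by a `stream=True` call: `full_text = collect_stream(stream)`.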

Next steps