LangChain integration
Use Cloudach as a ChatOpenAI provider in LangChain. Because Cloudach is fully OpenAI-compatible, you only need to change two config values. This guide covers basic chat, streaming, and building LCEL chains.
Overview
What you'll learn:
- Configure ChatOpenAI to use Cloudach models (Llama 3, Mistral, Mixtral)
- Stream tokens as they're generated
- Build a prompt → model → parser chain with LangChain Expression Language (LCEL)
- Run a complete multi-question demo script
Install
pip install langchain langchain-openai
Set your API key in the environment (recommended) or pass it directly in code:
export CLOUDACH_API_KEY="sk-cloudach-YOUR_KEY"
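If you would rather fail fast when the key is missing, a small helper can read it once and return the two values that differ from a stock OpenAI setup. This is a minimal sketch: `cloudach_settings` and `CLOUDACH_BASE_URL` are hypothetical names introduced here for illustration, not part of any SDK.

```python
import os

# Base URL taken from this guide; the constant name is our own.
CLOUDACH_BASE_URL = "https://api.cloudach.com/v1"


def cloudach_settings(env=None):
    """Return the keyword arguments that point ChatOpenAI at Cloudach.

    `env` defaults to os.environ; pass a plain dict when testing.
    """
    env = os.environ if env is None else env
    api_key = env.get("CLOUDACH_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("CLOUDACH_API_KEY is not set; export it before running.")
    return {"openai_api_key": api_key, "openai_api_base": CLOUDACH_BASE_URL}
```

With this in place, the constructor call could be written as `ChatOpenAI(model="llama3-70b", **cloudach_settings())`.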
Step 1 — Basic usage
Instantiate ChatOpenAI with openai_api_base pointing at Cloudach and your Cloudach API key. Everything else — invoke, batch, tool calling — works identically to the OpenAI version.
import os
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatOpenAI(
    model="llama3-70b",
    openai_api_key=os.environ["CLOUDACH_API_KEY"],
    openai_api_base="https://api.cloudach.com/v1",
    temperature=0.7,
)

messages = [
    SystemMessage(content="You are a concise technical assistant."),
    HumanMessage(content="What is the difference between a process and a thread?"),
]
response = llm.invoke(messages)
print(response.content)

Use llama3-8b for fast, high-volume pipelines and llama3-70b or mixtral-8x7b for complex reasoning.

Step 2 — Streaming
Pass streaming=True to the constructor, then call .stream() on your chain or LLM. Each chunk is a BaseMessageChunk with a .content string.
import os

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(
    model="llama3-70b",
    openai_api_key=os.environ["CLOUDACH_API_KEY"],
    openai_api_base="https://api.cloudach.com/v1",
    streaming=True,
)

# Print each chunk as soon as it arrives, without a trailing newline.
for chunk in llm.stream([HumanMessage(content="Write a haiku about open source software.")]):
    print(chunk.content, end="", flush=True)
print()

Step 3 — LCEL chains
LangChain Expression Language (LCEL) lets you compose prompts, models, and parsers with the | pipe operator. The chain is lazy and composable — the same chain can be invoked, streamed, or batched.
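To make the pipe syntax less magical, here is a toy illustration of how overloading Python's `__or__` method lets objects be chained with `|`. It is a sketch of the idea only: the `Step` class and the fake model are hypothetical stand-ins, and LangChain's real runnables do considerably more (streaming, batching, async).

```python
class Step:
    """Toy stand-in for a runnable: wraps a function of one argument."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a new Step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Three toy stages mirroring prompt -> model -> parser.
prompt = Step(lambda inputs: f"Question: {inputs['question']}")
fake_model = Step(lambda text: {"content": text.upper()})  # no network call
parser = Step(lambda message: message["content"])

chain = prompt | fake_model | parser
result = chain.invoke({"question": "what is lcel?"})
print(result)  # QUESTION: WHAT IS LCEL?
```

Nothing runs until `invoke` is called, which is the sense in which a chain built this way is lazy.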
Basic chain
import os

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(
    model="llama3-70b",
    openai_api_key=os.environ["CLOUDACH_API_KEY"],
    openai_api_base="https://api.cloudach.com/v1",
)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise technical writer. Answer in 2–3 sentences."),
    ("human", "{question}"),
])

# prompt -> model -> parser; the parser turns the model's message into a plain string.
chain = prompt | llm | StrOutputParser()

# Invoke (returns a string)
answer = chain.invoke({"question": "What is retrieval-augmented generation?"})
print(answer)

Stream a chain
for token in chain.stream({"question": "Explain LLM temperature in plain English."}):
    print(token, end="", flush=True)
print()

Batch multiple inputs
questions = [
    {"question": "What is a vector database?"},
    {"question": "What is a transformer?"},
    {"question": "What is fine-tuning?"},
]

answers = chain.batch(questions)
for q, a in zip(questions, answers):
    print(f"Q: {q['question']}")
    print(f"A: {a}\n")

Step 4 — Complete working script
Save as cloudach_langchain.py and run with:
CLOUDACH_API_KEY=sk-cloudach-YOUR_KEY python cloudach_langchain.py
#!/usr/bin/env python3
"""Cloudach + LangChain integration demo.
Install:
pip install langchain langchain-openai
Run:
CLOUDACH_API_KEY=sk-cloudach-YOUR_KEY python cloudach_langchain.py
"""
import os
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
# ── Configure ───────────────────────────────────────────────────────────────
llm = ChatOpenAI(
    model="llama3-70b",
    openai_api_key=os.environ["CLOUDACH_API_KEY"],
    openai_api_base="https://api.cloudach.com/v1",
    temperature=0.7,
    streaming=True,
)
# ── Build a chain ───────────────────────────────────────────────────────────
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Be concise and direct."),
    ("human", "{question}"),
])
chain = prompt | llm | StrOutputParser()
# ── Run it ───────────────────────────────────────────────────────────────────
questions = [
    "What makes Llama 3 different from GPT-4?",
    "Give me a Python one-liner to flatten a list of lists.",
    "Explain tokens in 30 words.",
]

for q in questions:
    print(f"\nQ: {q}\nA: ", end="", flush=True)
    for token in chain.stream({"question": q}):
        print(token, end="", flush=True)
    print()
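The streaming loop in the script prints each token and then discards it. If you also need the full answer afterwards (for logging or caching, say), one option is to accumulate chunks while echoing them. The sketch below uses a hypothetical helper, `stream_and_collect`, and a hard-coded list in place of `chain.stream(...)` so it runs without a network call.

```python
def stream_and_collect(chunks, write=None):
    """Echo each chunk as it arrives and return the concatenated text."""
    if write is None:
        write = lambda text: print(text, end="", flush=True)
    parts = []
    for chunk in chunks:
        write(chunk)
        parts.append(chunk)
    return "".join(parts)


# Stand-in for chain.stream({...}); a chain ending in StrOutputParser
# yields plain str chunks, which is what this helper expects.
demo_chunks = iter(["Tokens ", "are ", "sub-word ", "units."])
full_text = stream_and_collect(demo_chunks)
print()
```

In the script, the inner loop could then become `answer = stream_and_collect(chain.stream({"question": q}))`.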
Available models
| Model ID | Context | Best for |
|---|---|---|
| llama3-8b | 8K | Fast responses, high-volume pipelines |
| llama3-70b | 8K | Complex reasoning, nuanced answers |
| mistral-7b | 32K | Long documents, code generation |
| mixtral-8x7b | 32K | Highest accuracy, complex tasks |
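The context column matters when prompts get long. As a rough sketch, the table can be encoded as a lookup and filtered by how many tokens a request needs. The helper name `models_that_fit`, the 512-token reply budget, and the reading of "8K" as 8192 tokens and "32K" as 32768 tokens are all assumptions made for illustration.

```python
# Context sizes copied from the table, assuming 8K = 8192 and 32K = 32768 tokens.
MODELS = {
    "llama3-8b": 8192,
    "llama3-70b": 8192,
    "mistral-7b": 32768,
    "mixtral-8x7b": 32768,
}


def models_that_fit(prompt_tokens, reserved_for_output=512):
    """Return model IDs whose context window holds the prompt plus the reply."""
    needed = prompt_tokens + reserved_for_output
    return [name for name, window in MODELS.items() if window >= needed]


print(models_that_fit(2000))   # all four models
print(models_that_fit(12000))  # ['mistral-7b', 'mixtral-8x7b']
```

Token counts depend on each model's tokenizer, so treat `prompt_tokens` as an estimate and leave some headroom.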
What's next
- LlamaIndex integration — use Cloudach in RAG pipelines and query engines
- Rate limits — plan your retry logic
- SDK compatibility — other frameworks that work with Cloudach
- support@cloudach.com — questions or feedback