Fine-tune Llama 3 on your own data
In this tutorial you'll fine-tune Llama 3 8B on a customer support dataset using LoRA. By the end you'll have a custom model that replies in your brand's voice, deployed and ready for inference on Cloudach's API.
Overview
What you'll do:
- Prepare a JSONL training dataset with chat-format examples
- Upload the dataset to Cloudach
- Launch a LoRA fine-tuning job on Llama 3 8B
- Monitor training loss in real time
- Run inference on your deployed custom model
- Evaluate the results against the base model
Fine-tuning with LoRA trains only a small set of adapter weights (≈ 0.1–1% of model parameters) rather than the full model. This is faster and cheaper than full fine-tuning, and often achieves equal or better results for domain adaptation tasks. Cloudach serves the adapter on top of the base model using vLLM, so you get the same sub-100ms latency as the base model.
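To make the ≈ 0.1–1% figure concrete, here is a rough count of adapter weights. It is a sketch under assumptions this tutorial does not state: adapters are attached to the four attention projections of every layer, and the layer shapes are the published Llama 3 8B ones. Which modules Cloudach actually adapts may differ.

```python
# Back-of-the-envelope count of LoRA adapter weights for Llama 3 8B.
# Assumptions (not from this tutorial): adapters on the four attention
# projections of every layer, using the published Llama 3 8B shapes.

HIDDEN = 4096          # model width
KV_DIM = 1024          # 8 key/value heads x 128 dims (grouped-query attention)
N_LAYERS = 32
TOTAL_PARAMS = 8.03e9  # approximate size of the base model

# (input_dim, output_dim) of each adapted weight matrix
ATTENTION_PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_param_count(rank: int) -> int:
    """A rank-r adapter on a d_in x d_out matrix adds r * (d_in + d_out) weights."""
    per_layer = sum(
        rank * (d_in + d_out) for d_in, d_out in ATTENTION_PROJECTIONS.values()
    )
    return per_layer * N_LAYERS


for rank in (16, 32, 64):
    n = lora_param_count(rank)
    print(f"rank={rank:>2}: {n:,} adapter weights ({n / TOTAL_PARAMS:.2%} of the base model)")
```

Under these assumptions, rank 16 gives about 13.6 million trainable weights, roughly 0.17% of the base model, and the count grows linearly with rank.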
Prerequisites
- A Cloudach account and API key (sign up)
- Python 3.9+ or a terminal with curl
- At least 100 training examples in JSONL format (we provide a sample below)
Step 1 — Prepare your dataset
Training data must be a .jsonl file where each line is a JSON object with a messages array. The format is identical to the OpenAI fine-tuning format.
Example line (wrapped here for readability; in the file, each example must sit on a single line)
{"messages": [
  {"role": "system", "content": "You are a helpful support agent for Acme Corp."},
  {"role": "user", "content": "How do I reset my password?"},
  {"role": "assistant", "content": "Go to Settings → Security → Reset Password. You'll receive a reset link by email within 2 minutes."}
]}
Download our sample dataset
We provide a 50-example customer support dataset to use as a starting point or to test the workflow end-to-end before using your own data.
# Download the Cloudach sample dataset (50 customer-support examples)
curl -L https://raw.githubusercontent.com/cloudach/examples/main/fine-tuning/sample_dataset.jsonl \
  -o training_data.jsonl
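If you are starting from your own question/answer pairs instead of the sample, a short script can produce the chat format described above. This is a minimal sketch: `qa_pairs`, `to_chat_example`, and `write_jsonl` are hypothetical names, and the second pair is a placeholder to replace with your own content.

```python
import json

SYSTEM_PROMPT = "You are a helpful support agent for Acme Corp."

# Hypothetical input: (question, answer) pairs, e.g. exported from a help desk
qa_pairs = [
    ("How do I reset my password?",
     "Go to Settings → Security → Reset Password. "
     "You'll receive a reset link by email within 2 minutes."),
    ("Do you have a free trial?", "<your approved answer>"),  # placeholder
]


def to_chat_example(question: str, answer: str) -> dict:
    """Wrap one question/answer pair in the messages structure."""
    return {"messages": [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
    ]}


def write_jsonl(pairs: list, path: str) -> int:
    """Write one JSON object per line and return the number of examples written."""
    with open(path, "w", encoding="utf-8") as f:
        for question, answer in pairs:
            # json.dumps emits a single line; ensure_ascii=False keeps
            # characters such as "→" readable in the file
            f.write(json.dumps(to_chat_example(question, answer), ensure_ascii=False) + "\n")
    return len(pairs)
```

Calling `write_jsonl(qa_pairs, "my_training_data.jsonl")` writes the file; the output name is chosen here so that it does not overwrite the downloaded sample.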
Validate your file
Run this quick check before uploading to catch format errors:
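python3 - <<'EOF'
import json, sys

errors = 0
with open("training_data.jsonl", encoding="utf-8") as f:
    for i, line in enumerate(f, 1):
        try:
            obj = json.loads(line)
            assert "messages" in obj, "missing 'messages' key"
            roles = [m["role"] for m in obj["messages"]]
            assert "user" in roles and "assistant" in roles, "need both a user and an assistant message"
        except Exception as e:
            # Report the line number and keep going so all errors are listed
            print(f"Line {i}: {e}")
            errors += 1
print(f"Done. {errors} error(s).")
sys.exit(1 if errors else 0)  # non-zero exit lets scripts stop on a bad file
EOF
Step 2 — Upload your dataset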
Upload the JSONL file to get a dataset ID to reference in the fine-tuning job.
curl https://api.cloudach.com/v1/fine-tuning/datasets \
  -H "Authorization: Bearer $CLOUDACH_API_KEY" \
  -F "file=@training_data.jsonl" \
  -F "purpose=fine-tune"
The response looks like:
{
  "id": "ds-8f3a2b1c",
  "object": "dataset",
  "filename": "training_data.jsonl",
  "bytes": 142891,
  "line_count": 50,
  "status": "processed"
}
Save the id field (e.g. ds-8f3a2b1c); you'll need it in the next step.
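Before moving on, it can be worth checking that the upload response matches the local file. The sketch below uses only the fields shown in the response above; treating any status other than "processed" as a failure is an assumption, and `check_upload` is a hypothetical helper name.

```python
import json


def check_upload(response_body: str, local_path: str) -> str:
    """Return the dataset ID if the upload response agrees with the local file."""
    dataset = json.loads(response_body)

    # Assumption: "processed" is the only status that means the dataset is usable
    if dataset["status"] != "processed":
        raise RuntimeError(f"dataset {dataset['id']} has status {dataset['status']!r}")

    # Count non-empty lines locally and compare with what the server parsed
    with open(local_path, encoding="utf-8") as f:
        local_lines = sum(1 for line in f if line.strip())
    if dataset["line_count"] != local_lines:
        raise RuntimeError(
            f"server counted {dataset['line_count']} lines, local file has {local_lines}"
        )
    return dataset["id"]
```

For example, saving the curl output to a file and passing its contents together with `training_data.jsonl` returns the ID to use in Step 3, or raises if the counts disagree.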
Step 3 — Launch the fine-tuning job
Create a LoRA fine-tuning job targeting llama3-8b. Replace ds-8f3a2b1c with your dataset ID.
curl https://api.cloudach.com/v1/fine-tuning/jobs \
  -H "Authorization: Bearer $CLOUDACH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3-8b",
    "training_file": "ds-8f3a2b1c",
    "method": {
      "type": "lora",
      "lora": { "rank": 16, "alpha": 32, "dropout": 0.05 }
    },
    "hyperparameters": {
      "n_epochs": 3,
      "batch_size": 16
    },
    "suffix": "support-bot"
  }'
Response:
{
  "id": "ftjob-a1b2c3",
  "object": "fine_tuning.job",
  "model": "llama3-8b",
  "status": "queued",
  "created_at": 1712001200,
  "estimated_finish": null,
  "fine_tuned_model": null
}
Save the job id (e.g. ftjob-a1b2c3). The job enters the queued state and starts within a few minutes.
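If you would rather launch the job from Python than from curl, the same request can be built with the standard library. This is a sketch rather than an official client: the URL and payload fields are copied from the curl command above, while `build_job_request` and `send` are hypothetical helper names, and setting alpha to twice the rank simply mirrors the values used in this tutorial.

```python
import json
import urllib.request

API_BASE = "https://api.cloudach.com/v1"


def build_job_request(dataset_id: str, api_key: str, suffix: str = "support-bot",
                      rank: int = 16, n_epochs: int = 3,
                      batch_size: int = 16) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a LoRA job."""
    payload = {
        "model": "llama3-8b",
        "training_file": dataset_id,
        "method": {
            "type": "lora",
            # alpha = 2 x rank follows the ratio used in this tutorial
            "lora": {"rank": rank, "alpha": 2 * rank, "dropout": 0.05},
        },
        "hyperparameters": {"n_epochs": n_epochs, "batch_size": batch_size},
        "suffix": suffix,
    }
    return urllib.request.Request(
        f"{API_BASE}/fine-tuning/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `send(build_job_request("ds-8f3a2b1c", os.environ["CLOUDACH_API_KEY"]))` (after `import os`) would then return the job object shown above. Keeping the build step separate from the send step makes the payload easy to inspect before anything is submitted.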
Key parameters explained
| Parameter | Our value | What it does |
|---|---|---|
| lora.rank | 16 | Controls adapter capacity. Higher = more expressive, higher cost. 16 is a good default. |
| lora.alpha | 32 | Scaling factor (typically 2× rank). |
| n_epochs | 3 | Training passes over the dataset. Start with 3; increase if loss is still falling. |
| batch_size | 16 | Examples per gradient update. Larger = faster but uses more GPU memory. |
| suffix | support-bot | Appended to the deployed model ID so you can identify it. |
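The "typically 2× rank" guidance for lora.alpha is easier to apply with the arithmetic written out. In the standard LoRA formulation the adapter update is multiplied by alpha / rank; whether Cloudach applies exactly this scaling is an assumption here, and both function names are hypothetical.

```python
def lora_scale(alpha: float, rank: int) -> float:
    """Multiplier applied to the adapter update in the standard LoRA formulation."""
    return alpha / rank


def alpha_for_rank(rank: int, target_scale: float = 2.0) -> float:
    """Alpha that keeps the same multiplier when the rank changes."""
    return target_scale * rank


print(lora_scale(alpha=32, rank=16))  # this tutorial's values -> 2.0
print(lora_scale(alpha=32, rank=64))  # rank raised, alpha unchanged -> 0.5
print(alpha_for_rank(64))             # alpha that restores the 2x ratio -> 128.0
```

In other words, if you raise the rank and leave alpha at 32, the adapter's effective contribution shrinks, so under this assumption it is reasonable to raise alpha along with the rank.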
Step 4 — Monitor training
Poll the job endpoint to see status, training loss, and estimated finish time. Training a 50-example dataset typically takes 3–5 minutes; training time grows roughly linearly with dataset size.
# Poll every 30 seconds
curl https://api.cloudach.com/v1/fine-tuning/jobs/ftjob-a1b2c3 \
  -H "Authorization: Bearer $CLOUDACH_API_KEY"
{
  "id": "ftjob-a1b2c3",
  "status": "running",
  "trained_tokens": 48200,
  "estimated_finish": 1712003600,
  "fine_tuned_model": null,
  "events": [
    { "step": 0, "train_loss": 2.38, "train_mean_token_accuracy": 0.44 },
    { "step": 50, "train_loss": 1.62, "train_mean_token_accuracy": 0.63 },
    { "step": 100, "train_loss": 1.14, "train_mean_token_accuracy": 0.74 }
  ]
}
When status changes to succeeded, the fine_tuned_model field will contain your deployed model ID.
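The poll-until-done loop can also be scripted. The sketch below relies only on the status, events, and fine_tuned_model fields shown above. The "failed" and "cancelled" status names are assumptions (this tutorial only mentions queued, running, and succeeded), and `fetch_job` and `wait_for_job` are hypothetical helpers.

```python
import json
import time
import urllib.request

API_BASE = "https://api.cloudach.com/v1"
# Assumption: only "succeeded" is confirmed by this tutorial
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def fetch_job(job_id: str, api_key: str) -> dict:
    """GET the job object from the fine-tuning jobs endpoint."""
    request = urllib.request.Request(
        f"{API_BASE}/fine-tuning/jobs/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_job(job_id: str, api_key: str, fetch=fetch_job,
                 interval: float = 30.0, timeout: float = 3600.0,
                 sleep=time.sleep) -> dict:
    """Poll until the job reaches a terminal status, printing the latest loss."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id, api_key)
        events = job.get("events") or []
        if events:
            last = events[-1]
            print(f"status={job['status']} step={last['step']} train_loss={last['train_loss']}")
        else:
            print(f"status={job['status']}")
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']!r} after {timeout:.0f}s")
        sleep(interval)
```

The `fetch` and `sleep` parameters exist so the loop can be exercised with canned responses before it is pointed at the real API.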
Stream live events
For a real-time loss curve, stream events instead of polling:
curl "https://api.cloudach.com/v1/fine-tuning/jobs/ftjob-a1b2c3/events?stream=true" \
  -H "Authorization: Bearer $CLOUDACH_API_KEY"
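The thresholds this section uses for reading the loss curve (a plateau above 1.5, or a drop below 0.2) can be encoded in a small helper. This is a simplified sketch: it looks only at the final train_loss value rather than detecting a plateau, and `diagnose_loss` is a hypothetical name.

```python
def diagnose_loss(losses: list) -> str:
    """Map a sequence of train_loss values onto this tutorial's rule of thumb."""
    if not losses:
        raise ValueError("need at least one train_loss value")
    final = losses[-1]
    if final < 0.2:
        return "possible overfitting: add more diverse examples"
    if final > 1.5:
        return "plateau above 1.5: try more epochs or a higher LoRA rank"
    return "no warning from the rule of thumb"


# train_loss values taken from the sample polling response in this step
print(diagnose_loss([2.38, 1.62, 1.14]))
```

Treat the result as a prompt to look at the full curve, not as a verdict; a single final value cannot distinguish a plateau from a run that is still improving.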
A healthy run shows train_loss falling from ~2.0–2.5 at step 0 to 0.5–1.2 by the final step. If it plateaus above 1.5, try more epochs or a higher LoRA rank. If it drops below 0.2, you may be overfitting; add more diverse examples.
Step 5 — Run inference on your model
Once status is succeeded, use the fine_tuned_model ID exactly like any other model. No code changes needed beyond swapping the model ID.
# Replace with your fine_tuned_model value
curl https://api.cloudach.com/v1/chat/completions \
  -H "Authorization: Bearer $CLOUDACH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3-8b:ft:support-bot:ftjob-a1b2c3",
    "messages": [
      {"role": "system", "content": "You are a helpful support agent for Acme Corp."},
      {"role": "user", "content": "What is your return policy?"}
    ]
  }'
The response is identical to a standard chat completion:
{
  "id": "chatcmpl-xyz",
  "object": "chat.completion",
  "model": "llama3-8b:ft:support-bot:ftjob-a1b2c3",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "We accept returns within 30 days of purchase. Items must be unopened and in original packaging. To start a return, email support@acme.com with your order number."
    },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 42, "completion_tokens": 38, "total_tokens": 80 }
}
Step 6 — Evaluate results
Compare your fine-tuned model against the base model on a held-out test set:
import os

import openai

# Read the key from the same environment variable the curl examples use
client = openai.OpenAI(
    api_key=os.environ["CLOUDACH_API_KEY"],
    base_url="https://api.cloudach.com/v1",
)

test_questions = [
    "How do I cancel my subscription?",
    "I was charged twice, what do I do?",
    "Do you have a free trial?",
]

for q in test_questions:
    # Base model, queried without a system prompt
    base = client.chat.completions.create(
        model="llama3-8b",
        messages=[{"role": "user", "content": q}],
    )
    # Fine-tuned model, queried with the system prompt used in training
    finetuned = client.chat.completions.create(
        model="llama3-8b:ft:support-bot:ftjob-a1b2c3",
        messages=[
            {"role": "system", "content": "You are a helpful support agent for Acme Corp."},
            {"role": "user", "content": q},
        ],
    )
    # Print the first 120 characters of each answer side by side
    print(f"Q: {q}")
    print(f"Base: {base.choices[0].message.content[:120]}")
    print(f"Fine-tuned: {finetuned.choices[0].message.content[:120]}")
    print()
Things to look for:
- Tone: Does the model respond in your brand's voice?
- Factual accuracy: Does it cite correct product details?
- Format: Does it follow your preferred response structure?
- Hallucination rate: Does it make up information less often than the base model?
- Refusal rate: Does it handle out-of-scope questions gracefully?
If results are not satisfactory, the most effective improvements are: (1) adding more diverse training examples, (2) increasing n_epochs to 5, (3) increasing lora.rank to 32 or 64 for more complex tasks.
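The held-out test set mentioned at the start of this step has to be set aside before training. One way to do that, sketched below, is to split the JSONL file before uploading it in Step 2; `split_jsonl`, the 10% default, and the fixed seed are illustrative choices rather than anything Cloudach prescribes.

```python
import random


def split_jsonl(src: str, train_path: str, test_path: str,
                test_fraction: float = 0.1, seed: int = 0) -> tuple:
    """Shuffle the examples and write separate training and held-out files."""
    with open(src, encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f if line.strip()]

    # A fixed seed makes the split reproducible between runs
    random.Random(seed).shuffle(lines)
    n_test = max(1, round(len(lines) * test_fraction))
    test, train = lines[:n_test], lines[n_test:]

    for path, subset in ((train_path, train), (test_path, test)):
        with open(path, "w", encoding="utf-8") as f:
            for line in subset:
                f.write(line + "\n")
    return len(train), len(test)
```

With the 50-example sample and the default fraction, this leaves 45 examples for training and 5 for evaluation. Upload only the training file, and draw the evaluation questions from the held-out file so the model has not seen them.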
Next steps
Full documentation for all fine-tuning endpoints, parameters, and error codes.
Best practices for collecting, cleaning, and formatting training data.
In-depth discussion of LoRA, base model selection, and real-world fine-tuning results.