Usage-based.
No surprises.
Start free and pay only for what you use. Every plan runs on the same production-grade infrastructure — pick the tier that fits your scale.
Free
For developers exploring and building side projects.
- 1M tokens included per month
- 1 active deployment
- Shared GPU infrastructure
- OpenAI-compatible API
- 3 API keys with rate limits
- Community support
No credit card required
Pro
For production apps and growing teams that need more.
- 25M tokens included per month
- 10 active deployments
- Dedicated GPU bursting + fine-tuning
- Per-key spend caps + cost tagging
- Request log viewer (30-day retention)
- 50 GB model storage
- Priority support
14-day free trial — no card required
Business
For production workloads with SLA and scale requirements.
- 250M tokens included per month
- Unlimited deployments
- 99.9% uptime SLA with service credits
- A/B fine-tune traffic split
- Request log viewer (90-day retention)
- Team-level budget rollup
- Dedicated solutions engineer
Per-model token rates
Pro and Business plans get progressively lower rates as your volume commitment grows.
| Model | Provider | Context | Available on | Free ($/M tokens) | Pro ($/M tokens) | Business ($/M tokens) |
|---|---|---|---|---|---|---|
| Phi-3 Mini | Microsoft | 4K | Free+ | $0.12 | $0.09 | $0.06 |
| Mistral 7B | Mistral AI | 32K | Free+ | $0.18 | $0.13 | $0.09 |
| Llama 3.1 8B | Meta | 128K | Free+ | $0.20 | $0.15 | $0.10 |
| DeepSeek R1 7B Distill | DeepSeek | 64K | Free+ | $0.24 | $0.18 | $0.13 |
| CodeLlama 13B | Meta | 16K | Free+ | $0.25 | $0.18 | $0.13 |
| Mixtral 8×7B | Mistral AI | 32K | Pro+ | $0.55 | $0.40 | $0.30 |
| Llama 3.1 70B | Meta | 128K | Pro+ | $0.85 | $0.65 | $0.45 |
| Qwen 2.5 72B | Alibaba | 128K | Pro+ | $0.85 | $0.65 | $0.45 |
Rates are per million tokens (combined input + output). Reasoning models are priced with split input/output rates. Custom pricing for 6B+ tokens/month — contact sales.
See what you'd pay
Drag the slider to estimate your monthly bill based on token volume.
Estimates are approximate. Actual billing is based on metered token usage. See the billing docs for details.
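For a rough sense of the arithmetic behind such an estimate, here is a minimal Python sketch. It rests on assumptions this page does not confirm: that the monthly included tokens are deducted first regardless of model, that only tokens beyond that allotment are billed at the plan's per-model rate, and that any base subscription fee is charged separately. The names `RATES`, `INCLUDED_TOKENS`, and `estimate_usage_cost` are hypothetical, and only three rows from the rate table above are copied in.

```python
from decimal import Decimal

# USD per million tokens, copied from three rows of the rate table above.
RATES = {
    "Phi-3 Mini":    {"free": Decimal("0.12"), "pro": Decimal("0.09"), "business": Decimal("0.06")},
    "Llama 3.1 8B":  {"free": Decimal("0.20"), "pro": Decimal("0.15"), "business": Decimal("0.10")},
    "Llama 3.1 70B": {"pro": Decimal("0.65"), "business": Decimal("0.45")},  # listed as Pro+
}

# Tokens included per month on each plan, from the plan descriptions.
INCLUDED_TOKENS = {"free": 1_000_000, "pro": 25_000_000, "business": 250_000_000}

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_usage_cost(plan: str, model: str, tokens: int) -> Decimal:
    """Estimate the metered charge in USD for one model on one plan.

    Assumption (not confirmed by the pricing page): only tokens beyond the
    plan's included allotment are billed, at the combined input + output rate.
    Any base plan fee is excluded.
    """
    if plan not in RATES[model]:
        raise ValueError(f"{model} is not available on the {plan} plan")
    billable = max(0, tokens - INCLUDED_TOKENS[plan])
    return Decimal(billable) / TOKENS_PER_MILLION * RATES[model][plan]


if __name__ == "__main__":
    # 40M tokens on Pro: 15M billable tokens at $0.15 per million.
    print(estimate_usage_cost("pro", "Llama 3.1 8B", 40_000_000))
```

Using `Decimal` rather than floats keeps the cent values exact. Reasoning models with split input/output rates would need separate input and output token counts, which this sketch does not model.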