## Prerequisites

You don't need the Ollama server installed; SGLang acts as the backend. You only need the `ollama` CLI or Python library as the client.

## Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/` | GET, HEAD | Health check for the Ollama CLI |
| `/api/tags` | GET | List available models |
| `/api/chat` | POST | Chat completions (streaming & non-streaming) |
| `/api/generate` | POST | Text generation (streaming & non-streaming) |
| `/api/show` | POST | Model information |
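As a quick sanity check, the endpoints above can be exercised with `curl`. This is a sketch assuming the server is listening on `http://localhost:30000` (adjust the host and port to your launch flags); the model name is a placeholder.

```shell
# Health check (the Ollama CLI hits this on startup)
curl -i http://localhost:30000/

# List the models the server exposes
curl http://localhost:30000/api/tags

# Non-streaming text generation ("stream": false)
curl http://localhost:30000/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "prompt": "Why is the sky blue?", "stream": false}'
```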
## Quick Start

### 1. Launch SGLang Server
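A minimal launch command, assuming SGLang is installed; the model identifier and port below are placeholders (port 30000 is a common SGLang default):

```shell
python -m sglang.launch_server \
  --model Qwen/Qwen2.5-7B-Instruct \
  --port 30000
```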
The model name used with `ollama run` must match exactly what you passed to `--model`.

### 2. Use Ollama CLI
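With the server up, the stock `ollama` CLI can be pointed at SGLang through the `OLLAMA_HOST` environment variable (a standard Ollama client setting). The host, port, and model name below are assumptions matching the launch sketch:

```shell
export OLLAMA_HOST=http://localhost:30000

ollama list                                            # hits /api/tags
ollama run Qwen/Qwen2.5-7B-Instruct "Why is the sky blue?"
```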
### 3. Use Ollama Python Library
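A minimal sketch with the `ollama` Python package (`pip install ollama`), assuming the same host and model as above; `Client(host=...)` points the client at the SGLang server instead of a local Ollama daemon:

```python
from ollama import Client

# Point the Ollama client at the SGLang server instead of a local daemon.
client = Client(host="http://localhost:30000")

response = client.chat(
    model="Qwen/Qwen2.5-7B-Instruct",  # must match --model at launch
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])
```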
## Example
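As a dependency-free illustration, the `/api/chat` endpoint can also be called with only the Python standard library. This is a sketch: the base URL, model name, and prompt are assumptions; only the wire format (`model`, `messages`, `stream`) follows the endpoint table above.

```python
import json
from urllib import request


def chat_payload(model: str, prompt: str, stream: bool = False) -> dict:
    """Build the JSON body for a POST to /api/chat (Ollama wire format)."""
    return {
        "model": model,
        "stream": stream,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(base_url: str, model: str, prompt: str) -> str:
    """Send a non-streaming chat request and return the reply text."""
    body = json.dumps(chat_payload(model, prompt)).encode()
    req = request.Request(
        f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]


if __name__ == "__main__":
    # Assumes an SGLang server is already listening on port 30000.
    print(chat("http://localhost:30000", "Qwen/Qwen2.5-7B-Instruct", "Hello!"))
```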
## Smart Router

For intelligent routing between local Ollama (fast) and remote SGLang (powerful) using an LLM judge, see the Smart Router documentation.

## Summary
| Component | Purpose |
|---|---|
| Ollama API | Familiar CLI/API that developers already know |
| SGLang Backend | High-performance inference engine |
| Smart Router | Intelligent routing: fast local models for simple tasks, a powerful remote backend for complex tasks |
