Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.sglang.io/llms.txt

Use this file to discover all available pages before exploring further.

SGLang provides Ollama API compatibility, allowing you to use the Ollama CLI and Python library with SGLang as the inference backend.

Prerequisites

# Install the Ollama Python library (for Python client usage)
pip install ollama
You don’t need the Ollama server installed - SGLang acts as the backend. You only need the ollama CLI or Python library as the client.

Endpoints

EndpointMethodDescription
/GET, HEADHealth check for Ollama CLI
/api/tagsGETList available models
/api/chatPOSTChat completions (streaming & non-streaming)
/api/generatePOSTText generation (streaming & non-streaming)
/api/showPOSTModel information

Quick Start

1. Launch SGLang Server

python -m sglang.launch_server \
    --model Qwen/Qwen2.5-1.5B-Instruct \
    --port 30001 \
    --host 0.0.0.0
The model name used with ollama run must match exactly what you passed to --model.

2. Use Ollama CLI

# List available models
OLLAMA_HOST=http://localhost:30001 ollama list

# Interactive chat
OLLAMA_HOST=http://localhost:30001 ollama run "Qwen/Qwen2.5-1.5B-Instruct"
If connecting to a remote server behind a firewall:
# SSH tunnel
ssh -L 30001:localhost:30001 user@gpu-server -N &

# Then use Ollama CLI as above
OLLAMA_HOST=http://localhost:30001 ollama list

3. Use Ollama Python Library

Example
import ollama

client = ollama.Client(host='http://localhost:30001')

# Non-streaming
response = client.chat(
    model='Qwen/Qwen2.5-1.5B-Instruct',
    messages=[{'role': 'user', 'content': 'Hello!'}]
)
print(response['message']['content'])

# Streaming
stream = client.chat(
    model='Qwen/Qwen2.5-1.5B-Instruct',
    messages=[{'role': 'user', 'content': 'Tell me a story'}],
    stream=True
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)

Smart Router

For intelligent routing between local Ollama (fast) and remote SGLang (powerful) using an LLM judge, see the Smart Router documentation.

Summary

ComponentPurpose
Ollama APIFamiliar CLI/API that developers already know
SGLang BackendHigh-performance inference engine
Smart RouterIntelligent routing - fast local for simple tasks, powerful remote for complex tasks