SGLang Diffusion is an inference framework for accelerated image and video generation using diffusion models. It provides an end-to-end unified pipeline with optimized kernels and an efficient scheduler loop.

Key features

  • Broad model support: Wan series, FastWan series, Hunyuan, Qwen-Image, Qwen-Image-Edit, Flux, Z-Image, GLM-Image, and more
  • Fast inference: optimized kernels, efficient scheduler loop, and Cache-DiT acceleration
  • Ease of use: OpenAI-compatible API, CLI, and Python SDK
  • Multi-platform: NVIDIA GPUs (H100, H200, A100, B200, 4090), AMD GPUs (MI300X, MI325X), and Ascend NPU (A2, A3)

Quick start

  1. Install SGLang Diffusion
uv pip install "sglang[diffusion]" --prerelease=allow
See the installation guide for more installation methods and ROCm-specific instructions.
  2. Run a one-off generation
sglang generate --model-path Qwen/Qwen-Image \
  --prompt "A beautiful sunset over the mountains" \
  --save-output
  3. Serve with the OpenAI-compatible API
sglang serve --model-path Qwen/Qwen-Image --port 30010
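Once the server is running, clients can request generations over HTTP. Below is a minimal sketch of building such a request in Python, assuming the server's request schema mirrors OpenAI's images API (the field names here are illustrative assumptions; check the SGLang Diffusion API reference for the exact route and schema):

```python
import json


def build_generation_request(model: str, prompt: str, size: str = "1024x1024") -> dict:
    # Field names assume OpenAI images-API compatibility; verify them
    # against the SGLang Diffusion server's documented schema.
    return {
        "model": model,
        "prompt": prompt,
        "size": size,
        "response_format": "b64_json",  # request base64-encoded image data
    }


payload = build_generation_request(
    "Qwen/Qwen-Image", "A beautiful sunset over the mountains"
)
print(json.dumps(payload, indent=2))
```

POST the resulting JSON to the server started above (port 30010); any OpenAI-compatible client library should also work by pointing its base URL at the server.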

CLI quick reference

Generate (one-off generation)

sglang generate --model-path <MODEL> --prompt "<PROMPT>" --save-output

Serve (HTTP server)

sglang serve --model-path <MODEL> --port 30010

Enable Cache-DiT acceleration

SGLANG_CACHE_DIT_ENABLED=true sglang generate --model-path <MODEL> --prompt "<PROMPT>"
