This section covers the main performance levers for SGLang Diffusion: attention backends, caching acceleration, and profiling.
Overview
| Optimization | Type | Description |
|---|---|---|
| Cache-DiT | Caching | Block-level caching with DBCache, TaylorSeer, and SCM |
| TeaCache | Caching | Timestep-level caching based on temporal similarity |
| Attention Backends | Kernel | Optimized attention implementations (FlashAttention, SageAttention, etc.) |
| Inference Batching | Scheduler | Request batching for native diffusion serving |
| Profiling | Diagnostics | PyTorch Profiler and Nsight Systems guidance |
Start Here
- Use Attention Backends to choose the best backend for your model and hardware.
- Use Inference Batching to improve throughput for compatible concurrent requests.
- Use Caching Acceleration to reduce denoising cost with Cache-DiT or TeaCache.
- Use Profiling when you need to diagnose a bottleneck rather than guess.
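
To make "compatible concurrent requests" concrete, here is a minimal sketch of grouping requests into batches. It is not SGLang's scheduler. The `Request` class, the `compatibility_key` rule (same resolution and step count), and `max_batch_size` are all hypothetical names and assumptions chosen for illustration; the actual compatibility criteria are described in the Inference Batching page.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Request:
    """Hypothetical request record; not an SGLang type."""
    request_id: str
    height: int
    width: int
    num_steps: int


def compatibility_key(req: Request) -> Tuple[int, int, int]:
    # Assumption: requests can share a batch only when their output shape
    # and denoising step count match.
    return (req.height, req.width, req.num_steps)


def group_compatible(requests: List[Request], max_batch_size: int) -> List[List[Request]]:
    """Bucket requests by compatibility key, then split each bucket into batches."""
    buckets: Dict[Tuple[int, int, int], List[Request]] = defaultdict(list)
    for req in requests:
        buckets[compatibility_key(req)].append(req)

    batches: List[List[Request]] = []
    for bucket in buckets.values():
        # Preserve arrival order inside each bucket and cap the batch size.
        for start in range(0, len(bucket), max_batch_size):
            batches.append(bucket[start:start + max_batch_size])
    return batches


if __name__ == "__main__":
    pending = [
        Request("a", 512, 512, 30),
        Request("b", 512, 512, 30),
        Request("c", 1024, 1024, 30),
        Request("d", 512, 512, 30),
    ]
    for batch in group_compatible(pending, max_batch_size=2):
        print([req.request_id for req in batch])
```

Requests that differ in any part of the key never share a batch, which is why a mixed workload may see a smaller throughput gain than a uniform one.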
Caching at a Glance
- Cache-DiT is block-level caching for diffusers pipelines, suited to tuning for higher speedups.
- TeaCache is timestep-level caching built into SGLang model families.
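
The idea behind timestep-level caching can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not the TeaCache implementation in SGLang: it accumulates the relative change of the step input across timesteps and recomputes only when that accumulated change reaches a threshold. The names `relative_l1_change`, `run_with_timestep_cache`, `expensive_step`, and `threshold` are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def relative_l1_change(current: Vector, previous: Vector) -> float:
    """L1 distance between two inputs, relative to the previous input's L1 norm."""
    diff = sum(abs(c - p) for c, p in zip(current, previous))
    scale = sum(abs(p) for p in previous) or 1.0  # avoid division by zero
    return diff / scale


def run_with_timestep_cache(
    inputs: List[Vector],
    expensive_step: Callable[[Vector], Vector],
    threshold: float,
) -> Tuple[List[Vector], List[int]]:
    """Run one step per input, reusing the cached output while inputs stay similar.

    Returns the per-step outputs and the indices of steps that were actually computed.
    """
    outputs: List[Vector] = []
    computed_steps: List[int] = []
    previous_input: Optional[Vector] = None
    cached_output: Optional[Vector] = None
    accumulated = 0.0

    for step, x in enumerate(inputs):
        if previous_input is not None:
            accumulated += relative_l1_change(x, previous_input)

        if cached_output is None or accumulated >= threshold:
            # Inputs have drifted enough (or nothing is cached yet): recompute.
            cached_output = expensive_step(x)
            computed_steps.append(step)
            accumulated = 0.0

        outputs.append(cached_output)
        previous_input = x

    return outputs, computed_steps


if __name__ == "__main__":
    steps = [[1.0, 1.0], [1.01, 1.0], [1.02, 1.0], [2.0, 2.0], [2.0, 2.0]]
    _, computed = run_with_timestep_cache(steps, lambda x: [2 * v for v in x], threshold=0.1)
    print(computed)  # indices of steps that were recomputed
```

A higher `threshold` skips more steps and saves more compute, at the cost of reusing outputs that are further from what a full computation would produce. Block-level caching such as Cache-DiT applies a similar reuse decision at the granularity of transformer blocks rather than whole timesteps.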
