Overview
SGLang supports two complementary caching approaches:

| Strategy | Scope | Mechanism | Best For |
|---|---|---|---|
| Cache-DiT | Block-level | Skip individual transformer blocks dynamically | Advanced, higher speedup |
| TeaCache | Timestep-level | Skip entire denoising steps based on L1 similarity | Simple, built-in |
Cache-DiT
Cache-DiT provides block-level caching with advanced strategies such as DBCache and TaylorSeer. It can achieve up to 1.69x speedup. See Cache-DiT for detailed configuration.
Quick Start
Key Features
- DBCache: Dynamic block-level caching based on residual differences
- TaylorSeer: Taylor expansion-based calibration for optimized caching
- SCM: Step-level computation masking for additional speedup
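The two ideas behind DBCache and TaylorSeer can be illustrated with a small, framework-free sketch. It is not Cache-DiT's actual implementation or API. The class `ResidualBlockCache`, the helpers `relative_l1` and `taylor_forecast`, and the notion of a "probe" residual taken from the first few blocks are all hypothetical names chosen for illustration, and plain Python lists stand in for tensors.

```python
from typing import List, Optional, Sequence


def relative_l1(current: Sequence[float], previous: Sequence[float]) -> float:
    """L1 distance between two vectors, normalized by the L1 norm of the previous one."""
    diff = sum(abs(c - p) for c, p in zip(current, previous))
    norm = sum(abs(p) for p in previous)
    return diff / norm if norm > 0 else float("inf")


class ResidualBlockCache:
    """Hypothetical block-level cache keyed on residual differences.

    Assumption: the first few transformer blocks are always computed and
    yield a cheap "probe" residual. If that probe barely changed since the
    last fully computed step, the remaining blocks are skipped and their
    cached residual is reused.
    """

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.prev_probe_residual: Optional[List[float]] = None
        self.cached_tail_residual: Optional[List[float]] = None

    def can_reuse(self, probe_residual: Sequence[float]) -> bool:
        # Nothing cached yet: the remaining blocks must be computed.
        if self.prev_probe_residual is None or self.cached_tail_residual is None:
            return False
        return relative_l1(probe_residual, self.prev_probe_residual) < self.threshold

    def update(self, probe_residual: Sequence[float], tail_residual: Sequence[float]) -> None:
        # Called after a fully computed step to refresh the cache.
        self.prev_probe_residual = list(probe_residual)
        self.cached_tail_residual = list(tail_residual)


def taylor_forecast(history: Sequence[Sequence[float]], steps_ahead: int = 1) -> List[float]:
    """First-order Taylor-style extrapolation from the last two cached values."""
    last, prev = history[-1], history[-2]
    return [l + steps_ahead * (l - p) for l, p in zip(last, prev)]
```

In this sketch a lower `threshold` trades speed for fidelity, since fewer steps qualify for reuse. The forecast function shows why a Taylor-style correction can help: instead of replaying a stale residual unchanged, it extrapolates along the recent trend.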
TeaCache
TeaCache (Temporal similarity-based caching) accelerates diffusion inference by detecting when consecutive denoising steps are similar enough to skip computation entirely. See TeaCache for detailed documentation.
Quick Overview
- Tracks the L1 distance between modulated inputs across timesteps
- When the accumulated distance is below the threshold, reuses the cached residual
- Supports CFG with separate positive/negative caches
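The skip decision described above can be sketched roughly as follows. This is a simplified illustration, not SGLang's or TeaCache's actual code. The names `TeaCacheSketch`, `BranchState`, and `should_skip` are hypothetical, plain lists stand in for tensors, and any rescaling of the raw distance that a real implementation may apply is omitted.

```python
from typing import Dict, List, Optional, Sequence


def relative_l1(current: Sequence[float], previous: Sequence[float]) -> float:
    """L1 distance between two vectors, normalized by the L1 norm of the previous one."""
    diff = sum(abs(c - p) for c, p in zip(current, previous))
    norm = sum(abs(p) for p in previous)
    return diff / norm if norm > 0 else float("inf")


class BranchState:
    """Cache state for one CFG branch (hypothetical helper)."""

    def __init__(self) -> None:
        self.prev_modulated_input: Optional[List[float]] = None
        self.accumulated_distance: float = 0.0
        self.cached_residual: Optional[List[float]] = None


class TeaCacheSketch:
    """Timestep-level skip decision with separate positive/negative caches."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.states: Dict[str, BranchState] = {
            "positive": BranchState(),
            "negative": BranchState(),
        }

    def should_skip(self, branch: str, modulated_input: Sequence[float]) -> bool:
        state = self.states[branch]
        if state.prev_modulated_input is None or state.cached_residual is None:
            # No history or no cached residual yet: compute this step.
            skip = False
            state.accumulated_distance = 0.0
        else:
            state.accumulated_distance += relative_l1(
                modulated_input, state.prev_modulated_input
            )
            skip = state.accumulated_distance < self.threshold
            if not skip:
                # Drift is too large: compute fully and restart accumulation.
                state.accumulated_distance = 0.0
        state.prev_modulated_input = list(modulated_input)
        return skip

    def store_residual(self, branch: str, residual: Sequence[float]) -> None:
        # Called after a fully computed step (model output minus model input).
        self.states[branch].cached_residual = list(residual)
```

A denoising loop would call `should_skip` once per branch per timestep. When it returns `True`, the loop adds the cached residual to the current input instead of running the transformer. Otherwise it runs the model and calls `store_residual`. Keeping the two branches in separate states prevents the conditional and unconditional passes from overwriting each other's history.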
Supported Models
- Wan (wan2.1, wan2.2)
- Hunyuan (HunyuanVideo)
- Z-Image
