Overview
SGLang supports two complementary caching approaches:

| Strategy | Scope | Mechanism | Best For |
|---|---|---|---|
| Cache-DiT | Block-level | Skip individual transformer blocks dynamically | Advanced, higher speedup |
| TeaCache | Timestep-level | Skip entire denoising steps based on L1 similarity | Simple, built-in |
Cache-DiT
Cache-DiT provides block-level caching with advanced strategies such as DBCache and TaylorSeer. It can achieve up to a 1.69x speedup. See cache_dit.md for detailed configuration.
Quick Start
Key Features
- DBCache: Dynamic block-level caching based on residual differences
- TaylorSeer: Taylor expansion-based calibration for optimized caching
- SCM: Step-level computation masking for additional speedup
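To make the block-level idea behind DBCache more concrete, here is a minimal, pure-Python sketch of caching driven by residual differences. It is an illustration under stated assumptions, not Cache-DiT's actual implementation: `BlockCacheSketch`, `probe_blocks`, `threshold`, and the helper functions are hypothetical names, and the blocks are plain functions over lists of floats. The first few blocks always run; if their residual barely changed since the previous step, the residual cached for the remaining blocks is reused instead of running them.

```python
from typing import Callable, List, Optional

Vector = List[float]
Block = Callable[[Vector], Vector]


def run_blocks(blocks: List[Block], hidden: Vector) -> Vector:
    """Apply a sequence of blocks in order."""
    for block in blocks:
        hidden = block(hidden)
    return hidden


def relative_diff(current: Vector, previous: Vector) -> float:
    """L1 difference between two vectors, normalized by the previous one."""
    num = sum(abs(c - p) for c, p in zip(current, previous))
    den = sum(abs(p) for p in previous)
    return num / den if den > 0 else float("inf")


class BlockCacheSketch:
    """Hypothetical illustration of residual-difference block caching."""

    def __init__(self, blocks: List[Block], probe_blocks: int, threshold: float) -> None:
        self.front = blocks[:probe_blocks]   # always computed
        self.rest = blocks[probe_blocks:]    # skipped on a cache hit
        self.threshold = threshold
        self.prev_front_residual: Optional[Vector] = None
        self.cached_rest_residual: Optional[Vector] = None
        self.cache_hits = 0

    def forward(self, hidden: Vector) -> Vector:
        front_out = run_blocks(self.front, hidden)
        front_residual = [o - h for o, h in zip(front_out, hidden)]

        # Reuse is allowed only if a cache exists and the front residual is stable.
        can_reuse = (
            self.prev_front_residual is not None
            and self.cached_rest_residual is not None
            and relative_diff(front_residual, self.prev_front_residual) < self.threshold
        )
        self.prev_front_residual = front_residual

        if can_reuse:
            self.cache_hits += 1
            return [f + r for f, r in zip(front_out, self.cached_rest_residual)]

        # Cache miss: run the remaining blocks and refresh the cached residual.
        out = run_blocks(self.rest, front_out)
        self.cached_rest_residual = [o - f for o, f in zip(out, front_out)]
        return out
```

In this sketch a larger `threshold` trades accuracy for more cache hits. Consult cache_dit.md for the real configuration options.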
TeaCache
TeaCache (Temporal similarity-based caching) accelerates diffusion inference by detecting when consecutive denoising steps are similar enough to skip computation entirely. See teacache.md for detailed documentation.
Quick Overview
- Tracks L1 distance between modulated inputs across timesteps
- When the accumulated distance is below the threshold, reuses the cached residual and skips that step's computation
- Uses separate positive/negative caches for supported CFG model families
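The decision loop described above can be sketched in a few lines of pure Python. This is a simplified illustration under stated assumptions, not SGLang's implementation: `TeaCacheSketch`, `relative_l1`, and `threshold` are hypothetical names, tensors are modeled as lists of floats, and any per-model rescaling of the distance is omitted.

```python
from typing import Callable, List, Optional

Vector = List[float]


def relative_l1(current: Vector, previous: Vector) -> float:
    """L1 distance between two inputs, normalized by the previous input."""
    diff = sum(abs(c - p) for c, p in zip(current, previous))
    scale = sum(abs(p) for p in previous)
    return diff / scale if scale > 0 else float("inf")


class TeaCacheSketch:
    """Hypothetical illustration of timestep-level caching."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.accumulated = 0.0
        self.prev_input: Optional[Vector] = None
        self.cached_residual: Optional[Vector] = None
        self.skipped_steps = 0

    def step(
        self,
        hidden: Vector,
        modulated_input: Vector,
        transformer: Callable[[Vector], Vector],
    ) -> Vector:
        should_compute = True
        if self.prev_input is not None and self.cached_residual is not None:
            # Accumulate the change in the modulated input since the last step.
            self.accumulated += relative_l1(modulated_input, self.prev_input)
            should_compute = self.accumulated >= self.threshold
        self.prev_input = list(modulated_input)

        if should_compute:
            # Full forward pass: refresh the residual and reset the accumulator.
            output = transformer(hidden)
            self.cached_residual = [o - h for o, h in zip(output, hidden)]
            self.accumulated = 0.0
            return output

        # Similar enough: skip the transformer and reuse the cached residual.
        self.skipped_steps += 1
        return [h + r for h, r in zip(hidden, self.cached_residual)]
```

For a model using classifier-free guidance, one way to mirror the separate positive/negative caches is to keep one such object per branch, so that the conditional and unconditional passes never share a residual.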
Supported Models
- Wan2.1
- Z-Image
- Wan2.2: coefficients are not calibrated yet; enabling TeaCache is accepted but currently has no effect
- HunyuanVideo: not supported yet
