SGLang provides two complementary caching strategies for Diffusion Transformer (DiT) models. Both reduce denoising cost by skipping redundant computation, but they operate at different levels.
Overview
SGLang supports two complementary caching approaches:

| Strategy | Scope | Mechanism | Best For |
|---|---|---|---|
| Cache-DiT | Block-level | Skip individual transformer blocks dynamically | Advanced, higher speedup |
| TeaCache | Timestep-level | Skip entire denoising steps based on L1 similarity | Simple, built-in |
Cache-DiT
Cache-DiT provides block-level caching with advanced strategies like DBCache and TaylorSeer. It can achieve up to 1.69x speedup. See cache_dit.md for detailed configuration.
Quick Start
Key Features
- DBCache: Dynamic block-level caching based on residual differences
- TaylorSeer: Taylor expansion-based calibration for optimized caching
- SCM: Step-level computation masking for additional speedup
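To make the idea of residual-difference caching concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the Cache-DiT implementation: the class `BlockCacheSketch`, its `n_front` and `threshold` parameters, and the choice to probe with the first few blocks are all hypothetical, and plain lists stand in for tensors.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def relative_diff(current: Vector, previous: Vector) -> float:
    """L1 distance between two residuals, normalized by the previous one's L1 norm."""
    distance = sum(abs(c - p) for c, p in zip(current, previous))
    norm = sum(abs(p) for p in previous) or 1e-8  # avoid division by zero
    return distance / norm


class BlockCacheSketch:
    """Hypothetical block-level cache: always run the front blocks, maybe skip the rest."""

    def __init__(self, blocks: Sequence[Block], n_front: int, threshold: float):
        self.front = list(blocks[:n_front])  # always computed, used as a cheap probe
        self.rest = list(blocks[n_front:])   # candidates for skipping
        self.threshold = threshold           # assumed tolerance on the residual change
        self.prev_front_residual: Optional[Vector] = None
        self.cached_rest_residual: Optional[Vector] = None
        self.cache_hits = 0

    def forward(self, hidden: Vector) -> Vector:
        x = hidden
        for block in self.front:
            x = block(x)
        front_residual = [xi - hi for xi, hi in zip(x, hidden)]

        # A hit means the front residual barely changed since the previous step.
        hit = (
            self.prev_front_residual is not None
            and self.cached_rest_residual is not None
            and relative_diff(front_residual, self.prev_front_residual) < self.threshold
        )
        self.prev_front_residual = front_residual

        if hit:
            # Approximate the skipped blocks by replaying their cached residual.
            self.cache_hits += 1
            return [xi + ri for xi, ri in zip(x, self.cached_rest_residual)]

        y = x
        for block in self.rest:
            y = block(y)
        self.cached_rest_residual = [yi - xi for yi, xi in zip(y, x)]
        return y


# Toy model with three "blocks"; only the first one is always executed.
toy_blocks = [
    lambda v: [x + 1.0 for x in v],
    lambda v: [x * 2.0 for x in v],
    lambda v: [x + 3.0 for x in v],
]
block_cache = BlockCacheSketch(toy_blocks, n_front=1, threshold=0.05)
first_out = block_cache.forward([1.0, 2.0])   # full compute, fills the cache
second_out = block_cache.forward([1.1, 2.1])  # front residual unchanged, so a cache hit
```

On the second call the remaining blocks are not executed, so the result is an approximation of the full forward pass. The threshold trades accuracy for speed; see cache_dit.md for the actual configuration options.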
TeaCache
TeaCache (Temporal similarity-based caching) accelerates diffusion inference by detecting when consecutive denoising steps are similar enough to skip computation entirely. See teacache.md for detailed documentation.
Quick Overview
- Tracks the L1 distance between modulated inputs across timesteps
- While the accumulated distance stays below the threshold, reuses the cached residual instead of recomputing
- Supports classifier-free guidance (CFG) with separate positive and negative caches
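The skip decision described above can be sketched in a few lines of pure Python. This is a simplified illustration, not SGLang's implementation: the class `TeaCacheSketch`, its `threshold` field, and the `step` signature are hypothetical, plain lists stand in for tensors, and any rescaling of the distance that a real implementation may apply is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Vector = List[float]


def relative_l1(current: Vector, previous: Vector) -> float:
    """L1 distance between two inputs, normalized by the previous input's L1 norm."""
    distance = sum(abs(c - p) for c, p in zip(current, previous))
    norm = sum(abs(p) for p in previous) or 1e-8  # avoid division by zero
    return distance / norm


@dataclass
class TeaCacheSketch:
    """Hypothetical timestep-level cache for a single branch (e.g. the positive prompt)."""

    threshold: float                       # assumed skip tolerance
    accumulated: float = 0.0               # distance accumulated since the last full step
    prev_input: Optional[Vector] = None
    cached_residual: Optional[Vector] = None
    skipped_steps: List[int] = field(default_factory=list)

    def step(self, index: int, modulated_input: Vector, hidden: Vector,
             run_blocks: Callable[[Vector], Vector]) -> Vector:
        can_skip = False
        if self.prev_input is not None and self.cached_residual is not None:
            self.accumulated += relative_l1(modulated_input, self.prev_input)
            can_skip = self.accumulated < self.threshold
        self.prev_input = list(modulated_input)

        if can_skip:
            # Reuse the residual from the last fully computed step.
            self.skipped_steps.append(index)
            return [h + r for h, r in zip(hidden, self.cached_residual)]

        # Full computation: refresh the cached residual and reset the accumulator.
        output = run_blocks(hidden)
        self.cached_residual = [o - h for o, h in zip(output, hidden)]
        self.accumulated = 0.0
        return output


full_calls = []


def toy_transformer(hidden: Vector) -> Vector:
    """Stand-in for the expensive transformer stack; records each real call."""
    full_calls.append(1)
    return [2.0 * h for h in hidden]


tea_cache = TeaCacheSketch(threshold=0.1)
modulated_inputs = [[1.0, 1.0], [1.01, 1.01], [1.02, 1.02], [2.0, 2.0]]
step_outputs = [
    tea_cache.step(i, m, [1.0, 1.0], toy_transformer)
    for i, m in enumerate(modulated_inputs)
]
```

In this toy run the two middle steps change the modulated input by about 1% each, so they fall under the threshold and reuse the cached residual, while the large jump at the last step forces a full recomputation. For CFG, one such cache object per branch (positive and negative) would keep the two residuals separate.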
Supported Models
- Wan (wan2.1, wan2.2)
- Hunyuan (HunyuanVideo)
- Z-Image
