Performance Optimization - SGLang Documentation

This section covers the main performance levers for SGLang Diffusion: attention backends, caching acceleration, and profiling.

Overview

Optimization	Type	Description
Cache-DiT	Caching	Block-level caching with DBCache, TaylorSeer, and SCM
TeaCache	Caching	Timestep-level caching based on temporal similarity
Attention Backends	Kernel	Optimized attention implementations (FlashAttention, SageAttention, etc.)
Inference Batching	Scheduler	Request batching for native diffusion serving
Profiling	Diagnostics	PyTorch Profiler and Nsight Systems guidance

Start Here

Use Attention Backends to choose the best backend for your model and hardware.
Use Inference Batching to improve throughput for compatible concurrent requests.
Use Caching Acceleration to reduce denoising cost with Cache-DiT or TeaCache.
Use Profiling when you need to diagnose a bottleneck rather than guess.

Caching at a Glance

Cache-DiT provides block-level caching for diffusers pipelines and is the option to reach for when tuning for higher speedups.
TeaCache is timestep-level caching built into SGLang model families.
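
To make the idea of timestep-level caching more concrete, here is a toy sketch in plain Python. It is not SGLang's or TeaCache's actual implementation: the `TimestepCache` class, the `relative_l1` helper, and the `threshold` parameter are hypothetical names chosen for illustration. The sketch assumes the cache accumulates a relative L1 change between consecutive step inputs and reuses the previous output until that accumulated change crosses the threshold.

```python
from typing import Callable, List, Optional

Vector = List[float]


def relative_l1(current: Vector, previous: Vector) -> float:
    """Relative L1 change between two step inputs (hypothetical similarity metric)."""
    diff = sum(abs(c - p) for c, p in zip(current, previous))
    norm = sum(abs(p) for p in previous)
    return diff / norm if norm > 0 else float("inf")


class TimestepCache:
    """Toy timestep-level cache: reuse the last output while inputs stay similar."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # hypothetical tuning knob, not an SGLang option
        self.accumulated = 0.0      # drift accumulated since the last full compute
        self.prev_input: Optional[Vector] = None
        self.cached_output: Optional[Vector] = None
        self.full_computes = 0      # counts how often the expensive path ran

    def step(self, x: Vector, expensive_fn: Callable[[Vector], Vector]) -> Vector:
        if self.prev_input is not None and self.cached_output is not None:
            self.accumulated += relative_l1(x, self.prev_input)
            if self.accumulated < self.threshold:
                # Input barely changed since the last full compute: reuse the cache.
                self.prev_input = x
                return self.cached_output
        # First step, or accumulated drift too large: recompute and reset the counter.
        self.cached_output = expensive_fn(x)
        self.full_computes += 1
        self.accumulated = 0.0
        self.prev_input = x
        return self.cached_output
```

In this sketch a larger `threshold` skips more steps at the cost of reusing staler outputs, which mirrors the general speed versus quality trade-off of caching-based acceleration.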

Current Baseline Snapshot

For Ring SP benchmark details, see:

Ring SP Performance

References

Post-Processing

Ring SP Benchmark: Wan2.2-TI2V-5B (u1r2 vs Baseline)
