Supported model inventory
Pass the Hugging Face model ID to `--model-path` for `sglang generate` or `sglang serve`. Python API users can pass the same ID to the SGLang Diffusion model-loading helpers.
A checkpoint alias that is missing from these tables does not mean its model family is unsupported. The runtime registry may also accept detector-based aliases or local model directories that match a listed family.
Rows are grouped when a family shares the same runtime path or optimization support. Use the detailed matrix below when you need per-optimization compatibility.
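The Python helper below is a minimal sketch that assembles the argument list for either subcommand from a model ID or a local model directory. Only `--model-path` comes from this page. The helper name `build_sglang_cmd` is hypothetical. Any other arguments, such as prompts, output paths, or server options, are not covered here and must be supplied through `extra`.

```python
import shlex
from typing import List


def build_sglang_cmd(subcommand: str, model_path: str, *extra: str) -> List[str]:
    """Return an argv list for `sglang generate` or `sglang serve` (hypothetical helper).

    `model_path` may be a Hugging Face model ID or a local model directory.
    Only --model-path is documented on this page; pass everything else via `extra`.
    """
    if subcommand not in ("generate", "serve"):
        raise ValueError(f"unsupported subcommand: {subcommand!r}")
    return ["sglang", subcommand, "--model-path", model_path, *extra]


if __name__ == "__main__":
    cmd = build_sglang_cmd("serve", "Qwen/Qwen-Image")
    # Print a copy-pasteable shell command.
    print(shlex.join(cmd))
    # To launch it (requires the sglang CLI on PATH):
    # import subprocess
    # subprocess.run(cmd, check=True)
```

Keeping the command as an argv list avoids shell-quoting issues when it is later passed to `subprocess.run`.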
Image models
| Model family | Model IDs |
|---|---|
| FLUX | black-forest-labs/FLUX.1-dev, black-forest-labs/FLUX.2-dev, black-forest-labs/FLUX.2-dev-NVFP4, black-forest-labs/FLUX.2-klein-4B, black-forest-labs/FLUX.2-klein-9B, black-forest-labs/FLUX.2-klein-base-4B, black-forest-labs/FLUX.2-klein-base-9B |
| Z-Image | Tongyi-MAI/Z-Image, Tongyi-MAI/Z-Image-Turbo |
| Qwen-Image | Qwen/Qwen-Image, Qwen/Qwen-Image-2512, Qwen/Qwen-Image-Edit, Qwen/Qwen-Image-Edit-2509, Qwen/Qwen-Image-Edit-2511, Qwen/Qwen-Image-Layered |
| SD3 / SD3.5 | stabilityai/stable-diffusion-3-medium, stabilityai/stable-diffusion-3-medium-diffusers, stabilityai/stable-diffusion-3.5-medium, stabilityai/stable-diffusion-3.5-medium-diffusers, stabilityai/stable-diffusion-3.5-large, stabilityai/stable-diffusion-3.5-large-diffusers |
| SANA | Efficient-Large-Model/SANA1.5_1.6B_1024px_diffusers, Efficient-Large-Model/SANA1.5_4.8B_1024px_diffusers, Efficient-Large-Model/Sana_1600M_1024px_diffusers, Efficient-Large-Model/Sana_600M_1024px_diffusers, Efficient-Large-Model/Sana_1600M_512px_diffusers, Efficient-Large-Model/Sana_600M_512px_diffusers |
| FireRed-Image | FireRedTeam/FireRed-Image-Edit-1.0, FireRedTeam/FireRed-Image-Edit-1.1 |
| JoyAI-Image | jdopensource/JoyAI-Image-Edit-Diffusers |
| Other image pipelines | zai-org/GLM-Image, tencent/Hunyuan3D-2, baidu/ERNIE-Image, baidu/ERNIE-Image-Turbo, ideogram-ai/ideogram-4-fp8, ideogram-ai/ideogram-4-nf4, Comfy-Org/Ideogram-4 |
Wan2.2 TI2V 5B currently has known quality issues when used for I2V generation.
Optimization compatibility
The detailed video matrix uses these symbols:
- ✅ = Full compatibility
- ❌ = No compatibility
- ⭕ = Does not apply to this model
Detailed video optimization matrix
Video Generation Models
Optimization columns are abbreviated to keep the matrix readable:
- Tea = TeaCache
- Tile = Sliding Tile Attention
- Sage = Sage Attention
- VSA = Video Sparse Attention
- SLA = Sparse Linear Attention
- SageSLA = Sage Sparse Linear Attention
- SVG2 = Sparse Video Gen 2
- LA = Laser Attention
- BSA = Block Sparse Attention
- RF = Rain Fusion Attention
| Model Name | Hugging Face Model ID | Resolution | Tea | Tile | Sage | VSA | SLA | SageSLA | SVG2 | LA | BSA | RF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FastWan2.1 T2V 1.3B | FastVideo/FastWan2.1-T2V-1.3B-Diffusers | 480p | ⭕ | ⭕ | ⭕ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| FastWan2.2 TI2V 5B | FastVideo/FastWan2.2-TI2V-5B-FullAttn-Diffusers, FastVideo/FastWan2.2-TI2V-5B-Diffusers | 720p | ⭕ | ⭕ | ⭕ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Wan2.2 TI2V 5B | Wan-AI/Wan2.2-TI2V-5B-Diffusers | 720p | ⭕ | ⭕ | ✅ | ⭕ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| Wan2.2 T2V A14B | Wan-AI/Wan2.2-T2V-A14B-Diffusers, nvidia/Wan2.2-T2V-A14B-Diffusers-NVFP4 | 480p, 720p | ❌ | ❌ | ✅ | ⭕ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| Wan2.2 I2V A14B | Wan-AI/Wan2.2-I2V-A14B-Diffusers | 480p, 720p | ❌ | ❌ | ✅ | ⭕ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| HunyuanVideo | hunyuanvideo-community/HunyuanVideo | 720×1280, 544×960 | ❌ | ✅ | ✅ | ⭕ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ |
| FastHunyuan | FastVideo/FastHunyuan-diffusers | 720×1280, 544×960 | ❌ | ✅ | ✅ | ⭕ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ |
| Wan2.1 T2V 1.3B | Wan-AI/Wan2.1-T2V-1.3B-Diffusers | 480p | ✅ | ✅ | ✅ | ⭕ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ |
| Wan2.1 T2V 14B | Wan-AI/Wan2.1-T2V-14B-Diffusers | 480p, 720p | ✅ | ✅ | ✅ | ⭕ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ |
| Wan2.1 I2V 480P | Wan-AI/Wan2.1-I2V-14B-480P-Diffusers | 480p | ✅ | ✅ | ✅ | ⭕ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ |
| Wan2.1 I2V 720P | Wan-AI/Wan2.1-I2V-14B-720P-Diffusers | 720p | ✅ | ✅ | ✅ | ⭕ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ |
| TurboWan2.1 T2V 1.3B | IPostYellow/TurboWan2.1-T2V-1.3B-Diffusers | 480p | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ⭕ | ❌ | ❌ | ❌ |
| TurboWan2.1 T2V 14B | IPostYellow/TurboWan2.1-T2V-14B-Diffusers | 480p | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ⭕ | ❌ | ❌ | ❌ |
| TurboWan2.1 T2V 14B 720P | IPostYellow/TurboWan2.1-T2V-14B-720P-Diffusers | 720p | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ⭕ | ❌ | ❌ | ❌ |
| TurboWan2.2 I2V A14B | IPostYellow/TurboWan2.2-I2V-A14B-Diffusers | 720p | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ⭕ | ❌ | ❌ | ❌ |
| Wan2.1 Fun 1.3B InP | weizhou03/Wan2.1-Fun-1.3B-InP-Diffusers | 480p | ✅ | ✅ | ✅ | ⭕ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ |
| Helios Base | BestWishYsh/Helios-Base | 720p | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Helios Mid | BestWishYsh/Helios-Mid | 720p | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Helios Distilled | BestWishYsh/Helios-Distilled | 720p | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LTX-2 (one/two-stage/TI2V) | Lightricks/LTX-2 | 768×512, 1536×1024 | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LTX-2.3 (one/two-stage/TI2V/HQ) | Lightricks/LTX-2.3 | 768×512, 1536×1024, 1920×1088 (HQ default) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Cosmos3-Nano (T2V / I2V / T2I) | nvidia/Cosmos3-Nano | 720p, 480p, 1024×1024 (T2I) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Cosmos3-Super (T2V / I2V / T2I) | nvidia/Cosmos3-Super, nvidia/Cosmos3-Super-Text2Image, nvidia/Cosmos3-Super-Image2Video | 720p, 480p, 1024×1024 (T2I) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
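You can also query the matrix programmatically, for example to list which optimizations are marked ✅ for a given model, by parsing it as Markdown. This sketch embeds three rows copied from the matrix above. The helpers `parse_matrix` and `with_status` are hypothetical and only illustrate how to read the symbols.

```python
from typing import Dict, List, Set

# Header plus three rows copied from the video optimization matrix above.
MATRIX = """\
| Model Name | Hugging Face Model ID | Resolution | Tea | Tile | Sage | VSA | SLA | SageSLA | SVG2 | LA | BSA | RF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FastWan2.1 T2V 1.3B | FastVideo/FastWan2.1-T2V-1.3B-Diffusers | 480p | ⭕ | ⭕ | ⭕ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Wan2.1 T2V 1.3B | Wan-AI/Wan2.1-T2V-1.3B-Diffusers | 480p | ✅ | ✅ | ✅ | ⭕ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ |
| TurboWan2.1 T2V 1.3B | IPostYellow/TurboWan2.1-T2V-1.3B-Diffusers | 480p | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ⭕ | ❌ | ❌ | ❌ |
"""

FIRST_OPT_COLUMN = 3  # Model Name, Model ID and Resolution come first


def _cells(line: str) -> List[str]:
    # "| a | b |" -> ["a", "b"]
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_matrix(text: str) -> Dict[str, Dict[str, str]]:
    """Map model name -> {optimization column: symbol}."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    opt_names = _cells(lines[0])[FIRST_OPT_COLUMN:]
    matrix: Dict[str, Dict[str, str]] = {}
    for line in lines[2:]:  # skip the header and the |---| separator
        row = _cells(line)
        matrix[row[0]] = dict(zip(opt_names, row[FIRST_OPT_COLUMN:]))
    return matrix


def with_status(matrix: Dict[str, Dict[str, str]], model_name: str, symbol: str = "✅") -> Set[str]:
    """Optimization columns whose cell equals `symbol` (✅, ❌ or ⭕)."""
    return {opt for opt, cell in matrix[model_name].items() if cell == symbol}


if __name__ == "__main__":
    m = parse_matrix(MATRIX)
    print(sorted(with_status(m, "TurboWan2.1 T2V 1.3B")))
```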
- Wan2.2 TI2V 5B has some quality issues when performing I2V generation. We are working on a fix.
- SageSLA is based on SpargeAttn. Install it first with `pip install git+https://github.com/thu-ml/SpargeAttn.git --no-build-isolation`.
- LTX pipeline selection:
  - One-stage: `--pipeline-class-name LTX2Pipeline`
  - Two-stage: `--pipeline-class-name LTX2TwoStagePipeline`
  - Two-stage HQ: `--pipeline-class-name LTX2TwoStageHQPipeline` (HQ defaults to 1920×1088; you can still override `--width`/`--height`)
  - LTX-2 and LTX-2.3 support both T2V and TI2V (`--image-path`) on one-stage and two-stage pipelines (including HQ).
  - The spatial upsampler and distilled LoRA are auto-resolved from the model snapshot by default and can still be overridden with `--spatial-upsampler-path` and `--distilled-lora-path`.
  - For LTX models, the Resolution column uses output video `width×height` semantics, matching `sglang generate --width ... --height ...`.
- LTX-2 / LTX-2.3 two-stage pipelines also support `--ltx2-two-stage-device-mode {original,snapshot,resident}`:
  - `snapshot` is the default and recommended mode.
  - `resident` usually provides the best latency/throughput but uses much more VRAM.
  - `original` keeps the official two-stage semantics without the premerged stage-2 transformer path.
  - Example (one prior run): `original` 154.67 s, `snapshot` 114.05 s, `resident` 75.71 s; peak VRAM follows `original < snapshot < resident`.
- Cosmos3 ships in two sizes: `nvidia/Cosmos3-Nano` (8B) and `nvidia/Cosmos3-Super` (32B). Both share the same pipeline. The only difference is transformer depth and width, which is read from `transformer/config.json` at load time. A single checkpoint serves T2V, I2V (`--image-path`), and T2I (`--num-frames 1`).
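The LTX and Cosmos3 notes above come down to a handful of flags. This hypothetical helper module turns them into `sglang generate` argument lists. Every flag comes from the notes. The function names, the variant labels (`one-stage`, `two-stage`, `two-stage-hq`) and the choice to accept a device mode for the HQ pipeline are assumptions made for illustration. `first_frame.png` is a placeholder path. Prompts and other generation options are left out.

```python
import shlex
from typing import List, Optional

# Pipeline class names from the LTX notes above; the variant labels are made up here.
LTX_PIPELINES = {
    "one-stage": "LTX2Pipeline",
    "two-stage": "LTX2TwoStagePipeline",
    "two-stage-hq": "LTX2TwoStageHQPipeline",
}
DEVICE_MODES = ("original", "snapshot", "resident")


def ltx_cmd(
    model_path: str,
    variant: str,
    image_path: Optional[str] = None,
    width: Optional[int] = None,
    height: Optional[int] = None,
    device_mode: Optional[str] = None,
) -> List[str]:
    """Build an `sglang generate` argv for LTX-2 / LTX-2.3 (hypothetical helper)."""
    cmd = ["sglang", "generate", "--model-path", model_path,
           "--pipeline-class-name", LTX_PIPELINES[variant]]
    if image_path is not None:
        cmd += ["--image-path", image_path]  # TI2V instead of T2V
    if width is not None:
        cmd += ["--width", str(width)]  # output video width
    if height is not None:
        cmd += ["--height", str(height)]  # output video height
    if device_mode is not None:
        # Assumption: the HQ pipeline counts as "two-stage" for this flag.
        if variant == "one-stage":
            raise ValueError("device mode applies to two-stage pipelines only")
        if device_mode not in DEVICE_MODES:
            raise ValueError(f"unknown device mode: {device_mode!r}")
        cmd += ["--ltx2-two-stage-device-mode", device_mode]
    return cmd


def cosmos3_cmd(model_path: str, task: str, image_path: Optional[str] = None) -> List[str]:
    """Build an `sglang generate` argv for Cosmos3 T2V, I2V or T2I (hypothetical helper)."""
    cmd = ["sglang", "generate", "--model-path", model_path]
    if task == "t2i":
        cmd += ["--num-frames", "1"]  # one frame gives text-to-image
    elif task == "i2v":
        if image_path is None:
            raise ValueError("i2v requires image_path")
        cmd += ["--image-path", image_path]
    elif task != "t2v":
        raise ValueError(f"unknown task: {task!r}")
    return cmd


if __name__ == "__main__":
    print(shlex.join(ltx_cmd("Lightricks/LTX-2.3", "two-stage-hq",
                             image_path="first_frame.png", device_mode="resident")))
    print(shlex.join(cosmos3_cmd("nvidia/Cosmos3-Nano", "t2i")))
```

Leaving `device_mode` unset keeps the default (`snapshot`). In the single run quoted above, `snapshot` was roughly 1.36× faster than `original` (154.67 / 114.05), and `resident` was roughly 2.04× faster (154.67 / 75.71).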
Supported Components
SGLang Diffusion supports overriding individual pipeline components with `--<component>-path`. The value can be either a Hugging Face repo ID or a local component directory.
The same overrides can also be provided in config files through `component_paths.<component>`.
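Common Syntax
CLI: pass `--<component>-path` followed by a Hugging Face repo ID or a local component directory.
Config: set `component_paths.<component>` to the same value.
The `<component>` key should match the component name in the pipeline's `model_index.json` or the native pipeline's registered module name: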
| Component Type | Supported Keys | Notes |
|---|---|---|
| VAE | vae, video_vae, audio_vae | vae is the common image-generation override |
| Transformer / DiT | transformer, video_dit, audio_dit | transformer is the standard override for the main denoiser |
| Text / Preprocess | text_encoder, text_encoder_2, tokenizer, processor, image_processor | Replacement encoders often need matching preprocessing assets |
| Auxiliary | scheduler, spatial_upsampler, vocoder, connectors, dual_tower_bridge, image_encoder, vision_language_encoder | Only valid for pipelines that expose these components |
Known Component Repos
The table below lists concrete Hugging Face component repos that are already used in SGLang Diffusion docs or tests. It is not an exhaustive catalog of all compatible component repos.
| Base Model | Override Key | Example Repo | Notes |
|---|---|---|---|
| black-forest-labs/FLUX.2-dev | vae | black-forest-labs/FLUX.2-small-decoder | Decoder-only FLUX.2 VAE override |
| black-forest-labs/FLUX.2-dev | vae | fal/FLUX.2-Tiny-AutoEncoder | Existing tested custom VAE path |
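The flags listed on this page follow a regular pattern. The component key has its underscores turned into hyphens and is wrapped as `--<component>-path`, so `text_encoder_2` becomes `--text-encoder-2-path`. This pattern is inferred from the documented flags rather than taken from the CLI source. The sketch below applies it to the FLUX.2-dev VAE override from the table above and also builds the equivalent nested `component_paths` mapping. This page does not specify the on-disk config format, so the mapping is printed as JSON for display only. The helper names are hypothetical.

```python
import json
import shlex
from typing import Dict, List


def component_flag(key: str) -> str:
    """Map a component key to its CLI flag, e.g. text_encoder_2 -> --text-encoder-2-path.

    The rule is inferred from the flags listed on this page (an assumption).
    """
    return "--" + key.replace("_", "-") + "-path"


def override_args(overrides: Dict[str, str]) -> List[str]:
    """Flatten {component: repo_or_dir} into CLI arguments."""
    args: List[str] = []
    for key, path in overrides.items():
        args += [component_flag(key), path]
    return args


def override_config(overrides: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Nest overrides under component_paths, mirroring component_paths.<component>."""
    return {"component_paths": dict(overrides)}


if __name__ == "__main__":
    overrides = {"vae": "black-forest-labs/FLUX.2-small-decoder"}
    cmd = ["sglang", "generate", "--model-path", "black-forest-labs/FLUX.2-dev",
           *override_args(overrides)]
    print(shlex.join(cmd))
    # Shown as JSON for display only; the real config file format may differ.
    print(json.dumps(override_config(overrides), indent=2))
```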
VAE
- `--vae-path` is the common image-generation override.
- `--video-vae-path` and `--audio-vae-path` are only relevant for pipelines with separate video or audio VAEs.
Transformer / DiT
- `--transformer-path` is the standard override for the main denoising transformer.
- For quantized transformers, prefer `--transformer-path` or `--transformer-weights-path`; see `quantization.md`.
- `--video-dit-path` and `--audio-dit-path` are only for pipelines that split denoisers by modality.
Text Encoders and Preprocessors
- `--text-encoder-path` and `--text-encoder-2-path` override the primary and secondary text encoders.
- `--tokenizer-path`, `--processor-path`, and `--image-processor-path` are useful when the replacement encoder requires matching preprocessing assets.
Auxiliary Components
- `--scheduler-path` is only relevant when the pipeline exposes a scheduler component.
- `--spatial-upsampler-path` is mainly for two-stage pipelines such as `LTX2TwoStagePipeline`.
- `--vocoder-path`, `--connectors-path`, `--dual-tower-bridge-path`, `--image-encoder-path`, and `--vision-language-encoder-path` are only valid for pipelines that expose those components.
Notes
- Component overrides are only valid when the target pipeline actually uses that component.
- The override key should match the component name in the pipeline's `model_index.json` or the native pipeline's registered module name.
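An override key must match a component that the pipeline actually declares. A quick client-side check against a local Diffusers-style `model_index.json` can therefore catch typos before launch. This is a minimal sketch. It assumes the usual Diffusers layout, in which each component maps to a `[library, class_name]` pair and keys starting with `_` are metadata. It does not cover native pipelines that register module names without a `model_index.json`. The helper names and the toy pipeline in the demo are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable, Set, Union

PathLike = Union[str, Path]


def pipeline_components(model_dir: PathLike) -> Set[str]:
    """Component names declared in a Diffusers-style model_index.json."""
    index = json.loads((Path(model_dir) / "model_index.json").read_text(encoding="utf-8"))
    names = set()
    for key, value in index.items():
        if key.startswith("_"):
            continue  # metadata such as _class_name or _diffusers_version
        # Components are stored as [library, class_name]; [null, null] marks an empty slot.
        if isinstance(value, list) and len(value) == 2 and value[0] is not None:
            names.add(key)
    return names


def unknown_overrides(model_dir: PathLike, override_keys: Iterable[str]) -> Set[str]:
    """Override keys the pipeline does not declare (likely typos or unused components)."""
    return set(override_keys) - pipeline_components(model_dir)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # A toy model_index.json (hypothetical pipeline) standing in for a real snapshot.
        (Path(tmp) / "model_index.json").write_text(json.dumps({
            "_class_name": "ExamplePipeline",
            "vae": ["diffusers", "AutoencoderKL"],
            "transformer": ["diffusers", "ExampleTransformer"],
            "image_encoder": [None, None],
        }))
        print(sorted(unknown_overrides(tmp, ["vae", "image_encoder", "vocoder"])))
```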
Verified LoRA Examples
This section lists example LoRAs that have been explicitly tested and verified with each base model in the SGLang Diffusion pipeline. LoRAs that are not listed here are not necessarily incompatible.
In practice, most standard LoRAs are expected to work, especially those following common Diffusers or SD-style conventions.
The entries below simply reflect configurations that have been manually validated by the SGLang team.
Verified LoRAs by Base Model
| Base Model | Supported LoRAs |
|---|---|
| Wan2.2 | lightx2v/Wan2.2-Distill-Loras, Cseti/wan2.2-14B-Arcane_Jinx-lora-v1 |
| Wan2.1 | lightx2v/Wan2.1-Distill-Loras |
| Z-Image-Turbo | tarn59/pixel_art_style_lora_z_image_turbo, wcde/Z-Image-Turbo-DeJPEG-Lora |
| Qwen-Image | lightx2v/Qwen-Image-Lightning, flymy-ai/qwen-image-realism-lora, prithivMLmods/Qwen-Image-HeadshotX, starsfriday/Qwen-Image-EVA-LoRA |
| Qwen-Image-Edit | ostris/qwen_image_edit_inpainting, lightx2v/Qwen-Image-Edit-2511-Lightning |
| Flux | dvyio/flux-lora-simple-illustration, XLabs-AI/flux-furry-lora, XLabs-AI/flux-RealismLora |
Special requirements
Sliding Tile Attention
- Currently, only Hopper GPUs (H100s) are supported.
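Before enabling Sliding Tile Attention, you may want to confirm that the local GPU is a Hopper part. This sketch assumes that "Hopper" corresponds to CUDA compute capability 9.x, which is what the H100 reports (9.0). When PyTorch is installed, it uses `torch.cuda.get_device_capability` to read the capability. It is a hypothetical pre-check, not the check that SGLang Diffusion itself performs.

```python
from typing import Optional, Tuple

HOPPER_MAJOR = 9  # assumption: Hopper GPUs such as the H100 report compute capability 9.x


def is_hopper(capability: Tuple[int, int]) -> bool:
    """True if a (major, minor) CUDA compute capability belongs to Hopper."""
    major, _minor = capability
    return major == HOPPER_MAJOR


def local_gpu_capability(device: int = 0) -> Optional[Tuple[int, int]]:
    """Return the device's compute capability via PyTorch, or None if unavailable."""
    try:
        import torch
    except (ImportError, OSError):
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(device)


if __name__ == "__main__":
    cap = local_gpu_capability()
    if cap is None:
        print("No CUDA device visible; cannot check Sliding Tile Attention support.")
    else:
        print(f"compute capability {cap}: Hopper = {is_hopper(cap)}")
```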
