
1. Model Introduction

Qwen-Image-Edit-2511 is an enhanced version of Qwen-Image-Edit-2509, featuring multiple improvements, including notably better consistency. Built upon the 20B Qwen-Image model, Qwen-Image-Edit-2511 extends Qwen-Image's unique text rendering capabilities to image editing tasks, enabling precise text editing. Key enhancements in Qwen-Image-Edit-2511:
  • Mitigated Image Drift: Reduces unwanted changes in non-edited regions of the image.
  • Improved Character Consistency: The model can perform imaginative edits based on an input portrait while preserving the identity and visual characteristics of the subject.
  • Multi-Person Consistency: Enhanced consistency in multi-person group photos, enabling high-fidelity fusion of two separate person images into a coherent group shot.
  • Integrated LoRA Capabilities: Selected popular community-created LoRAs are integrated directly into the base model, unlocking their effects without extra tuning (e.g., lighting enhancement, viewpoint generation).
  • Enhanced Industrial Design Generation: Special attention to practical engineering scenarios, including batch industrial product design and material replacement for industrial components.
  • Strengthened Geometric Reasoning: Stronger geometric reasoning capability for generating auxiliary construction lines for design or annotation purposes.
For more details, please refer to the official Qwen-Image-Edit-2511 HuggingFace page, the Blog, and the Tech Report.

2. SGLang-diffusion Installation

SGLang-diffusion offers multiple installation methods. You can choose the most suitable installation method based on your hardware platform and requirements. Please refer to the official SGLang-diffusion installation guide for installation instructions.

3. Model Deployment

This section provides deployment configurations optimized for different hardware platforms and use cases.

3.1 Basic Configuration

Qwen-Image-Edit-2511 is a 20B-parameter model optimized for image editing tasks. The recommended launch configuration varies by hardware platform; choose flags appropriate to your GPU count and available memory.
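As a starting point, a minimal single-GPU launch might look like the following (the port is illustrative; adjust flags for your hardware):

```shell
# Minimal single-GPU launch (illustrative; tune flags for your hardware)
sglang serve --model-path Qwen/Qwen-Image-Edit-2511 --port 30000
```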

3.2 Configuration Tips

All currently supported optimization options are listed below.
  • --vae-path: Path to a custom VAE model or HuggingFace model ID (e.g., fal/FLUX.2-Tiny-AutoEncoder). If not specified, the VAE will be loaded from the main model path.
  • --num-gpus: Number of GPUs to use
  • --tp-size: Tensor parallelism size (applies only to the text encoder; keep this at 1 when text encoder offload is enabled, since layer-wise offload with prefetch is faster)
  • --sp-degree: Sequence parallelism size (typically should match the number of GPUs)
  • --ulysses-degree: The degree of DeepSpeed-Ulysses-style SP in USP
  • --ring-degree: The degree of ring attention-style SP in USP
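For instance, a multi-GPU launch combining these options might look like the following (values are illustrative; this sketch assumes the usual USP convention that ulysses-degree × ring-degree equals sp-degree):

```shell
# Illustrative 4-GPU launch: sequence parallelism across all GPUs,
# using Ulysses-style SP only (ulysses-degree x ring-degree = sp-degree)
sglang serve --model-path Qwen/Qwen-Image-Edit-2511 \
    --num-gpus 4 \
    --sp-degree 4 \
    --ulysses-degree 4 \
    --ring-degree 1
```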

4. API Usage

For complete API documentation, please refer to the official API usage guide.

4.1 Edit an Image

Example
import base64
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:3000/v1")

response = client.images.edit(
    model="Qwen/Qwen-Image-Edit-2511",
    image=open("input.png", "rb"),
    prompt="Change the color of the taxi to black.",
    n=1,
    response_format="b64_json",
)

# Save the edited image
image_bytes = base64.b64decode(response.data[0].b64_json)
with open("output.png", "wb") as f:
    f.write(image_bytes)

4.2 Advanced Usage

4.2.1 Cache-DiT Acceleration

SGLang integrates Cache-DiT, a caching acceleration engine for Diffusion Transformers (DiT), achieving up to 7.4x inference speedup with minimal quality loss. Set SGLANG_CACHE_DIT_ENABLED=true to enable it. For more details, please refer to the SGLang Cache-DiT documentation.
Basic Usage
Command
SGLANG_CACHE_DIT_ENABLED=true sglang serve --model-path Qwen/Qwen-Image-Edit-2511
Advanced Usage
  • DBCache Parameters: DBCache controls block-level caching behavior:
| Parameter | Env Variable | Default | Description |
|---|---|---|---|
| Fn | SGLANG_CACHE_DIT_FN | 1 | Number of first blocks to always compute |
| Bn | SGLANG_CACHE_DIT_BN | 0 | Number of last blocks to always compute |
| W | SGLANG_CACHE_DIT_WARMUP | 4 | Warmup steps before caching starts |
| R | SGLANG_CACHE_DIT_RDT | 0.24 | Residual difference threshold |
| MC | SGLANG_CACHE_DIT_MC | 3 | Maximum continuous cached steps |
  • TaylorSeer Configuration: TaylorSeer improves caching accuracy using Taylor expansion:
| Parameter | Env Variable | Default | Description |
|---|---|---|---|
| Enable | SGLANG_CACHE_DIT_TAYLORSEER | false | Enable TaylorSeer calibrator |
| Order | SGLANG_CACHE_DIT_TS_ORDER | 1 | Taylor expansion order (1 or 2) |
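To make the warmup, threshold, and continuous-step parameters concrete, here is a minimal Python sketch of the kind of per-step decision a DBCache-style scheme makes. This is an illustration of the idea only, not SGLang's actual implementation; the function and variable names are hypothetical:

```python
def should_use_cache(residual, prev_residual, step, *,
                     warmup=4, rdt=0.24, max_continuous=3, cached_run=0):
    """Illustrative DBCache-style check (hypothetical, not SGLang's code).

    Reuse the cached block output when the relative change between the
    current and previous residuals falls below the threshold `rdt`.
    """
    if step < warmup:                 # W: always compute during warmup
        return False
    if cached_run >= max_continuous:  # MC: force a fresh compute
        return False
    # Relative L1 difference between consecutive residuals
    diff = sum(abs(a - b) for a, b in zip(residual, prev_residual))
    norm = sum(abs(b) for b in prev_residual) or 1.0
    return diff / norm < rdt          # R: residual difference threshold

# Example: near-identical residuals after warmup -> cache hit
print(should_use_cache([1.0, 2.0], [1.01, 2.0], step=6))  # expect True
```

Lower `rdt` values cache more conservatively (fewer steps skipped, higher fidelity); higher values skip more aggressively for extra speedup.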
Combined Configuration Example:
Command
SGLANG_CACHE_DIT_ENABLED=true \
SGLANG_CACHE_DIT_FN=2 \
SGLANG_CACHE_DIT_BN=1 \
SGLANG_CACHE_DIT_WARMUP=4 \
SGLANG_CACHE_DIT_RDT=0.4 \
SGLANG_CACHE_DIT_MC=4 \
SGLANG_CACHE_DIT_TAYLORSEER=true \
SGLANG_CACHE_DIT_TS_ORDER=2 \
sglang serve --model-path Qwen/Qwen-Image-Edit-2511

4.2.2 CPU Offload

  • --dit-cpu-offload: Use CPU offload for DiT inference. Enable this if you run out of GPU memory.
  • --text-encoder-cpu-offload: Use CPU offload for text encoder inference.
  • --image-encoder-cpu-offload: Use CPU offload for image encoder inference.
  • --vae-cpu-offload: Use CPU offload for VAE.
  • --pin-cpu-memory: Pin host memory for CPU offload. Add this only as a temporary workaround if offloading throws "CUDA error: invalid argument".
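If GPU memory is tight, these offload flags can be combined; for example (illustrative, offloading the heaviest components first):

```shell
# Illustrative memory-constrained launch: offload DiT, text encoder, and VAE
sglang serve --model-path Qwen/Qwen-Image-Edit-2511 \
    --dit-cpu-offload \
    --text-encoder-cpu-offload \
    --vae-cpu-offload
```

Each offload flag trades inference speed for lower peak GPU memory, so enable only as many as your hardware requires.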

5. Benchmark

Test Environment:
  • Hardware: NVIDIA B200 GPU (1x)
  • Model: Qwen/Qwen-Image-Edit-2511
  • SGLang-diffusion version: 0.5.6.post2

5.1 Speedup Benchmark

5.1.1 Edit an image

Server Command:
Command
sglang serve --model-path Qwen/Qwen-Image-Edit-2511 --port 30000
Benchmark Command:
Command
python3 -m sglang.multimodal_gen.benchmarks.bench_serving \
    --backend sglang-image --dataset vbench --task ti2i --num-prompts 1 --max-concurrency 1
Result:
Output
================= Serving Benchmark Result =================
Backend:                                 sglang-image
Model:                                   Qwen/Qwen-Image-Edit-2511
Dataset:                                 vbench
Task:                                    ti2i
--------------------------------------------------
Benchmark duration (s):                  35.31
Request rate:                            inf
Max request concurrency:                 1
Successful requests:                     1/1
--------------------------------------------------
Request throughput (req/s):              0.03
Latency Mean (s):                        35.3053
Latency Median (s):                      35.3053
Latency P99 (s):                         35.3053
--------------------------------------------------
Peak Memory Max (MB):                    47959.35
Peak Memory Mean (MB):                   47959.35
Peak Memory Median (MB):                 47959.35
============================================================

5.1.2 Edit an image with high concurrency

Benchmark Command:
Command
python3 -m sglang.multimodal_gen.benchmarks.bench_serving \
    --backend sglang-image --dataset vbench --task ti2i --num-prompts 20 --max-concurrency 20
Result:
Output
================= Serving Benchmark Result =================
Backend:                                 sglang-image
Model:                                   Qwen/Qwen-Image-Edit-2511
Dataset:                                 vbench
Task:                                    ti2i
--------------------------------------------------
Benchmark duration (s):                  286.11
Request rate:                            inf
Max request concurrency:                 20
Successful requests:                     20/20
--------------------------------------------------
Request throughput (req/s):              0.07
Latency Mean (s):                        150.0428
Latency Median (s):                      150.0600
Latency P99 (s):                         283.3843
--------------------------------------------------
Peak Memory Max (MB):                    47971.82
Peak Memory Mean (MB):                   47971.49
Peak Memory Median (MB):                 47971.29
============================================================