The SGLang Diffusion HTTP server implements an OpenAI-compatible API for image and video generation, as well as dynamic LoRA adapter management.

Prerequisites

  • Python 3.11+ if you plan to use the OpenAI Python SDK.
  • A running SGLang Diffusion server (see the CLI reference for launch instructions).

Start the server

SERVER_ARGS=(
  --model-path Wan-AI/Wan2.1-T2V-1.3B-Diffusers
  --text-encoder-cpu-offload
  --pin-cpu-memory
  --num-gpus 4
  --ulysses-degree 2
  --ring-degree 2
  --port 30010
)

sglang serve "${SERVER_ARGS[@]}"
  • --model-path — path to the model or HuggingFace model ID
  • --port — HTTP port to listen on (default: 30000)

Get model information

Endpoint: GET /models

Returns the model path, task type, pipeline configuration, and precision settings.
curl -sS -X GET "http://localhost:30010/models"
Response:
{
  "model_path": "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
  "task_type": "T2V",
  "pipeline_name": "wan_pipeline",
  "pipeline_class": "WanPipeline",
  "num_gpus": 4,
  "dit_precision": "bf16",
  "vae_precision": "fp16"
}
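The task_type field can be used to decide which API namespace a model serves. A minimal client-side sketch; the task-type strings and the mapping below are illustrative assumptions, not an exhaustive list:

```python
# Map the task_type reported by GET /models to a generation endpoint.
# The task-type strings here are assumptions for illustration.
def endpoint_for_task(task_type: str) -> str:
    mapping = {
        "T2I": "/v1/images/generations",  # text-to-image
        "T2V": "/v1/videos",              # text-to-video
        "I2V": "/v1/videos",              # image-to-video
    }
    try:
        return mapping[task_type]
    except KeyError:
        raise ValueError(f"unsupported task_type: {task_type}")

# The Wan2.1 model above reports "T2V"
print(endpoint_for_task("T2V"))  # /v1/videos
```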

Image generation

The server implements an OpenAI-compatible Images API under the /v1/images namespace.

Create an image

Endpoint: POST /v1/images/generations
import base64
from openai import OpenAI

client = OpenAI(api_key="sk-proj-1234567890", base_url="http://localhost:30010/v1")

img = client.images.generate(
    prompt="A calico cat playing a piano on stage",
    size="1024x1024",
    n=1,
    response_format="b64_json",
)

image_bytes = base64.b64decode(img.data[0].b64_json)
with open("output.png", "wb") as f:
    f.write(image_bytes)
If response_format=url is used and cloud storage is not configured, the API returns a relative URL like /v1/images/<IMAGE_ID>/content.
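Because the returned URL is relative, join it with the server's base address before downloading. For example, with Python's standard library (the image ID below is hypothetical):

```python
from urllib.parse import urljoin

server = "http://localhost:30010"
# Relative URL as returned when cloud storage is not configured
relative = "/v1/images/img_abc123/content"  # "img_abc123" is a made-up ID

full_url = urljoin(server, relative)
print(full_url)  # http://localhost:30010/v1/images/img_abc123/content
```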

Edit an image

Endpoint: POST /v1/images/edits

Accepts a multipart form upload with input images and a text prompt. Returns either a base64-encoded image or a URL.
curl -sS -X POST "http://localhost:30010/v1/images/edits" \
  -H "Authorization: Bearer sk-proj-1234567890" \
  -F "image=@local_input_image.png" \
  -F "url=image_url.jpg" \
  -F "prompt=A calico cat playing a piano on stage" \
  -F "size=1024x1024" \
  -F "response_format=b64_json"

Download image content

When response_format=url is used, the API returns a relative URL like /v1/images/<IMAGE_ID>/content.

Endpoint: GET /v1/images/{image_id}/content
curl -sS -L "http://localhost:30010/v1/images/<IMAGE_ID>/content" \
  -H "Authorization: Bearer sk-proj-1234567890" \
  -o output.png

Video generation

The server implements a subset of the OpenAI Videos API under the /v1/videos namespace.

Create a video

Endpoint: POST /v1/videos
from openai import OpenAI

client = OpenAI(api_key="sk-proj-1234567890", base_url="http://localhost:30010/v1")

video = client.videos.create(
    prompt="A calico cat playing a piano on stage",
    size="1280x720"
)
print(f"Video ID: {video.id}, Status: {video.status}")

List videos

Endpoint: GET /v1/videos
videos = client.videos.list()
for item in videos.data:
    print(item.id, item.status)

Download video content

Endpoint: GET /v1/videos/{video_id}/content
import time

# Poll for completion (video_id comes from the create call above)
while True:
    page = client.videos.list()
    item = next((v for v in page.data if v.id == video_id), None)
    if item and item.status == "completed":
        break
    time.sleep(5)

# Download content
resp = client.videos.download_content(video_id=video_id)
with open("output.mp4", "wb") as f:
    f.write(resp.read())

LoRA management

The server supports dynamic loading, merging, and unmerging of LoRA adapters.
  • Mutual exclusion: Only one LoRA can be merged (active) at a time.
  • Switching: To switch LoRAs, you must first unmerge the current one, then set the new one.
  • Caching: The server caches loaded LoRA weights in memory. Switching back to a previously loaded LoRA (same path) has negligible cost.

Set LoRA adapter

Loads one or more LoRA adapters and merges their weights into the model. Supports both a single LoRA (backward compatible) and multiple LoRA adapters.

Endpoint: POST /v1/set_lora

Parameters:
  • lora_nickname (string or list): a unique identifier for the LoRA adapter(s). Required.
  • lora_path (string or list): path to .safetensors file(s) or HuggingFace repo ID(s). Required for the first load; optional when re-activating a cached nickname.
  • target (string or list): which transformer(s) to apply the LoRA to: "all" (default), "transformer", "transformer_2", "critic".
  • strength (float or list): LoRA strength for the merge (default: 1.0). Values < 1.0 reduce the effect; values > 1.0 amplify it.
curl -X POST http://localhost:30010/v1/set_lora \
  -H "Content-Type: application/json" \
  -d '{
        "lora_nickname": "lora_name",
        "lora_path": "/path/to/lora.safetensors",
        "target": "all",
        "strength": 0.8
      }'
When using multiple LoRAs:
  • All list parameters (lora_nickname, lora_path, target, strength) must have the same length.
  • If target or strength is a single value, it will be applied to all LoRAs.
  • Multiple LoRAs applied to the same target will be merged in order.
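The list rules above can be sketched as a small client-side payload builder that broadcasts scalar target/strength values and rejects mismatched list lengths. This is an illustrative helper, not part of the API; the server performs its own validation:

```python
def build_set_lora_payload(nicknames, paths, target="all", strength=1.0):
    """Build a /v1/set_lora request body for one or more LoRAs."""
    if isinstance(nicknames, str):  # single-LoRA form, normalize to lists
        nicknames, paths = [nicknames], [paths]
    n = len(nicknames)
    if len(paths) != n:
        raise ValueError("lora_nickname and lora_path must have the same length")
    # Broadcast single values to all LoRAs, mirroring the API's behavior
    targets = [target] * n if isinstance(target, str) else list(target)
    strengths = [strength] * n if isinstance(strength, (int, float)) else list(strength)
    if len(targets) != n or len(strengths) != n:
        raise ValueError("target and strength lists must match the number of LoRAs")
    return {
        "lora_nickname": nicknames,
        "lora_path": paths,
        "target": targets,
        "strength": strengths,
    }

payload = build_set_lora_payload(
    ["style_a", "style_b"],
    ["/weights/a.safetensors", "/weights/b.safetensors"],
    strength=[1.0, 0.5],
)
```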

Merge LoRA weights

Manually merges the currently set LoRA weights into the base model.

Endpoint: POST /v1/merge_lora_weights

Parameters:
  • target (string): which transformer(s) to merge: "all" (default), "transformer", "transformer_2", "critic".
  • strength (float): LoRA strength for the merge (default: 1.0).
curl -X POST http://localhost:30010/v1/merge_lora_weights \
  -H "Content-Type: application/json" \
  -d '{"strength": 0.8}'
set_lora automatically performs a merge, so this endpoint is typically only needed if you have manually unmerged but want to re-apply the same LoRA without calling set_lora again.

Unmerge LoRA weights

Unmerges the currently active LoRA weights from the base model, restoring it to its original state. Call this before setting a different LoRA.

Endpoint: POST /v1/unmerge_lora_weights
curl -X POST http://localhost:30010/v1/unmerge_lora_weights \
  -H "Content-Type: application/json"

List LoRA adapters

Returns loaded LoRA adapters and the current application status per module.

Endpoint: GET /v1/list_loras
curl -sS -X GET "http://localhost:30010/v1/list_loras"
Response:
{
  "loaded_adapters": [
    { "nickname": "lora_a", "path": "/weights/lora_a.safetensors" },
    { "nickname": "lora_b", "path": "/weights/lora_b.safetensors" }
  ],
  "active": {
    "transformer": [
      {
        "nickname": "lora2",
        "path": "tarn59/pixel_art_style_lora_z_image_turbo",
        "merged": true,
        "strength": 1.0
      }
    ]
  }
}
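Clients can inspect this response to verify which adapter is merged before generating. A small sketch against the response shape above:

```python
def merged_nicknames(list_loras_response: dict) -> set:
    """Collect the nicknames of all currently merged adapters."""
    merged = set()
    for entries in list_loras_response.get("active", {}).values():
        for entry in entries:
            if entry.get("merged"):
                merged.add(entry["nickname"])
    return merged

resp = {
    "loaded_adapters": [
        {"nickname": "lora_a", "path": "/weights/lora_a.safetensors"},
    ],
    "active": {
        "transformer": [
            {"nickname": "lora_a", "merged": True, "strength": 1.0},
        ]
    },
}
print(merged_nicknames(resp))  # {'lora_a'}
```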

Example: switching LoRAs

  1. Set LoRA A
curl -X POST http://localhost:30010/v1/set_lora \
  -H "Content-Type: application/json" \
  -d '{"lora_nickname": "lora_a", "lora_path": "path/to/A"}'
  2. Generate with LoRA A
Run your image or video generation requests.
  3. Unmerge LoRA A
curl -X POST http://localhost:30010/v1/unmerge_lora_weights
  4. Set LoRA B
curl -X POST http://localhost:30010/v1/set_lora \
  -H "Content-Type: application/json" \
  -d '{"lora_nickname": "lora_b", "lora_path": "path/to/B"}'
  5. Generate with LoRA B
Run your image or video generation requests with the new adapter.
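The steps above can be sketched as an ordered list of HTTP calls that any client can execute. This helper only plans the calls; it is a client-side sketch, and the body shape follows the set_lora parameters documented above:

```python
def plan_lora_switch(new_nickname, new_path=None):
    """Return the (method, endpoint, body) calls needed to switch LoRAs."""
    set_body = {"lora_nickname": new_nickname}
    if new_path is not None:  # path is optional for a previously cached nickname
        set_body["lora_path"] = new_path
    return [
        ("POST", "/v1/unmerge_lora_weights", None),  # restore base weights first
        ("POST", "/v1/set_lora", set_body),          # load and merge the new LoRA
    ]

calls = plan_lora_switch("lora_b", "path/to/B")
```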

Output quality

Control output quality and compression for both image and video generation through the output-quality and output-compression parameters.

Parameters

  • output-quality (string): preset quality level. Default: "default".
  • output-compression (integer): direct compression level override (0-100). When provided, takes precedence over output-quality.
Quality presets:
  • "maximum": 100
  • "high": 90
  • "medium": 55
  • "low": 35
  • "default": auto (50 for video, 75 for image)
  • When both output-quality and output-compression are provided, output-compression takes precedence.
  • Quality settings apply to JPEG and video formats. PNG uses lossless compression and ignores these settings.
  • Lower compression values (or "low" quality preset) produce smaller files but may show visible artifacts.
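The precedence and preset rules can be expressed as a small resolver. This is a sketch mirroring the table above; the server's actual defaults and validation may differ:

```python
PRESETS = {"maximum": 100, "high": 90, "medium": 55, "low": 35}

def effective_compression(quality="default", compression=None, media="image"):
    """Resolve the compression level that would be applied."""
    if compression is not None:      # explicit output-compression override wins
        if not 0 <= compression <= 100:
            raise ValueError("output-compression must be in [0, 100]")
        return compression
    if quality == "default":         # auto: 50 for video, 75 for image
        return 50 if media == "video" else 75
    return PRESETS[quality]

print(effective_compression("high"))                 # 90
print(effective_compression("low", compression=80))  # 80 (override wins)
print(effective_compression(media="video"))          # 50
```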