The SGLang Diffusion HTTP server implements an OpenAI-compatible API for image and video generation, as well as dynamic LoRA adapter management.

Prerequisites

  • Python 3.11+ if you plan to use the OpenAI Python SDK.
  • A running SGLang Diffusion server (see the CLI reference for launch instructions).

Start the server

SERVER_ARGS=(
  --model-path Wan-AI/Wan2.1-T2V-1.3B-Diffusers
  --text-encoder-cpu-offload
  --pin-cpu-memory
  --num-gpus 4
  --ulysses-degree 2
  --ring-degree 2
  --port 30010
)

sglang serve "${SERVER_ARGS[@]}"
  • --model-path — path to the model or HuggingFace model ID
  • --port — HTTP port to listen on (default: 30000)

Get model information

Endpoint: GET /models

Returns the model path, task type, pipeline configuration, and precision settings.
curl -sS -X GET "http://localhost:30010/models"
Response:
{
  "model_path": "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
  "task_type": "T2V",
  "pipeline_name": "wan_pipeline",
  "pipeline_class": "WanPipeline",
  "num_gpus": 4,
  "dit_precision": "bf16",
  "vae_precision": "fp16"
}
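The task_type field can be used to decide which API namespace a model serves. A minimal client-side sketch; the task-type strings and the mapping below are illustrative assumptions, not an exhaustive list:

```python
# Map the task_type reported by GET /models to a generation endpoint.
# The task-type strings here are assumptions for illustration.
def endpoint_for_task(task_type: str) -> str:
    mapping = {
        "T2I": "/v1/images/generations",  # text-to-image
        "T2V": "/v1/videos",              # text-to-video
        "I2V": "/v1/videos",              # image-to-video
    }
    try:
        return mapping[task_type]
    except KeyError:
        raise ValueError(f"unsupported task_type: {task_type}")

# The Wan2.1 model above reports "T2V"
print(endpoint_for_task("T2V"))  # /v1/videos
```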

Image generation

The server implements an OpenAI-compatible Images API under the /v1/images namespace.

Create an image

Endpoint: POST /v1/images/generations
import base64
from openai import OpenAI

client = OpenAI(api_key="sk-proj-1234567890", base_url="http://localhost:30010/v1")

img = client.images.generate(
    prompt="A calico cat playing a piano on stage",
    size="1024x1024",
    n=1,
    response_format="b64_json",
)

image_bytes = base64.b64decode(img.data[0].b64_json)
with open("output.png", "wb") as f:
    f.write(image_bytes)
If response_format=url is used and cloud storage is not configured, the API returns a relative URL like /v1/images/<IMAGE_ID>/content.
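Because the returned URL is relative, join it with the server's base address before downloading. For example, with Python's standard library (the image ID below is hypothetical):

```python
from urllib.parse import urljoin

server = "http://localhost:30010"
# Relative URL as returned when cloud storage is not configured
relative = "/v1/images/img_abc123/content"  # "img_abc123" is a made-up ID

full_url = urljoin(server, relative)
print(full_url)  # http://localhost:30010/v1/images/img_abc123/content
```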

Edit an image

Endpoint: POST /v1/images/edits

Accepts a multipart form upload with input images and a text prompt. Returns either a base64-encoded image or a URL.
curl -sS -X POST "http://localhost:30010/v1/images/edits" \
  -H "Authorization: Bearer sk-proj-1234567890" \
  -F "image=@local_input_image.png" \
  -F "url=image_url.jpg" \
  -F "prompt=A calico cat playing a piano on stage" \
  -F "size=1024x1024" \
  -F "response_format=b64_json"

Download image content

When response_format=url is used, the API returns a relative URL like /v1/images/<IMAGE_ID>/content.

Endpoint: GET /v1/images/{image_id}/content
curl -sS -L "http://localhost:30010/v1/images/<IMAGE_ID>/content" \
  -H "Authorization: Bearer sk-proj-1234567890" \
  -o output.png

Video generation

The server implements a subset of the OpenAI Videos API under the /v1/videos namespace.

Create a video

Endpoint: POST /v1/videos
from openai import OpenAI

client = OpenAI(api_key="sk-proj-1234567890", base_url="http://localhost:30010/v1")

video = client.videos.create(
    prompt="A calico cat playing a piano on stage",
    size="1280x720"
)
print(f"Video ID: {video.id}, Status: {video.status}")

List videos

Endpoint: GET /v1/videos
videos = client.videos.list()
for item in videos.data:
    print(item.id, item.status)

Download video content

Endpoint: GET /v1/videos/{video_id}/content
import time

# Poll for completion (video_id comes from the create call above)
while True:
    page = client.videos.list()
    item = next((v for v in page.data if v.id == video_id), None)
    if item and item.status == "completed":
        break
    time.sleep(5)

# Download content
resp = client.videos.download_content(video_id=video_id)
with open("output.mp4", "wb") as f:
    f.write(resp.read())

LoRA management

The server supports dynamic loading, merging, and unmerging of LoRA adapters.
  • Mutual exclusion: Only one LoRA can be merged (active) at a time.
  • Switching: To switch LoRAs, you must first unmerge the current one, then set the new one.
  • Caching: The server caches loaded LoRA weights in memory. Switching back to a previously loaded LoRA (same path) has negligible cost.

Set LoRA adapter

Loads one or more LoRA adapters and merges their weights into the model. Supports both a single LoRA (backward compatible) and multiple LoRA adapters.

Endpoint: POST /v1/set_lora

Parameters:
  • lora_nickname (string or list): a unique identifier for the LoRA adapter(s). Required.
  • lora_path (string or list): path to .safetensors file(s) or HuggingFace repo ID(s). Required for the first load; optional when re-activating a cached nickname.
  • target (string or list): which transformer(s) to apply the LoRA to: "all" (default), "transformer", "transformer_2", "critic".
  • strength (float or list): LoRA strength for the merge (default: 1.0). Values < 1.0 reduce the effect; values > 1.0 amplify it.
curl -X POST http://localhost:30010/v1/set_lora \
  -H "Content-Type: application/json" \
  -d '{
        "lora_nickname": "lora_name",
        "lora_path": "/path/to/lora.safetensors",
        "target": "all",
        "strength": 0.8
      }'
When using multiple LoRAs:
  • All list parameters (lora_nickname, lora_path, target, strength) must have the same length.
  • If target or strength is a single value, it will be applied to all LoRAs.
  • Multiple LoRAs applied to the same target will be merged in order.
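The list rules above can be sketched as a small client-side payload builder that broadcasts scalar target/strength values and rejects mismatched list lengths. This is an illustrative helper, not part of the API; the server performs its own validation:

```python
def build_set_lora_payload(nicknames, paths, target="all", strength=1.0):
    """Build a /v1/set_lora request body for one or more LoRAs."""
    if isinstance(nicknames, str):  # single-LoRA form, normalize to lists
        nicknames, paths = [nicknames], [paths]
    n = len(nicknames)
    if len(paths) != n:
        raise ValueError("lora_nickname and lora_path must have the same length")
    # Broadcast single values to all LoRAs, mirroring the API's behavior
    targets = [target] * n if isinstance(target, str) else list(target)
    strengths = [strength] * n if isinstance(strength, (int, float)) else list(strength)
    if len(targets) != n or len(strengths) != n:
        raise ValueError("target and strength lists must match the number of LoRAs")
    return {
        "lora_nickname": nicknames,
        "lora_path": paths,
        "target": targets,
        "strength": strengths,
    }

payload = build_set_lora_payload(
    ["style_a", "style_b"],
    ["/weights/a.safetensors", "/weights/b.safetensors"],
    strength=[1.0, 0.5],
)
```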

Merge LoRA weights

Manually merges the currently set LoRA weights into the base model.

Endpoint: POST /v1/merge_lora_weights

Parameters:
  • target (string): which transformer(s) to merge: "all" (default), "transformer", "transformer_2", "critic".
  • strength (float): LoRA strength for the merge (default: 1.0).
curl -X POST http://localhost:30010/v1/merge_lora_weights \
  -H "Content-Type: application/json" \
  -d '{"strength": 0.8}'
set_lora automatically performs a merge, so this endpoint is typically only needed if you have manually unmerged but want to re-apply the same LoRA without calling set_lora again.

Unmerge LoRA weights

Unmerges the currently active LoRA weights from the base model, restoring it to its original state. Call this before setting a different LoRA.

Endpoint: POST /v1/unmerge_lora_weights
curl -X POST http://localhost:30010/v1/unmerge_lora_weights \
  -H "Content-Type: application/json"

List LoRA adapters

Returns loaded LoRA adapters and the current application status per module.

Endpoint: GET /v1/list_loras
curl -sS -X GET "http://localhost:30010/v1/list_loras"
Response:
{
  "loaded_adapters": [
    { "nickname": "lora_a", "path": "/weights/lora_a.safetensors" },
    { "nickname": "lora_b", "path": "/weights/lora_b.safetensors" }
  ],
  "active": {
    "transformer": [
      {
        "nickname": "lora2",
        "path": "tarn59/pixel_art_style_lora_z_image_turbo",
        "merged": true,
        "strength": 1.0
      }
    ]
  }
}
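Clients can inspect this response to verify which adapter is merged before generating. A small sketch against the response shape above:

```python
def merged_nicknames(list_loras_response: dict) -> set:
    """Collect the nicknames of all currently merged adapters."""
    merged = set()
    for entries in list_loras_response.get("active", {}).values():
        for entry in entries:
            if entry.get("merged"):
                merged.add(entry["nickname"])
    return merged

resp = {
    "loaded_adapters": [
        {"nickname": "lora_a", "path": "/weights/lora_a.safetensors"},
    ],
    "active": {
        "transformer": [
            {"nickname": "lora_a", "merged": True, "strength": 1.0},
        ]
    },
}
print(merged_nicknames(resp))  # {'lora_a'}
```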

Example: switching LoRAs

  1. Set LoRA A
curl -X POST http://localhost:30010/v1/set_lora \
  -H "Content-Type: application/json" \
  -d '{"lora_nickname": "lora_a", "lora_path": "path/to/A"}'
  2. Generate with LoRA A
Run your image or video generation requests.
  3. Unmerge LoRA A
curl -X POST http://localhost:30010/v1/unmerge_lora_weights
  4. Set LoRA B
curl -X POST http://localhost:30010/v1/set_lora \
  -H "Content-Type: application/json" \
  -d '{"lora_nickname": "lora_b", "lora_path": "path/to/B"}'
  5. Generate with LoRA B
Run your image or video generation requests with the new adapter.
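The steps above can be sketched as an ordered list of HTTP calls that any client can execute. This helper only plans the calls; it is a client-side sketch, and the body shape follows the set_lora parameters documented above:

```python
def plan_lora_switch(new_nickname, new_path=None):
    """Return the (method, endpoint, body) calls needed to switch LoRAs."""
    set_body = {"lora_nickname": new_nickname}
    if new_path is not None:  # path is optional for a previously cached nickname
        set_body["lora_path"] = new_path
    return [
        ("POST", "/v1/unmerge_lora_weights", None),  # restore base weights first
        ("POST", "/v1/set_lora", set_body),          # load and merge the new LoRA
    ]

calls = plan_lora_switch("lora_b", "path/to/B")
```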

Output quality

Control output quality and compression for both image and video generation through the output-quality and output-compression parameters.

Parameters

  • output-quality (string): preset quality level. Default: "default".
  • output-compression (integer): direct compression level override (0-100). When provided, takes precedence over output-quality.
Quality presets:
  • "maximum": 100
  • "high": 90
  • "medium": 55
  • "low": 35
  • "default": auto (50 for video, 75 for image)
  • When both output-quality and output-compression are provided, output-compression takes precedence.
  • Quality settings apply to JPEG and video formats. PNG uses lossless compression and ignores these settings.
  • Lower compression values (or "low" quality preset) produce smaller files but may show visible artifacts.
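The precedence and preset rules can be expressed as a small resolver. This is a sketch mirroring the table above; the server's actual defaults and validation may differ:

```python
PRESETS = {"maximum": 100, "high": 90, "medium": 55, "low": 35}

def effective_compression(quality="default", compression=None, media="image"):
    """Resolve the compression level that would be applied."""
    if compression is not None:      # explicit output-compression override wins
        if not 0 <= compression <= 100:
            raise ValueError("output-compression must be in [0, 100]")
        return compression
    if quality == "default":         # auto: 50 for video, 75 for image
        return 50 if media == "video" else 75
    return PRESETS[quality]

print(effective_compression("high"))                 # 90
print(effective_compression("low", compression=80))  # 80 (override wins)
print(effective_compression(media="video"))          # 50
```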