The SGLang Diffusion HTTP server implements an OpenAI-compatible API for image and video generation, as well as dynamic LoRA adapter management.
Prerequisites
- Python 3.11+ if you plan to use the OpenAI Python SDK.
- A running SGLang Diffusion server (see the CLI reference for launch instructions).
Start the server
SERVER_ARGS=(
--model-path Wan-AI/Wan2.1-T2V-1.3B-Diffusers
--text-encoder-cpu-offload
--pin-cpu-memory
--num-gpus 4
--ulysses-degree=2
--ring-degree=2
--port 30010
)
sglang serve "${SERVER_ARGS[@]}"
--model-path — path to the model or HuggingFace model ID
--port — HTTP port to listen on (default: 30000)
Endpoint: GET /models
Returns model path, task type, pipeline configuration, and precision settings.
curl -sS -X GET "http://localhost:30010/models"
Response:
{
"model_path": "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
"task_type": "T2V",
"pipeline_name": "wan_pipeline",
"pipeline_class": "WanPipeline",
"num_gpus": 4,
"dit_precision": "bf16",
"vae_precision": "fp16"
}
Image generation
The server implements an OpenAI-compatible Images API under the /v1/images namespace.
Create an image
Endpoint: POST /v1/images/generations
import base64
from openai import OpenAI
client = OpenAI(api_key="sk-proj-1234567890", base_url="http://localhost:30010/v1")
img = client.images.generate(
    prompt="A calico cat playing a piano on stage",
    size="1024x1024",
    n=1,
    response_format="b64_json",
)
image_bytes = base64.b64decode(img.data[0].b64_json)
with open("output.png", "wb") as f:
    f.write(image_bytes)
If response_format=url is used and cloud storage is not configured, the API returns a relative URL like /v1/images/<IMAGE_ID>/content.
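Because the returned URL is relative, it must be joined with the server's base URL before downloading. A minimal sketch using Python's standard library (the image ID below is a placeholder, not a real ID):

```python
from urllib.parse import urljoin

def absolute_image_url(base_url: str, relative_url: str) -> str:
    # Ensure the base ends with "/" so urljoin keeps its path component.
    return urljoin(base_url.rstrip("/") + "/", relative_url.lstrip("/"))

# "abc123" stands in for the <IMAGE_ID> returned by the API.
print(absolute_image_url("http://localhost:30010", "/v1/images/abc123/content"))
# → http://localhost:30010/v1/images/abc123/content
```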
Edit an image
Endpoint: POST /v1/images/edits
Accepts a multipart form upload with input images and a text prompt. Returns either a base64-encoded image or a URL.
b64_json response
URL response
curl -sS -X POST "http://localhost:30010/v1/images/edits" \
-H "Authorization: Bearer sk-proj-1234567890" \
-F "image=@local_input_image.png" \
-F "url=image_url.jpg" \
-F "prompt=A calico cat playing a piano on stage" \
-F "size=1024x1024" \
-F "response_format=b64_json"
curl -sS -X POST "http://localhost:30010/v1/images/edits" \
-H "Authorization: Bearer sk-proj-1234567890" \
-F "image=@local_input_image.png" \
-F "url=image_url.jpg" \
-F "prompt=A calico cat playing a piano on stage" \
-F "size=1024x1024" \
-F "response_format=url"
Download image content
When response_format=url is used, the API returns a relative URL like /v1/images/<IMAGE_ID>/content.
Endpoint: GET /v1/images/{image_id}/content
curl -sS -L "http://localhost:30010/v1/images/<IMAGE_ID>/content" \
-H "Authorization: Bearer sk-proj-1234567890" \
-o output.png
Video generation
The server implements a subset of the OpenAI Videos API under the /v1/videos namespace.
Create a video
Endpoint: POST /v1/videos
from openai import OpenAI
client = OpenAI(api_key="sk-proj-1234567890", base_url="http://localhost:30010/v1")
video = client.videos.create(
    prompt="A calico cat playing a piano on stage",
    size="1280x720",
)
print(f"Video ID: {video.id}, Status: {video.status}")
List videos
Endpoint: GET /v1/videos
videos = client.videos.list()
for item in videos.data:
    print(item.id, item.status)
Download video content
Endpoint: GET /v1/videos/{video_id}/content
import time

video_id = video.id  # from the create call above

# Poll for completion
while True:
    page = client.videos.list()
    item = next((v for v in page.data if v.id == video_id), None)
    if item and item.status == "completed":
        break
    time.sleep(5)

# Download content
resp = client.videos.download_content(video_id=video_id)
with open("output.mp4", "wb") as f:
    f.write(resp.read())
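The polling loop above can be wrapped in a reusable helper with a timeout. This is a sketch, not server code: it assumes a terminal "failed" status exists alongside "completed", takes the listing call as an injected callable, and is shown over plain dicts with a stub instead of a live server.

```python
import time

def wait_for_video(list_videos, video_id, timeout_s=600, poll_interval_s=5):
    """Poll list_videos() until video_id reports 'completed'.

    list_videos is injected (e.g. lambda: client.videos.list().data,
    adapted to dicts) so the helper stays testable without a server.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        item = next((v for v in list_videos() if v["id"] == video_id), None)
        if item is not None:
            if item["status"] == "completed":
                return item
            if item["status"] == "failed":  # assumed terminal status
                raise RuntimeError(f"video {video_id} failed")
        time.sleep(poll_interval_s)
    raise TimeoutError(f"video {video_id} not completed within {timeout_s}s")

# Stubbed usage: the second poll reports completion.
responses = iter([
    [{"id": "vid_1", "status": "in_progress"}],
    [{"id": "vid_1", "status": "completed"}],
])
done = wait_for_video(lambda: next(responses), "vid_1", poll_interval_s=0)
print(done["status"])  # → completed
```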
LoRA management
The server supports dynamic loading, merging, and unmerging of LoRA adapters.
- Mutual exclusion: Only one LoRA can be merged (active) at a time.
- Switching: To switch LoRAs, you must first unmerge the current one, then set the new one.
- Caching: The server caches loaded LoRA weights in memory. Switching back to a previously loaded LoRA (same path) has negligible cost.
Set LoRA adapter
Loads one or more LoRA adapters and merges their weights into the model. Supports both single LoRA (backward compatible) and multiple LoRA adapters.
Endpoint: POST /v1/set_lora
Parameters:
| Parameter | Type | Description |
|---|---|---|
| lora_nickname | string or list | A unique identifier for the LoRA adapter(s). Required |
| lora_path | string or list | Path to .safetensors file(s) or HuggingFace repo ID(s). Required for first load; optional when re-activating a cached nickname |
| target | string or list | Which transformer(s) to apply the LoRA to: "all" (default), "transformer", "transformer_2", "critic" |
| strength | float or list | LoRA strength for merge (default: 1.0). Values < 1.0 reduce the effect, > 1.0 amplify it |
Single LoRA
Multiple LoRAs
Same target
curl -X POST http://localhost:30010/v1/set_lora \
-H "Content-Type: application/json" \
-d '{
"lora_nickname": "lora_name",
"lora_path": "/path/to/lora.safetensors",
"target": "all",
"strength": 0.8
}'
curl -X POST http://localhost:30010/v1/set_lora \
-H "Content-Type: application/json" \
-d '{
"lora_nickname": ["lora_1", "lora_2"],
"lora_path": ["/path/to/lora1.safetensors", "/path/to/lora2.safetensors"],
"target": ["transformer", "transformer_2"],
"strength": [0.8, 1.0]
}'
curl -X POST http://localhost:30010/v1/set_lora \
-H "Content-Type: application/json" \
-d '{
"lora_nickname": ["style_lora", "character_lora"],
"lora_path": ["/path/to/style.safetensors", "/path/to/character.safetensors"],
"target": "all",
"strength": [0.7, 0.9]
}'
When using multiple LoRAs:
- All list parameters (lora_nickname, lora_path, target, strength) must have the same length.
- If target or strength is a single value, it will be applied to all LoRAs.
- Multiple LoRAs applied to the same target will be merged in order.
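The rules above can be sketched as a small client-side helper (illustrative only, not part of the server API) that broadcasts scalar target/strength values and validates list lengths before posting the request body:

```python
def build_set_lora_payload(lora_nickname, lora_path, target="all", strength=1.0):
    """Normalize set_lora parameters: scalar target/strength values are
    broadcast to every LoRA; list parameters must all have the same length."""
    names = lora_nickname if isinstance(lora_nickname, list) else [lora_nickname]
    paths = lora_path if isinstance(lora_path, list) else [lora_path]
    n = len(names)
    targets = target if isinstance(target, list) else [target] * n
    strengths = strength if isinstance(strength, list) else [strength] * n
    if not (len(paths) == len(targets) == len(strengths) == n):
        raise ValueError(
            "lora_nickname, lora_path, target, strength must have the same length")
    return {"lora_nickname": names, "lora_path": paths,
            "target": targets, "strength": strengths}

# Mirrors the "same target" example above: one scalar target, two strengths.
payload = build_set_lora_payload(
    ["style_lora", "character_lora"],
    ["/path/to/style.safetensors", "/path/to/character.safetensors"],
    target="all",
    strength=[0.7, 0.9],
)
```

The resulting dict can be serialized with json.dumps and posted to /v1/set_lora.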
Merge LoRA weights
Manually merges the currently set LoRA weights into the base model.
Endpoint: POST /v1/merge_lora_weights
| Parameter | Type | Description |
|---|---|---|
| target | string | Which transformer(s) to merge: "all" (default), "transformer", "transformer_2", "critic" |
| strength | float | LoRA strength for merge (default: 1.0) |
curl -X POST http://localhost:30010/v1/merge_lora_weights \
-H "Content-Type: application/json" \
-d '{"strength": 0.8}'
set_lora automatically performs a merge, so this endpoint is typically only needed if you have manually unmerged but want to re-apply the same LoRA without calling set_lora again.
Unmerge LoRA weights
Unmerges the currently active LoRA weights from the base model, restoring it to its original state. Call this before setting a different LoRA.
Endpoint: POST /v1/unmerge_lora_weights
curl -X POST http://localhost:30010/v1/unmerge_lora_weights \
-H "Content-Type: application/json"
List LoRA adapters
Returns loaded LoRA adapters and current application status per module.
Endpoint: GET /v1/list_loras
curl -sS -X GET "http://localhost:30010/v1/list_loras"
Response:
{
  "loaded_adapters": [
    { "nickname": "lora_a", "path": "/weights/lora_a.safetensors" },
    { "nickname": "lora_b", "path": "/weights/lora_b.safetensors" }
  ],
  "active": {
    "transformer": [
      {
        "nickname": "lora_a",
        "path": "/weights/lora_a.safetensors",
        "merged": true,
        "strength": 1.0
      }
    ]
  }
}
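A small sketch for pulling the merged adapters out of this response (a hypothetical helper, shown against a literal payload rather than a live server):

```python
def merged_loras(list_loras_response):
    """Flatten the 'active' map from /v1/list_loras into
    (module, nickname, strength) tuples for merged adapters."""
    out = []
    for module, adapters in list_loras_response.get("active", {}).items():
        for adapter in adapters:
            if adapter.get("merged"):
                out.append((module, adapter["nickname"], adapter["strength"]))
    return out

resp = {"active": {"transformer": [
    {"nickname": "lora_a", "path": "/weights/lora_a.safetensors",
     "merged": True, "strength": 1.0}]}}
print(merged_loras(resp))  # → [('transformer', 'lora_a', 1.0)]
```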
Example: switching LoRAs
- Set LoRA A
curl -X POST http://localhost:30010/v1/set_lora \
  -H "Content-Type: application/json" \
  -d '{"lora_nickname": "lora_a", "lora_path": "path/to/A"}'
- Generate with LoRA A
Run your image or video generation requests.
- Unmerge LoRA A
curl -X POST http://localhost:30010/v1/unmerge_lora_weights
- Set LoRA B
curl -X POST http://localhost:30010/v1/set_lora \
  -H "Content-Type: application/json" \
  -d '{"lora_nickname": "lora_b", "lora_path": "path/to/B"}'
- Generate with LoRA B
Run your image or video generation requests with the new adapter.
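The unmerge-then-set sequence can be wrapped in a helper. This sketch injects the HTTP call as a callable (e.g. a thin wrapper over requests.post), so the ordering is testable without a server:

```python
def switch_lora(post, nickname, path=None):
    """Switch the merged LoRA: unmerge the current one, then set the new one.

    post(endpoint, body) is injected by the caller; in practice it would
    POST JSON to the server. path may be omitted for a cached nickname.
    """
    post("/v1/unmerge_lora_weights", None)
    body = {"lora_nickname": nickname}
    if path is not None:
        body["lora_path"] = path
    post("/v1/set_lora", body)

# Record the calls instead of hitting a server.
calls = []
switch_lora(lambda endpoint, body: calls.append((endpoint, body)),
            "lora_b", "path/to/B")
```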
Output quality
Control output quality and compression for both image and video generation through the output-quality and output-compression parameters.
Parameters
| Parameter | Type | Description |
|---|---|---|
| output-quality | string | Preset quality level. Default: "default" |
| output-compression | integer | Direct compression level override (0-100). When provided, takes precedence over output-quality |
Quality presets:
| Preset | Compression value |
|---|---|
| "maximum" | 100 |
| "high" | 90 |
| "medium" | 55 |
| "low" | 35 |
| "default" | Auto (50 for video, 75 for image) |
- When both output-quality and output-compression are provided, output-compression takes precedence.
- Quality settings apply to JPEG and video formats. PNG uses lossless compression and ignores these settings.
- Lower compression values (or the "low" quality preset) produce smaller files but may show visible artifacts.
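The precedence rules above can be expressed as a small resolver. This is an illustrative sketch mirroring the tables, not the server's implementation:

```python
PRESETS = {"maximum": 100, "high": 90, "medium": 55, "low": 35}

def effective_compression(media, output_quality="default", output_compression=None):
    """Resolve the compression value: an explicit output-compression (0-100)
    wins; otherwise the preset applies, with 'default' mapping to 50 for
    video and 75 for images."""
    if output_compression is not None:
        if not 0 <= output_compression <= 100:
            raise ValueError("output-compression must be in 0-100")
        return output_compression
    if output_quality == "default":
        return 50 if media == "video" else 75
    return PRESETS[output_quality]

print(effective_compression("image"))                 # → 75
print(effective_compression("video"))                 # → 50
print(effective_compression("image", "low"))          # → 35
print(effective_compression("image", "low", 90))      # → 90 (override wins)
```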