Multi-Node Deployment
Multi-Node Deployment
Deploy On Kubernetes
LWS Based PD Deploy
DeepSeekV32-Exp RBG Based PD Deploy
Deploying DeepSeek with PD Disaggregation on 96 H100 GPUs
Deploying Kimi K2 with PD Disaggregation on 128 H200 GPUs