Hierarchical KV Caching (HiCache)
SGLang HiCache Best Practices
HiCache System Design and Optimization
Runtime Attach/Detach HiCache Storage Backend (No Restart)