
Performance & Runtime

Designed for low-latency, high-throughput inference with RadixAttention, prefix caching, and multi-GPU parallelism.

Models & Ecosystem

Broad support for Llama, Qwen, DeepSeek, and more. Compatible with Hugging Face and OpenAI APIs.

Extensive Hardware Support

Native support across hardware platforms, including NVIDIA and AMD GPUs, Intel Xeon CPUs, Google TPUs, and Ascend NPU accelerators.

Community & Training

Open-source with widespread adoption, powering 400k+ GPUs and integrated with major RL frameworks.
SGLang powers large-scale production deployments, generating trillions of tokens each day across more than 400,000 GPUs worldwide. It is hosted under the non-profit open-source organization LMSYS.

Get Started

SGLang is an inference framework built for production-grade serving. It is designed to deliver low-latency, high-throughput inference across a wide range of setups, from a single GPU to large distributed clusters.

Install SGLang

Install SGLang with pip, from source, or via Docker on your preferred hardware platform.
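The main install paths can be sketched as below; the exact extras, paths, and image tags may change over time, so treat these as illustrative and check the installation docs for your hardware platform.

```shell
# 1) pip install (common path for NVIDIA GPUs):
pip install "sglang[all]"

# 2) Install from source:
git clone https://github.com/sgl-project/sglang.git
cd sglang && pip install -e "python[all]"

# 3) Docker image:
docker pull lmsysorg/sglang:latest
```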

Quickstart

Launch your first model server and send requests in minutes with OpenAI-compatible APIs.
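A minimal sketch of that request flow, assuming a server launched locally on port 30000; the launch command, model name, and port here are illustrative, not prescriptive:

```python
import json
import urllib.request

# First, launch a server in a separate terminal (illustrative example):
#   python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --port 30000
BASE_URL = "http://localhost:30000/v1"  # OpenAI-compatible endpoint


def build_chat_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build an OpenAI-compatible /chat/completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def send(payload: dict) -> dict:
    """POST the payload to the running server and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_chat_request("meta-llama/Llama-3.1-8B-Instruct", "Say hello in one sentence.")
# With a server running:
#   print(send(payload)["choices"][0]["message"]["content"])
```

Because the endpoint speaks the OpenAI API, the official `openai` Python client also works by pointing its `base_url` at the local server.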

News and latest blogs

Highlights of SGLang at NVIDIA GTC 2026
Elastic EP in SGLang: Achieving Partial Failure Tolerance for DeepSeek MoE Deployments
ROCm Support for Miles: Large-Scale RL Post-Training on AMD Instinct™ GPUs
SGLang Adds Day-0 Support for NVIDIA Nemotron 3 Super for Building High-Efficiency Multi-Agent Systems
Unlocking 25x Inference Performance with SGLang on NVIDIA GB300 NVL72
Deploying DeepSeek on GB300 NVL72: Big Wins in Long-Context Inference

Learn more and join the community

Stay connected

Development roadmap to follow current priorities and upcoming work.
Weekly public development meeting to hear updates and join open discussions.
Slack for questions, feedback, and community support.
X (Twitter) and LinkedIn for project updates.
LMSYS blog for release notes, benchmarks, and technical deep dives.
Learning materials for blogs, slides, and videos.