You can install SGLang using one of the methods below. This page primarily applies to common NVIDIA GPU platforms. For other or newer platforms, please refer to the dedicated pages for AMD GPUs, Intel Xeon CPUs, Google TPU, NVIDIA DGX Spark, NVIDIA Jetson, [Ascend NPUs](../hardware-platforms/ascend-npus/SGLang installation with NPUs support), and Intel XPU.

Install methods

It is recommended to use uv for faster installation:
pip install --upgrade pip
pip install uv
uv pip install "sglang"

Quick fixes to common problems

In some cases (for example, GB200), the command above might install a wrong torch version (for example, the CPU version) due to dependency resolution. Reinstall the correct PyTorch with the following:
uv pip install "torch" "torchvision" --extra-index-url https://download.pytorch.org/whl/cu129 --force-reinstall
If you do not have Docker access, install the matching sgl_kernel wheel from the sgl-project whl releases after installing SGLang. Replace X.Y.Z with the sgl_kernel version required by your SGLang installation (you can find it by running uv pip show sgl_kernel).

x86_64
uv pip install "https://github.com/sgl-project/whl/releases/download/vX.Y.Z/sgl_kernel-X.Y.Z+cu130-cp310-abi3-manylinux2014_x86_64.whl"
aarch64
uv pip install "https://github.com/sgl-project/whl/releases/download/vX.Y.Z/sgl_kernel-X.Y.Z+cu130-cp310-abi3-manylinux2014_aarch64.whl"
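The two wheel URLs above differ only in the architecture suffix, so a small sketch can derive the correct filename from the local machine (X.Y.Z remains a placeholder you must substitute with the version reported by uv pip show sgl_kernel):

```shell
# Derive the manylinux architecture tag from the local machine.
# X.Y.Z is still a placeholder for the sgl_kernel version you need.
ARCH=$(uname -m)   # x86_64 or aarch64
WHEEL="sgl_kernel-X.Y.Z+cu130-cp310-abi3-manylinux2014_${ARCH}.whl"
echo "https://github.com/sgl-project/whl/releases/download/vX.Y.Z/${WHEEL}"
```

Once the version is substituted, the printed URL can be passed to uv pip install as shown above.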
If installation fails because CUDA_HOME is not set (typically while building FlashInfer), choose one of the following solutions:
  1. Set CUDA_HOME to your CUDA install root:
export CUDA_HOME=/usr/local/cuda-<your-cuda-version>
  2. Install FlashInfer first following the FlashInfer installation doc, then install SGLang as described above.
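As a quick sanity check for the first solution, you can confirm that the directory you point CUDA_HOME at actually contains the nvcc compiler. This is a sketch; /usr/local/cuda is an assumed path, so adjust it to your installation:

```shell
# Assumed CUDA root; adjust the path (or version suffix) to your install.
export CUDA_HOME=/usr/local/cuda
if [ -x "${CUDA_HOME}/bin/nvcc" ]; then
    "${CUDA_HOME}/bin/nvcc" --version
else
    echo "nvcc not found under ${CUDA_HOME}; check CUDA_HOME" >&2
fi
```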

Common notes

  • FlashInfer is the default attention kernel backend. It only supports sm75 and above. If you encounter any FlashInfer-related issues on sm75+ devices (for example, T4, A10, A100, L4, L40S, H100), switch backends by adding --attention-backend triton --sampling-backend pytorch to the launch command, and open an issue on GitHub.
  • To reinstall FlashInfer locally, run pip3 install --upgrade flashinfer-python --force-reinstall --no-deps, then clear the cache with rm -rf ~/.cache/flashinfer.
  • When encountering ptxas fatal : Value 'sm_103a' is not defined for option 'gpu-name' on B300/GB300, fix it with export TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas.
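Before relying on the workaround above, it can help to verify that a ptxas binary actually exists at the exported path (a sketch using the same path as the note above):

```shell
# Point Triton at the CUDA toolkit's ptxas (B300/GB300 workaround).
export TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas
if [ -x "$TRITON_PTXAS_PATH" ]; then
    echo "using ptxas at $TRITON_PTXAS_PATH"
else
    echo "warning: no ptxas at $TRITON_PTXAS_PATH" >&2
fi
```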