Install methods
- Pip or uv
- From source
- Docker
- Kubernetes
- Docker Compose
- SkyPilot
- AWS SageMaker
It is recommended to use uv for faster installation:
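A typical flow looks like the following (the `[all]` extra is illustrative; check the official docs for the exact package spec your setup needs):

```shell
# Upgrade pip, install uv, then install SGLang via uv's faster resolver
pip install --upgrade pip
pip install uv
uv pip install "sglang[all]"
```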
Quick fixes to common problems
Wrong torch version
In some cases (for example, GB200), the command above might install the wrong torch version (for example, the CPU build) due to dependency resolution. Reinstall the correct PyTorch as follows:
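One way to force the CUDA build is to reinstall from the PyTorch wheel index (the CUDA tag `cu129` below is an assumption; substitute the one matching your driver and platform):

```shell
# Reinstall torch from the CUDA wheel index, replacing the CPU build
uv pip install torch --index-url https://download.pytorch.org/whl/cu129 --force-reinstall
```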
CUDA 13 without Docker
If you do not have Docker access, install the matching sgl_kernel wheel for your architecture (aarch64 or x86_64) from the sgl-project whl releases after installing SGLang. Replace X.Y.Z with the sgl_kernel version required by your SGLang (you can find this by running `uv pip show sgl_kernel`).
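For example, to discover the version to substitute for X.Y.Z (assuming sgl_kernel is already present in your environment):

```shell
# Check which sgl_kernel version your SGLang installation expects
uv pip show sgl_kernel | grep -i version
```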
CUDA_HOME not set
Choose one of the following solutions:
- Set `CUDA_HOME` to your CUDA install root.
- Install FlashInfer first following the FlashInfer installation doc, then install SGLang as described above.
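For the first option, point `CUDA_HOME` at your CUDA toolkit root (the path `/usr/local/cuda` is the common default but may differ on your system):

```shell
# Point CUDA_HOME at the CUDA toolkit root so builds can find nvcc and headers
export CUDA_HOME=/usr/local/cuda
echo "$CUDA_HOME"
```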
Common notes
- FlashInfer is the default attention kernel backend. It only supports sm75 and above. If you encounter any FlashInfer-related issues on sm75+ devices (for example, T4, A10, A100, L4, L40S, H100), switch to other kernels by adding `--attention-backend triton --sampling-backend pytorch`, and open an issue on GitHub.
- To reinstall FlashInfer locally, use `pip3 install --upgrade flashinfer-python --force-reinstall --no-deps`, then delete the cache with `rm -rf ~/.cache/flashinfer`.
- When encountering `ptxas fatal : Value 'sm_103a' is not defined for option 'gpu-name'` on B300/GB300, fix it with `export TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas`.
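Putting the backend flags from the first note together, a server launch with the Triton kernels might look like this (the model path is a placeholder; use your own):

```shell
# Launch SGLang with the Triton attention backend and PyTorch sampling backend
# instead of the default FlashInfer kernels
python3 -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --attention-backend triton \
  --sampling-backend pytorch
```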
