System Settings section to ensure the clusters are running at maximum performance. Feel free to open an issue at sglang if you encounter any problems.
Component Version Mapping For SGLang
| Component | Version | How to Obtain |
|---|---|---|
| HDK | 25.3.RC1 | |
| CANN | 8.5.0 | Obtain Images |
| Pytorch Adapter | 7.3.0 | |
| MemFabric | 1.0.5 | pip install memfabric-hybrid==1.0.5 |
| Triton | 3.2.0 | pip install triton-ascend |
| Bisheng | 20251121 | |
| SGLang NPU Kernel | NA | |
Obtain CANN Image
You can obtain the dependencies for a specific version of CANN from an image.
Preparing the Running Environment
- Source
- Docker
Python Version
Only python==3.11 is supported currently. If you don't want to break the system's pre-installed Python, try installing with conda.
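The version requirement above can be checked before installing anything else. The following is a minimal sketch, not part of the official instructions: the helper name `is_python_311` and the environment name `sglang_npu` are hypothetical, and the conda commands in the comments assume conda is already installed.

```shell
# With conda available, an isolated environment could be created like this
# (the environment name "sglang_npu" is only a placeholder):
#   conda create --name sglang_npu python=3.11
#   conda activate sglang_npu

# Succeed only if the given interpreter (default: python) reports Python 3.11.x.
is_python_311() {
    "${1:-python}" --version 2>&1 | grep -q '^Python 3\.11\.'
}

# Usage:
#   is_python_311 && echo "Python 3.11 detected" || echo "unsupported Python version"
```

Checking the interpreter that is actually first on `PATH` helps catch the case where the conda environment was created but not activated.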
CANN
Before working with SGLang on Ascend, you need to install the CANN Toolkit, the Kernels operator package, and NNAL, version 8.3.RC2 or higher; see the installation guide.
MemFabric-Hybrid
If you want to use PD disaggregation mode, you need to install MemFabric-Hybrid. MemFabric-Hybrid is a drop-in replacement for the Mooncake Transfer Engine that enables KV cache transfer on Ascend NPU clusters.
PyTorch and PyTorch Framework Adaptor on Ascend
Install torch and torch_npu; see the installation guide.
Triton on Ascend
We provide our own implementation of Triton for Ascend. To install Triton on Ascend from nightly builds or from source, follow the installation guide.
SGLang Kernels NPU
We provide SGL kernels for Ascend NPU; see the installation guide.
DeepEP-compatible Library
We provide a DeepEP-compatible library as a drop-in replacement for deepseek-ai's DeepEP library; see the installation guide.
Installing SGLang from source
System Settings
CPU performance power scheme
The default power scheme on Ascend hardware is ondemand, which could affect performance; changing it to performance is recommended.
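One way to apply this is through the standard Linux cpufreq sysfs interface. The sketch below assumes the host exposes `scaling_governor` files under `/sys/devices/system/cpu`; the function name `set_performance_governor` and its optional root argument are hypothetical and exist only so the function can be tried against a scratch directory first.

```shell
# Switch every CPU core to the "performance" frequency governor.
# The optional argument overrides the sysfs root for a dry run; by default
# the real sysfs tree is used, and writing there requires root privileges.
set_performance_governor() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    for gov in "$cpu_root"/cpu*/cpufreq/scaling_governor; do
        [ -e "$gov" ] || continue   # skip when the glob matched nothing
        echo performance > "$gov"
    done
}

# Usage (as root):
#   set_performance_governor
#   cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
```

A change made this way does not survive a reboot, so it may need to be reapplied at startup.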
Disable NUMA balancing
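On Linux, automatic NUMA balancing is controlled by the `kernel.numa_balancing` sysctl. The following is a minimal sketch under that assumption; the function name `disable_numa_balancing` and its optional root argument are hypothetical, added so the function can be exercised against a scratch directory.

```shell
# Turn off automatic NUMA balancing by writing 0 to the kernel knob.
# The optional argument overrides the /proc/sys root for a dry run;
# writing to the real knob requires root privileges.
disable_numa_balancing() {
    knob="${1:-/proc/sys}/kernel/numa_balancing"
    if [ ! -e "$knob" ]; then
        echo "numa_balancing knob not found: $knob" >&2
        return 1
    fi
    echo 0 > "$knob"
}

# Usage (as root):
#   disable_numa_balancing
#   cat /proc/sys/kernel/numa_balancing
# Equivalent with sysctl:
#   sysctl -w kernel.numa_balancing=0
```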
Prevent swapping out system memory
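How eagerly Linux swaps out memory is governed by the `vm.swappiness` sysctl, where lower values make swapping less likely. The sketch below assumes that knob is the intended mechanism; the function name `set_swappiness`, the optional root argument, and the default value of 10 are assumptions for illustration, not values confirmed by this guide.

```shell
# Lower the kernel's tendency to swap by writing a small value to vm.swappiness.
# $1: swappiness value (default 10, an assumed example value)
# $2: optional /proc/sys root override for a dry run
set_swappiness() {
    value="${1:-10}"
    knob="${2:-/proc/sys}/vm/swappiness"
    case "$value" in
        *[!0-9]*) echo "swappiness must be a non-negative integer" >&2; return 1 ;;
    esac
    if [ ! -e "$knob" ]; then
        echo "swappiness knob not found: $knob" >&2
        return 1
    fi
    echo "$value" > "$knob"
}

# Usage (as root):
#   set_swappiness 10
#   cat /proc/sys/vm/swappiness
# Equivalent with sysctl:
#   sysctl -w vm.swappiness=10
```

Like the other runtime settings, this is lost on reboot unless it is also persisted, for example in a sysctl configuration file.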
