System Configuration
When using AMD GPUs (such as the MI300X), certain system-level optimizations help ensure stable performance. Here we take the MI300X as an example. AMD provides official documentation for MI300X optimization and system tuning:
- AMD MI300X Tuning Guides
- LLM inference performance validation on AMD Instinct MI300X
- AMD Instinct MI300X System Optimization
- AMD Instinct MI300X Workload Optimization
- Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X
We strongly recommend reading these docs and guides in full to get the most out of your system.
Update GRUB Settings
In `/etc/default/grub`, append the following to `GRUB_CMDLINE_LINUX`:
Then run `sudo update-grub` (or your distro’s equivalent) and reboot.
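As an illustration, AMD’s MI300X System Optimization guide recommends kernel parameters such as `iommu=pt`; treat the exact flag list as an assumption and confirm it against the guides linked above. A sketch of the edit:

```shell
# /etc/default/grub (excerpt) -- the exact parameters are an assumption;
# consult the AMD Instinct MI300X System Optimization guide for the full list.
GRUB_CMDLINE_LINUX="iommu=pt"
```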
Disable NUMA Auto-Balancing
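NUMA auto-balancing can be toggled through `/proc/sys/kernel/numa_balancing`; a minimal sketch, following AMD’s system optimization guidance:

```shell
# Check the current setting (1 = auto-balancing enabled)
cat /proc/sys/kernel/numa_balancing

# Disable NUMA auto-balancing (0 = off); requires root
sudo sh -c 'echo 0 > /proc/sys/kernel/numa_balancing'
```

Note that this setting does not persist across reboots unless applied via `sysctl` configuration or a boot-time script.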
Install SGLang
- Docker (Recommended)
- From Source
The docker images are available on Docker Hub at `lmsysorg/sglang`, built from `rocm.Dockerfile`.
- Build the docker image

  If you use pre-built images, you can skip this step and replace `sglang_image` with the pre-built image name in the steps below.
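A sketch of the build command, assuming you run it from the repository root and tag the result `sglang_image` to match the steps below:

```shell
# Build the ROCm image from rocm.Dockerfile; the tag name is our choice
docker build -t sglang_image -f rocm.Dockerfile .
```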
- Create a convenient alias
  If you are using RDMA, please note that:
  - `--network host` and `--privileged` are required by RDMA. If you don’t need RDMA, you can remove them.
  - You may need to set `NCCL_IB_GID_INDEX` if you are using RoCE, for example: `export NCCL_IB_GID_INDEX=3`.
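A sketch of such an alias; the device mounts (`/dev/kfd`, `/dev/dri`) and `--group-add video` are the standard flags for giving a container access to ROCm GPUs, while `--network host` and `--privileged` can be dropped as noted above if RDMA is not needed. The cache mount is an assumption for reusing downloaded model weights:

```shell
alias drun='docker run -it --rm \
    --device=/dev/kfd --device=/dev/dri \
    --group-add video \
    --ipc=host --shm-size 16G \
    --security-opt seccomp=unconfined \
    --network host --privileged \
    -v $HOME/.cache/huggingface:/root/.cache/huggingface'
```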
- Launch the server

  Replace `<secret>` below with your huggingface hub token.
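A sketch of launching the server inside the container; `drun` is the alias from the previous step, and the model path and port shown here are assumptions:

```shell
# <secret> is your huggingface hub token
drun -e HF_TOKEN=<secret> sglang_image \
    python3 -m sglang.launch_server \
    --model-path meta-llama/Llama-3.1-8B-Instruct \
    --host 0.0.0.0 --port 30000
```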
- Verify the installation
You can run a benchmark in another terminal or refer to other docs to send requests to the engine.
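For example, a minimal generate request with `curl`, assuming the server listens on port 30000 as in the launch step:

```shell
curl http://localhost:30000/generate \
    -H "Content-Type: application/json" \
    -d '{"text": "The capital of France is", "sampling_params": {"max_new_tokens": 16}}'
```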
Examples
Running DeepSeek-V3
The only difference when running DeepSeek-V3 is in how you start the server.
Running Llama3.1
Running Llama3.1 is nearly identical to running DeepSeek-V3. The only difference is in the model specified when starting the server.
Warmup Step
When the server displays `The server is fired up and ready to roll!`, it means the startup is successful.
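As an illustration of the two launch commands above; the tensor-parallel degree and `--trust-remote-code` (needed for DeepSeek-V3’s custom modeling code) are assumptions, so adjust them to your hardware:

```shell
# DeepSeek-V3: custom modeling code requires --trust-remote-code
python3 -m sglang.launch_server \
    --model-path deepseek-ai/DeepSeek-V3 \
    --tp 8 --trust-remote-code

# Llama3.1: an identical invocation, only the model changes
python3 -m sglang.launch_server \
    --model-path meta-llama/Llama-3.1-405B-Instruct \
    --tp 8
```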