Prerequisites
- At least two Kubernetes nodes are required, each with eight H20 GPUs.
- Make sure your Kubernetes cluster has LWS correctly installed. If it hasn't been set up yet, please follow the installation instructions. Note: for LWS versions ≤ 0.5.x, you must use the Downward API to obtain LWS_WORKER_INDEX, as native support for this feature was introduced in v0.6.0.
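For older LWS versions, the worker index can be read from the pod label that LWS attaches to each pod. A hedged sketch of the Downward API wiring (verify the label key against your LWS version):

```yaml
# Sketch only: expose the LWS worker index as LWS_WORKER_INDEX via the
# Downward API (needed for LWS <= 0.5.x). The label key below is the one
# LWS sets on its pods; confirm it for your installed version.
env:
  - name: LWS_WORKER_INDEX
    valueFrom:
      fieldRef:
        fieldPath: metadata.labels['leaderworkerset.sigs.k8s.io/worker-index']
```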
Basic example
For the basic example documentation, refer to Deploy Distributed Inference Service with SGLang and LWS on GPUs. However, that document only covers the basic NCCL socket mode. In this section, we make some simple modifications to adapt the setup to the RDMA scenario.
RDMA RoCE case
- Check your env:
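As a hedged sketch (these are illustrative commands, not the original collapsed block), a quick environment check on each node might look like:

```shell
# Illustrative environment check: confirm GPUs and RDMA-capable NICs
# are visible on the node.
nvidia-smi            # list GPUs
ibdev2netdev          # map RDMA devices to network interfaces (OFED tool)
ip -br addr show      # interface names and addresses
```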
- Prepare the lws.yaml file for deploying on Kubernetes.
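As a hedged sketch of the general shape of such a config (the group name, image, and sizes below are placeholders, not the actual file), a LeaderWorkerSet for a two-node deployment looks roughly like:

```yaml
# Sketch only: minimal LeaderWorkerSet shape for a 2-node deployment.
# Replace the image and args with your actual SGLang launch configuration.
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: sglang
spec:
  replicas: 1
  leaderWorkerTemplate:
    size: 2                      # one leader + one worker node
    leaderTemplate:
      spec:
        hostNetwork: true        # avoid overlay-network overhead for NCCL
        containers:
          - name: sglang-leader
            image: <your-sglang-image>
            securityContext:
              privileged: true   # used here for RDMA device access
            resources:
              limits:
                nvidia.com/gpu: "8"
    workerTemplate:
      spec:
        hostNetwork: true
        containers:
          - name: sglang-worker
            image: <your-sglang-image>
            securityContext:
              privileged: true
            resources:
              limits:
                nvidia.com/gpu: "8"
```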
- Then run kubectl apply -f lws.yaml to deploy the service; the command prints confirmation output.
Wait for the leader pod's (sglang-0) status to change to 1/1, which indicates it is Ready.
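You can watch the pod status while waiting, for example (the label selector assumes the LeaderWorkerSet is named sglang, matching the sglang-0 pod name used here):

```shell
# Watch the pods created by the LeaderWorkerSet until sglang-0 shows 1/1.
kubectl get pods -l leaderworkerset.sigs.k8s.io/name=sglang -w
```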
You can use the command kubectl logs -f sglang-0 to view the logs of the leader node.
Once successful, the leader logs will show that the SGLang server has started.
Debug
- Set NCCL_DEBUG=TRACE to check whether it is an NCCL communication problem.
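One way to apply this is through the container env in lws.yaml, as a hedged sketch:

```yaml
# Sketch only: enable NCCL tracing via the container environment.
env:
  - name: NCCL_DEBUG
    value: TRACE
  - name: NCCL_DEBUG_SUBSYS     # optional: narrow the output, e.g. to INIT,NET
    value: INIT,NET
```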
RoCE scenario
- Please make sure that RDMA devices are available in the cluster environment.
- Please make sure that the nodes in the cluster have Mellanox NICs with RoCE support. In this example, we use Mellanox ConnectX-5 NICs, and the proper OFED driver has been installed. If not, please refer to the document Install OFED Driver to install the driver.
- Check your env:
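As an illustrative sketch (not the original collapsed command), confirming the Mellanox NICs are present might look like:

```shell
# Confirm Mellanox NICs exist at the PCI level and are mapped to netdevs.
lspci | grep -i mellanox
ibdev2netdev          # e.g. "mlx5_0 port 1 ==> eth2 (Up)"
```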
- Check the OFED driver:
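For example (ofed_info ships with MLNX_OFED):

```shell
# Print the installed OFED driver version.
ofed_info -s
```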
- Show RDMA link status and check IB devices:
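For example:

```shell
# Show RDMA link state and enumerate the IB devices.
rdma link show        # from iproute2; links should be ACTIVE
ibv_devices           # list RDMA devices (libibverbs)
ibstat                # per-port state; link layer should read Ethernet for RoCE
```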
- Test RoCE network speed on the host:
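A typical way to do this is with ib_write_bw from the perftest package; the device name and server IP below are placeholders for your environment:

```shell
# On host A, start the bandwidth test server:
ib_write_bw -d mlx5_0 --report_gbits
# On host B, run the client against host A:
ib_write_bw -d mlx5_0 --report_gbits <host-A-ip>
```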
- Check that RDMA is accessible in your container:
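For example, from inside the running leader pod (sglang-0, as used earlier):

```shell
# Verify RDMA devices are visible from inside the pod.
kubectl exec -it sglang-0 -- ibv_devinfo
```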
Keys to success
- In the YAML configuration above, pay attention to the NCCL environment variables. For older versions of NCCL, you should check the NCCL_IB_GID_INDEX setting.
- NCCL_SOCKET_IFNAME is also crucial, but in a containerized environment, this typically isn’t an issue.
- In some cases, it’s necessary to configure GLOO_SOCKET_IFNAME correctly.
- NCCL_DEBUG is essential for troubleshooting, but I’ve found that sometimes it doesn’t show error logs within containers. This could be related to the Docker image you’re using. You may want to try switching images if needed.
- Avoid using Docker images based on Ubuntu 18.04, as they tend to have compatibility issues.
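Putting the points above together, a hedged sketch of the relevant env section in lws.yaml (the interface name and GID index are placeholders that depend on your environment):

```yaml
# Sketch only: env vars discussed above; adjust values for your cluster.
env:
  - name: NCCL_IB_GID_INDEX     # relevant on older NCCL; pick the RoCE v2 GID
    value: "3"
  - name: NCCL_SOCKET_IFNAME    # NIC used for NCCL bootstrap
    value: eth0
  - name: GLOO_SOCKET_IFNAME    # NIC used by Gloo during distributed init
    value: eth0
  - name: NCCL_DEBUG            # raise to TRACE when troubleshooting
    value: INFO
```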
Remaining issues
- In Kubernetes, Docker, or Containerd environments, we use hostNetwork to prevent performance degradation.
- We utilize privileged mode, which isn’t secure. Additionally, in containerized environments, full GPU isolation cannot be achieved.
TODO
- Integrate with k8s-rdma-shared-dev-plugin.
