Prerequisites
Before starting, ensure the following:- NVIDIA Jetson AGX Orin Devkit is set up with JetPack 6.1 or later.
- CUDA Toolkit and cuDNN are installed.
- Verify that the Jetson AGX Orin is in high-performance mode:
Installing and Running SGLang with Jetson Containers
- Clone the jetson-containers repository
- Run the installation script
- Build the container image
- Run the container
- Using jetson-containers
- Using Docker manually
Running Inference
Launch the server:--dtype half --context-length 8192) are due to the limited computational resources in Nvidia jetson kit. A detailed explanation can be found in Server Arguments.
After launching the engine, refer to Chat completions to test the usability.
Running Quantization with TorchAO
TorchAO is suggested to NVIDIA Jetson Orin.--torchao-config int4wo-128 is also for memory efficiency.
