Supported Models
This guide applies to the following models; you only need to change the model name during deployment. The examples below use MiniMax-M2:

System Requirements
The following are recommended configurations; actual requirements should be adjusted based on your use case:

- 4x 96GB GPUs: supports a context length of up to 400K tokens.
- 8x 144GB GPUs: supports a context length of up to 3M tokens.
Deployment with Python
4-GPU deployment command:
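The original command block is not reproduced here; the following is a minimal sketch of a 4-way tensor-parallel SGLang launch, assuming the model is pulled from the MiniMaxAI/MiniMax-M2 Hugging Face repository (verify flag names against your SGLang version with `python -m sglang.launch_server --help`):

```shell
# Sketch: launch an SGLang server with 4-way tensor parallelism.
# Flags shown are assumptions based on common SGLang usage, not the
# guide's original command.
python -m sglang.launch_server \
  --model-path MiniMaxAI/MiniMax-M2 \
  --tp-size 4 \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 30000
```

For the 8-GPU configuration described above, the same invocation would presumably use `--tp-size 8`.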
Testing Deployment
After startup, you can test the SGLang OpenAI-compatible API with the following command:
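The original test command is not included here; a sketch of a chat-completion request against the local server, assuming it listens on SGLang's default port 30000, might look like:

```shell
# Sketch: query the OpenAI-compatible /v1/chat/completions endpoint.
# The port and model name are assumptions; adjust to match your launch command.
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MiniMaxAI/MiniMax-M2",
    "messages": [
      {"role": "user", "content": "Hello, who are you?"}
    ],
    "max_tokens": 128
  }'
```

A JSON response containing a `choices` array indicates the server is up and serving requests.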
