## Optimized Model List
The following LLMs have been optimized on Intel GPU, and more are on the way:

| Model Name | BF16 |
|---|---|
| Llama-3.2-3B | meta-llama/Llama-3.2-3B-Instruct |
| Llama-3.1-8B | meta-llama/Llama-3.1-8B-Instruct |
| Qwen2.5-1.5B | Qwen/Qwen2.5-1.5B |
The model identifiers listed in the table above have been verified on Intel® Arc™ B580 Graphics.
## Installation
Currently, SGLang XPU only supports installation from source. Please refer to "Getting Started on Intel GPU" to install the XPU dependencies. The installation proceeds in the following steps:
- Creation & Activation
- Install PyTorch and Dependencies
- Cloning
- Configure Build File
- Build and Install
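The steps above can be sketched as shell commands. This is a hedged sketch, not the official procedure: the environment name, Python version, PyTorch XPU wheel index, and the `all_xpu` extras name are all assumptions, so follow the linked guide for the exact commands and versions.

```shell
# 1. Creation & Activation: a fresh environment (name/version are assumptions)
conda create -n sglang-xpu python=3.10 -y
conda activate sglang-xpu

# 2. Install PyTorch and Dependencies: XPU wheels (index URL may differ per the guide)
pip install torch --index-url https://download.pytorch.org/whl/xpu

# 3. Cloning: fetch the SGLang sources
git clone https://github.com/sgl-project/sglang.git
cd sglang

# 4. Configure Build File: adjust build options here if the guide requires it

# 5. Build and Install: editable install from source (extras name is an assumption)
pip install -e "python[all_xpu]"
```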
## Launching the Serving Engine
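A minimal launch sketch, assuming the standard `sglang.launch_server` entry point; the model path, host, port, and the `--device xpu` flag are assumptions for illustration, not the verified command.

```shell
# Serve one of the verified models on the Intel GPU (all flags are assumptions)
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --device xpu \
  --host 0.0.0.0 \
  --port 30000
```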
Launch SGLang serving with your model of choice, then benchmark it from a separate terminal.

## Benchmarking with Requests
You can benchmark the performance via the `bench_serving` script. Run the command in another terminal.
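A sketch of the benchmarking step, assuming the `sglang.bench_serving` module and a server already listening on port 30000; the flag values shown are assumptions.

```shell
# Run in a second terminal while the server is up (flag values are assumptions)
python -m sglang.bench_serving \
  --backend sglang \
  --host 127.0.0.1 --port 30000 \
  --num-prompts 100
```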
Alternatively, you can send requests to the server directly (e.g. with curl) or via your own script.
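For a single ad-hoc request, a curl call against SGLang's native `/generate` endpoint can look like the sketch below; the port, prompt, and sampling parameters are assumptions.

```shell
# Send one prompt to the running server (port and parameters are assumptions)
curl -s http://127.0.0.1:30000/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "What is the capital of France?", "sampling_params": {"max_new_tokens": 32}}'
```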