MindSpore is a high-performance AI framework optimized for [Ascend NPUs](../hardware-platforms/ascend-npus/SGLang installation with NPUs support). This document explains how to run MindSpore models with SGLang.

Requirements

MindSpore currently supports only Ascend NPU devices. Users must first install the Ascend CANN software packages, which can be downloaded from the Ascend official website. The recommended version is 8.3.RC2.
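After installing CANN, you can sanity-check the setup before proceeding. The commands below assume the default CANN toolkit install path; adjust it for your system:

```shell
# Load the CANN environment (default toolkit install path; adjust if needed)
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# List visible NPU devices and driver status
npu-smi info
```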

Supported Models

Currently, the following models are supported:

- Qwen3: dense and MoE models
- DeepSeek V3/R1: DeepSeek V3 and R1 models
- More coming soon: additional models are on the way

Installation

Currently, MindSpore models are provided by a standalone package, sgl-mindspore. MindSpore support builds on SGLang's existing support for the Ascend NPU platform. Please first [install SGLang for Ascend NPU](../hardware-platforms/ascend-npus/SGLang installation with NPUs support), then install sgl-mindspore:
git clone https://github.com/mindspore-lab/sgl-mindspore.git
cd sgl-mindspore
pip install -e .

Run Model

SGLang-MindSpore currently supports Qwen3 and DeepSeek V3/R1 models. This document uses Qwen3-8B as an example.

Offline Inference

Use the following script for offline inference:
import sglang as sgl

# Initialize the engine with MindSpore backend
llm = sgl.Engine(
    model_path="/path/to/your/model",  # Local model path
    device="npu",                      # Use NPU device
    model_impl="mindspore",            # MindSpore implementation
    attention_backend="ascend",        # Attention backend
    tp_size=1,                         # Tensor parallelism size
    dp_size=1                          # Data parallelism size
)

# Generate text
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is"
]

sampling_params = {"temperature": 0, "top_p": 0.9}
outputs = llm.generate(prompts, sampling_params)

for prompt, output in zip(prompts, outputs):
    print(f"Prompt: {prompt}")
    print(f"Generated: {output['text']}")
    print("---")

Start Server

python3 -m sglang.launch_server \
    --model-path /path/to/your/model \
    --host 0.0.0.0 \
    --device npu \
    --model-impl mindspore \
    --attention-backend ascend \
    --tp-size 1 \
    --dp-size 1
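Once the server is up, it can be queried over HTTP. The sketch below targets SGLang's native /generate endpoint and assumes the server is listening on the default port 30000; the helper function names are illustrative, not part of SGLang:

```python
import json
import urllib.request


def build_payload(prompt, temperature=0.0, top_p=0.9):
    """Build the JSON body expected by SGLang's native /generate endpoint."""
    return {
        "text": prompt,
        "sampling_params": {"temperature": temperature, "top_p": top_p},
    }


def query_server(prompt, url="http://127.0.0.1:30000/generate"):
    """POST a prompt to a running SGLang server and return the generated text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]


# With the server above running:
# print(query_server("The capital of France is"))
```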

Troubleshooting

Debug Mode

Enable SGLang debug logging with the log-level argument:
python3 -m sglang.launch_server \
    --model-path /path/to/your/model \
    --host 0.0.0.0 \
    --device npu \
    --model-impl mindspore \
    --attention-backend ascend \
    --log-level DEBUG
Enable MindSpore info and debug logging by setting the GLOG_v environment variable (0 = DEBUG, 1 = INFO, 2 = WARNING, 3 = ERROR):
export GLOG_v=1

Explicitly Select Devices

Use the following environment variable to explicitly select the devices to use:
export ASCEND_RT_VISIBLE_DEVICES=4,5,6,7
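The runtime only sees the devices listed in the mask, renumbered from zero. A small sketch of the mapping (variable names are illustrative):

```python
import os

# With this mask set, the process sees 4 logical devices
os.environ["ASCEND_RT_VISIBLE_DEVICES"] = "4,5,6,7"

physical = [int(d) for d in os.environ["ASCEND_RT_VISIBLE_DEVICES"].split(",")]

# Logical device i inside the process maps to physical NPU physical[i]
for logical, phys in enumerate(physical):
    print(f"logical npu:{logical} -> physical NPU {phys}")
```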

Communication Environment Issues

In environments with special communication setups, users need to set the following environment variable:
export MS_ENABLE_LCCL=off # LCCL communication mode is not yet supported in SGLang-MindSpore

Protobuf Dependencies

In environments with a conflicting protobuf version, set the following environment variable to force the pure-Python protobuf implementation and avoid binary version mismatches:
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
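To confirm which protobuf backend is active, you can query protobuf's internal api_implementation module; note that the variable must be set before google.protobuf is first imported in the process:

```python
import os

# Must be set before google.protobuf is imported anywhere in the process
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

try:
    from google.protobuf.internal import api_implementation
    print("protobuf backend:", api_implementation.Type())
except ImportError:
    print("protobuf is not installed")
```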

Support

For MindSpore-specific issues, refer to the MindSpore documentation.