DeepSeek Series Models
Low Latency
| Model | Hardware | Cards | Deploy Mode | Dataset | TPOT | Quantization | Configuration |
|---|---|---|---|---|---|---|---|
| DeepSeek-R1 | Atlas 800I A3 | 32 | PD Separation | 6K+1.6K | 20ms | W8A8 INT8 | Optimal Configuration |
| DeepSeek-R1 | Atlas 800I A3 | 32 | PD Separation | 3.9K+1K | 20ms | W8A8 INT8 | Optimal Configuration |
| DeepSeek-R1 | Atlas 800I A3 | 32 | PD Separation | 3.5K+1.5K | 20ms | W8A8 INT8 | Optimal Configuration |
| DeepSeek-R1 | Atlas 800I A3 | 32 | PD Separation | 3.5K+1K | 20ms | W8A8 INT8 | Optimal Configuration |
| DeepSeek-V3.2-Exp | Atlas 800I A3 | 32 | PD Separation | 64K+3K | 30ms | W8A8 INT8 | Optimal Configuration |
High Throughput
| Model | Hardware | Cards | Deploy Mode | Dataset | TPOT | Quantization | Configuration |
|---|---|---|---|---|---|---|---|
| DeepSeek-R1 | Atlas 800I A3 | 32 | PD Separation | 3.5K+1.5K | 50ms | W8A8 INT8 | Optimal Configuration |
| DeepSeek-R1 | Atlas 800I A3 | 8 | PD Mixed | 2K+2K | 50ms | W4A8 INT8 | Optimal Configuration |
| DeepSeek-R1 | Atlas 800I A3 | 16 | PD Separation | 2K+2K | 50ms | W4A8 INT8 | Optimal Configuration |
| DeepSeek-R1 | Atlas 800I A3 | 8 | PD Mixed | 3.5K+1.5K | 50ms | W4A8 INT8 | Optimal Configuration |
| DeepSeek-R1 | Atlas 800I A3 | 16 | PD Separation | 3.5K+1.5K | 50ms | W4A8 INT8 | Optimal Configuration |
Qwen Series Models
Low Latency
| Model | Hardware | Cards | Deploy Mode | Dataset | TPOT | Quantization | Configuration |
|---|---|---|---|---|---|---|---|
| Qwen3-235B-A22B | Atlas 800I A3 | 8 | PD Mixed | 11K+1K | 10ms | BF16 | Optimal Configuration |
| Qwen3-32B | Atlas 800I A3 | 4 | PD Mixed | 6K+1.5K | 18ms | BF16 | Optimal Configuration |
| Qwen3-32B | Atlas 800I A3 | 4 | PD Mixed | 4K+1.5K | 11ms | BF16 | Optimal Configuration |
| Qwen3-32B | Atlas 800I A3 | 8 | PD Mixed | 18K+4K | 12ms | BF16 | Optimal Configuration |
| Qwen3-32B | Atlas 800I A2 | 8 | PD Mixed | 6K+1.5K | 18ms | W8A8 INT8 | Optimal Configuration |
| Qwen3-32B | Atlas 800I A2 | 8 | PD Mixed | 4K+1.5K | 11ms | BF16 | Optimal Configuration |
High Throughput
| Model | Hardware | Cards | Deploy Mode | Dataset | TPOT | Quantization | Configuration |
|---|---|---|---|---|---|---|---|
| Qwen3-235B-A22B | Atlas 800I A3 | 24 | PD Separation | 3.5K+1.5K | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3-235B-A22B | Atlas 800I A3 | 8 | PD Mixed | 3.5K+1.5K | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3-235B-A22B | Atlas 800I A3 | 8 | PD Mixed | 2K+2K | 100ms | W8A8 INT8 | Optimal Configuration |
| Qwen3-235B-A22B | Atlas 800I A3 | 8 | PD Mixed | 2K+2K | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3-235B-A22B | Atlas 800I A3 | 16 | PD Mixed | 2K+2K | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3-32B | Atlas 800I A3 | 2 | PD Mixed | 3.5K+1.5K | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3-32B | Atlas 800I A3 | 2 | PD Mixed | 2K+2K | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3-30B-A3B | Atlas 800I A3 | 1 | PD Mixed | 3.5K+1.5K | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3-Coder-480B-A35B-Instruct | Atlas 800I A3 | 24 | PD Separation | 3.5K+1.5K | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3-Coder-480B-A35B-Instruct | Atlas 800I A3 | 16 | PD Mixed | 3.5K+1.5K | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3-Coder-480B-A35B-Instruct | Atlas 800I A3 | 8 | PD Mixed | 3.5K+1.5K | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3-Next-80B-A3B-Instruct | Atlas 800I A3 | 2 | PD Mixed | 3.5K+1.5K | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3-32B | Atlas 800I A2 | 8 | PD Mixed | 3.5K+1.5K | 50ms | W8A8 INT8 | Optimal Configuration |
| Qwen3-32B | Atlas 800I A2 | 8 | PD Mixed | 2K+2K | 50ms | W8A8 INT8 | Optimal Configuration |
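The Dataset and TPOT columns in the tables above can be turned into concrete numbers with a little arithmetic. The sketch below is illustrative only and is not part of the benchmark tooling: the helper names (`parse_dataset`, `decode_rate`, `decode_time_s`) are hypothetical, it assumes that "K" means 1000 tokens, and it reads TPOT as time per output token, so the decode estimate ignores the time to first token.

```python
def parse_dataset(spec: str) -> tuple[int, int]:
    """Split a Dataset cell such as "3.5K+1.5K" into (input_tokens, output_tokens)."""

    def to_tokens(part: str) -> int:
        part = part.strip().upper()
        if part.endswith("K"):
            # Assumption: 1K = 1000 tokens (not 1024).
            return round(float(part[:-1]) * 1000)
        return int(part)

    input_part, output_part = spec.split("+")
    return to_tokens(input_part), to_tokens(output_part)


def decode_rate(tpot_ms: float) -> float:
    """Per-request decode speed (tokens per second) implied by a TPOT in milliseconds."""
    return 1000.0 / tpot_ms


def decode_time_s(output_tokens: int, tpot_ms: float) -> float:
    """Approximate decode time in seconds for one request, excluding time to first token."""
    return output_tokens * tpot_ms / 1000.0


if __name__ == "__main__":
    input_len, output_len = parse_dataset("3.5K+1.5K")
    print(input_len, output_len)           # input and output token counts
    print(decode_rate(50))                 # tokens per second per request at 50 ms TPOT
    print(decode_time_s(output_len, 50))   # seconds spent decoding one request
```

Under these assumptions, a 20ms row corresponds to about 50 output tokens per second for each request, while a 50ms row corresponds to about 20 tokens per second per request and trades per-request speed for higher overall throughput.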
Optimal Configuration
DeepSeek-R1 3_5K-1_5K 50ms on A3 32 Cards Separation Mode
Model: DeepSeek-R1
Hardware: Atlas 800I A3, 32 cards
Deploy Mode: PD Separation
Dataset: random
Input Output Length: 3.5K+1.5K
TPOT: 50ms
Model Deployment
Command
Command
Benchmark
We tested it with the RANDOM dataset.
Command
DeepSeek-R1 6K-1_6K 20ms on A3 32 Cards Separation Mode
Model: DeepSeek-R1
Hardware: Atlas 800I A3, 32 cards
Deploy Mode: PD Separation
Dataset: random
Input Output Length: 6K+1.6K
TPOT: 20ms
Model Deployment
Command
Command
Benchmark
We tested it with the RANDOM dataset.
Command
DeepSeek-R1 3_9K-1K 20ms on A3 32 Cards Separation Mode
Model: DeepSeek-R1
Hardware: Atlas 800I A3, 32 cards
Deploy Mode: PD Separation
Dataset: random
Input Output Length: 3.9K+1K
TPOT: 20ms
Model Deployment
Please refer to the "DeepSeek-R1 6K-1_6K 20ms on A3 32 Cards Separation Mode" section.
Benchmark
We tested it with the RANDOM dataset.
Command
DeepSeek-R1 3_5K-1_5K 20ms on A3 32 Cards Separation Mode
Model: DeepSeek-R1
Hardware: Atlas 800I A3, 32 cards
Deploy Mode: PD Separation
Dataset: random
Input Output Length: 3.5K+1.5K
TPOT: 20ms
Model Deployment
Please refer to the "DeepSeek-R1 6K-1_6K 20ms on A3 32 Cards Separation Mode" section.
Benchmark
We tested it with the RANDOM dataset.
Command
DeepSeek-R1 3_5K-1K 20ms on A3 32 Cards Separation Mode
Model: DeepSeek-R1
Hardware: Atlas 800I A3, 32 cards
Deploy Mode: PD Separation
Dataset: random
Input Output Length: 3.5K+1K
TPOT: 20ms
Model Deployment
Please refer to the "DeepSeek-R1 6K-1_6K 20ms on A3 32 Cards Separation Mode" section.
Benchmark
We tested it with the RANDOM dataset.
Command
DeepSeek-R1 2K-2K 50ms on A3 8 Cards Mixed Mode
Model: DeepSeek-R1
Hardware: Atlas 800I A3, 8 cards
Deploy Mode: PD Mixed
Dataset: random
Input Output Length: 2K+2K
TPOT: 50ms
Model Deployment
Command
Benchmark
We tested it with the RANDOM dataset.
Command
DeepSeek-R1 2K-2K 50ms on A3 16 Cards Separation Mode
Model: DeepSeek-R1
Hardware: Atlas 800I A3, 16 cards
Deploy Mode: PD Separation
Dataset: random
Input Output Length: 2K+2K
TPOT: 50ms
Model Deployment
Command
Command
Benchmark
We tested it with the RANDOM dataset.
Command
DeepSeek-R1 3_5K-1_5K 50ms on A3 8 Cards Mixed Mode
Model: DeepSeek-R1
Hardware: Atlas 800I A3, 8 cards
Deploy Mode: PD Mixed
Dataset: random
Input Output Length: 3.5K+1.5K
TPOT: 50ms
Model Deployment
Command
Benchmark
We tested it with the RANDOM dataset.
Command
DeepSeek-R1 3_5K-1_5K 50ms on A3 16 Cards Separation Mode
Model: DeepSeek-R1
Hardware: Atlas 800I A3, 16 cards
Deploy Mode: PD Separation
Dataset: random
Input Output Length: 3.5K+1.5K
TPOT: 50ms
Model Deployment
Command
Command
Benchmark
We tested it with the RANDOM dataset.
Command
DeepSeek-V3.2-Exp 64K-3K 30ms on A3 32 Cards Separation Mode
Model: DeepSeek-V3.2-Exp-W8A8
Hardware: Atlas 800I A3, 32 cards
Deploy Mode: PD Separation
Dataset: random
Input Output Length: 64K+3K
TPOT: 30ms
Model Deployment
Deploy Prefill Instance
Command
Command
Command
Benchmark
We tested it with the RANDOM dataset.
Command
Qwen3-235B-A22B 3_5K-1_5K 50ms on A3 24 Cards Separation Mode
Model: Qwen3-235B-A22B-W8A8
Hardware: Atlas 800I A3, 24 cards
Deploy Mode: PD Separation
Dataset: random
Input Output Length: 3.5K+1.5K
TPOT: 50ms
Model Deployment
Command
Command
Benchmark
We tested it with the RANDOM dataset.
Command
Qwen3-235B-A22B 3_5K-1_5K 50ms on A3 8 Cards Mixed Mode
Model: Qwen3-235B-A22B-W8A8
Hardware: Atlas 800I A3, 8 cards
Deploy Mode: PD Mixed
Dataset: random
Input Output Length: 3.5K+1.5K
TPOT: 50ms
Model Deployment
Command
Benchmark
We tested it with the RANDOM dataset.
Command
Qwen3-235B-A22B 2K-2K 100ms on A3 8 Cards Mixed Mode
Model: Qwen3-235B-A22B-W8A8
Hardware: Atlas 800I A3, 8 cards
Deploy Mode: PD Mixed
Dataset: random
Input Output Length: 2K+2K
TPOT: 100ms
Model Deployment
Command
Benchmark
We tested it with the RANDOM dataset.
Command
Qwen3-235B-A22B 2K-2K 50ms on A3 8 Cards Mixed Mode
Model: Qwen3-235B-A22B-W8A8
Hardware: Atlas 800I A3, 8 cards
Deploy Mode: PD Mixed
Dataset: random
Input Output Length: 2K+2K
TPOT: 50ms
Model Deployment
Command
Benchmark
We tested it with the RANDOM dataset.
Command
Qwen3-235B-A22B 2K-2K 50ms on A3 16 Cards Mixed Mode
Model: Qwen3-235B-A22B-W8A8
Hardware: Atlas 800I A3, 16 cards
Deploy Mode: PD Mixed
Dataset: random
Input Output Length: 2K+2K
TPOT: 50ms
Model Deployment
Command
Benchmark
We tested it with the RANDOM dataset.
Command
Qwen3-235B-A22B 11K-1K 10ms on A3 8 Cards Mixed Mode
Model: Qwen3-235B-A22B-W8A8
Hardware: Atlas 800I A3, 8 cards
Deploy Mode: PD Mixed
Dataset: random
Input Output Length: 11K+1K
TPOT: 10ms
Model Deployment
Command
Benchmark
We tested it with the RANDOM dataset.
Command
Qwen3-32B 6K-1_5K 18ms on A3 4 Cards Mixed Mode
Model: Qwen3-32B
Hardware: Atlas 800I A3, 4 cards
Deploy Mode: PD Mixed
Dataset: random
Input Output Length: 6K+1.5K
TPOT: 18ms
Model Deployment
Command
Benchmark
We tested it with the RANDOM dataset.
Command
Qwen3-32B 4K-1_5K 11ms on A3 4 Cards Mixed Mode
Model: Qwen3-32B
Hardware: Atlas 800I A3, 4 cards
Deploy Mode: PD Mixed
Dataset: random
Input Output Length: 4K+1.5K
TPOT: 11ms
Model Deployment
Command
Benchmark
We tested it with the RANDOM dataset.
Command
Qwen3-32B 18K-4K 12ms on A3 8 Cards Mixed Mode
Model: Qwen3-32B
Hardware: Atlas 800I A3, 8 cards
Deploy Mode: PD Mixed
Dataset: random
Input Output Length: 18K+4K
TPOT: 12ms
Model Deployment
Command
Benchmark
We tested it with the RANDOM dataset.
Command
Qwen3-32B 3_5K-1_5K 50ms on A3 2 Cards Mixed Mode
Model: Qwen3-32B
Hardware: Atlas 800I A3, 2 cards
Deploy Mode: PD Mixed
Dataset: random
Input Output Length: 3.5K+1.5K
TPOT: 50ms
Model Deployment
Command
Benchmark
We tested it with the RANDOM dataset.
Command
Qwen3-32B 2K-2K 50ms on A3 2 Cards Mixed Mode
Model: Qwen3-32B
Hardware: Atlas 800I A3, 2 cards
Deploy Mode: PD Mixed
Dataset: random
Input Output Length: 2K+2K
TPOT: 50ms
Model Deployment
Command
Benchmark
We tested it with the RANDOM dataset.
Command
Qwen3-30B-A3B 3_5K-1_5K 50ms on A3 1 Card Mixed Mode
Model: Qwen3-30B-A3B-Instruct-2507
Hardware: Atlas 800I A3, 1 card
Deploy Mode: PD Mixed
Dataset: random
Input Output Length: 3.5K+1.5K
TPOT: 50ms
Model Deployment
Command
Benchmark
We tested it with the RANDOM dataset.
Command
Qwen3-Coder-480B-A35B-Instruct 3_5K-1_5K 50ms on A3 24 Cards Separation Mode
Model: Qwen3-Coder-480B-A35B-Instruct
Hardware: Atlas 800I A3, 24 cards
Deploy Mode: PD Separation
Dataset: random
Input Output Length: 3.5K+1.5K
TPOT: 50ms
Model Deployment
Command
Command
Benchmark
We tested it with the RANDOM dataset.
Command
Qwen3-Coder-480B-A35B-Instruct 3_5K-1_5K 50ms on A3 16 Cards Mixed Mode
Model: Qwen3-Coder-480B-A35B-Instruct
Hardware: Atlas 800I A3, 16 cards
Deploy Mode: PD Mixed
Dataset: random
Input Output Length: 3.5K+1.5K
TPOT: 50ms
Model Deployment
Command
Benchmark
We tested it with the RANDOM dataset.
Command
Qwen3-Coder-480B-A35B-Instruct 3_5K-1_5K 50ms on A3 8 Cards Mixed Mode
Model: Qwen3-Coder-480B-A35B-Instruct
Hardware: Atlas 800I A3, 8 cards
Deploy Mode: PD Mixed
Dataset: random
Input Output Length: 3.5K+1.5K
TPOT: 50ms
Model Deployment
Command
Benchmark
We tested it with the RANDOM dataset.
Command
Qwen3-Next-80B-A3B-Instruct 3_5K-1_5K 50ms on A3 2 Cards Mixed Mode
Model: Qwen3-Next-80B-A3B-Instruct
Hardware: Atlas 800I A3, 2 cards
Deploy Mode: PD Mixed
Dataset: random
Input Output Length: 3.5K+1.5K
TPOT: 50ms
Model Deployment
Command
Benchmark
We tested it with the RANDOM dataset.
Command
Qwen3-32B 6K-1_5K 18ms on A2 8 Cards Mixed Mode
Model: Qwen3-32B
Hardware: Atlas 800I A2, 8 cards
Deploy Mode: PD Mixed
Dataset: random
Input Output Length: 6K+1.5K
TPOT: 18ms
Model Deployment
Command
Benchmark
We tested it with the RANDOM dataset.
Command
Qwen3-32B 4K-1_5K 11ms on A2 8 Cards Mixed Mode
Model: Qwen3-32B
Hardware: Atlas 800I A2, 8 cards
Deploy Mode: PD Mixed
Dataset: random
Input Output Length: 4K+1.5K
TPOT: 11ms
Model Deployment
Command
Benchmark
We tested it with the RANDOM dataset.
Command
Qwen3-32B 3_5K-1_5K 50ms on A2 8 Cards Mixed Mode
Model: Qwen3-32B
Hardware: Atlas 800I A2, 8 cards
Deploy Mode: PD Mixed
Dataset: random
Input Output Length: 3.5K+1.5K
TPOT: 50ms
Model Deployment
Command
Benchmark
We tested it with the RANDOM dataset.
Command
Qwen3-32B 2K-2K 50ms on A2 8 Cards Mixed Mode
Model: Qwen3-32B
Hardware: Atlas 800I A2, 8 cards
Deploy Mode: PD Mixed
Dataset: random
Input Output Length: 2K+2K
TPOT: 50ms
Model Deployment
Command
Benchmark
We tested it with the RANDOM dataset.
Command
