1. Model Introduction
The DeepSeek-V3.2 series includes three model variants, each optimized for different use cases:

- DeepSeek-V3.2-Exp: an upgraded version of DeepSeek-V3.1-Terminus, introducing the DeepSeek Sparse Attention (DSA) mechanism through continued training. DSA is a fine-grained sparse attention mechanism powered by a lightning indexer, enabling significant efficiency improvements in long-context scenarios. As an intermediate step toward the next-generation architecture, V3.2-Exp is designed to explore and validate optimizations for training and inference efficiency on long contexts. Recommended for general conversations, long-context processing, and efficient inference.
- DeepSeek-V3.2: the standard version, suitable for general tasks and conversational scenarios. For local deployment, we recommend setting the sampling parameters to temperature = 1.0, top_p = 0.95. Recommended for standard conversations and general tasks.
- DeepSeek-V3.2-Speciale: a special variant designed exclusively for deep reasoning tasks, optimized for scenarios requiring complex logical reasoning and deep thinking. For local deployment, we recommend the same sampling parameters (temperature = 1.0, top_p = 0.95). Recommended for deep reasoning tasks, complex logical problems, and mathematical reasoning.

2. SGLang Installation
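SGLang exposes an OpenAI-compatible HTTP endpoint, so the recommended sampling parameters can be passed directly in the request body. A minimal sketch of assembling such a request follows; the endpoint URL and model name are illustrative assumptions, not fixed values:

```python
# Sketch: building a chat-completion request body with the recommended
# sampling parameters for local deployment (temperature = 1.0, top_p = 0.95).
# The model name and endpoint path are illustrative assumptions.
import json

def build_chat_request(model, messages, temperature=1.0, top_p=0.95):
    """Build the JSON body for an OpenAI-compatible chat completion call."""
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "top_p": top_p,
    }

payload = build_chat_request(
    "deepseek-ai/DeepSeek-V3.2",
    [{"role": "user", "content": "Hello"}],
)
# `body` is ready to POST to the server's /v1/chat/completions route.
body = json.dumps(payload)
```

In practice you would send `body` with any HTTP client (or use the `openai` Python SDK pointed at the local server).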
SGLang offers multiple installation methods; choose the one that best suits your hardware platform and requirements. Please refer to the official SGLang installation guide for instructions.

3. Model Deployment
This section provides a progressive guide from quick deployment to performance optimization, suitable for users at different levels.

3.1 Basic Configuration
Interactive Command Generator: use the configuration selector below to automatically generate the appropriate deployment command for your hardware platform, model variant, deployment strategy, and thinking capabilities.

3.2 Configuration Tips
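As an illustration of the kind of command the generator produces, a basic single-node launch with SGLang might look like the following sketch (model path, parallelism degree, host, and port are assumptions, not generated output):

```shell
# Sketch: single-node SGLang server for DeepSeek-V3.2-Exp.
# --tp 8 shards the model across 8 GPUs; adjust to your hardware.
# Host/port values are illustrative defaults.
python -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V3.2-Exp \
  --tp 8 \
  --host 0.0.0.0 \
  --port 30000
```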
For more detailed configuration tips, please refer to DeepSeek-V3.2 Usage.

4. Model Invocation
4.1 Basic Usage
For basic API usage and request examples, please refer to:

4.2 Advanced Usage
4.2.1 Reasoning Parser
DeepSeek-V3.2 supports reasoning mode. Enable the reasoning parser during deployment to separate the thinking and content sections.
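The separation itself happens server-side, with the thinking returned as a `reasoning_content` field alongside the final `content`. As a minimal sketch of what that separation does (the `<think>` delimiter shown here is an assumption for illustration):

```python
# Sketch of the split the reasoning parser performs server-side:
# text between <think>...</think> becomes `reasoning_content`,
# the remainder becomes `content`. Delimiter format is an assumption.
import re

def split_reasoning(text):
    """Separate a raw model response into reasoning and final content."""
    m = re.match(r"\s*<think>(.*?)</think>\s*(.*)", text, re.DOTALL)
    if m:
        return {
            "reasoning_content": m.group(1).strip(),
            "content": m.group(2).strip(),
        }
    return {"reasoning_content": None, "content": text.strip()}

out = split_reasoning("<think>2 + 2 equals 4.</think>The answer is 4.")
```

With the parser enabled at deployment, the API response already contains these two fields, so no client-side splitting is needed.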
4.2.2 Tool Calling
DeepSeek-V3.2 and DeepSeek-V3.2-Exp support tool calling. Enable the tool call parser at deployment time.

Note: DeepSeek-V3.2-Speciale does NOT support tool calling; it is designed exclusively for deep reasoning tasks.

Deployment Command:
Pass `--tool-call-parser deepseekv32` when launching the server.
Python Example (with Thinking Process):
- The reasoning parser shows how the model decides to use a tool
- Tool calls are clearly marked with the function name and arguments
- You can then execute the function and send the result back to continue the conversation
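To make that loop concrete, here is a minimal sketch of the client-side dispatch step: executing the function named in a parsed tool call and preparing the result to send back. The `get_weather` tool, its schema, and the returned values are hypothetical.

```python
# Sketch: dispatching a parsed tool call to a local function.
# The tool, schema, and values below are hypothetical examples.
import json

def get_weather(city):
    # Hypothetical local implementation of the tool.
    return {"city": city, "temp_c": 21}

# Tool schema advertised to the model in the request's `tools` field.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Simulated tool call as the server's tool-call parser would surface it:
# a function name plus JSON-encoded arguments.
tool_call = {"name": "get_weather", "arguments": json.dumps({"city": "Paris"})}

# Dispatch: look up the named function, parse its arguments, execute it.
registry = {"get_weather": get_weather}
result = registry[tool_call["name"]](**json.loads(tool_call["arguments"]))

# `result` would then be sent back as a `tool` role message so the model
# can continue the conversation with the function's output.
tool_message = {"role": "tool", "content": json.dumps(result)}
```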
5. Benchmark
5.1 Speed Benchmark
Test Environment:
- Hardware: 8× NVIDIA B200 GPUs
- Model: DeepSeek-V3.2-Exp
- Tensor Parallelism: 8
- sglang version: 0.5.6
5.1.1 Latency-Sensitive Benchmark
- Model Deployment Command:
- Benchmark Command:
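As an illustrative sketch only (flags and values are assumptions, not the exact configuration used for the reported numbers), a latency-oriented run against an already-running server might use SGLang's `bench_serving` tool with low concurrency:

```shell
# Sketch: latency-oriented benchmark against a local SGLang server.
# Low concurrency isolates per-request latency; values are illustrative.
python -m sglang.bench_serving \
  --backend sglang \
  --port 30000 \
  --num-prompts 10 \
  --max-concurrency 1
```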
- Test Results:
5.1.2 Throughput-Sensitive Benchmark
- Model Deployment Command:
- Benchmark Command:
- Test Results:
5.2 Accuracy Benchmark
5.2.1 GSM8K Benchmark
- Benchmark Command:
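As a sketch, SGLang ships a few-shot GSM8K harness that can be pointed at a running server; the question count and parallelism below are illustrative assumptions, not the settings used for the reported results:

```shell
# Sketch: few-shot GSM8K accuracy run against a local SGLang server.
# Assumes the server is already serving the model; values are illustrative.
python -m sglang.test.few_shot_gsm8k \
  --num-questions 200 \
  --parallel 64
```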
- Test Results:
- DeepSeek-V3.2-Exp
5.2.2 MMLU Benchmark
- Benchmark Command:
- Test Results:
- DeepSeek-V3.2-Exp
