1. Model Introduction
DeepSeek V3 is a large-scale Mixture-of-Experts (MoE) language model developed by DeepSeek, designed to deliver strong general-purpose reasoning, coding, and tool-augmented capabilities with high training and inference efficiency. As the latest generation in the DeepSeek model family, DeepSeek V3 introduces systematic architectural and training innovations that significantly improve performance across reasoning, mathematics, coding, and long-context understanding, while maintaining a competitive compute cost. Key highlights include:
- Efficient MoE architecture: DeepSeek V3 adopts a fine-grained Mixture-of-Experts design with a large number of experts and sparse activation, enabling high model capacity while keeping inference and training costs manageable.
- Advanced reasoning and coding: The model demonstrates strong performance on mathematical reasoning, logical inference, and real-world coding benchmarks, benefiting from improved data curation and training strategies.
- Long-context capability: DeepSeek V3 supports extended context lengths, allowing it to handle long documents, complex multi-step reasoning, and agent-style workflows more effectively.
- Tool use and function calling: The model is trained to support structured outputs and tool invocation, enabling seamless integration with external tools and agent frameworks during inference.
2. SGLang Installation
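For a quick start on a typical Linux/CUDA machine, a pip-based install is one common option (the extras name follows the SGLang docs; platform-specific methods such as ROCm builds are covered in the guide referenced below):

```shell
# Install SGLang with all serving dependencies
pip install "sglang[all]"

# Verify the installation
python3 -c "import sglang; print(sglang.__version__)"
```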
SGLang offers multiple installation methods. You can choose the most suitable installation method based on your hardware platform and requirements. Please refer to the official SGLang installation guide for installation instructions.
3. Model Deployment
This section provides a progressive guide from quick deployment to performance optimization, suitable for users at different levels.
3.1 Basic Configuration
Interactive Command Generator: Use the configuration selector below to automatically generate the appropriate deployment command for your hardware platform, model variant, deployment strategy, and thinking capabilities.
3.2 Configuration Tips
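As a reference point, a typical single-node launch command for 8 GPUs with tensor parallelism looks like the following. This is an illustrative sketch (model path and port are placeholders); use the command generator above for your exact setup:

```shell
# Launch an OpenAI-compatible server, sharding the model across 8 GPUs
python3 -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V3 \
  --tp 8 \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 30000
```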
For more detailed configuration tips, please refer to DeepSeek-V3 Usage.
4. Model Invocation
4.1 Basic Usage
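As a minimal sketch, assuming a server running on `localhost:30000` with the default OpenAI-compatible API, a chat completion request body can be built like this (model name, port, and sampling parameters are illustrative placeholders):

```python
import json

# Request body for POST http://localhost:30000/v1/chat/completions
payload = {
    "model": "deepseek-ai/DeepSeek-V3",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    "temperature": 0.7,
    "max_tokens": 256,
}

# With a running server you would send this via any HTTP client, e.g.:
#   requests.post("http://localhost:30000/v1/chat/completions", json=payload)
print(json.dumps(payload, indent=2))
```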
For basic API usage and request examples, please refer to:
4.2 Advanced Usage
4.2.1 Reasoning Parser
DeepSeek-V3 supports reasoning mode. Enable the reasoning parser during deployment to separate the thinking and content sections:
Command
Example
Output
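When the reasoning parser is enabled, the response separates the model's thinking from its final answer. A sketch of reading both fields, using a mocked response message in the `reasoning_content` shape that SGLang's OpenAI-compatible API returns (the message text here is invented for illustration):

```python
# Mocked assistant message as returned when a reasoning parser is enabled:
# the thinking section is split out into "reasoning_content".
message = {
    "role": "assistant",
    "reasoning_content": "The user asks for 2 + 2. Adding the numbers gives 4.",
    "content": "2 + 2 = 4.",
}

# The two sections can now be handled independently,
# e.g. hide the reasoning in a UI but keep it for debugging.
thinking = message.get("reasoning_content")
answer = message["content"]
print("THINKING:", thinking)
print("ANSWER:", answer)
```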
4.2.2 Tool Calling
DeepSeek-V3 supports tool calling capabilities. Enable the tool call parser during deployment.
Deployment Command:
Command
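An illustrative deployment command with the tool call parser enabled (parser names vary across SGLang versions; check `python3 -m sglang.launch_server --help` for the values your install supports):

```shell
# Launch with the tool call parser so tool invocations are returned
# as structured fields rather than raw text
python3 -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V3 \
  --tp 8 \
  --trust-remote-code \
  --tool-call-parser deepseekv3 \
  --port 30000
```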
Example
Output
- The reasoning parser shows how the model decides to use a tool
- Tool calls are clearly marked with the function name and arguments
- You can then execute the function and send the result back to continue the conversation
Example
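The round trip described above can be sketched as follows. The tool schema, function name (`get_weather`), call ID, and weather result are all hypothetical; in a real session the assistant's tool call comes from the server rather than being hard-coded:

```python
import json

# 1. Advertise a tool in the OpenAI tools format
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]

# 2. Suppose the model responds with a parsed tool call
#    (shape per the OpenAI-compatible API with a tool call parser enabled)
tool_call = {
    "id": "call_0",
    "type": "function",
    "function": {"name": "get_weather",
                 "arguments": json.dumps({"city": "Paris"})},
}
messages.append({"role": "assistant", "tool_calls": [tool_call]})

# 3. Execute the function locally, send the result back as a tool message,
#    then request another completion to get the final answer
args = json.loads(tool_call["function"]["arguments"])
result = {"city": args["city"], "temp_c": 18}  # stand-in for a real API call
messages.append({
    "role": "tool",
    "tool_call_id": tool_call["id"],
    "content": json.dumps(result),
})
print(messages[-1]["content"])
```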
5. Benchmark
5.1 Speed Benchmark
Test Environment:
- Hardware: AMD MI300X GPU (8x)
- Model: DeepSeek-V3
- Tensor Parallelism: 8
- sglang version: 0.5.7
5.1.1 Latency-Sensitive Benchmark
- Model Deployment Command:
Command
- Benchmark Command:
Command
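An illustrative latency-oriented invocation of SGLang's serving benchmark (flag names can vary between versions; see `python3 -m sglang.bench_serving --help`). A low request rate keeps concurrency down so the numbers approximate per-request latency rather than peak throughput:

```shell
# Low concurrency: measures per-request latency (e.g. TTFT) against a running server
python3 -m sglang.bench_serving \
  --backend sglang \
  --dataset-name random \
  --num-prompts 32 \
  --request-rate 1
```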
- Test Results:
Output
5.1.2 Throughput-Sensitive Benchmark
- Model Deployment Command:
Command
- Benchmark Command:
Command
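For the throughput-oriented case, the same benchmark tool can be driven with many prompts at an unbounded request rate so the server is saturated (again illustrative; flags may differ by version):

```shell
# High concurrency: saturate the server to measure aggregate throughput
python3 -m sglang.bench_serving \
  --backend sglang \
  --dataset-name random \
  --num-prompts 2000 \
  --request-rate inf
```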
- Test Results:
Output
5.2 Accuracy Benchmark
5.2.1 GSM8K Benchmark
- Benchmark Command:
Command
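One way to run this check is SGLang's built-in few-shot GSM8K script against a running server (module path per current SGLang releases; question count and parallelism are illustrative):

```shell
# Few-shot GSM8K accuracy against a running server
python3 -m sglang.test.few_shot_gsm8k --num-questions 200 --parallel 128
```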
- Test Results:
- DeepSeek-V3
Output
5.2.2 MMLU Benchmark
- Benchmark Command:
Command
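The SGLang repository also ships MMLU evaluation scripts under its `benchmark/mmlu/` directory; an illustrative invocation from a repository checkout (script name and flags may change between releases):

```shell
# From a checkout of the sglang repository, with a server already running
python3 benchmark/mmlu/bench_sglang.py
```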
- Test Results:
- DeepSeek-V3
Output
