1. Model Introduction
DeepSeek-OCR-2 is DeepSeek’s next-generation OCR (Optical Character Recognition) model, building on DeepSeek-OCR with improved accuracy and broader document understanding capabilities. The model is optimized for high-accuracy text extraction from images across a wide variety of document types and formats. Key Features:- Semantic-Aware Visual Encoding (DeepEncoder V2): DeepSeek-OCR-2 introduces DeepEncoder V2, which models document reading order in a more human-like, semantic-driven manner rather than relying on fixed raster scanning. This significantly improves logical reading flow in complex layouts (e.g., multi-column documents).
- Stronger Layout and Structural Understanding: DeepSeek-OCR-2 demonstrates improved performance on structured documents such as tables, forms, and dense multi-column pages. It reduces reading-order errors and improves overall document parsing robustness compared to the original version.
- Improved Accuracy While Maintaining Token Efficiency: The original DeepSeek-OCR emphasized aggressive visual token compression. OCR-2 maintains high token efficiency while delivering higher benchmark performance, particularly on document-level understanding tasks.
- Better Generalization Across Complex Document Tasks: DeepSeek-OCR-2 performs more consistently across multilingual documents, structured data extraction, and visually complex content, making it more suitable for real-world document intelligence scenarios beyond plain text OCR.
- Base Model: deepseek-ai/DeepSeek-OCR-2 - Recommended for OCR tasks
2. SGLang Installation
Please refer to the official SGLang installation guide for installation instructions.3. Model Deployment
This section provides deployment configurations optimized for different hardware platforms and use cases.3.1 Basic Configuration
Interactive Command Generator: Use the configuration selector below to automatically generate the appropriate deployment command for your hardware platform, quantization method, and deployment strategy. Note: DeepSeek-OCR-2 has ~3.58B parameters and easily fits on a single modern GPU. For low-latency serving, no model parallelism is needed. For high-throughput requirements, consider using data parallelism with the SGLang Model Gateway — see DP, DPA and SGLang DP Router for more details.3.2 Configuration Tips
For more detailed configuration tips, please refer to DeepSeek V3/V3.1/R1 Usage.4. Model Invocation
4.1 Basic Usage
OpenAI-compatible request example4.2 Recommended Prompts
The following prompts are recommended by the official model card. Structured document conversion — extracts text while preserving layout:5. Benchmark
5.1 Speed Benchmark
Test Environment:- Hardware: NVIDIA H200 GPU (1x)
- Model: DeepSeek-OCR-2
- Tensor Parallelism: 1
- sglang version: 0.0.0.dev1+g93fca0bbc
5.1.1 Latency-Sensitive Benchmark
- Model Deployment Command:
- Benchmark Command:
- Test Results:
5.1.2 Throughput-Sensitive Benchmark
- Model Deployment Command:
- Benchmark Command:
- Test Results:
