1. Model Introduction
Ring-2.5-1T is the world’s first open-source trillion-parameter reasoning model built on a hybrid linear attention architecture, developed by InclusionAI. Building on Ring-1T, Ring-2.5-1T delivers substantial improvements in generation efficiency, reasoning depth, and long-horizon task execution.
Key Features:
- Trillion-Scale Model: ~1T total parameters with 63B activated parameters, using a hybrid linear attention architecture (1:7 MLA + Lightning Linear Attention)
- Generation Efficiency: Reduces memory access overhead by over 10x and increases generation throughput by more than 3x for sequences exceeding 32K tokens
- Deep Reasoning: Achieves gold-medal level on both IMO 2025 and CMO 2025, with dense rewards providing rigorous feedback on the reasoning process
- Long-horizon Task Execution: Enhanced autonomous execution capability through large-scale fully-async agentic RL training
- Tool Calling: Supports function calling with XML-style tool call format
- Context Length: 128K natively, extendable to 256K via YaRN
- FP8 (8-bit quantized): inclusionAI/Ring-2.5-1T
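The 256K extension is typically enabled through YaRN rope scaling. As a minimal sketch, this could be passed to an SGLang server via its model-override mechanism; the scaling factor and field values below are illustrative assumptions, not official settings:

```shell
# Hypothetical example: extend context to 256K via YaRN rope scaling.
# The factor and original_max_position_embeddings values are illustrative
# assumptions -- check the official model card for the recommended settings.
python -m sglang.launch_server \
  --model-path inclusionAI/Ring-2.5-1T \
  --trust-remote-code \
  --context-length 262144 \
  --json-model-override-args '{"rope_scaling": {"rope_type": "yarn", "factor": 2.0, "original_max_position_embeddings": 131072}}'
```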
2. SGLang Installation
Ring-2.5-1T requires a specific SGLang Docker image:
3. Model Deployment
This section provides deployment configurations optimized for different hardware platforms.
3.1 Basic Configuration
Interactive Command Generator: Use the configuration selector below to automatically generate the appropriate deployment command for your hardware platform.
3.2 Configuration Tips
- The `--trust-remote-code` flag is required for this model due to custom modeling code.
- The model uses FP8 quantization (compressed-tensors format).
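Putting the tips above together, a representative launch command might look like the following. The parallelism size, host, and port are assumptions that depend on your hardware; a 1T-parameter FP8 model generally requires many GPUs, possibly spanning multiple nodes:

```shell
# Sketch of a launch command assembled from the flags discussed above.
# --tp 8, --host, and --port are illustrative; size parallelism to your cluster.
python -m sglang.launch_server \
  --model-path inclusionAI/Ring-2.5-1T \
  --trust-remote-code \
  --tp 8 \
  --host 0.0.0.0 \
  --port 8000
```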
4. Model Invocation
Deploy Ring-2.5-1T with the following command (on H200, all features enabled):
4.1 Basic Usage
For basic API usage and request examples, please refer to:
4.2 Advanced Usage
4.2.1 Reasoning Parser
To enable reasoning output separation, add `--reasoning-parser deepseek-r1` when launching the server. The thinking process is returned via the `reasoning_content` field in the streaming response.
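As a sketch, assuming a server launched with the reasoning parser enabled and listening on localhost:8000, the separated fields can be inspected through SGLang's OpenAI-compatible endpoint (the prompt and port here are illustrative):

```shell
# Query the OpenAI-compatible endpoint and print the separated
# reasoning_content (thinking) and content (final answer) fields.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "inclusionAI/Ring-2.5-1T",
        "messages": [{"role": "user", "content": "What is 17 * 24?"}]
      }' \
  | python3 -c "import sys, json; m = json.load(sys.stdin)['choices'][0]['message']; print('THINKING:', m.get('reasoning_content')); print('ANSWER:', m['content'])"
```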
4.2.2 Tool Calling
To enable tool calling, add `--tool-call-parser qwen` when launching the server.
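A minimal sketch of a tool-calling request, assuming the server was launched with the tool-call parser enabled; the tool name and schema below are illustrative assumptions:

```shell
# Sketch: register one hypothetical tool (get_weather). With
# --tool-call-parser qwen, the parsed call appears under
# choices[0].message.tool_calls in the response.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "inclusionAI/Ring-2.5-1T",
        "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
        "tools": [{
          "type": "function",
          "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
              "type": "object",
              "properties": {"city": {"type": "string"}},
              "required": ["city"]
            }
          }
        }]
      }'
```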
5. Benchmark
GSM8K
- Deployment Command
- Benchmark Command
- Test Result
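As a sketch of how such a GSM8K run might look with SGLang's built-in few-shot evaluator; the module path, arguments, and question count are assumptions that should be verified against your installed SGLang version:

```shell
# 1) Deploy the model (see Section 3 for the full flag set):
python -m sglang.launch_server --model-path inclusionAI/Ring-2.5-1T --trust-remote-code --port 8000

# 2) Run SGLang's few-shot GSM8K benchmark against the running server.
#    Module path and arguments are assumptions -- verify with your SGLang version.
python -m sglang.test.few_shot_gsm8k --num-questions 200 --port 8000
```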
