1. Model Introduction
Ling-2.5-1T is the latest flagship instant model in the Ling family. Thinking models raise the ceiling of intelligence, while instant models expand its reach by balancing efficiency and performance, making AGI not only more powerful but also more accessible. Ling-2.5-1T delivers comprehensive upgrades across model architecture, token efficiency, and preference alignment, designed to bring universally accessible AI to a new level of quality.

Key Features:
- Trillion-Scale Model: 1T total parameters with 63B active parameters (up from 51B in the previous generation). The pre-training corpus was expanded from 20T to 29T tokens. Leveraging an efficient hybrid linear attention architecture (1:7 MLA + Lightning Linear Attention), the model delivers exceptionally high throughput while processing context lengths of up to 1M tokens.
- Token Efficiency: By introducing a composite reward mechanism combining “Correctness” and “Process Redundancy”, Ling-2.5-1T further pushes the frontier of efficiency-performance balance in instant models. At comparable token efficiency levels, Ling-2.5-1T’s reasoning capabilities significantly outperform its predecessor, approaching the level of frontier “thinking models” that typically consume ~4x the output tokens.
- Preference Alignment: Through refined alignment strategies—such as bidirectional RL feedback and Agent-based instruction constraint verification—Ling-2.5-1T achieves substantial improvements over the previous generation in preference alignment tasks, including creative writing and instruction following.
- Agentic Capabilities: Trained with Agentic RL in large-scale high-fidelity interactive environments, Ling-2.5-1T is compatible with mainstream agent platforms such as Claude Code, OpenCode, and OpenClaw. It achieves leading open-source performance on the general tool-calling benchmark, BFCL-V4.
- Context Length: 256K -> 1M (YaRN)
- BF16: inclusionAI/Ling-2.5-1T
2. SGLang Installation
Ling-2.5-1T requires a specific SGLang Docker image:

3. Model Deployment
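The specific image tag is not reproduced here. As a hedged sketch, pulling the SGLang image and starting a container might look like the following; the `latest` tag and the mount path are placeholders, so substitute the exact image named on the model card:

```shell
# Placeholder tag: replace "latest" with the specific image listed on the model card.
docker pull lmsysorg/sglang:latest

# Run with all GPUs, host networking, and host IPC (needed for NCCL shared memory).
docker run --gpus all --network host --ipc host -it \
    -v /path/to/models:/models \
    lmsysorg/sglang:latest bash
```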
Ling-2.5-1T is a trillion-parameter BF16 model that requires multi-node deployment (at least 2 nodes). Use the configuration selector below to generate the deployment command for your hardware platform.

Configuration Tips
- The `--trust-remote-code` flag is required for this model due to custom modeling code.
- `--tp-size` can be set to a maximum of 8 for this model. If you have more GPUs available, increase `--pp-size` to scale across additional nodes.
- Adding `--model-loader-extra-config '{"enable_multithread_load": "true", "num_threads": 64}'` enables faster model loading.
- On H200/GB200/GB300 with 2-node deployment, `--mem-frac 0.95` is required to avoid OOM, since the model occupies most of the GPU memory. For better throughput, consider a 4-node deployment (see the model card for more details).
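Putting these tips together, a 2-node launch might look like the sketch below. The flags come from the tips above plus SGLang's standard multi-node options (`--dist-init-addr`, `--nnodes`, `--node-rank`); `MASTER_IP` and the port are placeholders:

```shell
# Node 0 (master). Replace MASTER_IP with the reachable IP of this node.
python3 -m sglang.launch_server \
    --model-path inclusionAI/Ling-2.5-1T \
    --trust-remote-code \
    --tp-size 8 --pp-size 2 \
    --dist-init-addr MASTER_IP:5000 \
    --nnodes 2 --node-rank 0 \
    --mem-frac 0.95 \
    --model-loader-extra-config '{"enable_multithread_load": "true", "num_threads": 64}'

# Node 1: run the identical command, changing only --node-rank to 1.
```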
4. Model Invocation
4.1 Basic Usage
For example, launch the server on 2 H200 nodes:

4.2 Tool Calling Example
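SGLang exposes an OpenAI-compatible API (port 30000 by default). As a sketch of a tool-calling request via `curl`, where the `get_weather` function is a hypothetical tool defined purely for illustration:

```shell
# Send a chat request with one tool definition; the model may respond
# with a tool_calls entry naming get_weather and its arguments.
curl -s http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "inclusionAI/Ling-2.5-1T",
    "messages": [
      {"role": "user", "content": "What is the weather in Hangzhou?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a city",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
          }
        }
      }
    ]
  }'
```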
5. Benchmark
GSM8K
- Benchmark Command
- Test Result
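The benchmark command itself is not reproduced above. As an assumption-laden sketch, GSM8K can be run against a local server with SGLang's few-shot test script; the module name and flags follow recent SGLang releases, so verify them against your installed version:

```shell
# Runs few-shot GSM8K against a server listening on localhost:30000.
python3 -m sglang.test.few_shot_gsm8k --num-questions 200 --parallel 64
```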
