1. Model Introduction
Intern-S2-Preview is an efficient 35B scientific multimodal foundation model. Beyond conventional parameter and data scaling, Intern-S2-Preview explores task scaling: increasing the difficulty, diversity, and coverage of scientific tasks to further unlock model capabilities. Resources:- HuggingFace: internLM/Intern-S2-Preview
2. SGLang Installation
SGLang offers multiple installation methods. Please refer to the official SGLang installation guide for installation instructions. Install SGLang from source or use an NVIDIA Docker image:Command
sglang serve ... with whatever the command generator below produces):
Command
3. Model Deployment
3.1 Basic Configuration
Interactive Command Generator: Use the selector below to generate the deployment command for your hardware and parser configuration.3.2 Configuration Tips
- Use
tp>=2for the NVIDIA deployment commands. - Use
--reasoning-parser qwen3to separate reasoning content from final content in streaming responses. - Use
--tool-call-parser qwen3_coderwhen serving tool-calling workloads. - Add
--mamba-scheduler-strategy extra_bufferwith--speculative-algo 'NEXTN'to enable MTP. - If weight loading is slow, add
--model-loader-extra-config='{"enable_multithread_load": "true", "num_threads": 64}'.
4. Model Invocation
4.1 Basic Usage
For basic API usage and request examples, see:4.2 Advanced Usage
4.2.1 Vision Input
Intern-S2-Preview supports image inputs. Here is an example with an image:Example
4.2.2 Reasoning Parser
Enable streaming to read reasoning content separately from the final answer:Example
4.2.3 Tool Calling
Serve with--tool-call-parser qwen3_coder enabled, then send OpenAI-compatible tool requests:
Example
