Supported models

SGLang supports model families across text generation, retrieval, and reward workflows. Browse the sections below for the primary product paths and jump to the detail pages when you are ready to explore a specific class.

Text generation

Large language models

Production-tuned Llama and Qwen families validated for high-throughput serving.

Vision language models

Vision-text hybrids that stay responsive on multi-GPU setups.

Diffusion language models

Score-based and diffusion backbones for structured text generation workflows.

Retrieval and ranking

Embedding models

Dense and sparse embeddings optimized with FlashInfer kernels.

Rerank models

Low-latency rerankers for multi-stage retrieval pipelines.

Classification models

Lightweight classifiers covering safety, intent, and context filters.

Specialized models

Reward models

RLHF and reward scoring pipelines optimized for production latency.

SGLang for RL Systems

Large Language Models

⌘I

Basic Usage

Advanced Features

Developer Guide

References

Supported models

Text generation

Large language models

Vision language models

Diffusion language models

Retrieval and ranking

Embedding models

Rerank models

Classification models

Specialized models

Reward models

​Text generation

Large language models

Vision language models

Diffusion language models

​Retrieval and ranking

Embedding models

Rerank models

Classification models

​Specialized models

Reward models

Text generation

Retrieval and ranking

Specialized models