# SGLang Documentation

## Docs

- [DeepSeek-Math-V2](https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-Math-V2.md)
- [DeepSeek-OCR](https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-OCR.md)
- [DeepSeek-OCR-2](https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-OCR-2.md)
- [DeepSeek-R1](https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-R1.md)
- [DeepSeek-V3](https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-V3.md)
- [DeepSeek-V3.1](https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-V3_1.md)
- [DeepSeek-V3.2](https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-V3_2.md)
- [DeepSeek-V4](https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-V4.md): Deploy DeepSeek-V4 with SGLang: verified launch commands, benchmarks, and tuning for the Flash (284B) and Pro (1.6T) Mixture-of-Experts models.
- [Ernie4.5](https://docs.sglang.io/cookbook/autoregressive/Ernie/Ernie4.5.md)
- [Ernie4.5-VL](https://docs.sglang.io/cookbook/autoregressive/Ernie/Ernie4.5-VL.md)
- [Chroma-1.0](https://docs.sglang.io/cookbook/autoregressive/FlashLabs/Chroma1.0.md)
- [GLM-4.5](https://docs.sglang.io/cookbook/autoregressive/GLM/GLM-4.5.md)
- [GLM-4.5V](https://docs.sglang.io/cookbook/autoregressive/GLM/GLM-4.5V.md)
- [GLM-4.6](https://docs.sglang.io/cookbook/autoregressive/GLM/GLM-4.6.md)
- [GLM-4.6V](https://docs.sglang.io/cookbook/autoregressive/GLM/GLM-4.6V.md)
- [GLM-4.7](https://docs.sglang.io/cookbook/autoregressive/GLM/GLM-4.7.md)
- [GLM-4.7-Flash](https://docs.sglang.io/cookbook/autoregressive/GLM/GLM-4.7-Flash.md)
- [GLM-5](https://docs.sglang.io/cookbook/autoregressive/GLM/GLM-5.md)
- [GLM-5.1](https://docs.sglang.io/cookbook/autoregressive/GLM/GLM-5.1.md)
- [GLM Glyph](https://docs.sglang.io/cookbook/autoregressive/GLM/GLM-Glyph.md)
- [GLM-OCR](https://docs.sglang.io/cookbook/autoregressive/GLM/GLM-OCR.md)
- [DiffusionGemma](https://docs.sglang.io/cookbook/autoregressive/Google/DiffusionGemma.md)
- [Gemma 4](https://docs.sglang.io/cookbook/autoregressive/Google/Gemma4.md)
- [LLaDA 2.1](https://docs.sglang.io/cookbook/autoregressive/InclusionAI/LLaDA-2.1.md)
- [Ling-2.5-1T](https://docs.sglang.io/cookbook/autoregressive/InclusionAI/Ling-2.5-1T.md)
- [Ling-2.6](https://docs.sglang.io/cookbook/autoregressive/InclusionAI/Ling-2.6.md)
- [Ring-2.5-1T](https://docs.sglang.io/cookbook/autoregressive/InclusionAI/Ring-2.5-1T.md)
- [Ring-2.6-1T](https://docs.sglang.io/cookbook/autoregressive/InclusionAI/Ring-2.6-1T.md)
- [Intern-S1](https://docs.sglang.io/cookbook/autoregressive/InternLM/Intern-S1.md)
- [Intern-S2-Preview](https://docs.sglang.io/cookbook/autoregressive/InternLM/Intern-S2-Preview.md)
- [InternVL3.5](https://docs.sglang.io/cookbook/autoregressive/InternVL/InternVL3.5.md)
- [Jina-reranker-m0](https://docs.sglang.io/cookbook/autoregressive/Jina/Jina-reranker-m0.md)
- [LFM2.5](https://docs.sglang.io/cookbook/autoregressive/LiquidAI/LFM2.5.md): Deploy Liquid AI's LFM2.5 with SGLang: hybrid LIV-convolution + GQA models from 350M to the 8B-A1B MoE, plus LFM2.5-VL vision, with reasoning and Pythonic tool calling.
- [Llama-3.1](https://docs.sglang.io/cookbook/autoregressive/Llama/Llama3.1.md)
- [Llama-3.3-70B](https://docs.sglang.io/cookbook/autoregressive/Llama/Llama3.3-70B.md)
- [Llama 4](https://docs.sglang.io/cookbook/autoregressive/Llama/Llama4.md)
- [MiniMax-M2](https://docs.sglang.io/cookbook/autoregressive/MiniMax/MiniMax-M2.md)
- [MiniMax-M2.5](https://docs.sglang.io/cookbook/autoregressive/MiniMax/MiniMax-M2.5.md)
- [MiniMax-M2.7](https://docs.sglang.io/cookbook/autoregressive/MiniMax/MiniMax-M2.7.md)
- [Devstral 2 (Mistral)](https://docs.sglang.io/cookbook/autoregressive/Mistral/Devstral-2.md)
- [Ministral-3](https://docs.sglang.io/cookbook/autoregressive/Mistral/Ministral-3.md)
- [Mistral Medium 3.5](https://docs.sglang.io/cookbook/autoregressive/Mistral/Mistral-Medium-3.5.md)
- [Mistral Small 4](https://docs.sglang.io/cookbook/autoregressive/Mistral/Mistral-Small-4.md)
- [Kimi-K2](https://docs.sglang.io/cookbook/autoregressive/Moonshotai/Kimi-K2.md)
- [Kimi-K2.5](https://docs.sglang.io/cookbook/autoregressive/Moonshotai/Kimi-K2.5.md)
- [Kimi-K2.6](https://docs.sglang.io/cookbook/autoregressive/Moonshotai/Kimi-K2.6.md)
- [Kimi-Linear](https://docs.sglang.io/cookbook/autoregressive/Moonshotai/Kimi-Linear.md)
- [Nemotron3-Nano](https://docs.sglang.io/cookbook/autoregressive/NVIDIA/Nemotron3-Nano.md)
- [Nemotron 3 Nano Omni](https://docs.sglang.io/cookbook/autoregressive/NVIDIA/Nemotron3-Nano-Omni.md)
- [NVIDIA Nemotron3-Super](https://docs.sglang.io/cookbook/autoregressive/NVIDIA/Nemotron3-Super.md)
- [NVIDIA Nemotron3-Ultra](https://docs.sglang.io/cookbook/autoregressive/NVIDIA/Nemotron3-Ultra.md): Deploy NVIDIA Nemotron3-Ultra with SGLang - 550B hybrid MoE model (55B active) with 1M context window, BF16/NVFP4 support, built for long-running autonomous agents.
- [GPT-OSS](https://docs.sglang.io/cookbook/autoregressive/OpenAI/GPT-OSS.md)
- [MiniCPM-V 4.6](https://docs.sglang.io/cookbook/autoregressive/OpenBMB/MiniCPM-V-4_6.md)
- [Laguna-XS.2](https://docs.sglang.io/cookbook/autoregressive/Poolside/Laguna-XS.2.md)
- [Qwen2.5-VL](https://docs.sglang.io/cookbook/autoregressive/Qwen/Qwen2.5-VL.md)
- [Qwen3](https://docs.sglang.io/cookbook/autoregressive/Qwen/Qwen3.md)
- [Qwen3-Coder](https://docs.sglang.io/cookbook/autoregressive/Qwen/Qwen3-Coder.md)
- [Qwen3-Coder-Next](https://docs.sglang.io/cookbook/autoregressive/Qwen/Qwen3-Coder-Next.md)
- [Qwen3-Next](https://docs.sglang.io/cookbook/autoregressive/Qwen/Qwen3-Next.md)
- [Qwen3-VL](https://docs.sglang.io/cookbook/autoregressive/Qwen/Qwen3-VL.md)
- [Qwen3.5](https://docs.sglang.io/cookbook/autoregressive/Qwen/Qwen3.5.md)
- [Qwen3.6](https://docs.sglang.io/cookbook/autoregressive/Qwen/Qwen3.6.md)
- [Step-3.7-Flash (new)](https://docs.sglang.io/cookbook/autoregressive/StepFun/Step-3.7-Flash.md)
- [Step3-VL-10B](https://docs.sglang.io/cookbook/autoregressive/StepFun/Step3-VL-10B.md)
- [Step-3.5-Flash](https://docs.sglang.io/cookbook/autoregressive/StepFun/Step3.5.md)
- [Hunyuan 3 Preview](https://docs.sglang.io/cookbook/autoregressive/Tencent/Hunyuan3-Preview.md)
- [MiMo-V2-Flash](https://docs.sglang.io/cookbook/autoregressive/Xiaomi/MiMo-V2-Flash.md)
- [MiMo-V2.5](https://docs.sglang.io/cookbook/autoregressive/Xiaomi/MiMo-V2.5.md)
- [Overview](https://docs.sglang.io/cookbook/autoregressive/intro.md): Practical guides for deploying and using large language models and vision language models with SGLang.
- [Autoregressive Model Benchmark Documentation](https://docs.sglang.io/cookbook/base/benchmarks/autoregressive_model_benchmark.md)
- [Diffusion Models Benchmark Documentation](https://docs.sglang.io/cookbook/base/benchmarks/diffusion_model_benchmark.md)
- [Server Arguments](https://docs.sglang.io/cookbook/base/reference/server_arguments.md)
- [Cosmos3](https://docs.sglang.io/cookbook/diffusion/Cosmos/Cosmos3.md)
- [ERNIE-Image](https://docs.sglang.io/cookbook/diffusion/Ernie-Image/Ernie-Image.md)
- [FLUX](https://docs.sglang.io/cookbook/diffusion/FLUX/FLUX.md)
- [Ideogram 4](https://docs.sglang.io/cookbook/diffusion/Ideogram/Ideogram4.md)
- [LTX2 & LTX2.3](https://docs.sglang.io/cookbook/diffusion/LTX/LTX2 & LTX2.3.md): Run LTX-2 and LTX-2.3 video generation pipelines with SGLang Diffusion.
- [LingBot World](https://docs.sglang.io/cookbook/diffusion/LingBot-World/LingBot-World.md)
- [MOVA](https://docs.sglang.io/cookbook/diffusion/MOVA/MOVA.md)
- [Qwen-Image](https://docs.sglang.io/cookbook/diffusion/Qwen-Image/Qwen-Image.md)
- [Qwen-Image-Edit-2511](https://docs.sglang.io/cookbook/diffusion/Qwen-Image/Qwen-Image-Edit.md)
- [SANA-WM](https://docs.sglang.io/cookbook/diffusion/SANA-WM/SANA-WM.md)
- [Wan2.1](https://docs.sglang.io/cookbook/diffusion/Wan/Wan2.1.md)
- [Wan2.2](https://docs.sglang.io/cookbook/diffusion/Wan/Wan2.2.md)
- [Z-Image-Turbo](https://docs.sglang.io/cookbook/diffusion/Z-Image/Z-Image-Turbo.md)
- [Overview](https://docs.sglang.io/cookbook/diffusion/intro.md): Practical guides for deploying and using diffusion models with SGLang.
- [SGLang Cookbook](https://docs.sglang.io/cookbook/intro.md)
- [SpecBundle Usage](https://docs.sglang.io/cookbook/specbundle/specbundle_usage.md)
- [Supported Models](https://docs.sglang.io/cookbook/specbundle/supported_models.md)
- [Adaptive Speculative Decoding](https://docs.sglang.io/docs/advanced_features/adaptive_speculative_decoding.md)
- [Attention Backend](https://docs.sglang.io/docs/advanced_features/attention_backend.md)
- [Breakable CUDA Graph](https://docs.sglang.io/docs/advanced_features/breakable_cuda_graph.md)
- [Checkpoint Engine Integration](https://docs.sglang.io/docs/advanced_features/checkpoint_engine.md)
- [Cuda Graph for Multi-Modal Encoder in SGLang](https://docs.sglang.io/docs/advanced_features/cuda_graph_for_multi_modal_encoder.md)
- [Deterministic Inference](https://docs.sglang.io/docs/advanced_features/deterministic_inference.md)
- [DP, DPA and SGLang DP Router](https://docs.sglang.io/docs/advanced_features/dp_dpa_smg_guide.md)
- [DP for Multi-Modal Encoder in SGLang](https://docs.sglang.io/docs/advanced_features/dp_for_multi_modal_encoder.md)
- [EPD Disaggregation](https://docs.sglang.io/docs/advanced_features/epd_disaggregation.md)
- [Expert Parallelism](https://docs.sglang.io/docs/advanced_features/expert_parallelism.md)
- [Hierarchical KV Caching (HiCache)](https://docs.sglang.io/docs/advanced_features/hicache.md)
- [SGLang HiCache Best Practices](https://docs.sglang.io/docs/advanced_features/hicache_best_practices.md)
- [HiCache System Design and Optimization](https://docs.sglang.io/docs/advanced_features/hicache_design.md)
- [Runtime Attach/Detach HiCache Storage Backend (No Restart)](https://docs.sglang.io/docs/advanced_features/hicache_storage_runtime_attach_detach.md)
- [HiSparse: Hierarchical Sparse Attention](https://docs.sglang.io/docs/advanced_features/hisparse_guide.md)
- [Hyperparameter Tuning](https://docs.sglang.io/docs/advanced_features/hyperparameter_tuning.md)
- [LoRA Serving](https://docs.sglang.io/docs/advanced_features/lora.md)
- [Loading Models from Object Storage](https://docs.sglang.io/docs/advanced_features/object_storage.md)
- [Observability](https://docs.sglang.io/docs/advanced_features/observability.md)
- [Advanced Features](https://docs.sglang.io/docs/advanced_features/overview.md): Advanced configuration, optimization, and deployment features for SGLang.
- [PD Disaggregation](https://docs.sglang.io/docs/advanced_features/pd_disaggregation.md)
- [Piecewise CUDA Graph](https://docs.sglang.io/docs/advanced_features/piecewise_cuda_graph.md)
- [Pipeline Parallelism for Long Context](https://docs.sglang.io/docs/advanced_features/pipeline_parallelism.md)
- [Quantization](https://docs.sglang.io/docs/advanced_features/quantization.md)
- [Quantized KV Cache](https://docs.sglang.io/docs/advanced_features/quantized_kv_cache.md)
- [Reasoning Parser](https://docs.sglang.io/docs/advanced_features/separate_reasoning.md)
- [Server Arguments](https://docs.sglang.io/docs/advanced_features/server_arguments.md)
- [SGLang Model Gateway](https://docs.sglang.io/docs/advanced_features/sgl_model_gateway.md)
- [SGLang for RL Systems](https://docs.sglang.io/docs/advanced_features/sglang_for_rl.md)
- [Speculative Decoding](https://docs.sglang.io/docs/advanced_features/speculative_decoding.md)
- [Structured Outputs](https://docs.sglang.io/docs/advanced_features/structured_outputs.md)
- [Structured Outputs For Reasoning Models](https://docs.sglang.io/docs/advanced_features/structured_outputs_for_reasoning_models.md)
- [Tool Parser](https://docs.sglang.io/docs/advanced_features/tool_parser.md)
- [Query VLM with Offline Engine](https://docs.sglang.io/docs/advanced_features/vlm_query.md)
- [SGLang Native APIs](https://docs.sglang.io/docs/basic_usage/native_api.md)
- [Offline Engine API](https://docs.sglang.io/docs/basic_usage/offline_engine_api.md)
- [Ollama-Compatible API](https://docs.sglang.io/docs/basic_usage/ollama_api.md)
- [OpenAI-Compatible APIs](https://docs.sglang.io/docs/basic_usage/openai_api.md): Documentation for OpenAI-Compatible APIs
- [OpenAI APIs - Completions](https://docs.sglang.io/docs/basic_usage/openai_api_completions.md)
- [OpenAI APIs - Embedding](https://docs.sglang.io/docs/basic_usage/openai_api_embeddings.md)
- [OpenAI APIs - Vision](https://docs.sglang.io/docs/basic_usage/openai_api_vision.md)
- [Basic Usage](https://docs.sglang.io/docs/basic_usage/overview.md): Core APIs and common usage patterns for SGLang.
- [Sampling Parameters](https://docs.sglang.io/docs/basic_usage/sampling_params.md)
- [Tutorial: Sending a request](https://docs.sglang.io/docs/basic_usage/send_request.md)
- [Bench Serving Guide](https://docs.sglang.io/docs/developer_guide/bench_serving.md)
- [Benchmark and Profiling](https://docs.sglang.io/docs/developer_guide/benchmark_and_profiling.md)
- [Contribution Guide](https://docs.sglang.io/docs/developer_guide/contribution_guide.md)
- [Development Guide Using Docker](https://docs.sglang.io/docs/developer_guide/development_guide_using_docker.md)
- [Development Guide for JIT Kernels](https://docs.sglang.io/docs/developer_guide/development_jit_kernel_guide.md)
- [Evaluating New Models with SGLang](https://docs.sglang.io/docs/developer_guide/evaluating_new_models.md)
- [MSProbe Debugging Guide](https://docs.sglang.io/docs/developer_guide/msprobe_debugging_guide.md)
- [Developer Guide](https://docs.sglang.io/docs/developer_guide/overview.md): Contributing to SGLang: development setup, benchmarking, and evaluation.
- [Installation](https://docs.sglang.io/docs/get-started/install.md): Install SGLang with pip/uv, source, Docker, Kubernetes, and cloud deployment options.
- [Quickstart](https://docs.sglang.io/docs/get-started/quickstart.md): Get up and running with SGLang in minutes: install, launch a server, and send your first request.
- [AMD GPUs](https://docs.sglang.io/docs/hardware-platforms/amd_gpu.md)
- [Apple Silicon with Metal](https://docs.sglang.io/docs/hardware-platforms/apple_metal.md)
- [Contribution Guide](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_contribution_guide.md)
- [SGLang installation with NPUs support](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu.md)
- [Ascend NPU Accuracy Evaluation](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_accuracy_evaluation.md)
- [Best Practice on Ascend NPU](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_best_practice.md)
- [DeepSeek Examples](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_deepseek_example.md)
- [Environment Variables](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_environment_variables.md)
- [Ascend NPU Troubleshooting and FAQ](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_faq.md)
- [GLM-5 examples](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_glm5_examples.md)
- [Ascend NPU Operator Development Guide](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_operator_development.md): How to develop custom operators (Ascend C / Triton) for Ascend NPU and integrate them into the SGLang inference engine.
- [Operator Performance Optimizing Guidance](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_operator_performance_optimizing.md)
- [Ascend NPU Optimization](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_optimization.md)
- [Ascend NPU Performance Testing](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_performance_testing.md)
- [Ascend NPU Performance Profiling Guide](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_profiling.md)
- [Quantization on Ascend](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_quantization.md)
- [Ascend NPU Quickstart](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_quick_start.md)
- [Qwen3.5 examples](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_qwen3_5_examples.md)
- [Qwen3 Examples](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_qwen3_examples.md)
- [Ascend NPU Ring-SP Performance (Wan2.1-T2V-1.3B)](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_ring_sp_performance.md)
- [Support Features on Ascend NPU](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_support_features.md)
- [Support Models on Ascend NPU](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_support_models.md)
- [How to Support New Models](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/ascend_npu_support_new_models.md): This document explains how to add support for new language models and multimodal large language models (MLLMs) in SGLang. It also covers how to test new models and register external implementations.
- [Mindspore backend](https://docs.sglang.io/docs/hardware-platforms/ascend-npus/mindspore_backend.md)
- [CPU Servers](https://docs.sglang.io/docs/hardware-platforms/cpu_server.md)
- [Moore Threads GPUs](https://docs.sglang.io/docs/hardware-platforms/mthreads_gpu.md)
- [NVIDIA GPUs](https://docs.sglang.io/docs/hardware-platforms/nvidia-gpus.md)
- [NVIDIA Jetson Orin](https://docs.sglang.io/docs/hardware-platforms/nvidia_jetson.md): Guide for installing and running SGLang on NVIDIA Jetson Orin devices.
- [Hardware Platforms](https://docs.sglang.io/docs/hardware-platforms/overview.md): Platform-specific guides for running SGLang on GPUs, TPUs, NPUs, CPUs, and more.
- [SGLang Plugin System](https://docs.sglang.io/docs/hardware-platforms/plugin.md)
- [TPU](https://docs.sglang.io/docs/hardware-platforms/tpu.md): SGLang supports high-performance TPU inference through the SGLang-JAX backend, which is specifically optimized for Google Cloud TPUs. The JAX-based implementation delivers exceptional throughput and low latency for Large Language Model (LLM) serving workloads on TPU hardware.
- [XPU](https://docs.sglang.io/docs/hardware-platforms/xpu.md)
- [Custom Chat Template](https://docs.sglang.io/docs/references/custom_chat_template.md)
- [Environment Variables](https://docs.sglang.io/docs/references/environment_variables.md)
- [Troubleshooting and Frequently Asked Questions](https://docs.sglang.io/docs/references/faq.md)
- [Choices Methods in SGLang](https://docs.sglang.io/docs/references/frontend/choices_methods.md)
- [Frontend Language](https://docs.sglang.io/docs/references/frontend/frontend_index.md)
- [SGLang Frontend Language](https://docs.sglang.io/docs/references/frontend/frontend_tutorial.md)
- [Deploy On Kubernetes](https://docs.sglang.io/docs/references/multi_node_deployment/deploy_on_k8s.md)
- [LWS Based PD Deploy](https://docs.sglang.io/docs/references/multi_node_deployment/lws_pd/lws_pd_deploy.md)
- [Multi-Node Deployment](https://docs.sglang.io/docs/references/multi_node_deployment/multi_node.md)
- [Multi-Node Deployment](https://docs.sglang.io/docs/references/multi_node_deployment/multi_node_index.md)
- [DeepSeekV32-Exp RBG Based PD Deploy](https://docs.sglang.io/docs/references/multi_node_deployment/rbg_pd/deepseekv32_pd.md)
- [References](https://docs.sglang.io/docs/references/overview.md): FAQ, environment variables, production metrics, deployment guides, and more.
- [Post-Training Integration](https://docs.sglang.io/docs/references/post_training_integration.md)
- [Production Metrics](https://docs.sglang.io/docs/references/production_metrics.md)
- [Production Request Tracing](https://docs.sglang.io/docs/references/production_request_trace.md)
- [CLI reference](https://docs.sglang.io/docs/sglang-diffusion/api/cli.md): Run one-off generation tasks and launch the HTTP server from the command line.
- [OpenAI API](https://docs.sglang.io/docs/sglang-diffusion/api/openai_api.md): Image and video generation endpoints with LoRA adapter management.
- [Post-Processing](https://docs.sglang.io/docs/sglang-diffusion/api/post_processing.md)
- [Attention Backends](https://docs.sglang.io/docs/sglang-diffusion/attention_backends.md): Select and configure attention backends for SGLang diffusion pipelines.
- [Cache-DiT Acceleration](https://docs.sglang.io/docs/sglang-diffusion/cache_dit.md): Configure Cache-DiT acceleration for diffusion inference.
- [Caching Acceleration](https://docs.sglang.io/docs/sglang-diffusion/caching-acceleration.md): Compare caching acceleration strategies for diffusion models.
- [CI Performance Baselines](https://docs.sglang.io/docs/sglang-diffusion/ci_perf.md): Generate and update diffusion performance baselines used in CI.
- [Supported Models and Optimization Compatibility](https://docs.sglang.io/docs/sglang-diffusion/compatibility_matrix.md): Check supported SGLang Diffusion models and their optimization compatibility.
- [Contributing to SGLang Diffusion](https://docs.sglang.io/docs/sglang-diffusion/contributing.md)
- [Deployment and Performance Modes](https://docs.sglang.io/docs/sglang-diffusion/deployment_cookbook.md): Choose CPU offload, FSDP, CFG parallelism, SP, TP, and performance-mode presets in SGLang Diffusion.
- [Disaggregated Diffusion Pipeline](https://docs.sglang.io/docs/sglang-diffusion/disaggregation.md)
- [Inference Batching](https://docs.sglang.io/docs/sglang-diffusion/dynamic_batching.md): Batch compatible native SGLang-Diffusion requests during serving.
- [Environment Variables](https://docs.sglang.io/docs/sglang-diffusion/environment_variables.md): Configure SGLang diffusion behavior with environment variables.
- [SGLang Diffusion](https://docs.sglang.io/docs/sglang-diffusion/index.md): Accelerated image and video generation with diffusion models.
- [Install SGLang Diffusion](https://docs.sglang.io/docs/sglang-diffusion/installation.md): Install SGLang Diffusion on NVIDIA, AMD, MUSA, and Ascend platforms.
- [Performance Optimization](https://docs.sglang.io/docs/sglang-diffusion/performance-optimization.md): Choose performance levers for SGLang Diffusion by latency, throughput, memory, and quality tradeoffs.
- [Profiling](https://docs.sglang.io/docs/sglang-diffusion/profiling.md): Profile SGLang diffusion workloads with PyTorch Profiler and Nsight Systems.
- [Progressive Resolution Generation](https://docs.sglang.io/docs/sglang-diffusion/progressive_resolution.md): Experimental spectral progressive resolution growing for selected SGLang Diffusion pipelines.
- [Quantization](https://docs.sglang.io/docs/sglang-diffusion/quantization.md)
- [Sequence Parallelism](https://docs.sglang.io/docs/sglang-diffusion/ring_sp_performance.md)
- [How to Support New Diffusion Models](https://docs.sglang.io/docs/sglang-diffusion/support_new_models.md)
- [TeaCache Acceleration](https://docs.sglang.io/docs/sglang-diffusion/teacache.md): Configure TeaCache for temporal similarity-based diffusion acceleration.
- [Supported models](https://docs.sglang.io/docs/supported-models.md): See which families of SGLang-compatible models are actively maintained.
- [Classification Models](https://docs.sglang.io/docs/supported-models/classify_models.md)
- [Diffusion language models](https://docs.sglang.io/docs/supported-models/diffusion_language_models.md)
- [Embedding models](https://docs.sglang.io/docs/supported-models/embedding_models.md): Dense and sparse embedding models with FlashInfer acceleration and SGLang's batching infrastructure.
- [Large Language Models](https://docs.sglang.io/docs/supported-models/generative_models.md)
- [MindSpore Models](https://docs.sglang.io/docs/supported-models/mindspore_models.md)
- [Use Models From ModelScope](https://docs.sglang.io/docs/supported-models/modelscope.md)
- [Multimodal Language Models](https://docs.sglang.io/docs/supported-models/multimodal_language_models.md)
- [Rerank models](https://docs.sglang.io/docs/supported-models/rerank_models.md)
- [Reward models](https://docs.sglang.io/docs/supported-models/reward_models.md)
- [How to Support New Models](https://docs.sglang.io/docs/supported-models/support_new_models.md): This document explains how to add support for new language models and multimodal large language models (MLLMs) in SGLang. It also covers how to test new models and register external implementations.
- [Transformers Fallback in SGLang](https://docs.sglang.io/docs/supported-models/transformers_fallback.md)
- [Welcome to SGLang](https://docs.sglang.io/index.md): High-performance serving framework for large language and multimodal models.