Launch A Server
Launch the server in your terminal and wait for it to initialize.
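One way to script this step is sketched below. It assumes the python -m sglang.launch_server entry point with its --model-path, --host, and --port flags, and that the server answers GET /health once it is ready. Port 30000 and the helper names are illustrative choices, not part of the original text.

```python
import subprocess
import sys
import time
import urllib.request


def build_launch_command(model_path, port=30000, host="127.0.0.1"):
    """Return the argv list that starts an SGLang server (nothing is executed here)."""
    return [
        sys.executable, "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--host", host,
        "--port", str(port),
    ]


def wait_until_ready(base_url, timeout_s=600.0, poll_s=2.0):
    """Poll the assumed /health route until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # connection refused or HTTP error: the server is still starting
        time.sleep(poll_s)
    return False


def launch_and_wait(model_path, port=30000):
    """Start the server as a child process and block until it reports healthy."""
    proc = subprocess.Popen(build_launch_command(model_path, port))
    if not wait_until_ready(f"http://127.0.0.1:{port}"):
        proc.terminate()
        raise RuntimeError("SGLang server did not become ready in time")
    return proc
```

Calling launch_and_wait with the path or hub name of a model you have access to returns the process handle, which you can terminate when you are done.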
Besides a local server, you can also use OpenAI or other API endpoints as the backend.
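Selecting a backend might look like the sketch below. It uses sgl.set_default_backend with either sgl.RuntimeEndpoint (a local server) or sgl.OpenAI (a hosted endpoint). The helper names and the default port 30000 are assumptions, and the OpenAI backend is assumed to read its key from the OPENAI_API_KEY environment variable.

```python
def local_endpoint_url(port=30000, host="localhost"):
    """Build the base URL of a locally launched server."""
    return f"http://{host}:{port}"


def use_local_server(port=30000):
    """Route all subsequent SGLang programs to a local server."""
    import sglang as sgl  # deferred: only needed once a backend is actually selected

    backend = sgl.RuntimeEndpoint(local_endpoint_url(port))
    sgl.set_default_backend(backend)
    return backend


def use_openai(model_name):
    """Route all subsequent SGLang programs to an OpenAI model."""
    import sglang as sgl

    backend = sgl.OpenAI(model_name)  # assumes OPENAI_API_KEY is set in the environment
    sgl.set_default_backend(backend)
    return backend
```

Only one default backend is active at a time, so call whichever helper matches your setup before running any program.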
Basic Usage
The simplest way to use the SGLang frontend language is a question-answer dialog between a user and an assistant.
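A minimal sketch of such a dialog follows, assuming a default backend has already been set. The names build_basic_qa and ask are hypothetical wrappers. The import is deferred into the builder so the file loads even where SGLang is not installed.

```python
def build_basic_qa():
    """Define a single-turn question-answer program."""
    import sglang as sgl  # deferred import; requires the sglang package

    @sgl.function
    def basic_qa(s, question):
        s += sgl.system("You are a helpful assistant that answers questions.")
        s += sgl.user(question)
        # The generated text is stored in the state under the key "answer".
        s += sgl.assistant(sgl.gen("answer", max_tokens=512))

    return basic_qa


def ask(question):
    """Run the program on the default backend and return the answer text."""
    state = build_basic_qa().run(question=question)
    return state["answer"]
```

For instance, ask("List 3 countries and their capitals.") would return the assistant's reply as a string.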
Multi-turn Dialog
The SGLang frontend language can also be used to define multi-turn dialogs.
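A multi-turn dialog could be sketched as below, again assuming a default backend is set. Each user turn is followed by a generation stored under its own key. The helper names and the answer_0, answer_1, ... keys are illustrative.

```python
def build_multi_turn_qa():
    """Define a program that alternates user questions and assistant answers."""
    import sglang as sgl  # deferred import; requires the sglang package

    @sgl.function
    def multi_turn_qa(s, questions):
        s += sgl.system("You are a helpful assistant that answers questions.")
        for i, question in enumerate(questions):
            s += sgl.user(question)
            # Earlier turns stay in the state, so each answer sees the full history.
            s += sgl.assistant(sgl.gen(f"answer_{i}", max_tokens=256))

    return multi_turn_qa


def run_dialog(questions):
    """Run all turns and return the answers in order."""
    state = build_multi_turn_qa().run(questions=questions)
    return [state[f"answer_{i}"] for i in range(len(questions))]
```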
Control Flow
You may use any Python code within the function to define more complex control flows.
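As an illustration, the sketch below branches on a value the model generated earlier. It assumes a default backend is set and uses the choices argument of sgl.gen to restrict the first answer to two options. The tool names and prompts are invented for the example.

```python
def build_tool_use():
    """Define a program whose second step depends on the first generation."""
    import sglang as sgl  # deferred import; requires the sglang package

    @sgl.function
    def tool_use(s, question):
        s += sgl.user(f"To answer this question: {question}\nWhich tool do you need?")
        s += sgl.assistant(sgl.gen("tool", choices=["calculator", "search engine"]))
        # Ordinary Python branching on a value produced by the model.
        if s["tool"] == "calculator":
            s += sgl.user("Write the math expression to evaluate.")
            s += sgl.assistant(sgl.gen("expression", max_tokens=64))
        elif s["tool"] == "search engine":
            s += sgl.user("Write the keyword to search for.")
            s += sgl.assistant(sgl.gen("keyword", max_tokens=64))

    return tool_use
```

Reading s["tool"] waits for that generation to finish, which is what makes the if statement safe to evaluate.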
Parallelism
Use fork to launch parallel prompts. Because sgl.gen is non-blocking, a for loop over two forked states issues two generation calls in parallel.
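The forking pattern can be sketched as follows, assuming a default backend is set. The prompt text and the builder name are illustrative. s.fork(2) creates two copies of the current state, and reading a variable from a fork waits for that fork's generation.

```python
def build_tip_suggestion():
    """Define a program that expands two tips in parallel, then summarizes."""
    import sglang as sgl  # deferred import; requires the sglang package

    @sgl.function
    def tip_suggestion(s):
        s += (
            "Here are two tips for staying healthy: "
            "1. Balanced diet. 2. Regular exercise.\n\n"
        )
        forks = s.fork(2)
        for i, f in enumerate(forks):
            f += f"Now, expand tip {i + 1} into a paragraph:\n"
            # gen returns immediately, so the second fork starts before the first finishes.
            f += sgl.gen("detailed_tip", max_tokens=256, stop="\n\n")
        # Joining: each lookup blocks until the corresponding fork is done.
        s += "Tip 1: " + forks[0]["detailed_tip"] + "\n"
        s += "Tip 2: " + forks[1]["detailed_tip"] + "\n"
        s += "In summary, "
        s += sgl.gen("summary", max_tokens=128)

    return tip_suggestion
```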
Constrained Decoding
Use regex to specify a regular expression as a decoding constraint. This is only supported for local models.
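A sketch of regex-constrained generation is given below, assuming a local server is the default backend. The IPv4 pattern and the question are illustrative. The pattern is passed through the regex argument of sgl.gen so the output can only be a string that matches it.

```python
import re

# Matches dotted-quad IPv4 addresses with each octet in 0-255.
IP_REGEX = r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"


def build_ip_question():
    """Define a program whose answer is constrained to an IPv4 address."""
    import sglang as sgl  # deferred import; requires the sglang package

    @sgl.function
    def ip_question(s):
        s += sgl.user("What is the IP address of the Google DNS servers?")
        s += sgl.assistant(sgl.gen("answer", temperature=0, regex=IP_REGEX))

    return ip_question


def is_valid_ip(text):
    """Check a string against the same pattern, for example to validate the output."""
    return re.fullmatch(IP_REGEX, text) is not None
```

Testing the pattern with Python's re module before sending it to the server is a cheap way to catch mistakes in the constraint itself.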
Use regex to define a JSON decoding schema.
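A JSON schema expressed as a regular expression might look like the sketch below, assuming a local server is the default backend. The three fields and the character prompt are invented for illustration. The pattern fixes the keys and layout and leaves only the values free.

```python
import json
import re

# Fixed keys and indentation; only the values vary.
CHARACTER_REGEX = (
    r'\{\n'
    r'    "name": "[\w\d\s]{1,16}",\n'
    r'    "house": "(Gryffindor|Slytherin|Ravenclaw|Hufflepuff)",\n'
    r'    "alive": (true|false)\n'
    r'\}'
)


def build_character_gen():
    """Define a program that fills in a JSON record for a named character."""
    import sglang as sgl  # deferred import; requires the sglang package

    @sgl.function
    def character_gen(s, name):
        s += sgl.user(
            f"{name} is a character in Harry Potter. "
            "Please fill in the following information about this character."
        )
        s += sgl.assistant(sgl.gen("json_output", max_tokens=256, regex=CHARACTER_REGEX))

    return character_gen


def parse_character(text):
    """Parse the constrained output; returns None if it does not fit the schema."""
    if re.fullmatch(CHARACTER_REGEX, text) is None:
        return None
    return json.loads(text)
```

Because every string the pattern accepts is valid JSON, the output of the constrained generation can go straight to json.loads.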
Batching
Use run_batch to run a batch of prompts.
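Batching can be sketched as follows, assuming a default backend is set. run_batch takes a list with one keyword-argument dictionary per prompt and returns one state per entry. The helper names are hypothetical.

```python
def batch_arguments(questions):
    """One keyword-argument dict per prompt, the input shape run_batch takes."""
    return [{"question": q} for q in questions]


def build_text_qa():
    """Define a short single-turn program to run over many inputs."""
    import sglang as sgl  # deferred import; requires the sglang package

    @sgl.function
    def text_qa(s, question):
        s += sgl.user(question)
        s += sgl.assistant(sgl.gen("answer", max_tokens=128))

    return text_qa


def answer_all(questions):
    """Run every question as one batch and collect the answers in input order."""
    states = build_text_qa().run_batch(batch_arguments(questions), progress_bar=True)
    return [state["answer"] for state in states]
```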
Streaming
Use stream to stream the output to the user.
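Streaming might be wired up as in the sketch below, assuming a default backend is set. Passing stream=True to run returns immediately, and the state's text_iter method yields text chunks as they arrive. The print_stream helper is a hypothetical consumer that echoes chunks and also returns the full text.

```python
import sys


def print_stream(chunks, out=None):
    """Write each chunk as soon as it arrives and return the concatenated text."""
    if out is None:
        out = sys.stdout
    parts = []
    for chunk in chunks:
        out.write(chunk)
        out.flush()  # show partial output immediately instead of buffering
        parts.append(chunk)
    return "".join(parts)


def build_text_qa():
    """Define a short single-turn program."""
    import sglang as sgl  # deferred import; requires the sglang package

    @sgl.function
    def text_qa(s, question):
        s += sgl.user(question)
        s += sgl.assistant(sgl.gen("answer", max_tokens=128))

    return text_qa


def stream_answer(question):
    """Run the program in streaming mode and echo the output incrementally."""
    state = build_text_qa().run(question=question, stream=True)
    return print_stream(state.text_iter())
```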
Complex Prompts
You may use {system|user|assistant}_{begin|end} to define complex prompts.
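The begin and end markers can be combined as in the sketch below, assuming a default backend is set. Here sgl.user_begin, sgl.user_end, sgl.assistant_begin, and sgl.assistant_end open and close a role explicitly, which lets a single turn mix literal text with generated text. The prompt wording is illustrative.

```python
def build_chat_example():
    """Define a program that builds each turn from explicit role markers."""
    import sglang as sgl  # deferred import; requires the sglang package

    @sgl.function
    def chat_example(s):
        s += sgl.system("You are a helpful assistant.")
        # Open the user turn, add its content piece by piece, then close it.
        s += sgl.user_begin()
        s += "Question: What is the capital of France?"
        s += sgl.user_end()
        # The assistant turn starts with fixed text and continues with generation.
        s += sgl.assistant_begin()
        s += "Answer: "
        s += sgl.gen("answer", max_tokens=100, stop="\n")
        s += sgl.assistant_end()

    return chat_example
```

Compared with wrapping a whole turn in sgl.user or sgl.assistant, the explicit markers make it possible to prefill the start of the assistant's reply.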
Multi-modal Generation
You may use the SGLang frontend language to define multi-modal prompts. See the SGLang documentation for the list of supported models.
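A multi-modal prompt could be sketched as below. It assumes the default backend serves a vision-language model and that sgl.image accepts a path to an image file. The helper names and arguments are illustrative.

```python
def build_image_qa():
    """Define a program that asks a question about an image."""
    import sglang as sgl  # deferred import; requires the sglang package

    @sgl.function
    def image_qa(s, image_path, question):
        # The image and the question text are sent together in one user turn.
        s += sgl.user(sgl.image(image_path) + question)
        s += sgl.assistant(sgl.gen("answer", max_tokens=256))

    return image_qa


def describe(image_path, question="What is shown in this image?"):
    """Run the program on the default backend and return the answer text."""
    state = build_image_qa().run(image_path=image_path, question=question)
    return state["answer"]
```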
