- Offline Batch Inference
- Custom Server on Top of the Engine
- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation
Nest Asyncio
Note that if you want to use the Offline Engine in IPython or in any other code that already runs an event loop, you need to apply nest_asyncio before creating the engine.
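A minimal sketch of that patch, assuming the third-party `nest_asyncio` package is installed. The helper name `allow_nested_event_loops` is hypothetical, and the guarded import is only there so the snippet still runs where the package is missing.

```python
try:
    import nest_asyncio  # third-party package: pip install nest_asyncio
except ImportError:
    nest_asyncio = None


def allow_nested_event_loops() -> bool:
    """Patch asyncio so an already running event loop can be re-entered.

    Returns True if the patch was applied, False if nest_asyncio is missing.
    """
    if nest_asyncio is None:
        return False
    nest_asyncio.apply()
    return True


# Call this once, before the engine is created.
patched = allow_nested_event_loops()
print("nest_asyncio applied:", patched)
```

In a plain Python script with no running event loop, this step is usually unnecessary.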
Advanced Usage
The engine also supports vision-language model (VLM) inference and hidden-state extraction. Please see the examples for further use cases.
Offline Batch Inference
The SGLang offline engine supports batch inference with efficient scheduling.
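Before any generation call, the engine has to be launched, and it should be shut down when the work is done. The sketch below wraps both steps in a context manager. It assumes that `sgl.Engine(model_path=...)` and `engine.shutdown()` exist in your installed SGLang version; the `offline_engine` helper and its `engine_factory` parameter are hypothetical, added so the pattern can be tried without a GPU.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator, Optional


@contextmanager
def offline_engine(
    model_path: str,
    engine_factory: Optional[Callable[..., Any]] = None,
) -> Iterator[Any]:
    """Launch an engine, hand it to the caller, and always shut it down."""
    if engine_factory is None:
        # Imported lazily so this module loads even without sglang installed.
        import sglang as sgl  # assumption: sglang is installed

        engine_factory = sgl.Engine
    engine = engine_factory(model_path=model_path)
    try:
        yield engine
    finally:
        # Release the engine's resources even if generation raised an error.
        engine.shutdown()
```

A call might look like `with offline_engine("qwen/qwen2.5-0.5b-instruct") as llm: ...`, where the model path is only an illustrative value.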
Non-streaming Synchronous Generation
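In this mode, a list of prompts is submitted in one blocking call and the results come back together. A minimal sketch, assuming `engine.generate(prompts, sampling_params)` returns one dictionary per prompt with a `"text"` key; the helper name `generate_batch` is hypothetical.

```python
from typing import Any, Dict, List


def generate_batch(
    engine: Any, prompts: List[str], sampling_params: Dict[str, Any]
) -> List[str]:
    """Send all prompts in one blocking call and return the generated texts."""
    outputs = engine.generate(prompts, sampling_params)
    # Assumption: each output is a dict that carries the text under "text".
    return [output["text"] for output in outputs]


# With a real engine (assumes sglang is installed and a GPU is available):
#   import sglang as sgl
#   llm = sgl.Engine(model_path="qwen/qwen2.5-0.5b-instruct")
#   params = {"temperature": 0.8, "top_p": 0.95}
#   for text in generate_batch(llm, ["Hello, my name is"], params):
#       print(text)
#   llm.shutdown()
```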
Streaming Synchronous Generation
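In this mode, the caller still blocks, but output arrives chunk by chunk. The sketch below assumes that passing `stream=True` to `engine.generate` returns an iterator of dictionaries with a `"text"` key. Whether each chunk holds only the new text or the full text so far may depend on the SGLang version, so check that before concatenating chunks. The helper name `stream_chunks` is hypothetical.

```python
from typing import Any, Dict, Iterator


def stream_chunks(
    engine: Any, prompt: str, sampling_params: Dict[str, Any]
) -> Iterator[str]:
    """Yield the text of each chunk as soon as the engine produces it."""
    for chunk in engine.generate(prompt, sampling_params, stream=True):
        # Assumption: every streamed chunk is a dict with a "text" entry.
        yield chunk["text"]


# With a real engine:
#   for piece in stream_chunks(llm, "The capital of France is", {"temperature": 0.2}):
#       print(piece, flush=True)
```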
Non-streaming Asynchronous Generation
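In this mode, the generation call is awaited, so other coroutines can run while the engine works. A minimal sketch, assuming `engine.async_generate(prompts, sampling_params)` is a coroutine that resolves to one dictionary per prompt; the helper name `generate_batch_async` is hypothetical.

```python
import asyncio
from typing import Any, Dict, List


async def generate_batch_async(
    engine: Any, prompts: List[str], sampling_params: Dict[str, Any]
) -> List[str]:
    """Await a whole batch without blocking the event loop."""
    outputs = await engine.async_generate(prompts, sampling_params)
    # Assumption: each output is a dict that carries the text under "text".
    return [output["text"] for output in outputs]


# With a real engine, from synchronous code:
#   texts = asyncio.run(generate_batch_async(llm, ["Hello"], {"temperature": 0.8}))
```

Inside a notebook, `await generate_batch_async(...)` can be used directly instead of `asyncio.run`.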
Streaming Asynchronous Generation
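This mode combines the two previous ones: the call is awaited and the output is consumed chunk by chunk. The sketch assumes that awaiting `engine.async_generate(..., stream=True)` yields an async iterator of dictionaries with a `"text"` key, which should be verified against your SGLang version. The helper name `stream_chunks_async` is hypothetical.

```python
import asyncio
from typing import Any, AsyncIterator, Dict


async def stream_chunks_async(
    engine: Any, prompt: str, sampling_params: Dict[str, Any]
) -> AsyncIterator[str]:
    """Yield chunk texts as they arrive, without blocking the event loop."""
    # Assumption: awaiting the call returns an async generator of chunks.
    generator = await engine.async_generate(prompt, sampling_params, stream=True)
    async for chunk in generator:
        yield chunk["text"]


# With a real engine:
#   async def main():
#       async for piece in stream_chunks_async(llm, "Hello", {"temperature": 0.8}):
#           print(piece, flush=True)
#   asyncio.run(main())
```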
