Text generation

Large language models
Production-tuned Llama and Qwen families validated for high-throughput
serving.

Vision language models
Vision-text hybrids that stay responsive on multi-GPU setups.

Diffusion language models
Score-based and diffusion backbones for structured text generation
workflows.

Retrieval and ranking

Embedding models
Dense and sparse embeddings optimized with FlashInfer kernels.

Rerank models
Low-latency rerankers for multi-stage retrieval pipelines.

Classification models
Lightweight classifiers covering safety, intent, and context filters.

Specialized models

Reward models
RLHF and reward scoring pipelines optimized for production latency.
