--is-embedding and some may require --trust-remote-code.
Example launch Command
Supported models
| Model Family (Reward) | Example HuggingFace Identifier | Description |
|---|---|---|
Llama (3.1 Reward / LlamaForSequenceClassification) | Skywork/Skywork-Reward-Llama-3.1-8B-v0.2 | Reward model (preference classifier) based on Llama 3.1 (8B) for scoring and ranking responses for RLHF. |
Gemma 2 (27B Reward / Gemma2ForSequenceClassification) | Skywork/Skywork-Reward-Gemma-2-27B-v0.2 | Derived from Gemma‑2 (27B), this model provides human preference scoring for RLHF and multilingual tasks. |
InternLM 2 (Reward / InternLM2ForRewardMode) | internlm/internlm2-7b-reward | InternLM 2 (7B)–based reward model used in alignment pipelines to guide outputs toward preferred behavior. |
Qwen2.5 (Reward - Math / Qwen2ForRewardModel) | Qwen/Qwen2.5-Math-RM-72B | A 72B math-specialized RLHF reward model from the Qwen2.5 series, tuned for evaluating and refining responses. |
Qwen2.5 (Reward - Sequence / Qwen2ForSequenceClassification) | jason9693/Qwen2.5-1.5B-apeach | A smaller Qwen2.5 variant used for sequence classification, offering an alternative RLHF scoring mechanism. |
