This document describes the /v1/classify API endpoint in SGLang, which is compatible with vLLM’s classification API format.

Overview

The classification API allows you to classify text inputs using classification models. This implementation follows the same request/response format as vLLM's classification API (as of vLLM 0.7.0).

API endpoint

POST /v1/classify

Request format

{
  "model": "model_name",
  "input": "text to classify"
}

Parameters

model (string, required): The name of the classification model to use.
input (string, required): The text to classify.
user (string, optional): User identifier for tracking.
rid (string, optional): Request ID for tracking.
priority (integer, optional): Request priority.
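Putting the optional fields together, a full request body might look like this (the user, rid, and priority values here are illustrative):

```json
{
  "model": "jason9693/Qwen2.5-1.5B-apeach",
  "input": "text to classify",
  "user": "user-123",
  "rid": "request-abc",
  "priority": 0
}
```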

Response format

{
  "id": "classify-9bf17f2847b046c7b2d5495f4b4f9682",
  "object": "list",
  "created": 1745383213,
  "model": "jason9693/Qwen2.5-1.5B-apeach",
  "data": [
    {
      "index": 0,
      "label": "Default",
      "probs": [0.565970778465271, 0.4340292513370514],
      "num_classes": 2
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "total_tokens": 10,
    "completion_tokens": 0,
    "prompt_tokens_details": null
  }
}

Response fields

id (string, required): Unique identifier for the classification request.
object (string, required): Always "list".
created (integer, required): Unix timestamp when the request was created.
model (string, required): The model used for classification.
data (object[], required): Array of classification results.
usage (object, required): Token usage information.
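As a quick sketch, the predicted class for one result can be recovered from `probs` like this (the `response` dict below just mirrors the example response above):

```python
# Minimal sketch: pick the highest-probability class from one result.
# `response` mirrors the example response body shown above.
response = {
    "data": [
        {
            "index": 0,
            "label": "Default",
            "probs": [0.565970778465271, 0.4340292513370514],
            "num_classes": 2,
        }
    ]
}

result = response["data"][0]
top_index = max(range(result["num_classes"]), key=lambda i: result["probs"][i])
print(result["label"], top_index, result["probs"][top_index])
# prints: Default 0 0.565970778465271
```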

Example usage

curl -v "http://127.0.0.1:8000/v1/classify" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jason9693/Qwen2.5-1.5B-apeach",
    "input": "Loved the new café—coffee was great."
  }'
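The same request can be made from Python using only the standard library. This is a sketch, and the `classify` helper is hypothetical (the server address matches the curl example above):

```python
import json
import urllib.request


def classify(text: str, base_url: str = "http://127.0.0.1:8000") -> list[dict]:
    """Hypothetical helper: POST to /v1/classify on a local SGLang server."""
    payload = json.dumps(
        {"model": "jason9693/Qwen2.5-1.5B-apeach", "input": text}
    ).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/classify",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["data"]
```

Calling `classify("Loved the new café—coffee was great.")` against a running server returns the `data` list from the response.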

Supported models

The classification API works with any classification model supported by SGLang, including:
Model | Type
LlamaForSequenceClassification | Multi-class classification
Qwen2ForSequenceClassification | Multi-class classification
Qwen3ForSequenceClassification | Multi-class classification
BertForSequenceClassification | Multi-class classification
Gemma2ForSequenceClassification | Multi-class classification
The API automatically uses the id2label mapping from the model’s config.json file to provide meaningful label names instead of generic class names. If id2label is not available, it falls back to LABEL_0, LABEL_1, etc., or Class_0, Class_1 as a last resort.
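For reference, the relevant part of a model's config.json might look like this (the label names below are illustrative, not taken from any particular model):

```json
{
  "num_labels": 2,
  "id2label": {
    "0": "negative",
    "1": "positive"
  }
}
```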

Error handling

The API returns appropriate HTTP status codes and error messages:
Status code | Meaning
400 Bad Request | Invalid request format or missing required fields
500 Internal Server Error | Server-side processing error
Error response format:
{
  "error": "Error message",
  "type": "error_type",
  "code": 400
}
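A client might turn these error responses into readable messages along these lines (a sketch; `parse_classify_error` is a hypothetical helper, not part of SGLang):

```python
import json


def parse_classify_error(status_code: int, body: str) -> str:
    """Hypothetical helper: format an error response for logging."""
    try:
        err = json.loads(body)
    except json.JSONDecodeError:
        # Non-JSON body (e.g. a proxy error page): fall back to the raw text.
        return f"{status_code}: {body}"
    return f"{err.get('code', status_code)}: {err.get('error', 'unknown error')} ({err.get('type', 'unknown')})"


print(parse_classify_error(400, '{"error": "Error message", "type": "error_type", "code": 400}'))
# prints: 400: Error message (error_type)
```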

Implementation details

Routing and the request/response models are defined in sgl-model-gateway/src/protocols/spec.rs.
The HTTP endpoint is implemented in python/sglang/srt/entrypoints/http_server.py.
The classification logic lives in python/sglang/srt/entrypoints/openai/serving_classify.py.

Testing

Use the provided test script to verify the implementation:
python test_classify_api.py

Compatibility

This implementation is compatible with vLLM’s classification API format, allowing seamless migration from vLLM to SGLang for classification tasks.