LLM Gateway API Documentation
Overview
LLM Gateway provides a unified API for interacting with multiple LLM providers. It supports three API formats:
- OpenAI-compatible Chat Completions API (/v1/chat/completions)
- Anthropic Messages API (/v1/messages)
- OpenAI Responses API (/v1/responses)
Authentication
All API requests require authentication using a Virtual API Key. Include your key in one of two ways:
Bearer Token (Recommended)
curl -X POST https://gateway.example.com/v1/chat/completions \
-H "Authorization: Bearer sk_your_virtual_key" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}'
X-API-Key Header
curl -X POST https://gateway.example.com/v1/chat/completions \
-H "X-API-Key: sk_your_virtual_key" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}'
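For Python clients, the two header styles above could be produced by a small helper like the following. This is a minimal sketch: the function name `auth_headers` and the `style` argument are illustrative and not part of the gateway.

```python
def auth_headers(virtual_key: str, style: str = "bearer") -> dict[str, str]:
    """Build request headers carrying the virtual key in one of the two documented forms."""
    headers = {"Content-Type": "application/json"}
    if style == "bearer":
        # Recommended form: Authorization: Bearer <key>
        headers["Authorization"] = f"Bearer {virtual_key}"
    elif style == "x-api-key":
        # Alternative form: X-API-Key: <key>
        headers["X-API-Key"] = virtual_key
    else:
        raise ValueError(f"unknown auth style: {style!r}")
    return headers
```

Send only one of the two headers per request; how the gateway resolves a request that carries both is not specified here.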
Chat Completions API
POST /v1/chat/completions
OpenAI-compatible chat completions endpoint.
Request Body:
{
"model": "gpt-4",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello, how are you?"}
],
"temperature": 0.7,
"max_tokens": 1000,
"stream": false
}
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model alias or provider:model format |
| messages | array | Yes | Array of message objects |
| temperature | number | No | Sampling temperature (0-2) |
| max_tokens | integer | No | Maximum tokens to generate |
| stream | boolean | No | Enable streaming response |
| tools | array | No | Tool definitions for function calling |
| tool_choice | string/object | No | Tool selection behavior |
Response:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1234567890,
"model": "gpt-4",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm doing well, thank you for asking."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 20,
"completion_tokens": 15,
"total_tokens": 35
}
}
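As an illustration of the request and response shapes above, the sketch below builds a request with Python's standard library and pulls the assistant text out of a decoded response. The host `gateway.example.com` is the placeholder used in the curl examples, and the helper names are hypothetical.

```python
import json
import urllib.request

GATEWAY_URL = "https://gateway.example.com"  # placeholder host from the curl examples


def build_chat_request(virtual_key: str, model: str, messages: list, **options) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /v1/chat/completions."""
    body = {"model": model, "messages": messages, **options}
    return urllib.request.Request(
        f"{GATEWAY_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {virtual_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def first_reply(response: dict) -> str:
    """Return the text of the first choice in a decoded chat completion response."""
    return response["choices"][0]["message"]["content"]
```

Passing the prepared request to `urllib.request.urlopen` and decoding the body with `json.load` would yield a dictionary suitable for `first_reply`. Optional parameters such as `temperature` or `max_tokens` go in as keyword arguments.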
Anthropic Messages API
POST /v1/messages
Anthropic Messages API compatible endpoint.
Request Body:
{
"model": "claude-3-opus",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "Hello, Claude!"}
],
"system": "You are a helpful assistant."
}
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model alias or provider:model format |
| max_tokens | integer | Yes | Maximum tokens to generate |
| messages | array | Yes | Array of message objects |
| system | string | No | System prompt |
| temperature | number | No | Sampling temperature (0-1) |
| tools | array | No | Tool definitions |
| tool_choice | object | No | Tool selection behavior |
Response:
{
"id": "msg_abc123",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "Hello! How can I help you today?"
}
],
"model": "claude-3-opus-20240229",
"stop_reason": "end_turn",
"usage": {
"input_tokens": 15,
"output_tokens": 10
}
}
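Unlike the chat completions format, this response carries its text inside a list of content blocks. A minimal sketch of collecting that text follows; the helper name `message_text` is illustrative.

```python
def message_text(response: dict) -> str:
    """Concatenate the text of every "text" block in a Messages API response."""
    return "".join(
        block.get("text", "")
        for block in response.get("content", [])
        if block.get("type") == "text"  # skip non-text blocks such as tool calls
    )
```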
OpenAI Responses API
POST /v1/responses
Endpoint compatible with the OpenAI Responses API, OpenAI's newer request format.
Request Body:
{
"model": "gpt-4",
"input": "What is the capital of France?",
"instructions": "Be concise and accurate."
}
Response:
{
"id": "resp_abc123",
"object": "response",
"created": 1234567890,
"model": "gpt-4",
"output": "The capital of France is Paris.",
"usage": {
"input_tokens": 20,
"output_tokens": 10,
"total_tokens": 30
}
}
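The three response formats report token usage under different field names (prompt_tokens and completion_tokens for chat completions, input_tokens and output_tokens for the other two). A client that tracks usage across all three might normalize them along these lines; this is a sketch, and the function name is hypothetical.

```python
def normalize_usage(usage: dict) -> dict:
    """Map the usage object of any of the three formats onto one set of field names."""
    input_tokens = usage.get("input_tokens", usage.get("prompt_tokens", 0))
    output_tokens = usage.get("output_tokens", usage.get("completion_tokens", 0))
    # The Anthropic-style example has no total_tokens field, so fall back to the sum.
    total_tokens = usage.get("total_tokens", input_tokens + output_tokens)
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": total_tokens,
    }
```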
Admin API
Admin APIs are used to manage providers, projects, API keys, and model aliases.
Providers
List Providers
GET /admin/providers
Query parameters:
- page (default: 1)
- page_size (default: 20)
- enabled (optional, filter by status)
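A paginated listing URL can be assembled from these parameters as sketched below. The encoding of `enabled` as the strings "true" and "false" is an assumption, as is the helper name; check the gateway's behavior before relying on it.

```python
from typing import Optional
from urllib.parse import urlencode


def providers_list_url(base_url: str, page: int = 1, page_size: int = 20,
                       enabled: Optional[bool] = None) -> str:
    """Build the GET /admin/providers URL with pagination and an optional status filter."""
    params = {"page": page, "page_size": page_size}
    if enabled is not None:
        # Assumption: the filter is passed as a lowercase boolean string.
        params["enabled"] = "true" if enabled else "false"
    return f"{base_url}/admin/providers?{urlencode(params)}"
```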
Create Provider
POST /admin/providers
{
"name": "openai",
"api_base": "https://api.openai.com/v1",
"api_key": "sk-xxx",
"api_version": null,
"rpm_limit": 500,
"tpm_limit": 150000,
"enabled": true
}
Update Provider
PUT /admin/providers/{provider_id}
Delete Provider
DELETE /admin/providers/{provider_id}
Projects
List Projects
GET /admin/projects
Create Project
POST /admin/projects
{
"name": "My Project",
"description": "Project description",
"budget_limit": 100.00,
"budget_period": "monthly"
}
API Keys
List API Keys
GET /admin/keys
Create API Key
POST /admin/keys
{
"name": "Production Key",
"project_id": "project-uuid",
"rpm_limit": 100,
"tpm_limit": 50000,
"budget_limit": 50.00,
"allowed_models": ["gpt-4", "claude-3-opus"]
}
The response includes the full key, which is shown only once, so store it securely at creation time:
{
"id": "key-uuid",
"name": "Production Key",
"key": "sk_prod_abc123...",
"key_prefix": "sk_prod_abc...",
"enabled": true,
"created_at": "2026-05-01T00:00:00Z"
}
Delete API Key
DELETE /admin/keys/{key_id}
Model Aliases
List Model Aliases
GET /admin/models/aliases
Create Model Alias
POST /admin/models/aliases
{
"alias": "smart-model",
"provider": "openai",
"model": "gpt-4-turbo",
"enabled": true,
"routing_type": "simple",
"input_price_per_1k": 0.01,
"output_price_per_1k": 0.03
}
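The two price fields suggest that cost is computed per 1,000 tokens, separately for input and output. Under that assumption (the gateway's actual accounting is not described here), the cost of a single request would work out as in this sketch:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Estimate the cost of one request from per-1k-token prices (assumed semantics)."""
    input_cost = (input_tokens / 1000) * input_price_per_1k
    output_cost = (output_tokens / 1000) * output_price_per_1k
    return input_cost + output_cost
```

With the prices from the example alias, a request using 1,000 input tokens and 1,000 output tokens would cost about 0.04 in whatever currency the prices are expressed in.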
Routing Types:
- simple - Direct mapping to a single provider/model
- load_balance - Distribute across multiple providers
- fallback - Try providers in order until success
Load Balance Config:
{
"routing_type": "load_balance",
"routing_config": {
"targets": [
{"provider": "openai", "model": "gpt-4", "weight": 0.7},
{"provider": "azure", "model": "gpt-4", "weight": 0.3}
]
}
}
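One plausible reading of the weights is that each request picks a target at random with probability proportional to its weight. The sketch below illustrates that idea only; it is not taken from the gateway's code, and `pick_target` is a hypothetical name.

```python
import random


def pick_target(targets: list, rng=random) -> dict:
    """Choose one target with probability proportional to its "weight" field."""
    weights = [target["weight"] for target in targets]
    # random.choices performs weighted sampling; k=1 returns a one-element list.
    return rng.choices(targets, weights=weights, k=1)[0]


targets = [
    {"provider": "openai", "model": "gpt-4", "weight": 0.7},
    {"provider": "azure", "model": "gpt-4", "weight": 0.3},
]
chosen = pick_target(targets)
```

Over many requests, roughly 70% would go to the first target and 30% to the second.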
Fallback Config:
{
"routing_type": "fallback",
"routing_config": {
"chain": [
{"provider": "openai", "model": "gpt-4"},
{"provider": "anthropic", "model": "claude-3-opus"}
]
}
}
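The "try providers in order until success" behavior can be pictured as a loop over the chain. The following is a conceptual sketch, not the gateway's code: `call` stands in for whatever function performs the upstream request, and treating any exception as a failure is a simplification.

```python
def call_with_fallback(chain: list, call):
    """Try each target in order and return the first successful result."""
    errors = []
    for target in chain:
        try:
            return call(target["provider"], target["model"])
        except Exception as exc:  # real code should catch narrower error types
            errors.append((target["provider"], str(exc)))
    raise RuntimeError(f"all {len(chain)} targets failed: {errors}")
```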
Usage Statistics
Get Usage Stats
GET /admin/usage/stats
Query parameters:
- start_date (ISO date)
- end_date (ISO date)
- group_by (hour, day, provider, model, key)
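A usage query URL can be built from these parameters as sketched here. The helper name is hypothetical, and whether all three parameters are required is not stated above; this sketch simply sends all of them.

```python
from datetime import date
from urllib.parse import urlencode

GROUP_BY_VALUES = {"hour", "day", "provider", "model", "key"}


def usage_stats_url(base_url: str, start_date: date, end_date: date, group_by: str) -> str:
    """Build the GET /admin/usage/stats URL with ISO-formatted dates."""
    if group_by not in GROUP_BY_VALUES:
        raise ValueError(f"group_by must be one of {sorted(GROUP_BY_VALUES)}")
    if start_date > end_date:
        raise ValueError("start_date must not be after end_date")
    params = {
        "start_date": start_date.isoformat(),  # e.g. 2026-05-01
        "end_date": end_date.isoformat(),
        "group_by": group_by,
    }
    return f"{base_url}/admin/usage/stats?{urlencode(params)}"
```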
Health Check
GET /health
{
"status": "healthy",
"version": "0.1.0",
"providers": {
"openai": "healthy",
"anthropic": "healthy"
}
}
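A monitoring script might reduce this response to the list of providers that need attention. The sketch below assumes that any provider status other than "healthy" counts as a problem; the function name is illustrative.

```python
def unhealthy_providers(health: dict) -> list:
    """Return the names of providers whose reported status is not "healthy"."""
    providers = health.get("providers", {})
    return sorted(name for name, state in providers.items() if state != "healthy")
```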
Error Responses
All errors follow a consistent format:
{
"detail": {
"error": {
"type": "error_type",
"message": "Human readable error message",
"details": {}
}
}
}
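Because the error object is nested under detail.error, clients may find it convenient to unwrap it once. A minimal sketch follows; the `GatewayError` class and the "unknown_error" fallback are illustrative and not defined by the gateway.

```python
from dataclasses import dataclass, field


@dataclass
class GatewayError:
    type: str
    message: str
    details: dict = field(default_factory=dict)


def parse_error(body: dict) -> GatewayError:
    """Unwrap the detail.error envelope from a decoded error response body."""
    detail = body.get("detail")
    if not isinstance(detail, dict) or not isinstance(detail.get("error"), dict):
        # Body does not follow the documented envelope; keep what we can.
        return GatewayError("unknown_error", "" if detail is None else str(detail))
    error = detail["error"]
    return GatewayError(
        type=error.get("type", "unknown_error"),
        message=error.get("message", ""),
        details=error.get("details") or {},
    )
```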
Common Error Types
| Status | Type | Description |
|---|---|---|
| 401 | authentication_error | Invalid or missing API key |
| 403 | permission_error | API key disabled or expired |
| 402 | budget_exceeded_error | Budget limit reached |
| 429 | rate_limit_error | Rate limit exceeded |
| 503 | service_unavailable | Provider unavailable |
| 502 | provider_error | Upstream provider error |
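The table does not prescribe a retry policy, but a common client-side choice is to retry only the statuses that indicate a temporary condition (429, 502, 503) and to surface the others immediately. The sketch below shows that policy with exponential backoff; the attempt limit and delay values are arbitrary assumptions.

```python
RETRYABLE_STATUSES = {429, 502, 503}  # temporary conditions; 401, 402, 403 need user action


def should_retry(status: int, attempt: int, max_attempts: int = 3) -> bool:
    """Decide whether a failed request is worth retrying (attempt is zero-based)."""
    return status in RETRYABLE_STATUSES and attempt < max_attempts


def backoff_seconds(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff: base * 2^attempt, limited to cap seconds."""
    return min(cap, base * (2 ** attempt))
```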
Rate Limiting
Rate limits are applied per API key. Response headers include:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1777593600
When a request is rate limited, the response body has this form:
{
"detail": {
"error": {
"type": "rate_limit_error",
"message": "Rate limit exceeded",
"details": {
"limit": 100,
"remaining": 0,
"reset_at": "2026-05-01T00:00:00Z"
}
}
}
}
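A client can use the reset information to decide how long to wait before retrying. The sketch below assumes that X-RateLimit-Reset is a Unix timestamp in seconds, as the sample value suggests, and that reset_at is an ISO 8601 UTC timestamp; both helper names are hypothetical.

```python
import time
from datetime import datetime
from typing import Optional


def seconds_until_reset(headers: dict, now: Optional[float] = None) -> float:
    """Seconds to wait until the limit resets, based on X-RateLimit-Reset."""
    reset = headers.get("X-RateLimit-Reset")
    if reset is None:
        return 0.0
    current = time.time() if now is None else now
    return max(0.0, float(reset) - current)  # never return a negative wait


def reset_at_epoch(reset_at: str) -> float:
    """Convert an ISO 8601 reset_at value such as 2026-05-01T00:00:00Z to epoch seconds."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(reset_at.replace("Z", "+00:00")).timestamp()
```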