# LLM Gateway API Documentation

## Overview

LLM Gateway provides a unified API for interacting with multiple LLM providers. It supports three API formats:

- **OpenAI-compatible Chat Completions API** (`/v1/chat/completions`)
- **Anthropic Messages API** (`/v1/messages`)
- **OpenAI Responses API** (`/v1/responses`)

## Authentication

All API requests require authentication using a Virtual API Key. Include your key in one of two ways:

### Bearer Token (Recommended)

```bash
curl -X POST https://gateway.example.com/v1/chat/completions \
  -H "Authorization: Bearer sk_your_virtual_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}'
```

### X-API-Key Header

```bash
curl -X POST https://gateway.example.com/v1/chat/completions \
  -H "X-API-Key: sk_your_virtual_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}'
```

## Chat Completions API

### POST /v1/chat/completions

OpenAI-compatible chat completions endpoint.
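For callers that prefer code to curl, here is a minimal Python sketch of the same request using only the standard library. It builds the request without sending it, so nothing here depends on a live gateway. The host and key are the placeholders from the curl examples above, `build_chat_request` is a hypothetical helper name, and the `use_x_api_key` flag simply switches between the two documented header styles.

```python
import json
import urllib.request

GATEWAY_URL = "https://gateway.example.com"  # placeholder host from the curl examples


def build_chat_request(api_key: str, model: str, messages: list,
                       use_x_api_key: bool = False, **params) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/chat/completions.

    Extra keyword arguments such as temperature or max_tokens are copied
    into the JSON body unchanged.
    """
    body = {"model": model, "messages": messages, **params}
    headers = {"Content-Type": "application/json"}
    if use_x_api_key:
        headers["X-API-Key"] = api_key  # alternative header style
    else:
        headers["Authorization"] = f"Bearer {api_key}"  # recommended style
    return urllib.request.Request(
        f"{GATEWAY_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_chat_request(
    "sk_your_virtual_key",
    "gpt-4",
    [{"role": "user", "content": "Hello"}],
    temperature=0.7,
)
```

Sending it is a matter of `urllib.request.urlopen(req)` and decoding the JSON response body; any HTTP client library works equally well.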
**Request Body:**

```json
{
  "model": "gpt-4",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you?"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}
```

**Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Model alias or `provider:model` format |
| messages | array | Yes | Array of message objects |
| temperature | number | No | Sampling temperature (0-2) |
| max_tokens | integer | No | Maximum tokens to generate |
| stream | boolean | No | Enable streaming response |
| tools | array | No | Tool definitions for function calling |
| tool_choice | string/object | No | Tool selection behavior |

**Response:**

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 15,
    "total_tokens": 35
  }
}
```

## Anthropic Messages API

### POST /v1/messages

Anthropic Messages API-compatible endpoint.

**Request Body:**

```json
{
  "model": "claude-3-opus",
  "max_tokens": 1024,
  "messages": [
    {"role": "user", "content": "Hello, Claude!"}
  ],
  "system": "You are a helpful assistant."
}
```

**Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Model alias or `provider:model` format |
| max_tokens | integer | Yes | Maximum tokens to generate |
| messages | array | Yes | Array of message objects |
| system | string | No | System prompt |
| temperature | number | No | Sampling temperature (0-1) |
| tools | array | No | Tool definitions |
| tool_choice | object | No | Tool selection behavior |

**Response:**

```json
{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I help you today?"
    }
  ],
  "model": "claude-3-opus-20240229",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 15,
    "output_tokens": 10
  }
}
```

## OpenAI Responses API

### POST /v1/responses

OpenAI Responses API-compatible endpoint (new format).

**Request Body:**

```json
{
  "model": "gpt-4",
  "input": "What is the capital of France?",
  "instructions": "Be concise and accurate."
}
```

**Response:**

```json
{
  "id": "resp_abc123",
  "object": "response",
  "created": 1234567890,
  "model": "gpt-4",
  "output": "The capital of France is Paris.",
  "usage": {
    "input_tokens": 20,
    "output_tokens": 10,
    "total_tokens": 30
  }
}
```

---

## Admin API

Admin APIs are used to manage providers, projects, API keys, and model aliases.
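Of the admin endpoints, only List Providers (`GET /admin/providers`) documents its query parameters: `page` (default 1), `page_size` (default 20), and an optional `enabled` filter. A minimal sketch of building those URLs and walking every page follows. It assumes the other list endpoints accept the same parameters, that `enabled` is encoded as lowercase `true`/`false`, and that a hypothetical `fetch_page` callable returns one page's items as a list, since the list response envelope is not documented. `list_url` and `iter_all` are likewise illustrative names, and the host is the placeholder from the curl examples.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

ADMIN_BASE = "https://gateway.example.com/admin"  # placeholder host


def list_url(resource: str, page: int = 1, page_size: int = 20,
             enabled: Optional[bool] = None) -> str:
    """Build a paginated list URL such as /admin/providers?page=1&page_size=20."""
    query = {"page": page, "page_size": page_size}
    if enabled is not None:
        # Assumption: the boolean filter is sent as lowercase "true"/"false".
        query["enabled"] = "true" if enabled else "false"
    return f"{ADMIN_BASE}/{resource}?{urlencode(query)}"


def iter_all(resource: str, fetch_page: Callable[[str], list],
             page_size: int = 20, **filters) -> Iterator[dict]:
    """Yield every item across pages.

    fetch_page is a hypothetical callable that GETs a URL and returns that
    page's items as a list. Iteration stops at the first page that is
    shorter than page_size (including an empty page).
    """
    page = 1
    while True:
        items = fetch_page(list_url(resource, page, page_size, **filters))
        yield from items
        if len(items) < page_size:
            return
        page += 1
```

Stopping on a short page avoids needing a total count from the server; if the real list responses include a total or a next-page marker, that would be the more reliable stop condition.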
### Providers

#### List Providers

```
GET /admin/providers
```

Query parameters:

- `page` (default: 1)
- `page_size` (default: 20)
- `enabled` (optional, filter by status)

#### Create Provider

```
POST /admin/providers
```

```json
{
  "name": "openai",
  "api_base": "https://api.openai.com/v1",
  "api_key": "sk-xxx",
  "api_version": null,
  "rpm_limit": 500,
  "tpm_limit": 150000,
  "enabled": true
}
```

#### Update Provider

```
PUT /admin/providers/{provider_id}
```

#### Delete Provider

```
DELETE /admin/providers/{provider_id}
```

### Projects

#### List Projects

```
GET /admin/projects
```

#### Create Project

```
POST /admin/projects
```

```json
{
  "name": "My Project",
  "description": "Project description",
  "budget_limit": 100.00,
  "budget_period": "monthly"
}
```

### API Keys

#### List API Keys

```
GET /admin/keys
```

#### Create API Key

```
POST /admin/keys
```

```json
{
  "name": "Production Key",
  "project_id": "project-uuid",
  "rpm_limit": 100,
  "tpm_limit": 50000,
  "budget_limit": 50.00,
  "allowed_models": ["gpt-4", "claude-3-opus"]
}
```

**Response includes the full key (only shown once):**

```json
{
  "id": "key-uuid",
  "name": "Production Key",
  "key": "sk_prod_abc123...",
  "key_prefix": "sk_prod_abc...",
  "enabled": true,
  "created_at": "2026-05-01T00:00:00Z"
}
```

#### Delete API Key

```
DELETE /admin/keys/{key_id}
```

### Model Aliases

#### List Model Aliases

```
GET /admin/models/aliases
```

#### Create Model Alias

```
POST /admin/models/aliases
```

```json
{
  "alias": "smart-model",
  "provider": "openai",
  "model": "gpt-4-turbo",
  "enabled": true,
  "routing_type": "simple",
  "input_price_per_1k": 0.01,
  "output_price_per_1k": 0.03
}
```

**Routing Types:**

- `simple` - Direct mapping to a single provider/model
- `load_balance` - Distribute across multiple providers
- `fallback` - Try providers in order until one succeeds

**Load Balance Config:**

```json
{
  "routing_type": "load_balance",
  "routing_config": {
    "targets": [
      {"provider": "openai", "model": "gpt-4", "weight": 0.7},
      {"provider": "azure", "model": "gpt-4", "weight": 0.3}
    ]
  }
}
```

**Fallback Config:**

```json
{
  "routing_type": "fallback",
  "routing_config": {
    "chain": [
      {"provider": "openai", "model": "gpt-4"},
      {"provider": "anthropic", "model": "claude-3-opus"}
    ]
  }
}
```

### Usage Statistics

#### Get Usage Stats

```
GET /admin/usage/stats
```

Query parameters:

- `start_date` (ISO date)
- `end_date` (ISO date)
- `group_by` (hour, day, provider, model, key)

### Health Check

```
GET /health
```

```json
{
  "status": "healthy",
  "version": "0.1.0",
  "providers": {
    "openai": "healthy",
    "anthropic": "healthy"
  }
}
```

---

## Error Responses

All errors follow a consistent format:

```json
{
  "detail": {
    "error": {
      "type": "error_type",
      "message": "Human readable error message",
      "details": {}
    }
  }
}
```

### Common Error Types

| Status | Type | Description |
|--------|------|-------------|
| 401 | authentication_error | Invalid or missing API key |
| 402 | budget_exceeded_error | Budget limit reached |
| 403 | permission_error | API key disabled or expired |
| 429 | rate_limit_error | Rate limit exceeded |
| 502 | provider_error | Upstream provider error |
| 503 | service_unavailable | Provider unavailable |

---

## Rate Limiting

Rate limits are applied per API key. Response headers include:

```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1714521600
```

When rate limited, the response includes:

```json
{
  "detail": {
    "error": {
      "type": "rate_limit_error",
      "message": "Rate limit exceeded",
      "details": {
        "limit": 100,
        "remaining": 0,
        "reset_at": "2026-05-01T00:00:00Z"
      }
    }
  }
}
```
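A client can combine the error envelope and the rate-limit headers documented above to decide whether and when to retry. The sketch below is one possible client-side policy, not gateway behavior. The set of retryable types (the 429 rate-limit error plus the 502/503 provider errors) is an assumption, as is reading `X-RateLimit-Reset` as a Unix timestamp in seconds, which the example value suggests but the document does not state. `error_type` and `seconds_until_reset` are hypothetical helper names.

```python
from datetime import datetime
from typing import Mapping, Optional

# Assumption: a client-side policy. Rate-limit and upstream/provider errors
# are treated as transient; authentication, permission, and budget errors
# will not succeed on retry, so they are excluded.
RETRYABLE_TYPES = {"rate_limit_error", "provider_error", "service_unavailable"}


def error_type(body: dict) -> Optional[str]:
    """Return detail.error.type from the gateway's error envelope, or None."""
    detail = body.get("detail")
    if not isinstance(detail, dict):
        return None
    return detail.get("error", {}).get("type")


def seconds_until_reset(headers: Mapping[str, str], body: dict, now: float) -> float:
    """Seconds to wait before retrying a rate-limited request (never negative).

    Prefers the X-RateLimit-Reset header, assumed to be Unix epoch seconds,
    and falls back to details.reset_at (ISO 8601) in the error body.
    Note: a plain dict lookup is case-sensitive; http.client's HTTPMessage is not.
    """
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        reset_ts = float(reset)
    else:
        reset_at = body["detail"]["error"]["details"]["reset_at"]
        # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11.
        reset_ts = datetime.fromisoformat(reset_at.replace("Z", "+00:00")).timestamp()
    return max(0.0, reset_ts - now)
```

On a 429, a caller might sleep for the returned number of seconds and re-send. For `provider_error` and `service_unavailable`, which carry no reset hint, exponential backoff with jitter is a common choice.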