# LLM Gateway API Documentation

## Overview

LLM Gateway provides a unified API for interacting with multiple LLM providers. It supports three API formats:

- **OpenAI-compatible Chat Completions API** (`/v1/chat/completions`)
- **Anthropic Messages API** (`/v1/messages`)
- **OpenAI Responses API** (`/v1/responses`)

## Authentication

All API requests require authentication using a Virtual API Key. Include your key in one of two ways:

### Bearer Token (Recommended)

```bash
curl -X POST https://gateway.example.com/v1/chat/completions \
  -H "Authorization: Bearer sk_your_virtual_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}'
```

### X-API-Key Header

```bash
curl -X POST https://gateway.example.com/v1/chat/completions \
  -H "X-API-Key: sk_your_virtual_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}'
```
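
The same two header styles can be built in client code. The following is a minimal Python sketch; the helper name `auth_headers` and its `scheme` argument are illustrative and not part of the gateway. The resulting dictionary can be passed to whichever HTTP client you use.

```python
def auth_headers(virtual_key: str, scheme: str = "bearer") -> dict[str, str]:
    """Return request headers for one of the two documented auth styles.

    `scheme` is a hypothetical switch for this sketch:
    "bearer" -> Authorization: Bearer <key>, "x-api-key" -> X-API-Key: <key>.
    """
    headers = {"Content-Type": "application/json"}
    if scheme == "bearer":
        headers["Authorization"] = f"Bearer {virtual_key}"
    elif scheme == "x-api-key":
        headers["X-API-Key"] = virtual_key
    else:
        raise ValueError(f"unknown auth scheme: {scheme!r}")
    return headers


bearer = auth_headers("sk_your_virtual_key")
api_key = auth_headers("sk_your_virtual_key", scheme="x-api-key")
```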

## Chat Completions API

### POST /v1/chat/completions

OpenAI-compatible chat completions endpoint.

**Request Body:**

```json
{
  "model": "gpt-4",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you?"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}
```

**Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Model alias, or an explicit `provider:model` reference |
| messages | array | Yes | Array of message objects |
| temperature | number | No | Sampling temperature (0-2) |
| max_tokens | integer | No | Maximum tokens to generate |
| stream | boolean | No | Enable streaming response |
| tools | array | No | Tool definitions for function calling |
| tool_choice | string/object | No | Tool selection behavior |
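
A client can assemble the request body from the parameters in the table and catch an out-of-range `temperature` before sending. This is a sketch only: `build_chat_payload` is a hypothetical helper, and the checks (non-empty `messages`, `temperature` within 0-2) are client-side assumptions rather than a description of how the gateway validates input.

```python
from typing import Optional


def build_chat_payload(
    model: str,
    messages: list[dict],
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    stream: bool = False,
) -> dict:
    """Assemble a /v1/chat/completions body, leaving out unset optional fields."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    payload: dict = {"model": model, "messages": messages}
    if temperature is not None:
        payload["temperature"] = temperature
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if stream:
        payload["stream"] = True
    return payload


payload = build_chat_payload(
    "gpt-4",
    [{"role": "user", "content": "Hello, how are you?"}],
    temperature=0.7,
    max_tokens=1000,
)
```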

**Response:**

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 15,
    "total_tokens": 35
  }
}
```
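
To read the reply out of this structure, take the first entry of `choices`. The sketch below parses an abbreviated copy of the sample response; `first_reply` is an illustrative helper name, and it assumes the first choice carries text content.

```python
import json

raw = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! I'm doing well, thank you for asking."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 20, "completion_tokens": 15, "total_tokens": 35}
}
"""


def first_reply(response: dict) -> tuple[str, str]:
    """Return (content, finish_reason) of the first choice."""
    choice = response["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]


response = json.loads(raw)
text, finish_reason = first_reply(response)
print(text)
print(finish_reason, response["usage"]["total_tokens"])
```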

## Anthropic Messages API

### POST /v1/messages

Endpoint compatible with the Anthropic Messages API.

**Request Body:**

```json
{
  "model": "claude-3-opus",
  "max_tokens": 1024,
  "messages": [
    {"role": "user", "content": "Hello, Claude!"}
  ],
  "system": "You are a helpful assistant."
}
```

**Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Model alias, or an explicit `provider:model` reference |
| max_tokens | integer | Yes | Maximum tokens to generate |
| messages | array | Yes | Array of message objects |
| system | string | No | System prompt |
| temperature | number | No | Sampling temperature (0-1) |
| tools | array | No | Tool definitions |
| tool_choice | object | No | Tool selection behavior |

**Response:**

```json
{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I help you today?"
    }
  ],
  "model": "claude-3-opus-20240229",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 15,
    "output_tokens": 10
  }
}
```
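
Unlike the Chat Completions format, `content` here is an array of typed blocks, so a client has to collect the text blocks itself. A minimal sketch, with `message_text` as an illustrative helper; blocks of any other type are skipped:

```python
def message_text(response: dict) -> str:
    """Concatenate the text of every `text` block in a /v1/messages response."""
    return "".join(
        block["text"]
        for block in response["content"]
        if block.get("type") == "text"
    )


sample = {
    "id": "msg_abc123",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello! How can I help you today?"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 15, "output_tokens": 10},
}

reply = message_text(sample)
```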

## OpenAI Responses API

### POST /v1/responses

Endpoint compatible with the OpenAI Responses API, the newer of the two OpenAI formats.

**Request Body:**

```json
{
  "model": "gpt-4",
  "input": "What is the capital of France?",
  "instructions": "Be concise and accurate."
}
```

**Response:**

```json
{
  "id": "resp_abc123",
  "object": "response",
  "created": 1234567890,
  "model": "gpt-4",
  "output": "The capital of France is Paris.",
  "usage": {
    "input_tokens": 20,
    "output_tokens": 10,
    "total_tokens": 30
  }
}
```
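
The three formats report token usage under different field names: `prompt_tokens`/`completion_tokens` for Chat Completions, `input_tokens`/`output_tokens` for Messages and Responses, and no `total_tokens` in the Messages sample. Client code that handles more than one format may want a common shape. The sketch below is one way to do that; `normalize_usage` is a hypothetical helper, and the output field names are a choice made for this example.

```python
def normalize_usage(usage: dict) -> dict[str, int]:
    """Map either usage naming scheme onto input/output/total token counts."""
    input_tokens = usage.get("input_tokens", usage.get("prompt_tokens", 0))
    output_tokens = usage.get("output_tokens", usage.get("completion_tokens", 0))
    # The Messages sample has no total, so fall back to the sum.
    total_tokens = usage.get("total_tokens", input_tokens + output_tokens)
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": total_tokens,
    }


chat_usage = normalize_usage({"prompt_tokens": 20, "completion_tokens": 15, "total_tokens": 35})
messages_usage = normalize_usage({"input_tokens": 15, "output_tokens": 10})
responses_usage = normalize_usage({"input_tokens": 20, "output_tokens": 10, "total_tokens": 30})
```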

---

## Admin API

The Admin API manages providers, projects, API keys, and model aliases, and exposes usage statistics and a health check.

### Providers

#### List Providers

```
GET /admin/providers
```

Query parameters:
- `page` (default: 1)
- `page_size` (default: 20)
- `enabled` (optional, filter by status)

#### Create Provider

```
POST /admin/providers
```

```json
{
  "name": "openai",
  "api_base": "https://api.openai.com/v1",
  "api_key": "sk-xxx",
  "api_version": null,
  "rpm_limit": 500,
  "tpm_limit": 150000,
  "enabled": true
}
```

#### Update Provider

```
PUT /admin/providers/{provider_id}
```

#### Delete Provider

```
DELETE /admin/providers/{provider_id}
```

### Projects

#### List Projects

```
GET /admin/projects
```

#### Create Project

```
POST /admin/projects
```

```json
{
  "name": "My Project",
  "description": "Project description",
  "budget_limit": 100.00,
  "budget_period": "monthly"
}
```

### API Keys

#### List API Keys

```
GET /admin/keys
```

#### Create API Key

```
POST /admin/keys
```

```json
{
  "name": "Production Key",
  "project_id": "project-uuid",
  "rpm_limit": 100,
  "tpm_limit": 50000,
  "budget_limit": 50.00,
  "allowed_models": ["gpt-4", "claude-3-opus"]
}
```

**Response (the full `key` value is returned only once, at creation):**

```json
{
  "id": "key-uuid",
  "name": "Production Key",
  "key": "sk_prod_abc123...",
  "key_prefix": "sk_prod_abc...",
  "enabled": true,
  "created_at": "2026-05-01T00:00:00Z"
}
```

#### Delete API Key

```
DELETE /admin/keys/{key_id}
```

### Model Aliases

#### List Model Aliases

```
GET /admin/models/aliases
```

#### Create Model Alias

```
POST /admin/models/aliases
```

```json
{
  "alias": "smart-model",
  "provider": "openai",
  "model": "gpt-4-turbo",
  "enabled": true,
  "routing_type": "simple",
  "input_price_per_1k": 0.01,
  "output_price_per_1k": 0.03
}
```
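
The two price fields suggest a cost that is linear in tokens, priced per 1,000. The sketch below estimates the cost of a single request on that assumption; this document does not state how the gateway itself computes or rounds cost, and `estimate_cost` is an illustrative helper.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1k: float,
    output_price_per_1k: float,
) -> float:
    """Estimate request cost, assuming linear pricing per 1,000 tokens."""
    input_cost = (input_tokens / 1000) * input_price_per_1k
    output_cost = (output_tokens / 1000) * output_price_per_1k
    return input_cost + output_cost


# 20 input tokens and 15 output tokens at the alias prices shown above.
cost = estimate_cost(20, 15, input_price_per_1k=0.01, output_price_per_1k=0.03)
```

With these numbers the input side contributes 0.0002 and the output side 0.00045, for an estimate of about 0.00065 in whatever currency the prices are expressed in.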

**Routing Types:**

- `simple` - Direct mapping to a single provider/model
- `load_balance` - Distribute across multiple providers
- `fallback` - Try providers in order until success

**Load Balance Config:**

```json
{
  "routing_type": "load_balance",
  "routing_config": {
    "targets": [
      {"provider": "openai", "model": "gpt-4", "weight": 0.7},
      {"provider": "azure", "model": "gpt-4", "weight": 0.3}
    ]
  }
}
```
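
One common way to honor such weights is weighted random selection, where each request goes to a target with probability proportional to its `weight`. The sketch below illustrates that interpretation only; the gateway's actual balancing algorithm is not described here, and `pick_target` is a hypothetical helper.

```python
import random

targets = [
    {"provider": "openai", "model": "gpt-4", "weight": 0.7},
    {"provider": "azure", "model": "gpt-4", "weight": 0.3},
]


def pick_target(targets: list[dict], rng: random.Random) -> dict:
    """Pick one target with probability proportional to its weight."""
    weights = [t["weight"] for t in targets]
    return rng.choices(targets, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded only to make the sketch reproducible
picks = [pick_target(targets, rng)["provider"] for _ in range(10_000)]
openai_share = picks.count("openai") / len(picks)
print(f"openai share: {openai_share:.2f}")
```

Over many requests the share sent to `openai` should settle near 0.7.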

**Fallback Config:**

```json
{
  "routing_type": "fallback",
  "routing_config": {
    "chain": [
      {"provider": "openai", "model": "gpt-4"},
      {"provider": "anthropic", "model": "claude-3-opus"}
    ]
  }
}
```
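
The fallback behavior can be pictured as a loop over `chain` that stops at the first target that succeeds. The following is a sketch under stated assumptions: `call_with_fallback` and `fake_call` are hypothetical, any exception counts as a failure here, and the gateway's real failure criteria are not documented in this section.

```python
from typing import Callable

chain = [
    {"provider": "openai", "model": "gpt-4"},
    {"provider": "anthropic", "model": "claude-3-opus"},
]


def call_with_fallback(chain: list[dict], call: Callable[[dict], str]) -> tuple[dict, str]:
    """Try each target in order; return the first (target, result) that succeeds."""
    errors = []
    for target in chain:
        try:
            return target, call(target)
        except Exception as exc:  # a real gateway would likely be more selective
            errors.append(f"{target['provider']}: {exc}")
    raise RuntimeError("all fallback targets failed: " + "; ".join(errors))


def fake_call(target: dict) -> str:
    """Stand-in for a provider call: the first provider is simulated as down."""
    if target["provider"] == "openai":
        raise ConnectionError("simulated outage")
    return f"answer from {target['model']}"


used, result = call_with_fallback(chain, fake_call)
```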

### Usage Statistics

#### Get Usage Stats

```
GET /admin/usage/stats
```

Query parameters:
- `start_date` (ISO 8601 date, start of the reporting window)
- `end_date` (ISO 8601 date, end of the reporting window)
- `group_by` (one of `hour`, `day`, `provider`, `model`, `key`)
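
A query for this endpoint can be assembled as follows. This is a sketch with several assumptions: dates are sent in date-only form (`YYYY-MM-DD`), the admin routes live on the same base URL as the gateway, and `usage_stats_url` is an illustrative helper. How admin requests are authenticated is not covered in this section, so no headers are shown.

```python
from datetime import date
from urllib.parse import urlencode

GROUP_BY_VALUES = {"hour", "day", "provider", "model", "key"}


def usage_stats_url(base_url: str, start: date, end: date, group_by: str) -> str:
    """Build the GET /admin/usage/stats URL for a date window and grouping."""
    if group_by not in GROUP_BY_VALUES:
        raise ValueError(f"group_by must be one of {sorted(GROUP_BY_VALUES)}")
    if start > end:
        raise ValueError("start_date must not be after end_date")
    query = urlencode(
        {
            "start_date": start.isoformat(),
            "end_date": end.isoformat(),
            "group_by": group_by,
        }
    )
    return f"{base_url}/admin/usage/stats?{query}"


url = usage_stats_url(
    "https://gateway.example.com", date(2026, 5, 1), date(2026, 5, 31), "day"
)
```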

### Health Check

```
GET /health
```

```json
{
  "status": "healthy",
  "version": "0.1.0",
  "providers": {
    "openai": "healthy",
    "anthropic": "healthy"
  }
}
```
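
A monitoring script might reduce this payload to the list of providers that need attention. A minimal sketch; `unhealthy_providers` is a hypothetical helper, and `"degraded"` is a made-up status used only to exercise it, since the only status value shown in this document is `"healthy"`.

```python
def unhealthy_providers(health: dict) -> list[str]:
    """Return the names of providers whose status is anything but "healthy"."""
    providers = health.get("providers", {})
    return sorted(name for name, status in providers.items() if status != "healthy")


sample = {
    "status": "healthy",
    "version": "0.1.0",
    "providers": {"openai": "healthy", "anthropic": "healthy"},
}
all_good = unhealthy_providers(sample)

# Hypothetical status value, for illustration only.
one_down = unhealthy_providers({"providers": {"openai": "healthy", "anthropic": "degraded"}})
```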

---

## Error Responses

All errors follow a consistent format:

```json
{
  "detail": {
    "error": {
      "type": "error_type",
      "message": "Human readable error message",
      "details": {}
    }
  }
}
```

### Common Error Types

| Status | Type | Description |
|--------|------|-------------|
| 401 | authentication_error | Invalid or missing API key |
| 402 | budget_exceeded_error | Budget limit reached |
| 403 | permission_error | API key disabled or expired |
| 429 | rate_limit_error | Rate limit exceeded |
| 502 | provider_error | Upstream provider error |
| 503 | service_unavailable | Provider unavailable |
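
Because the error object is nested under `detail.error`, clients usually unwrap it once and branch on `type` or the HTTP status. The sketch below shows one way to do that. `ParsedError`, `parse_error`, and the `"unknown_error"` fallback are hypothetical, and treating 429, 502, and 503 as retryable is an assumption of this example, not documented gateway behavior.

```python
from dataclasses import dataclass

# Assumption for this sketch: these statuses are treated as transient.
RETRYABLE_STATUSES = {429, 502, 503}


@dataclass(frozen=True)
class ParsedError:
    status: int
    type: str
    message: str
    details: dict
    retryable: bool


def parse_error(status: int, body: dict) -> ParsedError:
    """Unwrap the nested detail.error object, tolerating unexpected shapes."""
    detail = body.get("detail")
    error = detail.get("error", {}) if isinstance(detail, dict) else {}
    return ParsedError(
        status=status,
        type=error.get("type", "unknown_error"),  # hypothetical fallback value
        message=error.get("message", ""),
        details=error.get("details") or {},
        retryable=status in RETRYABLE_STATUSES,
    )


err = parse_error(
    429,
    {"detail": {"error": {"type": "rate_limit_error", "message": "Rate limit exceeded", "details": {}}}},
)
```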

---

## Rate Limiting

Rate limits are applied per API key. Response headers include:

```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1714521600
```
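
A client can read these headers to slow down before it hits the limit. The sketch below assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, which the sample value suggests but this document does not state. `rate_limit_state` is an illustrative helper, and it takes a plain dictionary, so header names must match exactly; most HTTP clients expose a case-insensitive mapping instead.

```python
def rate_limit_state(headers: dict[str, str], now: float) -> dict:
    """Read the three rate-limit headers; `now` is Unix time in seconds."""
    limit = int(headers["X-RateLimit-Limit"])
    remaining = int(headers["X-RateLimit-Remaining"])
    reset = int(headers["X-RateLimit-Reset"])  # assumed: Unix epoch seconds
    return {
        "limit": limit,
        "remaining": remaining,
        "seconds_until_reset": max(0, reset - now),
    }


state = rate_limit_state(
    {
        "X-RateLimit-Limit": "100",
        "X-RateLimit-Remaining": "95",
        "X-RateLimit-Reset": "1714521600",
    },
    now=1714521590,  # ten seconds before the reset in this example
)
```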

When a request is rate limited, the gateway returns HTTP 429 with a `rate_limit_error` body:

```json
{
  "detail": {
    "error": {
      "type": "rate_limit_error",
      "message": "Rate limit exceeded",
      "details": {
        "limit": 100,
        "remaining": 0,
        "reset_at": "2026-05-01T00:00:00Z"
      }
    }
  }
}
```
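
The `reset_at` field gives a client enough information to decide how long to wait before retrying. A minimal sketch, assuming `reset_at` is always a UTC timestamp in the form shown; `seconds_until_reset` is a hypothetical helper.

```python
from datetime import datetime, timezone


def seconds_until_reset(error_body: dict, now: datetime) -> float:
    """Return how long to wait before retrying, based on details.reset_at."""
    reset_at = error_body["detail"]["error"]["details"]["reset_at"]
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    reset = datetime.fromisoformat(reset_at.replace("Z", "+00:00"))
    return max(0.0, (reset - now).total_seconds())


body = {
    "detail": {
        "error": {
            "type": "rate_limit_error",
            "message": "Rate limit exceeded",
            "details": {"limit": 100, "remaining": 0, "reset_at": "2026-05-01T00:00:00Z"},
        }
    }
}

wait = seconds_until_reset(body, now=datetime(2026, 4, 30, 23, 59, 30, tzinfo=timezone.utc))
```

In a real client, `now` would be `datetime.now(timezone.utc)` and the result would feed a sleep or a retry scheduler.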