LLM Gateway API Documentation

Overview

LLM Gateway provides a unified API for interacting with multiple LLM providers. It supports three API formats:

  • OpenAI-compatible Chat Completions API (/v1/chat/completions)
  • Anthropic Messages API (/v1/messages)
  • OpenAI Responses API (/v1/responses)

Authentication

All API requests require authentication with a Virtual API Key. Send the key in one of two headers:

Authorization Header (Bearer Token)

curl -X POST https://gateway.example.com/v1/chat/completions \
  -H "Authorization: Bearer sk_your_virtual_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}'

X-API-Key Header

curl -X POST https://gateway.example.com/v1/chat/completions \
  -H "X-API-Key: sk_your_virtual_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}'
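The same two header styles can be produced from code. The following is a minimal Python sketch using only the standard library; `build_request` and its `use_x_api_key` flag are hypothetical names chosen for illustration, and the request is only constructed, not sent.

```python
import json
import urllib.request

GATEWAY_URL = "https://gateway.example.com"  # placeholder host from the curl examples


def build_request(path, payload, api_key, use_x_api_key=False):
    """Build (but do not send) an authenticated POST request to the gateway."""
    headers = {"Content-Type": "application/json"}
    if use_x_api_key:
        # Alternative header style
        headers["X-API-Key"] = api_key
    else:
        # Bearer token style
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        GATEWAY_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


payload = {"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}
req = build_request("/v1/chat/completions", payload, "sk_your_virtual_key")
# Pass `req` to urllib.request.urlopen(req) to actually send it.
```

Only one of the two headers needs to be present on a request.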

Chat Completions API

POST /v1/chat/completions

OpenAI-compatible chat completions endpoint.

Request Body:

{
  "model": "gpt-4",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you?"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model alias or provider:model format |
| messages | array | Yes | Array of message objects |
| temperature | number | No | Sampling temperature (0-2) |
| max_tokens | integer | No | Maximum tokens to generate |
| stream | boolean | No | Enable streaming response |
| tools | array | No | Tool definitions for function calling |
| tool_choice | string/object | No | Tool selection behavior |

Response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 15,
    "total_tokens": 35
  }
}
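A client usually needs only the assistant text, the finish reason, and the token count from this payload. A minimal Python sketch, assuming the non-streaming response shape shown above; `extract_chat_reply` is a hypothetical helper name.

```python
def extract_chat_reply(response):
    """Return (assistant_text, finish_reason, total_tokens) from a response dict."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return (
        choice["message"]["content"],
        choice["finish_reason"],
        usage.get("total_tokens"),
    )


# Trimmed copy of the example response above.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! I'm doing well, thank you for asking.",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 20, "completion_tokens": 15, "total_tokens": 35},
}
text, finish_reason, total_tokens = extract_chat_reply(sample)
```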

Anthropic Messages API

POST /v1/messages

Endpoint compatible with the Anthropic Messages API.

Request Body:

{
  "model": "claude-3-opus",
  "max_tokens": 1024,
  "messages": [
    {"role": "user", "content": "Hello, Claude!"}
  ],
  "system": "You are a helpful assistant."
}

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model alias or provider:model format |
| max_tokens | integer | Yes | Maximum tokens to generate |
| messages | array | Yes | Array of message objects |
| system | string | No | System prompt |
| temperature | number | No | Sampling temperature (0-1) |
| tools | array | No | Tool definitions |
| tool_choice | object | No | Tool selection behavior |

Response:

{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I help you today?"
    }
  ],
  "model": "claude-3-opus-20240229",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 15,
    "output_tokens": 10
  }
}
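Unlike the Chat Completions format, the reply text here is spread across a list of content blocks. A minimal Python sketch for collecting it, assuming the response shape shown above; `extract_message_text` is a hypothetical helper name, and non-text blocks are simply skipped.

```python
def extract_message_text(response):
    """Concatenate the text of all "text" content blocks in a Messages response."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


# Trimmed copy of the example response above.
sample = {
    "content": [{"type": "text", "text": "Hello! How can I help you today?"}],
    "stop_reason": "end_turn",
}
reply = extract_message_text(sample)
```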

OpenAI Responses API

POST /v1/responses

Endpoint compatible with the OpenAI Responses API (the newer request format).

Request Body:

{
  "model": "gpt-4",
  "input": "What is the capital of France?",
  "instructions": "Be concise and accurate."
}

Response:

{
  "id": "resp_abc123",
  "object": "response",
  "created": 1234567890,
  "model": "gpt-4",
  "output": "The capital of France is Paris.",
  "usage": {
    "input_tokens": 20,
    "output_tokens": 10,
    "total_tokens": 30
  }
}
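The three formats report token usage under different field names (`prompt_tokens`/`completion_tokens` for Chat Completions, `input_tokens`/`output_tokens` for the other two, and no `total_tokens` in the Messages example). A client that talks to more than one endpoint may want a common shape. The sketch below is one way to do that in Python; `normalize_usage` and the output field names are assumptions made for illustration.

```python
def normalize_usage(usage):
    """Map a usage object from any of the three formats to one common shape."""
    input_tokens = usage.get("prompt_tokens", usage.get("input_tokens", 0))
    output_tokens = usage.get("completion_tokens", usage.get("output_tokens", 0))
    # The Messages example carries no total, so derive it when missing.
    total_tokens = usage.get("total_tokens", input_tokens + output_tokens)
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": total_tokens,
    }


chat_usage = normalize_usage({"prompt_tokens": 20, "completion_tokens": 15, "total_tokens": 35})
messages_usage = normalize_usage({"input_tokens": 15, "output_tokens": 10})
responses_usage = normalize_usage({"input_tokens": 20, "output_tokens": 10, "total_tokens": 30})
```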

Admin API

Admin APIs are used to manage providers, projects, API keys, and model aliases.

Providers

List Providers

GET /admin/providers

Query parameters:

  • page (default: 1)
  • page_size (default: 20)
  • enabled (optional, filter by status)
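As an illustration of the query parameters above, the following Python sketch builds the list URL. `providers_list_url` is a hypothetical helper, and encoding `enabled` as the strings `true`/`false` is an assumption, since the accepted values are not stated here.

```python
from urllib.parse import urlencode


def providers_list_url(base_url, page=1, page_size=20, enabled=None):
    """Build the GET /admin/providers URL with pagination and optional filter."""
    params = {"page": page, "page_size": page_size}
    if enabled is not None:
        # Assumption: the filter is passed as a lowercase boolean string.
        params["enabled"] = "true" if enabled else "false"
    return f"{base_url}/admin/providers?{urlencode(params)}"


url = providers_list_url("https://gateway.example.com", page=2, enabled=True)
```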

Create Provider

POST /admin/providers
{
  "name": "openai",
  "api_base": "https://api.openai.com/v1",
  "api_key": "sk-xxx",
  "api_version": null,
  "rpm_limit": 500,
  "tpm_limit": 150000,
  "enabled": true
}

Update Provider

PUT /admin/providers/{provider_id}

Delete Provider

DELETE /admin/providers/{provider_id}

Projects

List Projects

GET /admin/projects

Create Project

POST /admin/projects
{
  "name": "My Project",
  "description": "Project description",
  "budget_limit": 100.00,
  "budget_period": "monthly"
}

API Keys

List API Keys

GET /admin/keys

Create API Key

POST /admin/keys
{
  "name": "Production Key",
  "project_id": "project-uuid",
  "rpm_limit": 100,
  "tpm_limit": 50000,
  "budget_limit": 50.00,
  "allowed_models": ["gpt-4", "claude-3-opus"]
}

The response includes the full key, which is shown only once, so store it securely:

{
  "id": "key-uuid",
  "name": "Production Key",
  "key": "sk_prod_abc123...",
  "key_prefix": "sk_prod_abc...",
  "enabled": true,
  "created_at": "2026-05-01T00:00:00Z"
}

Delete API Key

DELETE /admin/keys/{key_id}

Model Aliases

List Model Aliases

GET /admin/models/aliases

Create Model Alias

POST /admin/models/aliases
{
  "alias": "smart-model",
  "provider": "openai",
  "model": "gpt-4-turbo",
  "enabled": true,
  "routing_type": "simple",
  "input_price_per_1k": 0.01,
  "output_price_per_1k": 0.03
}
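The two price fields are per 1,000 tokens. How the gateway accounts for cost internally is not described here; the sketch below simply assumes cost is tokens divided by 1,000, multiplied by the matching price. `estimate_cost` is a hypothetical helper name.

```python
def estimate_cost(input_tokens, output_tokens, input_price_per_1k, output_price_per_1k):
    """Estimate request cost from token counts and per-1k prices."""
    input_cost = (input_tokens / 1000) * input_price_per_1k
    output_cost = (output_tokens / 1000) * output_price_per_1k
    return input_cost + output_cost


# 20 input tokens and 15 output tokens at the prices from the alias above:
# 20/1000 * 0.01 + 15/1000 * 0.03 = 0.0002 + 0.00045 = 0.00065
cost = estimate_cost(20, 15, input_price_per_1k=0.01, output_price_per_1k=0.03)
```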

Routing Types:

  • simple - Direct mapping to a single provider/model
  • load_balance - Distribute across multiple providers
  • fallback - Try providers in order until success

Load Balance Config:

{
  "routing_type": "load_balance",
  "routing_config": {
    "targets": [
      {"provider": "openai", "model": "gpt-4", "weight": 0.7},
      {"provider": "azure", "model": "gpt-4", "weight": 0.3}
    ]
  }
}
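To make the `weight` field concrete: with the weights above, roughly 70% of requests would go to `openai` and 30% to `azure`. The Python sketch below shows weighted random selection over the `targets` list. It illustrates the general technique and is not necessarily the gateway's own algorithm; `pick_target` is a hypothetical name.

```python
import random


def pick_target(targets, rng=random):
    """Pick one target at random, with probability proportional to its weight."""
    weights = [target["weight"] for target in targets]
    return rng.choices(targets, weights=weights, k=1)[0]


targets = [
    {"provider": "openai", "model": "gpt-4", "weight": 0.7},
    {"provider": "azure", "model": "gpt-4", "weight": 0.3},
]
chosen = pick_target(targets)
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible in tests.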

Fallback Config:

{
  "routing_type": "fallback",
  "routing_config": {
    "chain": [
      {"provider": "openai", "model": "gpt-4"},
      {"provider": "anthropic", "model": "claude-3-opus"}
    ]
  }
}
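The "try providers in order until success" behavior can be sketched as a loop over `chain`. This is a minimal Python illustration under stated assumptions: `call_with_fallback` and the `send` callback are hypothetical, and treating any exception as a reason to move on is a simplification, since which failures actually trigger a fallback is not specified here.

```python
def call_with_fallback(chain, send):
    """Try each target in order and return the first successful result."""
    errors = []
    for target in chain:
        try:
            return send(target["provider"], target["model"])
        except Exception as exc:  # simplification: any failure moves to the next target
            errors.append((target["provider"], str(exc)))
    raise RuntimeError(f"all {len(chain)} targets failed: {errors}")


chain = [
    {"provider": "openai", "model": "gpt-4"},
    {"provider": "anthropic", "model": "claude-3-opus"},
]


def fake_send(provider, model):
    """Stand-in for a real provider call; the first provider always fails."""
    if provider == "openai":
        raise ConnectionError("upstream unavailable")
    return f"{provider}:{model} ok"


result = call_with_fallback(chain, fake_send)
```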

Usage Statistics

Get Usage Stats

GET /admin/usage/stats

Query parameters:

  • start_date (ISO date)
  • end_date (ISO date)
  • group_by (hour, day, provider, model, key)

Health Check

GET /health
{
  "status": "healthy",
  "version": "0.1.0",
  "providers": {
    "openai": "healthy",
    "anthropic": "healthy"
  }
}
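A monitoring script might flag the gateway when any provider reports something other than `healthy`. A minimal Python sketch, assuming the response shape above; `unhealthy_providers` is a hypothetical helper, and the example status string `degraded` is invented for illustration because only `healthy` appears in this document.

```python
def unhealthy_providers(health):
    """Return the sorted names of providers whose status is not "healthy"."""
    providers = health.get("providers", {})
    return sorted(name for name, status in providers.items() if status != "healthy")


health = {
    "status": "healthy",
    "version": "0.1.0",
    "providers": {"openai": "healthy", "anthropic": "healthy"},
}
problems = unhealthy_providers(health)
```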

Error Responses

All errors follow a consistent format:

{
  "detail": {
    "error": {
      "type": "error_type",
      "message": "Human readable error message",
      "details": {}
    }
  }
}

Common Error Types

| Status | Type | Description |
|---|---|---|
| 401 | authentication_error | Invalid or missing API key |
| 403 | permission_error | API key disabled or expired |
| 402 | budget_exceeded_error | Budget limit reached |
| 429 | rate_limit_error | Rate limit exceeded |
| 503 | service_unavailable | Provider unavailable |
| 502 | provider_error | Upstream provider error |
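Because every error shares the same envelope, a client can decode it in one place. The Python sketch below turns a status code and parsed JSON body into an exception. `GatewayError`, `parse_error`, and `RETRYABLE_STATUSES` are hypothetical names, and the choice of which statuses to retry is an assumption, not something this document specifies.

```python
class GatewayError(Exception):
    """Error decoded from the gateway's error envelope."""

    def __init__(self, status, error_type, message, details=None):
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.message = message
        self.details = details or {}


# Assumption: rate limits and upstream failures are worth retrying.
RETRYABLE_STATUSES = {429, 502, 503}


def parse_error(status, body):
    """Build a GatewayError from an HTTP status and a parsed JSON body."""
    detail = body.get("detail")
    err = detail.get("error", {}) if isinstance(detail, dict) else {}
    return GatewayError(
        status,
        err.get("type", "unknown_error"),
        err.get("message", ""),
        err.get("details"),
    )


body = {"detail": {"error": {"type": "authentication_error",
                             "message": "Invalid or missing API key",
                             "details": {}}}}
error = parse_error(401, body)
```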

Rate Limiting

Rate limits are applied per API key. Response headers include:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1714521600

When a request is rate limited, the response body has this form:

{
  "detail": {
    "error": {
      "type": "rate_limit_error",
      "message": "Rate limit exceeded",
      "details": {
        "limit": 100,
        "remaining": 0,
        "reset_at": "2026-05-01T00:00:00Z"
      }
    }
  }
}
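A client can use the reset header to decide how long to wait before retrying. The Python sketch below assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, which is consistent with the example value (1714521600 decodes to 2024-05-01T00:00:00Z) but is not stated explicitly. `seconds_until_reset` is a hypothetical helper name.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how many seconds remain until the rate limit window resets."""
    current = time.time() if now is None else now
    # Assumption: the header holds Unix epoch seconds.
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    return max(0.0, reset_at - current)


headers = {
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1714521600",
}
wait = seconds_until_reset(headers, now=1714521570)
# A caller could then time.sleep(wait) before retrying the request.
```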