root 315326d0a2 feat(middleware): add auth, logging, and audit middleware

- Add authentication middleware with API key validation
- Add request logging middleware for observability
- Add audit logging middleware for admin operations
- Refactor API endpoints to use centralized auth middleware
- Add comprehensive unit tests for all middleware
- Add API documentation and deployment guide
- Update README with health endpoints and documentation links
- Fix test data isolation in router tests

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-03 03:23:37 +08:00

LLM Gateway API Documentation

Overview

LLM Gateway provides a unified API for interacting with multiple LLM providers. It supports three API formats:

OpenAI-compatible Chat Completions API (/v1/chat/completions)
Anthropic Messages API (/v1/messages)
OpenAI Responses API (/v1/responses)

Authentication

Every API request must be authenticated with a Virtual API Key. Send the key in one of two headers:

Bearer Token (Recommended)

curl -X POST https://gateway.example.com/v1/chat/completions \
  -H "Authorization: Bearer sk_your_virtual_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}'

X-API-Key Header

curl -X POST https://gateway.example.com/v1/chat/completions \
  -H "X-API-Key: sk_your_virtual_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}'
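The two header styles above can also be produced from a client script. The following is a minimal Python sketch, not part of the gateway itself: the helper name `build_chat_request` is hypothetical, and the host is the placeholder from the curl examples. It only builds the request object and does not send it.

```python
import json
import urllib.request

# Placeholder host taken from the curl examples above.
GATEWAY_URL = "https://gateway.example.com"


def build_chat_request(api_key: str, payload: dict, use_bearer: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST request that carries the virtual key.

    use_bearer=True  -> Authorization: Bearer <key>   (recommended style)
    use_bearer=False -> X-API-Key: <key>
    """
    headers = {"Content-Type": "application/json"}
    if use_bearer:
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        headers["X-API-Key"] = api_key
    return urllib.request.Request(
        f"{GATEWAY_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. HTTP header names are case-insensitive, so the way `urllib` normalizes their capitalization internally does not affect the request.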

Chat Completions API

POST /v1/chat/completions

OpenAI-compatible chat completions endpoint.

Request Body:

{
  "model": "gpt-4",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you?"}
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}

Parameters:

Parameter	Type	Required	Description
model	string	Yes	Model alias or provider:model format
messages	array	Yes	Array of message objects
temperature	number	No	Sampling temperature (0-2)
max_tokens	integer	No	Maximum tokens to generate
stream	boolean	No	Enable streaming response
tools	array	No	Tool definitions for function calling
tool_choice	string/object	No	Tool selection behavior

Response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 15,
    "total_tokens": 35
  }
}
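As an illustration of reading the response shape above, here is a small Python sketch. The helper names `first_message_text` and `usage_is_consistent` are hypothetical, and the code assumes the response has already been decoded from JSON into a dict.

```python
def first_message_text(response: dict) -> str:
    """Return the assistant text of the first choice, or "" if it has no content."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"].get("content") or ""


def usage_is_consistent(response: dict) -> bool:
    """Check that prompt_tokens + completion_tokens equals total_tokens."""
    usage = response["usage"]
    return usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
```

For the sample response above, `usage_is_consistent` holds because 20 + 15 = 35.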

Anthropic Messages API

POST /v1/messages

Endpoint compatible with the Anthropic Messages API.

Request Body:

{
  "model": "claude-3-opus",
  "max_tokens": 1024,
  "messages": [
    {"role": "user", "content": "Hello, Claude!"}
  ],
  "system": "You are a helpful assistant."
}

Parameters:

Parameter	Type	Required	Description
model	string	Yes	Model alias or provider:model format
max_tokens	integer	Yes	Maximum tokens to generate
messages	array	Yes	Array of message objects
system	string	No	System prompt
temperature	number	No	Sampling temperature (0-1)
tools	array	No	Tool definitions
tool_choice	object	No	Tool selection behavior

Response:

{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I help you today?"
    }
  ],
  "model": "claude-3-opus-20240229",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 15,
    "output_tokens": 10
  }
}
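Unlike the Chat Completions response, the content here is a list of typed blocks. A minimal Python sketch for collecting the text follows; the helper name `message_text` is hypothetical, and skipping non-text blocks is an assumption about what a caller would want.

```python
def message_text(response: dict) -> str:
    """Concatenate the text of every block whose type is "text"."""
    return "".join(
        block.get("text", "")
        for block in response.get("content", [])
        if block.get("type") == "text"
    )
```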

OpenAI Responses API

POST /v1/responses

Endpoint compatible with the OpenAI Responses API (new format).

Request Body:

{
  "model": "gpt-4",
  "input": "What is the capital of France?",
  "instructions": "Be concise and accurate."
}

Response:

{
  "id": "resp_abc123",
  "object": "response",
  "created": 1234567890,
  "model": "gpt-4",
  "output": "The capital of France is Paris.",
  "usage": {
    "input_tokens": 20,
    "output_tokens": 10,
    "total_tokens": 30
  }
}

Admin API

The Admin API manages providers, projects, API keys, and model aliases.

Providers

List Providers

GET /admin/providers

Query parameters:

page (default: 1)
page_size (default: 20)
enabled (optional, filter by status)
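The query parameters above can be assembled as in the following Python sketch. The helper name `providers_url` is hypothetical, and encoding `enabled` as the strings `true`/`false` is an assumption; the document does not state the accepted values.

```python
from typing import Optional
from urllib.parse import urlencode


def providers_url(base_url: str, page: int = 1, page_size: int = 20,
                  enabled: Optional[bool] = None) -> str:
    """Build the List Providers URL with the documented defaults."""
    params = {"page": page, "page_size": page_size}
    if enabled is not None:
        # Assumed encoding of the boolean filter.
        params["enabled"] = "true" if enabled else "false"
    return f"{base_url}/admin/providers?{urlencode(params)}"
```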

Create Provider

POST /admin/providers

{
  "name": "openai",
  "api_base": "https://api.openai.com/v1",
  "api_key": "sk-xxx",
  "api_version": null,
  "rpm_limit": 500,
  "tpm_limit": 150000,
  "enabled": true
}

Update Provider

PUT /admin/providers/{provider_id}

Delete Provider

DELETE /admin/providers/{provider_id}

Projects

List Projects

GET /admin/projects

Create Project

POST /admin/projects

{
  "name": "My Project",
  "description": "Project description",
  "budget_limit": 100.00,
  "budget_period": "monthly"
}

API Keys

List API Keys

GET /admin/keys

Create API Key

POST /admin/keys

{
  "name": "Production Key",
  "project_id": "project-uuid",
  "rpm_limit": 100,
  "tpm_limit": 50000,
  "budget_limit": 50.00,
  "allowed_models": ["gpt-4", "claude-3-opus"]
}

The response includes the full key, which is shown only once:

{
  "id": "key-uuid",
  "name": "Production Key",
  "key": "sk_prod_abc123...",
  "key_prefix": "sk_prod_abc...",
  "enabled": true,
  "created_at": "2026-05-01T00:00:00Z"
}

Delete API Key

DELETE /admin/keys/{key_id}

Model Aliases

List Model Aliases

GET /admin/models/aliases

Create Model Alias

POST /admin/models/aliases

{
  "alias": "smart-model",
  "provider": "openai",
  "model": "gpt-4-turbo",
  "enabled": true,
  "routing_type": "simple",
  "input_price_per_1k": 0.01,
  "output_price_per_1k": 0.03
}
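The two price fields suggest a per-request cost calculation. The Python sketch below assumes the prices are charged per 1,000 tokens and that cost scales linearly with token count; the function name `estimate_cost` is hypothetical, and the gateway's actual accounting may differ.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Estimate request cost from token counts and per-1k prices (assumed linear)."""
    return (input_tokens / 1000) * input_price_per_1k \
        + (output_tokens / 1000) * output_price_per_1k
```

With the prices from the alias above (0.01 and 0.03), a request with 2,000 input tokens and 500 output tokens would cost about 0.02 + 0.015 = 0.035 under these assumptions.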

Routing Types:

simple - Direct mapping to a single provider/model
load_balance - Distribute across multiple providers
fallback - Try providers in order until success

Load Balance Config:

{
  "routing_type": "load_balance",
  "routing_config": {
    "targets": [
      {"provider": "openai", "model": "gpt-4", "weight": 0.7},
      {"provider": "azure", "model": "gpt-4", "weight": 0.3}
    ]
  }
}
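One way to read the `weight` values is as relative selection probabilities. The following Python sketch shows weighted random selection under that assumption; `pick_target` is a hypothetical helper, and the gateway's actual balancing strategy is not specified here.

```python
import random
from typing import Optional


def pick_target(targets: list, rng: Optional[random.Random] = None) -> dict:
    """Pick one target at random, in proportion to its "weight" field."""
    chooser = rng if rng is not None else random
    weights = [target["weight"] for target in targets]
    return chooser.choices(targets, weights=weights, k=1)[0]
```

With the config above, roughly 70% of calls would go to the `openai` target and 30% to `azure` over many requests.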

Fallback Config:

{
  "routing_type": "fallback",
  "routing_config": {
    "chain": [
      {"provider": "openai", "model": "gpt-4"},
      {"provider": "anthropic", "model": "claude-3-opus"}
    ]
  }
}
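The fallback behaviour ("try providers in order until success") can be sketched in Python as below. `call_with_fallback` and the `send` callable are hypothetical; treating any raised exception as a failure is an assumption, since the document does not say which errors trigger a fallback.

```python
from typing import Callable


def call_with_fallback(chain: list, send: Callable[[dict], str]) -> str:
    """Try each target in order and return the first successful result."""
    errors = []
    for target in chain:
        try:
            return send(target)
        except Exception as exc:  # assumption: any exception triggers fallback
            errors.append(f"{target['provider']}:{target['model']}: {exc}")
    raise RuntimeError("all targets failed: " + "; ".join(errors))
```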

Usage Statistics

Get Usage Stats

GET /admin/usage/stats

Query parameters:

start_date (ISO date)
end_date (ISO date)
group_by (hour, day, provider, model, key)

Health Check

GET /health

{
  "status": "healthy",
  "version": "0.1.0",
  "providers": {
    "openai": "healthy",
    "anthropic": "healthy"
  }
}
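A monitoring script might reduce this payload to the list of providers that need attention. The Python sketch below assumes that any status other than "healthy" counts as unhealthy; `unhealthy_providers` is a hypothetical helper.

```python
def unhealthy_providers(health: dict) -> list:
    """Return the sorted names of providers whose status is not "healthy"."""
    return sorted(
        name
        for name, status in health.get("providers", {}).items()
        if status != "healthy"
    )
```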

Error Responses

All errors follow a consistent format:

{
  "detail": {
    "error": {
      "type": "error_type",
      "message": "Human readable error message",
      "details": {}
    }
  }
}

Common Error Types

Status	Type	Description
401	authentication_error	Invalid or missing API key
403	permission_error	API key disabled or expired
402	budget_exceeded_error	Budget limit reached
429	rate_limit_error	Rate limit exceeded
503	service_unavailable	Provider unavailable
502	provider_error	Upstream provider error
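A client can unwrap the error envelope and decide whether to retry, as in the Python sketch below. The `GatewayError` class and `parse_error` function are hypothetical, and treating 429, 502, and 503 as retryable is an assumption rather than documented gateway behaviour.

```python
from dataclasses import dataclass, field

# Assumption: throttling and upstream failures are worth retrying.
RETRYABLE_STATUSES = frozenset({429, 502, 503})


@dataclass
class GatewayError:
    status: int
    type: str
    message: str
    details: dict = field(default_factory=dict)

    @property
    def retryable(self) -> bool:
        return self.status in RETRYABLE_STATUSES


def parse_error(status: int, body: dict) -> GatewayError:
    """Unwrap {"detail": {"error": {...}}} into a GatewayError."""
    detail = body.get("detail")
    error = detail.get("error", {}) if isinstance(detail, dict) else {}
    return GatewayError(
        status=status,
        type=error.get("type", "unknown_error"),
        message=error.get("message", ""),
        details=error.get("details") or {},
    )
```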

Rate Limiting

Rate limits are applied per API key. Response headers include:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1714521600

When a request is rate limited, the gateway returns status 429 with the following body:

{
  "detail": {
    "error": {
      "type": "rate_limit_error",
      "message": "Rate limit exceeded",
      "details": {
        "limit": 100,
        "remaining": 0,
        "reset_at": "2026-05-01T00:00:00Z"
      }
    }
  }
}
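A client can use the reset header to decide how long to wait before retrying. The Python sketch below assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, which the sample value 1714521600 suggests but the document does not state; `seconds_until_reset` is a hypothetical helper.

```python
import time
from typing import Optional


def seconds_until_reset(headers: dict, now: Optional[float] = None) -> float:
    """Return how many seconds remain until the rate-limit window resets.

    Assumes X-RateLimit-Reset holds Unix epoch seconds. The lookup is
    case-sensitive, so pass headers with the capitalization shown above.
    """
    reset_at = float(headers["X-RateLimit-Reset"])
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)
```

A retry loop could sleep for the returned number of seconds after a 429 response and then resend the request.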
