first commit
599
docs/features/governance/budget-and-limits.mdx
Normal file
@@ -0,0 +1,599 @@
---
title: "Budget and Limits"
description: "Enterprise-grade budget management and cost control with hierarchical budget allocation through virtual keys, teams, and customers."
icon: "money-bills"
---

## Overview

Budgeting and rate limiting are core features of Bifrost's governance system and are managed through [Virtual Keys](./virtual-keys).

Bifrost's budget management system provides cost control and financial governance for enterprise AI deployments. It uses a **hierarchical budget structure** that enables granular cost management, usage tracking, and financial oversight across your organization.

**Core Hierarchy:**
```
Customer (has independent budget)
↓ (one-to-many)
Team (has independent budget)
↓ (one-to-many)
Virtual Key (has independent budget + rate limits)
↓ (one-to-many)
Provider Config (has independent budget + rate limits)

OR

Customer (has independent budget)
↓ (direct attachment)
Virtual Key (has independent budget + rate limits)
↓ (one-to-many)
Provider Config (has independent budget + rate limits)

OR

Virtual Key (standalone - has independent budget + rate limits)
↓ (one-to-many)
Provider Config (has independent budget + rate limits)
```
**Key Capabilities:**
- **Virtual Keys** - Primary access control via the `x-bf-vk` header (each key attaches to either a team or a customer, never both)
- **Budget Management** - Independent budget limits at each hierarchy level, all of which are checked on every request
- **Rate Limiting** - Request- and token-based throttling at both the VK and provider config levels
- **Provider-Level Governance** - Granular budgets and rate limits per AI provider within a virtual key
- **Model/Provider Filtering** - Granular access control per virtual key
- **Usage Tracking** - Real-time monitoring and audit trails
- **Audit Headers** - Optional team and customer identification

---

## Budget Management

### Cost Calculation

Bifrost automatically calculates costs based on:
- **Provider Pricing** - Real-time model pricing data
- **Token Usage** - Input + output tokens from API responses
- **Request Type** - Different pricing for chat, text, embedding, speech, and transcription requests
- **Cache Status** - Reduced costs for cached responses
- **Batch Operations** - Volume discounts for batch requests

All cost calculation details are covered in [Architecture > Framework > Model Catalog](../../architecture/framework/model-catalog).
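As a rough illustration of how token usage turns into the amount charged to a budget, here is a simplified sketch. The per-million-token prices are placeholders, not Model Catalog values, and the sketch ignores request type, cache status, and batch discounts, all of which the real calculation accounts for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in USD (placeholders, not catalog data)."""

    input_per_million: float
    output_per_million: float


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token usage.

    Simplified: ignores request type, cache status and batch discounts.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


if __name__ == "__main__":
    placeholder = Pricing(input_per_million=2.50, output_per_million=10.00)
    print(request_cost(placeholder, input_tokens=1_000, output_tokens=500))
```

With these placeholder prices, 1,000 input tokens and 500 output tokens come to $0.0075.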
### Budget Checking Flow

When a request is made with a virtual key, Bifrost checks **every applicable budget in the hierarchy independently**. None of them may be exhausted for the request to proceed.

**Checking Sequence:**

**For VK → Team → Customer:**
```
1. ✓ Provider Config Budget (if provider config has budget)
2. ✓ VK Budget (if VK has budget)
3. ✓ Team Budget (if VK's team has budget)
4. ✓ Customer Budget (if team's customer has budget)
```

**For VK → Customer (direct):**
```
1. ✓ Provider Config Budget (if provider config has budget)
2. ✓ VK Budget (if VK has budget)
3. ✓ Customer Budget (if VK's customer has budget)
```

**For Standalone VK:**
```
1. ✓ Provider Config Budget (if provider config has budget)
2. ✓ VK Budget (if VK has budget)
```
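The three sequences differ only in which parent budgets follow the provider config and VK budgets. Here is a minimal sketch of that resolution. It uses hypothetical in-memory classes that hold budget ids; `Customer`, `Team`, `VirtualKey`, and `ProviderConfig` are illustrative names, not Bifrost's types.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    name: str
    budget: Optional[str] = None  # budget id, or None when no budget is set


@dataclass
class Team:
    name: str
    customer: Optional[Customer] = None
    budget: Optional[str] = None


@dataclass
class VirtualKey:
    name: str
    team: Optional[Team] = None
    customer: Optional[Customer] = None  # direct attachment, exclusive with team
    budget: Optional[str] = None


@dataclass
class ProviderConfig:
    provider: str
    budget: Optional[str] = None


def applicable_budgets(vk: VirtualKey, pc: ProviderConfig) -> list[str]:
    """Budget ids to check, in order: provider config, VK, team, customer."""
    if vk.team is not None and vk.customer is not None:
        raise ValueError("a virtual key attaches to a team OR a customer, not both")
    chain = [pc.budget, vk.budget]
    if vk.team is not None:
        chain.append(vk.team.budget)
        if vk.team.customer is not None:
            chain.append(vk.team.customer.budget)
    elif vk.customer is not None:
        chain.append(vk.customer.budget)
    # Levels without a budget are skipped ("if ... has budget" above).
    return [budget for budget in chain if budget is not None]
```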
**Important Notes:**
- **All applicable budgets must pass** - any single budget failure blocks the request
- **Budgets are independent** - each tracks its own usage and limits
- **Costs are deducted from all applicable budgets** - the same cost is applied at each level
- **Rate limits are checked at the provider config and VK levels** - teams and customers have no rate limits
- **Provider selection** - providers that exceed their budget or rate limits are excluded from [routing](./routing)

**Example:**
```
- Provider config budget: $4/$5 used ✓
- VK budget: $9/$10 used ✓
- Team budget: $15/$20 used ✓
- Customer budget: $45/$50 used ✓
- Result: Allowed (no budget is exhausted)

- After request:
  - Request cost: $2
  - Updated usage: Provider=$6/$5, VK=$11/$10, Team=$17/$20, Customer=$47/$50
  - The next request is blocked (both the provider config and VK budgets are exceeded).
```

Budgets are checked against the usage recorded before a request, because its cost is calculated from the token usage in the response. As the example shows, the request that crosses a limit still completes, and only later requests are blocked.
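The example can be replayed with a small sketch. It assumes that a budget blocks once `current_usage` reaches `max_limit` and that the cost is recorded only after the response, which is why the $2 request goes through. `Budget`, `blocking_budgets`, and `record_cost` are illustrative names, not Bifrost's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Budget:
    name: str
    max_limit: float  # USD
    current_usage: float  # USD spent in the current period

    def exhausted(self) -> bool:
        # Assumption: a budget blocks once usage has reached its limit.
        return self.current_usage >= self.max_limit


def blocking_budgets(budgets: list[Budget]) -> list[str]:
    """Names of budgets that block the next request; an empty list means allowed."""
    return [b.name for b in budgets if b.exhausted()]


def record_cost(budgets: list[Budget], cost: float) -> None:
    """After the response, add the same cost to every applicable budget."""
    for b in budgets:
        b.current_usage += cost


if __name__ == "__main__":
    chain = [
        Budget("provider", max_limit=5, current_usage=4),
        Budget("vk", max_limit=10, current_usage=9),
        Budget("team", max_limit=20, current_usage=15),
        Budget("customer", max_limit=50, current_usage=45),
    ]
    print(blocking_budgets(chain))  # [] -> request allowed
    record_cost(chain, 2)
    print(blocking_budgets(chain))  # ['provider', 'vk'] -> next request blocked
```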
## Rate Limiting

Rate limits protect your system from abuse and manage traffic by setting thresholds on request frequency and token usage over a specific time window. Rate limits can be configured at **both the Virtual Key level and Provider Config level** for granular control.

Bifrost supports two types of rate limits that work in parallel:
- **Request Limits**: Control the maximum number of API calls that can be made within a set duration (e.g., 100 requests per minute).
- **Token Limits**: Control the maximum number of tokens (prompt + completion) that can be processed within a set duration (e.g., 50,000 tokens per hour).

### Rate Limit Hierarchy

Rate limits are checked in hierarchical order:
```
1. ✓ Provider Config Rate Limits (if provider config has rate limits)
2. ✓ Virtual Key Rate Limits (if VK has rate limits)
```

For a request to be allowed, it must pass both the request limit and token limit checks at **all applicable levels**. If a provider config exceeds its rate limits, that provider is excluded from routing, but other providers within the same virtual key remain available.
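Each level can be pictured as a pair of counters that start over when their window elapses. This sketch assumes fixed windows, which the `*_last_reset` fields in the API response suggest. It also checks counters before the request, so a request's own tokens only affect later checks. Durations are plain seconds here, and the class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class RateLimit:
    """Fixed-window request and token counters for one level (VK or provider config).

    Windows are in seconds here; a max limit of None means that kind is unlimited.
    """

    request_max_limit: Optional[int] = None
    request_window_s: float = 60.0
    token_max_limit: Optional[int] = None
    token_window_s: float = 3600.0
    request_usage: int = 0
    token_usage: int = 0
    request_window_start: float = 0.0
    token_window_start: float = 0.0

    def _reset_expired(self, now: float) -> None:
        # A counter starts over once its reset duration has elapsed.
        if now - self.request_window_start >= self.request_window_s:
            self.request_usage, self.request_window_start = 0, now
        if now - self.token_window_start >= self.token_window_s:
            self.token_usage, self.token_window_start = 0, now

    def allows(self, now: float) -> bool:
        self._reset_expired(now)
        if self.request_max_limit is not None and self.request_usage >= self.request_max_limit:
            return False
        if self.token_max_limit is not None and self.token_usage >= self.token_max_limit:
            return False
        return True

    def record(self, tokens: int) -> None:
        self.request_usage += 1
        self.token_usage += tokens


def admit(levels: list[RateLimit], now: float) -> bool:
    """Allowed only if every level (provider config first, then VK) allows it."""
    return all(level.allows(now) for level in levels)


def record_all(levels: list[RateLimit], tokens: int) -> None:
    """Count a completed request and its tokens at every level."""
    for level in levels:
        level.record(tokens)
```

Usage: with `levels = [provider_limit, vk_limit]`, call `admit(levels, now)` before forwarding a request. Call `record_all(levels, tokens)` once its token usage is known.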
### Provider-Level Rate Limiting

Provider configs within a virtual key can have independent rate limits, enabling:
- **Per-Provider Throttling**: Different rate limits for OpenAI vs Anthropic
- **Provider Isolation**: Rate limit violations on one provider don't affect others
- **Granular Control**: Fine-tune limits based on provider capabilities and costs

## Reset Durations

Budgets and rate limits support flexible reset durations:

**Format Examples:**
- `1m` - 1 minute
- `5m` - 5 minutes
- `1h` - 1 hour
- `1d` - 1 day
- `1w` - 1 week
- `1M` - 1 month
- `1Y` - 1 year

Suffixes are case-sensitive: `m` means minutes and `M` means months.

**Common Patterns:**
- **Rate Limits**: `1m`, `1h`, `1d` for request throttling
- **Budgets**: `1d`, `1w`, `1M`, `1Y` for cost control
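A reset duration is a positive count followed by one of the suffixes above. The sketch below parses that format and computes when a rolling window ends. The strict regular expression and the month handling are assumptions for illustration. For example, the sketch clamps Jan 31 + `1M` to the last day of February, and Bifrost may handle month ends differently.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta, timezone

# "m" (minute) and "M" (month) differ only by case, so matching is case-sensitive.
_DURATION = re.compile(r"([1-9][0-9]*)([mhdwMY])")
_FIXED_UNITS = {
    "m": timedelta(minutes=1),
    "h": timedelta(hours=1),
    "d": timedelta(days=1),
    "w": timedelta(weeks=1),
}


def parse_duration(text: str) -> tuple[int, str]:
    """Split a reset duration such as "5m" or "1M" into (count, unit)."""
    match = _DURATION.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid reset duration: {text!r}")
    return int(match.group(1)), match.group(2)


def _add_months(moment: datetime, months: int) -> datetime:
    # Clamp the day, so Jan 31 + 1 month becomes the last day of February.
    index = moment.month - 1 + months
    year, month = moment.year + index // 12, index % 12 + 1
    first_of_next = datetime(year + month // 12, month % 12 + 1, 1)
    last_day = (first_of_next - timedelta(days=1)).day
    return moment.replace(year=year, month=month, day=min(moment.day, last_day))


def next_rolling_reset(last_reset: datetime, duration: str) -> datetime:
    """End of a rolling (non-calendar-aligned) window that began at last_reset."""
    count, unit = parse_duration(duration)
    if unit in _FIXED_UNITS:
        return last_reset + count * _FIXED_UNITS[unit]
    return _add_months(last_reset, count * (12 if unit == "Y" else 1))


if __name__ == "__main__":
    start = datetime(2024, 1, 31, 12, 0, tzinfo=timezone.utc)
    for d in ("30m", "1d", "1M", "1Y"):
        print(d, next_rolling_reset(start, d).isoformat())
```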
### Calendar-aligned budgets

By default, a budget **rolls**: usage resets once `reset_duration` has elapsed since `last_reset`. With **`calendar_aligned`: `true`**, the budget instead resets at the **start of each calendar period in UTC**. Every budget with that configuration resets at the same instant, regardless of when it was created.

**Supported `reset_duration` suffixes:** only day (`d`), week (`w`), month (`M`), and year (`Y`). Examples: `1d` → midnight UTC each day; `1w` → Monday 00:00 UTC each week; `1M` → the first day of each month; `1Y` → January 1 each year. Sub-day durations (for example `1h`, `30m`) **cannot** use calendar alignment; the API rejects these combinations.

Calendar alignment is available for **customer**, **team**, **virtual key**, and **provider config** budgets. You can set it when creating a budget (`calendar_aligned` on create) or toggle it on update (`calendar_aligned` on the budget in `PUT` requests). Turning calendar alignment **on** for an existing budget resets **current usage to zero** and snaps **`last_reset`** to the start of the current period.
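The boundaries listed above can be computed from the current time in UTC. The following sketch assumes a count of 1 (`1d`, `1w`, `1M`, `1Y`). `period_start` is a hypothetical helper, not a Bifrost API.

```python
from datetime import datetime, timedelta, timezone

CALENDAR_UNITS = ("d", "w", "M", "Y")


def period_start(now: datetime, unit: str) -> datetime:
    """Start of the current UTC calendar period for a calendar-aligned budget.

    Assumes a count of 1 (e.g. "1d", "1w", "1M", "1Y"), as in the examples above.
    """
    if unit not in CALENDAR_UNITS:
        raise ValueError(f"calendar alignment needs d/w/M/Y, got {unit!r}")
    if now.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "d":
        return midnight  # 00:00 UTC today
    if unit == "w":
        return midnight - timedelta(days=midnight.weekday())  # back to Monday
    if unit == "M":
        return midnight.replace(day=1)  # first day of the month
    return midnight.replace(month=1, day=1)  # January 1
```

Because the time is converted to UTC first, a team at UTC+2 sees the weekly `1w` reset at 02:00 local time on Monday.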
---

## Configuration Guide

Configure provider-level budgets and rate limits using any of these methods:

<Tabs>
<Tab title="Web UI">

The Bifrost Web UI lets you configure provider-level governance from the Virtual Keys management page.

### Creating Virtual Keys with Provider Configs

1. **Navigate to Virtual Keys**: Go to the **Virtual Keys** page in the Bifrost dashboard
2. **Create New Virtual Key**: Click the "Create Virtual Key" button
3. **Configure Providers**: In the "Provider Configurations" section:
   - Add multiple providers with individual weights
   - Set provider-specific budgets and rate limits
   - Configure allowed models per provider

### Provider Configuration Interface


**Key Features:**
- **Visual Provider Cards**: Each provider is displayed as an expandable card
- **Budget Controls**: Set spending limits with reset periods per provider
- **Rate Limit Controls**: Configure token and request limits independently
- **Model Filtering**: Specify allowed models for each provider
- **Weight Distribution**: Visual indicators for load balancing weights
- **Real-time Validation**: Immediate feedback on configuration errors

### Monitoring Provider Usage


The info sheet for the virtual key provides real-time monitoring of:
- Budget consumption per provider
- Rate limit utilization (tokens and requests)
- Provider availability status
- Usage trends and forecasting

</Tab>
<Tab title="API">

Use the Bifrost HTTP API to programmatically manage provider-level governance configurations.

### Create Virtual Key with Provider Configs

```bash
curl -X POST "https://your-bifrost-instance.com/api/governance/virtual-keys" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "marketing-team-vk",
    "description": "Marketing team virtual key with provider-specific limits",
    "provider_configs": [
      {
        "provider": "openai",
        "weight": 0.7,
        "allowed_models": ["gpt-4", "gpt-3.5-turbo"],
        "budget": {
          "max_limit": 500.00,
          "reset_duration": "1M",
          "calendar_aligned": true
        },
        "rate_limit": {
          "token_max_limit": 1000000,
          "token_reset_duration": "1h",
          "request_max_limit": 1000,
          "request_reset_duration": "1h"
        }
      },
      {
        "provider": "anthropic",
        "weight": 0.3,
        "allowed_models": ["claude-3-opus", "claude-3-sonnet"],
        "budget": {
          "max_limit": 200.00,
          "reset_duration": "1M"
        },
        "rate_limit": {
          "token_max_limit": 500000,
          "token_reset_duration": "1h",
          "request_max_limit": 500,
          "request_reset_duration": "1h"
        }
      }
    ],
    "budget": {
      "max_limit": 1000.00,
      "reset_duration": "1M",
      "calendar_aligned": true
    },
    "is_active": true
  }'
```

Use `calendar_aligned` only with `d` / `w` / `M` / `Y` reset durations (see [Calendar-aligned budgets](#calendar-aligned-budgets)).
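If you create virtual keys from a script, you can apply the same rule on the client before sending the request. The following is a minimal sketch that uses only the Python standard library. The base URL is the placeholder from the example above, and the function names are hypothetical.

```python
import json
import urllib.request

CALENDAR_SUFFIXES = ("d", "w", "M", "Y")


def check_calendar_aligned(budget: dict) -> None:
    """Client-side guard for the rule above; the server still validates."""
    duration = budget.get("reset_duration", "")
    if budget.get("calendar_aligned") and not duration.endswith(CALENDAR_SUFFIXES):
        raise ValueError(f"calendar_aligned requires a d/w/M/Y duration, got {duration!r}")


def build_create_vk_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) POST /api/governance/virtual-keys."""
    budgets = [payload.get("budget")]
    budgets += [pc.get("budget") for pc in payload.get("provider_configs", [])]
    for budget in budgets:
        if budget is not None:
            check_calendar_aligned(budget)
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/governance/virtual-keys",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send the request, pass it to `urllib.request.urlopen(req)` and parse the JSON body of the response. Add any authentication your deployment requires, because this page does not cover it.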
### Update Provider Configuration

```bash
curl -X PUT "https://your-bifrost-instance.com/api/governance/virtual-keys/{vk_id}" \
  -H "Content-Type: application/json" \
  -d '{
    "provider_configs": [
      {
        "id": 1,
        "provider": "openai",
        "weight": 0.8,
        "budget": {
          "max_limit": 600.00,
          "reset_duration": "1M"
        },
        "rate_limit": {
          "token_max_limit": 1200000,
          "token_reset_duration": "1h"
        }
      }
    ]
  }'
```

### API Response Structure

```json
{
  "message": "Virtual key created successfully",
  "virtual_key": {
    "id": "vk_123",
    "name": "marketing-team-vk",
    "value": "vk_abc123def456",
    "provider_configs": [
      {
        "id": 1,
        "provider": "openai",
        "weight": 0.7,
        "allowed_models": ["gpt-4", "gpt-3.5-turbo"],
        "budget": {
          "id": "budget_789",
          "max_limit": 500.00,
          "current_usage": 0.00,
          "reset_duration": "1M",
          "calendar_aligned": true,
          "last_reset": "2024-01-01T00:00:00Z"
        },
        "rate_limit": {
          "id": "rate_limit_456",
          "token_max_limit": 1000000,
          "token_current_usage": 0,
          "token_reset_duration": "1h",
          "token_last_reset": "2024-01-01T00:00:00Z",
          "request_max_limit": 1000,
          "request_current_usage": 0,
          "request_reset_duration": "1h",
          "request_last_reset": "2024-01-01T00:00:00Z"
        }
      }
    ]
  }
}
```

### Field Descriptions

| Field | Type | Description |
|-------|------|-------------|
| `provider` | string | AI provider name (e.g., "openai", "anthropic") |
| `weight` | float | Load balancing weight (0.0-1.0) |
| `allowed_models` | array | Specific models allowed for this provider |
| `budget.max_limit` | float | Maximum spend in USD |
| `budget.reset_duration` | string | Reset period (e.g., "1h", "1d", "1M") |
| `budget.calendar_aligned` | boolean | When true, resets at calendar boundaries in UTC (requires `d`/`w`/`M`/`Y` durations) |
| `rate_limit.token_max_limit` | integer | Maximum tokens per period |
| `rate_limit.request_max_limit` | integer | Maximum requests per period |

</Tab>
<Tab title="config.json">

Configure provider-level governance through Bifrost's configuration file for declarative management.

### Basic Configuration Structure

```json
{
  "governance": {
    "virtual_keys": [
      {
        "id": "vk-dev-001",
        "name": "development-team-vk",
        "description": "Development team with multi-provider setup",
        "is_active": true,
        "rate_limit_id": "rl-vk-dev",
        "provider_configs": [
          {
            "id": 1,
            "provider": "openai",
            "weight": 0.6,
            "allowed_models": ["gpt-4", "gpt-3.5-turbo"],
            "rate_limit_id": "rl-pc-openai"
          },
          {
            "id": 2,
            "provider": "anthropic",
            "weight": 0.4,
            "allowed_models": ["claude-3-opus", "claude-3-sonnet"],
            "rate_limit_id": "rl-pc-anthropic"
          }
        ]
      }
    ],
    "budgets": [
      {
        "id": "budget-vk-dev",
        "virtual_key_id": "vk-dev-001",
        "max_limit": 2000.00,
        "reset_duration": "1M",
        "calendar_aligned": true
      },
      {
        "id": "budget-pc-openai",
        "provider_config_id": 1,
        "max_limit": 1000.00,
        "reset_duration": "1M"
      },
      {
        "id": "budget-pc-anthropic",
        "provider_config_id": 2,
        "max_limit": 500.00,
        "reset_duration": "1M"
      }
    ],
    "rate_limits": [
      {
        "id": "rl-vk-dev",
        "token_max_limit": 5000000,
        "token_reset_duration": "1h",
        "request_max_limit": 3000,
        "request_reset_duration": "1h"
      },
      {
        "id": "rl-pc-openai",
        "token_max_limit": 2000000,
        "token_reset_duration": "1h",
        "request_max_limit": 2000,
        "request_reset_duration": "1h"
      },
      {
        "id": "rl-pc-anthropic",
        "token_max_limit": 1000000,
        "token_reset_duration": "1h",
        "request_max_limit": 1000,
        "request_reset_duration": "1h"
      }
    ]
  }
}
```

Budgets and rate limits live in **separate top-level arrays** inside `governance`. Virtual keys and provider configs point to their rate limit through `rate_limit_id`. Each `budgets[]` entry points back to its owner through `virtual_key_id` or `provider_config_id`. The optional `calendar_aligned` field on each budget behaves the same as in the HTTP API (see [Calendar-aligned budgets](#calendar-aligned-budgets)).
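To see how these references fit together, the following sketch joins the three arrays into one view per virtual key. It raises an error on any `rate_limit_id` that points nowhere. This is a hypothetical pre-flight helper, not part of Bifrost, and it works on the parsed JSON (for example, the result of `json.load`).

```python
from __future__ import annotations


def resolve_governance(config: dict) -> dict:
    """Attach budgets and rate limits to virtual keys and provider configs by id.

    Returns {vk_id: {"budget", "rate_limit", "providers": {pc_id: {...}}}} and
    raises KeyError when a rate_limit_id does not match any rate_limits[] entry.
    """
    gov = config.get("governance", {})
    budgets = gov.get("budgets", [])
    rate_limits = {rl["id"]: rl for rl in gov.get("rate_limits", [])}
    # Budgets point back at their owner instead of being referenced by id.
    vk_budgets = {b["virtual_key_id"]: b for b in budgets if "virtual_key_id" in b}
    pc_budgets = {b["provider_config_id"]: b for b in budgets if "provider_config_id" in b}

    def rate_limit_for(owner: dict):
        rl_id = owner.get("rate_limit_id")
        if rl_id is None:
            return None
        if rl_id not in rate_limits:
            raise KeyError(f"unknown rate_limit_id {rl_id!r}")
        return rate_limits[rl_id]

    resolved = {}
    for vk in gov.get("virtual_keys", []):
        resolved[vk["id"]] = {
            "budget": vk_budgets.get(vk["id"]),
            "rate_limit": rate_limit_for(vk),
            "providers": {
                pc["id"]: {
                    "provider": pc["provider"],
                    "budget": pc_budgets.get(pc["id"]),
                    "rate_limit": rate_limit_for(pc),
                }
                for pc in vk.get("provider_configs", [])
            },
        }
    return resolved
```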
### Advanced Configuration Examples

#### Cost-Optimized Setup
```json
{
  "governance": {
    "virtual_keys": [
      {
        "id": "vk-cost-opt",
        "name": "cost-optimized-vk",
        "provider_configs": [
          {"id": 10, "provider": "openai-gpt-3.5", "weight": 0.8, "rate_limit_id": "rl-cheap"},
          {"id": 11, "provider": "openai-gpt-4", "weight": 0.2, "rate_limit_id": "rl-premium"}
        ]
      }
    ],
    "budgets": [
      {"id": "b-cheap", "provider_config_id": 10, "max_limit": 50.00, "reset_duration": "1d"},
      {"id": "b-premium", "provider_config_id": 11, "max_limit": 200.00, "reset_duration": "1d"}
    ],
    "rate_limits": [
      {"id": "rl-cheap", "request_max_limit": 1000, "request_reset_duration": "1h"},
      {"id": "rl-premium", "request_max_limit": 100, "request_reset_duration": "1h"}
    ]
  }
}
```

#### High-Volume Production Setup
```json
{
  "governance": {
    "virtual_keys": [
      {
        "id": "vk-prod-hv",
        "name": "production-high-volume-vk",
        "provider_configs": [
          {"id": 20, "provider": "openai", "weight": 0.5, "rate_limit_id": "rl-openai"},
          {"id": 21, "provider": "anthropic", "weight": 0.3, "rate_limit_id": "rl-anthropic"},
          {"id": 22, "provider": "azure-openai", "weight": 0.2, "rate_limit_id": "rl-azure"}
        ]
      }
    ],
    "budgets": [
      {"id": "b-openai", "provider_config_id": 20, "max_limit": 5000.00, "reset_duration": "1M"},
      {"id": "b-anthropic", "provider_config_id": 21, "max_limit": 3000.00, "reset_duration": "1M"},
      {"id": "b-azure", "provider_config_id": 22, "max_limit": 2000.00, "reset_duration": "1M"}
    ],
    "rate_limits": [
      {"id": "rl-openai", "token_max_limit": 10000000, "token_reset_duration": "1h", "request_max_limit": 10000, "request_reset_duration": "1h"},
      {"id": "rl-anthropic", "token_max_limit": 6000000, "token_reset_duration": "1h", "request_max_limit": 6000, "request_reset_duration": "1h"},
      {"id": "rl-azure", "token_max_limit": 4000000, "token_reset_duration": "1h", "request_max_limit": 4000, "request_reset_duration": "1h"}
    ]
  }
}
```

**Validation Rules:**
- Budget limits must be positive numbers
- Reset durations must use the formats listed in [Reset Durations](#reset-durations)
- Rate limits must be positive integers
- Provider names must match configured providers
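Bifrost applies these rules itself. To catch mistakes earlier, for example in CI before a deploy, you could run a check like the following. The duration pattern, the treatment of missing limits as unconfigured, and the `configured_providers` argument are assumptions based on this page. This is not Bifrost's own validator.

```python
from __future__ import annotations

import re

_DURATION = re.compile(r"[1-9][0-9]*[mhdwMY]")


def _is_number(value: object) -> bool:
    # bool is a subclass of int in Python, so rule it out explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_governance(gov: dict, configured_providers: set[str]) -> list[str]:
    """Return every rule violation found in a `governance` section (empty = valid)."""
    errors: list[str] = []
    for b in gov.get("budgets", []):
        if not _is_number(b.get("max_limit")) or b["max_limit"] <= 0:
            errors.append(f"budget {b.get('id')}: max_limit must be a positive number")
        duration = b.get("reset_duration") or ""
        if not _DURATION.fullmatch(duration):
            errors.append(f"budget {b.get('id')}: invalid reset_duration {duration!r}")
        elif b.get("calendar_aligned") and duration[-1] not in "dwMY":
            errors.append(f"budget {b.get('id')}: calendar_aligned requires d/w/M/Y")
    for rl in gov.get("rate_limits", []):
        for kind in ("request", "token"):
            limit = rl.get(f"{kind}_max_limit")
            if limit is None:
                continue  # this kind of limit is not configured
            if not _is_number(limit) or isinstance(limit, float) or limit <= 0:
                errors.append(f"rate limit {rl.get('id')}: {kind}_max_limit must be a positive integer")
            if not _DURATION.fullmatch(rl.get(f"{kind}_reset_duration") or ""):
                errors.append(f"rate limit {rl.get('id')}: invalid {kind}_reset_duration")
    for vk in gov.get("virtual_keys", []):
        for pc in vk.get("provider_configs", []):
            if pc.get("provider") not in configured_providers:
                errors.append(f"provider config {pc.get('id')}: unknown provider {pc.get('provider')!r}")
    return errors
```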
</Tab>
</Tabs>

## Provider-Level Governance Examples

### Example 1: Mixed Provider Budgets

A virtual key configured with multiple providers and different budget allocations:

```json
{
  "governance": {
    "virtual_keys": [
      {
        "id": "vk-mkt",
        "name": "marketing-team-vk",
        "provider_configs": [
          {"id": 30, "provider": "openai", "weight": 0.7},
          {"id": 31, "provider": "anthropic", "weight": 0.3}
        ]
      }
    ],
    "budgets": [
      {"id": "b-vk-mkt", "virtual_key_id": "vk-mkt", "max_limit": 100, "reset_duration": "1M"},
      {"id": "b-openai", "provider_config_id": 30, "max_limit": 50, "reset_duration": "1M"},
      {"id": "b-anth", "provider_config_id": 31, "max_limit": 30, "reset_duration": "1M"}
    ]
  }
}
```

**Behavior:**
- OpenAI requests are limited to $50/month by the provider config budget and also count toward the $100/month VK budget
- Anthropic requests are limited to $30/month by the provider config budget and also count toward the $100/month VK budget
- If a provider's budget is exhausted, that provider is excluded from routing; the other provider remains available until its own budget or the VK budget is exhausted
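Every request also counts toward the VK budget. This means each provider's effective ceiling is the smaller of its own budget and the VK budget, and combined spend can never exceed the VK budget. Here is a small sketch of that arithmetic. It ignores the overshoot of the last allowed request described earlier, and `spend_ceilings` is a hypothetical helper.

```python
from __future__ import annotations


def spend_ceilings(provider_limits: dict[str, float], vk_limit: float) -> tuple[dict[str, float], float]:
    """Per-period spend ceilings when provider budgets sit under one shared VK budget.

    Each provider is capped by its own budget and by the VK budget; combined spend is
    capped by the VK budget. Ignores the overshoot of the last allowed request.
    """
    per_provider = {name: min(limit, vk_limit) for name, limit in provider_limits.items()}
    combined = min(sum(per_provider.values()), vk_limit)
    return per_provider, combined


if __name__ == "__main__":
    print(spend_ceilings({"openai": 50, "anthropic": 30}, vk_limit=100))
    print(spend_ceilings({"openai": 50, "anthropic": 30}, vk_limit=60))
```

In Example 1, the provider budgets add up to $80. The $100 VK budget therefore never becomes the binding limit unless providers are added or their budgets are raised. With a $60 VK budget, the shared ceiling would bind first.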
### Example 2: Provider-Specific Rate Limits

Different rate limits based on provider capabilities:

```json
{
  "governance": {
    "virtual_keys": [
      {
        "id": "vk-hv",
        "name": "high-volume-vk",
        "provider_configs": [
          {"id": 40, "provider": "openai", "rate_limit_id": "rl-openai"},
          {"id": 41, "provider": "anthropic", "rate_limit_id": "rl-anthropic"}
        ]
      }
    ],
    "rate_limits": [
      {"id": "rl-openai", "request_max_limit": 1000, "request_reset_duration": "1h", "token_max_limit": 1000000, "token_reset_duration": "1h"},
      {"id": "rl-anthropic", "request_max_limit": 500, "request_reset_duration": "1h", "token_max_limit": 500000, "token_reset_duration": "1h"}
    ]
  }
}
```

**Behavior:**
- OpenAI: 1000 requests/hour, 1M tokens/hour
- Anthropic: 500 requests/hour, 500K tokens/hour
- If a provider exceeds either of its rate limits, it is excluded from routing until its window resets; the other provider remains available

### Example 3: Failover Strategy

Provider configurations with budget-based failover:

```json
{
  "governance": {
    "virtual_keys": [
      {
        "id": "vk-cost",
        "name": "cost-optimized-vk",
        "provider_configs": [
          {"id": 50, "provider": "openai-cheap", "weight": 1.0},
          {"id": 51, "provider": "openai-premium", "weight": 0.0, "rate_limit_id": "rl-premium"}
        ]
      }
    ],
    "budgets": [
      {"id": "b-cheap", "provider_config_id": 50, "max_limit": 10, "reset_duration": "1d"},
      {"id": "b-premium", "provider_config_id": 51, "max_limit": 50, "reset_duration": "1d"}
    ],
    "rate_limits": [
      {"id": "rl-premium", "request_max_limit": 100, "request_reset_duration": "1h", "token_max_limit": 50000, "token_reset_duration": "1h"}
    ]
  }
}
```

**Behavior:**
- Primary: Use the cheap provider until its $10 daily budget is exhausted
- Fallback: Automatically switch to the premium provider when the cheap option is unavailable. To enable this, do not send a `provider` name in the request body; see [Routing](./routing#automatic-fallbacks) for details.
- Cost containment: Prevent unexpected overspend on premium resources and limit the number of requests sent to the premium provider
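The failover pattern can be sketched in two steps. First, drop every provider that is over its budget or rate limit. Then try the remaining providers in order of descending weight. Ordering fallbacks by weight is an assumption made for illustration. See [Routing](./routing#automatic-fallbacks) for how Bifrost actually selects providers and fallbacks.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    provider: str
    weight: float
    over_budget: bool = False
    over_rate_limit: bool = False


def routing_order(candidates: list[Candidate]) -> list[str]:
    """Eligible providers, highest weight first (primary first, then fallbacks)."""
    # Providers over their budget or rate limit are excluded from routing.
    eligible = [c for c in candidates if not (c.over_budget or c.over_rate_limit)]
    # sorted() is stable even with reverse=True, so equal weights keep config order.
    return [c.provider for c in sorted(eligible, key=lambda c: c.weight, reverse=True)]
```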
## Key Benefits of Provider-Level Governance

- **Granular Control**: Set specific spending limits and rate limits per AI provider
- **Automatic Fallback**: Route to alternative providers when budgets or rate limits are exceeded
- **Cost Control**: Track and control spending by provider for better financial oversight
- **Performance Testing**: A/B testing across providers with controlled budgets
- **Multi-Provider Strategies**: Primary/backup provider configurations
- **Cost-Tiered Access**: Cheap providers for basic tasks, premium for complex workloads

---

## Next Steps

- **[Routing](./routing)** - Direct requests to specific AI models, providers, and keys using Virtual Keys.
- **[MCP Tool Filtering](./mcp-tools)** - Manage MCP clients/tools for virtual keys.
- **[Tracing](../observability/default)** - Audit trails and request tracking
160
docs/features/governance/mcp-tools.mdx
Normal file
@@ -0,0 +1,160 @@
---
title: "MCP Tool Filtering"
description: "Control which MCP tools are available for each Virtual Key."
icon: "grid-2"
---

## Overview

MCP Tool Filtering lets you control which tools are available to AI models on a per-request basis using Virtual Keys (VKs). By configuring a Virtual Key, you can create a strict allow-list of MCP clients and tools, ensuring that only approved tools can be executed.

Make sure you have at least one MCP client set up. Read more in the [MCP overview](../../mcp/overview).

## How It Works

The filtering logic is determined by the Virtual Key's configuration:

1. **No MCP Configuration on Virtual Key (Default)**
   - If a Virtual Key has no MCP configurations, **no MCP tools are available** (deny-by-default).
   - You must explicitly add MCP client configurations to allow tools.

2. **With MCP Configuration on Virtual Key**
   - When you configure MCP clients on a Virtual Key, its settings take full precedence.
   - Bifrost automatically generates an `x-bf-mcp-include-tools` header based on your VK configuration (unless `disable_auto_tool_inject` is enabled or the caller already sent the header). This acts as a strict allow-list for the request.
   - If the caller already includes an `x-bf-mcp-include-tools` header, auto-injection is skipped, but the VK allow-list is still enforced, both at inference time and again at MCP tool execution time.

For each MCP client associated with a Virtual Key, you can specify the allowed tools:
- **Select specific tools**: Only the chosen tools from that client will be available.
- **Use `*` wildcard**: All available tools from that client will be permitted.
- **Leave tool list empty**: All tools from that client will be **blocked**.
- **Do not configure a client**: All tools from that client will be **blocked** (if other clients are configured).
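The rules above amount to a small resolution function. In this sketch, the two dictionaries stand in for a VK's `mcp_configs` and for the tools each connected client exposes. It does not model the `x-bf-mcp-include-tools` header or its format, and the function name is hypothetical.

```python
from __future__ import annotations


def allowed_tools(
    vk_mcp_configs: dict[str, list[str]],
    available: dict[str, list[str]],
) -> set[tuple[str, str]]:
    """Resolve the (client, tool) pairs a virtual key may use.

    vk_mcp_configs: MCP client name -> tools_to_execute, as configured on the VK.
    available: MCP client name -> tools that client currently exposes.
    """
    allowed: set[tuple[str, str]] = set()
    for client, tools in available.items():
        configured = vk_mcp_configs.get(client)
        if not configured:
            # Client not configured on the VK, or configured with an empty list.
            continue
        if "*" in configured:
            allowed.update((client, tool) for tool in tools)  # wildcard
        else:
            allowed.update((client, tool) for tool in tools if tool in configured)
    return allowed
```

With the clients from the Example Scenario on this page, `{"billing-client": ["check-status"]}` resolves to `{("billing-client", "check-status")}`. An empty configuration resolves to an empty set.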
## Setting MCP Tool Restrictions

<Tabs group="mcp-tool-restrictions">
<Tab title="Web UI">

You can configure which tools a Virtual Key has access to via the UI.

1. Go to the **Virtual Keys** page.
2. Create a new virtual key or edit an existing one.

3. In the **MCP Client Configurations** section, add the MCP client you want to restrict the VK to.
4. Select the specific tools to allow, or choose **Allow All Tools** to permit all current and future tools from that client (stored as `*`). Leaving the list empty blocks all tools for that client.
5. Click the **Save** button.

</Tab>
<Tab title="API">

You can configure this via the REST API when creating (`POST`) or updating (`PUT`) a virtual key.

**Create Virtual Key:**
```bash
curl -X POST http://localhost:8080/api/governance/virtual-keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "vk-for-billing-support",
    "mcp_configs": [
      {
        "mcp_client_name": "billing-client",
        "tools_to_execute": ["check-status"]
      },
      {
        "mcp_client_name": "support-client",
        "tools_to_execute": ["*"]
      }
    ]
  }'
```

**Update Virtual Key:**
```bash
curl -X PUT http://localhost:8080/api/governance/virtual-keys/{vk_id} \
  -H "Content-Type: application/json" \
  -d '{
    "mcp_configs": [
      {
        "mcp_client_name": "billing-client",
        "tools_to_execute": ["check-status"]
      },
      {
        "mcp_client_name": "support-client",
        "tools_to_execute": ["*"]
      }
    ]
  }'
```

**Behavior:**
- The virtual key can only access the `check-status` tool from `billing-client`.
- It can access all tools from `support-client`.
- Any other MCP client is implicitly blocked for this key.

</Tab>

<Tab title="config.json">

You can also define MCP tool restrictions directly in your `config.json` file. The `mcp_configs` array under a virtual key references each MCP client by name.

```json
{
  "governance": {
    "virtual_keys": [
      {
        "id": "vk-billing-support-only",
        "name": "VK for Billing and Support",
        "mcp_configs": [
          {
            "mcp_client_name": "billing-client",
            "tools_to_execute": ["check-status"]
          },
          {
            "mcp_client_name": "support-client",
            "tools_to_execute": ["*"]
          }
        ]
      }
    ]
  }
}
```
</Tab>
</Tabs>

## Example Scenario

**Available MCP Clients & Tools:**
- **`billing-client`**: with tools `[create-invoice, check-status]`
- **`support-client`**: with tools `[create-ticket, get-faq]`

<Tabs>
<Tab title="VK with Full Access">
**Configuration:**
- `billing-client` -> Allowed Tools: `[*]` (wildcard)
- `support-client` -> Allowed Tools: `[*]` (wildcard)

**Result:**
A request with this Virtual Key can access all four tools: `create-invoice`, `check-status`, `create-ticket`, and `get-faq`.

</Tab>
<Tab title="VK with Partial Access">
**Configuration:**
- `billing-client` -> Allowed Tools: `[check-status]`
- `support-client` -> Not configured

**Result:**
A request with this Virtual Key can only access the `check-status` tool. All other tools are blocked.

</Tab>
<Tab title="VK with No Tools">
**Configuration:**
- `billing-client` -> Allowed Tools: `[]` (empty list)

**Result:**
A request with this Virtual Key cannot access any tools. All tools from all clients are blocked.
</Tab>
</Tabs>

<Note>
When a Virtual Key has MCP configurations, Bifrost enforces the allow-list at both inference time and MCP tool execution time. Auto-injection of the `x-bf-mcp-include-tools` header is skipped if the caller already provides it or if `disable_auto_tool_inject` is enabled, but the VK's restrictions are always applied regardless. You can still use the `x-bf-mcp-include-clients` header to filter MCP clients per request.
</Note>
166
docs/features/governance/required-headers.mdx
Normal file
@@ -0,0 +1,166 @@
---
title: "Required Headers"
description: "Enforce mandatory headers on every request through governance."
icon: "shield-check"
---

## Overview

Required headers let you enforce that specific HTTP headers are present on every LLM and MCP request passing through Bifrost. If a request is missing any required header, the governance plugin rejects it with a **400 Bad Request** error before it reaches the provider.

This is useful for:
- **Tenant isolation** - Require `X-Tenant-ID` to identify the calling tenant
- **Audit trails** - Require `X-Correlation-ID` for request tracing across services
- **Custom routing metadata** - Require headers your infrastructure depends on

<Note>
Required header validation requires **governance to be enabled**. The check runs in both `PreLLMHook` and `PreMCPHook`, so it applies to all inference and MCP tool execution requests.
</Note>

Header matching is **case-insensitive**: configuring `X-Tenant-ID` will match `x-tenant-id`, `X-TENANT-ID`, or any other casing.

---

## How it works

```mermaid
graph LR
    A[Request] --> B{All required<br/>headers present?}
    B -->|Yes| C[Continue to<br/>governance evaluation]
    B -->|No| D[400 Bad Request<br/>missing_required_headers]
```

When a request arrives:
1. The HTTP transport middleware stores all request headers in the Bifrost context, with lowercased keys
2. The governance plugin's `PreLLMHook` / `PreMCPHook` checks for each required header
3. If any are missing, the request is rejected immediately with a `400` status and a JSON error listing the missing headers

**Example error response:**
```json
{
  "error": {
    "message": "missing required headers: x-tenant-id, x-correlation-id",
    "type": "missing_required_headers"
  }
}
```
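
The check described above can be sketched in a few lines of Python. This approximates the documented behavior and is not Bifrost's own code. The function names are hypothetical, and listing the missing headers lowercased and in configured order is an assumption based on the example error.

```python
import json


def find_missing_headers(request_headers, required_headers):
    """Return the required header names (lowercased) absent from a request.

    Request header names are lowercased before comparison, which makes
    the match case-insensitive.
    """
    present = {name.lower() for name in request_headers}
    return [name.lower() for name in required_headers if name.lower() not in present]


def check_required_headers(request_headers, required_headers):
    """Return None if the request passes, else (status, JSON error body)."""
    missing = find_missing_headers(request_headers, required_headers)
    if not missing:
        return None
    body = {
        "error": {
            "message": "missing required headers: " + ", ".join(missing),
            "type": "missing_required_headers",
        }
    }
    return 400, json.dumps(body)


if __name__ == "__main__":
    required = ["X-Tenant-ID", "X-Correlation-ID"]
    # Any casing satisfies the check, so this prints None.
    print(check_required_headers({"x-tenant-id": "t-1", "X-CORRELATION-ID": "c-1"}, required))
    # Both headers missing: prints the 400 status and the JSON error body.
    print(check_required_headers({"Content-Type": "application/json"}, required))
```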

---

## Configuration

<Tabs group="config-method">
<Tab title="Web UI">

1. Navigate to **Config** > **Security Settings**
2. Ensure **Governance** is enabled (the required headers section only appears when governance is active)
3. Scroll to **Required Headers**
4. Enter a comma-separated list of header names (e.g., `X-Tenant-ID, X-Correlation-ID`)
5. Click **Save Changes**

Changes take effect immediately; no restart is required.

</Tab>
<Tab title="API">

Include `required_headers` in the `client_config` when updating the configuration:

```bash
curl -X PUT http://localhost:8080/api/config \
  -H "Content-Type: application/json" \
  -d '{
    "client_config": {
      "required_headers": ["X-Tenant-ID", "X-Correlation-ID"]
    }
  }'
```

To clear required headers, pass an empty array:

```bash
curl -X PUT http://localhost:8080/api/config \
  -H "Content-Type: application/json" \
  -d '{
    "client_config": {
      "required_headers": []
    }
  }'
```

</Tab>
<Tab title="config.json">

Add `required_headers` to the `client` section:

```json
{
  "client": {
    "required_headers": ["X-Tenant-ID", "X-Correlation-ID"]
  }
}
```

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `required_headers` | `string[]` | No | List of header names that must be present on every request. Case-insensitive. |

</Tab>
</Tabs>

---

## Examples

### Requiring a tenant header

Configure a single required header to enforce tenant identification:

```json
{
  "client": {
    "required_headers": ["X-Tenant-ID"]
  }
}
```

**Valid request:**
```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: tenant-123" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
```

**Rejected request** (missing header):
```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
# → 400: missing required headers: x-tenant-id
```

### Combining with virtual keys

Required headers work alongside virtual key enforcement. When both are configured, the governance plugin checks required headers first, then validates the virtual key:

```json
{
  "client": {
    "enforce_auth_on_inference": true,
    "required_headers": ["X-Tenant-ID"]
  }
}
```
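
The ordering can be illustrated with a small Python sketch. It is a simplified model, not Bifrost's code. The returned labels other than `missing_required_headers` are hypothetical. For brevity, the sketch only looks for the `x-bf-vk` header, although virtual keys can also be sent in other headers (see [Virtual Keys](./virtual-keys)).

```python
def governance_precheck(headers, required_headers, enforce_auth_on_inference):
    """Return the first failed check as a label, or "ok".

    Required headers are checked before the virtual key, matching the
    order described above. "virtual_key_required" is a hypothetical label.
    """
    lowered = {name.lower() for name in headers}
    missing = [h.lower() for h in required_headers if h.lower() not in lowered]
    if missing:
        return "missing_required_headers"
    if enforce_auth_on_inference and "x-bf-vk" not in lowered:
        return "virtual_key_required"
    return "ok"


if __name__ == "__main__":
    required = ["X-Tenant-ID"]
    # Has a virtual key but no tenant header: fails on required headers first.
    print(governance_precheck({"x-bf-vk": "sk-bf-example"}, required, True))  # missing_required_headers
    # Has both headers: passes this pre-check.
    print(governance_precheck({"X-Tenant-ID": "t-1", "x-bf-vk": "sk-bf-example"}, required, True))  # ok
```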

A request must include **both** the virtual key header and `X-Tenant-ID` to pass governance.

---

## Next steps

- **[Virtual Keys](./virtual-keys)** - Set up access control with virtual keys
- **[Budget and Limits](./budget-and-limits)** - Configure budgets and rate limits
- **[Routing](./routing)** - Route requests based on headers and other criteria
337
docs/features/governance/routing.mdx
Normal file
@@ -0,0 +1,337 @@
---
title: "Routing"
description: "Direct requests to specific AI models, providers, and keys using Virtual Keys."
icon: "arrow-progress"
---

<Info>
**Looking for comprehensive provider routing documentation?**

For a detailed guide covering governance-based routing, adaptive load balancing, Model Catalog, and how they interact, see the [**Provider Routing Guide**](/providers/provider-routing).

This page focuses specifically on configuring governance routing via Virtual Keys.
</Info>

## Overview

Bifrost's governance-based routing gives you granular control over how requests are directed to AI models and providers through Virtual Key configuration. By configuring routing rules on a Virtual Key, you can restrict which providers and models are accessible, implement weighted load balancing, create automatic fallbacks, and limit access to specific provider API keys.

This enables use cases such as:

- **Resilience & Failover**: Automatically fall back to a secondary provider if the primary one fails.
- **Environment Separation**: Dedicate specific virtual keys to development, testing, and production environments with different provider and key access.
- **Cost Management**: Route traffic to cheaper models or providers based on weights to optimize costs.
- **Fine-grained Access Control**: Ensure that different teams or applications only use the models and API keys they are explicitly permitted to.

## Provider/Model Restrictions

Virtual Keys can be restricted to specific providers and models. When these restrictions are configured, the VK can access only the designated providers and models, giving you fine-grained control over what each user or application can use.

**How It Works:**
- **No Provider Configs** (default): VK **blocks all providers** (deny-by-default). You must add provider configurations to allow traffic.
- **With Provider Configs**: VK limited to only the specified provider/models. Configured providers participate in weighted load balancing only if their `weight` is set to a numeric value, while providers with `weight: null` remain configured but are opted out of weighted selection.

**Model Validation:**
When you configure provider restrictions on a Virtual Key, Bifrost validates that the requested model is allowed for the selected provider:
- **`allowed_models: ["*"]`**: Allow all models supported by the provider (uses the Model Catalog for validation).
- **Empty `allowed_models`**: **Deny all** models (deny-by-default).
- **Explicit model list**: Only those specific models are permitted.
- **Model Catalog Sync**: On startup and provider updates, Bifrost calls each provider's list models API. If this fails, you'll see a warning: `{"level":"warn","message":"failed to list models for provider <name>: failed to execute HTTP request to provider API"}`

<Note>
**Cross-provider routing does NOT happen automatically**. For example, requests for `gpt-4o` will NOT be routed to Anthropic unless you explicitly add `"gpt-4o"` to Anthropic's `allowed_models` in the Virtual Key configuration. Each provider only handles models it actually supports (determined by the Model Catalog).
</Note>
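
The `allowed_models` rules and the cross-provider note above can be summarized in a short Python sketch. `SAMPLE_CATALOG` is a hypothetical stand-in for Bifrost's Model Catalog, and `is_model_allowed` is an illustrative name, not a Bifrost function.

```python
# Hypothetical stand-in for the Model Catalog: provider -> models it lists.
SAMPLE_CATALOG = {
    "openai": {"gpt-4o", "gpt-4o-mini"},
    "anthropic": {"claude-3-sonnet-20240229"},
}


def is_model_allowed(provider, model, allowed_models, catalog=SAMPLE_CATALOG):
    """Apply the allowed_models rules for a single provider config."""
    if "*" in allowed_models:
        # Wildcard: allow whatever the catalog says this provider supports.
        return model in catalog.get(provider, set())
    # Explicit list; an empty list denies every model.
    return model in allowed_models


if __name__ == "__main__":
    print(is_model_allowed("openai", "gpt-4o", ["*"]))          # True
    print(is_model_allowed("anthropic", "gpt-4o", ["*"]))       # False: not in the catalog
    print(is_model_allowed("openai", "gpt-4o", []))             # False: empty list denies
    print(is_model_allowed("anthropic", "gpt-4o", ["gpt-4o"]))  # True: explicitly added
```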

## Weighted Load Balancing

When you configure multiple providers on a Virtual Key, Bifrost automatically implements weighted load balancing. Each provider can be assigned a weight, and requests are distributed proportionally. The `weight` field is optional. Omitting it (or setting it to `null`) excludes the provider from weighted selection, but the provider can still be used for direct `provider/model` requests or as a fallback.

**Example Configuration:**
```
Virtual Key: vk-prod-main
├── OpenAI
│   ├── Allowed Models: [gpt-4o, gpt-4o-mini]  ← Explicit whitelist
│   └── Weight: 0.2 (20% of traffic)
└── Azure
    ├── Allowed Models: [gpt-4o]  ← Explicit whitelist
    └── Weight: 0.8 (80% of traffic)
```

**Load Balancing Behavior:**
- For `gpt-4o`: 80% Azure, 20% OpenAI (both providers have it in `allowed_models`)
- For `gpt-4o-mini`: 100% OpenAI (only OpenAI has it in `allowed_models`)
- For `claude-3-sonnet`: ❌ Rejected (neither provider has it in `allowed_models`)

**Usage:**
To trigger weighted load balancing, send requests with just the model name:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: vk-prod-main" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'
```

To bypass load balancing and target a specific provider, prefix the model with the provider name:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: vk-prod-main" \
  -d '{"model": "openai/gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'
```

<Info>
Weights are automatically normalized to sum to 1.0 across the providers on the VK that are available for the requested model.
</Info>
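
The filtering and normalization described above can be approximated in Python. This sketch is not Bifrost's selection code. It treats `allowed_models` as an explicit list only (no wildcard or catalog lookup), skips providers whose `weight` is `None`, and uses `random.choices` for the proportional draw. `VK_PROD_MAIN` mirrors the example configuration above.

```python
import random


def normalized_weights(provider_configs, model):
    """Normalized weights of providers that allow `model` and have a weight.

    Providers with weight None are configured but opted out of weighted
    selection, so they are skipped here.
    """
    eligible = {
        cfg["provider"]: cfg["weight"]
        for cfg in provider_configs
        if cfg.get("weight") is not None and model in cfg.get("allowed_models", [])
    }
    total = sum(eligible.values())
    if total <= 0:
        return {}
    return {name: weight / total for name, weight in eligible.items()}


def pick_provider(provider_configs, model, rng=random):
    """Draw one provider in proportion to its normalized weight."""
    weights = normalized_weights(provider_configs, model)
    if not weights:
        raise ValueError(f"model {model!r} is not allowed on this virtual key")
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


VK_PROD_MAIN = [
    {"provider": "openai", "allowed_models": ["gpt-4o", "gpt-4o-mini"], "weight": 0.2},
    {"provider": "azure", "allowed_models": ["gpt-4o"], "weight": 0.8},
]

if __name__ == "__main__":
    print(normalized_weights(VK_PROD_MAIN, "gpt-4o"))       # {'openai': 0.2, 'azure': 0.8}
    print(normalized_weights(VK_PROD_MAIN, "gpt-4o-mini"))  # {'openai': 1.0}
    print(pick_provider(VK_PROD_MAIN, "gpt-4o", random.Random(7)))
```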

**Example with wildcard `allowed_models` (allow all via Model Catalog):**
```json
{
  "provider_configs": [
    {
      "provider": "openai",
      "allowed_models": ["*"],
      "weight": 0.5
    },
    {
      "provider": "anthropic",
      "allowed_models": ["*"],
      "weight": 0.5
    }
  ]
}
```
With this configuration, `["*"]` defers model validation to the Model Catalog:
- Request for `gpt-4o` → Routed to OpenAI (Model Catalog shows OpenAI supports this)
- Request for `claude-3-sonnet` → Routed to Anthropic (Model Catalog shows Anthropic supports this)
- Request for `gpt-4o` will NOT route to Anthropic (Model Catalog shows Anthropic doesn't support OpenAI models)

## Automatic Fallbacks

When multiple providers are configured on a Virtual Key, Bifrost automatically creates a fallback chain, so a failed request fails over to another provider without manual intervention.

**How It Works:**
- **Only activated when**: Your request has no existing `fallbacks` array in the request body
- **Fallback creation**: Providers are sorted by weight (highest first) and added as fallbacks
- **Respects existing fallbacks**: If you manually specify fallbacks, they are preserved

**Example Request Flow:**
1. The primary request goes to the weighted-selected provider (e.g., Azure with 80% weight)
2. If Azure fails, Bifrost automatically retries with OpenAI
3. This continues until a request succeeds or all providers are exhausted

**Request with automatic fallbacks:**
```bash
# This request will get automatic fallbacks
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: vk-prod-main" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'
```

**Request with manual fallbacks (no automatic fallbacks added):**
```bash
# This request keeps your specified fallbacks
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: vk-prod-main" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}],
    "fallbacks": ["anthropic/claude-3-sonnet-20240229"]
  }'
```
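
The fallback rules above can be sketched in Python under several assumptions:
- Fallbacks are written as `provider/model` strings, the format used in the manual example.
- The primary provider has already been chosen by weighted selection.
- Only providers whose `allowed_models` list the model are added.
- Providers with `weight: null` are placed last.

`build_fallbacks` is a hypothetical name, not a Bifrost function.

```python
def build_fallbacks(request, provider_configs, primary_provider):
    """Return the automatic fallback list for a request (sketch).

    A caller-supplied "fallbacks" array is kept unchanged. Otherwise the
    remaining providers that allow the model are added, highest weight first.
    """
    if "fallbacks" in request:
        return list(request["fallbacks"])
    model = request["model"]
    others = [
        cfg for cfg in provider_configs
        if cfg["provider"] != primary_provider and model in cfg.get("allowed_models", [])
    ]

    def sort_key(cfg):
        weight = cfg.get("weight")
        return weight if weight is not None else float("-inf")  # None sorts last

    others.sort(key=sort_key, reverse=True)
    return [f"{cfg['provider']}/{model}" for cfg in others]


if __name__ == "__main__":
    vk_prod_main = [
        {"provider": "openai", "allowed_models": ["gpt-4o", "gpt-4o-mini"], "weight": 0.2},
        {"provider": "azure", "allowed_models": ["gpt-4o"], "weight": 0.8},
    ]
    # Azure won the weighted draw, so OpenAI becomes the fallback.
    print(build_fallbacks({"model": "gpt-4o"}, vk_prod_main, "azure"))  # ['openai/gpt-4o']
    manual = {"model": "gpt-4o", "fallbacks": ["anthropic/claude-3-sonnet-20240229"]}
    print(build_fallbacks(manual, vk_prod_main, "azure"))  # ['anthropic/claude-3-sonnet-20240229']
```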

## Setting Provider/Model Routing

<Tabs group="provider-model-restrictions">
<Tab title="Web UI">

1. Go to **Virtual Keys**
2. Create or edit a virtual key
3. In the **Provider Configurations** section, add the provider you want to restrict the VK to
4. **Allowed Models**:
   - **Specify models**: Enter specific models (e.g., `["gpt-4o", "gpt-4o-mini"]`) to explicitly whitelist only those models
   - **`["*"]`**: Allow all models (uses the Model Catalog for validation).
   - **Leave blank**: Deny all models (deny-by-default).
5. Optionally add a weight for this provider (a numeric value for weighted load balancing, or leave blank to exclude it from weighted routing while keeping it available for direct requests and fallbacks)
6. Click **Save**

</Tab>

<Tab title="API">

```bash
curl -X PUT http://localhost:8080/api/governance/virtual-keys/{vk_id} \
  -H "Content-Type: application/json" \
  -d '{
    "provider_configs": [
      {
        "provider": "openai",
        "allowed_models": ["gpt-4o", "gpt-4o-mini"],
        "weight": 0.2
      },
      {
        "provider": "azure",
        "allowed_models": ["gpt-4o"],
        "weight": 0.8
      }
    ]
  }'
```

</Tab>

<Tab title="config.json">

```json
{
  "governance": {
    "virtual_keys": [
      {
        "id": "vk-prod-main",
        "provider_configs": [
          {
            "provider": "openai",
            "allowed_models": ["gpt-4o", "gpt-4o-mini"],
            "weight": 0.2
          },
          {
            "provider": "azure",
            "allowed_models": ["gpt-4o"],
            "weight": 0.8
          }
        ]
      }
    ]
  }
}
```

</Tab>

</Tabs>

## API Key Restrictions

Virtual Keys can be restricted to specific provider API keys. When key restrictions are configured, the VK can access only the designated keys, giving you fine-grained control over which API keys each user or application can use.

**How It Works:**
- **No Restrictions** (`key_ids: ["*"]`): VK can use any available provider keys based on load balancing
- **With Restrictions**: VK limited to only the specified key IDs, regardless of other available keys
- **All Blocked** (`key_ids: []` or field omitted): VK cannot use any provider keys (deny-by-default)

**Example Scenario:**
```
Available Provider Keys:
├── key-prod-001 → sk-prod-key... (Production OpenAI key)
├── key-dev-002  → sk-dev-key...  (Development OpenAI key)
└── key-test-003 → sk-test-key... (Testing OpenAI key)

Virtual Key Restrictions:
├── vk-prod-main
│   ├── Allowed Models: [gpt-4o]
│   └── Restricted Keys: [key-prod-001]  ← ONLY production key
├── vk-dev-main
│   ├── Allowed Models: [gpt-4o-mini]
│   └── Restricted Keys: [key-dev-002, key-test-003]  ← Dev + test keys
└── vk-unrestricted
    ├── Allowed Models: ["*"]  ← All models via catalog
    └── Restricted Keys: ["*"]  ← Can use ANY available key
```
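
The three `key_ids` cases above can be expressed as a small Python function, shown here against the example scenario. This is an illustrative sketch, not Bifrost's key selector, and `usable_keys` is a hypothetical name. The function only decides which keys are eligible; choosing among those keys is left to load balancing.

```python
def usable_keys(key_ids, available_keys):
    """Resolve which provider key IDs a virtual key may use.

    ["*"]         -> every available key
    explicit IDs  -> only the listed IDs that exist
    [] or None    -> no keys at all (deny-by-default)
    """
    if not key_ids:
        return []
    if "*" in key_ids:
        return list(available_keys)
    allowed = set(key_ids)
    return [key for key in available_keys if key in allowed]


if __name__ == "__main__":
    available = ["key-prod-001", "key-dev-002", "key-test-003"]
    print(usable_keys(["key-prod-001"], available))                 # ['key-prod-001']
    print(usable_keys(["key-dev-002", "key-test-003"], available))  # ['key-dev-002', 'key-test-003']
    print(usable_keys(["*"], available))                            # all three keys
    print(usable_keys([], available))                               # []
```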

**Request Behavior:**
```bash
# Production VK - will ONLY use key-prod-001
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: vk-prod-main" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'

# Development VK - will load balance between key-dev-002 and key-test-003
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: vk-dev-main" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}]}'

# VK with key_ids: ["*"] - can use any available OpenAI key
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: vk-unrestricted" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}]}'
```

**Setting API Key Restrictions:**

<Tabs group="api-key-restrictions">
<Tab title="Web UI">

1. Go to **Virtual Keys**
2. Create or edit a virtual key
3. In the **Allowed Keys** section, select the API keys the VK is allowed to use
4. Click **Save**

</Tab>

<Tab title="API">

```bash
curl -X PUT http://localhost:8080/api/governance/virtual-keys/{vk_id} \
  -H "Content-Type: application/json" \
  -d '{
    "key_ids": ["key-prod-001"]
  }'
```

</Tab>

<Tab title="config.json">

```json
{
  "governance": {
    "virtual_keys": [
      {
        "id": "vk-prod-main",
        "provider_configs": [
          {
            "provider": "openai",
            "key_ids": [
              "key-prod-001"
            ]
          }
        ]
      }
    ]
  }
}
```

</Tab>

</Tabs>

**Use Cases:**
- **Environment Separation** - Production VKs use production keys, dev VKs use dev keys
- **Cost Control** - Different teams use keys with different billing accounts
- **Access Control** - Restrict sensitive keys to specific VKs only
- **Compliance** - Ensure certain workloads only use compliant/audited keys

<Note>Model restrictions configured on individual provider keys are always enforced, together with any provider/model or API key restrictions set on the virtual key.</Note>

## Troubleshooting

### Model Catalog Sync Failures

If you see warnings like this in your Bifrost logs during startup or provider updates:
```json
{"level":"warn","time":"2026-01-13T14:18:53+05:30","message":"failed to list models for provider ollama: failed to execute HTTP request to provider API"}
```

**What this means:**
- Bifrost attempted to call the provider's list models API to populate the Model Catalog
- The request failed (network issue, provider unavailable, incorrect credentials, etc.)
- If your Virtual Key has `allowed_models: []` (empty) for this provider, **all models will be denied**. Use `["*"]` to allow all models.

**How to fix:**
1. Check that the provider is correctly configured and accessible
2. Verify network connectivity to the provider's API
3. Ensure API credentials are valid
4. Use `allowed_models: ["*"]` to allow all models, or specify an explicit list for critical providers
698
docs/features/governance/virtual-keys.mdx
Normal file
@@ -0,0 +1,698 @@
---
title: "Virtual Keys"
description: "Virtual keys are a way to manage access to your AI models."
icon: "key"
---

## Overview

Virtual Keys are the primary governance entity in Bifrost. Users and applications authenticate by sending a virtual key in one of the headers below, and each virtual key carries its own access permissions, budgets, and rate limits.

**Allowed Headers:**
- `x-bf-vk` - Virtual key header, e.g. `sk-bf-*`
- `Authorization` - Authorization header, e.g. `Bearer sk-bf-*` (OpenAI style)
- `x-api-key` - API key header, e.g. `sk-bf-*` (Anthropic style)
- `x-goog-api-key` - API key header, e.g. `sk-bf-*` (Google Gemini style)

<Note>Older virtual keys (without the `sk-bf-` prefix) are supported only via the `x-bf-vk` header.</Note>

<Info>You can also use the `Authorization`, `x-api-key`, and `x-goog-api-key` headers to pass provider keys directly. Read more in [Direct Key Bypass](../keys-management#direct-key-bypass).</Info>
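
To illustrate the header rules above, here is a Python sketch that extracts a virtual key from a header mapping. The precedence order (checking `x-bf-vk` first) is an assumption, because the text does not say what happens when several headers are present. `extract_virtual_key` is a hypothetical name.

```python
def extract_virtual_key(headers):
    """Return the virtual key carried by a request, or None.

    - x-bf-vk accepts any value, including older keys without the sk-bf- prefix.
    - Authorization (Bearer), x-api-key and x-goog-api-key count as virtual
      keys only when the value starts with "sk-bf-". Other values are treated
      as direct provider keys and ignored here.
    Checking x-bf-vk first is an assumed precedence, not documented behavior.
    """
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    if lowered.get("x-bf-vk"):
        return lowered["x-bf-vk"]
    candidates = []
    auth = lowered.get("authorization", "")
    if auth.lower().startswith("bearer "):
        candidates.append(auth[len("bearer "):].strip())
    candidates.append(lowered.get("x-api-key", ""))
    candidates.append(lowered.get("x-goog-api-key", ""))
    for value in candidates:
        if value.startswith("sk-bf-"):
            return value
    return None


if __name__ == "__main__":
    print(extract_virtual_key({"x-bf-vk": "legacy-key"}))                # legacy-key
    print(extract_virtual_key({"Authorization": "Bearer sk-bf-abc123"}))  # sk-bf-abc123
    print(extract_virtual_key({"x-api-key": "sk-ant-direct"}))           # None (direct key)
```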

**Key Features:**
- **Access Control** - Model and provider filtering
- **Cost Management** - Independent budgets (checked along with team/customer budgets if attached)
- **Rate Limiting** - Token and request-based throttling (VK-level only)
- **Key Restrictions** - Limit VK to specific provider API keys (if configured, VK can only use those keys)
- **Exclusive Attachment** - Belongs to either one team OR one customer OR neither (mutually exclusive)
- **Active/Inactive Status** - Enable/disable access instantly

## Configuration

<Tabs group="config-method">
<Tab title="Web UI">

1. Go to **Virtual Keys**
2. Click **Add Virtual Key**

   **Budget Settings:**
   - **Max Limit**: Dollar amount (e.g., `10.50`)
   - **Reset Duration**: `1m`, `1h`, `1d`, `1w`, `1M`, `1Y` (case-sensitive: `m` is minutes, `M` is months)
   - **Calendar aligned** (optional): When enabled, the budget resets at calendar boundaries in UTC (day/week/month/year) instead of on a rolling window. Only applies to day/week/month/year periods. See [Budget and Limits](./budget-and-limits#calendar-aligned-budgets).

   **Rate Limits:**
   - **Token Limit**: Max tokens per period
   - **Request Limit**: Max requests per period
   - **Reset Duration**: Reset frequency for each limit

   **Associations:**
   - **Team**: Assign to existing team (mutually exclusive with customer)
   - **Customer**: Assign to existing customer (mutually exclusive with team)

3. Click **Create Virtual Key**

</Tab>
<Tab title="API">

**Create Virtual Key (attached to team):**
```bash
curl -X POST http://localhost:8080/api/governance/virtual-keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Engineering Team API",
    "description": "Main API key for engineering team",
    "provider_configs": [
      {
        "provider": "openai",
        "weight": 0.5,
        "allowed_models": ["gpt-4o-mini"]
      },
      {
        "provider": "anthropic",
        "weight": 0.5,
        "allowed_models": ["claude-3-sonnet-20240229"]
      }
    ],
    "team_id": "team-eng-001",
    "budget": {
      "max_limit": 100.00,
      "reset_duration": "1M"
    },
    "rate_limit": {
      "token_max_limit": 10000,
      "token_reset_duration": "1h",
      "request_max_limit": 100,
      "request_reset_duration": "1m"
    },
    "key_ids": ["8c52039e-38c6-48b2-8016-0bd884b7befb"],
    "is_active": true
  }'
```

**Create Virtual Key (directly attached to customer):**
```bash
curl -X POST http://localhost:8080/api/governance/virtual-keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Executive API Key",
    "description": "Direct customer-level API access",
    "provider_configs": [
      {
        "provider": "openai",
        "weight": 0.5,
        "allowed_models": ["gpt-4o"]
      },
      {
        "provider": "anthropic",
        "weight": 0.5,
        "allowed_models": ["claude-3-opus-20240229"]
      }
    ],
    "customer_id": "customer-acme-corp",
    "budget": {
      "max_limit": 500.00,
      "reset_duration": "1M"
    },
    "is_active": true
  }'
```
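
Because `team_id` and `customer_id` are mutually exclusive and `key_ids` controls provider-key access (see the note below), a client may want to sanity-check a create payload before sending it. This Python helper is a hypothetical client-side sketch based on those two rules. It is not part of Bifrost's API.

```python
def check_vk_payload(payload):
    """Return human-readable problems with a virtual-key create payload."""
    problems = []
    if payload.get("team_id") and payload.get("customer_id"):
        problems.append("team_id and customer_id are mutually exclusive")
    if not payload.get("key_ids"):
        # Per the note: an empty or omitted key_ids denies all provider keys.
        problems.append('no key_ids: the VK will be denied all provider keys (use ["*"] to allow all)')
    return problems


if __name__ == "__main__":
    mixed = {
        "name": "Mixed",
        "team_id": "team-eng-001",
        "customer_id": "customer-acme-corp",
        "key_ids": ["*"],
    }
    print(check_vk_payload(mixed))  # ['team_id and customer_id are mutually exclusive']
```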

> **Note**:
> - `team_id` and `customer_id` are mutually exclusive - a VK can only belong to one team OR one customer, not both.
> - `key_ids` restricts the VK to only use those specific provider API keys. Use `["*"]` to allow access to all available keys. An empty array `[]` or omitting the field entirely denies all keys.

**Update Virtual Key:**
```bash
curl -X PUT http://localhost:8080/api/governance/virtual-keys/{vk_id} \
  -H "Content-Type: application/json" \
  -d '{
    "description": "Updated description",
    "budget": {
      "max_limit": 150.00,
      "reset_duration": "1M"
    }
  }'
```

**Get Virtual Keys:**
```bash
# List all virtual keys
curl http://localhost:8080/api/governance/virtual-keys

# Get specific virtual key
curl http://localhost:8080/api/governance/virtual-keys/{vk_id}
```

**Delete Virtual Key:**
```bash
curl -X DELETE http://localhost:8080/api/governance/virtual-keys/{vk_id}
```

</Tab>
<Tab title="config.json">

```json
{
  "client": {
    "enforce_auth_on_inference": true
  },
  "governance": {
    "virtual_keys": [
      {
        "id": "vk-001",
        "name": "Engineering Team API",
        "value": "sk-bf-*",
        "description": "Main API key for engineering team",
        "is_active": true,
        "provider_configs": [
          {
            "provider": "openai",
            "weight": 0.5,
            "allowed_models": ["gpt-4o-mini"],
            "key_ids": ["openai-primary"]
          },
          {
            "provider": "anthropic",
            "weight": 0.5,
            "allowed_models": ["claude-3-sonnet-20240229"]
          }
        ],
        "team_id": "team-eng-001",
        "rate_limit_id": "rate-limit-eng-vk"
      },
      {
        "id": "vk-002",
        "name": "Executive API Key",
        "value": "vk-executive-direct",
        "description": "Direct customer-level API access",
        "is_active": true,
        "provider_configs": [
          {
            "provider": "openai",
            "weight": 0.5,
            "allowed_models": ["gpt-4o"]
          },
          {
            "provider": "anthropic",
            "weight": 0.5,
            "allowed_models": ["claude-3-opus-20240229"]
          }
        ],
        "customer_id": "customer-acme-corp"
      }
    ],
    "budgets": [
      {
        "id": "budget-eng-vk",
        "virtual_key_id": "vk-001",
        "max_limit": 100.00,
        "reset_duration": "1M",
        "current_usage": 0.0,
        "last_reset": "2025-01-01T00:00:00Z"
      },
      {
        "id": "budget-exec-vk",
        "virtual_key_id": "vk-002",
        "max_limit": 500.00,
        "reset_duration": "1M",
        "current_usage": 0.0,
        "last_reset": "2025-01-01T00:00:00Z"
      }
    ],
    "rate_limits": [
      {
        "id": "rate-limit-eng-vk",
        "token_max_limit": 10000,
        "token_reset_duration": "1h",
        "token_current_usage": 0,
        "token_last_reset": "2025-01-01T00:00:00Z",
        "request_max_limit": 100,
        "request_reset_duration": "1m",
        "request_current_usage": 0,
        "request_last_reset": "2025-01-01T00:00:00Z"
      }
    ]
  }
}
```

</Tab>
</Tabs>
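
The reset durations listed in the Web UI tab above (`1m`, `1h`, `1d`, `1w`, `1M`, `1Y`) are case-sensitive, so `1m` and `1M` mean different periods. The Python sketch below parses a duration into a count and a unit name.

Reading `m` as minutes and `M` as months follows how the examples above use them (`request_reset_duration: "1m"`, budget `reset_duration: "1M"`). Accepting counts other than 1 is an assumption. The parser is illustrative, not Bifrost's own.

```python
import re

# Unit letters from the documented reset durations. Case matters:
# "m" is minutes, "M" is months.
UNITS = {"m": "minute", "h": "hour", "d": "day", "w": "week", "M": "month", "Y": "year"}
_DURATION = re.compile(r"(\d+)([mhdwMY])")


def parse_reset_duration(value):
    """Split a reset duration such as "1h" or "1M" into (count, unit)."""
    match = _DURATION.fullmatch(value)
    if match is None:
        raise ValueError(f"unsupported reset duration: {value!r}")
    count = int(match.group(1))
    if count == 0:
        raise ValueError("reset duration must be at least 1")
    return count, UNITS[match.group(2)]


if __name__ == "__main__":
    for text in ["1m", "1h", "1d", "1w", "1M", "1Y"]:
        print(text, parse_reset_duration(text))
```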

## User Groups

### Teams

Teams provide organizational grouping for virtual keys with department-level budget management. A team can belong to one customer and has its own independent budget.

**Key Features:**
- **Organizational Structure** - Group multiple virtual keys
- **Independent Budgets** - Department-level cost control (separate from customer budgets)
- **Customer Association** - Can belong to one customer (optional)
- **No Rate Limits** - Teams cannot have rate limits (VK-level only)

**Configuration**

<Tabs group="config-method">
<Tab title="Web UI">

1. Go to **Users & Groups** → **Teams**

2. Click **Add Team**

   Fill in the form and click **Create Team**.

3. **Assign Virtual Keys to Team**
   - Go to the **Virtual Keys** page
   - Edit the virtual key and assign it to the team
   - Click **Save**

</Tab>
<Tab title="API">

**Create Team:**
```bash
curl -X POST http://localhost:8080/api/governance/teams \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Engineering Team",
    "customer_id": "customer-acme-corp",
    "budget": {
      "max_limit": 500.00,
      "reset_duration": "1M"
    }
  }'
```

**Update Team:**
```bash
curl -X PUT http://localhost:8080/api/governance/teams/{team_id} \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Updated Engineering Team",
    "budget": {
      "max_limit": 750.00,
      "reset_duration": "1M"
    }
  }'
```

**Get Teams:**
```bash
# List all teams
curl http://localhost:8080/api/governance/teams

# Get specific team
curl http://localhost:8080/api/governance/teams/{team_id}
```

**Delete Team:**
```bash
curl -X DELETE http://localhost:8080/api/governance/teams/{team_id}
```

</Tab>
<Tab title="config.json">

```json
{
  "governance": {
    "teams": [
      {
        "id": "team-eng-001",
        "name": "Engineering Team",
        "customer_id": "customer-acme-corp",
        "budget_id": "budget-team-eng"
      },
      {
        "id": "team-sales-001",
        "name": "Sales Team",
        "customer_id": "customer-acme-corp",
        "budget_id": "budget-team-sales"
      }
    ],
    "budgets": [
      {
        "id": "budget-team-eng",
        "max_limit": 500.00,
        "reset_duration": "1M",
        "current_usage": 0.0,
        "last_reset": "2025-01-01T00:00:00Z"
      },
      {
        "id": "budget-team-sales",
        "max_limit": 250.00,
        "reset_duration": "1M",
        "current_usage": 0.0,
        "last_reset": "2025-01-01T00:00:00Z"
      }
    ]
  }
}
```

</Tab>
</Tabs>

### Customers

Customers represent the highest level in the governance hierarchy, typically corresponding to organizations or major business units. They provide top-level budget control and organizational structure.

**Key Features:**
- **Top-Level Organization** - Highest hierarchy level
- **Independent Budgets** - Organization-wide cost control (separate from team/VK budgets)
- **Team Management** - Contains multiple teams and direct VKs
- **No Rate Limits** - Customers cannot have rate limits (VK-level only)

**Configuration**

<Tabs group="config-method">
<Tab title="Web UI">

1. Go to **Users & Groups** → **Customers**

2. Click **Add Customer**

   Fill in the form and click **Create Customer**.

3. **Assign Teams to Customer**
   - Go to the **Teams** page
   - Edit the team and assign it to the customer
   - Click **Save**

4. **Assign Virtual Keys to Customer**
   - Go to the **Virtual Keys** page
   - Edit the virtual key and assign it to the customer
   - Click **Save**

</Tab>
<Tab title="API">

**Create Customer:**
```bash
curl -X POST http://localhost:8080/api/governance/customers \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Acme Corporation",
    "budget": {
      "max_limit": 2000.00,
      "reset_duration": "1M"
    }
  }'
```

**Update Customer:**
```bash
curl -X PUT http://localhost:8080/api/governance/customers/{customer_id} \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Acme Corp (Updated)",
    "budget": {
      "max_limit": 2500.00,
      "reset_duration": "1M"
    }
  }'
```

**Get Customers:**
```bash
# List all customers
curl http://localhost:8080/api/governance/customers

# Get specific customer
curl http://localhost:8080/api/governance/customers/{customer_id}
```

**Delete Customer:**
```bash
curl -X DELETE http://localhost:8080/api/governance/customers/{customer_id}
```

</Tab>
<Tab title="config.json">

```json
{
  "governance": {
    "customers": [
      {
        "id": "customer-acme-corp",
        "name": "Acme Corporation",
        "budget_id": "budget-customer-acme"
      },
      {
        "id": "customer-beta-inc",
        "name": "Beta Inc",
        "budget_id": "budget-customer-beta"
      }
    ],
    "budgets": [
      {
        "id": "budget-customer-acme",
        "max_limit": 2000.00,
        "reset_duration": "1M",
        "current_usage": 0.0,
        "last_reset": "2025-01-01T00:00:00Z"
      },
      {
        "id": "budget-customer-beta",
        "max_limit": 1500.00,
        "reset_duration": "1M",
        "current_usage": 0.0,
        "last_reset": "2025-01-01T00:00:00Z"
      }
    ]
  }
}
```

</Tab>
</Tabs>

## Features

- **[Budget and Limits](./budget-and-limits)** - Enterprise-grade budget management, cost control, and rate limiting using virtual keys
- **[Routing](./routing)** - Route requests to the appropriate providers/models and restrict API keys using virtual keys
- **[MCP Tool Filtering](./mcp-tools)** - Manage MCP clients/tools for virtual keys

## Usage

### Making Virtual Keys Mandatory

All governance-enabled requests must include the virtual key header:

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-vk: sk-bf-*" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

By default, governance is optional: if the virtual key header is absent, the request is still allowed, but no governance checks or routing are applied. To make virtual keys mandatory, enforce the virtual key header:

<Tabs group="enforce-governance-header">
<Tab title="Web UI">

1. Go to **Config** → **Security**

2. Check the **Enforce Virtual Keys** checkbox

</Tab>
<Tab title="API">

```bash
curl -X PUT http://localhost:8080/api/config \
  -H "Content-Type: application/json" \
  -d '{
    "client_config": {
      "enforce_auth_on_inference": true
    }
  }'
```

</Tab>
<Tab title="config.json">

```json
{
  "client": {
    "enforce_auth_on_inference": true
  }
}
```

</Tab>
</Tabs>

When virtual keys are enforced, requests that do not include a virtual key are rejected with a `virtual_key_required` error.
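With enforcement enabled, a client that omits the key only finds out from a rejected request. A small client-side guard can fail before anything is sent. This is a sketch: `bifrost_vk_header` and the `BIFROST_VK` environment variable are hypothetical names chosen for this example, not something Bifrost itself reads.

```bash
# Hypothetical guard: print the x-bf-vk header line, or fail if no key is configured.
# BIFROST_VK is an example variable name, not a Bifrost setting.
bifrost_vk_header() {
  if [ -z "${BIFROST_VK:-}" ]; then
    echo "BIFROST_VK is not set; the request would be rejected when virtual keys are enforced" >&2
    return 1
  fi
  printf 'x-bf-vk: %s\n' "$BIFROST_VK"
}

# Usage (not executed here):
#   vk_header=$(bifrost_vk_header) || exit 1
#   curl -X POST http://localhost:8080/v1/chat/completions \
#     -H "Content-Type: application/json" \
#     -H "$vk_header" \
#     -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}]}'
```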

### Authentication and Virtual Keys

Virtual keys and HTTP authentication are **independent layers** that can work together:

| Layer | Purpose | Headers |
|-------|---------|---------|
| **Authentication** | Validates user identity | `Authorization: Basic/Bearer <credentials>` |
| **Virtual Keys** | Request routing and governance | `x-bf-vk`, `Authorization`[^1], `x-api-key`, `x-goog-api-key` |

[^1]: `Authorization` can carry a virtual key only when auth is disabled for inference (`disable_auth_on_inference: true`). When auth is enabled, the `Authorization` header is consumed by authentication and cannot be used for virtual keys.

**When `disable_auth_on_inference: true` (auth disabled):**

Virtual keys can be passed via any supported header without additional authentication:

```bash
# Using the x-bf-vk header
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "x-bf-vk: <VIRTUAL_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}]}'

# Using the Authorization header (OpenAI style)
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer <VIRTUAL_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}]}'
```

**When `disable_auth_on_inference: false` (auth enabled):**

You must provide both authentication credentials **and** the virtual key. Because the `Authorization` header is used for authentication, pass the virtual key in `x-bf-vk`:

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Basic <base64-credentials>" \
  -H "x-bf-vk: <VIRTUAL_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}]}'
```
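The `<base64-credentials>` placeholder follows standard HTTP Basic authentication: the Base64 encoding of `username:password`. A minimal sketch for building the header value, assuming you already have the username and password configured for Bifrost authentication; `bifrost_basic_auth`, `BIFROST_USER`, and `BIFROST_PASS` are hypothetical names for this example.

```bash
# Hypothetical helper: print the value for the "Authorization: Basic ..." header.
# Arguments: username, password (the credentials configured for Bifrost auth).
bifrost_basic_auth() {
  # printf (unlike echo) adds no trailing newline to the encoded input;
  # tr removes any line wrapping that base64 inserts into long output.
  printf 'Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

# Usage (not executed here; BIFROST_USER and BIFROST_PASS are example variables):
#   curl -X POST http://localhost:8080/v1/chat/completions \
#     -H "Authorization: $(bifrost_basic_auth "$BIFROST_USER" "$BIFROST_PASS")" \
#     -H "x-bf-vk: <VIRTUAL_KEY>" \
#     -H "Content-Type: application/json" \
#     -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}]}'
```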

**Configuring `disable_auth_on_inference`:**

<Tabs group="config-method">
<Tab title="Web UI">

1. Go to **Config** → **Security**
2. Toggle **Disable Auth on Inference** on or off

</Tab>
<Tab title="API">

```bash
curl -X PUT http://localhost:8080/api/config \
  -H "Content-Type: application/json" \
  -d '{
    "auth_config": {
      "disable_auth_on_inference": true
    }
  }'
```

</Tab>
<Tab title="config.json">

```json
{
  "auth_config": {
    "is_enabled": true,
    "disable_auth_on_inference": true
  }
}
```

</Tab>
</Tabs>

### Error Responses

- Virtual Key Missing (400)

```json
{
  "error": {
    "type": "virtual_key_required",
    "message": "virtual key is missing in headers"
  }
}
```

- Virtual Key Blocked (403)

```json
{
  "error": {
    "type": "virtual_key_blocked",
    "message": "Virtual key is inactive"
  }
}
```

- Rate Limit Exceeded (429)

```json
{
  "error": {
    "type": "rate_limited",
    "message": "Rate limits exceeded: [token limit exceeded (1500/1000, resets every 1h)]"
  }
}
```

- Token Limit Exceeded (429)

```json
{
  "error": {
    "type": "token_limited",
    "message": "Rate limits exceeded: [token limit exceeded (1500/1000, resets every 1h)]"
  }
}
```

- Request Limit Exceeded (429)

```json
{
  "error": {
    "type": "request_limited",
    "message": "Rate limits exceeded: [request limit exceeded (101/100, resets every 1m)]"
  }
}
```

- Budget Exceeded (402)

```json
{
  "error": {
    "type": "budget_exceeded",
    "message": "Budget exceeded: VK budget exceeded: 105.50 > 100.00 dollars"
  }
}
```

- Model Not Allowed (403)

```json
{
  "error": {
    "type": "model_blocked",
    "message": "Model 'gpt-4o' is not allowed for this virtual key"
  }
}
```

- Provider Not Allowed (403)

```json
{
  "error": {
    "type": "provider_blocked",
    "message": "Provider 'anthropic' is not allowed for this virtual key"
  }
}
```
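Clients can branch on the `type` field of these error bodies. The sketch below extracts `type` with `sed` (so it does not need `jq`) and maps it to a coarse action. The function and action names are hypothetical labels for this example, and the mapping reflects only the status codes listed above: 429 for limits, 402 for budgets, and 400/403 for key, model, and provider problems.

```bash
# Print the value of the first "type" field in an error body read from stdin.
# Works for both single-line and pretty-printed JSON bodies like the ones above.
bifrost_error_type() {
  sed -n 's/.*"type"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Map an error type to a coarse client action (hypothetical labels, not returned by Bifrost).
bifrost_error_action() {
  case $1 in
    # 429: a request or token limit was hit; wait for the limit window to reset.
    rate_limited | token_limited | request_limited) echo "retry_later" ;;
    # 402: a budget is exhausted; retrying before it resets will not help.
    budget_exceeded) echo "stop_until_budget_reset" ;;
    # 400/403: missing or inactive key, or a model/provider the key may not use.
    virtual_key_required | virtual_key_blocked | model_blocked | provider_blocked) echo "fix_key_or_request" ;;
    *) echo "unknown" ;;
  esac
}

# Usage (not executed here), assuming the response body was saved to response.json:
#   action=$(bifrost_error_action "$(bifrost_error_type < response.json)")
```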