first commit

This commit is contained in:
Beyhan Oğur
2026-04-26 21:52:23 +03:00
commit 880f412e2c
2662 changed files with 866266 additions and 0 deletions

@@ -0,0 +1,599 @@
---
title: "Budget and Limits"
description: "Enterprise-grade budget management and cost control with hierarchical budget allocation through virtual keys, teams, and customers."
icon: "money-bills"
---
## Overview
Budgeting and rate limiting are core features of Bifrost's governance system, managed through [Virtual Keys](./virtual-keys).
Bifrost's budget management system provides comprehensive cost control and financial governance for enterprise AI deployments. It operates through a **hierarchical budget structure** that enables granular cost management, usage tracking, and financial oversight across your entire organization.
**Core Hierarchy:**
```
Customer (has independent budget)
↓ (one-to-many)
Team (has independent budget)
↓ (one-to-many)
Virtual Key (has independent budget + rate limits)
↓ (one-to-many)
Provider Config (has independent budget + rate limits)
OR
Customer (has independent budget)
↓ (direct attachment)
Virtual Key (has independent budget + rate limits)
↓ (one-to-many)
Provider Config (has independent budget + rate limits)
OR
Virtual Key (standalone - has independent budget + rate limits)
↓ (one-to-many)
Provider Config (has independent budget + rate limits)
```
**Key Capabilities:**
- **Virtual Keys** - Primary access control via `x-bf-vk` header (exclusive team OR customer attachment)
- **Budget Management** - Independent budget limits at each hierarchy level with cumulative checking
- **Rate Limiting** - Request and token-based throttling at both VK and provider config levels
- **Provider-Level Governance** - Granular budgets and rate limits per AI provider within a virtual key
- **Model/Provider Filtering** - Granular access control per virtual key
- **Usage Tracking** - Real-time monitoring and audit trails
- **Audit Headers** - Optional team and customer identification
---
## Budget Management
### Cost Calculation
Bifrost automatically calculates costs based on:
- **Provider Pricing** - Real-time model pricing data
- **Token Usage** - Input + output tokens from API responses
- **Request Type** - Different pricing for chat, text, embedding, speech, transcription
- **Cache Status** - Reduced costs for cached responses
- **Batch Operations** - Volume discounts for batch requests
All cost calculation details are covered in [Architecture > Framework > Model Catalog](../../architecture/framework/model-catalog).
### Budget Checking Flow
When a request is made with a virtual key, Bifrost checks **all applicable budgets independently** in the hierarchy. Each budget must have sufficient remaining balance for the request to proceed.
**Checking Sequence:**
**For VK → Team → Customer:**
```
1. ✓ Provider Config Budget (if provider config has budget)
2. ✓ VK Budget (if VK has budget)
3. ✓ Team Budget (if VK's team has budget)
4. ✓ Customer Budget (if team's customer has budget)
```
**For VK → Customer (direct):**
```
1. ✓ Provider Config Budget (if provider config has budget)
2. ✓ VK Budget (if VK has budget)
3. ✓ Customer Budget (if VK's customer has budget)
```
**For Standalone VK:**
```
1. ✓ Provider Config Budget (if provider config has budget)
2. ✓ VK Budget (if VK has budget)
```
**Important Notes:**
- **All applicable budgets must pass** - any single budget failure blocks the request
- **Budgets are independent** - each tracks its own usage and limits
- **Costs are deducted from all applicable budgets** - same cost applied to each level
- **Rate limits checked at provider config and VK levels** - teams and customers have no rate limits
- **Provider selection** - providers that exceed their budget or rate limits are excluded from [routing](./routing)
**Example:**
```
- Provider config budget: $4 used of $5 limit ✓
- VK budget: $9 used of $10 limit ✓
- Team budget: $15 used of $20 limit ✓
- Customer budget: $45 used of $50 limit ✓
- Result: Allowed (no budget is exceeded)
- After request:
- Request cost: $2
- Updated usage: Provider=$6/$5, VK=$11/$10, Team=$17/$20, Customer=$47/$50
- The next request is blocked (both provider and VK budgets are exceeded).
```
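The checking sequence and the example above can be modelled in a few lines of Python. This is an illustrative sketch, not Bifrost's implementation: `Budget`, `check_budgets`, and `record_cost` are hypothetical names, and treating a budget as exceeded once usage reaches its limit is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    name: str
    max_limit: float      # spending limit in USD
    current_usage: float  # spend recorded so far in the current period

    def exceeded(self) -> bool:
        # Assumption: a budget blocks requests once usage reaches the limit.
        return self.current_usage >= self.max_limit


def check_budgets(chain):
    """Check every budget independently; any exceeded budget blocks the request."""
    blocked = [b.name for b in chain if b.exceeded()]
    return (not blocked, blocked)


def record_cost(chain, cost):
    """Apply the same request cost to every budget in the chain."""
    for b in chain:
        b.current_usage += cost


# Provider config -> VK -> Team -> Customer, using the figures from the example.
chain = [
    Budget("provider_config", 5.0, 4.0),
    Budget("virtual_key", 10.0, 9.0),
    Budget("team", 20.0, 15.0),
    Budget("customer", 50.0, 45.0),
]
first = check_budgets(chain)   # allowed: no budget is exceeded yet
record_cost(chain, 2.0)        # the $2 request is charged at every level
second = check_budgets(chain)  # blocked by provider config and VK budgets
```

Because the pre-request check only looks at usage already recorded, a single request can push a budget past its limit, as the $6/$5 figure in the example shows.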
## Rate Limiting
Rate limits protect your system from abuse and manage traffic by setting thresholds on request frequency and token usage over a specific time window. Rate limits can be configured at **both the Virtual Key level and Provider Config level** for granular control.
Bifrost supports two types of rate limits that work in parallel:
- **Request Limits**: Control the maximum number of API calls that can be made within a set duration (e.g., 100 requests per minute).
- **Token Limits**: Control the maximum number of tokens (prompt + completion) that can be processed within a set duration (e.g., 50,000 tokens per hour).
### Rate Limit Hierarchy
Rate limits are checked in hierarchical order:
```
1. ✓ Provider Config Rate Limits (if provider config has rate limits)
2. ✓ Virtual Key Rate Limits (if VK has rate limits)
```
For a request to be allowed, it must pass both the request limit and token limit checks at **all applicable levels**. If a provider config exceeds its rate limits, that provider is excluded from routing, but other providers within the same virtual key remain available.
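A rough Python model of this two-level check is shown below. The field names follow the `rate_limit` object in the API response on this page; the `RateLimit` class, the `usable_providers` helper, and the "usage below limit" comparison are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RateLimit:
    # None means no limit of that type is configured.
    request_max_limit: Optional[int] = None
    request_current_usage: int = 0
    token_max_limit: Optional[int] = None
    token_current_usage: int = 0

    def within_limits(self) -> bool:
        # Both the request limit and the token limit must pass.
        requests_ok = (self.request_max_limit is None
                       or self.request_current_usage < self.request_max_limit)
        tokens_ok = (self.token_max_limit is None
                     or self.token_current_usage < self.token_max_limit)
        return requests_ok and tokens_ok


def usable_providers(vk_limit: Optional[RateLimit],
                     provider_limits: Dict[str, Optional[RateLimit]]) -> List[str]:
    """Return providers that pass their own limits, given the VK limit also passes."""
    if vk_limit is not None and not vk_limit.within_limits():
        return []  # the VK-level limit applies to every provider on the key
    return [name for name, limit in provider_limits.items()
            if limit is None or limit.within_limits()]


vk = RateLimit(request_max_limit=3000, request_current_usage=10)
providers = {
    "openai": RateLimit(request_max_limit=1000, request_current_usage=1000),
    "anthropic": RateLimit(token_max_limit=500000, token_current_usage=1200),
}
available = usable_providers(vk, providers)  # only "anthropic" remains
```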
### Provider-Level Rate Limiting
Provider configs within a virtual key can have independent rate limits, enabling:
- **Per-Provider Throttling**: Different rate limits for OpenAI vs Anthropic
- **Provider Isolation**: Rate limit violations on one provider don't affect others
- **Granular Control**: Fine-tune limits based on provider capabilities and costs
## Reset Durations
Budgets and rate limits support flexible reset durations:
**Format Examples:**
- `1m` - 1 minute
- `5m` - 5 minutes
- `1h` - 1 hour
- `1d` - 1 day
- `1w` - 1 week
- `1M` - 1 month
- `1Y` - 1 year
**Common Patterns:**
- **Rate Limits**: `1m`, `1h`, `1d` for request throttling
- **Budgets**: `1d`, `1w`, `1M`, `1Y` for cost control
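If you generate or lint configuration outside Bifrost, a small parser for these strings can catch typos early. The sketch below accepts only the suffixes listed above; whether Bifrost accepts other units is not covered here, and `parse_reset_duration` is a hypothetical helper rather than part of any Bifrost SDK.

```python
import re

# Suffixes listed in this section. Note that "m" (minute) and "M" (month) differ by case.
UNIT_NAMES = {"m": "minute", "h": "hour", "d": "day",
              "w": "week", "M": "month", "Y": "year"}
_DURATION = re.compile(r"(\d+)([mhdwMY])")


def parse_reset_duration(value: str):
    """Split a duration such as "5m" or "1M" into (count, unit name)."""
    match = _DURATION.fullmatch(value)
    if match is None:
        raise ValueError(f"invalid reset duration: {value!r}")
    count = int(match.group(1))
    if count == 0:
        raise ValueError("reset duration must be positive")
    return count, UNIT_NAMES[match.group(2)]
```

For example, `parse_reset_duration("1M")` yields a one-month period, while `parse_reset_duration("1m")` yields one minute.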
### Calendar-aligned budgets
By default, a budget **rolls**: after `reset_duration` elapses since `last_reset`, usage resets. With **`calendar_aligned`: `true`**, the budget resets at the **start of each calendar period in UTC** instead (same instant for every customer of that configuration).
**Supported `reset_duration` suffixes:** only day (`d`), week (`w`), month (`M`), and year (`Y`). Examples: `1d` → midnight UTC each day; `1w` → Monday 00:00 UTC each week; `1M` → first day of each month; `1Y` → January 1 each year. Sub-day durations (for example `1h`, `30m`) **cannot** use calendar alignment; the API rejects invalid combinations.
Calendar alignment applies to budgets on **customers**, **teams**, **virtual keys**, and **per-provider-config** budgets. You can set it when creating a budget (`calendar_aligned` on create) or toggle it on update (`calendar_aligned` on the budget in `PUT` requests). Turning calendar alignment **on** for an existing budget resets **current usage to zero** and snaps **`last_reset`** to the current period start.
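To make the period boundaries concrete, the following sketch computes the start of the current calendar period in UTC for a single-unit duration (`1d`, `1w`, `1M`, `1Y`). It illustrates the rules described above and is not Bifrost's code; `period_start` is a hypothetical helper and multi-unit durations are out of scope.

```python
from datetime import datetime, timedelta, timezone


def period_start(now: datetime, unit: str) -> datetime:
    """Return the UTC start of the calendar period containing `now`."""
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "d":
        return midnight
    if unit == "w":
        # weekday() is 0 for Monday, so this snaps back to Monday 00:00 UTC.
        return midnight - timedelta(days=midnight.weekday())
    if unit == "M":
        return midnight.replace(day=1)
    if unit == "Y":
        return midnight.replace(month=1, day=1)
    raise ValueError("calendar alignment supports only d, w, M and Y")


# Thursday 14 March 2024, 15:30 UTC
now = datetime(2024, 3, 14, 15, 30, tzinfo=timezone.utc)
week_start = period_start(now, "w")   # Monday 11 March 2024, 00:00 UTC
month_start = period_start(now, "M")  # 1 March 2024, 00:00 UTC
```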
---
## Configuration Guide
Configure provider-level budgets and rate limits using any of these methods:
<Tabs>
<Tab title="Web UI">
The Bifrost Web UI provides an intuitive interface for configuring provider-level governance through the Virtual Keys management page.
### Creating Virtual Keys with Provider Configs
1. **Navigate to Virtual Keys**: Go to **Virtual Keys** page in the Bifrost dashboard
2. **Create New Virtual Key**: Click "Create Virtual Key" button
3. **Configure Providers**: In the "Provider Configurations" section:
- Add multiple providers with individual weights
- Set provider-specific budgets and rate limits
- Configure allowed models per provider
### Provider Configuration Interface
![Virtual Key Provider Configuration Interface](../../media/ui-virtual-key-provider-config.png)
**Key Features:**
- **Visual Provider Cards**: Each provider displays as an expandable card
- **Budget Controls**: Set spending limits with reset periods per provider
- **Rate Limit Controls**: Configure token and request limits independently
- **Model Filtering**: Specify allowed models for each provider
- **Weight Distribution**: Visual indicators for load balancing weights
- **Real-time Validation**: Immediate feedback on configuration errors
### Monitoring Provider Usage
![Provider Usage Sheet](../../media/ui-virtual-key-provider-usage-sheet.png)
The info sheet for the virtual key provides real-time monitoring of:
- Budget consumption per provider
- Rate limit utilization (tokens and requests)
- Provider availability status
- Usage trends and forecasting
</Tab>
<Tab title="API">
Use the Bifrost HTTP API to programmatically manage provider-level governance configurations.
### Create Virtual Key with Provider Configs
```bash
curl -X POST "https://your-bifrost-instance.com/api/governance/virtual-keys" \
-H "Content-Type: application/json" \
-d '{
"name": "marketing-team-vk",
"description": "Marketing team virtual key with provider-specific limits",
"provider_configs": [
{
"provider": "openai",
"weight": 0.7,
"allowed_models": ["gpt-4", "gpt-3.5-turbo"],
"budget": {
"max_limit": 500.00,
"reset_duration": "1M",
"calendar_aligned": true
},
"rate_limit": {
"token_max_limit": 1000000,
"token_reset_duration": "1h",
"request_max_limit": 1000,
"request_reset_duration": "1h"
}
},
{
"provider": "anthropic",
"weight": 0.3,
"allowed_models": ["claude-3-opus", "claude-3-sonnet"],
"budget": {
"max_limit": 200.00,
"reset_duration": "1M"
},
"rate_limit": {
"token_max_limit": 500000,
"token_reset_duration": "1h",
"request_max_limit": 500,
"request_reset_duration": "1h"
}
}
],
"budget": {
"max_limit": 1000.00,
"reset_duration": "1M",
"calendar_aligned": true
},
"is_active": true
}'
```
Use `calendar_aligned` only with `d` / `w` / `M` / `Y` reset durations (see [Calendar-aligned budgets](#calendar-aligned-budgets)).
### Update Provider Configuration
```bash
curl -X PUT "https://your-bifrost-instance.com/api/governance/virtual-keys/{vk_id}" \
-H "Content-Type: application/json" \
-d '{
"provider_configs": [
{
"id": 1,
"provider": "openai",
"weight": 0.8,
"budget": {
"max_limit": 600.00,
"reset_duration": "1M"
},
"rate_limit": {
"token_max_limit": 1200000,
"token_reset_duration": "1h"
}
}
]
}'
```
### API Response Structure
```json
{
"message": "Virtual key created successfully",
"virtual_key": {
"id": "vk_123",
"name": "marketing-team-vk",
"value": "vk_abc123def456",
"provider_configs": [
{
"id": 1,
"provider": "openai",
"weight": 0.7,
"allowed_models": ["gpt-4", "gpt-3.5-turbo"],
"budget": {
"id": "budget_789",
"max_limit": 500.00,
"current_usage": 0.00,
"reset_duration": "1M",
"calendar_aligned": true,
"last_reset": "2024-01-01T00:00:00Z"
},
"rate_limit": {
"id": "rate_limit_456",
"token_max_limit": 1000000,
"token_current_usage": 0,
"token_reset_duration": "1h",
"token_last_reset": "2024-01-01T00:00:00Z",
"request_max_limit": 1000,
"request_current_usage": 0,
"request_reset_duration": "1h",
"request_last_reset": "2024-01-01T00:00:00Z"
}
}
]
}
}
```
### Field Descriptions
| Field | Type | Description |
|-------|------|-------------|
| `provider` | string | AI provider name (e.g., "openai", "anthropic") |
| `weight` | float | Load balancing weight (0.0-1.0) |
| `allowed_models` | array | Specific models allowed for this provider |
| `budget.max_limit` | float | Maximum spend in USD |
| `budget.reset_duration` | string | Reset period (e.g., "1h", "1d", "1M") |
| `budget.calendar_aligned` | boolean | When true, resets at calendar boundaries in UTC (requires `d`/`w`/`M`/`Y` durations) |
| `rate_limit.token_max_limit` | integer | Maximum tokens per period |
| `rate_limit.request_max_limit` | integer | Maximum requests per period |
</Tab>
<Tab title="config.json">
Configure provider-level governance through Bifrost's configuration file for declarative management.
### Basic Configuration Structure
```json
{
"governance": {
"virtual_keys": [
{
"id": "vk-dev-001",
"name": "development-team-vk",
"description": "Development team with multi-provider setup",
"is_active": true,
"rate_limit_id": "rl-vk-dev",
"provider_configs": [
{
"id": 1,
"provider": "openai",
"weight": 0.6,
"allowed_models": ["gpt-4", "gpt-3.5-turbo"],
"rate_limit_id": "rl-pc-openai"
},
{
"id": 2,
"provider": "anthropic",
"weight": 0.4,
"allowed_models": ["claude-3-opus", "claude-3-sonnet"],
"rate_limit_id": "rl-pc-anthropic"
}
]
}
],
"budgets": [
{
"id": "budget-vk-dev",
"virtual_key_id": "vk-dev-001",
"max_limit": 2000.00,
"reset_duration": "1M",
"calendar_aligned": true
},
{
"id": "budget-pc-openai",
"provider_config_id": 1,
"max_limit": 1000.00,
"reset_duration": "1M"
},
{
"id": "budget-pc-anthropic",
"provider_config_id": 2,
"max_limit": 500.00,
"reset_duration": "1M"
}
],
"rate_limits": [
{
"id": "rl-vk-dev",
"token_max_limit": 5000000,
"token_reset_duration": "1h",
"request_max_limit": 3000,
"request_reset_duration": "1h"
},
{
"id": "rl-pc-openai",
"token_max_limit": 2000000,
"token_reset_duration": "1h",
"request_max_limit": 2000,
"request_reset_duration": "1h"
},
{
"id": "rl-pc-anthropic",
"token_max_limit": 1000000,
"token_reset_duration": "1h",
"request_max_limit": 1000,
"request_reset_duration": "1h"
}
]
}
}
```
Budgets and rate limits live as **separate top-level arrays** inside `governance`. Virtual keys and provider configs reference them by id (`rate_limit_id`) or are referenced back (`virtual_key_id` / `provider_config_id` on each `budgets[]` entry). Optional `calendar_aligned` on each `budget` matches the HTTP API and [calendar-aligned behavior](#calendar-aligned-budgets).
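Because the references are plain ids, a short script can catch dangling ones before a config is deployed. The sketch below assumes the `governance` layout shown above; `dangling_references` is a hypothetical helper, and Bifrost may perform its own validation at load time.

```python
def dangling_references(governance):
    """List references in a governance block that point at ids that do not exist."""
    rate_limit_ids = {rl["id"] for rl in governance.get("rate_limits", [])}
    vk_ids, provider_config_ids, problems = set(), set(), []

    for vk in governance.get("virtual_keys", []):
        vk_ids.add(vk["id"])
        refs = [vk.get("rate_limit_id")]
        for pc in vk.get("provider_configs", []):
            provider_config_ids.add(pc["id"])
            refs.append(pc.get("rate_limit_id"))
        problems += [f"unknown rate_limit_id: {ref}"
                     for ref in refs if ref is not None and ref not in rate_limit_ids]

    for budget in governance.get("budgets", []):
        vk_id = budget.get("virtual_key_id")
        pc_id = budget.get("provider_config_id")
        if vk_id is not None and vk_id not in vk_ids:
            problems.append(f"budget {budget['id']}: unknown virtual_key_id {vk_id}")
        if pc_id is not None and pc_id not in provider_config_ids:
            problems.append(f"budget {budget['id']}: unknown provider_config_id {pc_id}")
    return problems


governance = {
    "virtual_keys": [{
        "id": "vk-dev-001",
        "rate_limit_id": "rl-vk-dev",
        "provider_configs": [{"id": 1, "provider": "openai", "rate_limit_id": "rl-missing"}],
    }],
    "budgets": [
        {"id": "budget-vk-dev", "virtual_key_id": "vk-dev-001", "max_limit": 2000.0},
        {"id": "budget-pc-2", "provider_config_id": 2, "max_limit": 500.0},
    ],
    "rate_limits": [{"id": "rl-vk-dev", "request_max_limit": 3000}],
}
problems = dangling_references(governance)
```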
### Advanced Configuration Examples
#### Cost-Optimized Setup
```json
{
"governance": {
"virtual_keys": [
{
"id": "vk-cost-opt",
"name": "cost-optimized-vk",
"provider_configs": [
{"id": 10, "provider": "openai-gpt-3.5", "weight": 0.8, "rate_limit_id": "rl-cheap"},
{"id": 11, "provider": "openai-gpt-4", "weight": 0.2, "rate_limit_id": "rl-premium"}
]
}
],
"budgets": [
{"id": "b-cheap", "provider_config_id": 10, "max_limit": 50.00, "reset_duration": "1d"},
{"id": "b-premium", "provider_config_id": 11, "max_limit": 200.00, "reset_duration": "1d"}
],
"rate_limits": [
{"id": "rl-cheap", "request_max_limit": 1000, "request_reset_duration": "1h"},
{"id": "rl-premium", "request_max_limit": 100, "request_reset_duration": "1h"}
]
}
}
```
#### High-Volume Production Setup
```json
{
"governance": {
"virtual_keys": [
{
"id": "vk-prod-hv",
"name": "production-high-volume-vk",
"provider_configs": [
{"id": 20, "provider": "openai", "weight": 0.5, "rate_limit_id": "rl-openai"},
{"id": 21, "provider": "anthropic", "weight": 0.3, "rate_limit_id": "rl-anthropic"},
{"id": 22, "provider": "azure-openai", "weight": 0.2, "rate_limit_id": "rl-azure"}
]
}
],
"budgets": [
{"id": "b-openai", "provider_config_id": 20, "max_limit": 5000.00, "reset_duration": "1M"},
{"id": "b-anthropic", "provider_config_id": 21, "max_limit": 3000.00, "reset_duration": "1M"},
{"id": "b-azure", "provider_config_id": 22, "max_limit": 2000.00, "reset_duration": "1M"}
],
"rate_limits": [
{"id": "rl-openai", "token_max_limit": 10000000, "token_reset_duration": "1h", "request_max_limit": 10000, "request_reset_duration": "1h"},
{"id": "rl-anthropic", "token_max_limit": 6000000, "token_reset_duration": "1h", "request_max_limit": 6000, "request_reset_duration": "1h"},
{"id": "rl-azure", "token_max_limit": 4000000, "token_reset_duration": "1h", "request_max_limit": 4000, "request_reset_duration": "1h"}
]
}
}
```
**Validation Rules:**
- Budget limits must be positive numbers
- Reset durations must be valid time formats
- Rate limits must be positive integers
- Provider names must match configured providers
</Tab>
</Tabs>
## Provider-Level Governance Examples
### Example 1: Mixed Provider Budgets
A virtual key configured with multiple providers and different budget allocations:
```json
{
"governance": {
"virtual_keys": [
{
"id": "vk-mkt",
"name": "marketing-team-vk",
"provider_configs": [
{"id": 30, "provider": "openai", "weight": 0.7},
{"id": 31, "provider": "anthropic", "weight": 0.3}
]
}
],
"budgets": [
{"id": "b-vk-mkt", "virtual_key_id": "vk-mkt", "max_limit": 100, "reset_duration": "1M"},
{"id": "b-openai", "provider_config_id": 30, "max_limit": 50, "reset_duration": "1M"},
{"id": "b-anth", "provider_config_id": 31, "max_limit": 30, "reset_duration": "1M"}
]
}
}
```
**Behavior:**
- OpenAI requests limited to 50 dollars/month at provider level + 100 dollars/month at VK level
- Anthropic requests limited to 30 dollars/month at provider level + 100 dollars/month at VK level
- When a provider's budget is exhausted, requests to that provider are blocked; the other provider remains available as long as its own budget and the VK budget allow
### Example 2: Provider-Specific Rate Limits
Different rate limits based on provider capabilities:
```json
{
"governance": {
"virtual_keys": [
{
"id": "vk-hv",
"name": "high-volume-vk",
"provider_configs": [
{"id": 40, "provider": "openai", "rate_limit_id": "rl-openai"},
{"id": 41, "provider": "anthropic", "rate_limit_id": "rl-anthropic"}
]
}
],
"rate_limits": [
{"id": "rl-openai", "request_max_limit": 1000, "request_reset_duration": "1h", "token_max_limit": 1000000, "token_reset_duration": "1h"},
{"id": "rl-anthropic", "request_max_limit": 500, "request_reset_duration": "1h", "token_max_limit": 500000, "token_reset_duration": "1h"}
]
}
}
```
**Behavior:**
- OpenAI: 1000 requests/hour, 1M tokens/hour
- Anthropic: 500 requests/hour, 500K tokens/hour
- When a provider's rate limits are exceeded, requests to that provider are blocked; the other provider remains available
### Example 3: Failover Strategy
Provider configurations with budget-based failover:
```json
{
"governance": {
"virtual_keys": [
{
"id": "vk-cost",
"name": "cost-optimized-vk",
"provider_configs": [
{"id": 50, "provider": "openai-cheap", "weight": 1.0},
{"id": 51, "provider": "openai-premium", "weight": 0.0, "rate_limit_id": "rl-premium"}
]
}
],
"budgets": [
{"id": "b-cheap", "provider_config_id": 50, "max_limit": 10, "reset_duration": "1d"},
{"id": "b-premium", "provider_config_id": 51, "max_limit": 50, "reset_duration": "1d"}
],
"rate_limits": [
{"id": "rl-premium", "request_max_limit": 100, "request_reset_duration": "1h", "token_max_limit": 50000, "token_reset_duration": "1h"}
]
}
}
```
**Behavior:**
- Primary: Use cheap provider until $10 daily budget exhausted
- Fallback: Automatically switch to the premium provider when the cheap option is unavailable. For this to work, do not send a `provider` name in the request body. See [Routing](./routing#automatic-fallbacks) for more details.
- Cost containment: Prevent unexpected overspend on premium resources and limit the number of requests to the premium provider
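The failover behavior can be approximated as "drop providers whose budget is spent, then prefer the highest weight". The sketch below is a simplified model under that assumption; the actual routing logic is described in the Routing page, `routing_order` is a hypothetical helper, and `current_usage` is taken from the budget object in the API response.

```python
def routing_order(provider_configs, budgets):
    """Return provider names still within budget, highest weight first."""
    budget_by_config = {b["provider_config_id"]: b
                        for b in budgets if "provider_config_id" in b}
    eligible = []
    for config in provider_configs:
        budget = budget_by_config.get(config["id"])
        # Assumption: a provider is skipped once usage reaches its budget limit.
        if budget is not None and budget.get("current_usage", 0.0) >= budget["max_limit"]:
            continue
        eligible.append(config)
    # Stable sort: providers with equal weight keep their configured order.
    eligible.sort(key=lambda c: c.get("weight") or 0.0, reverse=True)
    return [c["provider"] for c in eligible]


provider_configs = [
    {"id": 50, "provider": "openai-cheap", "weight": 1.0},
    {"id": 51, "provider": "openai-premium", "weight": 0.0},
]
budgets = [
    {"provider_config_id": 50, "max_limit": 10, "current_usage": 3.0},
    {"provider_config_id": 51, "max_limit": 50, "current_usage": 0.0},
]
before = routing_order(provider_configs, budgets)  # cheap provider first
budgets[0]["current_usage"] = 10.0                 # daily budget exhausted
after = routing_order(provider_configs, budgets)   # only the premium provider remains
```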
## Key Benefits of Provider-Level Governance
- **Granular Control**: Set specific spending limits and rate limits per AI provider
- **Automatic Fallback**: Route to alternative providers when budgets or rate limits are exceeded
- **Cost Control**: Track and control spending by provider for better financial oversight
- **Performance Testing**: A/B testing across providers with controlled budgets
- **Multi-Provider Strategies**: Primary/backup provider configurations
- **Cost-Tiered Access**: Cheap providers for basic tasks, premium for complex workloads
---
## Next Steps
- **[Routing](./routing)** - Direct requests to specific AI models, providers, and keys using Virtual Keys.
- **[MCP Tool Filtering](./mcp-tools)** - Manage MCP clients/tools for virtual keys.
- **[Tracing](../observability/default)** - Audit trails and request tracking

@@ -0,0 +1,160 @@
---
title: "MCP Tool Filtering"
description: "Control which MCP tools are available for each Virtual Key."
icon: "grid-2"
---
## Overview
MCP Tool Filtering allows you to control which tools are available to AI models on a per-request basis using Virtual Keys (VKs). By configuring a Virtual Key, you can create a strict allow-list of MCP clients and tools, ensuring that only approved tools can be executed.
Make sure you have at least one MCP client set up. Read more about it [here](../../mcp/overview).
## How It Works
The filtering logic is determined by the Virtual Key's configuration:
1. **No MCP Configuration on Virtual Key (Default)**
- If a Virtual Key has no specific MCP configurations, **no MCP tools are available** (deny-by-default).
- You must explicitly add MCP client configurations to allow tools.
2. **With MCP Configuration on Virtual Key**
- When you configure MCP clients on a Virtual Key, its settings take full precedence.
- Bifrost automatically generates an `x-bf-mcp-include-tools` header based on your VK configuration (unless `disable_auto_tool_inject` is enabled or the caller already sent the header). This acts as a strict allow-list for the request.
- If the caller already includes an `x-bf-mcp-include-tools` header, auto-injection is skipped, but the VK allow-list is still enforced both at inference time and at MCP tool execution time.
For each MCP client associated with a Virtual Key, you can specify the allowed tools:
- **Select specific tools**: Only the chosen tools from that client will be available.
- **Use `*` wildcard**: All available tools from that client will be permitted.
- **Leave tool list empty**: All tools from that client will be **blocked**.
- **Do not configure a client**: All tools from that client will be **blocked** (if other clients are configured).
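These four rules can be expressed as a small resolution function. The sketch below is only a model of the documented behavior: `allowed_tools` is a hypothetical helper, the config keys follow the `mcp_configs` entries used on this page, and the client and tool names are sample data.

```python
def allowed_tools(mcp_configs, available_tools):
    """Map each configured MCP client to the tools a virtual key may use."""
    allowed = {}
    for config in mcp_configs:
        client = config["mcp_client_name"]
        requested = config.get("tools_to_execute", [])
        client_tools = available_tools.get(client, [])
        if "*" in requested:
            # Wildcard: every tool the client currently exposes.
            allowed[client] = sorted(client_tools)
        else:
            # Explicit list; an empty list therefore blocks the whole client.
            allowed[client] = sorted(t for t in requested if t in client_tools)
    # Clients that are not configured never appear, so they are blocked.
    return allowed


available_tools = {
    "billing-client": ["create-invoice", "check-status"],
    "support-client": ["create-ticket", "get-faq"],
}
partial = allowed_tools(
    [{"mcp_client_name": "billing-client", "tools_to_execute": ["check-status"]}],
    available_tools,
)
```

Here `partial` contains only `check-status` for `billing-client`, and `support-client` is absent because it was never configured.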
## Setting MCP Tool Restrictions
<Tabs group="mcp-tool-restrictions">
<Tab title="Web UI">
You can configure which tools a Virtual Key has access to via the UI.
1. Go to **Virtual Keys** page.
2. Create/Edit virtual key
![Virtual Key MCP Tool Restrictions](../../media/ui-virtual-key-mcp-filter.png)
3. In **MCP Client Configurations** section, add the MCP client you want to restrict the VK to
4. Select the specific tools to allow, or choose **Allow All Tools** to permit all current and future tools from that client (stored as `*`). Leaving the list empty blocks all tools for that client.
5. Click on the **Save** button
</Tab>
<Tab title="API">
You can configure this via the REST API when creating (`POST`) or updating (`PUT`) a virtual key.
**Create Virtual Key:**
```bash
curl -X POST http://localhost:8080/api/governance/virtual-keys \
-H "Content-Type: application/json" \
-d '{
"name": "vk-for-billing-support",
"mcp_configs": [
{
"mcp_client_name": "billing-client",
"tools_to_execute": ["check-status"]
},
{
"mcp_client_name": "support-client",
"tools_to_execute": ["*"]
}
]
}'
```
**Update Virtual Key:**
```bash
curl -X PUT http://localhost:8080/api/governance/virtual-keys/{vk_id} \
-H "Content-Type: application/json" \
-d '{
"mcp_configs": [
{
"mcp_client_name": "billing-client",
"tools_to_execute": ["check-status"]
},
{
"mcp_client_name": "support-client",
"tools_to_execute": ["*"]
}
]
}'
```
**Behavior:**
- The virtual key can only access the `check-status` tool from `billing-client`.
- It can access all tools from `support-client`.
- Any other MCP client is implicitly blocked for this key.
</Tab>
<Tab title="config.json">
You can also define MCP tool restrictions directly in your `config.json` file. The `mcp_configs` array under a virtual key should reference the MCP client by name.
```json
{
"governance": {
"virtual_keys": [
{
"id": "vk-billing-support-only",
"name": "VK for Billing and Support",
"mcp_configs": [
{
"mcp_client_name": "billing-client",
"tools_to_execute": ["check-status"]
},
{
"mcp_client_name": "support-client",
"tools_to_execute": ["*"]
}
]
}
]
}
}
```
</Tab>
</Tabs>
## Example Scenario
**Available MCP Clients & Tools:**
- **`billing-client`**: with tools `[create-invoice, check-status]`
- **`support-client`**: with tools `[create-ticket, get-faq]`
<Tabs>
<Tab title="VK with Full Access">
**Configuration:**
- `billing-client` -> Allowed Tools: `[*]` (wildcard)
- `support-client` -> Allowed Tools: `[*]` (wildcard)
**Result:**
A request with this Virtual Key can access all four tools: `create-invoice`, `check-status`, `create-ticket`, and `get-faq`.
</Tab>
<Tab title="VK with Partial Access">
**Configuration:**
- `billing-client` -> Allowed Tools: `[check-status]`
- `support-client` -> Not configured
**Result:**
A request with this Virtual Key can only access the `check-status` tool. All other tools are blocked.
</Tab>
<Tab title="VK with No Tools">
**Configuration:**
- `billing-client` -> Allowed Tools: `[]` (empty list)
**Result:**
A request with this Virtual Key cannot access any tools. All tools from all clients are blocked.
</Tab>
</Tabs>
<Note>
When a Virtual Key has MCP configurations, Bifrost enforces the allow-list at both inference time and MCP tool execution time. Auto-injection of the `x-bf-mcp-include-tools` header is skipped if the caller already provides it or if `disable_auto_tool_inject` is enabled — but the VK's restrictions are always applied regardless. You can still use the `x-bf-mcp-include-clients` header to filter MCP clients per request.
</Note>

@@ -0,0 +1,166 @@
---
title: "Required Headers"
description: "Enforce mandatory headers on every request through governance."
icon: "shield-check"
---
## Overview
Required headers let you enforce that specific HTTP headers are present on every LLM and MCP request passing through Bifrost. If a request is missing any required header, the governance plugin rejects it with a **400 Bad Request** error before it reaches the provider.
This is useful for:
- **Tenant isolation** - Require `X-Tenant-ID` to identify the calling tenant
- **Audit trails** - Require `X-Correlation-ID` for request tracing across services
- **Custom routing metadata** - Require headers your infrastructure depends on
<Note>
Required headers validation requires **governance to be enabled**. The check runs in both `PreLLMHook` and `PreMCPHook`, so it applies to all inference and MCP tool execution requests.
</Note>
Header matching is **case-insensitive** — configuring `X-Tenant-ID` will match `x-tenant-id`, `X-TENANT-ID`, or any other casing.
---
## How it works
```mermaid
graph LR
A[Request] --> B{All required<br/>headers present?}
B -->|Yes| C[Continue to<br/>governance evaluation]
B -->|No| D[400 Bad Request<br/>missing_required_headers]
```
When a request arrives:
1. The HTTP transport middleware stores all request headers in the Bifrost context (lowercased keys)
2. The governance plugin's `PreLLMHook` / `PreMCPHook` checks for each required header
3. If any are missing, the request is rejected immediately with a `400` status and a JSON error listing the missing headers
**Example error response:**
```json
{
"error": {
"message": "missing required headers: x-tenant-id, x-correlation-id",
"type": "missing_required_headers"
}
}
```
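The check itself amounts to a case-insensitive set comparison. The following sketch mirrors the steps and the error shape shown above; the function names are hypothetical and the real check runs inside the governance plugin.

```python
def missing_required_headers(required, request_headers):
    """Return required header names (lowercased) that the request does not carry."""
    present = {name.lower() for name in request_headers}
    return [name.lower() for name in required if name.lower() not in present]


def build_error(missing):
    """Build an error body in the shape of the example response."""
    return {
        "error": {
            "message": "missing required headers: " + ", ".join(missing),
            "type": "missing_required_headers",
        }
    }


required = ["X-Tenant-ID", "X-Correlation-ID"]
request_headers = {"Content-Type": "application/json", "x-tenant-id": "tenant-123"}
missing = missing_required_headers(required, request_headers)
error_body = build_error(missing) if missing else None
```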
---
## Configuration
<Tabs group="config-method">
<Tab title="Web UI">
1. Navigate to **Config** > **Security Settings**
2. Ensure **Governance** is enabled (the required headers section only appears when governance is active)
3. Scroll to **Required Headers**
![Required Headers Configuration](../../media/ui-required-headers-setting.png)
4. Enter a comma-separated list of header names (e.g., `X-Tenant-ID, X-Correlation-ID`)
5. Click **Save Changes**
Changes take effect immediately — no restart required.
</Tab>
<Tab title="API">
Include `required_headers` in the `client_config` when updating the configuration:
```bash
curl -X PUT http://localhost:8080/api/config \
-H "Content-Type: application/json" \
-d '{
"client_config": {
"required_headers": ["X-Tenant-ID", "X-Correlation-ID"]
}
}'
```
To clear required headers, pass an empty array:
```bash
curl -X PUT http://localhost:8080/api/config \
-H "Content-Type: application/json" \
-d '{
"client_config": {
"required_headers": []
}
}'
```
</Tab>
<Tab title="config.json">
Add `required_headers` to the `client` section:
```json
{
"client": {
"required_headers": ["X-Tenant-ID", "X-Correlation-ID"]
}
}
```
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `required_headers` | `string[]` | No | List of header names that must be present on every request. Case-insensitive. |
</Tab>
</Tabs>
---
## Examples
### Requiring a tenant header
Configure a single required header to enforce tenant identification:
```json
{
"client": {
"required_headers": ["X-Tenant-ID"]
}
}
```
**Valid request:**
```bash
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Tenant-ID: tenant-123" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
```
**Rejected request** (missing header):
```bash
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
# → 400: missing required headers: x-tenant-id
```
### Combining with virtual keys
Required headers work alongside virtual key enforcement. When both are configured, the governance plugin checks required headers first, then validates the virtual key:
```json
{
"client": {
"enforce_auth_on_inference": true,
"required_headers": ["X-Tenant-ID"]
}
}
```
A request must include **both** the virtual key header and `X-Tenant-ID` to pass governance.
---
## Next steps
- **[Virtual Keys](./virtual-keys)** - Set up access control with virtual keys
- **[Budget and Limits](./budget-and-limits)** - Configure budgets and rate limits
- **[Routing](./routing)** - Route requests based on headers and other criteria

@@ -0,0 +1,337 @@
---
title: "Routing"
description: "Direct requests to specific AI models, providers, and keys using Virtual Keys."
icon: "arrow-progress"
---
<Info>
**Looking for comprehensive provider routing documentation?**
For a detailed guide covering governance-based routing, adaptive load balancing, Model Catalog, and how they interact, see the [**Provider Routing Guide**](/providers/provider-routing).
This page focuses specifically on configuring governance routing via Virtual Keys.
</Info>
## Overview
Bifrost's governance-based routing capabilities offer granular control over how requests are directed to different AI models and providers through Virtual Key configuration. By configuring routing rules on a Virtual Key, you can enforce which providers and models are accessible, implement weighted load balancing strategies, create automatic fallbacks, and restrict access to specific provider API keys.
This enables use cases such as:
- **Resilience & Failover**: Automatically fall back to a secondary provider if the primary one fails.
- **Environment Separation**: Dedicate specific virtual keys to development, testing, and production environments with different provider and key access.
- **Cost Management**: Route traffic to cheaper models or providers based on weights to optimize costs.
- **Fine-grained Access Control**: Ensure that different teams or applications only use the models and API keys they are explicitly permitted to.
## Provider/Model Restrictions
Virtual Keys can be restricted to use only specific provider/models. When provider/model restrictions are configured, the VK can only access those designated provider/models, providing fine-grained control over which provider/models different users or applications can utilize.
**How It Works:**
- **No Provider Configs** (default): VK **blocks all providers** (deny-by-default). You must add provider configurations to allow traffic.
- **With Provider Configs**: VK limited to only the specified provider/models. Configured providers participate in weighted load balancing only if their `weight` is set to a numeric value, while providers with `weight: null` remain configured but are opted out of weighted selection.
**Model Validation:**
When you configure provider restrictions on a Virtual Key, Bifrost validates that the requested model is allowed for the selected provider:
- **`allowed_models: ["*"]`**: Allow all models supported by the provider (uses the Model Catalog for validation).
- **Empty `allowed_models`**: **Deny all** models (deny-by-default).
- **Explicit model list**: Only those specific models are permitted.
- **Model Catalog Sync**: On startup and provider updates, Bifrost calls each provider's list models API. If this fails, you'll see a warning: `{"level":"warn","message":"failed to list models for provider <name>: failed to execute HTTP request to provider API"}`
<Note>
**Cross-provider routing does NOT happen automatically**. For example, requests for `gpt-4o` will NOT be routed to Anthropic unless you explicitly add `"gpt-4o"` to Anthropic's `allowed_models` in the Virtual Key configuration. Each provider only handles models it actually supports (determined by the Model Catalog).
</Note>
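The three `allowed_models` cases above can be sketched as a single check. This is a minimal illustration, not Bifrost's implementation: the function name `is_model_allowed` and the `catalog_models` parameter (standing in for the provider's Model Catalog entries) are hypothetical, and the sketch assumes an explicit list is matched by name only.

```python
from typing import Iterable, Optional


def is_model_allowed(
    model: str,
    allowed_models: Optional[Iterable[str]],
    catalog_models: Iterable[str],
) -> bool:
    """Return True if `model` may be served by one provider config (illustrative only)."""
    allowed = list(allowed_models or [])
    if not allowed:
        # Empty or missing list: deny-by-default.
        return False
    if "*" in allowed:
        # Wildcard: defer to what the provider's catalog says it supports.
        return model in set(catalog_models)
    # Explicit list: only the named models are permitted (assumed name match).
    return model in allowed
```

For example, `is_model_allowed("gpt-4o", ["*"], ["claude-3-sonnet"])` is `False`, which mirrors the note above: a wildcard on Anthropic does not make OpenAI models routable to it.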
## Weighted Load Balancing
When you configure multiple providers on a Virtual Key, Bifrost automatically implements weighted load balancing. Each provider can be assigned a weight, and requests are distributed proportionally. The `weight` field is optional — omitting it (or setting it to `null`) excludes the provider from weighted selection while still allowing it to be used for direct `provider/model` requests or as a fallback.
**Example Configuration:**
```
Virtual Key: vk-prod-main
├── OpenAI
│   ├── Allowed Models: [gpt-4o, gpt-4o-mini] ← Explicit whitelist
│   └── Weight: 0.2 (20% of traffic)
└── Azure
    ├── Allowed Models: [gpt-4o] ← Explicit whitelist
    └── Weight: 0.8 (80% of traffic)
```
**Load Balancing Behavior:**
- For `gpt-4o`: 80% Azure, 20% OpenAI (both providers have it in allowed_models)
- For `gpt-4o-mini`: 100% OpenAI (only OpenAI has it in allowed_models)
- For `claude-3-sonnet`: ❌ Rejected (neither provider has it in allowed_models)
**Usage:**
To trigger weighted load balancing, send requests with just the model name:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
-H "x-bf-vk: vk-prod-main" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'
```
To bypass load balancing and target a specific provider:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
-H "x-bf-vk: vk-prod-main" \
-d '{"model": "openai/gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'
```
<Info>
Weights are automatically normalized so that they sum to 1.0 across all providers on the VK that are available for the requested model.
</Info>
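The normalization can be illustrated with a short sketch. It is not Bifrost's code: `normalized_weights` is a hypothetical helper, provider configs are plain dictionaries shaped like the JSON in this page, and for simplicity only explicit `allowed_models` lists are handled (no `"*"` wildcard).

```python
from typing import Dict, List


def normalized_weights(configs: List[dict], model: str) -> Dict[str, float]:
    """Normalize weights across providers that allow `model` (illustrative only)."""
    eligible = {
        c["provider"]: float(c["weight"])
        for c in configs
        # weight None/omitted: provider is opted out of weighted selection.
        if c.get("weight") is not None and model in c.get("allowed_models", [])
    }
    total = sum(eligible.values())
    if total <= 0:
        # No provider with a usable weight allows this model.
        return {}
    return {name: weight / total for name, weight in eligible.items()}
```

With the `vk-prod-main` example, a request for `gpt-4o-mini` leaves only OpenAI eligible, so its weight of 0.2 normalizes to 1.0, matching the "100% OpenAI" behavior listed above.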
**Example with Wildcard `allowed_models` (allow all via Model Catalog):**
```json
{
  "provider_configs": [
    {
      "provider": "openai",
      "allowed_models": ["*"],
      "weight": 0.5
    },
    {
      "provider": "anthropic",
      "allowed_models": ["*"],
      "weight": 0.5
    }
  ]
}
```
With this configuration:
- Request for `gpt-4o` → Routed to OpenAI (Model Catalog shows OpenAI supports this)
- Request for `claude-3-sonnet` → Routed to Anthropic (Model Catalog shows Anthropic supports this)
- Request for `gpt-4o` will NOT route to Anthropic (Model Catalog shows Anthropic doesn't support OpenAI models)
## Automatic Fallbacks
When multiple providers are configured on a Virtual Key, Bifrost automatically creates fallback chains for resilience. This feature provides automatic failover without manual intervention.
**How It Works:**
- **Only activated when**: Your request has no existing `fallbacks` array in the request body
- **Fallback creation**: Providers are sorted by weight (highest first) and added as fallbacks
- **Respects existing fallbacks**: If you manually specify fallbacks, they are preserved
**Example Request Flow:**
1. Primary request goes to weighted-selected provider (e.g., Azure with 80% weight)
2. If Azure fails, automatically retry with OpenAI
3. Continue until success or all providers exhausted
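A rough sketch of how such a fallback chain could be derived is shown below. The helper `build_fallbacks` is hypothetical; it assumes the selected primary provider is left out of the chain, that fallbacks use the `provider/model` form shown in this page, and that any `fallbacks` array present in the request (even an empty one) disables the automatic behavior.

```python
from typing import Dict, List, Optional


def build_fallbacks(
    weights: Dict[str, float],
    primary: str,
    model: str,
    existing: Optional[List[str]] = None,
) -> List[str]:
    """Derive a fallback chain from provider weights (illustrative only)."""
    if existing is not None:
        # A fallbacks array supplied by the caller is preserved untouched.
        return list(existing)
    ranked = sorted(
        (name for name in weights if name != primary),
        key=lambda name: weights[name],
        reverse=True,  # highest weight first
    )
    return [f"{name}/{model}" for name in ranked]
```

For the `vk-prod-main` example, `build_fallbacks({"azure": 0.8, "openai": 0.2}, "azure", "gpt-4o")` yields `["openai/gpt-4o"]`.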
**Request with automatic fallbacks:**
```bash
# This request will get automatic fallbacks
curl -X POST http://localhost:8080/v1/chat/completions \
-H "x-bf-vk: vk-prod-main" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'
```
**Request with manual fallbacks (no automatic fallbacks added):**
```bash
# This request keeps your specified fallbacks
curl -X POST http://localhost:8080/v1/chat/completions \
-H "x-bf-vk: vk-prod-main" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}],
"fallbacks": ["anthropic/claude-3-sonnet-20240229"]
}'
```
## Setting Provider/Model Routing
<Tabs group="provider-model-restrictions">
<Tab title="Web UI">
1. Go to **Virtual Keys**
2. Create/Edit virtual key
![Virtual Key Provider/Model Restrictions](../../media/ui-virtual-key-routing.png)
3. In **Provider Configurations** section, add the provider you want to restrict the VK to
4. **Allowed Models**:
   - **Specify models**: Enter specific models (e.g., `["gpt-4o", "gpt-4o-mini"]`) to explicitly whitelist only those models
   - **`["*"]`**: Allow all models (uses the Model Catalog for validation).
   - **Leave blank**: Deny all models (deny-by-default).
5. Optionally add a weight for this provider (numeric value for weighted load balancing, or leave blank to exclude from weighted routing while keeping the provider available for direct requests and fallbacks)
6. Click on the **Save** button
</Tab>
<Tab title="API">
```bash
curl -X PUT http://localhost:8080/api/governance/virtual-keys/{vk_id} \
-H "Content-Type: application/json" \
-d '{
"provider_configs": [
{
"provider": "openai",
"allowed_models": ["gpt-4o", "gpt-4o-mini"],
"weight": 0.2
},
{
"provider": "azure",
"allowed_models": ["gpt-4o"],
"weight": 0.8
}
]
}'
```
</Tab>
<Tab title="config.json">
```json
{
"governance": {
"virtual_keys": [
{
"id": "vk-prod-main",
"provider_configs": [
{
"provider": "openai",
"allowed_models": ["gpt-4o", "gpt-4o-mini"],
"weight": 0.2
},
{
"provider": "azure",
"allowed_models": ["gpt-4o"],
"weight": 0.8
}
]
}
]
}
}
```
</Tab>
</Tabs>
## API Key Restrictions
Virtual Keys can be restricted to use only specific provider API keys. When key restrictions are configured, the VK can only access those designated keys, providing fine-grained control over which API keys different users or applications can utilize.
**How It Works:**
- **No Restrictions** (`key_ids: ["*"]`): VK can use any available provider keys based on load balancing
- **With Restrictions**: VK limited to only the specified key IDs, regardless of other available keys
- **All Blocked** (`key_ids: []` or field omitted): VK cannot use any provider keys (deny-by-default)
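The three `key_ids` cases can be summarized in a small sketch. `usable_keys` is a hypothetical helper, not part of Bifrost; `available` stands in for the IDs of the provider keys configured on the gateway.

```python
from typing import Iterable, List, Optional


def usable_keys(key_ids: Optional[Iterable[str]], available: Iterable[str]) -> List[str]:
    """Resolve which provider key IDs a VK may use (illustrative only)."""
    available_list = list(available)
    requested = list(key_ids or [])
    if not requested:
        # Empty array or omitted field: deny-by-default.
        return []
    if "*" in requested:
        # Wildcard: every available key is a candidate for load balancing.
        return available_list
    wanted = set(requested)
    # Restricted: keep only the listed keys that actually exist.
    return [key for key in available_list if key in wanted]
```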
**Example Scenario:**
```
Available Provider Keys:
├── key-prod-001 → sk-prod-key... (Production OpenAI key)
├── key-dev-002 → sk-dev-key... (Development OpenAI key)
└── key-test-003 → sk-test-key... (Testing OpenAI key)

Virtual Key Restrictions:
├── vk-prod-main
│   ├── Allowed Models: [gpt-4o]
│   └── Restricted Keys: [key-prod-001] ← ONLY production key
├── vk-dev-main
│   ├── Allowed Models: [gpt-4o-mini]
│   └── Restricted Keys: [key-dev-002, key-test-003] ← Dev + test keys
└── vk-unrestricted
    ├── Allowed Models: ["*"] ← All models via catalog
    └── Restricted Keys: ["*"] ← Can use ANY available key
```
**Request Behavior:**
```bash
# Production VK - will ONLY use key-prod-001
curl -X POST http://localhost:8080/v1/chat/completions \
-H "x-bf-vk: vk-prod-main" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'
# Development VK - will load balance between key-dev-002 and key-test-003
curl -X POST http://localhost:8080/v1/chat/completions \
-H "x-bf-vk: vk-dev-main" \
-d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}]}'
# VK with key_ids: ["*"] - can use any available OpenAI key
curl -X POST http://localhost:8080/v1/chat/completions \
-H "x-bf-vk: vk-unrestricted" \
-d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}]}'
```
**Setting API Key Restrictions:**
<Tabs group="api-key-restrictions">
<Tab title="Web UI">
1. Go to **Virtual Keys**
2. Create/Edit virtual key
![Virtual Key API Key Restrictions](../../media/ui-virtual-key-keys-filter.png)
3. In **Allowed Keys** section, select the API key you want to restrict the VK to
4. Click on the **Save** button
</Tab>
<Tab title="API">
```bash
curl -X PUT http://localhost:8080/api/governance/virtual-keys/{vk_id} \
-H "Content-Type: application/json" \
-d '{
"key_ids": ["key-prod-001"]
}'
```
</Tab>
<Tab title="config.json">
```json
{
"governance": {
"virtual_keys": [
{
"id": "vk-prod-main",
"provider_configs": [
{
"provider": "openai",
"key_ids": [
"key-prod-001"
]
}
]
}
]
}
}
```
</Tab>
</Tabs>
**Use Cases:**
- **Environment Separation** - Production VKs use production keys, dev VKs use dev keys
- **Cost Control** - Different teams use keys with different billing accounts
- **Access Control** - Restrict sensitive keys to specific VKs only
- **Compliance** - Ensure certain workloads only use compliant/audited keys
<Note>Model restrictions configured on the keys of individual providers always apply, and work together with the provider/model and API key restrictions set on the virtual key.</Note>
## Troubleshooting
### Model Catalog Sync Failures
If you see warnings like this in your Bifrost logs during startup or provider updates:
```json
{"level":"warn","time":"2026-01-13T14:18:53+05:30","message":"failed to list models for provider ollama: failed to execute HTTP request to provider API"}
```
**What this means:**
- Bifrost attempted to call the provider's list models API to populate the Model Catalog
- The request failed (network issue, provider unavailable, incorrect credentials, etc.)
- If your Virtual Key has `allowed_models: []` (empty) for this provider, **all models will be denied**. Use `["*"]` to allow all models.
**How to fix:**
1. Check that the provider is correctly configured and accessible
2. Verify network connectivity to the provider's API
3. Ensure API credentials are valid
4. Use `allowed_models: ["*"]` to allow all models, or specify an explicit list for critical providers


@@ -0,0 +1,698 @@
---
title: "Virtual Keys"
description: "Virtual keys are a way to manage access to your AI models."
icon: "key"
---
## Overview
Virtual Keys are the primary governance entity in Bifrost. Users and applications authenticate by sending a virtual key in one of the headers below; the key determines their access permissions, budgets, and rate limits.
**Allowed Headers:**
- `x-bf-vk` - Virtual key header, e.g. `sk-bf-*`
- `Authorization` - Authorization header, e.g. `Bearer sk-bf-*` (OpenAI style)
- `x-api-key` - API key header, e.g. `sk-bf-*` (Anthropic style)
- `x-goog-api-key` - API key header, e.g. `sk-bf-*` (Google Gemini style)
<Note>Old virtual keys (without the `sk-bf-*` prefix) are supported only via the `x-bf-vk` header.</Note>
<Info>You can also use `Authorization`, `x-api-key` and `x-goog-api-key` headers to pass direct keys to the provider. Read more about it in [Direct Key Bypass](../keys-management#direct-key-bypass).</Info>
**Key Features:**
- **Access Control** - Model and provider filtering
- **Cost Management** - Independent budgets (checked along with team/customer budgets if attached)
- **Rate Limiting** - Token and request-based throttling (VK-level only)
- **Key Restrictions** - Limit VK to specific provider API keys (if configured, VK can only use those keys)
- **Exclusive Attachment** - Belongs to either one team OR one customer OR neither (mutually exclusive)
- **Active/Inactive Status** - Enable/disable access instantly
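To illustrate how a VK budget is checked together with the budgets of an attached team or customer, here is a minimal sketch. It is an assumption-laden illustration rather than Bifrost's implementation: the `Budget` dataclass and `first_exceeded` helper are hypothetical, the check order (VK, then team, then customer) and the strict `>` comparison are assumed, and the message format is modeled on the `budget_exceeded` error shown in this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Budget:
    owner: str            # e.g. "VK", "Team", "Customer"
    max_limit: float      # dollars
    current_usage: float  # dollars spent in the current period


def first_exceeded(budgets: List[Optional[Budget]]) -> Optional[str]:
    """Return a message for the first exceeded budget, or None if all pass."""
    for budget in budgets:
        # Levels without a budget (None) are skipped.
        if budget is not None and budget.current_usage > budget.max_limit:
            return (
                f"{budget.owner} budget exceeded: "
                f"{budget.current_usage:.2f} > {budget.max_limit:.2f} dollars"
            )
    return None
```

A request passes only when every attached level is within its limit; a VK that is under its own budget can still be blocked by an exhausted team or customer budget.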
## Configuration
<Tabs group="config-method">
<Tab title="Web UI">
1. Go to **Virtual Keys**
2. Click on **Add Virtual Key** button
![Virtual Key Creation](../../media/ui-virtual-key.png)
**Budget Settings:**
- **Max Limit**: Dollar amount (e.g., `10.50`)
- **Reset Duration**: `1m`, `1h`, `1d`, `1w`, `1M`, `1Y`
- **Calendar aligned** (optional): When enabled, the budget resets at calendar boundaries in UTC (day/week/month/year) instead of on a rolling window. Only applies to day/week/month/year periods. See [Budget and Limits](./budget-and-limits#calendar-aligned-budgets).
**Rate Limits:**
- **Token Limit**: Max tokens per period
- **Request Limit**: Max requests per period
- **Reset Duration**: Reset frequency for each limit
**Associations:**
- **Team**: Assign to existing team (mutually exclusive with customer)
- **Customer**: Assign to existing customer (mutually exclusive with team)
3. Click **Create Virtual Key**
</Tab>
<Tab title="API">
**Create Virtual Key (attached to team):**
```bash
curl -X POST http://localhost:8080/api/governance/virtual-keys \
-H "Content-Type: application/json" \
-d '{
"name": "Engineering Team API",
"description": "Main API key for engineering team",
"provider_configs": [
{
"provider": "openai",
"weight": 0.5,
"allowed_models": ["gpt-4o-mini"]
},
{
"provider": "anthropic",
"weight": 0.5,
"allowed_models": ["claude-3-sonnet-20240229"]
}
],
"team_id": "team-eng-001",
"budget": {
"max_limit": 100.00,
"reset_duration": "1M"
},
"rate_limit": {
"token_max_limit": 10000,
"token_reset_duration": "1h",
"request_max_limit": 100,
"request_reset_duration": "1m"
},
"key_ids": ["8c52039e-38c6-48b2-8016-0bd884b7befb"],
"is_active": true
}'
```
**Create Virtual Key (directly attached to customer):**
```bash
curl -X POST http://localhost:8080/api/governance/virtual-keys \
-H "Content-Type: application/json" \
-d '{
"name": "Executive API Key",
"description": "Direct customer-level API access",
"provider_configs": [
{
"provider": "openai",
"weight": 0.5,
"allowed_models": ["gpt-4o"]
},
{
"provider": "anthropic",
"weight": 0.5,
"allowed_models": ["claude-3-opus-20240229"]
}
],
"customer_id": "customer-acme-corp",
"budget": {
"max_limit": 500.00,
"reset_duration": "1M"
},
"is_active": true
}'
```
> **Note**:
> - `team_id` and `customer_id` are mutually exclusive - a VK can only belong to one team OR one customer, not both.
> - `key_ids` restricts the VK to only use those specific provider API keys. Use `["*"]` to allow access to all available keys. An empty array `[]` or omitting the field entirely denies all keys.
**Update Virtual Key:**
```bash
curl -X PUT http://localhost:8080/api/governance/virtual-keys/{vk_id} \
-H "Content-Type: application/json" \
-d '{
"description": "Updated description",
"budget": {
"max_limit": 150.00,
"reset_duration": "1M"
}
}'
```
**Get Virtual Keys:**
```bash
# List all virtual keys
curl http://localhost:8080/api/governance/virtual-keys
# Get specific virtual key
curl http://localhost:8080/api/governance/virtual-keys/{vk_id}
```
**Delete Virtual Key:**
```bash
curl -X DELETE http://localhost:8080/api/governance/virtual-keys/{vk_id}
```
</Tab>
<Tab title="config.json">
```json
{
"client": {
"enforce_auth_on_inference": true
},
"governance": {
"virtual_keys": [
{
"id": "vk-001",
"name": "Engineering Team API",
"value": "sk-bf-*",
"description": "Main API key for engineering team",
"is_active": true,
"provider_configs": [
{
"provider": "openai",
"weight": 0.5,
"allowed_models": ["gpt-4o-mini"],
"key_ids": ["openai-primary"]
},
{
"provider": "anthropic",
"weight": 0.5,
"allowed_models": ["claude-3-sonnet-20240229"]
}
],
"team_id": "team-eng-001",
"rate_limit_id": "rate-limit-eng-vk"
},
{
"id": "vk-002",
"name": "Executive API Key",
"value": "vk-executive-direct",
"description": "Direct customer-level API access",
"is_active": true,
"provider_configs": [
{
"provider": "openai",
"weight": 0.5,
"allowed_models": ["gpt-4o"]
},
{
"provider": "anthropic",
"weight": 0.5,
"allowed_models": ["claude-3-opus-20240229"]
}
],
"customer_id": "customer-acme-corp"
}
],
"budgets": [
{
"id": "budget-eng-vk",
"virtual_key_id": "vk-001",
"max_limit": 100.00,
"reset_duration": "1M",
"current_usage": 0.0,
"last_reset": "2025-01-01T00:00:00Z"
},
{
"id": "budget-exec-vk",
"virtual_key_id": "vk-002",
"max_limit": 500.00,
"reset_duration": "1M",
"current_usage": 0.0,
"last_reset": "2025-01-01T00:00:00Z"
}
],
"rate_limits": [
{
"id": "rate-limit-eng-vk",
"token_max_limit": 10000,
"token_reset_duration": "1h",
"token_current_usage": 0,
"token_last_reset": "2025-01-01T00:00:00Z",
"request_max_limit": 100,
"request_reset_duration": "1m",
"request_current_usage": 0,
"request_last_reset": "2025-01-01T00:00:00Z"
}
]
}
}
```
</Tab>
</Tabs>
## User Groups
### Teams
Teams provide organizational grouping for virtual keys with department-level budget management. Teams can belong to one customer and have their own independent budget allocation.
**Key Features:**
- **Organizational Structure** - Group multiple virtual keys
- **Independent Budgets** - Department-level cost control (separate from customer budgets)
- **Customer Association** - Can belong to one customer (optional)
- **No Rate Limits** - Teams cannot have rate limits (VK-level only)
**Configuration**
<Tabs group="config-method">
<Tab title="Web UI">
1. Go to **Users & Groups** → **Teams**
2. Click on **Add Team** button
![Team Creation](../../media/ui-create-teams.png)
Fill the form and click on **Create Team** button
3. **Assign Virtual Keys to Team**
   - Go to **Virtual Keys** page
   - Edit the virtual key and assign it to the team
   - Click on **Save** button
</Tab>
<Tab title="API">
**Create Team:**
```bash
curl -X POST http://localhost:8080/api/governance/teams \
-H "Content-Type: application/json" \
-d '{
"name": "Engineering Team",
"customer_id": "customer-acme-corp",
"budget": {
"max_limit": 500.00,
"reset_duration": "1M"
}
}'
```
**Update Team:**
```bash
curl -X PUT http://localhost:8080/api/governance/teams/{team_id} \
-H "Content-Type: application/json" \
-d '{
"name": "Updated Engineering Team",
"budget": {
"max_limit": 750.00,
"reset_duration": "1M"
}
}'
```
**Get Teams:**
```bash
# List all teams
curl http://localhost:8080/api/governance/teams
# Get specific team
curl http://localhost:8080/api/governance/teams/{team_id}
```
**Delete Team:**
```bash
curl -X DELETE http://localhost:8080/api/governance/teams/{team_id}
```
</Tab>
<Tab title="config.json">
```json
{
"governance": {
"teams": [
{
"id": "team-eng-001",
"name": "Engineering Team",
"customer_id": "customer-acme-corp",
"budget_id": "budget-team-eng"
},
{
"id": "team-sales-001",
"name": "Sales Team",
"customer_id": "customer-acme-corp",
"budget_id": "budget-team-sales"
}
],
"budgets": [
{
"id": "budget-team-eng",
"max_limit": 500.00,
"reset_duration": "1M",
"current_usage": 0.0,
"last_reset": "2025-01-01T00:00:00Z"
},
{
"id": "budget-team-sales",
"max_limit": 250.00,
"reset_duration": "1M",
"current_usage": 0.0,
"last_reset": "2025-01-01T00:00:00Z"
}
]
}
}
```
</Tab>
</Tabs>
### Customers
Customers represent the highest level in the governance hierarchy, typically corresponding to organizations or major business units. They provide top-level budget control and organizational structure.
**Key Features:**
- **Top-Level Organization** - Highest hierarchy level
- **Independent Budgets** - Organization-wide cost control (separate from team/VK budgets)
- **Team Management** - Contains multiple teams and direct VKs
- **No Rate Limits** - Customers cannot have rate limits (VK-level only)
**Configuration**
<Tabs group="config-method">
<Tab title="Web UI">
1. Go to **Users & Groups** → **Customers**
2. Click on **Add Customer** button
![Customer Creation](../../media/ui-create-customer.png)
Fill the form and click on **Create Customer** button
3. **Assign Teams to Customer**
   - Go to **Teams** page
   - Edit the team and assign it to the customer
   - Click on **Save** button
4. **Assign Virtual Keys to Customer**
   - Go to **Virtual Keys** page
   - Edit the virtual key and assign it to the customer
   - Click on **Save** button
</Tab>
<Tab title="API">
**Create Customer:**
```bash
curl -X POST http://localhost:8080/api/governance/customers \
-H "Content-Type: application/json" \
-d '{
"name": "Acme Corporation",
"budget": {
"max_limit": 2000.00,
"reset_duration": "1M"
}
}'
```
**Update Customer:**
```bash
curl -X PUT http://localhost:8080/api/governance/customers/{customer_id} \
-H "Content-Type: application/json" \
-d '{
"name": "Acme Corp (Updated)",
"budget": {
"max_limit": 2500.00,
"reset_duration": "1M"
}
}'
```
**Get Customers:**
```bash
# List all customers
curl http://localhost:8080/api/governance/customers
# Get specific customer
curl http://localhost:8080/api/governance/customers/{customer_id}
```
**Delete Customer:**
```bash
curl -X DELETE http://localhost:8080/api/governance/customers/{customer_id}
```
</Tab>
<Tab title="config.json">
```json
{
"governance": {
"customers": [
{
"id": "customer-acme-corp",
"name": "Acme Corporation",
"budget_id": "budget-customer-acme"
},
{
"id": "customer-beta-inc",
"name": "Beta Inc",
"budget_id": "budget-customer-beta"
}
],
"budgets": [
{
"id": "budget-customer-acme",
"max_limit": 2000.00,
"reset_duration": "1M",
"current_usage": 0.0,
"last_reset": "2025-01-01T00:00:00Z"
},
{
"id": "budget-customer-beta",
"max_limit": 1500.00,
"reset_duration": "1M",
"current_usage": 0.0,
"last_reset": "2025-01-01T00:00:00Z"
}
]
}
}
```
</Tab>
</Tabs>
## Features
- **[Budget and Limits](./budget-and-limits)** - Enterprise-grade budget management, cost control, and rate limiting using virtual keys
- **[Routing](./routing)** - Route requests to the appropriate providers/models and restrict API keys using virtual keys
- **[MCP Tool Filtering](./mcp-tools)** - Manage MCP clients/tools for virtual keys
## Usage
### Making Virtual Keys Mandatory
All governance-enabled requests must include the virtual key header:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "x-bf-vk: sk-bf-*" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "Hello!"}]
}'
```
By default, governance is optional: if the virtual key header is not present, the request is allowed but skips all governance checks and routing. You can make governance mandatory by enforcing the virtual key header.
<Tabs group="enforce-governance-header">
<Tab title="Web UI">
1. Go to **Config** → **Security**
2. Check the **Enforce Virtual Keys** checkbox
![Enforce Virtual Keys](../../media/ui-enforce-virtual-keys.png)
</Tab>
<Tab title="API">
```bash
curl -X PUT http://localhost:8080/api/config \
-H "Content-Type: application/json" \
-d '{
"client_config": {
"enforce_auth_on_inference": true
}
}'
```
</Tab>
<Tab title="config.json">
```json
{
"client": {
"enforce_auth_on_inference": true
}
}
```
</Tab>
</Tabs>
When the governance header is enforced, the request will be rejected if the `x-bf-vk` header is not present.
### Authentication and Virtual Keys
Virtual keys and HTTP authentication are **independent layers** that can work together:
| Layer | Purpose | Headers |
|-------|---------|---------|
| **Authentication** | Validates user identity | `Authorization: Basic/Bearer <credentials>` |
| **Virtual Keys** | Request routing and governance | `x-bf-vk`, `Authorization`[^1], `x-api-key`, `x-goog-api-key` |
[^1]: Authorization can carry virtual keys only when auth is disabled (`disable_auth_on_inference: true`). When auth is enabled, Authorization is consumed by authentication and cannot be used for virtual keys.
**When `disable_auth_on_inference: true` (auth disabled):**
Virtual keys can be passed via any supported header without additional authentication:
```bash
# Using x-bf-vk header
curl -X POST http://localhost:8080/v1/chat/completions \
-H "x-bf-vk: <VIRTUAL_KEY>" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o-mini", "messages": [...]}'
# Using Authorization header (OpenAI style)
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Authorization: Bearer <VIRTUAL_KEY>" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o-mini", "messages": [...]}'
```
**When `disable_auth_on_inference: false` (auth enabled):**
You must provide both authentication credentials AND the virtual key. Use `x-bf-vk` for the virtual key since the `Authorization` header is used for authentication:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Authorization: Basic <base64-credentials>" \
-H "x-bf-vk: <VIRTUAL_KEY>" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o-mini", "messages": [...]}'
```
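The header rules above can be sketched as a lookup function. This is an illustration under stated assumptions, not Bifrost's code: `extract_virtual_key` is a hypothetical name, the precedence order (`x-bf-vk` first, then `Authorization`, then `x-api-key` and `x-goog-api-key`) is assumed, and values without the `sk-bf-` prefix in the provider-style headers are treated as direct provider keys rather than virtual keys.

```python
from typing import Mapping, Optional

VK_PREFIX = "sk-bf-"


def extract_virtual_key(headers: Mapping[str, str], auth_disabled: bool) -> Optional[str]:
    """Find the virtual key in request headers (illustrative only)."""
    h = {name.lower(): value.strip() for name, value in headers.items()}
    # x-bf-vk is checked first and also accepts legacy keys without the prefix.
    if h.get("x-bf-vk"):
        return h["x-bf-vk"]
    candidates = [h.get("x-api-key", ""), h.get("x-goog-api-key", "")]
    if auth_disabled:
        # Authorization may carry a VK only when it is not used for authentication.
        auth = h.get("authorization", "")
        if auth.lower().startswith("bearer "):
            candidates.insert(0, auth[7:].strip())
    for value in candidates:
        if value.startswith(VK_PREFIX):
            return value
    return None
```

With auth enabled, `extract_virtual_key({"Authorization": "Bearer sk-bf-abc"}, auth_disabled=False)` returns `None`, which is why the example above passes the key in `x-bf-vk`.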
**Configuring `disable_auth_on_inference`:**
<Tabs group="config-method">
<Tab title="Web UI">
1. Go to **Config** → **Security**
2. Toggle **Disable Auth on Inference** to enable/disable
![Disable Auth on Inference](../../media/ui-disable-auth-on-inference.png)
</Tab>
<Tab title="API">
```bash
curl -X PUT http://localhost:8080/api/config \
-H "Content-Type: application/json" \
-d '{
"auth_config": {
"disable_auth_on_inference": true
}
}'
```
</Tab>
<Tab title="config.json">
```json
{
"auth_config": {
"is_enabled": true,
"disable_auth_on_inference": true
}
}
```
</Tab>
</Tabs>
### Error Responses
- Virtual Key Not Found (400)
```json
{
"error": {
"type": "virtual_key_required",
"message": "virtual key is missing in headers"
}
}
```
- Virtual Key Blocked (403)
```json
{
"error": {
"type": "virtual_key_blocked",
"message": "Virtual key is inactive"
}
}
```
- Rate Limit Exceeded (429)
```json
{
"error": {
"type": "rate_limited",
"message": "Rate limits exceeded: [token limit exceeded (1500/1000, resets every 1h)]"
}
}
```
- Token Limit Exceeded (429)
```json
{
"error": {
"type": "token_limited",
"message": "Rate limits exceeded: [token limit exceeded (1500/1000, resets every 1h)]"
}
}
```
- Request Limit Exceeded (429)
```json
{
"error": {
"type": "request_limited",
"message": "Rate limits exceeded: [request limit exceeded (101/100, resets every 1m)]"
}
}
```
- Budget Exceeded (402)
```json
{
"error": {
"type": "budget_exceeded",
"message": "Budget exceeded: VK budget exceeded: 105.50 > 100.00 dollars"
}
}
```
- Model Not Allowed (403)
```json
{
"error": {
"type": "model_blocked",
"message": "Model 'gpt-4o' is not allowed for this virtual key"
}
}
```
- Provider Not Allowed (403)
```json
{
"error": {
"type": "provider_blocked",
"message": "Provider 'anthropic' is not allowed for this virtual key"
}
}
```
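On the client side, the error types listed above could be mapped to a coarse handling strategy. The sketch below is only a suggestion: the `suggested_action` helper and its action labels are hypothetical, while the error types and status codes are taken from the responses in this section.

```python
from typing import Any, Mapping

# Error type -> HTTP status, as listed in the responses above.
ERROR_STATUS = {
    "virtual_key_required": 400,
    "virtual_key_blocked": 403,
    "rate_limited": 429,
    "token_limited": 429,
    "request_limited": 429,
    "budget_exceeded": 402,
    "model_blocked": 403,
    "provider_blocked": 403,
}


def suggested_action(body: Mapping[str, Any]) -> str:
    """Map a governance error body to a coarse client action (illustrative only)."""
    error = body.get("error") or {}
    status = ERROR_STATUS.get(error.get("type", ""))
    if status is None:
        return "unknown"
    if status == 429:
        return "retry_after_reset"        # wait for the rate-limit window to reset
    if status == 402:
        return "wait_or_raise_budget"     # budget resets later or must be increased
    if status == 400:
        return "add_virtual_key_header"   # request is missing the virtual key
    return "check_virtual_key_config"     # 403: inactive VK or blocked model/provider
```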