---
title: "Provider Routing"
description: "Understand how Bifrost routes requests across AI providers using governance rules and adaptive load balancing."
icon: "route"
---

## Overview

Bifrost offers two methods for routing requests across AI providers, each serving different use cases:

1. **Governance-based Routing**: Explicit, user-defined routing rules configured via Virtual Keys
2. **Adaptive Load Balancing**: Automatic, performance-based routing powered by real-time metrics (Enterprise feature)

When both methods are available, **governance takes precedence** because users have explicitly defined their routing preferences through provider configurations on Virtual Keys.

<Info>
**When to use which method:**
- Use **Governance** when you need explicit control, compliance requirements, or specific cost optimization strategies
- Use **Adaptive Load Balancing** for automatic performance optimization and minimal configuration overhead
</Info>

---

## The Model Catalog

The Model Catalog is Bifrost's central registry that tracks which models are available from which providers. It powers both governance-based routing and adaptive load balancing by maintaining an up-to-date mapping of models to providers.

<Info>
**Architecture Documentation**: For detailed technical documentation on the Model Catalog implementation, including API reference, thread safety, and advanced usage patterns, see [Model Catalog Architecture](/architecture/framework/model-catalog).
</Info>

### Data Sources

The Model Catalog combines two data sources to maintain a comprehensive and up-to-date model registry:

1. **Pricing Data** (Primary source)
   - Downloaded from a remote URL (configurable, defaults to `https://getbifrost.ai/datasheet`)
   - Contains model names, pricing tiers, and provider mappings
   - Synced to database on startup and refreshed periodically (default: every 24 hours)
   - Used for cost calculation and initial model-to-provider mapping
   - **Stored as**: In-memory map `pricingData[model|provider|mode]` for O(1) lookups

2. **Provider List Models API** (Secondary source)
   - Calls each provider's `/v1/models` endpoint during startup
   - Enriches the catalog with provider-specific models and aliases
   - Re-fetched when providers are added/updated via API or dashboard
   - Adds models that may not be in pricing data yet (e.g., newly released models)
   - **Stored as**: In-memory map `modelPool[provider][]models`

<Info>
**Why two sources?** Pricing data provides comprehensive model coverage with cost information, while the List Models API ensures you can use newly released models immediately without waiting for pricing data updates.
</Info>

### How Model Availability is Determined

Bifrost determines whether a model is available for a provider through the following lookups:

<AccordionGroup>
<Accordion title="GetModelsForProvider(provider)">
**Purpose**: Find all models available for a specific provider

**Lookup Process**:
1. Check `modelPool[provider]` for direct matches
2. Return all models in that provider's slice

**Example**:
```go
models := GetModelsForProvider("openai")
// Returns: ["gpt-4o", "gpt-4o-mini", "gpt-4-turbo", "gpt-3.5-turbo", ...]
```

**Used by**:
- Routing Methods to validate `allowed_models`
- Dashboard model selector dropdowns
- API responses for `/v1/models?provider=openai`
</Accordion>

<Accordion title="GetProvidersForModel(model)">
**Purpose**: Find all providers that support a specific model

**Lookup Process**:
1. **Direct lookup**: Check each provider's model list in `modelPool`
2. **Cross-provider resolution**: Apply special handling for proxy providers

**Special Cross-Provider Rules**:

<Steps>
<Step title="OpenRouter Format">
If the model is not found directly, check if `provider/model` exists in OpenRouter
```go
// Request: claude-3-5-sonnet
// Checks: openrouter models for "anthropic/claude-3-5-sonnet"
// Result: Adds "openrouter" to providers list
```
</Step>

<Step title="Vertex Format">
If the model is not found directly, check if `provider/model` exists in Vertex
```go
// Request: claude-3-5-sonnet
// Checks: vertex models for "anthropic/claude-3-5-sonnet"
// Result: Adds "vertex" to providers list
```
</Step>

<Step title="Groq OpenAI Compatibility">
For GPT models, check if `openai/model` exists in Groq
```go
// Request: gpt-3.5-turbo
// Checks: groq models for "openai/gpt-3.5-turbo"
// Result: Adds "groq" to providers list
```
</Step>

<Step title="Bedrock Claude Models">
For Claude models, check Bedrock with flexible matching
```go
// Request: claude-3-5-sonnet
// Checks: bedrock models containing "claude-3-5-sonnet"
// Matches: "anthropic.claude-3-5-sonnet-20240620-v1:0"
// Result: Adds "bedrock" to providers list
```
</Step>
</Steps>

**Example**:
```go
providers := GetProvidersForModel("claude-3-5-sonnet")
// Returns: ["anthropic", "vertex", "bedrock", "openrouter"]
// Even though the request was just "claude-3-5-sonnet"!
```

**Used by**:
- Load balancing to find candidate providers
- Fallback generation
- Model validation in requests
</Accordion>

<Accordion title="Pricing Lookup with Fallbacks">
**Purpose**: Get pricing data for cost calculation and model validation

**Lookup Key**: `model|provider|mode` (e.g., `gpt-4o|openai|chat`)

**Fallback Chain**:
1. **Primary lookup**: `model|provider|requestType`
2. **Gemini → Vertex**: If Gemini not found, try Vertex with same model
3. **Vertex format stripping**: For `provider/model`, strip prefix and retry
4. **Bedrock prefix handling**: For Claude models, try with `anthropic.` prefix
5. **Responses → Chat**: If Responses mode not found, try Chat mode

**Example Flow**:
```go
// Request: claude-3-5-sonnet on Gemini (Responses API)

// 1. Try: claude-3-5-sonnet|gemini|responses → Not found
// 2. Try: claude-3-5-sonnet|vertex|responses → Not found
// 3. Try: claude-3-5-sonnet|vertex|chat → ✅ Found!

// Pricing returned from vertex/chat mode
```

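
As a rough illustration of the fallback chain, the sketch below builds candidate `model|provider|mode` keys and returns the first one present in the pricing map. `lookupPricing` is a hypothetical function, the price values are placeholders, and the exact order in which Bifrost combines the provider and mode fallbacks is an assumption here.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupPricing is a hypothetical helper that tries the primary key first and
// then the fallbacks listed above. It returns the key that matched.
func lookupPricing(prices map[string]float64, model, provider, mode string) (string, float64, bool) {
	type candidate struct{ model, provider string }
	cands := []candidate{{model, provider}} // 1. primary lookup
	if provider == "gemini" {
		cands = append(cands, candidate{model, "vertex"}) // 2. Gemini -> Vertex
	}
	if provider == "vertex" {
		if i := strings.Index(model, "/"); i >= 0 {
			cands = append(cands, candidate{model[i+1:], "vertex"}) // 3. strip "provider/"
		}
	}
	if provider == "bedrock" && strings.Contains(model, "claude") && !strings.HasPrefix(model, "anthropic.") {
		cands = append(cands, candidate{"anthropic." + model, "bedrock"}) // 4. Bedrock prefix
	}
	modes := []string{mode}
	if mode == "responses" {
		modes = append(modes, "chat") // 5. Responses -> Chat
	}
	for _, m := range modes {
		for _, c := range cands {
			key := c.model + "|" + c.provider + "|" + m
			if price, ok := prices[key]; ok {
				return key, price, true
			}
		}
	}
	return "", 0, false
}

func main() {
	// Placeholder price; only the key structure matters for this sketch.
	prices := map[string]float64{"claude-3-5-sonnet|vertex|chat": 1.0}
	key, _, ok := lookupPricing(prices, "claude-3-5-sonnet", "gemini", "responses")
	fmt.Println(key, ok)
}
```
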
**Used by**:
- Cost calculation for billing
- Model validation during routing
- Budget enforcement
</Accordion>
</AccordionGroup>

### Syncing Behavior

<AccordionGroup>
<Accordion title="Initial Sync (Startup)">
When Bifrost starts, it performs a complete model catalog initialization:

**Step-by-step process** (from `server.go:Bootstrap()`):

<Steps>
<Step title="Load Pricing Data">
```go
// 1. Download from URL
pricingData := loadPricingFromURL(ctx)

// 2. Store in database (if configStore available)
configStore.CreateModelPrices(ctx, pricingData)

// 3. Load into memory cache
mc.pricingData = map[string]TableModelPricing{...}
```
</Step>

<Step title="Populate Initial Model Pool">
```go
// Build modelPool from pricing data
mc.populateModelPoolFromPricingData()
// Result: modelPool[provider] = [models from pricing]
```
</Step>

<Step title="Fetch Dynamic Models">
```go
// Call ListAllModels for all configured providers
modelData, err := client.ListAllModels(ctx, nil)

// Add results to model pool
mc.AddModelDataToPool(modelData)
// Result: modelPool enriched with provider-specific models
```
</Step>

<Step title="Handle Failures Gracefully">
If the list models API fails for a provider:
```json
{"level":"warn","message":"failed to list models for provider ollama: connection refused"}
```
- Logged as warning, **does not stop startup**
- Provider remains usable with models from pricing data
- Can be manually refreshed later via API
</Step>
</Steps>

**Result**: Bifrost is ready with a comprehensive model catalog combining both sources.
</Accordion>

<Accordion title="Ongoing Sync (Background)">
While Bifrost is running, the catalog stays up-to-date through background workers:

**Pricing Data Sync**:
- Background worker runs every **1 hour** (ticker interval)
- Checks if **24 hours** have elapsed since last sync (configurable)
- If yes, downloads fresh pricing data and updates database + memory cache
- Timer resets after successful sync

**List Models API Sync**:
Triggered by these events:
1. **Provider Added**: When a new provider is configured
   ```bash
   POST /api/v1/providers
   # Automatically calls ListModels for the new provider
   ```

2. **Provider Updated**: When provider config changes (keys, endpoints, etc.)
   ```bash
   PUT /api/v1/providers/{provider}
   # Refetches models to detect changes
   ```

3. **Manual Refresh**: Via API endpoint
   ```bash
   POST /api/v1/providers/{provider}/models/refetch
   # Explicitly refetches models for a provider
   ```

4. **Manual Delete + Refetch**: Clear and reload models
   ```bash
   DELETE /api/v1/providers/{provider}/models
   POST /api/v1/providers/{provider}/models/refetch
   # Useful when models are out of sync
   ```

**Failure Handling**:
- Pricing URL fails but database has data → Use cached database records
- Pricing URL fails and no database data → Error logged, existing memory cache retained
- List models API fails → Log warning, retain existing model pool entries
</Accordion>

<Accordion title="Fallback Strategy">
Bifrost's multi-layered approach ensures high availability:

**Layer 1: Pricing Data Persistence**
```
URL fails → Database → Memory cache → Continue operation
```

**Layer 2: Model Pool Redundancy**
```
ListModels fails → Pricing data models → Continue with reduced catalog
```

**Layer 3: Runtime Validation**
```
Model not in catalog → Special cross-provider rules → May still work
```

**Example Scenario**:
```
Situation:
- Pricing URL is down
- OpenAI ListModels API is down
- User requests gpt-4o on OpenAI

Bifrost's Response:
1. ✅ Pricing data available from database (last sync 12h ago)
2. ✅ Model pool has gpt-4o from previous ListModels call
3. ✅ Request proceeds normally
4. 📊 Cost calculated from cached pricing data
```

This design means **requests do not fail because of sync issues** as long as at least one data source is available.
</Accordion>
</AccordionGroup>

### Allowed Models Behavior with Examples

The `allowed_models` field in provider configs controls which models can be used with that provider. Understanding its behavior is crucial for governance routing.

<Tabs>
<Tab title="Wildcard allowed_models (Use Catalog)">

**Configuration**:
```json
{
  "provider_configs": [
    {
      "provider": "openai",
      "allowed_models": ["*"],
      "weight": 1.0
    }
  ]
}
```

**Behavior**:
- Bifrost calls `GetModelsForProvider("openai")`
- Returns all models in `modelPool["openai"]`
- Request validated against catalog

**Examples**:
```bash
# ✅ Allowed (in catalog)
curl http://localhost:8080/v1/chat/completions -H "x-bf-vk: vk-123" -d '{"model": "gpt-4o"}'

# ✅ Allowed (in catalog)
curl http://localhost:8080/v1/chat/completions -H "x-bf-vk: vk-123" -d '{"model": "gpt-3.5-turbo"}'

# ❌ Rejected (not in OpenAI catalog)
curl http://localhost:8080/v1/chat/completions -H "x-bf-vk: vk-123" -d '{"model": "claude-3-5-sonnet"}'
```

**Use Cases**:
- Default behavior for most deployments
- Automatically stays up-to-date with provider's model offerings
- No manual model list maintenance required

<Warning>
Using `"allowed_models": []` (empty array) means **deny all models**: no requests will be served. Use `["*"]` to allow all models via the catalog.
</Warning>

</Tab>

<Tab title="Explicit allowed_models (Strict Control)">

**Configuration**:
```json
{
  "provider_configs": [
    {
      "provider": "openai",
      "allowed_models": ["gpt-4o", "gpt-4o-mini"], // Only these two
      "weight": 1.0
    },
    {
      "provider": "anthropic",
      "allowed_models": ["claude-3-5-sonnet-20241022"], // Specific version
      "weight": 1.0
    }
  ]
}
```

**Behavior**:
- Bifrost validates request model against explicit list
- Catalog is **ignored** for this provider
- Supports both direct matches and provider-prefixed entries
- Case-sensitive matching

**Examples**:
```bash
# ✅ Allowed (in explicit list)
curl http://localhost:8080/v1/chat/completions -H "x-bf-vk: vk-123" -d '{"model": "gpt-4o"}'

# ❌ Rejected (not in explicit list)
curl http://localhost:8080/v1/chat/completions -H "x-bf-vk: vk-123" -d '{"model": "gpt-4-turbo"}'
# Even though gpt-4-turbo is in the OpenAI catalog!

# ✅ Allowed (exact match for Anthropic)
curl http://localhost:8080/v1/chat/completions -H "x-bf-vk: vk-123" -d '{"model": "claude-3-5-sonnet-20241022"}'

# ❌ Rejected (version mismatch)
curl http://localhost:8080/v1/chat/completions -H "x-bf-vk: vk-123" -d '{"model": "claude-3-5-sonnet-20240620"}'
```

**Provider-Prefixed Entries**:

You can also use provider-prefixed model names in `allowed_models`. Bifrost will strip the prefix and match against the requested model:

```json
{
  "provider_configs": [
    {
      "provider": "openrouter",
      "allowed_models": ["openai/gpt-4o", "anthropic/claude-3-5-sonnet"],
      "weight": 1.0
    }
  ]
}
```

**How it works**:
```bash
# Request without prefix
curl http://localhost:8080/v1/chat/completions -H "x-bf-vk: vk-123" -d '{"model": "gpt-4o"}'

# 1. Checks: "openai/gpt-4o" in allowed_models
# 2. Strips prefix: "openai/gpt-4o" → "gpt-4o"
# 3. Compares: "gpt-4o" == "gpt-4o" ✅
# 4. Result: Allowed and routed to OpenRouter
```

This is particularly useful for proxy providers (OpenRouter, Vertex) where you want to explicitly control which upstream models are accessible.

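
The matching rules for `allowed_models` (wildcard, explicit entries, provider-prefixed entries, and the empty list) can be summarized in a short Go sketch. The names `matches` and `isModelAllowed` are hypothetical, and applying prefix stripping to catalog entries in the wildcard case is an assumption based on the OpenRouter behavior described in this document.

```go
package main

import (
	"fmt"
	"strings"
)

// matches compares an entry with the requested model, either exactly or after
// stripping a "provider/" prefix from the entry. Matching is case-sensitive.
func matches(entry, model string) bool {
	if entry == model {
		return true
	}
	i := strings.Index(entry, "/")
	return i >= 0 && entry[i+1:] == model
}

// isModelAllowed is a hypothetical helper: "*" defers to the provider's catalog,
// any other entry is matched directly, and an empty list denies everything.
func isModelAllowed(allowed, catalog []string, model string) bool {
	for _, entry := range allowed {
		if entry == "*" {
			for _, c := range catalog {
				if matches(c, model) {
					return true
				}
			}
			continue
		}
		if matches(entry, model) {
			return true
		}
	}
	return false
}

func main() {
	catalog := []string{"gpt-4o", "gpt-4o-mini", "gpt-4-turbo"}
	fmt.Println(isModelAllowed([]string{"*"}, catalog, "gpt-4o"))             // wildcard, in catalog
	fmt.Println(isModelAllowed([]string{"gpt-4o"}, catalog, "gpt-4-turbo"))   // explicit list wins over catalog
	fmt.Println(isModelAllowed([]string{"openai/gpt-4o"}, catalog, "gpt-4o")) // prefixed entry
	fmt.Println(isModelAllowed([]string{}, catalog, "gpt-4o"))                // empty list denies all
}
```
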
**Use Cases**:
- Compliance requirements (only approved models)
- Cost control (restrict to cheaper models)
- Version pinning (prevent automatic updates)
- Testing specific model versions
- **Explicit cross-provider routing** (e.g., only allow OpenAI models via OpenRouter)

</Tab>

<Tab title="Aliases (Key-Level)">

**Key Concept**: Aliases are **key-level** mappings that allow user-friendly model names to map to provider-specific identifiers.

**How Aliases Work**:
- Defined at the **Key level**, not Virtual Key level
- Structure: `aliases: {"user-facing-name": "provider-specific-id"}`
- **Alias key** (left side): User-facing model name used in requests
- **Provider ID** (right side): Provider-specific identifier sent to the API

**Azure OpenAI Example**:

Provider configuration with alias mapping:
```json
{
  "providers": {
    "azure": {
      "keys": [
        {
          "name": "azure-prod-key",
          "value": "your-api-key",
          "aliases": {
            "gpt-4o": "my-prod-gpt4o-deployment",
            "gpt-4o-mini": "my-mini-deployment"
          },
          "azure_key_config": {
            "endpoint": "https://your-resource.openai.azure.com"
          }
        }
      ]
    }
  }
}
```

**What Happens**:
1. **Allowed models derived from aliases**: `["gpt-4o", "gpt-4o-mini"]`
2. **User requests with alias**: `{"model": "gpt-4o"}`
3. **Bifrost validates**: `gpt-4o` is in derived allowed models ✅
4. **Bifrost resolves alias**: `gpt-4o` → `my-prod-gpt4o-deployment`
5. **Sent to Azure**: Uses `my-prod-gpt4o-deployment` as the deployment name
6. **Pricing lookup**: If pricing for resolved ID not found, falls back to alias `gpt-4o`

**Bedrock Example with Inference Profiles**:

```json
{
  "providers": {
    "bedrock": {
      "keys": [
        {
          "name": "bedrock-key",
          "aliases": {
            "claude-sonnet": "us.anthropic.claude-3-5-sonnet-20241022-v2:0",
            "claude-opus": "us.anthropic.claude-3-opus-20240229-v1:0"
          },
          "bedrock_key_config": {
            "access_key": "your-access-key",
            "secret_key": "your-secret-key",
            "region": "us-east-1"
          }
        }
      ]
    }
  }
}
```

**What Happens**:
1. **Allowed models**: `["claude-sonnet", "claude-opus"]` (from alias keys)
2. **User requests**: `{"model": "claude-sonnet"}`
3. **Bifrost validates**: `claude-sonnet` in allowed models ✅
4. **Resolves alias**: `claude-sonnet` → `us.anthropic.claude-3-5-sonnet-20241022-v2:0`
5. **Sent to Bedrock**: The full inference profile ID is used in the API call

**Priority of Model Restrictions**:

When determining allowed models for a key:
```
1. If key.models is NOT empty → Use key.models
2. Else if aliases exist → Use alias keys
3. Else → All models allowed (use Model Catalog)
```

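
The priority order above, together with alias resolution, can be expressed as a small Go sketch. The `Key` struct and its `Allows` and `Resolve` methods are hypothetical names introduced for illustration; they are not Bifrost's actual types.

```go
package main

import "fmt"

// Key is a hypothetical, simplified view of a provider key's model settings.
type Key struct {
	Models  []string          // explicit restriction, if any
	Aliases map[string]string // user-facing name -> provider-specific ID
}

// Allows applies the priority order: key.models, then alias keys, then everything.
func (k Key) Allows(model string) bool {
	if len(k.Models) > 0 {
		for _, m := range k.Models {
			if m == model {
				return true
			}
		}
		return false
	}
	if len(k.Aliases) > 0 {
		_, ok := k.Aliases[model]
		return ok
	}
	return true // no restriction on the key; the Model Catalog decides
}

// Resolve maps a user-facing model name to the provider-specific identifier.
func (k Key) Resolve(model string) string {
	if id, ok := k.Aliases[model]; ok {
		return id
	}
	return model
}

func main() {
	key := Key{
		Models:  []string{"gpt-4o", "gpt-3.5-turbo"},
		Aliases: map[string]string{"gpt-4o": "my-deployment", "gpt-4-turbo": "another-deployment"},
	}
	fmt.Println(key.Allows("gpt-4o"), key.Resolve("gpt-4o"))
	fmt.Println(key.Allows("gpt-4-turbo")) // aliased, but excluded by the models list
}
```
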
**Example with Both**:
```json
{
  "keys": [
    {
      "models": ["gpt-4o", "gpt-3.5-turbo"], // Explicit restriction
      "aliases": {
        "gpt-4o": "my-deployment",
        "gpt-4-turbo": "another-deployment" // NOT accessible!
      },
      "azure_key_config": {
        "endpoint": "https://your-resource.openai.azure.com"
      }
    }
  ]
}
```

Result: Only `["gpt-4o", "gpt-3.5-turbo"]` allowed (models field takes priority)

**Vertex Example** (similar pattern):
```json
{
  "keys": [
    {
      "aliases": {
        "claude-3-5-sonnet": "anthropic/claude-3-5-sonnet@20241022",
        "gemini-pro": "google/gemini-1.5-pro"
      },
      "vertex_key_config": {
        "project_id": "my-project",
        "region": "us-central1"
      }
    }
  ]
}
```

**Use Cases for Aliases**:
- **Azure**: Map generic model names to specific deployment names in your Azure resource
- **Bedrock**: Use short aliases for long inference profile identifiers
- **Vertex**: Map to specific model versions or regional endpoints
- **Multi-environment**: Different aliases per key (dev/staging/prod)

**Key Insight**:
```
User Request: {"model": "gpt-4o"}
  ↓
Validation: Check if "gpt-4o" in allowed models (derived from aliases)
  ↓
Mapping: aliases["gpt-4o"] → "my-prod-gpt4o-deployment"
  ↓
API Call: Uses "my-prod-gpt4o-deployment" as deployment ID
  ↓
Pricing: Falls back to "gpt-4o" if resolved ID not in pricing data
```

This allows user-friendly model names in requests while supporting provider-specific identifier patterns at the key level.

</Tab>

<Tab title="Cross-Provider Model Routing">

**Configuration**:
```json
{
  "provider_configs": [
    {
      "provider": "openai",
      "allowed_models": ["gpt-4o"],
      "weight": 0.5
    },
    {
      "provider": "azure",
      "allowed_models": ["gpt-4o"],
      "weight": 0.5
    }
  ]
}
```

**Request**:
```bash
curl http://localhost:8080/v1/chat/completions \
  -H "x-bf-vk: vk-123" \
  -d '{"model": "gpt-4o"}'
```

**Routing Behavior**:
1. **Model validation**: Both providers have `gpt-4o` in allowed_models ✅
2. **Weighted selection**: 50% chance each
3. **Provider selected**: For example, Azure
4. **Model transformation**: `gpt-4o` → `azure/gpt-4o`
5. **Fallbacks**: `["openai/gpt-4o"]` (remaining providers)

**Special Cross-Provider Scenarios**:

<Steps>
<Step title="OpenRouter as Universal Proxy">
```json
{
  "provider_configs": [
    {
      "provider": "openrouter",
      "allowed_models": ["*"]
    }
  ]
}
```

Request `claude-3-5-sonnet`:
- Bifrost checks: `GetModelsForProvider("openrouter")`
- Finds: `anthropic/claude-3-5-sonnet` in OpenRouter catalog
- ✅ Allowed, routes to OpenRouter
</Step>

<Step title="Weighted Routing via Proxy Provider">
**Use Case**: Route 99% of OpenAI traffic through OpenRouter for cost savings, keep 1% direct for fallback

```json
{
  "provider_configs": [
    {
      "provider": "openai",
      "allowed_models": ["gpt-4o"],
      "weight": 0.01 // 1% direct to OpenAI
    },
    {
      "provider": "openrouter",
      "allowed_models": ["openai/gpt-4o"], // Provider-prefixed
      "weight": 0.99 // 99% via OpenRouter
    }
  ]
}
```

Request `gpt-4o`:
- **OpenAI check**: `"gpt-4o"` in `["gpt-4o"]` → ✅ Allowed
- **OpenRouter check**: Strips prefix from `"openai/gpt-4o"` → matches `"gpt-4o"` → ✅ Allowed
- **Weighted selection**: 99% chance → OpenRouter selected
- **Final model**: `openrouter/gpt-4o`
- **Fallbacks**: `["openai/gpt-4o"]` (1% provider as fallback)

**Why this works**: Bifrost supports provider-prefixed entries in `allowed_models`, so `"openai/gpt-4o"` matches requests for `"gpt-4o"`.
</Step>

<Step title="Vertex as Multi-Provider Gateway">
```json
{
  "provider_configs": [
    {
      "provider": "vertex",
      "allowed_models": ["claude-3-5-sonnet", "gemini-1.5-pro"]
    }
  ]
}
```

Request `claude-3-5-sonnet`:
- Model catalog lookup: `GetProvidersForModel("claude-3-5-sonnet")`
- Finds: `["anthropic", "vertex", "bedrock"]`
- Validation: `claude-3-5-sonnet` in allowed_models ✅
- Sends to Vertex as: `anthropic/claude-3-5-sonnet`
</Step>

<Step title="Groq OpenAI Compatibility">
```json
{
  "provider_configs": [
    {
      "provider": "groq",
      "allowed_models": ["gpt-3.5-turbo"]
    }
  ]
}
```

Request `gpt-3.5-turbo`:
- Special handling: Checks Groq catalog for `openai/gpt-3.5-turbo`
- ✅ Found, validation passes
- Sends to Groq as: `openai/gpt-3.5-turbo`
</Step>
</Steps>

</Tab>
</Tabs>

### How It's Used in Routing

<Tabs>
<Tab title="Governance Routing">

When a Virtual Key has `provider_configs`, governance uses the model catalog for validation:

**Wildcard allowed_models Example**:
```json
{
  "provider_configs": [
    {
      "provider": "openai",
      "allowed_models": ["*"],
      "weight": 0.5
    }
  ]
}
```

**Request Flow**:
```bash
curl http://localhost:8080/v1/chat/completions -H "x-bf-vk: vk-123" -d '{"model": "gpt-4o"}'

# 1. Governance checks: Is "gpt-4o" in GetModelsForProvider("openai")?
# 2. Catalog lookup: modelPool["openai"] contains "gpt-4o" ✅
# 3. Validation passes, provider selected
# 4. Model becomes: "openai/gpt-4o"
```

**Rejection Example**:
```bash
curl http://localhost:8080/v1/chat/completions -H "x-bf-vk: vk-123" -d '{"model": "claude-3-5-sonnet"}'

# 1. Governance checks: Is "claude-3-5-sonnet" in GetModelsForProvider("openai")?
# 2. Catalog lookup: modelPool["openai"] does NOT contain "claude-3-5-sonnet" ❌
# 3. Validation fails, request rejected
# 4. Error: "model not allowed for any configured provider"
```

</Tab>

<Tab title="Load Balancing">

When load balancing selects providers, it queries the catalog to find candidates:

**Request Flow**:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{"model": "gpt-4o", "messages": [...]}'

# 1. Load balancer: GetProvidersForModel("gpt-4o")
# 2. Catalog returns: ["openai", "azure", "groq"]
# 3. Filter by configured providers: ["openai", "azure"] (groq not configured)
# 4. Performance scoring: openai=0.95, azure=0.87
# 5. Select: openai (highest score)
# 6. Model becomes: "openai/gpt-4o"
# 7. Fallbacks: ["azure/gpt-4o"]
```

**Cross-Provider Discovery**:
```bash
curl http://localhost:8080/v1/chat/completions -d '{"model": "claude-3-5-sonnet"}'

# 1. Load balancer: GetProvidersForModel("claude-3-5-sonnet")
# 2. Catalog checks:
#    - Direct: ["anthropic"] ✅
#    - OpenRouter: Has "anthropic/claude-3-5-sonnet" ✅
#    - Vertex: Has "anthropic/claude-3-5-sonnet" ✅
#    - Bedrock: Has "anthropic.claude-3-5-sonnet-..." ✅
# 3. Catalog returns: ["anthropic", "openrouter", "vertex", "bedrock"]
# 4. Performance scoring across all four
# 5. Best performer selected
```

This is how Bifrost achieves **cross-provider routing** without manual configuration.

</Tab>
</Tabs>

<Note>
**The Model Catalog is essential for cross-provider routing**. Without it, Bifrost wouldn't know that `gpt-4o` is available from OpenAI, Azure, and Groq, or that `claude-3-5-sonnet` can be routed through Anthropic, Vertex, Bedrock, and OpenRouter. This knowledge powers both governance validation and load balancing provider discovery.
</Note>

---

## Governance-based Routing

Governance-based routing allows you to explicitly define which providers and models should handle requests for a specific Virtual Key. This method provides precise control over routing decisions.

### How It Works

When a Virtual Key has `provider_configs` defined:

1. **Request arrives** with a Virtual Key (e.g., `x-bf-vk: vk-prod-main`)
2. **Model validation**: Bifrost checks if the requested model is allowed for any configured provider
3. **Provider filtering**: Providers are filtered based on:
   - Model availability in `allowed_models`
   - Budget limits (current usage vs max limit)
   - Rate limits (tokens/requests per time window)
4. **Weighted selection**: A provider is selected using weighted random distribution
5. **Provider prefix added**: Model string becomes `provider/model` (e.g., `openai/gpt-4o`)
6. **Fallbacks created**: Remaining providers sorted by weight (descending) are added as fallbacks

### Configuration Example

```json
{
  "provider_configs": [
    {
      "provider": "openai",
      "allowed_models": ["gpt-4o", "gpt-4o-mini"],
      "weight": 0.3,
      "budget": {
        "max_limit": 100.0,
        "current_usage": 45.0
      }
    },
    {
      "provider": "azure",
      "allowed_models": ["gpt-4o"],
      "weight": 0.7,
      "rate_limit": {
        "token_max_limit": 100000,
        "token_reset_duration": "1m"
      }
    }
  ]
}
```

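
Applied to a configuration like the one above, the weighted selection and fallback ordering from "How It Works" can be sketched as follows. This is a simplified illustration, not Bifrost's implementation: `ProviderConfig` and `route` are hypothetical names, budget and rate-limit filtering are omitted, and the random draw `r` (a value in `[0, 1)`) is passed in so the result is reproducible.

```go
package main

import (
	"fmt"
	"sort"
)

// ProviderConfig is a hypothetical, trimmed-down provider config entry.
type ProviderConfig struct {
	Provider      string
	AllowedModels []string
	Weight        float64
}

func allows(c ProviderConfig, model string) bool {
	for _, m := range c.AllowedModels {
		if m == model {
			return true
		}
	}
	return false
}

// route picks a provider by weighted random choice and returns the prefixed
// model plus fallbacks sorted by weight (descending).
func route(configs []ProviderConfig, model string, r float64) (string, []string) {
	var eligible []ProviderConfig
	total := 0.0
	for _, c := range configs {
		if allows(c, model) {
			eligible = append(eligible, c)
			total += c.Weight
		}
	}
	if len(eligible) == 0 {
		return "", nil // model not allowed for any configured provider
	}
	chosen := len(eligible) - 1 // guard against floating-point edge cases
	target, cum := r*total, 0.0
	for i, c := range eligible {
		cum += c.Weight
		if target < cum {
			chosen = i
			break
		}
	}
	primary := eligible[chosen].Provider + "/" + model
	rest := append(append([]ProviderConfig{}, eligible[:chosen]...), eligible[chosen+1:]...)
	sort.SliceStable(rest, func(i, j int) bool { return rest[i].Weight > rest[j].Weight })
	fallbacks := make([]string, 0, len(rest))
	for _, c := range rest {
		fallbacks = append(fallbacks, c.Provider+"/"+model)
	}
	return primary, fallbacks
}

func main() {
	configs := []ProviderConfig{
		{Provider: "openai", AllowedModels: []string{"gpt-4o", "gpt-4o-mini"}, Weight: 0.3},
		{Provider: "azure", AllowedModels: []string{"gpt-4o"}, Weight: 0.7},
	}
	primary, fallbacks := route(configs, "gpt-4o", 0.5)
	fmt.Println(primary, fallbacks)
}
```

In production code the draw would come from a random source such as `math/rand`; injecting it here keeps the example deterministic.
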
### Request Flow

<Steps>
<Step title="Request with Virtual Key">
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "x-bf-vk: vk-prod-main" \
  -d '{"model": "gpt-4o", "messages": [...]}'
```
</Step>
<Step title="Governance Evaluation">
- OpenAI: ✅ Has `gpt-4o` in allowed_models, budget OK, weight 0.3
- Azure: ✅ Has `gpt-4o` in allowed_models, rate limit OK, weight 0.7
</Step>
<Step title="Weighted Selection">
- 70% chance → Azure
- 30% chance → OpenAI
</Step>
<Step title="Request Transformation">
```json
{
  "model": "azure/gpt-4o",
  "messages": [...],
  "fallbacks": ["openai/gpt-4o"]
}
```
</Step>
</Steps>

### Key Features

| Feature | Description |
|---------|-------------|
| **Explicit Control** | Define exactly which providers and models are accessible |
| **Budget Enforcement** | Automatically exclude providers exceeding budget limits |
| **Rate Limit Protection** | Skip providers that have hit rate limits |
| **Weighted Distribution** | Control traffic distribution with custom weights |
| **Automatic Fallbacks** | If the selected provider fails, the request is retried on the provider with the next-highest weight |

### Best Practices

<AccordionGroup>
<Accordion title="Cost Optimization">
Assign higher weights to cheaper providers for cost-sensitive workloads:
```json
{
  "provider_configs": [
    {"provider": "groq", "allowed_models": ["*"], "key_ids": ["*"], "weight": 0.7},
    {"provider": "openai", "allowed_models": ["*"], "key_ids": ["*"], "weight": 0.3}
  ]
}
```
</Accordion>

<Accordion title="Environment Separation">
Create different Virtual Keys for dev/staging/prod with different provider access:
```json
{
  "virtual_keys": [
    {
      "id": "vk-dev",
      "provider_configs": [{"provider": "ollama", "allowed_models": ["*"], "key_ids": ["*"]}]
    },
    {
      "id": "vk-prod",
      "provider_configs": [
        {"provider": "openai", "allowed_models": ["*"], "key_ids": ["*"]},
        {"provider": "azure", "allowed_models": ["*"], "key_ids": ["*"]}
      ]
    }
  ]
}
```
</Accordion>

<Accordion title="Compliance & Data Residency">
Restrict specific Virtual Keys to compliant providers:
```json
{
  "provider_configs": [
    {"provider": "azure", "allowed_models": ["gpt-4o"]},
    {"provider": "bedrock", "allowed_models": ["claude-3-sonnet-20240229"]}
  ]
}
```
</Accordion>
</AccordionGroup>

<Note>
**`allowed_models: ["*"]`**: Allows all models supported by the provider, validated via the Model Catalog (populated from pricing data and the provider's list models API). See the [Model Catalog section](#the-model-catalog) above for how syncing works. For configuration instructions, see [Governance Routing](/features/governance/routing).

**`allowed_models: []` (empty array)**: Denies **all** models, so no requests will be served for this provider config. This is deny-by-default behavior introduced in v1.5.0.

**Empty `provider_configs`**: When `provider_configs` is empty (no providers configured), **all providers are blocked** (deny-by-default). You must explicitly add provider configurations to allow traffic through a Virtual Key.
</Note>

---

## Adaptive Load Balancing
|
||
|
||
<Info>
|
||
**Enterprise Feature**: Adaptive Load Balancing is available in Bifrost Enterprise. [Contact us](https://www.getmaxim.ai/bifrost/enterprise) to enable it.
|
||
</Info>
|
||
|
||
Adaptive Load Balancing automatically optimizes routing based on real-time performance metrics. It operates at **two levels** to provide both macro-level provider selection and micro-level key optimization.
|
||
|
||
### Two-Level Architecture
|
||
|
||
<Card title="Why Two Levels?" icon="layer-group">
|
||
Separating provider selection (direction) from key selection (route) enables:
|
||
- **Provider-level optimization**: Choose the best provider for a model based on aggregate performance
|
||
- **Key-level optimization**: Within that provider, choose the best API key based on individual key performance
|
||
- **Resilience**: Even when provider is specified (by governance or user), key-level load balancing still optimizes which API key to use
|
||
</Card>
|
||
|
||
```mermaid
flowchart TB
    Request["Request: gpt-4o"]

    subgraph Level1["Level 1: Direction (Provider Selection)"]
        Cat["Model Catalog Lookup"]
        Providers["Candidate Providers:<br/>openai, azure, groq"]
        Filter["Filter by allowed_models<br/>and key availability"]
        Score["Score by performance:<br/>error rate, latency, utilization"]
        Select["Select: openai"]
    end

    subgraph Level2["Level 2: Route (Key Selection)"]
        Keys["Available OpenAI Keys:<br/>key-1, key-2, key-3"]
        KeyScore["Score each key:<br/>error rate, latency, TPM hits"]
        KeySelect["Select: key-2<br/>(best performing)"]
    end

    Request --> Cat --> Providers --> Filter --> Score --> Select
    Select --> Keys --> KeyScore --> KeySelect --> Response["Execute with<br/>openai/gpt-4o + key-2"]
```

### Level 1: Direction (Provider Selection)

**When it runs**: Only when the model string has **no** provider prefix (e.g., `gpt-4o`)

**How it works**:

1. **Model catalog lookup**: Find all configured providers that support the requested model
2. **Provider filtering**: Filter based on:
   - Allowed models from the key configuration
   - Key availability for the provider
3. **Performance scoring**: Calculate scores for each provider based on:
   - Error rates (50% weight)
   - Latency (20% weight, using MV-TACOS algorithm)
   - Utilization (5% weight)
   - Momentum bias (recovery acceleration)
4. **Smart selection**: Choose provider using weighted random with jitter and exploration
5. **Fallbacks created**: Remaining providers sorted by performance score (descending) are added as fallbacks
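
Steps 4 and 5 can be sketched as a weighted random draw followed by a sort of the remaining providers. This is a simplified illustration, not Bifrost's implementation: `select_provider` and its arguments are hypothetical names, scores are assumed to be "higher is better" weights, and jitter and exploration are left out.

```python
import random


def select_provider(model, scores, rng=random):
    """Pick a provider for `model` and list the rest as fallbacks.

    scores: provider name -> performance score (higher is better).
    At least one score must be positive, otherwise random.choices raises ValueError.
    """
    providers = list(scores)
    # Step 4 (simplified): weighted random draw, weight proportional to score.
    chosen = rng.choices(providers, weights=[scores[p] for p in providers], k=1)[0]
    # Step 5: remaining providers, best score first, become fallbacks.
    rest = sorted((p for p in providers if p != chosen),
                  key=lambda p: scores[p], reverse=True)
    return {"model": f"{chosen}/{model}",
            "fallbacks": [f"{p}/{model}" for p in rest]}


SCORES = {"openai": 0.92, "azure": 0.85, "groq": 0.65}
request = select_provider("gpt-4o", SCORES, random.Random(7))
print(request)
```

Because the draw is random, a lower-scored provider is sometimes chosen; the fallback list is still ordered by score among whichever providers remain.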

### Level 2: Route (Key Selection)

**When it runs**: **Always**, even when the provider is already specified (by governance, the user, or Level 1)

**How it works**:

1. **Get available keys**: Fetch all keys for the selected provider
2. **Filter by configuration**: Apply model restrictions from key configuration
3. **Performance scoring**: Calculate score for each key based on:
   - Error rates (recent failures)
   - Latency (response time)
   - TPM hits (rate limit violations)
   - Current state (Healthy, Degraded, Failed, Recovering)
4. **Weighted random selection**: Choose key with exploration (25% chance to probe recovering keys)
5. **Circuit breaker**: Skip keys with zero weight (TPM hits, repeated failures)
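
A minimal sketch of steps 4 and 5, assuming each key already has a non-negative weight. The function name, the `recovering` argument, and the way the 25% exploration chance is applied are assumptions made for illustration; the source does not specify how exploration and the weighted draw are combined.

```python
import random


def select_key(key_weights, recovering, rng=random, explore_prob=0.25):
    """Choose an API key name from `key_weights` (key -> weight)."""
    # Circuit breaker: keys with zero weight are skipped entirely.
    live = {k: w for k, w in key_weights.items() if w > 0}
    if not live:
        raise RuntimeError("no usable keys for this provider")
    # Exploration: occasionally probe a key that is marked as recovering.
    probes = [k for k in recovering if k in live]
    if probes and rng.random() < explore_prob:
        return rng.choice(probes)
    # Otherwise: weighted random draw over the remaining keys.
    keys = list(live)
    return rng.choices(keys, weights=[live[k] for k in keys], k=1)[0]


WEIGHTS = {"key-1": 0.95, "key-2": 0.60, "key-3": 0.0}
print(select_key(WEIGHTS, recovering=["key-2"], rng=random.Random(1)))
```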

### Scoring Algorithm

The load balancer computes a score for each provider-model combination from weighted penalty terms minus a momentum term:

$$
Score = (P_{error} \times 0.5) + (P_{latency} \times 0.2) + (P_{util} \times 0.05) - M_{momentum}
$$

<Tip>
Lower penalties = Higher weights = More traffic. The system self-heals by quickly penalizing failing routes but enabling fast recovery once issues are resolved.
</Tip>
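
As a worked instance of the formula, the sketch below plugs in made-up penalty values. It assumes each penalty is normalized to the range 0 to 1, which the source does not state, and `route_score` is a hypothetical helper name.

```python
def route_score(p_error, p_latency, p_util, momentum=0.0):
    """Weighted penalty sum from the formula above; lower means fewer penalties."""
    return p_error * 0.5 + p_latency * 0.2 + p_util * 0.05 - momentum


# A mostly healthy route versus one that is failing often.
healthy = route_score(p_error=0.02, p_latency=0.30, p_util=0.40)
failing = route_score(p_error=0.60, p_latency=0.50, p_util=0.40)
# The same failing route once a momentum term starts crediting its recovery.
recovering = route_score(p_error=0.60, p_latency=0.50, p_util=0.40, momentum=0.15)

print(healthy, failing, recovering)
```

With these inputs the failing route accumulates a much larger penalty sum than the healthy one, and the momentum term pulls it back down as it recovers, which is the self-healing behavior the tip describes.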

### Request Flow

<Steps>
<Step title="Request without Provider Prefix">
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{"model": "gpt-4o", "messages": [...]}'
```
</Step>
<Step title="Model Catalog Lookup">
Providers supporting `gpt-4o`: [openai, azure, groq]
</Step>
<Step title="Performance Evaluation">
- OpenAI: Score 0.92 (low latency, 99% success rate)
- Azure: Score 0.85 (medium latency, 98% success rate)
- Groq: Score 0.65 (high latency recently)
</Step>
<Step title="Provider Selection">
OpenAI selected (highest score within jitter band)
</Step>
<Step title="Request Transformation">
```json
{
  "model": "openai/gpt-4o",
  "messages": [...],
  "fallbacks": ["azure/gpt-4o", "groq/gpt-4o"]
}
```
</Step>
</Steps>

### Key Features

| Feature | Description |
|---------|-------------|
| **Automatic Optimization** | No manual weight tuning required |
| **Real-time Adaptation** | Weights recomputed every 5 seconds based on live metrics |
| **Circuit Breakers** | Failing routes automatically removed from rotation |
| **Fast Recovery** | 90% penalty reduction in 30 seconds after issues resolve |
| **Health States** | Routes transition between Healthy, Degraded, Failed, and Recovering |
| **Smart Exploration** | 25% chance to probe potentially recovered routes |
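
To get a feel for the Fast Recovery figure, suppose the penalty decays exponentially once a route is healthy again. The decay shape is an assumption made purely for illustration; the source only states the 90% reduction in 30 seconds. Under that assumption:

$$
p(t) = p_0 e^{-\lambda t}, \qquad p(30) = 0.1\,p_0 \;\Rightarrow\; \lambda = \frac{\ln 10}{30} \approx 0.0768\ \text{s}^{-1}, \qquad t_{1/2} = \frac{\ln 2}{\lambda} \approx 9.03\ \text{s}
$$

With weights recomputed every 5 seconds, each recomputation would then shrink the penalty by a factor of $10^{-1/6} \approx 0.68$, roughly a 32% reduction per interval.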

### Dashboard Visibility

Monitor load balancing performance in real-time:

<Frame>
<img src="/media/ui-load-balancing.png" alt="Adaptive Load Balancing Dashboard" />
</Frame>

The dashboard shows:
- Weight distribution across provider-model-key routes
- Performance metrics (error rates, latency, success rates)
- State transitions (Healthy → Degraded → Failed → Recovering)
- Actual vs expected traffic distribution

---

## How Governance and Load Balancing Interact

When both methods are available in your Bifrost deployment, they complement each other across two levels.

<Warning>
**Key Insight**: Load balancing has **two levels**:
- **Level 1 (Direction/Provider)**: Skipped when the provider is already specified
- **Level 2 (Route/Key)**: **Always runs**, even when the provider is specified

This means key-level optimization works regardless of how the provider was chosen!
</Warning>

### Execution Flow

```mermaid
flowchart TD
    Start["Request: gpt-4o"]

    subgraph Governance["Governance Plugin (HTTPTransportIntercept)"]
        HasVK{"Has VK with<br/>provider_configs?"}
        GovRoute["Provider Selection:<br/>Weighted random"]
        AddPrefix["Add prefix:<br/>azure/gpt-4o"]
    end

    subgraph LB1["Load Balancer Level 1 (Middleware)"]
        PrefixCheck{"Has provider<br/>prefix?"}
        LBProvider["Provider Selection:<br/>Performance-based"]
        AddLBPrefix["Add prefix:<br/>openai/gpt-4o"]
    end

    subgraph LB2["Load Balancer Level 2 (Key Selector)"]
        GetKeys["Get available keys<br/>for selected provider"]
        ScoreKeys["Score keys by<br/>performance metrics"]
        SelectKey["Select best key"]
    end

    Start --> HasVK
    HasVK -->|Yes| GovRoute --> AddPrefix
    HasVK -->|No| PrefixCheck
    AddPrefix --> PrefixCheck
    PrefixCheck -->|Yes, skip Level 1| GetKeys
    PrefixCheck -->|No| LBProvider --> AddLBPrefix --> GetKeys
    GetKeys --> ScoreKeys --> SelectKey --> Execute["Execute request<br/>with selected provider + key"]
```

### Execution Order

1. **HTTPTransportIntercept** (Governance Plugin - Provider Level)
   - Runs first in the request pipeline
   - Checks if Virtual Key has `provider_configs`
   - If yes: adds provider prefix (e.g., `azure/gpt-4o`)
   - **Result**: Provider is selected by governance rules

2. **Middleware** (Load Balancing Plugin - Provider Level / Direction)
   - Runs after HTTPTransportIntercept
   - Checks if model string contains "/"
   - If yes: **skips provider selection** (already determined by governance or user)
   - If no: performs performance-based provider selection
   - **Result**: Provider prefix added if not already present

3. **KeySelector** (Load Balancing - Key Level / Route)
   - **Always runs** during request execution in Bifrost core
   - Gets all keys for the selected provider
   - Filters keys based on model restrictions
   - Scores each key by performance metrics
   - Selects best key using weighted random + exploration
   - **Result**: Optimal key selected within the provider

<Info>
**Important**: Even when governance specifies `azure/gpt-4o`, load balancing **still optimizes which Azure key to use** based on performance metrics. This is the power of the two-level architecture!
</Info>
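
The provider-level part of this ordering comes down to a prefix check on the model string. The following sketch condenses it into one function; `resolve_provider` and its parameters are hypothetical, and the provider each stage would pick is passed in as an argument instead of being computed.

```python
def resolve_provider(model, governance_choice=None, lb_choice=None):
    """Return (model_string, decided_by) following the execution order above.

    governance_choice: provider picked from the Virtual Key's provider_configs, if any.
    lb_choice: provider picked by load balancing Level 1, if enabled.
    """
    if "/" in model:
        # Prefix already present: both provider-selection stages skip.
        return model, "user"
    if governance_choice is not None:
        # Governance runs first and adds the prefix.
        return f"{governance_choice}/{model}", "governance"
    if lb_choice is not None:
        # No prefix yet, so Level 1 performs provider selection.
        return f"{lb_choice}/{model}", "load-balancing"
    # Neither stage chose a provider; resolution is left to later stages.
    return model, "unresolved"


print(resolve_provider("gpt-4o", governance_choice="azure", lb_choice="openai"))
print(resolve_provider("openai/gpt-4o", governance_choice="azure"))
```

Key selection is deliberately absent here: it runs afterwards in every case, whichever branch produced the provider.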

### Example Scenarios

<Tabs>
<Tab title="Governance Only">

**Setup:**
- Virtual Key has `provider_configs` defined
- No adaptive load balancing enabled

**Request:**
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "x-bf-vk: vk-prod-main" \
  -d '{"model": "gpt-4o", "messages": [...]}'
```

**Behavior:**
1. **Governance** applies weighted provider routing → selects Azure (70% weight)
2. Model becomes `azure/gpt-4o`
3. **Standard key selection** (non-adaptive) chooses an Azure key based on static weights
4. Request forwarded to Azure with selected key

</Tab>

<Tab title="Load Balancing Only">

**Setup:**
- **No Virtual Key** (do not send `x-bf-vk`) → this is the **Load Balancing–only** setup
- A **Virtual Key with empty or missing `provider_configs`** **blocks all providers** (deny-by-default), so it is **NOT** an LB-only setup
- Adaptive load balancing enabled

**Request:**
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{"model": "gpt-4o", "messages": [...]}'
```

**Behavior:**
1. **Load Balancing Level 1** applies performance-based provider routing → selects OpenAI (best performing)
2. Model becomes `openai/gpt-4o`
3. **Load Balancing Level 2** selects best OpenAI key based on performance metrics (error rate, latency, TPM status)
4. Request forwarded to OpenAI with optimal key

</Tab>

<Tab title="Both Available (Governance + Load Balancing)">

**Setup:**
- Virtual Key has `provider_configs` defined
- Adaptive load balancing enabled
- Azure has 3 keys: `azure-key-1`, `azure-key-2`, `azure-key-3`

**Request:**
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "x-bf-vk: vk-prod-main" \
  -d '{"model": "gpt-4o", "messages": [...]}'
```

**Behavior:**
1. **Governance** applies first (respects explicit user config) → selects Azure provider
2. Model becomes `azure/gpt-4o`
3. **Load Balancing Level 1** sees "/" and **skips provider selection** (already decided)
4. **Load Balancing Level 2** still runs! Selects best Azure key based on performance:
   - `azure-key-1`: 99% success rate, 150ms avg latency → score 0.95
   - `azure-key-2`: 85% success rate, 200ms avg latency → score 0.60 (degraded)
   - `azure-key-3`: Hit TPM limit → score 0.0 (circuit broken)
   - **Selects `azure-key-1`** (highest score)
5. Request forwarded to Azure with `azure-key-1`

**Why?** Governance controls provider selection (explicit user intent), but load balancing still optimizes key selection (automatic performance optimization).

</Tab>

<Tab title="Manual Provider Selection">

**Setup:**
- Both governance and load balancing enabled
- OpenAI has 2 keys available

**Request:**
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{"model": "openai/gpt-4o", "messages": [...]}'
```

**Behavior:**
1. **Governance** sees "/" and skips
2. **Load Balancing Level 1** sees "/" and **skips provider selection**
3. **Load Balancing Level 2** still runs! Selects best OpenAI key based on current metrics
4. Request forwarded to OpenAI with optimal key

**Why?** User explicitly specified the provider, but key-level optimization still provides value by selecting the best-performing OpenAI key.

</Tab>
</Tabs>

### Provider vs Key Selection Rules

| Scenario | Provider Selection | Key Selection |
|----------|-------------------|---------------|
| VK with provider_configs | **Governance** (weighted random) | **Standard** or **Adaptive** (if enabled) |
| VK without provider_configs + LB | **Blocked** (empty = no providers allowed) | N/A |
| No VK + LB | **Load Balancing Level 1** (performance) | **Load Balancing Level 2** (performance) |
| Model with provider prefix + LB | **Skip** (already specified) | **Load Balancing Level 2** (performance) ✅ |
| No Load Balancing enabled | **Governance** or **User** or **Model Catalog** | **Standard** (static weights) |

<Note>
**Critical Insight**:
- **Provider selection** respects the hierarchy: Governance → Load Balancing Level 1 → User specification
- **Key selection** runs independently and benefits from load balancing **even when provider is predetermined**

This separation is what makes the two-level architecture so powerful!
</Note>

---

## Routing Rules (Dynamic Expression-Based Routing)

<Info>
**Position in routing pipeline**: Routing Rules execute **before governance provider selection** and can override it. They are evaluated before adaptive load balancing, enabling dynamic provider/model overrides based on runtime conditions like headers, parameters, capacity metrics, and organizational hierarchy.
</Info>

### Overview

Routing Rules provide sophisticated, expression-based control over request routing using CEL expressions. Unlike governance routing (static weights), routing rules evaluate conditions dynamically at request time.

### When Routing Rules Execute

```mermaid
flowchart TD
    Start["Request: model + provider"]

    subgraph Rules["1. Routing Rules Layer (Evaluated First)"]
        RuleMatch{"CEL Expression<br/>Matches?"}
        RuleDecision["Override:<br/>New provider/model/fallbacks"]
        NoMatch["No match:<br/>Continue to Governance"]
    end

    subgraph Gov["2. Governance Layer (if no routing rule matched)"]
        VKValidation["Virtual Key Validation"]
        GovRouting["Provider Governance Routing<br/>(weighted random)"]
    end

    subgraph LB["3. Load Balancing Layer"]
        LB1["Level 1: Provider Selection"]
        LB2["Level 2: Key Selection"]
    end

    Start --> RuleMatch
    RuleMatch -->|Yes| RuleDecision --> LB1
    RuleMatch -->|No| NoMatch --> VKValidation --> GovRouting --> LB1
    LB1 --> LB2 --> Execute["Execute with<br/>selected provider + key"]
```

### How It Works

1. **Routing rules evaluate first** in scope precedence order (VirtualKey → Team → Customer → Global)
2. **If a routing rule matches**: provider/model/fallbacks are overridden, governance provider_configs are skipped
3. **If no routing rule matches**: governance provider selection runs (weighted random)
4. **Load balancing Level 1**: skipped if provider already determined (has "/" prefix)
5. **Load balancing Level 2** (key selection): always runs to select the best key within the determined provider

### Available CEL Variables

Routing rules access request context through CEL variables:

```cel
// Request context
model                    // Requested model
provider                 // Current provider

// Headers and parameters (case-insensitive)
headers["x-tier"]        // Request header
params["region"]         // Query parameter

// Organization context
virtual_key_id           // VirtualKey ID
team_name                // Team name
customer_id              // Customer ID

// Capacity metrics (0-100 percentage)
budget_used              // Budget usage %
tokens_used              // Token rate limit usage %
request                  // Request rate limit usage %
```

### Examples

#### Route based on user tier
```cel
headers["x-tier"] == "premium"  // → openai/gpt-4o
```

#### Route to fallback when budget high
```cel
budget_used > 85  // → groq/llama-2 (cheaper)
```

#### Route by team
```cel
team_name == "ml-research"  // → anthropic/claude-3-opus
```

#### Complex multi-condition routing
```cel
headers["x-environment"] == "production" &&
tokens_used < 75 &&
team_name == "ai-platform"  // → openai/gpt-4o
```

### Scope Hierarchy

Rules are evaluated in organizational precedence order (first-match-wins):

```
1. VirtualKey scope (highest priority)
2. Team scope
3. Customer scope
4. Global scope (lowest priority)
```

Within each scope, rules are sorted by **priority** (ascending: 0 before 10).
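
The two-part ordering (scope first, then priority) with first-match-wins can be sketched as a sort followed by a linear scan. This is an illustration under assumptions: the rule dictionaries, the `SCOPE_ORDER` table, and the use of Python lambdas in place of CEL expressions are all hypothetical stand-ins.

```python
# Lower number = evaluated earlier, mirroring the scope list above.
SCOPE_ORDER = {"virtual_key": 0, "team": 1, "customer": 2, "global": 3}


def first_match(rules, ctx):
    """Return the target of the first matching rule, or None if nothing matches."""
    ordered = sorted(rules, key=lambda r: (SCOPE_ORDER[r["scope"]], r["priority"]))
    for rule in ordered:
        if rule["when"](ctx):  # stand-in for evaluating the rule's CEL expression
            return rule["target"]
    return None  # no match: governance provider selection would run next


RULES = [
    {"scope": "global", "priority": 0,
     "when": lambda c: c["budget_used"] > 85, "target": "groq/llama-2"},
    {"scope": "team", "priority": 10,
     "when": lambda c: c["team_name"] == "ml-research", "target": "anthropic/claude-3-opus"},
]

print(first_match(RULES, {"team_name": "ml-research", "budget_used": 90}))
```

In this example both rules match, and the team-scoped rule wins even though its priority number is larger, because scope is compared before priority.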

### Key Features

| Feature | Description |
|---------|-------------|
| **CEL Expressions** | Powerful, composable condition language with multiple operators |
| **Scope Hierarchy** | Rules at VirtualKey/Team/Customer/Global levels with proper precedence |
| **Dynamic Override** | Override provider and/or model based on runtime conditions |
| **Fallback Chains** | Define multiple fallback providers for automatic failover |
| **Priority Ordering** | Lower priority numbers are evaluated first within the same scope |
| **Capacity Awareness** | Access real-time budget and rate limit usage percentages |

### Integration with Governance

Routing Rules execute **before** governance provider selection and can override it:

**If a routing rule matches**:
```
Routing Rules evaluate
  ↓
Rule matches: budget_used > 85
  ↓
Override: groq/llama-2 (cheaper provider)
  ↓
Governance provider_configs SKIPPED
  ↓
Load Balancing selects best key
```

**If no routing rule matches**:
```
Routing Rules evaluate
  ↓
No matching rule
  ↓
Governance decides: azure/gpt-4o (70% weight)
  ↓
Load Balancing selects best key
```

**Key Insight**: Routing rules have higher precedence than governance provider_configs. If a routing rule matches, governance provider_configs are bypassed entirely.

### Integration with Load Balancing

Routing Rules run **before** load balancing:

```
Routing Rules decide: openai/gpt-4o
  ↓
Load Balancing Level 1: Skipped (provider already determined)
  ↓
Load Balancing Level 2: Selects best OpenAI key based on performance
```

Even when routing rules determine the provider, load balancing Level 2 still optimizes which API key to use within that provider.

### Use Cases

- **Tier-based routing**: Premium users → fast providers
- **Capacity failover**: High budget usage → cheaper providers
- **Team preferences**: Different teams → different providers
- **A/B testing**: Route subset of traffic to test models
- **Regional routing**: EU users → EU providers (data residency)
- **Complex logic**: Combine multiple conditions for sophisticated routing

### Dashboard & API

Routing rules can be configured through:

- **Dashboard**: Visual rule builder with CEL expression editor
- **API**: `POST /api/governance/routing-rules` and related endpoints
- **Scope**: Create rules at global, customer, team, or virtual key levels
- **Priority**: Order rules within scope with numeric priority

For complete documentation, see [Routing Rules Documentation](/providers/routing-rules).

---

## Choosing the Right Approach

1. **Use Governance When:**

   - ✅ **Compliance requirements**: Need to ensure data stays in specific regions or providers
   - ✅ **Cost optimization**: Want explicit control over traffic distribution to cheaper providers
   - ✅ **Budget enforcement**: Need hard limits on spending per provider
   - ✅ **Environment separation**: Different teams/apps need different provider access
   - ✅ **Rate limit management**: Need to respect provider-specific rate limits

2. **Use Routing Rules When:**

   - ✅ **Dynamic routing**: Route based on runtime request context (headers, parameters)
   - ✅ **Capacity-aware routing**: Switch to fallback when budget/rate limits high
   - ✅ **Organization-based routing**: Different rules for teams/customers
   - ✅ **A/B testing**: Route subset of traffic to test new models
   - ✅ **Complex conditions**: Multiple criteria (e.g., tier + capacity + team)

3. **Use Load Balancing When:**

   - ✅ **Performance optimization**: Want automatic routing to best-performing providers
   - ✅ **Minimal configuration**: Prefer hands-off operation with intelligent defaults
   - ✅ **Dynamic workloads**: Traffic patterns change frequently
   - ✅ **Automatic failover**: Need instant adaptation to provider issues
   - ✅ **Multi-provider redundancy**: Want seamless provider switching based on availability

4. **Use All Three Together:**

   - ✅ **Complete solution**: Governance provides base routing, routing rules add dynamic override, load balancing optimizes keys
   - ✅ **Maximum flexibility**: Different Virtual Keys use different strategies (governance vs routing rules vs load balancing)
   - ✅ **Enterprise deployments**: Complex organizations with multiple requirements per layer

---

## Additional Resources

<CardGroup cols={2}>
<Card title="Governance Routing" icon="shield-check" href="/features/governance/routing">
Configuration instructions for setting up governance routing via Virtual Keys (Web UI, API, config.json)
</Card>
<Card title="Routing Rules" icon="sliders" href="/providers/routing-rules">
Dynamic, expression-based routing using CEL expressions for runtime conditions
</Card>
<Card title="Adaptive Load Balancing" icon="brain" href="/enterprise/adaptive-load-balancing">
Technical implementation details: scoring algorithms, weight calculations, and performance characteristics
</Card>
<Card title="Virtual Keys" icon="key" href="/features/governance/virtual-keys">
Learn how to create and configure Virtual Keys
</Card>
<Card title="Fallbacks" icon="arrow-rotate-right" href="/features/fallbacks">
Understand how automatic fallbacks work across providers
</Card>
</CardGroup>