bifrost/docs/providers/provider-routing.mdx

---
title: "Provider Routing"
description: "Understand how Bifrost routes requests across AI providers using governance rules and adaptive load balancing."
icon: "route"
---

## Overview

Bifrost offers two powerful methods for routing requests across AI providers, each serving different use cases:

1. **Governance-based Routing**: Explicit, user-defined routing rules configured via Virtual Keys
2. **Adaptive Load Balancing**: Automatic, performance-based routing powered by real-time metrics (Enterprise feature)

When both methods are available, **governance takes precedence** because users have explicitly defined their routing preferences through provider configurations on Virtual Keys.

<Info>
**When to use which method:**
- Use **Governance** when you need explicit control, compliance requirements, or specific cost optimization strategies
- Use **Adaptive Load Balancing** for automatic performance optimization and minimal configuration overhead
</Info>

---

## The Model Catalog

The Model Catalog is Bifrost's central registry that tracks which models are available from which providers. It powers both governance-based routing and adaptive load balancing by maintaining an up-to-date mapping of models to providers.

<Info>
**Architecture Documentation**: For detailed technical documentation on the Model Catalog implementation, including API reference, thread safety, and advanced usage patterns, see [Model Catalog Architecture](/architecture/framework/model-catalog).
</Info>

### Data Sources

The Model Catalog combines two data sources to maintain a comprehensive and up-to-date model registry:

1. **Pricing Data** (Primary source)
   - Downloaded from a remote URL (configurable, defaults to `https://getbifrost.ai/datasheet`)
   - Contains model names, pricing tiers, and provider mappings
   - Synced to database on startup and refreshed periodically (default: every 24 hours)
   - Used for cost calculation and initial model-to-provider mapping
   - **Stored as**: In-memory map `pricingData[model|provider|mode]` for O(1) lookups

2. **Provider List Models API** (Secondary source)
   - Calls each provider's `/v1/models` endpoint during startup
   - Enriches the catalog with provider-specific models and aliases
   - Re-fetched when providers are added/updated via API or dashboard
   - Adds models that may not be in pricing data yet (e.g., newly released models)
   - **Stored as**: In-memory map `modelPool[provider][]models`

<Info>
**Why two sources?** Pricing data provides comprehensive model coverage with cost information, while the List Models API ensures you can use newly released models immediately without waiting for pricing data updates.
</Info>

### How Model Availability is Determined

Bifrost uses a sophisticated multi-step process to determine if a model is available for a provider:

<AccordionGroup>
  <Accordion title="GetModelsForProvider(provider)">
    **Purpose**: Find all models available for a specific provider

    **Lookup Process**:
    1. Check `modelPool[provider]` for direct matches
    2. Return all models in that provider's slice

    **Example**:
    ```go
    models := GetModelsForProvider("openai")
    // Returns: ["gpt-4o", "gpt-4o-mini", "gpt-4-turbo", "gpt-3.5-turbo", ...]
    ```

    **Used by**:
    - Routing Methods to validate `allowed_models`
    - Dashboard model selector dropdowns
    - API responses for `/v1/models?provider=openai`
  </Accordion>

  <Accordion title="GetProvidersForModel(model)">
    **Purpose**: Find all providers that support a specific model

    **Lookup Process**:
    1. **Direct lookup**: Check each provider's model list in `modelPool`
    2. **Cross-provider resolution**: Apply special handling for proxy providers

    **Special Cross-Provider Rules**:

    <Steps>
      <Step title="OpenRouter Format">
        If model is not found directly, check if `provider/model` exists in OpenRouter
        ```go
        // Request: claude-3-5-sonnet
        // Checks: openrouter models for "anthropic/claude-3-5-sonnet"
        // Result: Adds "openrouter" to providers list
        ```
      </Step>

      <Step title="Vertex Format">
        If model is not found directly, check if `provider/model` exists in Vertex
        ```go
        // Request: claude-3-5-sonnet
        // Checks: vertex models for "anthropic/claude-3-5-sonnet"
        // Result: Adds "vertex" to providers list
        ```
      </Step>

      <Step title="Groq OpenAI Compatibility">
        For GPT models, check if `openai/model` exists in Groq
        ```go
        // Request: gpt-3.5-turbo
        // Checks: groq models for "openai/gpt-3.5-turbo"
        // Result: Adds "groq" to providers list
        ```
      </Step>

      <Step title="Bedrock Claude Models">
        For Claude models, check Bedrock with flexible matching
        ```go
        // Request: claude-3-5-sonnet
        // Checks: bedrock models containing "claude-3-5-sonnet"
        // Matches: "anthropic.claude-3-5-sonnet-20240620-v1:0"
        // Result: Adds "bedrock" to providers list
        ```
      </Step>
    </Steps>

    **Example**:
    ```go
    providers := GetProvidersForModel("claude-3-5-sonnet")
    // Returns: ["anthropic", "vertex", "bedrock", "openrouter"]
    // Even though the request was just "claude-3-5-sonnet"!
    ```

    **Used by**:
    - Load balancing to find candidate providers
    - Fallback generation
    - Model validation in requests
  </Accordion>

  <Accordion title="Pricing Lookup with Fallbacks">
    **Purpose**: Get pricing data for cost calculation and model validation

    **Lookup Key**: `model|provider|mode` (e.g., `gpt-4o|openai|chat`)

    **Fallback Chain**:
    1. **Primary lookup**: `model|provider|requestType`
    2. **Gemini → Vertex**: If Gemini not found, try Vertex with same model
    3. **Vertex format stripping**: For `provider/model`, strip prefix and retry
    4. **Bedrock prefix handling**: For Claude models, try with `anthropic.` prefix
    5. **Responses → Chat**: If Responses mode not found, try Chat mode

    **Example Flow**:
    ```go
    // Request: claude-3-5-sonnet on Gemini (Responses API)

    // 1. Try: claude-3-5-sonnet|gemini|responses → Not found
    // 2. Try: claude-3-5-sonnet|vertex|responses → Not found
    // 3. Try: claude-3-5-sonnet|vertex|chat → ✅ Found!

    // Pricing returned from vertex/chat mode
    ```

    **Used by**:
    - Cost calculation for billing
    - Model validation during routing
    - Budget enforcement
  </Accordion>
</AccordionGroup>

### Syncing Behavior

<AccordionGroup>
  <Accordion title="Initial Sync (Startup)">
    When Bifrost starts, it performs a complete model catalog initialization:

    **Step-by-step process** (from `server.go:Bootstrap()`):

    <Steps>
      <Step title="Load Pricing Data">
        ```go
        // 1. Download from URL
        pricingData := loadPricingFromURL(ctx)

        // 2. Store in database (if configStore available)
        configStore.CreateModelPrices(ctx, pricingData)

        // 3. Load into memory cache
        mc.pricingData = map[string]TableModelPricing{...}
        ```
      </Step>

      <Step title="Populate Initial Model Pool">
        ```go
        // Build modelPool from pricing data
        mc.populateModelPoolFromPricingData()
        // Result: modelPool[provider] = [models from pricing]
        ```
      </Step>

      <Step title="Fetch Dynamic Models">
        ```go
        // Call ListAllModels for all configured providers
        modelData, err := client.ListAllModels(ctx, nil)

        // Add results to model pool
        mc.AddModelDataToPool(modelData)
        // Result: modelPool enriched with provider-specific models
        ```
      </Step>

      <Step title="Handle Failures Gracefully">
        If list models API fails for a provider:
        ```json
        {"level":"warn","message":"failed to list models for provider ollama: connection refused"}
        ```
        - Logged as warning, **does not stop startup**
        - Provider remains usable with models from pricing data
        - Can be manually refreshed later via API
      </Step>
    </Steps>

    **Result**: Bifrost is ready with a comprehensive model catalog combining both sources.
  </Accordion>

  <Accordion title="Ongoing Sync (Background)">
    While Bifrost is running, the catalog stays up-to-date through background workers:

    **Pricing Data Sync**:
    - Background worker runs every **1 hour** (ticker interval)
    - Checks if **24 hours** have elapsed since last sync (configurable)
    - If yes, downloads fresh pricing data and updates database + memory cache
    - Timer resets after successful sync

    **List Models API Sync**:
    Triggered by these events:
    1. **Provider Added**: When a new provider is configured
       ```bash
       POST /api/v1/providers
       # Automatically calls ListModels for the new provider
       ```

    2. **Provider Updated**: When provider config changes (keys, endpoints, etc.)
       ```bash
       PUT /api/v1/providers/{provider}
       # Refetches models to detect changes
       ```

    3. **Manual Refresh**: Via API endpoint
       ```bash
       POST /api/v1/providers/{provider}/models/refetch
       # Explicitly refetches models for a provider
       ```

    4. **Manual Delete + Refetch**: Clear and reload models
       ```bash
       DELETE /api/v1/providers/{provider}/models
       POST /api/v1/providers/{provider}/models/refetch
       # Useful when models are out of sync
       ```

    **Failure Handling**:
    - Pricing URL fails but database has data → Use cached database records
    - Pricing URL fails and no database data → Error logged, existing memory cache retained
    - List models API fails → Log warning, retain existing model pool entries
  </Accordion>

  <Accordion title="Fallback Strategy">
    Bifrost's multi-layered approach ensures high availability:

    **Layer 1: Pricing Data Persistence**
    ```
    URL fails → Database → Memory cache → Continue operation
    ```

    **Layer 2: Model Pool Redundancy**
    ```
    ListModels fails → Pricing data models → Continue with reduced catalog
    ```

    **Layer 3: Runtime Validation**
    ```
    Model not in catalog → Special cross-provider rules → May still work
    ```

    **Example Scenario**:
    ```
    Situation:
    - Pricing URL is down
    - OpenAI ListModels API is down
    - User requests gpt-4o on OpenAI

    Bifrost's Response:
    1. ✅ Pricing data available from database (last sync 12h ago)
    2. ✅ Model pool has gpt-4o from previous ListModels call
    3. ✅ Request proceeds normally
    4. 📊 Cost calculated from cached pricing data
    ```

    This design ensures **requests never fail due to sync issues** as long as one data source is available.
  </Accordion>
</AccordionGroup>

### Allowed Models Behavior with Examples

The `allowed_models` field in provider configs controls which models can be used with that provider. Understanding its behavior is crucial for governance routing.

<Tabs>
<Tab title="Wildcard allowed_models (Use Catalog)">

**Configuration**:
```json
{
  "provider_configs": [
    {
      "provider": "openai",
      "allowed_models": ["*"],
      "weight": 1.0
    }
  ]
}
```

**Behavior**:
- Bifrost calls `GetModelsForProvider("openai")`
- Returns all models in `modelPool["openai"]`
- Request validated against catalog

**Examples**:
```bash
# ✅ Allowed (in catalog)
curl -H "x-bf-vk: vk-123" -d '{"model": "gpt-4o"}'

# ✅ Allowed (in catalog)
curl -H "x-bf-vk: vk-123" -d '{"model": "gpt-3.5-turbo"}'

# ❌ Rejected (not in OpenAI catalog)
curl -H "x-bf-vk: vk-123" -d '{"model": "claude-3-5-sonnet"}'
```

**Use Cases**:
- Default behavior for most deployments
- Automatically stays up-to-date with provider's model offerings
- No manual model list maintenance required

<Warning>
Using `"allowed_models": []` (empty array) means **deny all models** — no requests will be served. Use `["*"]` to allow all models via the catalog.
</Warning>

</Tab>

<Tab title="Explicit allowed_models (Strict Control)">

**Configuration**:
```json
{
  "provider_configs": [
    {
      "provider": "openai",
      "allowed_models": ["gpt-4o", "gpt-4o-mini"],  // Only these two
      "weight": 1.0
    },
    {
      "provider": "anthropic",
      "allowed_models": ["claude-3-5-sonnet-20241022"],  // Specific version
      "weight": 1.0
    }
  ]
}
```

**Behavior**:
- Bifrost validates request model against explicit list
- Catalog is **ignored** for this provider
- Supports both direct matches and provider-prefixed entries
- Case-sensitive matching

**Examples**:
```bash
# ✅ Allowed (in explicit list)
curl -H "x-bf-vk: vk-123" -d '{"model": "gpt-4o"}'

# ❌ Rejected (not in explicit list)
curl -H "x-bf-vk: vk-123" -d '{"model": "gpt-4-turbo"}'
# Even though gpt-4-turbo is in the OpenAI catalog!

# ✅ Allowed (exact match for Anthropic)
curl -H "x-bf-vk: vk-123" -d '{"model": "claude-3-5-sonnet-20241022"}'

# ❌ Rejected (version mismatch)
curl -H "x-bf-vk: vk-123" -d '{"model": "claude-3-5-sonnet-20240620"}'
```

**Provider-Prefixed Entries**:

You can also use provider-prefixed model names in `allowed_models`. Bifrost will strip the prefix and match against the requested model:

```json
{
  "provider_configs": [
    {
      "provider": "openrouter",
      "allowed_models": ["openai/gpt-4o", "anthropic/claude-3-5-sonnet"],
      "weight": 1.0
    }
  ]
}
```

**How it works**:
```bash
# Request without prefix
curl -H "x-bf-vk: vk-123" -d '{"model": "gpt-4o"}'

# 1. Checks: "openai/gpt-4o" in allowed_models
# 2. Strips prefix: "openai/gpt-4o" → "gpt-4o"
# 3. Compares: "gpt-4o" == "gpt-4o" ✅
# 4. Result: Allowed and routed to OpenRouter
```

This is particularly useful for proxy providers (OpenRouter, Vertex) where you want to explicitly control which upstream models are accessible.

**Use Cases**:
- Compliance requirements (only approved models)
- Cost control (restrict to cheaper models)
- Version pinning (prevent automatic updates)
- Testing specific model versions
- **Explicit cross-provider routing** (e.g., only allow OpenAI models via OpenRouter)

</Tab>

<Tab title="Aliases (Key-Level)">

**Key Concept**: Aliases are **key-level** mappings that allow user-friendly model names to map to provider-specific identifiers.

**How Aliases Work**:
- Defined at the **Key level**, not Virtual Key level
- Structure: `aliases: {"user-facing-name": "provider-specific-id"}`
- **Alias key** (left side): User-facing model name used in requests
- **Provider ID** (right side): Provider-specific identifier sent to the API

**Azure OpenAI Example**:

Provider configuration with alias mapping:
```json
{
  "providers": {
    "azure": {
      "keys": [
        {
          "name": "azure-prod-key",
          "value": "your-api-key",
          "aliases": {
            "gpt-4o": "my-prod-gpt4o-deployment",
            "gpt-4o-mini": "my-mini-deployment"
          },
          "azure_key_config": {
            "endpoint": "https://your-resource.openai.azure.com"
          }
        }
      ]
    }
  }
}
```

**What Happens**:
1. **Allowed models derived from aliases**: `["gpt-4o", "gpt-4o-mini"]`
2. **User requests with alias**: `{"model": "gpt-4o"}`
3. **Bifrost validates**: `gpt-4o` is in derived allowed models ✅
4. **Bifrost resolves alias**: `gpt-4o` → `my-prod-gpt4o-deployment`
5. **Sent to Azure**: Uses `my-prod-gpt4o-deployment` as the deployment name
6. **Pricing lookup**: If pricing for resolved ID not found, falls back to alias `gpt-4o`

**Bedrock Example with Inference Profiles**:

```json
{
  "providers": {
    "bedrock": {
      "keys": [
        {
          "name": "bedrock-key",
          "aliases": {
            "claude-sonnet": "us.anthropic.claude-3-5-sonnet-20241022-v2:0",
            "claude-opus": "us.anthropic.claude-3-opus-20240229-v1:0"
          },
          "bedrock_key_config": {
            "access_key": "your-access-key",
            "secret_key": "your-secret-key",
            "region": "us-east-1"
          }
        }
      ]
    }
  }
}
```

**What Happens**:
1. **Allowed models**: `["claude-sonnet", "claude-opus"]` (from alias keys)
2. **User requests**: `{"model": "claude-sonnet"}`
3. **Bifrost validates**: `claude-sonnet` in allowed models ✅
4. **Resolves alias**: `claude-sonnet` → `us.anthropic.claude-3-5-sonnet-20241022-v2:0`
5. **Sent to Bedrock**: Full ARN used in API call

**Priority of Model Restrictions**:

When determining allowed models for a key:
```
1. If key.models is NOT empty → Use key.models
2. Else if aliases exist → Use alias keys
3. Else → All models allowed (use Model Catalog)
```

**Example with Both**:
```json
{
  "keys": [
    {
      "models": ["gpt-4o", "gpt-3.5-turbo"],  // Explicit restriction
      "aliases": {
        "gpt-4o": "my-deployment",
        "gpt-4-turbo": "another-deployment"  // NOT accessible!
      },
      "azure_key_config": {
        "endpoint": "https://your-resource.openai.azure.com"
      }
    }
  ]
}
```
Result: Only `["gpt-4o", "gpt-3.5-turbo"]` allowed (models field takes priority)

**Vertex Example** (similar pattern):
```json
{
  "keys": [
    {
      "aliases": {
        "claude-3-5-sonnet": "anthropic/claude-3-5-sonnet@20241022",
        "gemini-pro": "google/gemini-1.5-pro"
      },
      "vertex_key_config": {
        "project_id": "my-project",
        "region": "us-central1"
      }
    }
  ]
}
```

**Use Cases for Aliases**:
- **Azure**: Map generic model names to specific deployment names in your Azure resource
- **Bedrock**: Use short aliases for long inference profile ARNs
- **Vertex**: Map to specific model versions or regional endpoints
- **Multi-environment**: Different aliases per key (dev/staging/prod)

**Key Insight**:
```
User Request: {"model": "gpt-4o"}
              ↓
Validation: Check if "gpt-4o" in allowed models (derived from aliases)
              ↓
Mapping: aliases["gpt-4o"] → "my-prod-gpt4o-deployment"
              ↓
API Call: Uses "my-prod-gpt4o-deployment" as deployment ID
              ↓
Pricing: Falls back to "gpt-4o" if resolved ID not in pricing data
```

This allows user-friendly model names in requests while supporting provider-specific identifier patterns at the key level.

</Tab>

<Tab title="Cross-Provider Model Routing">

**Configuration**:
```json
{
  "provider_configs": [
    {
      "provider": "openai",
      "allowed_models": ["gpt-4o"],
      "weight": 0.5
    },
    {
      "provider": "azure",
      "allowed_models": ["gpt-4o"],
      "weight": 0.5
    }
  ]
}
```

**Request**:
```bash
curl -H "x-bf-vk: vk-123" \
     -d '{"model": "gpt-4o"}'
```

**Routing Behavior**:
1. **Model validation**: Both providers have `gpt-4o` in allowed_models ✅
2. **Weighted selection**: 50% chance each
3. **Provider selected**: Let's say Azure
4. **Model transformation**: `gpt-4o` → `azure/gpt-4o`
5. **Fallbacks**: `["openai/gpt-4o"]` (remaining providers)

**Special Cross-Provider Scenarios**:

<Steps>
  <Step title="OpenRouter as Universal Proxy">
    ```json
    {
      "provider_configs": [
        {
          "provider": "openrouter",
          "allowed_models": ["*"]
        }
      ]
    }
    ```

    Request `claude-3-5-sonnet`:
    - Bifrost checks: `GetModelsForProvider("openrouter")`
    - Finds: `anthropic/claude-3-5-sonnet` in OpenRouter catalog
    - ✅ Allowed, routes to OpenRouter
  </Step>

  <Step title="Weighted Routing via Proxy Provider">
    **Use Case**: Route 99% of OpenAI traffic through OpenRouter for cost savings, keep 1% direct for fallback

    ```json
    {
      "provider_configs": [
        {
          "provider": "openai",
          "allowed_models": ["gpt-4o"],
          "weight": 0.01  // 1% direct to OpenAI
        },
        {
          "provider": "openrouter",
          "allowed_models": ["openai/gpt-4o"],  // Provider-prefixed
          "weight": 0.99  // 99% via OpenRouter
        }
      ]
    }
    ```

    Request `gpt-4o`:
    - **OpenAI check**: `"gpt-4o"` in `["gpt-4o"]` → ✅ Allowed
    - **OpenRouter check**: Strips prefix from `"openai/gpt-4o"` → matches `"gpt-4o"` → ✅ Allowed
    - **Weighted selection**: 99% chance → OpenRouter selected
    - **Final model**: `openrouter/gpt-4o`
    - **Fallbacks**: `["openai/gpt-4o"]` (1% provider as fallback)

    **Why this works**: Bifrost now supports provider-prefixed entries in `allowed_models`, so `"openai/gpt-4o"` matches requests for `"gpt-4o"`.
  </Step>

  <Step title="Vertex as Multi-Provider Gateway">
    ```json
    {
      "provider_configs": [
        {
          "provider": "vertex",
          "allowed_models": ["claude-3-5-sonnet", "gemini-1.5-pro"]
        }
      ]
    }
    ```

    Request `claude-3-5-sonnet`:
    - Model catalog lookup: `GetProvidersForModel("claude-3-5-sonnet")`
    - Finds: `["anthropic", "vertex", "bedrock"]`
    - Validation: `claude-3-5-sonnet` in allowed_models ✅
    - Sends to Vertex as: `anthropic/claude-3-5-sonnet`
  </Step>

  <Step title="Groq OpenAI Compatibility">
    ```json
    {
      "provider_configs": [
        {
          "provider": "groq",
          "allowed_models": ["gpt-3.5-turbo"]
        }
      ]
    }
    ```

    Request `gpt-3.5-turbo`:
    - Special handling: Checks Groq catalog for `openai/gpt-3.5-turbo`
    - ✅ Found, validation passes
    - Sends to Groq as: `openai/gpt-3.5-turbo`
  </Step>
</Steps>

</Tab>
</Tabs>

### How It's Used in Routing

<Tabs>
<Tab title="Governance Routing">

When a Virtual Key has `provider_configs`, governance uses the model catalog for validation:

**Wildcard allowed_models Example**:
```json
{
  "provider_configs": [
    {
      "provider": "openai",
      "allowed_models": ["*"],
      "weight": 0.5
    }
  ]
}
```

**Request Flow**:
```bash
curl -H "x-bf-vk: vk-123" -d '{"model": "gpt-4o"}'

# 1. Governance checks: Is "gpt-4o" in GetModelsForProvider("openai")?
# 2. Catalog lookup: modelPool["openai"] contains "gpt-4o" ✅
# 3. Validation passes, provider selected
# 4. Model becomes: "openai/gpt-4o"
```

**Rejection Example**:
```bash
curl -H "x-bf-vk: vk-123" -d '{"model": "claude-3-5-sonnet"}'

# 1. Governance checks: Is "claude-3-5-sonnet" in GetModelsForProvider("openai")?
# 2. Catalog lookup: modelPool["openai"] does NOT contain "claude-3-5-sonnet" ❌
# 3. Validation fails, request rejected
# 4. Error: "model not allowed for any configured provider"
```

</Tab>

<Tab title="Load Balancing">

When load balancing selects providers, it queries the catalog to find candidates:

**Request Flow**:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{"model": "gpt-4o", "messages": [...]}'

# 1. Load balancer: GetProvidersForModel("gpt-4o")
# 2. Catalog returns: ["openai", "azure", "groq"]
# 3. Filter by configured providers: ["openai", "azure"]  (groq not configured)
# 4. Performance scoring: openai=0.95, azure=0.87
# 5. Select: openai (highest score)
# 6. Model becomes: "openai/gpt-4o"
# 7. Fallbacks: ["azure/gpt-4o"]
```

**Cross-Provider Discovery**:
```bash
curl -d '{"model": "claude-3-5-sonnet"}'

# 1. Load balancer: GetProvidersForModel("claude-3-5-sonnet")
# 2. Catalog checks:
#    - Direct: ["anthropic"] ✅
#    - OpenRouter: Has "anthropic/claude-3-5-sonnet" ✅
#    - Vertex: Has "anthropic/claude-3-5-sonnet" ✅
#    - Bedrock: Has "anthropic.claude-3-5-sonnet-..." ✅
# 3. Catalog returns: ["anthropic", "openrouter", "vertex", "bedrock"]
# 4. Performance scoring across all four
# 5. Best performer selected
```

This is how Bifrost achieves **intelligent cross-provider routing** without manual configuration.

</Tab>
</Tabs>

<Note>
**Model Catalog is essential for cross-provider routing**. Without it, Bifrost wouldn't know that `gpt-4o` is available from OpenAI, Azure, and Groq, or that `claude-3-5-sonnet` can be routed through Anthropic, Vertex, Bedrock, and OpenRouter. This knowledge powers both governance validation and load balancing provider discovery.
</Note>

---

## Governance-based Routing

Governance-based routing allows you to explicitly define which providers and models should handle requests for a specific Virtual Key. This method provides precise control over routing decisions.

### How It Works

When a Virtual Key has `provider_configs` defined:

1. **Request arrives** with a Virtual Key (e.g., `x-bf-vk: vk-prod-main`)
2. **Model validation**: Bifrost checks if the requested model is allowed for any configured provider
3. **Provider filtering**: Providers are filtered based on:
   - Model availability in `allowed_models`
   - Budget limits (current usage vs max limit)
   - Rate limits (tokens/requests per time window)
4. **Weighted selection**: A provider is selected using weighted random distribution
5. **Provider prefix added**: Model string becomes `provider/model` (e.g., `openai/gpt-4o`)
6. **Fallbacks created**: Remaining providers sorted by weight (descending) are added as fallbacks

### Configuration Example

```json
{
  "provider_configs": [
    {
      "provider": "openai",
      "allowed_models": ["gpt-4o", "gpt-4o-mini"],
      "weight": 0.3,
      "budget": {
        "max_limit": 100.0,
        "current_usage": 45.0
      }
    },
    {
      "provider": "azure",
      "allowed_models": ["gpt-4o"],
      "weight": 0.7,
      "rate_limit": {
        "token_max_limit": 100000,
        "token_reset_duration": "1m"
      }
    }
  ]
}
```

### Request Flow

<Steps>
  <Step title="Request with Virtual Key">
    ```bash
    curl -X POST http://localhost:8080/v1/chat/completions \
      -H "x-bf-vk: vk-prod-main" \
      -d '{"model": "gpt-4o", "messages": [...]}'
    ```
  </Step>
  <Step title="Governance Evaluation">
    - OpenAI: ✅ Has `gpt-4o` in allowed_models, budget OK, weight 0.3
    - Azure: ✅ Has `gpt-4o` in allowed_models, rate limit OK, weight 0.7
  </Step>
  <Step title="Weighted Selection">
    - 70% chance → Azure
    - 30% chance → OpenAI
  </Step>
  <Step title="Request Transformation">
    ```json
    {
      "model": "azure/gpt-4o",
      "messages": [...],
      "fallbacks": ["openai/gpt-4o"]
    }
    ```
  </Step>
</Steps>

### Key Features

| Feature | Description |
|---------|-------------|
| **Explicit Control** | Define exactly which providers and models are accessible |
| **Budget Enforcement** | Automatically exclude providers exceeding budget limits |
| **Rate Limit Protection** | Skip providers that have hit rate limits |
| **Weighted Distribution** | Control traffic distribution with custom weights |
| **Automatic Fallbacks** | Failed providers automatically retry with next highest weight |

### Best Practices

<AccordionGroup>
  <Accordion title="Cost Optimization">
    Assign higher weights to cheaper providers for cost-sensitive workloads:
    ```json
    {
      "provider_configs": [
        {"provider": "groq", "allowed_models": ["*"], "key_ids": ["*"], "weight": 0.7},
        {"provider": "openai", "allowed_models": ["*"], "key_ids": ["*"], "weight": 0.3}
      ]
    }
    ```
  </Accordion>

  <Accordion title="Environment Separation">
    Create different Virtual Keys for dev/staging/prod with different provider access:
    ```json
    {
      "virtual_keys": [
        {
          "id": "vk-dev",
          "provider_configs": [{"provider": "ollama", "allowed_models": ["*"], "key_ids": ["*"]}]
        },
        {
          "id": "vk-prod",
          "provider_configs": [
            {"provider": "openai", "allowed_models": ["*"], "key_ids": ["*"]},
            {"provider": "azure", "allowed_models": ["*"], "key_ids": ["*"]}
          ]
        }
      ]
    }
    ```
  </Accordion>

  <Accordion title="Compliance & Data Residency">
    Restrict specific Virtual Keys to compliant providers:
    ```json
    {
      "provider_configs": [
        {"provider": "azure", "allowed_models": ["gpt-4o"]},
        {"provider": "bedrock", "allowed_models": ["claude-3-sonnet-20240229"]}
      ]
    }
    ```
  </Accordion>
</AccordionGroup>

<Note>
**`allowed_models: ["*"]`**: Allows all models supported by the provider, validated via the Model Catalog (populated from pricing data and the provider's list models API). See the [Model Catalog section](#the-model-catalog) above for how syncing works. For configuration instructions, see [Governance Routing](/features/governance/routing).

**`allowed_models: []` (empty array)**: Denies **all** models — no requests will be served for this provider config. This is deny-by-default behavior introduced in v1.5.0.

**Empty `provider_configs`**: When `provider_configs` is empty (no providers configured), **all providers are blocked** (deny-by-default). You must explicitly add provider configurations to allow traffic through a Virtual Key.
</Note>

---

## Adaptive Load Balancing

<Info>
**Enterprise Feature**: Adaptive Load Balancing is available in Bifrost Enterprise. [Contact us](https://www.getmaxim.ai/bifrost/enterprise) to enable it.
</Info>

Adaptive Load Balancing automatically optimizes routing based on real-time performance metrics. It operates at **two levels** to provide both macro-level provider selection and micro-level key optimization.

### Two-Level Architecture

<Card title="Why Two Levels?" icon="layer-group">
Separating provider selection (direction) from key selection (route) enables:
- **Provider-level optimization**: Choose the best provider for a model based on aggregate performance
- **Key-level optimization**: Within that provider, choose the best API key based on individual key performance
- **Resilience**: Even when provider is specified (by governance or user), key-level load balancing still optimizes which API key to use
</Card>

```mermaid
flowchart TB
    Request["Request: gpt-4o"]

    subgraph Level1["Level 1: Direction (Provider Selection)"]
        Cat["Model Catalog Lookup"]
        Providers["Candidate Providers:<br/>openai, azure, groq"]
        Filter["Filter by allowed_models<br/>and key availability"]
        Score["Score by performance:<br/>error rate, latency, utilization"]
        Select["Select: openai"]
    end

    subgraph Level2["Level 2: Route (Key Selection)"]
        Keys["Available OpenAI Keys:<br/>key-1, key-2, key-3"]
        KeyScore["Score each key:<br/>error rate, latency, TPM hits"]
        KeySelect["Select: key-2<br/>(best performing)"]
    end

    Request --> Cat --> Providers --> Filter --> Score --> Select
    Select --> Keys --> KeyScore --> KeySelect --> Response["Execute with<br/>openai/gpt-4o + key-2"]
```

### Level 1: Direction (Provider Selection)

**When it runs**: Only when the model string has **no** provider prefix (e.g., `gpt-4o`)

**How it works**:

1. **Model catalog lookup**: Find all configured providers that support the requested model
2. **Provider filtering**: Filter based on:
   - Allowed models from keys configuration
   - Keys availability for the provider
3. **Performance scoring**: Calculate scores for each provider based on:
   - Error rates (50% weight)
   - Latency (20% weight, using MV-TACOS algorithm)
   - Utilization (5% weight)
   - Momentum bias (recovery acceleration)
4. **Smart selection**: Choose provider using weighted random with jitter and exploration
5. **Fallbacks created**: Remaining providers sorted by performance score (descending) are added as fallbacks

### Level 2: Route (Key Selection)

**When it runs**: **Always**, even when provider is already specified (by governance, user, or Level 1)

**How it works**:

1. **Get available keys**: Fetch all keys for the selected provider
2. **Filter by configuration**: Apply model restrictions from key configuration
3. **Performance scoring**: Calculate score for each key based on:
   - Error rates (recent failures)
   - Latency (response time)
   - TPM hits (rate limit violations)
   - Current state (Healthy, Degraded, Failed, Recovering)
4. **Weighted random selection**: Choose key with exploration (25% chance to probe recovering keys)
5. **Circuit breaker**: Skip keys with zero weight (TPM hits, repeated failures)

### Scoring Algorithm

The load balancer computes a performance score for each provider-model combination:

$$
Score = (P_{error} \times 0.5) + (P_{latency} \times 0.2) + (P_{util} \times 0.05) - M_{momentum}
$$

<Tip>
Lower penalties = Higher weights = More traffic. The system self-heals by quickly penalizing failing routes but enabling fast recovery once issues are resolved.
</Tip>

### Request Flow

<Steps>
  <Step title="Request without Provider Prefix">
    ```bash
    curl -X POST http://localhost:8080/v1/chat/completions \
      -d '{"model": "gpt-4o", "messages": [...]}'
    ```
  </Step>
  <Step title="Model Catalog Lookup">
    Providers supporting `gpt-4o`: [openai, azure, groq]
  </Step>
  <Step title="Performance Evaluation">
    - OpenAI: Score 0.92 (low latency, 99% success rate)
    - Azure: Score 0.85 (medium latency, 98% success rate)
    - Groq: Score 0.65 (high latency recently)
  </Step>
  <Step title="Provider Selection">
    OpenAI selected (highest score within jitter band)
  </Step>
  <Step title="Request Transformation">
    ```json
    {
      "model": "openai/gpt-4o",
      "messages": [...],
      "fallbacks": ["azure/gpt-4o", "groq/gpt-4o"]
    }
    ```
  </Step>
</Steps>

### Key Features

| Feature | Description |
|---------|-------------|
| **Automatic Optimization** | No manual weight tuning required |
| **Real-time Adaptation** | Weights recomputed every 5 seconds based on live metrics |
| **Circuit Breakers** | Failing routes automatically removed from rotation |
| **Fast Recovery** | 90% penalty reduction in 30 seconds after issues resolve |
| **Health States** | Routes transition between Healthy, Degraded, Failed, and Recovering |
| **Smart Exploration** | 25% chance to probe potentially recovered routes |


### Dashboard Visibility

Monitor load balancing performance in real-time:

<Frame>
  <img src="/media/ui-load-balancing.png" alt="Adaptive Load Balancing Dashboard" />
</Frame>

The dashboard shows:
- Weight distribution across provider-model-key routes
- Performance metrics (error rates, latency, success rates)
- State transitions (Healthy → Degraded → Failed → Recovering)
- Actual vs expected traffic distribution

---

## How Governance and Load Balancing Interact

When both methods are available in your Bifrost deployment, they work together in a complementary way across two levels.

<Warning>
**Key Insight**: Load balancing has **two levels**:
- **Level 1 (Direction/Provider)**: Skipped when provider is already specified
- **Level 2 (Route/Key)**: **Always runs**, even when provider is specified

This means key-level optimization works regardless of how the provider was chosen!
</Warning>

### Execution Flow

```mermaid
flowchart TD
    Start["Request: gpt-4o"]

    subgraph Governance["Governance Plugin (HTTPTransportIntercept)"]
        HasVK{"Has VK with<br/>provider_configs?"}
        GovRoute["Provider Selection:<br/>Weighted random"]
        AddPrefix["Add prefix:<br/>azure/gpt-4o"]
    end

    subgraph LB1["Load Balancer Level 1 (Middleware)"]
        PrefixCheck{"Has provider<br/>prefix?"}
        LBProvider["Provider Selection:<br/>Performance-based"]
        AddLBPrefix["Add prefix:<br/>openai/gpt-4o"]
    end

    subgraph LB2["Load Balancer Level 2 (Key Selector)"]
        GetKeys["Get available keys<br/>for selected provider"]
        ScoreKeys["Score keys by<br/>performance metrics"]
        SelectKey["Select best key"]
    end

    Start --> HasVK
    HasVK -->|Yes| GovRoute --> AddPrefix
    HasVK -->|No| PrefixCheck
    AddPrefix --> PrefixCheck
    PrefixCheck -->|Yes, skip Level 1| GetKeys
    PrefixCheck -->|No| LBProvider --> AddLBPrefix --> GetKeys
    GetKeys --> ScoreKeys --> SelectKey --> Execute["Execute request<br/>with selected provider + key"]
```

### Execution Order

1. **HTTPTransportIntercept** (Governance Plugin - Provider Level)
   - Runs first in the request pipeline
   - Checks if Virtual Key has `provider_configs`
   - If yes: adds provider prefix (e.g., `azure/gpt-4o`)
   - **Result**: Provider is selected by governance rules

2. **Middleware** (Load Balancing Plugin - Provider Level / Direction)
   - Runs after HTTPTransportIntercept
   - Checks if model string contains "/"
   - If yes: **skips provider selection** (already determined by governance or user)
   - If no: performs performance-based provider selection
   - **Result**: Provider prefix added if not already present

3. **KeySelector** (Load Balancing - Key Level / Route)
   - **Always runs** during request execution in Bifrost core
   - Gets all keys for the selected provider
   - Filters keys based on model restrictions
   - Scores each key by performance metrics
   - Selects best key using weighted random + exploration
   - **Result**: Optimal key selected within the provider

<Info>
**Important**: Even when governance specifies `azure/gpt-4o`, load balancing **still optimizes which Azure key to use** based on performance metrics. This is the power of the two-level architecture!
</Info>

### Example Scenarios

<Tabs>
<Tab title="Governance Only">

**Setup:**
- Virtual Key has `provider_configs` defined
- No adaptive load balancing enabled

**Request:**
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "x-bf-vk: vk-prod-main" \
  -d '{"model": "gpt-4o", "messages": [...]}'
```

**Behavior:**
1. **Governance** applies weighted provider routing → selects Azure (70% weight)
2. Model becomes `azure/gpt-4o`
3. **Standard key selection** (non-adaptive) chooses an Azure key based on static weights
4. Request forwarded to Azure with selected key

</Tab>

<Tab title="Load Balancing Only">

**Setup:**
- **No Virtual Key** (do not send `x-bf-vk`) → this is the **Load Balancing–only** setup
- **Virtual Key with empty / missing `provider_configs`** → **blocks all providers** (deny-by-default) and therefore is **NOT** an LB-only setup
- Adaptive load balancing enabled

**Request:**
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{"model": "gpt-4o", "messages": [...]}'
```

**Behavior:**
1. **Load Balancing Level 1** applies performance-based provider routing → selects OpenAI (best performing)
2. Model becomes `openai/gpt-4o`
3. **Load Balancing Level 2** selects best OpenAI key based on performance metrics (error rate, latency, TPM status)
4. Request forwarded to OpenAI with optimal key

</Tab>

<Tab title="Both Available (Governance + Load Balancing)">

**Setup:**
- Virtual Key has `provider_configs` defined
- Adaptive load balancing enabled
- Azure has 3 keys: `azure-key-1`, `azure-key-2`, `azure-key-3`

**Request:**
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "x-bf-vk: vk-prod-main" \
  -d '{"model": "gpt-4o", "messages": [...]}'
```

**Behavior:**
1. **Governance** applies first (respects explicit user config) → selects Azure provider
2. Model becomes `azure/gpt-4o`
3. **Load Balancing Level 1** sees "/" and **skips provider selection** (already decided)
4. **Load Balancing Level 2** still runs! Selects best Azure key based on performance:
   - `azure-key-1`: 99% success rate, 150ms avg latency → score 0.95
   - `azure-key-2`: 85% success rate, 200ms avg latency → score 0.60 (degraded)
   - `azure-key-3`: Hit TPM limit → score 0.0 (circuit broken)
   - **Selects `azure-key-1`** (highest score)
5. Request forwarded to Azure with `azure-key-1`

**Why?** Governance controls provider selection (explicit user intent), but load balancing still optimizes key selection (automatic performance optimization).

</Tab>

<Tab title="Manual Provider Selection">

**Setup:**
- Both governance and load balancing enabled
- OpenAI has 2 keys available

**Request:**
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -d '{"model": "openai/gpt-4o", "messages": [...]}'
```

**Behavior:**
1. **Governance** sees "/" and skips
2. **Load Balancing Level 1** sees "/" and **skips provider selection**
3. **Load Balancing Level 2** still runs! Selects best OpenAI key based on current metrics
4. Request forwarded to OpenAI with optimal key

**Why?** User explicitly specified the provider, but key-level optimization still provides value by selecting the best-performing OpenAI key.

</Tab>
</Tabs>

### Provider vs Key Selection Rules

| Scenario | Provider Selection | Key Selection |
|----------|-------------------|---------------|
| VK with provider_configs | **Governance** (weighted random) | **Standard** or **Adaptive** (if enabled) |
| VK without provider_configs + LB | **Blocked** (empty = no providers allowed) | N/A |
| No VK + LB | **Load Balancing Level 1** (performance) | **Load Balancing Level 2** (performance) |
| Model with provider prefix + LB | **Skip** (already specified) | **Load Balancing Level 2** (performance) ✅ |
| No Load Balancing enabled | **Governance** or **User** or **Model Catalog** | **Standard** (static weights) |

<Note>
**Critical Insight**:
- **Provider selection** respects the hierarchy: Governance → Load Balancing Level 1 → User specification
- **Key selection** runs independently and benefits from load balancing **even when provider is predetermined**

This separation is what makes the two-level architecture so powerful!
</Note>

---

## Routing Rules (Dynamic Expression-Based Routing)

<Info>
**Position in routing pipeline**: Routing Rules execute **before governance provider selection** and can override it. They are evaluated before adaptive load balancing, enabling dynamic provider/model overrides based on runtime conditions like headers, parameters, capacity metrics, and organizational hierarchy.
</Info>

### Overview

Routing Rules provide sophisticated, expression-based control over request routing using CEL expressions. Unlike governance routing (static weights), routing rules evaluate conditions dynamically at request time.

### When Routing Rules Execute

```mermaid
flowchart TD
    Start["Request: model + provider"]

    subgraph Rules["1. Routing Rules Layer (Evaluated First)"]
        RuleMatch{"CEL Expression<br/>Matches?"}
        RuleDecision["Override:<br/>New provider/model/fallbacks"]
        NoMatch["No match:<br/>Continue to Governance"]
    end

    subgraph Gov["2. Governance Layer (if no routing rule matched)"]
        VKValidation["Virtual Key Validation"]
        GovRouting["Provider Governance Routing<br/>(weighted random)"]
    end

    subgraph LB["3. Load Balancing Layer"]
        LB1["Level 1: Provider Selection"]
        LB2["Level 2: Key Selection"]
    end

    Start --> RuleMatch
    RuleMatch -->|Yes| RuleDecision --> LB1
    RuleMatch -->|No| NoMatch --> VKValidation --> GovRouting --> LB1
    LB1 --> LB2 --> Execute["Execute with<br/>selected provider + key"]
```

### How It Works

1. **Routing rules evaluate first** in scope precedence order (VirtualKey → Team → Customer → Global)
2. **If a routing rule matches**: provider/model/fallbacks are overridden, governance provider_configs are skipped
3. **If no routing rule matches**: governance provider selection runs (weighted random)
4. **Load balancing Level 1**: skipped if provider already determined (has "/" prefix)
5. **Load balancing Level 2** (key selection): always runs to select the best key within the determined provider

### Available CEL Variables

Routing rules access request context through CEL variables:

```cel
// Request context
model                      // Requested model
provider                   // Current provider

// Headers and parameters (case-insensitive)
headers["x-tier"]          // Request header
params["region"]           // Query parameter

// Organization context
virtual_key_id             // VirtualKey ID
team_name                  // Team name
customer_id                // Customer ID

// Capacity metrics (0-100 percentage)
budget_used                // Budget usage %
tokens_used                // Token rate limit usage %
request                    // Request rate limit usage %
```

### Examples

#### Route based on user tier
```cel
headers["x-tier"] == "premium"   // → openai/gpt-4o
```

#### Route to fallback when budget high
```cel
budget_used > 85                 // → groq/llama-2 (cheaper)
```

#### Route by team
```cel
team_name == "ml-research"       // → anthropic/claude-3-opus
```

#### Complex multi-condition routing
```cel
headers["x-environment"] == "production" &&
tokens_used < 75 &&
team_name == "ai-platform"       // → openai/gpt-4o
```

### Scope Hierarchy

Rules are evaluated in organizational precedence order (first-match-wins):

```
1. VirtualKey scope (highest priority)
2. Team scope
3. Customer scope
4. Global scope (lowest priority)
```

Within each scope, rules are sorted by **priority** (ascending: 0 before 10).

### Key Features

| Feature | Description |
|---------|-------------|
| **CEL Expressions** | Powerful, composable condition language with multiple operators |
| **Scope Hierarchy** | Rules at VirtualKey/Team/Customer/Global levels with proper precedence |
| **Dynamic Override** | Override provider and/or model based on runtime conditions |
| **Fallback Chains** | Define multiple fallback providers for automatic failover |
| **Priority Ordering** | Lower priority evaluated first within same scope |
| **Capacity Awareness** | Access real-time budget and rate limit usage percentages |

### Integration with Governance

Routing Rules execute **before** governance provider selection and can override it:

**If a routing rule matches**:
```
Routing Rules evaluate
                    ↓
Rule matches: budget_used > 85
                    ↓
Override: groq/llama-2 (cheaper provider)
                    ↓
Governance provider_configs SKIPPED
                    ↓
Load Balancing selects best key
```

**If no routing rule matches**:
```
Routing Rules evaluate
                    ↓
No matching rule
                    ↓
Governance decides: azure/gpt-4o (70% weight)
                    ↓
Load Balancing selects best key
```

**Key Insight**: Routing rules have higher precedence than governance provider_configs. If a routing rule matches, governance provider_configs are bypassed entirely.

### Integration with Load Balancing

Routing Rules work **before** load balancing:

```
Routing Rules decide: openai/gpt-4o
                    ↓
Load Balancing Level 1: Skipped (provider already determined)
                    ↓
Load Balancing Level 2: Selects best OpenAI key based on performance
```

Even when routing rules determine the provider, load balancing Level 2 still optimizes which API key to use within that provider.

### Use Cases

- **Tier-based routing**: Premium users → fast providers
- **Capacity failover**: High budget usage → cheaper providers
- **Team preferences**: Different teams → different providers
- **A/B testing**: Route subset of traffic to test models
- **Regional routing**: EU users → EU providers (data residency)
- **Complex logic**: Combine multiple conditions for sophisticated routing

### Dashboard & API

Routing rules can be configured through:

- **Dashboard**: Visual rule builder with CEL expression editor
- **API**: `POST /api/governance/routing-rules` and related endpoints
- **Scope**: Create rules at global, customer, team, or virtual key levels
- **Priority**: Order rules within scope with numeric priority

For complete documentation, see [Routing Rules Documentation](/providers/routing-rules).

---

## Choosing the Right Approach

1. **Use Governance When:**

   ✅ **Compliance requirements**: Need to ensure data stays in specific regions or providers
   ✅ **Cost optimization**: Want explicit control over traffic distribution to cheaper providers
   ✅ **Budget enforcement**: Need hard limits on spending per provider
   ✅ **Environment separation**: Different teams/apps need different provider access
   ✅ **Rate limit management**: Need to respect provider-specific rate limits

2. **Use Routing Rules When:**

   ✅ **Dynamic routing**: Route based on runtime request context (headers, parameters)
   ✅ **Capacity-aware routing**: Switch to fallback when budget/rate limits high
   ✅ **Organization-based routing**: Different rules for teams/customers
   ✅ **A/B testing**: Route subset of traffic to test new models
   ✅ **Complex conditions**: Multiple criteria (e.g., tier + capacity + team)

3. **Use Load Balancing When:**

   ✅ **Performance optimization**: Want automatic routing to best-performing providers
   ✅ **Minimal configuration**: Prefer hands-off operation with intelligent defaults
   ✅ **Dynamic workloads**: Traffic patterns change frequently
   ✅ **Automatic failover**: Need instant adaptation to provider issues
   ✅ **Multi-provider redundancy**: Want seamless provider switching based on availability

4. **Use All Three Together:**

   ✅ **Complete solution**: Governance provides base routing, routing rules add dynamic override, load balancing optimizes keys
   ✅ **Maximum flexibility**: Different Virtual Keys use different strategies (governance vs routing rules vs load balancing)
   ✅ **Enterprise deployments**: Complex organizations with multiple requirements per layer

---

## Additional Resources

<CardGroup cols={2}>
  <Card title="Governance Routing" icon="shield-check" href="/features/governance/routing">
    Configuration instructions for setting up governance routing via Virtual Keys (Web UI, API, config.json)
  </Card>
  <Card title="Routing Rules" icon="sliders" href="/providers/routing-rules">
    Dynamic, expression-based routing using CEL expressions for runtime conditions
  </Card>
  <Card title="Adaptive Load Balancing" icon="brain" href="/enterprise/adaptive-load-balancing">
    Technical implementation details: scoring algorithms, weight calculations, and performance characteristics
  </Card>
  <Card title="Virtual Keys" icon="key" href="/features/governance/virtual-keys">
    Learn how to create and configure Virtual Keys
  </Card>
  <Card title="Fallbacks" icon="arrow-rotate-right" href="/features/fallbacks">
    Understand how automatic fallbacks work across providers
  </Card>
</CardGroup>