first commit
416
docs/deployment-guides/config-json.mdx
Normal file
@@ -0,0 +1,416 @@
---
title: "Quick Start"
description: "Configure Bifrost using a config.json file — GitOps-friendly, no-UI deployments, and multinode OSS setups"
icon: "file-code"
---

<Note>
**Full schema reference:** [`https://www.getbifrost.ai/schema`](https://www.getbifrost.ai/schema)
</Note>

`config.json` lets you configure every aspect of Bifrost through a single declarative file. It is the right choice for GitOps workflows, CI/CD pipelines, headless deployments, and multinode OSS setups where a central configuration file is shared across all replicas.

---

## Two Configuration Modes

Bifrost supports **two mutually exclusive modes**. You cannot run both at the same time.

| Mode | When | Behaviour |
|------|------|-----------|
| **Web UI / database** | No `config.json`, or `config.json` with `config_store` enabled | Full UI available, configuration stored in SQLite or PostgreSQL |
| **File-based (`config.json`)** | `config.json` present, `config_store` disabled | UI disabled, all config loaded from file at startup, restart required for changes |

<Note>
See [Setting Up](/quickstart/gateway/setting-up#two-configuration-modes) for a full explanation of both modes and how `config_store` bootstrapping works.
</Note>

---

## Minimal Working Example

```json
{
  "$schema": "https://www.getbifrost.ai/schema",
  "encryption_key": "env.BIFROST_ENCRYPTION_KEY",
  "client": {
    "drop_excess_requests": false,
    "enable_logging": true
  },
  "providers": {
    "openai": {
      "keys": [
        {
          "name": "openai-primary",
          "value": "env.OPENAI_API_KEY",
          "models": ["*"],
          "weight": 1.0
        }
      ]
    }
  },
  "config_store": {
    "enabled": false
  }
}
```

Save this as `config.json` in your app directory and start Bifrost:

```bash
# NPX
npx -y @maximhq/bifrost -app-dir ./data

# Docker
docker run -p 8080:8080 \
  -v $(pwd)/data:/app/data \
  -e OPENAI_API_KEY=sk-... \
  -e BIFROST_ENCRYPTION_KEY=your-32-byte-key \
  maximhq/bifrost
```

Make your first call:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

---

## Environment Variable References

Never put secrets directly in `config.json`. Use the `env.` prefix to reference any environment variable:

```json
{
  "encryption_key": "env.BIFROST_ENCRYPTION_KEY",
  "providers": {
    "openai": {
      "keys": [
        {
          "name": "primary",
          "value": "env.OPENAI_API_KEY",
          "weight": 1.0
        }
      ]
    }
  }
}
```

Set the actual values through your deployment platform — shell environment, Docker `-e`, Kubernetes Secrets mounted as env vars, or a `.env` file.

---

## Schema Validation

Add `$schema` to every `config.json` for IDE autocomplete and inline validation:

```json
{
  "$schema": "https://www.getbifrost.ai/schema"
}
```

Editors (VS Code, JetBrains, Neovim with LSP) will show completions and flag invalid fields as you type.

---

## Production Example

A production-ready file with PostgreSQL storage for both the config store and the logs store, a multi-provider setup, and locked-down client settings (auth enforced on inference, direct keys disabled, restricted CORS origins):

```json
{
  "$schema": "https://www.getbifrost.ai/schema",
  "encryption_key": "env.BIFROST_ENCRYPTION_KEY",

  "client": {
    "initial_pool_size": 500,
    "drop_excess_requests": true,
    "enable_logging": true,
    "log_retention_days": 90,
    "enforce_auth_on_inference": true,
    "allow_direct_keys": false,
    "allowed_origins": ["https://app.yourcompany.com"]
  },

  "providers": {
    "openai": {
      "keys": [
        {
          "name": "openai-primary",
          "value": "env.OPENAI_API_KEY",
          "models": ["*"],
          "weight": 1.0
        }
      ],
      "network_config": {
        "default_request_timeout_in_seconds": 120,
        "max_retries": 3
      }
    },
    "anthropic": {
      "keys": [
        {
          "name": "anthropic-primary",
          "value": "env.ANTHROPIC_API_KEY",
          "models": ["*"],
          "weight": 1.0
        }
      ]
    }
  },

  "config_store": {
    "enabled": true,
    "type": "postgres",
    "config": {
      "host": "env.PG_HOST",
      "port": "5432",
      "user": "env.PG_USER",
      "password": "env.PG_PASSWORD",
      "db_name": "bifrost",
      "ssl_mode": "require"
    }
  },

  "logs_store": {
    "enabled": true,
    "type": "postgres",
    "config": {
      "host": "env.PG_HOST",
      "port": "5432",
      "user": "env.PG_USER",
      "password": "env.PG_PASSWORD",
      "db_name": "bifrost",
      "ssl_mode": "require"
    }
  }
}
```

---

## Enterprise Example: Postgres + etcd + Access Profiles

Use this pattern when you want enterprise access-profile configuration to be seeded directly from `config.json`, while running clustered nodes with etcd discovery.

```json
{
  "$schema": "https://www.getbifrost.ai/schema",
  "cluster_config": {
    "enabled": true,
    "discovery": {
      "enabled": true,
      "type": "etcd",
      "service_name": "bifrost-cluster",
      "etcd_endpoints": ["http://localhost:2379"]
    }
  },
  "config_store": {
    "enabled": true,
    "type": "postgres",
    "config": {
      "host": "localhost",
      "port": "5432",
      "user": "postgres",
      "password": "env.PG_PASSWORD",
      "db_name": "bifrost-config",
      "ssl_mode": "disable"
    }
  },
  "logs_store": {
    "enabled": true,
    "type": "postgres",
    "config": {
      "host": "localhost",
      "port": "5432",
      "user": "postgres",
      "password": "env.PG_PASSWORD",
      "db_name": "bifrost-config",
      "ssl_mode": "disable"
    }
  },
  "mcp": {
    "client_configs": [
      {
        "client_id": "echo_http",
        "name": "echo_http",
        "connection_type": "http",
        "connection_string": "https://mcpplaygroundonline.com/mcp-echo-server",
        "auth_type": "none",
        "tools_to_execute": ["echo"]
      }
    ]
  },
  "access_profiles": [
    {
      "name": "platform-default",
      "description": "Default profile for enterprise access-profile testing",
      "is_active": true,
      "tags": ["platform", "test"],
      "provider_configs": [
        {
          "provider_name": "OpenAi",
          "all_models_allowed": false,
          "allowed_models": ["gpt-4o-mini"]
        }
      ]
    },
    {
      "name": "platform-readonly-mcp",
      "description": "Profile for validating MCP include/exclude behavior",
      "is_active": true,
      "tags": ["mcp", "test"],
      "mcp_servers": [
        {
          "mcp_server_id": "echo_http"
        }
      ],
      "mcp_tool_overrides": [
        {
          "mcp_client_id": "echo_http",
          "tool_name": "echo",
          "action": "include"
        },
        {
          "mcp_client_id": "github",
          "tool_name": "create_pull_request",
          "action": "exclude"
        }
      ]
    }
  ]
}
```

<Note>
`access_profiles` is an enterprise capability. For OSS-only deployments, use `governance.virtual_keys` and related governance resources instead.
</Note>

---

## Example Configs

Ready-to-use reference configurations from the [examples/configs](https://github.com/maximhq/bifrost/tree/main/examples/configs) directory on GitHub:

<AccordionGroup>

<Accordion title="Minimal / File-only">

| Example | Description |
|---------|-------------|
| [noconfigstorenologstore](https://github.com/maximhq/bifrost/blob/main/examples/configs/noconfigstorenologstore/config.json) | Bare-minimum file-only mode — no database, no UI, providers loaded from file |
| [partial](https://github.com/maximhq/bifrost/blob/main/examples/configs/partial/config.json) | SQLite config store with a minimal provider setup |
| [v1compat](https://github.com/maximhq/bifrost/blob/main/examples/configs/v1compat/config.json) | `"version": 1` for v1.4.x array semantics (empty = allow all) |

</Accordion>

<Accordion title="Storage">

| Example | Description |
|---------|-------------|
| [withconfigstore](https://github.com/maximhq/bifrost/blob/main/examples/configs/withconfigstore/config.json) | SQLite config store (Web UI enabled) |
| [withconfigstorelogsstorepostgres](https://github.com/maximhq/bifrost/blob/main/examples/configs/withconfigstorelogsstorepostgres/config.json) | PostgreSQL for both config store and logs store |
| [withlogstore](https://github.com/maximhq/bifrost/blob/main/examples/configs/withlogstore/config.json) | SQLite logs store |
| [withobjectstorages3](https://github.com/maximhq/bifrost/blob/main/examples/configs/withobjectstorages3/config.json) | S3 object storage offload for logs |
| [withobjectstoragegcs](https://github.com/maximhq/bifrost/blob/main/examples/configs/withobjectstoragegcs/config.json) | GCS object storage offload for logs |
| [withvectorstoreweaviate](https://github.com/maximhq/bifrost/blob/main/examples/configs/withvectorstoreweaviate/config.json) | Weaviate vector store (with [docker-compose](https://github.com/maximhq/bifrost/blob/main/examples/configs/withvectorstoreweaviate/docker-compose.yml)) |

</Accordion>

<Accordion title="Semantic Cache">

| Example | Description |
|---------|-------------|
| [withsemanticcache](https://github.com/maximhq/bifrost/blob/main/examples/configs/withsemanticcache/config.json) | Semantic cache backed by Weaviate |
| [withsemanticcachevalkey](https://github.com/maximhq/bifrost/blob/main/examples/configs/withsemanticcachevalkey/config.json) | Semantic cache backed by Valkey / Redis |

</Accordion>

<Accordion title="Governance">

| Example | Description |
|---------|-------------|
| [withauth](https://github.com/maximhq/bifrost/blob/main/examples/configs/withauth/config.json) | Admin username/password auth (`governance.auth_config`) |
| [withvirtualkeys](https://github.com/maximhq/bifrost/blob/main/examples/configs/withvirtualkeys/config.json) | Virtual keys with provider/model allowlists |
| [withteamscustomers](https://github.com/maximhq/bifrost/blob/main/examples/configs/withteamscustomers/config.json) | Teams and customers with budgets and rate limits |
| [withroutingrules](https://github.com/maximhq/bifrost/blob/main/examples/configs/withroutingrules/config.json) | CEL-based routing rules for dynamic provider/model selection |
| [withpricingoverridesnostore](https://github.com/maximhq/bifrost/blob/main/examples/configs/withpricingoverridesnostore/config.json) | Pricing overrides in file-only mode |
| [withpricingoverridessqlite](https://github.com/maximhq/bifrost/blob/main/examples/configs/withpricingoverridessqlite/config.json) | Pricing overrides with SQLite config store |

</Accordion>

<Accordion title="Observability">

| Example | Description |
|---------|-------------|
| [withobservability](https://github.com/maximhq/bifrost/blob/main/examples/configs/withobservability/config.json) | Prometheus metrics (telemetry always active, custom labels via `client.prometheus_labels`) |
| [withprompushgateway](https://github.com/maximhq/bifrost/blob/main/examples/configs/withprompushgateway/config.json) | Prometheus Push Gateway for multi-instance deployments |
| [withotel](https://github.com/maximhq/bifrost/blob/main/examples/configs/withotel/config.json) | OpenTelemetry traces and metrics |

</Accordion>

<Accordion title="Plugins & Advanced">

| Example | Description |
|---------|-------------|
| [withdynamicplugin](https://github.com/maximhq/bifrost/blob/main/examples/configs/withdynamicplugin/config.json) | Loading a custom `.so` plugin at startup |
| [withcompat](https://github.com/maximhq/bifrost/blob/main/examples/configs/withcompat/config.json) | SDK compatibility shims (`should_drop_params`, `convert_text_to_chat`) |
| [withframework](https://github.com/maximhq/bifrost/blob/main/examples/configs/withframework/config.json) | Custom model pricing catalog URL and sync interval |
| [withlargepayload](https://github.com/maximhq/bifrost/blob/main/examples/configs/withlargepayload/config.json) | Large payload optimization (streaming without full materialisation) |
| [withwebsocket](https://github.com/maximhq/bifrost/blob/main/examples/configs/withwebsocket/config.json) | WebSocket / Realtime API connection pool tuning |
| [withnginxreverseproxy](https://github.com/maximhq/bifrost/blob/main/examples/configs/withnginxreverseproxy/config.json) | 3-node Bifrost behind NGINX reverse proxy (includes [docker-compose](https://github.com/maximhq/bifrost/blob/main/examples/configs/withnginxreverseproxy/docker-compose.yml), [nginx.conf](https://github.com/maximhq/bifrost/blob/main/examples/configs/withnginxreverseproxy/nginx.conf), [helm values](https://github.com/maximhq/bifrost/blob/main/examples/configs/withnginxreverseproxy/helm-values.yaml), and [k8s ingress](https://github.com/maximhq/bifrost/blob/main/examples/configs/withnginxreverseproxy/k8s-ingress.yaml)) |
| [withpostgresmcpclientsinconfig](https://github.com/maximhq/bifrost/blob/main/examples/configs/withpostgresmcpclientsinconfig/config.json) | MCP client definitions seeded from config.json with PostgreSQL store |
| [encryptionmigration](https://github.com/maximhq/bifrost/blob/main/examples/configs/encryptionmigration/config.json) | Migrating to a new encryption key |

</Accordion>

</AccordionGroup>

---

## Configuration Guides

<CardGroup cols={2}>
  <Card title="Schema Reference" icon="brackets-curly" href="/deployment-guides/config-json/schema-reference">
    Every top-level key, its type, default, and where it is documented
  </Card>
  <Card title="Client Configuration" icon="gear" href="/deployment-guides/config-json/client">
    Pool size, logging, CORS, header filtering, compat shims, MCP settings
  </Card>
  <Card title="Provider Setup" icon="plug" href="/deployment-guides/config-json/providers">
    OpenAI, Anthropic, Azure, Bedrock, Vertex, Groq, self-hosted
  </Card>
  <Card title="Storage" icon="database" href="/deployment-guides/config-json/storage">
    config_store, logs_store, vector_store — SQLite, PostgreSQL, object storage
  </Card>
  <Card title="Plugins" icon="puzzle-piece" href="/deployment-guides/config-json/plugins">
    Semantic cache, OTel, Maxim, Datadog, custom plugins
  </Card>
  <Card title="Cluster" icon="circle-nodes" href="/deployment-guides/config-json/cluster">
    Cluster mode with static peers or discovery backends (enterprise)
  </Card>
  <Card title="Governance" icon="shield-check" href="/deployment-guides/config-json/governance">
    Virtual keys, budgets, rate limits, routing rules, admin auth
  </Card>
  <Card title="Guardrails" icon="shield-halved" href="/deployment-guides/config-json/guardrails">
    Content moderation providers and CEL-based rules (enterprise)
  </Card>
</CardGroup>

---

## Next Steps

1. Configure [provider keys](/providers/supported-providers/overview)
2. Enable [plugins](/plugins/getting-started)
3. Set up [observability](/features/observability/default)
4. Configure [governance](/features/governance/virtual-keys)
5. Deploy [multiple nodes](/deployment-guides/how-to/multinode) with a shared `config.json`
276
docs/deployment-guides/config-json/client.mdx
Normal file
@@ -0,0 +1,276 @@
---
title: "Client Configuration"
description: "Configure the Bifrost client in config.json — connection pool, logging, CORS, header filtering, compat shims, and MCP settings"
icon: "gear"
---

The `client` block controls how Bifrost manages its internal worker pool, request logging, authentication enforcement, header policies, SDK compatibility shims, and MCP agent behaviour.

---

## Connection Pool

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `initial_pool_size` | integer | `300` | Pre-allocated worker goroutines per provider queue |
| `drop_excess_requests` | boolean | `false` | Drop requests when queue is full instead of waiting (returns HTTP 429) |

A larger pool reduces latency spikes under burst load at the cost of higher baseline memory. `500–1000` is a common starting point for production workloads with multiple providers.

```json
{
  "client": {
    "initial_pool_size": 1000,
    "drop_excess_requests": true
  }
}
```

---

## Request & Response Logging

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enable_logging` | boolean | — | Log all LLM requests and responses |
| `disable_content_logging` | boolean | `false` | Strip message content from logs (keeps metadata only) |
| `log_retention_days` | integer | `365` | Days to retain log entries in the store |
| `logging_headers` | array of strings | `[]` | HTTP request headers to capture in log metadata |

Set `disable_content_logging: true` for HIPAA / PCI compliance workloads where message content must not be persisted.

```json
{
  "client": {
    "enable_logging": true,
    "disable_content_logging": true,
    "log_retention_days": 90,
    "logging_headers": ["x-request-id", "x-user-id"]
  }
}
```

---

## Security & CORS

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `allowed_origins` | array | `["*"]` | CORS allowed origins (use URIs or `"*"`) |
| `allow_direct_keys` | boolean | `false` | Allow callers to pass provider keys directly in requests |
| `enforce_auth_on_inference` | boolean | `false` | Require auth (virtual key, API key, or user token) on `/v1/*` inference routes |
| `max_request_body_size_mb` | integer | `100` | Maximum allowed request body size in MB |
| `whitelisted_routes` | array of strings | `[]` | Routes that bypass auth middleware |
| `allowed_headers` | array of strings | `[]` | Additional headers permitted for CORS and WebSocket |

```json
{
  "client": {
    "allowed_origins": [
      "https://app.yourcompany.com",
      "https://admin.yourcompany.com"
    ],
    "allow_direct_keys": false,
    "enforce_auth_on_inference": true,
    "max_request_body_size_mb": 50,
    "whitelisted_routes": ["/health", "/metrics"]
  }
}
```
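
`allowed_headers` appears in the table but not in the example above. A minimal sketch, assuming a browser client that sends two custom headers: the names `x-request-id` and `x-user-id` are placeholders borrowed from the logging example on this page, not values Bifrost requires.

```json
{
  "client": {
    "allowed_origins": ["https://app.yourcompany.com"],
    "allowed_headers": ["x-request-id", "x-user-id"]
  }
}
```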

---

## Header Filtering

Controls which `x-bf-eh-*` extra headers are forwarded to upstream LLM providers.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `header_filter_config.allowlist` | array of strings | `[]` | Only these headers are forwarded (whitelist mode) |
| `header_filter_config.denylist` | array of strings | `[]` | These headers are always blocked |
| `required_headers` | array of strings | `[]` | Headers that must be present on every request (rejected with 400 if missing) |

When both `allowlist` and `denylist` are empty, all `x-bf-eh-*` headers pass through. Specifying an `allowlist` enables strict whitelist mode — only listed headers are forwarded.

```json
{
  "client": {
    "header_filter_config": {
      "allowlist": [
        "x-bf-eh-anthropic-version",
        "x-bf-eh-openai-beta"
      ],
      "denylist": []
    },
    "required_headers": ["x-request-id"]
  }
}
```
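
For contrast, here is a sketch of the opposite approach, using only a `denylist`. It assumes that an empty `allowlist` leaves pass-through behaviour in place, so every `x-bf-eh-*` header is forwarded except the ones listed. The header name `x-bf-eh-internal-debug` is hypothetical and only shows the shape of an entry.

```json
{
  "client": {
    "header_filter_config": {
      "allowlist": [],
      "denylist": ["x-bf-eh-internal-debug"]
    }
  }
}
```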

---

## Compat Shims

Compatibility flags that let Bifrost silently adapt request/response shapes for SDK integrations.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `compat.convert_text_to_chat` | boolean | `false` | Wrap legacy `/v1/completions` text requests as chat messages |
| `compat.convert_chat_to_responses` | boolean | `false` | Translate chat completions to Responses API format |
| `compat.should_drop_params` | boolean | `false` | Silently drop unsupported parameters instead of erroring |
| `compat.should_convert_params` | boolean | `false` | Auto-convert parameter values across provider schemas |

```json
{
  "client": {
    "compat": {
      "should_drop_params": true,
      "convert_text_to_chat": true
    }
  }
}
```

---

## MCP Agent Settings

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `mcp_agent_depth` | integer | `10` | Maximum tool-call recursion depth for MCP agent mode |
| `mcp_tool_execution_timeout` | integer | `30` | Timeout per MCP tool execution in seconds |
| `mcp_code_mode_binding_level` | string | — | Code mode binding level: `"server"` or `"tool"` |
| `mcp_tool_sync_interval` | integer | `10` | Global tool sync interval in minutes (`0` = disabled) |
| `mcp_disable_auto_tool_inject` | boolean | `false` | When `true`, MCP tools are not automatically injected into requests |

```json
{
  "client": {
    "mcp_agent_depth": 15,
    "mcp_tool_execution_timeout": 60,
    "mcp_tool_sync_interval": 10
  }
}
```
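
The remaining fields from the table are set the same way. This sketch uses illustrative values only: it binds code mode at the `"tool"` level, turns off automatic tool injection, and disables the global tool sync by setting the interval to `0`. Whether this combination suits a deployment depends on how MCP tools are meant to reach requests, so treat it as a shape reference rather than a recommendation.

```json
{
  "client": {
    "mcp_code_mode_binding_level": "tool",
    "mcp_disable_auto_tool_inject": true,
    "mcp_tool_sync_interval": 0
  }
}
```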

---

## Async Jobs

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `async_job_result_ttl` | integer | `3600` | TTL (seconds) for async job results |
| `disable_db_pings_in_health` | boolean | `false` | Exclude database connectivity from `/health` endpoint checks |

---

## Prometheus Labels

Add custom labels to every Prometheus metric emitted by Bifrost:

```json
{
  "client": {
    "prometheus_labels": ["environment=production", "region=us-east-1"]
  }
}
```

---

## Authentication

`governance.auth_config` protects the Bifrost dashboard and management API with username/password auth.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `is_enabled` | boolean | `false` | Enable username/password auth |
| `admin_username` | string | — | Admin username |
| `admin_password` | string | — | Admin password (use `env.` reference) |
| `disable_auth_on_inference` | boolean | `false` | Skip auth check on `/v1/*` inference routes |

```json
{
  "governance": {
    "auth_config": {
      "is_enabled": true,
      "admin_username": "env.BIFROST_ADMIN_USERNAME",
      "admin_password": "env.BIFROST_ADMIN_PASSWORD",
      "disable_auth_on_inference": false
    }
  }
}
```

<Note>
A top-level `auth_config` is also accepted for backwards compatibility, but `governance.auth_config` is the preferred location.
</Note>

---

## Encryption Key

```json
{
  "encryption_key": "env.BIFROST_ENCRYPTION_KEY"
}
```

| Notes |
|-------|
| Accepts any string; Bifrost derives a 32-byte AES-256 key using Argon2id |
| Can also be set via the `BIFROST_ENCRYPTION_KEY` environment variable |
| Once set and the database is populated, the key cannot be changed without clearing the database |
| Omitting the key stores data in plain text — not recommended for production |

---

## Full Example

```json
{
  "$schema": "https://www.getbifrost.ai/schema",
  "encryption_key": "env.BIFROST_ENCRYPTION_KEY",

  "governance": {
    "auth_config": {
      "is_enabled": true,
      "admin_username": "env.BIFROST_ADMIN_USERNAME",
      "admin_password": "env.BIFROST_ADMIN_PASSWORD",
      "disable_auth_on_inference": false
    }
  },

  "client": {
    "initial_pool_size": 1000,
    "drop_excess_requests": true,

    "enable_logging": true,
    "disable_content_logging": false,
    "log_retention_days": 90,
    "logging_headers": ["x-request-id", "x-user-id"],

    "allowed_origins": ["https://app.yourcompany.com"],
    "allow_direct_keys": false,
    "enforce_auth_on_inference": true,
    "max_request_body_size_mb": 100,

    "header_filter_config": {
      "allowlist": [],
      "denylist": []
    },
    "required_headers": [],

    "compat": {
      "should_drop_params": false
    },

    "prometheus_labels": ["environment=production"],

    "mcp_agent_depth": 10,
    "mcp_tool_execution_timeout": 30,

    "async_job_result_ttl": 3600
  }
}
```
154
docs/deployment-guides/config-json/cluster.mdx
Normal file
@@ -0,0 +1,154 @@
---
title: "Cluster"
description: "Configure enterprise cluster mode in config.json using peers or automatic discovery"
icon: "circle-nodes"
---

<Warning>
`cluster_config` is an enterprise capability. OSS builds ignore this section.
</Warning>

`cluster_config` enables multi-node Bifrost enterprise clustering with gossip-based membership and optional automatic node discovery.

You can form a cluster in two ways:

- Define static `peers` (`host:port`)
- Enable `discovery` with one of: `kubernetes`, `dns`, `udp`, `consul`, `etcd`, `mdns`

<Tip>
At least one of `peers` or `discovery.enabled: true` must be configured when `cluster_config.enabled` is true.
</Tip>

---

## Minimal Runnable Configs

```json
{
  "cluster_config": {
    "enabled": true,
    "discovery": {
      "enabled": true,
      "type": "mdns",
      "service_name": "bifrost-cluster"
    }
  }
}
```

Use this for local testing. At startup, cluster init requires either:

- non-empty `peers`, or
- `discovery.enabled: true`

If neither is set, cluster initialization fails.

---

## Static Peers

```json
{
  "cluster_config": {
    "enabled": true,
    "region": "us-east-1",
    "peers": [
      "10.0.1.10:10101",
      "10.0.1.11:10101"
    ],
    "gossip": {
      "port": 10101,
      "config": {
        "timeout_seconds": 10,
        "success_threshold": 3,
        "failure_threshold": 3
      }
    }
  }
}
```

---

## Discovery Example (etcd)

```json
{
  "cluster_config": {
    "enabled": true,
    "region": "us-east-1",
    "gossip": {
      "port": 10101,
      "config": {
        "timeout_seconds": 10,
        "success_threshold": 3,
        "failure_threshold": 3
      }
    },
    "discovery": {
      "enabled": true,
      "type": "etcd",
      "service_name": "bifrost-cluster",
      "etcd_endpoints": [
        "http://etcd-1:2379",
        "http://etcd-2:2379"
      ],
      "dial_timeout": "10s"
    }
  }
}
```
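
Other discovery backends follow the same shape with their own fields. As a hedged sketch, a Kubernetes variant might look like the following. The namespace `bifrost` and the label selector `app=bifrost` are hypothetical, and setting `bind_port` to the gossip port used in the examples above is an assumption rather than a documented requirement.

```json
{
  "cluster_config": {
    "enabled": true,
    "discovery": {
      "enabled": true,
      "type": "kubernetes",
      "k8s_namespace": "bifrost",
      "k8s_label_selector": "app=bifrost",
      "bind_port": 10101
    }
  }
}
```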

---

## Field Reference

### `cluster_config`

| Field | Type | Description |
|-------|------|-------------|
| `enabled` | boolean | Enables cluster mode |
| `region` | string | Region label for this node (defaults to `"unknown"` at runtime when omitted) |
| `peers` | array of strings | Static peer addresses in `host:port` format |
| `gossip` | object | Gossip/memberlist settings |
| `discovery` | object | Automatic node discovery settings |

### `cluster_config.gossip`

| Field | Type | Description |
|-------|------|-------------|
| `port` | integer | Gossip port for this node |
| `config.timeout_seconds` | integer | Liveness timeout |
| `config.success_threshold` | integer | Success count before healthy |
| `config.failure_threshold` | integer | Failure count before unhealthy |

### `cluster_config.discovery`

| Field | Type | Description |
|-------|------|-------------|
| `enabled` | boolean | Enables discovery process |
| `type` | string | `kubernetes`, `dns`, `udp`, `consul`, `etcd`, `mdns` |
| `service_name` | string | Service identifier (required for `consul`, `etcd`, `udp`, typically `mdns`; optional for `kubernetes` and `dns`) |
| `bind_port` | integer | Port appended to discovered hosts if missing |
| `dial_timeout` | string | Go duration string (`"5s"`, `"30s"`, `"1m"`) |
| `allowed_address_space` | array of strings | CIDR filters for discovered nodes |
| `k8s_namespace` | string | Kubernetes namespace for pod discovery |
| `k8s_label_selector` | string | Kubernetes label selector |
| `dns_names` | array of strings | DNS names to resolve |
| `udp_broadcast_port` | integer | UDP broadcast port (required for `udp`) |
| `consul_address` | string | Consul address |
| `etcd_endpoints` | array of strings | etcd endpoint URLs |
| `mdns_service` | string | Optional mDNS service type override (e.g. `"_bifrost-cluster._tcp"`) |

<Note>
For `discovery.type: "mdns"`, `service_name` is sufficient for most setups. When `mdns_service` is omitted, Bifrost derives the mDNS service type as `"_<service_name>._tcp"`. If you set `mdns_service`, it **overrides** the derived value and is used for both mDNS registration and browsing.
</Note>
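
To make the override concrete, the sketch below sets `mdns_service` explicitly. The value shown is the same one Bifrost would derive from `service_name` under the rule described in the note, so it should change nothing at runtime; replace it only if nodes must register and browse under a different service type.

```json
{
  "cluster_config": {
    "enabled": true,
    "discovery": {
      "enabled": true,
      "type": "mdns",
      "service_name": "bifrost-cluster",
      "mdns_service": "_bifrost-cluster._tcp"
    }
  }
}
```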

<Warning>
For `discovery.type: "udp"`, configure both `udp_broadcast_port` and `allowed_address_space`.
</Warning>
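
A minimal sketch of a UDP discovery block that satisfies this requirement. The port `9999` and the CIDR range `10.0.0.0/16` are hypothetical placeholders; substitute the broadcast port and address space of your own network.

```json
{
  "cluster_config": {
    "enabled": true,
    "discovery": {
      "enabled": true,
      "type": "udp",
      "service_name": "bifrost-cluster",
      "udp_broadcast_port": 9999,
      "allowed_address_space": ["10.0.0.0/16"]
    }
  }
}
```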

---

For discovery-method deep dives and deployment patterns, see [Enterprise Clustering](/enterprise/clustering).
333
docs/deployment-guides/config-json/governance.mdx
Normal file
@@ -0,0 +1,333 @@
|
||||
---
|
||||
title: "Governance"
|
||||
description: "Seed virtual keys, budgets, rate limits, routing rules, and admin auth in config.json"
|
||||
icon: "shield-check"
|
||||
---
|
||||
|
||||
The `governance` block lets you seed all governance resources directly in `config.json`. On startup, Bifrost loads these into the configuration store. This is the recommended approach for GitOps workflows where governance state is managed as code.
|
||||
|
||||
<Note>
|
||||
**Governance enforcement is always active** in OSS; you do not need a plugin entry to enable it. To require a virtual key on every inference request, set `client.enforce_auth_on_inference: true`. This setting is the global default: a more specific inference-auth flag, such as `governance.auth_config.disable_auth_on_inference`, takes precedence when it is set, and `client.enforce_auth_on_inference` applies otherwise.
|
||||
</Note>
|
||||
|
||||
---
|
||||
|
||||
## Admin Authentication
|
||||
|
||||
Protect the Bifrost dashboard and management API with username/password auth:
|
||||
|
||||
```json
|
||||
{
|
||||
"governance": {
|
||||
"auth_config": {
|
||||
"is_enabled": true,
|
||||
"admin_username": "env.BIFROST_ADMIN_USERNAME",
|
||||
"admin_password": "env.BIFROST_ADMIN_PASSWORD",
|
||||
"disable_auth_on_inference": false
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Default | Description |
|-------|---------|-------------|
| `is_enabled` | `false` | Enable admin username/password auth |
| `admin_username` | — | Admin username (supports `env.` prefix) |
| `admin_password` | — | Admin password (supports `env.` prefix) |
| `disable_auth_on_inference` | `false` | Skip auth check on `/v1/*` inference routes |
|
||||
|
||||
---
|
||||
|
||||
## Virtual Keys
|
||||
|
||||
Virtual keys are issued to clients and act as scoped API tokens. Each key specifies which providers, models, and API keys the bearer is allowed to use.
|
||||
|
||||
```json
|
||||
{
|
||||
"governance": {
|
||||
"virtual_keys": [
|
||||
{
|
||||
"id": "vk-team-platform",
|
||||
"name": "platform-team",
|
||||
"value": "env.VK_PLATFORM_TEAM",
|
||||
"is_active": true,
|
||||
"provider_configs": [
|
||||
{
|
||||
"provider": "openai",
|
||||
"allowed_models": ["gpt-4o", "gpt-4o-mini"],
|
||||
"key_ids": ["*"],
|
||||
"weight": 1
|
||||
},
|
||||
{
|
||||
"provider": "anthropic",
|
||||
"allowed_models": ["*"],
|
||||
"key_ids": ["*"],
|
||||
"weight": 1
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Virtual Key Fields
|
||||
|
||||
| Field | Required | Description |
|-------|----------|-------------|
| `id` | Yes | Unique virtual key ID (referenced by budgets / rate limits) |
| `name` | Yes | Human-readable name |
| `value` | No | The key token sent by clients (use `env.` prefix). Auto-generated if omitted |
| `is_active` | No | Default `true`. Set `false` to disable without deleting |
| `team_id` | No | Associate with a team (mutually exclusive with `customer_id`) |
| `customer_id` | No | Associate with a customer |
| `rate_limit_id` | No | Attach a rate limit |
| `calendar_aligned` | No | Snap budget resets to day/week/month/year boundaries |
| `provider_configs` | No | Allowed provider/model/key combinations (empty = deny all) |
|
||||
|
||||
### Provider Config Fields
|
||||
|
||||
| Field | Required | Description |
|-------|----------|-------------|
| `provider` | Yes | Provider name (e.g. `"openai"`) |
| `allowed_models` | No | Model allow-list. `["*"]` = all models; `[]` = deny all |
| `key_ids` | No | Provider key names allowed for this VK. `["*"]` = all keys; `[]` = deny all. Use key `name` values (not UUIDs) in `config.json` |
| `weight` | No | Load-balancing weight when multiple provider configs are present |
| `rate_limit_id` | No | Attach a per-provider-config rate limit |
|
||||
|
||||
---
|
||||
|
||||
## Budgets
|
||||
|
||||
Budgets cap cumulative spend (in USD) for a virtual key or provider config over a rolling window:
|
||||
|
||||
```json
|
||||
{
|
||||
"governance": {
|
||||
"budgets": [
|
||||
{
|
||||
"id": "budget-platform-monthly",
|
||||
"max_limit": 500.00,
|
||||
"reset_duration": "1M",
|
||||
"virtual_key_id": "vk-team-platform"
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Required | Description |
|-------|----------|-------------|
| `id` | Yes | Unique budget ID |
| `max_limit` | Yes | Maximum spend in USD |
| `reset_duration` | Yes | Window length: `"30s"`, `"5m"`, `"1h"`, `"1d"`, `"1w"`, `"1M"`, `"1Y"` |
| `virtual_key_id` | No | Attach to a virtual key (mutually exclusive with `provider_config_id`) |
| `provider_config_id` | No | Attach to a provider config ID |
|
||||
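
For a budget that resets on calendar boundaries, one possible pairing is sketched below. It assumes that `calendar_aligned` on the virtual key governs the budget attached to that key; the weekly limit and the budget ID are illustrative.

```json
{
  "governance": {
    "virtual_keys": [
      {
        "id": "vk-team-platform",
        "name": "platform-team",
        "calendar_aligned": true,
        "provider_configs": [
          { "provider": "openai", "allowed_models": ["*"], "key_ids": ["*"], "weight": 1 }
        ]
      }
    ],
    "budgets": [
      {
        "id": "budget-platform-weekly",
        "max_limit": 100.00,
        "reset_duration": "1w",
        "virtual_key_id": "vk-team-platform"
      }
    ]
  }
}
```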
|
||||
---
|
||||
|
||||
## Rate Limits
|
||||
|
||||
Rate limits cap requests or tokens over a rolling window:
|
||||
|
||||
```json
|
||||
{
|
||||
"governance": {
|
||||
"rate_limits": [
|
||||
{
|
||||
"id": "rl-platform-hourly",
|
||||
"request_max_limit": 1000,
|
||||
"request_reset_duration": "1h",
|
||||
"token_max_limit": 1000000,
|
||||
"token_reset_duration": "1h"
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Required | Description |
|-------|----------|-------------|
| `id` | Yes | Unique rate limit ID |
| `request_max_limit` | No | Maximum requests in window |
| `request_reset_duration` | No | Window for request counter |
| `token_max_limit` | No | Maximum tokens (input + output) in window |
| `token_reset_duration` | No | Window for token counter |
|
||||
|
||||
Attach a rate limit to a virtual key via `virtual_keys[].rate_limit_id`, or to a provider config via `virtual_keys[].provider_configs[].rate_limit_id`.
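
The two attachment points can be combined, as in the sketch below: one limit on the virtual key as a whole and a tighter token limit on a single provider config. The IDs and limits are illustrative, and the `"5m"` window assumes rate limit durations accept the same format as budget `reset_duration`.

```json
{
  "governance": {
    "rate_limits": [
      { "id": "rl-platform-hourly", "request_max_limit": 1000, "request_reset_duration": "1h" },
      { "id": "rl-openai-tokens", "token_max_limit": 50000, "token_reset_duration": "5m" }
    ],
    "virtual_keys": [
      {
        "id": "vk-team-platform",
        "name": "platform-team",
        "rate_limit_id": "rl-platform-hourly",
        "provider_configs": [
          {
            "provider": "openai",
            "allowed_models": ["*"],
            "key_ids": ["*"],
            "weight": 1,
            "rate_limit_id": "rl-openai-tokens"
          }
        ]
      }
    ]
  }
}
```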
|
||||
|
||||
---
|
||||
|
||||
## Routing Rules
|
||||
|
||||
Routing rules dynamically select the provider and model for each request based on a [CEL](https://cel.dev) expression. They are evaluated in priority order before the request is dispatched.
|
||||
|
||||
```json
|
||||
{
|
||||
"governance": {
|
||||
"routing_rules": [
|
||||
{
|
||||
"id": "route-gpt4-to-azure",
|
||||
"name": "Redirect GPT-4o to Azure",
|
||||
"cel_expression": "request.model == 'gpt-4o'",
|
||||
"targets": [
|
||||
{ "provider": "azure", "model": "gpt-4o", "weight": 1.0 }
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": "route-cost-split",
|
||||
"name": "Split traffic 70/30 between providers",
|
||||
"cel_expression": "true",
|
||||
"targets": [
|
||||
{ "provider": "openai", "weight": 0.7 },
|
||||
{ "provider": "anthropic", "weight": 0.3 }
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Rule Fields
|
||||
|
||||
| Field | Required | Description |
|-------|----------|-------------|
| `id` | Yes | Unique rule ID |
| `name` | Yes | Human-readable name |
| `cel_expression` | No | CEL expression. `"true"` matches every request |
| `targets` | Yes | Weighted target list (weights must sum to `1.0`) |
| `enabled` | No | Default `true` |
| `priority` | No | Evaluation order within scope; lower numbers run first |
| `scope` | No | `"global"` (default), `"team"`, `"customer"`, `"virtual_key"` |
| `scope_id` | Conditional | Required when `scope` is not `"global"` |
| `chain_rule` | No | If `true`, re-evaluates the chain after this rule matches |
| `fallbacks` | No | Ordered list of fallback providers, used if the primary target fails |
|
||||
|
||||
### Target Fields
|
||||
|
||||
| Field | Required | Description |
|-------|----------|-------------|
| `weight` | Yes | Fraction of traffic (all weights in a rule must sum to `1.0`) |
| `provider` | No | Target provider. Omit to keep the incoming request's provider |
| `model` | No | Target model. Omit to keep the incoming request's model |
| `key_id` | No | Pin a specific API key by name |
|
||||
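
To illustrate the optional rule and target fields together, here is a hedged sketch of a rule scoped to one virtual key that pins both the model and the provider key. The rule ID, the `scope_id` value, and the key name `openai-primary` are assumptions for illustration; they must match resources defined elsewhere in your `config.json`.

```json
{
  "governance": {
    "routing_rules": [
      {
        "id": "route-platform-mini",
        "name": "Route platform key to gpt-4o-mini",
        "enabled": true,
        "priority": 10,
        "scope": "virtual_key",
        "scope_id": "vk-team-platform",
        "cel_expression": "true",
        "targets": [
          { "provider": "openai", "model": "gpt-4o-mini", "key_id": "openai-primary", "weight": 1.0 }
        ],
        "fallbacks": ["anthropic"]
      }
    ]
  }
}
```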
|
||||
---
|
||||
|
||||
## Customers & Teams
|
||||
|
||||
Define organizational entities and attach budgets or rate limits to them:
|
||||
|
||||
```json
|
||||
{
|
||||
"governance": {
|
||||
"customers": [
|
||||
{
|
||||
"id": "customer-acme",
|
||||
"name": "Acme Corp",
|
||||
"budget_id": "budget-acme-monthly",
|
||||
"rate_limit_id": "rl-acme-hourly"
|
||||
}
|
||||
],
|
||||
"teams": [
|
||||
{
|
||||
"id": "team-ml",
|
||||
"name": "ML Team",
|
||||
"customer_id": "customer-acme",
|
||||
"budget_id": "budget-team-ml"
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
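
The snippet above references `budget-acme-monthly`, `rl-acme-hourly`, and `budget-team-ml` by ID. A hedged sketch of how those resources might be declared alongside it, and how a virtual key could join the team via `team_id`, follows. It assumes that a budget with neither `virtual_key_id` nor `provider_config_id` can be referenced through `budget_id`; the limits and the `vk-ml-research` key are illustrative.

```json
{
  "governance": {
    "budgets": [
      { "id": "budget-acme-monthly", "max_limit": 2000.00, "reset_duration": "1M" },
      { "id": "budget-team-ml", "max_limit": 500.00, "reset_duration": "1M" }
    ],
    "rate_limits": [
      { "id": "rl-acme-hourly", "request_max_limit": 10000, "request_reset_duration": "1h" }
    ],
    "virtual_keys": [
      {
        "id": "vk-ml-research",
        "name": "ml-research",
        "team_id": "team-ml",
        "provider_configs": [
          { "provider": "openai", "allowed_models": ["*"], "key_ids": ["*"], "weight": 1 }
        ]
      }
    ]
  }
}
```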
|
||||
|
||||
---
|
||||
|
||||
## Full Governance Example
|
||||
|
||||
```json
|
||||
{
|
||||
"$schema": "https://www.getbifrost.ai/schema",
|
||||
"encryption_key": "env.BIFROST_ENCRYPTION_KEY",
|
||||
|
||||
"client": {
|
||||
"enforce_auth_on_inference": true
|
||||
},
|
||||
|
||||
"governance": {
|
||||
"auth_config": {
|
||||
"is_enabled": true,
|
||||
"admin_username": "env.BIFROST_ADMIN_USERNAME",
|
||||
"admin_password": "env.BIFROST_ADMIN_PASSWORD"
|
||||
},
|
||||
|
||||
"budgets": [
|
||||
{
|
||||
"id": "budget-platform",
|
||||
"max_limit": 1000.00,
|
||||
"reset_duration": "1M",
|
||||
"virtual_key_id": "vk-platform"
|
||||
}
|
||||
],
|
||||
|
||||
"rate_limits": [
|
||||
{
|
||||
"id": "rl-platform",
|
||||
"request_max_limit": 5000,
|
||||
"request_reset_duration": "1h",
|
||||
"token_max_limit": 5000000,
|
||||
"token_reset_duration": "1h"
|
||||
}
|
||||
],
|
||||
|
||||
"virtual_keys": [
|
||||
{
|
||||
"id": "vk-platform",
|
||||
"name": "platform-key",
|
||||
"value": "env.VK_PLATFORM",
|
||||
"is_active": true,
|
||||
"rate_limit_id": "rl-platform",
|
||||
"provider_configs": [
|
||||
{
|
||||
"provider": "openai",
|
||||
"allowed_models": ["*"],
|
||||
"key_ids": ["*"],
|
||||
"weight": 1
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
|
||||
"routing_rules": [
|
||||
{
|
||||
"id": "fallback-to-anthropic",
|
||||
"name": "Fallback on error",
|
||||
"cel_expression": "true",
|
||||
"targets": [{ "provider": "openai", "weight": 1.0 }],
|
||||
"fallbacks": ["anthropic"]
|
||||
}
|
||||
]
|
||||
},
|
||||
|
||||
"providers": {
|
||||
"openai": {
|
||||
"keys": [{ "name": "openai-primary", "value": "env.OPENAI_API_KEY", "models": ["*"], "weight": 1.0 }]
|
||||
},
|
||||
"anthropic": {
|
||||
"keys": [{ "name": "anthropic-primary", "value": "env.ANTHROPIC_API_KEY", "models": ["*"], "weight": 1.0 }]
|
||||
}
|
||||
},
|
||||
|
||||
"config_store": {
|
||||
"enabled": true,
|
||||
"type": "postgres",
|
||||
"config": {
|
||||
"host": "env.PG_HOST",
|
||||
"port": "5432",
|
||||
"user": "env.PG_USER",
|
||||
"password": "env.PG_PASSWORD",
|
||||
"db_name": "bifrost"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
291
docs/deployment-guides/config-json/guardrails.mdx
Normal file
@@ -0,0 +1,291 @@
|
||||
---
|
||||
title: "Guardrails"
|
||||
description: "Configure content moderation and policy enforcement in config.json using guardrails_config"
|
||||
icon: "shield-halved"
|
||||
---
|
||||
|
||||
<Note>
|
||||
Guardrails are an **enterprise-only** feature and require the enterprise Bifrost image.
|
||||
</Note>
|
||||
|
||||
Guardrails are configured under `guardrails_config` in `config.json`. The configuration has two parts:
|
||||
|
||||
- **`guardrail_providers`** — the backend that performs the check. Rules link to providers by `id`.
|
||||
- **`guardrail_rules`** — CEL expressions that control when and where providers are invoked.
|
||||
|
||||
---
|
||||
|
||||
## Providers
|
||||
|
||||
<Tabs>
|
||||
<Tab title="Regex">
|
||||
|
||||
Runs entirely in-process with no external dependency. Patterns use RE2 syntax. Supports optional per-pattern flags: `i` (case-insensitive), `m` (multiline), `s` (dot-all).
|
||||
|
||||
```json
|
||||
{
|
||||
"guardrails_config": {
|
||||
"guardrail_providers": [
|
||||
{
|
||||
"id": 1,
|
||||
"provider_name": "regex",
|
||||
"policy_name": "block-secrets",
|
||||
"enabled": true,
|
||||
"timeout": 5,
|
||||
"config": {
|
||||
"patterns": [
|
||||
{ "pattern": "sk-[A-Za-z0-9]{20,}", "description": "OpenAI API key" },
|
||||
{ "pattern": "AKIA[0-9A-Z]{16}", "description": "AWS access key" },
|
||||
{ "pattern": "gh[ps]_[A-Za-z0-9]{36}", "description": "GitHub token", "flags": "i" }
|
||||
],
|
||||
"mode": "block"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="AWS Bedrock">
|
||||
|
||||
```json
|
||||
{
|
||||
"guardrails_config": {
|
||||
"guardrail_providers": [
|
||||
{
|
||||
"id": 2,
|
||||
"provider_name": "bedrock",
|
||||
"policy_name": "content-filter",
|
||||
"enabled": true,
|
||||
"timeout": 15,
|
||||
"config": {
|
||||
"guardrail_arn": "arn:aws:bedrock:us-east-1::guardrail/abc123",
|
||||
"guardrail_version": "DRAFT",
|
||||
"region": "us-east-1",
|
||||
"access_key": "env.AWS_ACCESS_KEY_ID",
|
||||
"secret_key": "env.AWS_SECRET_ACCESS_KEY"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="Azure Content Safety">
|
||||
|
||||
```json
|
||||
{
|
||||
"guardrails_config": {
|
||||
"guardrail_providers": [
|
||||
{
|
||||
"id": 3,
|
||||
"provider_name": "azure",
|
||||
"policy_name": "azure-content-safety",
|
||||
"enabled": true,
|
||||
"timeout": 10,
|
||||
"config": {
|
||||
"endpoint": "https://your-resource.cognitiveservices.azure.com",
|
||||
"api_key": "env.AZURE_CONTENT_SAFETY_KEY",
|
||||
"analyze_enabled": true,
|
||||
"analyze_severity_threshold": "medium",
|
||||
"jailbreak_shield_enabled": true,
|
||||
"indirect_attack_shield_enabled": true,
|
||||
"copyright_enabled": false,
|
||||
"text_blocklist_enabled": false,
|
||||
"blocklist_names": []
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
`analyze_severity_threshold` accepts `"low"`, `"medium"`, or `"high"`.
|
||||
|
||||
</Tab>
|
||||
<Tab title="Gray Swan">
|
||||
|
||||
```json
|
||||
{
|
||||
"guardrails_config": {
|
||||
"guardrail_providers": [
|
||||
{
|
||||
"id": 4,
|
||||
"provider_name": "grayswan",
|
||||
"policy_name": "grayswan-jailbreak",
|
||||
"enabled": true,
|
||||
"timeout": 15,
|
||||
"config": {
|
||||
"api_key": "env.GRAYSWAN_API_KEY",
|
||||
"violation_threshold": 0.7,
|
||||
"reasoning_mode": "standard",
|
||||
"policy_id": "",
|
||||
"policy_ids": [],
|
||||
"rules": {}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
### Provider Fields
|
||||
|
||||
| Field | Required | Description |
|-------|----------|-------------|
| `id` | Yes | Unique integer ID, referenced by rules via `provider_config_ids` |
| `provider_name` | Yes | Backend: `"regex"`, `"bedrock"`, `"azure"`, `"grayswan"` |
| `policy_name` | Yes | Human-readable policy label |
| `enabled` | Yes | `true` to activate |
| `timeout` | No | Execution timeout in seconds |
| `config` | No | Provider-specific configuration object |
|
||||
|
||||
---
|
||||
|
||||
## Rules
|
||||
|
||||
Rules are CEL expressions that fire when their condition matches. Available CEL variables:
|
||||
|
||||
| Variable | Type | Description |
|----------|------|-------------|
| `model` | `string` | Model name from the request |
| `provider` | `string` | Provider name (e.g. `"openai"`) |
| `headers` | `map<string,string>` | HTTP request headers |
| `params` | `map<string,string>` | Query parameters |
| `customer` | `string` | Customer ID |
| `team` | `string` | Team ID |
| `user` | `string` | User ID |
|
||||
|
||||
```json
|
||||
{
|
||||
"guardrails_config": {
|
||||
"guardrail_rules": [
|
||||
{
|
||||
"id": 101,
|
||||
"name": "block-secrets-input",
|
||||
"description": "Block prompts containing credentials",
|
||||
"enabled": true,
|
||||
"cel_expression": "true",
|
||||
"apply_to": "input",
|
||||
"sampling_rate": 100,
|
||||
"timeout": 10,
|
||||
"provider_config_ids": [1]
|
||||
},
|
||||
{
|
||||
"id": 102,
|
||||
"name": "content-safety-gpt4o-output",
|
||||
"enabled": true,
|
||||
"cel_expression": "model == 'gpt-4o'",
|
||||
"apply_to": "output",
|
||||
"sampling_rate": 100,
|
||||
"timeout": 15,
|
||||
"provider_config_ids": [3]
|
||||
},
|
||||
{
|
||||
"id": 103,
|
||||
"name": "grayswan-openai-partial",
|
||||
"enabled": true,
|
||||
"cel_expression": "provider == 'openai'",
|
||||
"apply_to": "input",
|
||||
"sampling_rate": 50,
|
||||
"timeout": 20,
|
||||
"provider_config_ids": [4]
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Rule Fields
|
||||
|
||||
| Field | Required | Description |
|-------|----------|-------------|
| `id` | Yes | Unique integer ID |
| `name` | Yes | Human-readable name |
| `description` | No | Optional description |
| `enabled` | Yes | `true` to activate |
| `cel_expression` | Yes | CEL boolean expression. `"true"` matches every request |
| `apply_to` | Yes | `"input"`, `"output"`, or `"both"` |
| `sampling_rate` | No | `0`–`100`; percentage of requests to evaluate (default: `100`) |
| `timeout` | No | Rule timeout in seconds |
| `provider_config_ids` | No | `id` values of providers to invoke when this rule matches. Multiple providers run in parallel |
|
||||
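
The sketch below combines several CEL variables in one rule and invokes two providers in parallel. Provider IDs `1` and `3` match the Regex and Azure Content Safety examples in the Providers tabs; the `x-env` header, the `team-ml` ID, and the rule ID are hypothetical. The `in` check guards the map lookup so that requests without the header do not raise an evaluation error.

```json
{
  "guardrails_config": {
    "guardrail_rules": [
      {
        "id": 104,
        "name": "ml-team-prod-checks",
        "description": "Run regex and Azure checks on one team's production traffic",
        "enabled": true,
        "cel_expression": "team == 'team-ml' && 'x-env' in headers && headers['x-env'] == 'prod'",
        "apply_to": "both",
        "sampling_rate": 100,
        "timeout": 15,
        "provider_config_ids": [1, 3]
      }
    ]
  }
}
```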
|
||||
---
|
||||
|
||||
## Full Example
|
||||
|
||||
```json
|
||||
{
|
||||
"$schema": "https://www.getbifrost.ai/schema",
|
||||
"encryption_key": "env.BIFROST_ENCRYPTION_KEY",
|
||||
|
||||
"providers": {
|
||||
"openai": {
|
||||
"keys": [{ "name": "primary", "value": "env.OPENAI_API_KEY", "models": ["*"], "weight": 1.0 }]
|
||||
}
|
||||
},
|
||||
|
||||
"guardrails_config": {
|
||||
"guardrail_providers": [
|
||||
{
|
||||
"id": 1,
|
||||
"provider_name": "regex",
|
||||
"policy_name": "block-secrets",
|
||||
"enabled": true,
|
||||
"timeout": 5,
|
||||
"config": {
|
||||
"patterns": [
|
||||
{ "pattern": "sk-[A-Za-z0-9]{20,}", "description": "OpenAI API key" },
|
||||
{ "pattern": "AKIA[0-9A-Z]{16}", "description": "AWS access key" }
|
||||
],
|
||||
"mode": "block"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": 2,
|
||||
"provider_name": "azure",
|
||||
"policy_name": "content-safety",
|
||||
"enabled": true,
|
||||
"timeout": 10,
|
||||
"config": {
|
||||
"endpoint": "https://your-resource.cognitiveservices.azure.com",
|
||||
"api_key": "env.AZURE_CONTENT_SAFETY_KEY",
|
||||
"analyze_enabled": true,
|
||||
"analyze_severity_threshold": "medium",
|
||||
"jailbreak_shield_enabled": true,
|
||||
"indirect_attack_shield_enabled": false
|
||||
}
|
||||
}
|
||||
],
|
||||
"guardrail_rules": [
|
||||
{
|
||||
"id": 101,
|
||||
"name": "block-secrets-input",
|
||||
"description": "Block prompts leaking credentials",
|
||||
"enabled": true,
|
||||
"cel_expression": "true",
|
||||
"apply_to": "input",
|
||||
"sampling_rate": 100,
|
||||
"timeout": 10,
|
||||
"provider_config_ids": [1]
|
||||
},
|
||||
{
|
||||
"id": 102,
|
||||
"name": "content-safety-both",
|
||||
"description": "Azure content safety on all traffic",
|
||||
"enabled": true,
|
||||
"cel_expression": "true",
|
||||
"apply_to": "both",
|
||||
"sampling_rate": 100,
|
||||
"timeout": 15,
|
||||
"provider_config_ids": [2]
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
318
docs/deployment-guides/config-json/plugins.mdx
Normal file
@@ -0,0 +1,318 @@
|
||||
---
|
||||
title: "Plugins"
|
||||
description: "Configure Bifrost plugins in config.json — semantic cache, OpenTelemetry, Maxim, Datadog, and custom plugins"
|
||||
icon: "puzzle-piece"
|
||||
---
|
||||
|
||||
<Note>
|
||||
**The `plugins` array only controls explicitly opt-in plugins**: `semantic_cache`, `otel`, `maxim`, `datadog` (enterprise), and custom plugins.
|
||||
|
||||
**Telemetry, logging, and governance are auto-loaded built-ins** — they are always active and configured via the `client` block and dedicated top-level keys, not the `plugins` array.
|
||||
</Note>
|
||||
|
||||
---
|
||||
|
||||
## Auto-Loaded Built-ins
|
||||
|
||||
These plugins start automatically. You do **not** add them to the `plugins` array.
|
||||
|
||||
| Plugin | Always active? | How to configure |
|--------|---------------|-----------------|
| **Telemetry** (Prometheus `/metrics`) | Yes, always | `client.prometheus_labels` for custom labels; push gateway via `plugins` entry once DB-backed mode is running |
| **Logging** | When `client.enable_logging: true` and `logs_store` is configured | `client.enable_logging`, `client.disable_content_logging`, `client.logging_headers` |
| **Governance** | Yes, always (OSS) | `client.enforce_auth_on_inference` for VK enforcement; `governance.*` for virtual keys / budgets / routing rules |
|
||||
|
||||
See [Client Configuration](/deployment-guides/config-json/client) and [Governance](/deployment-guides/config-json/governance) for full details.
|
||||
|
||||
---
|
||||
|
||||
## Plugin Array Structure
|
||||
|
||||
Every entry in the `plugins` array supports these common fields:
|
||||
|
||||
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `name` | string | Yes | Plugin name |
| `enabled` | boolean | Yes | Enable or disable this plugin |
| `config` | object | Varies | Plugin-specific configuration |
| `path` | string | No | Path to a custom plugin binary or WASM file |
| `version` | integer | No | 🛑 **DB-Backed Only.** Plugin metadata persisted on `TablePlugin`. In DB-backed sync, higher values trigger replacement/reload. Valid range: `1` to `32767`. |
| `placement` | string | No | 🛑 **DB-Backed Only.** Execution metadata (`"pre_builtin"`, `"builtin"`, `"post_builtin"`) persisted on `TablePlugin` and used for ordering behavior. |
| `order` | integer | No | 🛑 **DB-Backed Only.** Execution metadata persisted on `TablePlugin`; within a placement group, lower values run earlier. |
|
||||
|
||||
<Note>
|
||||
`name`, `enabled`, `path`, and `config` are the core plugin config fields. In DB-backed mode, `version`, `placement`, and `order` are persisted on `TablePlugin` and used during sync/runtime ordering.
|
||||
</Note>
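
As a hedged illustration of the DB-backed fields, the entry below asks for a custom plugin to run before the built-ins and bumps `version` so that a sync would reload it. The plugin name and path are hypothetical, and these three fields have no effect outside DB-backed mode.

```json
{
  "plugins": [
    {
      "name": "my-request-tagger",
      "enabled": true,
      "path": "/app/plugins/my-request-tagger.so",
      "version": 2,
      "placement": "pre_builtin",
      "order": 1,
      "config": {}
    }
  ]
}
```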
|
||||
|
||||
---
|
||||
|
||||
<Tabs>
|
||||
|
||||
<Tab title="Semantic Cache">
|
||||
|
||||
### Semantic Cache
|
||||
|
||||
Caches LLM responses by semantic similarity. Returns a cached response when an incoming request is semantically close enough to a previous one.
|
||||
|
||||
Requires a [vector store](/deployment-guides/config-json/storage#vector_store) to be configured.
|
||||
|
||||
| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| `config.dimension` | Yes | — | Embedding dimension. Use `1` for hash-based (exact) caching without an embedding provider |
| `config.provider` | No | — | Provider for generating embeddings (required for semantic mode) |
| `config.embedding_model` | No | — | Model for embeddings (required when `provider` is set) |
| `config.threshold` | No | `0.8` | Cosine similarity threshold for a cache hit (0.0–1.0) |
| `config.ttl` | No | `300` | Cache entry TTL in seconds (or a duration string like `"1h"`) |
| `config.cache_by_model` | No | `true` | Include model in cache key |
| `config.cache_by_provider` | No | `true` | Include provider in cache key |
| `config.exclude_system_prompt` | No | `false` | Exclude system prompt from cache key |
| `config.conversation_history_threshold` | No | `3` | Skip caching for requests with more messages than this |
| `config.default_cache_key` | No | — | Default cache key when no `x-bf-cache-key` header is sent |
|
||||
|
||||
**Semantic mode** (embedding-based similarity search):
|
||||
|
||||
```json
|
||||
{
|
||||
"plugins": [
|
||||
{
|
||||
"name": "semantic_cache",
|
||||
"enabled": true,
|
||||
"config": {
|
||||
"provider": "openai",
|
||||
"embedding_model": "text-embedding-3-small",
|
||||
"dimension": 1536,
|
||||
"threshold": 0.85,
|
||||
"ttl": 300,
|
||||
"cache_by_model": true,
|
||||
"cache_by_provider": true
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Hash mode** (exact-match caching, no embedding provider needed):
|
||||
|
||||
```json
|
||||
{
|
||||
"plugins": [
|
||||
{
|
||||
"name": "semantic_cache",
|
||||
"enabled": true,
|
||||
"config": {
|
||||
"dimension": 1,
|
||||
"ttl": 1800
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
<Note>
|
||||
You must also configure a `vector_store` in `config.json`. See [Storage — vector_store](/deployment-guides/config-json/storage#vector_store).
|
||||
</Note>
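
The optional tuning fields from the table can be layered onto semantic mode, as in this sketch. The threshold, TTL, history limit, and the `shared-default` cache key are illustrative choices, not recommendations; the `vector_store` block is still required and is omitted here.

```json
{
  "plugins": [
    {
      "name": "semantic_cache",
      "enabled": true,
      "config": {
        "provider": "openai",
        "embedding_model": "text-embedding-3-small",
        "dimension": 1536,
        "threshold": 0.9,
        "ttl": "1h",
        "exclude_system_prompt": true,
        "conversation_history_threshold": 5,
        "default_cache_key": "shared-default"
      }
    }
  ]
}
```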
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="OpenTelemetry">
|
||||
|
||||
### OpenTelemetry (OTel)
|
||||
|
||||
Exports distributed traces to any OTel-compatible collector (Jaeger, Zipkin, Tempo, Datadog via OTLP, etc.).
|
||||
|
||||
| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| `config.collector_url` | Yes | — | OTLP collector endpoint |
| `config.trace_type` | Yes | — | Trace format: `"genai_extension"`, `"vercel"`, or `"open_inference"` |
| `config.protocol` | Yes | — | `"http"` or `"grpc"` |
| `config.service_name` | No | `"bifrost"` | Service name reported to the collector |
| `config.metrics_enabled` | No | `false` | Enable push-based OTLP metrics export |
| `config.metrics_endpoint` | No | — | OTLP metrics endpoint URL |
| `config.metrics_push_interval` | No | `15` | Metrics push interval in seconds |
| `config.headers` | No | — | Custom headers for the collector (supports `env.` prefix) |
| `config.insecure` | No | `false` | Skip TLS verification |
| `config.tls_ca_cert` | No | — | Path to TLS CA certificate |
|
||||
|
||||
```json
|
||||
{
|
||||
"plugins": [
|
||||
{
|
||||
"name": "otel",
|
||||
"enabled": true,
|
||||
"config": {
|
||||
"collector_url": "http://otel-collector:4318",
|
||||
"trace_type": "genai_extension",
|
||||
"protocol": "http",
|
||||
"service_name": "bifrost-gateway"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**With authentication headers:**
|
||||
|
||||
```json
|
||||
{
|
||||
"plugins": [
|
||||
{
|
||||
"name": "otel",
|
||||
"enabled": true,
|
||||
"config": {
|
||||
"collector_url": "https://otel.example.com:4318",
|
||||
"trace_type": "open_inference",
|
||||
"protocol": "http",
|
||||
"service_name": "bifrost",
|
||||
"headers": {
|
||||
"Authorization": "env.OTEL_AUTH_HEADER"
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**With OTLP metrics export:**
|
||||
|
||||
```json
|
||||
{
|
||||
"plugins": [
|
||||
{
|
||||
"name": "otel",
|
||||
"enabled": true,
|
||||
"config": {
|
||||
"collector_url": "http://otel-collector:4318",
|
||||
"trace_type": "genai_extension",
|
||||
"protocol": "http",
|
||||
"metrics_enabled": true,
|
||||
"metrics_endpoint": "http://otel-collector:4318/v1/metrics",
|
||||
"metrics_push_interval": 30
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
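
**With a private CA (sketch):** for a collector whose certificate is signed by an internal CA, `tls_ca_cert` can point at the CA file instead of disabling verification with `insecure`. The hostname and certificate path below are hypothetical.

```json
{
  "plugins": [
    {
      "name": "otel",
      "enabled": true,
      "config": {
        "collector_url": "https://otel.internal.example.com:4318",
        "trace_type": "vercel",
        "protocol": "http",
        "insecure": false,
        "tls_ca_cert": "/etc/bifrost/certs/otel-ca.pem"
      }
    }
  ]
}
```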
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="Maxim">
|
||||
|
||||
### Maxim Observability
|
||||
|
||||
Sends request traces to the [Maxim](https://www.getmaxim.ai) observability platform.
|
||||
|
||||
| Field | Required | Description |
|-------|----------|-------------|
| `config.api_key` | Yes | Maxim API key (use `env.` prefix) |
| `config.log_repo_id` | No | Default Maxim logger repository ID |
|
||||
|
||||
```json
|
||||
{
|
||||
"plugins": [
|
||||
{
|
||||
"name": "maxim",
|
||||
"enabled": true,
|
||||
"config": {
|
||||
"api_key": "env.MAXIM_API_KEY",
|
||||
"log_repo_id": "your-log-repo-id"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="Datadog">
|
||||
|
||||
### Datadog
|
||||
|
||||
<Note>
|
||||
Datadog is an **enterprise-only** plugin and is silently ignored in OSS builds.
|
||||
</Note>
|
||||
|
||||
Sends APM traces and metrics to a Datadog Agent.
|
||||
|
||||
| Field | Default | Description |
|-------|---------|-------------|
| `config.agent_addr` | `"localhost:8126"` | Datadog Agent address for APM traces |
| `config.service_name` | `"bifrost"` | Service name in Datadog |
| `config.env` | — | Environment tag (e.g. `"production"`, `"staging"`) |
| `config.version` | — | Service version tag |
| `config.enable_traces` | `true` | Enable APM trace collection |
| `config.custom_tags` | `{}` | Additional key/value tags for all traces and metrics |
|
||||
|
||||
```json
|
||||
{
|
||||
"plugins": [
|
||||
{
|
||||
"name": "datadog",
|
||||
"enabled": true,
|
||||
"config": {
|
||||
"agent_addr": "datadog-agent:8126",
|
||||
"service_name": "bifrost",
|
||||
"env": "production",
|
||||
"enable_traces": true,
|
||||
"custom_tags": {
|
||||
"team": "platform",
|
||||
"region": "us-east-1"
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Custom / Dynamic Plugins
|
||||
|
||||
Load a custom Go plugin binary or WASM plugin at startup using the `path` field. Custom plugins must implement one of the Bifrost plugin interfaces.
|
||||
|
||||
```json
|
||||
{
|
||||
"plugins": [
|
||||
{
|
||||
"name": "my-custom-auth",
|
||||
"enabled": true,
|
||||
"path": "/app/plugins/my-custom-auth.so",
|
||||
"config": {
|
||||
"auth_endpoint": "env.AUTH_SERVICE_URL"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**WASM plugin:**
|
||||
|
||||
```json
|
||||
{
|
||||
"plugins": [
|
||||
{
|
||||
"name": "my-wasm-plugin",
|
||||
"enabled": true,
|
||||
"path": "/app/plugins/my-plugin.wasm",
|
||||
"config": {}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
See [Writing Go Plugins](/plugins/writing-go-plugin) and [Writing WASM Plugins](/plugins/writing-wasm-plugin) for implementation guides.
|
||||
|
||||
**Placement and ordering (DB-backed only):**
|
||||
|
||||
In DB-backed mode, plugin metadata such as `version` (`1` to `32767`), `placement`, and `order` can be managed via config sync and DB/UI workflows:
|
||||
|
||||
| `placement` | When it runs |
|-------------|-------------|
| `pre_builtin` | Before all built-in plugins |
| `builtin` | Alongside built-in plugins (by `order`) |
| `post_builtin` | After all built-in plugins (default) |
|
||||
|
||||
Within a placement group, lower `order` values run earlier.
|
||||
755
docs/deployment-guides/config-json/providers.mdx
Normal file
@@ -0,0 +1,755 @@
|
||||
---
|
||||
title: "Provider Setup"
|
||||
description: "Configure LLM providers in config.json — API keys, cloud-native auth, per-provider network settings, and self-hosted endpoints"
|
||||
icon: "plug"
|
||||
---
|
||||
|
||||
All providers are configured under `providers` in `config.json`. Each provider entry contains a `keys` array where every key has a `name`, `value`, `models`, and `weight`, plus optional provider-specific config objects.
|
||||
|
||||
**Supplying credentials:**
|
||||
|
||||
Use the `env.` prefix to reference environment variables — never put API keys directly in `config.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"providers": {
|
||||
"openai": {
|
||||
"keys": [
|
||||
{
|
||||
"name": "primary",
|
||||
"value": "env.OPENAI_API_KEY",
|
||||
"models": ["*"],
|
||||
"weight": 1.0
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Common Provider Fields
|
||||
|
||||
Every key object supports these fields:
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `name` | string | Unique name for this key (used in logs and when pinning a virtual key to a specific key) |
|
||||
| `value` | string | API key value or `env.VAR_NAME` reference |
|
||||
| `models` | array | Models this key serves. `["*"]` = all models |
|
||||
| `weight` | float | Load balancing weight. Higher = more traffic |
|
||||
| `aliases` | object | Map logical name → actual model name for this key |
|
||||
| `use_for_batch_api` | boolean | Mark key as eligible for batch API calls |
|
||||
|
||||
Per-provider `network_config` options (applies to all standard providers):
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `default_request_timeout_in_seconds` | integer | Per-request timeout |
|
||||
| `max_retries` | integer | Retry attempts on transient errors |
|
||||
| `retry_backoff_initial` | integer | Initial backoff in milliseconds |
|
||||
| `retry_backoff_max` | integer | Maximum backoff in milliseconds |
|
||||
| `max_conns_per_host` | integer | Max TCP connections to the provider endpoint (default: 5000) |
|
||||
| `extra_headers` | object | Static headers added to every provider request |
|
||||
| `stream_idle_timeout_in_seconds` | integer | Idle timeout per stream chunk (default: 60) |
|
||||
| `insecure_skip_verify` | boolean | Disable TLS verification (last resort only) |
|
||||
| `ca_cert_pem` | string | PEM-encoded CA for self-signed or private CA endpoints |
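As an illustration of the less common `network_config` fields in the table above, the sketch below sets a static header, a connection cap, and a longer stream idle timeout on one provider. The header name `X-Team` and all numeric values are placeholders chosen for this example, not recommended settings.

```json
{
  "providers": {
    "openai": {
      "keys": [
        { "name": "primary", "value": "env.OPENAI_API_KEY", "models": ["*"], "weight": 1.0 }
      ],
      "network_config": {
        "default_request_timeout_in_seconds": 60,
        "max_retries": 2,
        "max_conns_per_host": 1000,
        "stream_idle_timeout_in_seconds": 120,
        "extra_headers": {
          "X-Team": "platform"
        }
      }
    }
  }
}
```

Fields left out of `network_config` keep their defaults, so only the settings you want to change need to appear.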
|
||||
|
||||
Concurrency and buffering per provider:
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `concurrency_and_buffer_size.concurrency` | integer | Max concurrent requests to this provider |
|
||||
| `concurrency_and_buffer_size.buffer_size` | integer | Request queue depth |
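A minimal sketch of these two fields, assuming `concurrency_and_buffer_size` is an object that sits next to `keys` in the provider entry (as the dotted field names suggest). The numbers are illustrative only.

```json
{
  "providers": {
    "anthropic": {
      "keys": [
        { "name": "primary", "value": "env.ANTHROPIC_API_KEY", "models": ["*"], "weight": 1.0 }
      ],
      "concurrency_and_buffer_size": {
        "concurrency": 50,
        "buffer_size": 200
      }
    }
  }
}
```

With these values, up to 50 requests would be in flight to the provider at once, and up to 200 more could wait in the queue.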
|
||||
|
||||
---
|
||||
|
||||
<Tabs>
|
||||
|
||||
<Tab title="OpenAI">
|
||||
|
||||
### OpenAI
|
||||
|
||||
Supports multiple keys with weighted load balancing. Mark one key with `use_for_batch_api: true` to designate it for the Batch API.
|
||||
|
||||
```json
|
||||
{
|
||||
"providers": {
|
||||
"openai": {
|
||||
"keys": [
|
||||
{
|
||||
"name": "openai-primary",
|
||||
"value": "env.OPENAI_KEY_1",
|
||||
"models": ["*"],
|
||||
"weight": 2.0
|
||||
},
|
||||
{
|
||||
"name": "openai-secondary",
|
||||
"value": "env.OPENAI_KEY_2",
|
||||
"models": ["gpt-4o-mini"],
|
||||
"weight": 1.0
|
||||
},
|
||||
{
|
||||
"name": "openai-batch",
|
||||
"value": "env.OPENAI_KEY_BATCH",
|
||||
"models": ["*"],
|
||||
"weight": 1.0,
|
||||
"use_for_batch_api": true
|
||||
}
|
||||
],
|
||||
"network_config": {
|
||||
"default_request_timeout_in_seconds": 120,
|
||||
"max_retries": 3,
|
||||
"retry_backoff_initial": 500,
|
||||
"retry_backoff_max": 5000
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="Anthropic">
|
||||
|
||||
### Anthropic
|
||||
|
||||
```json
|
||||
{
|
||||
"providers": {
|
||||
"anthropic": {
|
||||
"keys": [
|
||||
{
|
||||
"name": "anthropic-primary",
|
||||
"value": "env.ANTHROPIC_KEY_1",
|
||||
"models": ["*"],
|
||||
"weight": 1.0
|
||||
},
|
||||
{
|
||||
"name": "anthropic-secondary",
|
||||
"value": "env.ANTHROPIC_KEY_2",
|
||||
"models": ["*"],
|
||||
"weight": 1.0
|
||||
}
|
||||
],
|
||||
"network_config": {
|
||||
"default_request_timeout_in_seconds": 180
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Override Anthropic beta headers** (optional):
|
||||
|
||||
```json
|
||||
{
|
||||
"providers": {
|
||||
"anthropic": {
|
||||
"keys": [
|
||||
{
|
||||
"name": "primary",
|
||||
"value": "env.ANTHROPIC_API_KEY",
|
||||
"models": ["*"],
|
||||
"weight": 1.0
|
||||
}
|
||||
],
|
||||
"network_config": {
|
||||
"beta_header_overrides": {
|
||||
"redact-thinking-": true
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="Azure OpenAI">
|
||||
|
||||
### Azure OpenAI
|
||||
|
||||
Azure requires `azure_key_config` on every key with `endpoint` and `api_version`. List your Azure deployment names in `models` — Bifrost routes requests using the model name as the deployment name. If your deployment names differ from the model names you use in requests, add an `aliases` map on the key.
|
||||
|
||||
<Tabs>
|
||||
<Tab title="API Key">
|
||||
|
||||
```json
|
||||
{
|
||||
"providers": {
|
||||
"azure": {
|
||||
"keys": [
|
||||
{
|
||||
"name": "azure-primary",
|
||||
"value": "env.AZURE_API_KEY",
|
||||
"models": ["gpt-4o", "gpt-4o-mini"],
|
||||
"weight": 1.0,
|
||||
"azure_key_config": {
|
||||
"endpoint": "env.AZURE_ENDPOINT",
|
||||
"api_version": "env.AZURE_API_VERSION"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Set environment variables:
|
||||
|
||||
```bash
|
||||
export AZURE_API_KEY="your-azure-api-key"
|
||||
export AZURE_ENDPOINT="https://your-resource.openai.azure.com"
|
||||
export AZURE_API_VERSION="2024-10-21"
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="Managed Identity / DefaultAzureCredential">
|
||||
|
||||
When `value` is empty or omitted, Bifrost uses `DefaultAzureCredential` — which resolves credentials from Workload Identity, VM managed identity, or `az login`.
|
||||
|
||||
```json
|
||||
{
|
||||
"providers": {
|
||||
"azure": {
|
||||
"keys": [
|
||||
{
|
||||
"name": "azure-workload-identity",
|
||||
"value": "",
|
||||
"models": ["gpt-4o"],
|
||||
"weight": 1.0,
|
||||
"azure_key_config": {
|
||||
"endpoint": "env.AZURE_ENDPOINT",
|
||||
"api_version": "env.AZURE_API_VERSION"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
**Deployment name aliases** — when your Azure deployment names differ from the model names in requests, use `aliases`:
|
||||
|
||||
```json
|
||||
{
|
||||
"providers": {
|
||||
"azure": {
|
||||
"keys": [
|
||||
{
|
||||
"name": "azure-primary",
|
||||
"value": "env.AZURE_API_KEY",
|
||||
"models": ["gpt-4o"],
|
||||
"weight": 1.0,
|
||||
"aliases": {
|
||||
"gpt-4o": "gpt-4o-prod-deployment"
|
||||
},
|
||||
"azure_key_config": {
|
||||
"endpoint": "env.AZURE_ENDPOINT",
|
||||
"api_version": "env.AZURE_API_VERSION"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Multi-region failover** (two keys, different regions):
|
||||
|
||||
```json
|
||||
{
|
||||
"providers": {
|
||||
"azure": {
|
||||
"keys": [
|
||||
{
|
||||
"name": "eastus",
|
||||
"value": "env.AZURE_KEY_EAST",
|
||||
"models": ["gpt-4o"],
|
||||
"weight": 1.0,
|
||||
"azure_key_config": {
|
||||
"endpoint": "env.AZURE_ENDPOINT_EAST",
|
||||
"api_version": "env.AZURE_API_VERSION"
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "westus",
|
||||
"value": "env.AZURE_KEY_WEST",
|
||||
"models": ["gpt-4o"],
|
||||
"weight": 1.0,
|
||||
"azure_key_config": {
|
||||
"endpoint": "env.AZURE_ENDPOINT_WEST",
|
||||
"api_version": "env.AZURE_API_VERSION"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="AWS Bedrock">
|
||||
|
||||
### AWS Bedrock
|
||||
|
||||
Bedrock requires `bedrock_key_config` with at minimum a `region`. Three auth modes:
|
||||
|
||||
<Tabs>
|
||||
<Tab title="Static Credentials">
|
||||
|
||||
```json
|
||||
{
|
||||
"providers": {
|
||||
"bedrock": {
|
||||
"keys": [
|
||||
{
|
||||
"name": "bedrock-static",
|
||||
"value": "",
|
||||
"models": ["*"],
|
||||
"weight": 1.0,
|
||||
"bedrock_key_config": {
|
||||
"region": "us-east-1",
|
||||
"access_key": "env.AWS_ACCESS_KEY_ID",
|
||||
"secret_key": "env.AWS_SECRET_ACCESS_KEY"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="IAM Role (instance profile / IRSA)">
|
||||
|
||||
When only `region` is set, Bifrost inherits credentials from the AWS SDK default chain — IRSA (IAM Roles for Service Accounts), EC2 instance profile, or `AWS_*` env vars.
|
||||
|
||||
```json
|
||||
{
|
||||
"providers": {
|
||||
"bedrock": {
|
||||
"keys": [
|
||||
{
|
||||
"name": "bedrock-iam",
|
||||
"value": "",
|
||||
"models": ["*"],
|
||||
"weight": 1.0,
|
||||
"bedrock_key_config": {
|
||||
"region": "us-east-1"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="STS AssumeRole">
|
||||
|
||||
```json
|
||||
{
|
||||
"providers": {
|
||||
"bedrock": {
|
||||
"keys": [
|
||||
{
|
||||
"name": "bedrock-assumerole",
|
||||
"value": "",
|
||||
"models": ["*"],
|
||||
"weight": 1.0,
|
||||
"bedrock_key_config": {
|
||||
"region": "us-west-2",
|
||||
"role_arn": "env.AWS_ROLE_ARN",
|
||||
"external_id": "env.AWS_EXTERNAL_ID",
|
||||
"session_name": "bifrost-session"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
**Model aliases** (map logical names to Bedrock inference profile IDs):
|
||||
|
||||
```json
|
||||
{
|
||||
"bedrock_key_config": {
|
||||
"region": "us-east-1"
|
||||
},
|
||||
"aliases": {
|
||||
"claude-sonnet": "us.anthropic.claude-3-5-sonnet-20241022-v2:0",
|
||||
"claude-haiku": "us.anthropic.claude-3-5-haiku-20241022-v1:0"
|
||||
}
|
||||
}
|
||||
```
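The fragment above shows only the two relevant fields. In a full provider entry they would sit on the key object, roughly as sketched here. The layout follows the other alias examples on this page, where the logical names are also listed in `models`; the key name `bedrock-aliased` is a placeholder.

```json
{
  "providers": {
    "bedrock": {
      "keys": [
        {
          "name": "bedrock-aliased",
          "value": "",
          "models": ["claude-sonnet", "claude-haiku"],
          "weight": 1.0,
          "aliases": {
            "claude-sonnet": "us.anthropic.claude-3-5-sonnet-20241022-v2:0",
            "claude-haiku": "us.anthropic.claude-3-5-haiku-20241022-v1:0"
          },
          "bedrock_key_config": {
            "region": "us-east-1"
          }
        }
      ]
    }
  }
}
```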
|
||||
|
||||
**Batch API — S3 configuration:**
|
||||
|
||||
```json
|
||||
{
|
||||
"bedrock_key_config": {
|
||||
"region": "us-east-1",
|
||||
"access_key": "env.AWS_ACCESS_KEY_ID",
|
||||
"secret_key": "env.AWS_SECRET_ACCESS_KEY",
|
||||
"batch_s3_config": {
|
||||
"buckets": [
|
||||
{
|
||||
"bucket_name": "my-bedrock-batch-bucket",
|
||||
"prefix": "batch/",
|
||||
"is_default": true
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="Google Vertex AI">
|
||||
|
||||
### Google Vertex AI
|
||||
|
||||
Vertex requires `vertex_key_config` with `project_id` and `region`. Two auth modes:
|
||||
|
||||
<Tabs>
|
||||
<Tab title="Service Account Key">
|
||||
|
||||
```json
|
||||
{
|
||||
"providers": {
|
||||
"vertex": {
|
||||
"keys": [
|
||||
{
|
||||
"name": "vertex-sa",
|
||||
"value": "",
|
||||
"models": ["*"],
|
||||
"weight": 1.0,
|
||||
"vertex_key_config": {
|
||||
"project_id": "env.VERTEX_PROJECT_ID",
|
||||
"region": "us-central1",
|
||||
"auth_credentials": "env.VERTEX_AUTH_CREDENTIALS"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
`VERTEX_AUTH_CREDENTIALS` should contain the base64-encoded service account JSON.
|
||||
|
||||
</Tab>
|
||||
<Tab title="GKE Workload Identity / ADC">
|
||||
|
||||
When `auth_credentials` is omitted, Bifrost calls `google.FindDefaultCredentials` — which resolves to GKE Workload Identity, GCE metadata server, or `gcloud auth application-default login`.
|
||||
|
||||
```json
|
||||
{
|
||||
"providers": {
|
||||
"vertex": {
|
||||
"keys": [
|
||||
{
|
||||
"name": "vertex-workload-identity",
|
||||
"value": "",
|
||||
"models": ["*"],
|
||||
"weight": 1.0,
|
||||
"vertex_key_config": {
|
||||
"project_id": "my-gcp-project",
|
||||
"region": "us-central1"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="Groq / Gemini / Mistral / Others">
|
||||
|
||||
### Standard API-Key Providers
|
||||
|
||||
These providers follow the same simple pattern — one or more keys with weights. Replace the provider name and env var name accordingly.
|
||||
|
||||
```json
|
||||
{
|
||||
"providers": {
|
||||
"groq": {
|
||||
"keys": [
|
||||
{
|
||||
"name": "groq-primary",
|
||||
"value": "env.GROQ_API_KEY",
|
||||
"models": ["*"],
|
||||
"weight": 1.0
|
||||
}
|
||||
]
|
||||
},
|
||||
"gemini": {
|
||||
"keys": [
|
||||
{
|
||||
"name": "gemini-primary",
|
||||
"value": "env.GEMINI_API_KEY",
|
||||
"models": ["*"],
|
||||
"weight": 1.0
|
||||
}
|
||||
]
|
||||
},
|
||||
"mistral": {
|
||||
"keys": [
|
||||
{
|
||||
"name": "mistral-primary",
|
||||
"value": "env.MISTRAL_API_KEY",
|
||||
"models": ["*"],
|
||||
"weight": 1.0
|
||||
}
|
||||
]
|
||||
},
|
||||
"cohere": {
|
||||
"keys": [{ "name": "cohere-main", "value": "env.COHERE_API_KEY", "models": ["*"], "weight": 1.0 }]
|
||||
},
|
||||
"perplexity": {
|
||||
"keys": [{ "name": "perplexity-main", "value": "env.PERPLEXITY_API_KEY", "models": ["*"], "weight": 1.0 }]
|
||||
},
|
||||
"xai": {
|
||||
"keys": [{ "name": "xai-main", "value": "env.XAI_API_KEY", "models": ["*"], "weight": 1.0 }]
|
||||
},
|
||||
"cerebras": {
|
||||
"keys": [{ "name": "cerebras-main", "value": "env.CEREBRAS_API_KEY", "models": ["*"], "weight": 1.0 }]
|
||||
},
|
||||
"openrouter": {
|
||||
"keys": [{ "name": "openrouter-main", "value": "env.OPENROUTER_API_KEY", "models": ["*"], "weight": 1.0 }]
|
||||
},
|
||||
"nebius": {
|
||||
"keys": [{ "name": "nebius-main", "value": "env.NEBIUS_API_KEY", "models": ["*"], "weight": 1.0 }]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
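The same pattern extends to providers not shown above and to several keys per provider. Below is a sketch for `fireworks` (one of the supported provider keys) with two weighted keys; the key names and environment variable names are placeholders.

```json
{
  "providers": {
    "fireworks": {
      "keys": [
        { "name": "fireworks-primary", "value": "env.FIREWORKS_KEY_1", "models": ["*"], "weight": 3.0 },
        { "name": "fireworks-secondary", "value": "env.FIREWORKS_KEY_2", "models": ["*"], "weight": 1.0 }
      ]
    }
  }
}
```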
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="Self-Hosted">
|
||||
|
||||
### Self-Hosted Providers
|
||||
|
||||
Self-hosted providers point to a URL you operate. An API key is usually not required, so `value` can be left empty (`"value": ""`).
|
||||
|
||||
<Tabs>
|
||||
<Tab title="Ollama">
|
||||
|
||||
```json
|
||||
{
|
||||
"providers": {
|
||||
"ollama": {
|
||||
"keys": [
|
||||
{
|
||||
"name": "ollama-local",
|
||||
"value": "",
|
||||
"models": ["*"],
|
||||
"weight": 1.0,
|
||||
"ollama_key_config": {
|
||||
"url": "http://localhost:11434"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Using an env var for the URL (useful across environments):
|
||||
|
||||
```json
|
||||
{
|
||||
"ollama_key_config": {
|
||||
"url": "env.OLLAMA_URL"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="vLLM">
|
||||
|
||||
vLLM instances are model-specific — one key per served model:
|
||||
|
||||
```json
|
||||
{
|
||||
"providers": {
|
||||
"vllm": {
|
||||
"keys": [
|
||||
{
|
||||
"name": "vllm-llama3-70b",
|
||||
"value": "",
|
||||
"models": ["llama-3-70b"],
|
||||
"weight": 1.0,
|
||||
"vllm_key_config": {
|
||||
"url": "http://vllm-server:8000",
|
||||
"model_name": "meta-llama/Meta-Llama-3-70B-Instruct"
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "vllm-mistral",
|
||||
"value": "",
|
||||
"models": ["mistral-7b"],
|
||||
"weight": 1.0,
|
||||
"vllm_key_config": {
|
||||
"url": "http://vllm-mistral:8000",
|
||||
"model_name": "mistralai/Mistral-7B-Instruct-v0.3"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="SGLang">
|
||||
|
||||
```json
|
||||
{
|
||||
"providers": {
|
||||
"sgl": {
|
||||
"keys": [
|
||||
{
|
||||
"name": "sgl-main",
|
||||
"value": "",
|
||||
"models": ["*"],
|
||||
"weight": 1.0,
|
||||
"sgl_key_config": {
|
||||
"url": "http://sgl-router:30000"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="HuggingFace / Replicate">
|
||||
|
||||
These providers use `aliases` to map logical model names to provider-specific IDs:
|
||||
|
||||
```json
|
||||
{
|
||||
"providers": {
|
||||
"huggingface": {
|
||||
"keys": [
|
||||
{
|
||||
"name": "hf-main",
|
||||
"value": "env.HF_API_KEY",
|
||||
"models": ["llama-3", "mixtral"],
|
||||
"weight": 1.0,
|
||||
"aliases": {
|
||||
"llama-3": "meta-llama/Meta-Llama-3-8B-Instruct",
|
||||
"mixtral": "mistralai/Mixtral-8x7B-Instruct-v0.1"
|
||||
}
|
||||
}
|
||||
]
|
||||
},
|
||||
"replicate": {
|
||||
"keys": [
|
||||
{
|
||||
"name": "replicate-main",
|
||||
"value": "env.REPLICATE_API_KEY",
|
||||
"models": ["llama-3"],
|
||||
"weight": 1.0,
|
||||
"aliases": {
|
||||
"llama-3": "meta/meta-llama-3-70b-instruct"
|
||||
},
|
||||
"replicate_key_config": {
|
||||
"use_deployments_endpoint": false
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
</Tab>
|
||||
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Proxy Configuration
|
||||
|
||||
Route provider traffic through an HTTP or SOCKS5 proxy:
|
||||
|
||||
```json
|
||||
{
|
||||
"providers": {
|
||||
"openai": {
|
||||
"keys": [
|
||||
{ "name": "primary", "value": "env.OPENAI_API_KEY", "models": ["*"], "weight": 1.0 }
|
||||
],
|
||||
"proxy_config": {
|
||||
"type": "http",
|
||||
"url": "http://proxy.corp.example.com:3128",
|
||||
"username": "env.PROXY_USER",
|
||||
"password": "env.PROXY_PASS"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Type | Options |
|
||||
|-------|------|---------|
|
||||
| `proxy_config.type` | string | `"none"`, `"http"`, `"socks5"`, `"environment"` |
|
||||
| `proxy_config.url` | string | Proxy server URL |
|
||||
| `proxy_config.username` | string | Proxy auth username |
|
||||
| `proxy_config.password` | string | Proxy auth password (`env.` supported) |
|
||||
| `proxy_config.ca_cert_pem` | string | PEM CA for TLS-intercepting proxies |
|
||||
|
||||
Use `"type": "environment"` to pick up `HTTP_PROXY` / `HTTPS_PROXY` env vars automatically.
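For example, a provider that should simply follow the process environment could be configured as below. This is a sketch that assumes no other `proxy_config` fields are needed when `type` is `"environment"`.

```json
{
  "providers": {
    "openai": {
      "keys": [
        { "name": "primary", "value": "env.OPENAI_API_KEY", "models": ["*"], "weight": 1.0 }
      ],
      "proxy_config": {
        "type": "environment"
      }
    }
  }
}
```

The proxy address then comes from `HTTP_PROXY` / `HTTPS_PROXY` in the environment of the Bifrost process, which keeps proxy details out of `config.json`.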
|
||||
|
||||
---
|
||||
|
||||
## Multi-Provider Example
|
||||
|
||||
```json
|
||||
{
|
||||
"$schema": "https://www.getbifrost.ai/schema",
|
||||
"providers": {
|
||||
"openai": {
|
||||
"keys": [
|
||||
{ "name": "openai-primary", "value": "env.OPENAI_API_KEY", "models": ["*"], "weight": 2.0 }
|
||||
]
|
||||
},
|
||||
"anthropic": {
|
||||
"keys": [
|
||||
{ "name": "anthropic-primary", "value": "env.ANTHROPIC_API_KEY", "models": ["*"], "weight": 1.0 }
|
||||
]
|
||||
},
|
||||
"groq": {
|
||||
"keys": [
|
||||
{ "name": "groq-primary", "value": "env.GROQ_API_KEY", "models": ["*"], "weight": 1.0 }
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
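With three providers and the weights above, traffic is split in proportion to weight. The weights sum to 4.0 (2.0 + 1.0 + 1.0), so OpenAI receives 2.0/4.0 = 50%, and Anthropic and Groq each receive 1.0/4.0 = 25%. If any provider returns an error, Bifrost automatically retries on the next key or provider.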
|
||||
252
docs/deployment-guides/config-json/schema-reference.mdx
Normal file
@@ -0,0 +1,252 @@
|
||||
---
|
||||
title: "Schema Reference"
|
||||
description: "All top-level keys available in config.json, their types, and where each is documented"
|
||||
icon: "brackets-curly"
|
||||
---
|
||||
|
||||
<Note>
|
||||
The live schema is published at [`https://www.getbifrost.ai/schema`](https://www.getbifrost.ai/schema). Add `"$schema": "https://www.getbifrost.ai/schema"` to your `config.json` for IDE autocomplete and inline validation.
|
||||
</Note>
|
||||
|
||||
This page is a concise reference for every top-level key in `config.json`. Click the **Guide** links for full field-by-field documentation.
|
||||
|
||||
---
|
||||
|
||||
## Top-Level Keys
|
||||
|
||||
| Key | Type | Description | Guide |
|
||||
|-----|------|-------------|-------|
|
||||
| `$schema` | string | Schema URL for IDE validation. Set to `"https://www.getbifrost.ai/schema"` | — |
| `version` | integer | Selects how empty arrays in allow-list fields are interpreted (`2` by default) | [version](#version) |
|
||||
| `encryption_key` | string | Optional AES-256 key (derived via Argon2id). Accepts `env.VAR` prefix and is also read from `BIFROST_ENCRYPTION_KEY`. If omitted, data is stored in plaintext. | [Client](/deployment-guides/config-json/client#encryption-key) |
|
||||
| `client` | object | Worker pool, logging, CORS, auth enforcement, header filtering, MCP, compat shims | [Client](/deployment-guides/config-json/client) |
|
||||
| `providers` | object | LLM provider API keys, network settings, concurrency | [Providers](/deployment-guides/config-json/providers) |
|
||||
| `governance` | object | Admin auth, virtual keys, budgets, rate limits, routing rules, customers, teams | [Governance](/deployment-guides/config-json/governance) |
|
||||
| `guardrails_config` | object | Content moderation providers and CEL-based rules *(enterprise only)* | [Guardrails](/deployment-guides/config-json/guardrails) |
|
||||
| `access_profiles` | array | Access profile templates for enterprise RBAC/governance controls *(enterprise only)* | [Enterprise Governance](/enterprise/advanced-governance) |
|
||||
| `cluster_config` | object | Cluster mode settings: gossip, peers, and auto-discovery backends *(enterprise only)* | [Cluster](/deployment-guides/config-json/cluster) |
|
||||
| `config_store` | object | Configuration database backend — SQLite, PostgreSQL, or disabled (file-only mode) | [Storage](/deployment-guides/config-json/storage#config_store) |
|
||||
| `logs_store` | object | Request/response log database — SQLite, PostgreSQL + optional S3/GCS offload | [Storage](/deployment-guides/config-json/storage#logs_store) |
|
||||
| `vector_store` | object | Vector database for semantic cache — Weaviate, Redis, Qdrant, Pinecone, Valkey | [Storage](/deployment-guides/config-json/storage#vector_store) |
|
||||
| `plugins` | array | Opt-in plugins: `semantic_cache`, `otel`, `maxim`, `datadog`, custom | [Plugins](/deployment-guides/config-json/plugins) |
|
||||
| `framework` | object | Model pricing catalog URL and sync interval | [Framework](#framework) |
|
||||
| `mcp` | object | MCP server and tool configuration | — |
|
||||
| `websocket` | object | WebSocket / Realtime API connection pool tuning | [WebSocket](#websocket) |
|
||||
| `auth_config` | object | **Deprecated** — use `governance.auth_config` | [Client](/deployment-guides/config-json/client#authentication) |
|
||||
|
||||
---
|
||||
|
||||
## `version`
|
||||
|
||||
Controls how empty arrays in allow-list fields (`models`, `allowed_models`, `key_ids`, `tools_to_execute`) are interpreted:
|
||||
|
||||
| Value | Behaviour |
|
||||
|-------|-----------|
|
||||
| `2` *(default, v1.5.0+)* | Empty array = **deny all**; `["*"]` = allow all |
|
||||
| `1` *(v1.4.x compat)* | Empty array = **allow all** |
|
||||
|
||||
Omitting `version` uses v2 semantics. Set `"version": 1` only if you are migrating from v1.4.x and need the old behaviour temporarily.
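To make the difference concrete, the sketch below pins the legacy semantics during a migration, assuming `version` is set at the top level of `config.json`. Under `"version": 1` the empty `models` array on the key is read as allow all. With `version` omitted (or set to `2`), the same key would be denied for every model and would need `["*"]` instead. The key name is a placeholder.

```json
{
  "version": 1,
  "providers": {
    "openai": {
      "keys": [
        { "name": "legacy-key", "value": "env.OPENAI_API_KEY", "models": [], "weight": 1.0 }
      ]
    }
  }
}
```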
|
||||
|
||||
---
|
||||
|
||||
## `client`
|
||||
|
||||
Controls the worker pool, logging pipeline, security, and SDK shims. All fields are optional.
|
||||
|
||||
| Field | Type | Default | Description |
|
||||
|-------|------|---------|-------------|
|
||||
| `initial_pool_size` | integer | `300` | Pre-allocated goroutines per provider queue |
|
||||
| `drop_excess_requests` | boolean | `false` | Return HTTP 429 when queue is full |
|
||||
| `enable_logging` | boolean | `true`* | Persist request/response logs (`*` auto-enabled when `logs_store` is set) |
|
||||
| `disable_content_logging` | boolean | `false` | Strip message content from logs |
|
||||
| `log_retention_days` | integer | `365` | Days to retain log entries |
|
||||
| `logging_headers` | array | `[]` | HTTP headers to capture in log metadata |
|
||||
| `enforce_auth_on_inference` | boolean | `false` | Require a virtual key on every `/v1/*` request |
|
||||
| `allow_direct_keys` | boolean | `false` | Allow callers to pass provider API keys directly |
|
||||
| `allowed_origins` | array | `["*"]` | CORS allowed origins |
|
||||
| `max_request_body_size_mb` | integer | `100` | Maximum request body in MB |
|
||||
| `whitelisted_routes` | array | `[]` | Routes that bypass auth middleware |
|
||||
| `allowed_headers` | array | `[]` | Additional headers permitted for CORS/WebSocket |
|
||||
| `required_headers` | array | `[]` | Headers that must be present on every request |
|
||||
| `header_filter_config` | object | — | `allowlist` / `denylist` for `x-bf-eh-*` forwarded headers |
|
||||
| `prometheus_labels` | array | `[]` | Custom labels for all Prometheus metrics |
|
||||
| `compat` | object | — | SDK compatibility shims (`should_drop_params`, `convert_text_to_chat`, etc.) |
|
||||
| `mcp_agent_depth` | integer | `10` | Max tool-call recursion depth |
|
||||
| `mcp_tool_execution_timeout` | integer | `30` | Per-tool execution timeout in seconds |
|
||||
| `mcp_tool_sync_interval` | integer | `10` | Tool sync interval in minutes (`0` = disabled) |
|
||||
| `mcp_disable_auto_tool_inject` | boolean | `false` | Disable automatic MCP tool injection |
|
||||
| `async_job_result_ttl` | integer | `3600` | TTL for async job results in seconds |
|
||||
| `disable_db_pings_in_health` | boolean | `false` | Exclude DB connectivity from `/health` |
|
||||
| `routing_chain_max_depth` | integer | `10` | Max routing rule chain evaluation depth |
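A short sketch of a `client` block that overrides a few of the defaults listed above. Every field name comes from the table; the values and the origin URL are illustrative.

```json
{
  "client": {
    "initial_pool_size": 500,
    "drop_excess_requests": true,
    "enable_logging": true,
    "disable_content_logging": true,
    "log_retention_days": 30,
    "enforce_auth_on_inference": true,
    "allowed_origins": ["https://app.example.com"],
    "max_request_body_size_mb": 20
  }
}
```

Fields that are left out keep the defaults shown in the table.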
|
||||
|
||||
Full documentation: [Client Configuration](/deployment-guides/config-json/client).
|
||||
|
||||
---
|
||||
|
||||
## `providers`
|
||||
|
||||
Keyed by provider name. Each entry contains a `keys` array and, optionally, `network_config`, `concurrency_and_buffer_size`, and `proxy_config` objects.
|
||||
|
||||
Supported provider keys: `openai`, `anthropic`, `azure`, `bedrock`, `vertex`, `gemini`, `mistral`, `groq`, `cohere`, `perplexity`, `xai`, `cerebras`, `openrouter`, `nebius`, `fireworks`, `parasail`, `huggingface`, `replicate`, `ollama`, `vllm`, `sgl`, `elevenlabs`, `runway`.
|
||||
|
||||
Full documentation: [Provider Setup](/deployment-guides/config-json/providers).
|
||||
|
||||
---
|
||||
|
||||
## `governance`
|
||||
|
||||
Seeds governance resources at startup. All sub-keys are optional; the resource lists (`virtual_keys`, `budgets`, and so on) are arrays.
|
||||
|
||||
| Sub-key | Description |
|
||||
|---------|-------------|
|
||||
| `auth_config` | Admin username/password auth for the dashboard |
|
||||
| `virtual_keys` | Scoped API tokens with provider/model allowlists |
|
||||
| `budgets` | Spend caps in USD over a rolling window |
|
||||
| `rate_limits` | Request and token rate limits |
|
||||
| `customers` | Customer entities (attach budgets/rate limits) |
|
||||
| `teams` | Team entities (attach to customers, budgets, rate limits) |
|
||||
| `routing_rules` | CEL-based dynamic provider/model routing |
|
||||
| `pricing_overrides` | Scoped per-model pricing overrides |
|
||||
| `model_configs` | Per-model rate limit and budget configurations |
|
||||
|
||||
Full documentation: [Governance](/deployment-guides/config-json/governance).
|
||||
|
||||
---
|
||||
|
||||
## `guardrails_config`
|
||||
|
||||
Enterprise-only. Two sub-keys: `guardrail_providers` (array) and `guardrail_rules` (array).
|
||||
|
||||
Full documentation: [Guardrails](/deployment-guides/config-json/guardrails).
|
||||
|
||||
---
|
||||
|
||||
## `access_profiles`
|
||||
|
||||
Enterprise-only. Defines access profile templates that can later be attached to roles/users.
|
||||
|
||||
```json
|
||||
{
|
||||
"access_profiles": [
|
||||
{
|
||||
"name": "platform-default",
|
||||
"description": "Default platform profile",
|
||||
"is_active": true,
|
||||
"tags": ["platform", "default"],
|
||||
"provider_configs": [
|
||||
{
|
||||
"provider_name": "openai",
|
||||
"all_models_allowed": false,
|
||||
"allowed_models": ["gpt-4o", "gpt-4o-mini"]
|
||||
}
|
||||
],
|
||||
"mcp_servers": [
|
||||
{ "mcp_server_id": "github" }
|
||||
],
|
||||
"mcp_tool_overrides": [
|
||||
{ "mcp_client_id": "github", "tool_name": "create_pull_request", "action": "include" }
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## `cluster_config`
|
||||
|
||||
Enterprise-only clustering settings for multi-node deployments.
|
||||
|
||||
| Sub-key | Description |
|
||||
|---------|-------------|
|
||||
| `enabled` | Enables cluster mode |
|
||||
| `region` | Region label used by enterprise clustering |
|
||||
| `peers` | Static peer list (`host:port`) |
|
||||
| `gossip` | Gossip/memberlist port + liveness thresholds |
|
||||
| `discovery` | Auto-discovery configuration (`kubernetes`, `dns`, `udp`, `consul`, `etcd`, `mdns`) |
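A minimal sketch using only the simpler sub-keys above. It assumes `peers` is an array of `host:port` strings; the shapes of `gossip` and `discovery` are not described on this page, so they are left out. Host names, the port, and the region label are placeholders.

```json
{
  "cluster_config": {
    "enabled": true,
    "region": "us-east-1",
    "peers": ["bifrost-0.internal:7946", "bifrost-1.internal:7946"]
  }
}
```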
|
||||
|
||||
Full documentation: [Cluster](/deployment-guides/config-json/cluster).
|
||||
|
||||
---
|
||||
|
||||
## `config_store`, `logs_store`, `vector_store`
|
||||
|
||||
Storage backends. Each has `enabled` (boolean), `type` (string), and `config` (object).
|
||||
|
||||
| Store | Types |
|
||||
|-------|-------|
|
||||
| `config_store` | `"sqlite"`, `"postgres"` |
|
||||
| `logs_store` | `"sqlite"`, `"postgres"` (+ optional `object_storage`) |
|
||||
| `vector_store` | `"weaviate"`, `"redis"`, `"qdrant"`, `"pinecone"` (`"redis"` also covers Valkey-compatible endpoints) |
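As a sketch of the shared `enabled` / `type` / `config` shape, the example below enables a SQLite `config_store` and a SQLite `logs_store`. The `config.path` field is the one documented for the SQLite `config_store`; using the same field for `logs_store`, and the file names, are assumptions here.

```json
{
  "config_store": {
    "enabled": true,
    "type": "sqlite",
    "config": { "path": "./config.db" }
  },
  "logs_store": {
    "enabled": true,
    "type": "sqlite",
    "config": { "path": "./logs.db" }
  }
}
```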
|
||||
|
||||
Full documentation: [Storage](/deployment-guides/config-json/storage).
|
||||
|
||||
---
|
||||
|
||||
## `framework`
|
||||
|
||||
Controls model pricing catalog sync:
|
||||
|
||||
```json
|
||||
{
|
||||
"framework": {
|
||||
"pricing": {
|
||||
"pricing_url": "https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json",
|
||||
"pricing_sync_interval": 86400
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Default | Description |
|
||||
|-------|---------|-------------|
|
||||
| `pricing.pricing_url` | LiteLLM catalog | URL of a model pricing JSON file |
|
||||
| `pricing.pricing_sync_interval` | `86400` | Sync interval in seconds (minimum: `3600`) |
|
||||
|
||||
---
|
||||
|
||||
## `websocket`
|
||||
|
||||
Optional tuning for the WebSocket gateway (Responses API WebSocket mode, Realtime API). WebSocket is always enabled.
|
||||
|
||||
```json
|
||||
{
|
||||
"websocket": {
|
||||
"max_connections_per_user": 100,
|
||||
"transcript_buffer_size": 100,
|
||||
"pool": {
|
||||
"max_idle_per_key": 50,
|
||||
"max_total_connections": 1000,
|
||||
"idle_timeout_seconds": 600,
|
||||
"max_connection_lifetime_seconds": 7200
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Default | Description |
|
||||
|-------|---------|-------------|
|
||||
| `max_connections_per_user` | `100` | Max concurrent WebSocket connections per user |
|
||||
| `transcript_buffer_size` | `100` | Transcript entries buffered for Realtime API mid-session fallback |
|
||||
| `pool.max_idle_per_key` | `50` | Max idle upstream connections per provider/key |
|
||||
| `pool.max_total_connections` | `1000` | Max total idle upstream connections |
|
||||
| `pool.idle_timeout_seconds` | `600` | Evict idle connections after this many seconds |
|
||||
| `pool.max_connection_lifetime_seconds` | `7200` | Max lifetime of any upstream connection |
|
||||
|
||||
---
|
||||
|
||||
## Minimal Valid Config
|
||||
|
||||
```json
|
||||
{
|
||||
"$schema": "https://www.getbifrost.ai/schema",
|
||||
"encryption_key": "env.BIFROST_ENCRYPTION_KEY",
|
||||
"providers": {
|
||||
"openai": {
|
||||
"keys": [
|
||||
{ "name": "primary", "value": "env.OPENAI_API_KEY", "models": ["*"], "weight": 1.0 }
|
||||
]
|
||||
}
|
||||
},
|
||||
"config_store": { "enabled": false }
|
||||
}
|
||||
```
|
||||
540
docs/deployment-guides/config-json/storage.mdx
Normal file
@@ -0,0 +1,540 @@
|
||||
---
|
||||
title: "Storage"
|
||||
description: "Configure Bifrost storage backends in config.json — config_store, logs_store, vector_store, and object storage for logs"
|
||||
icon: "database"
|
||||
---
|
||||
|
||||
Bifrost persists two types of data — **config** (providers, virtual keys, governance rules) and **logs** (request/response records). Each has its own store. A **vector store** is required for semantic caching.
|
||||
|
||||
| Store | Purpose | Backends |
|
||||
|-------|---------|---------|
|
||||
| `config_store` | Provider configs, virtual keys, governance rules | SQLite, PostgreSQL |
|
||||
| `logs_store` | Request/response logs shown in UI | SQLite, PostgreSQL + optional S3/GCS offload |
|
||||
| `vector_store` | Semantic response caching | Weaviate, Redis, Valkey, Qdrant, Pinecone |
|
||||
|
||||
<Note>
|
||||
If you use PostgreSQL for any store, the target database must be **UTF8 encoded**. See [PostgreSQL UTF8 Requirement](/quickstart/gateway/setting-up#postgresql-utf8-requirement).
|
||||
</Note>
|
||||
|
||||
---
|
||||
|
||||
## config_store
|
||||
|
||||
<Note>
|
||||
When `config_store` is disabled (or absent), all configuration is loaded from `config.json` at startup only — the Web UI is disabled and changes require a restart. See [Two Configuration Modes](/deployment-guides/config-json#two-configuration-modes).
|
||||
</Note>
|
||||
|
||||
<Tabs>
|
||||
|
||||
<Tab title="SQLite">
|
||||
|
||||
### SQLite (Default)
|
||||
|
||||
Simplest setup — no external database required. Bifrost stores configuration in a local SQLite file.
|
||||
|
||||
```json
|
||||
{
|
||||
"config_store": {
|
||||
"enabled": true,
|
||||
"type": "sqlite",
|
||||
"config": {
|
||||
"path": "./config.db"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Description |
|
||||
|-------|-------------|
|
||||
| `config.path` | Path to the SQLite file (relative to app-dir, or absolute) |
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="PostgreSQL">
|
||||
|
||||
### PostgreSQL
|
||||
|
||||
Production-grade storage suitable for high-availability and high-throughput deployments.
|
||||
|
||||
```json
|
||||
{
|
||||
"config_store": {
|
||||
"enabled": true,
|
||||
"type": "postgres",
|
||||
"config": {
|
||||
"host": "env.PG_HOST",
|
||||
"port": "5432",
|
||||
"user": "env.PG_USER",
|
||||
"password": "env.PG_PASSWORD",
|
||||
"db_name": "bifrost",
|
||||
"ssl_mode": "require",
|
||||
"max_idle_conns": 5,
|
||||
"max_open_conns": 50
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Default | Description |
|
||||
|-------|---------|-------------|
|
||||
| `host` | — | PostgreSQL host (supports `env.` prefix) |
|
||||
| `port` | — | PostgreSQL port (as string) |
|
||||
| `user` | — | Database user (supports `env.` prefix) |
|
||||
| `password` | — | Database password (supports `env.` prefix). Leave empty for IAM role auth. |
|
||||
| `db_name` | — | Database name |
|
||||
| `ssl_mode` | — | `"disable"`, `"require"`, `"verify-ca"`, `"verify-full"` |
|
||||
| `max_idle_conns` | `5` | Maximum idle connections in the pool |
|
||||
| `max_open_conns` | `50` | Maximum open connections to the database |
|
||||
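Because values written as `env.NAME` are read from the environment, it can help to confirm those variables are set before starting Bifrost. The sketch below lists every `env.` reference in a config file whose variable is unset or empty. The helper name `missing_env_refs` is illustrative (it is not a Bifrost command), and the check assumes references always appear as complete quoted strings such as `"env.PG_HOST"`.

```bash
# missing_env_refs FILE
# Prints the name of each variable referenced as "env.NAME" in FILE
# that is unset or empty in the current shell. Prints nothing if all are set.
missing_env_refs() {
  grep -oE '"env\.[A-Za-z_][A-Za-z0-9_]*"' "$1" |
    tr -d '"' | sed 's/^env\.//' | sort -u |
    while read -r name; do
      # The regex above only lets identifier characters through, so eval is safe here.
      eval "value=\${$name:-}"
      [ -n "$value" ] || echo "$name"
    done
}
```

A typical call would be `missing_env_refs ./config.json` (the path is a placeholder); an empty result means every referenced variable has a value.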
|
||||
</Tab>
|
||||
|
||||
<Tab title="Disabled">
|
||||
|
||||
### Disabled (file-only mode)
|
||||
|
||||
Use this when you want Bifrost to read all configuration from `config.json` only — no database, no Web UI.
|
||||
|
||||
```json
|
||||
{
|
||||
"config_store": {
|
||||
"enabled": false
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
This is the recommended setup for [multinode OSS deployments](/deployment-guides/how-to/multinode) where a shared `config.json` is the single source of truth.
|
||||
|
||||
</Tab>
|
||||
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## logs_store
|
||||
|
||||
<Tabs>
|
||||
|
||||
<Tab title="SQLite">
|
||||
|
||||
### SQLite
|
||||
|
||||
```json
|
||||
{
|
||||
"logs_store": {
|
||||
"enabled": true,
|
||||
"type": "sqlite",
|
||||
"config": {
|
||||
"path": "./logs.db"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="PostgreSQL">
|
||||
|
||||
### PostgreSQL
|
||||
|
||||
```json
|
||||
{
|
||||
"logs_store": {
|
||||
"enabled": true,
|
||||
"type": "postgres",
|
||||
"config": {
|
||||
"host": "env.PG_HOST",
|
||||
"port": "5432",
|
||||
"user": "env.PG_USER",
|
||||
"password": "env.PG_PASSWORD",
|
||||
"db_name": "bifrost",
|
||||
"ssl_mode": "require",
|
||||
"max_idle_conns": 10,
|
||||
"max_open_conns": 100
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
For high log volumes, increase `max_open_conns`:
|
||||
|
||||
```json
|
||||
{
|
||||
"logs_store": {
|
||||
"enabled": true,
|
||||
"type": "postgres",
|
||||
"config": {
|
||||
"host": "env.PG_HOST",
|
||||
"port": "5432",
|
||||
"user": "env.PG_USER",
|
||||
"password": "env.PG_PASSWORD",
|
||||
"db_name": "bifrost",
|
||||
"ssl_mode": "require",
|
||||
"max_idle_conns": 10,
|
||||
"max_open_conns": 200
|
||||
},
|
||||
"retention_days": 90
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="Disabled">
|
||||
|
||||
```json
|
||||
{
|
||||
"logs_store": {
|
||||
"enabled": false
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
</Tabs>
|
||||
|
||||
### Log Retention
|
||||
|
||||
Set `retention_days` to automatically purge old log entries. `0` disables retention-based cleanup.
|
||||
|
||||
```json
|
||||
{
|
||||
"logs_store": {
|
||||
"enabled": true,
|
||||
"type": "postgres",
|
||||
"config": { "...": "..." },
|
||||
"retention_days": 90
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Object Storage for Logs
|
||||
|
||||
Offload large request/response payloads from the database to S3 or GCS. The database retains only lightweight index records; payloads are fetched on demand.
|
||||
|
||||
<Tabs>
|
||||
<Tab title="AWS S3">
|
||||
|
||||
```json
|
||||
{
|
||||
"logs_store": {
|
||||
"enabled": true,
|
||||
"type": "postgres",
|
||||
"config": { "...": "..." },
|
||||
"object_storage": {
|
||||
"type": "s3",
|
||||
"bucket": "env.S3_BUCKET",
|
||||
"prefix": "bifrost",
|
||||
"compress": true,
|
||||
"region": "us-east-1",
|
||||
"access_key_id": "env.S3_ACCESS_KEY_ID",
|
||||
"secret_access_key": "env.S3_SECRET_ACCESS_KEY"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**IAM role (instance profile / IRSA)** — omit `access_key_id` and `secret_access_key`:
|
||||
|
||||
```json
|
||||
{
|
||||
"object_storage": {
|
||||
"type": "s3",
|
||||
"bucket": "bifrost-logs",
|
||||
"region": "us-east-1",
|
||||
"compress": true,
|
||||
"role_arn": "arn:aws:iam::123456789012:role/BifrostS3Role"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Description |
|
||||
|-------|-------------|
|
||||
| `bucket` | S3 bucket name (supports `env.` prefix) |
|
||||
| `prefix` | Key prefix for stored objects (default: `"bifrost"`) |
|
||||
| `compress` | Enable gzip compression (default: `false`) |
|
||||
| `region` | AWS region |
|
||||
| `access_key_id` | AWS access key ID (omit for default credential chain) |
|
||||
| `secret_access_key` | AWS secret access key |
|
||||
| `session_token` | STS temporary credentials session token |
|
||||
| `role_arn` | IAM role ARN for STS AssumeRole |
|
||||
| `endpoint` | Custom endpoint for MinIO / Cloudflare R2 |
|
||||
| `force_path_style` | Use path-style URLs (required for MinIO, default: `false`) |
|
||||
|
||||
</Tab>
|
||||
<Tab title="Google Cloud Storage">
|
||||
|
||||
```json
|
||||
{
|
||||
"logs_store": {
|
||||
"enabled": true,
|
||||
"type": "postgres",
|
||||
"config": { "...": "..." },
|
||||
"object_storage": {
|
||||
"type": "gcs",
|
||||
"bucket": "bifrost-logs",
|
||||
"prefix": "bifrost",
|
||||
"compress": true,
|
||||
"project_id": "env.GCP_PROJECT_ID",
|
||||
"credentials_json": "env.GCS_CREDENTIALS_JSON"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Omit `credentials_json` to use Application Default Credentials (Workload Identity, GCE metadata, `gcloud auth`).
|
||||
|
||||
| Field | Description |
|
||||
|-------|-------------|
|
||||
| `project_id` | GCP project ID (supports `env.` prefix) |
|
||||
| `credentials_json` | Service account JSON or path — omit for ADC |
|
||||
|
||||
</Tab>
|
||||
<Tab title="MinIO (Self-Hosted)">
|
||||
|
||||
```json
|
||||
{
|
||||
"object_storage": {
|
||||
"type": "s3",
|
||||
"bucket": "bifrost-logs",
|
||||
"prefix": "bifrost",
|
||||
"compress": false,
|
||||
"region": "us-east-1",
|
||||
"endpoint": "http://minio.internal:9000",
|
||||
"access_key_id": "env.MINIO_ACCESS_KEY",
|
||||
"secret_access_key": "env.MINIO_SECRET_KEY",
|
||||
"force_path_style": true
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## vector_store
|
||||
|
||||
A vector store is required for [semantic caching](/features/semantic-caching). Choose from Weaviate, Redis/Valkey, Qdrant, or Pinecone.
|
||||
|
||||
<Tabs>
|
||||
|
||||
<Tab title="Weaviate">
|
||||
|
||||
```json
|
||||
{
|
||||
"vector_store": {
|
||||
"enabled": true,
|
||||
"type": "weaviate",
|
||||
"config": {
|
||||
"scheme": "http",
|
||||
"host": "localhost:8080",
|
||||
"api_key": "env.WEAVIATE_API_KEY",
|
||||
"grpc_config": {
|
||||
"host": "localhost:50051",
|
||||
"secured": false
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Required | Description |
|
||||
|-------|----------|-------------|
|
||||
| `scheme` | Yes | `"http"` or `"https"` |
|
||||
| `host` | Yes | Weaviate server host and port |
|
||||
| `api_key` | No | Weaviate API key (supports `env.` prefix) |
|
||||
| `grpc_config.host` | No | gRPC host for faster vector operations |
|
||||
| `grpc_config.secured` | No | Use TLS for gRPC connection |
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="Redis / Valkey">
|
||||
|
||||
```json
|
||||
{
|
||||
"vector_store": {
|
||||
"enabled": true,
|
||||
"type": "redis",
|
||||
"config": {
|
||||
"addr": "env.REDIS_ADDR",
|
||||
"password": "env.REDIS_PASSWORD",
|
||||
"db": 0,
|
||||
"use_tls": false
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**AWS MemoryDB (cluster mode):**
|
||||
|
||||
```json
|
||||
{
|
||||
"vector_store": {
|
||||
"enabled": true,
|
||||
"type": "redis",
|
||||
"config": {
|
||||
"addr": "env.MEMORYDB_ENDPOINT",
|
||||
"password": "env.MEMORYDB_PASSWORD",
|
||||
"use_tls": true,
|
||||
"cluster_mode": true
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Default | Description |
|
||||
|-------|---------|-------------|
|
||||
| `addr` | — | Redis/Valkey address `host:port` (supports `env.` prefix) |
|
||||
| `password` | — | Redis AUTH password (supports `env.` prefix) |
|
||||
| `db` | `0` | Redis database number |
|
||||
| `use_tls` | `false` | Enable TLS |
|
||||
| `cluster_mode` | `false` | Enable cluster mode (required for MemoryDB; `db` must be `0`) |
|
||||
| `pool_size` | — | Maximum socket connections |
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="Qdrant">
|
||||
|
||||
```json
|
||||
{
|
||||
"vector_store": {
|
||||
"enabled": true,
|
||||
"type": "qdrant",
|
||||
"config": {
|
||||
"host": "env.QDRANT_HOST",
|
||||
"port": 6334,
|
||||
"api_key": "env.QDRANT_API_KEY",
|
||||
"use_tls": false
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Default | Description |
|
||||
|-------|---------|-------------|
|
||||
| `host` | — | Qdrant server host (supports `env.` prefix) |
|
||||
| `port` | `6334` | gRPC port |
|
||||
| `api_key` | — | API key (supports `env.` prefix) |
|
||||
| `use_tls` | `false` | Enable TLS |
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="Pinecone">
|
||||
|
||||
Pinecone is a managed service, so the index is always external to your Bifrost deployment.
|
||||
|
||||
```json
|
||||
{
|
||||
"vector_store": {
|
||||
"enabled": true,
|
||||
"type": "pinecone",
|
||||
"config": {
|
||||
"api_key": "env.PINECONE_API_KEY",
|
||||
"index_host": "env.PINECONE_INDEX_HOST"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Description |
|
||||
|-------|-------------|
|
||||
| `api_key` | Pinecone API key (supports `env.` prefix) |
|
||||
| `index_host` | Index host from Pinecone console (e.g. `your-index.svc.us-east1-gcp.pinecone.io`) |
|
||||
|
||||
</Tab>
|
||||
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Mixed Backend Example
|
||||
|
||||
Run the config store on PostgreSQL (which keeps the Web UI available) while keeping logs in a local SQLite file (simpler and cheaper for append-heavy workloads):
|
||||
|
||||
```json
|
||||
{
|
||||
"$schema": "https://www.getbifrost.ai/schema",
|
||||
"encryption_key": "env.BIFROST_ENCRYPTION_KEY",
|
||||
|
||||
"config_store": {
|
||||
"enabled": true,
|
||||
"type": "postgres",
|
||||
"config": {
|
||||
"host": "env.PG_HOST",
|
||||
"port": "5432",
|
||||
"user": "env.PG_USER",
|
||||
"password": "env.PG_PASSWORD",
|
||||
"db_name": "bifrost",
|
||||
"ssl_mode": "require"
|
||||
}
|
||||
},
|
||||
|
||||
"logs_store": {
|
||||
"enabled": true,
|
||||
"type": "sqlite",
|
||||
"config": {
|
||||
"path": "./logs.db"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Full Storage Example
|
||||
|
||||
```json
|
||||
{
|
||||
"$schema": "https://www.getbifrost.ai/schema",
|
||||
"encryption_key": "env.BIFROST_ENCRYPTION_KEY",
|
||||
|
||||
"config_store": {
|
||||
"enabled": true,
|
||||
"type": "postgres",
|
||||
"config": {
|
||||
"host": "env.PG_HOST",
|
||||
"port": "5432",
|
||||
"user": "env.PG_USER",
|
||||
"password": "env.PG_PASSWORD",
|
||||
"db_name": "bifrost",
|
||||
"ssl_mode": "require",
|
||||
"max_idle_conns": 5,
|
||||
"max_open_conns": 50
|
||||
}
|
||||
},
|
||||
|
||||
"logs_store": {
|
||||
"enabled": true,
|
||||
"type": "postgres",
|
||||
"config": {
|
||||
"host": "env.PG_HOST",
|
||||
"port": "5432",
|
||||
"user": "env.PG_USER",
|
||||
"password": "env.PG_PASSWORD",
|
||||
"db_name": "bifrost",
|
||||
"ssl_mode": "require",
|
||||
"max_idle_conns": 10,
|
||||
"max_open_conns": 100
|
||||
},
|
||||
"retention_days": 90,
|
||||
"object_storage": {
|
||||
"type": "s3",
|
||||
"bucket": "env.S3_BUCKET",
|
||||
"region": "us-east-1",
|
||||
"compress": true,
|
||||
"access_key_id": "env.S3_ACCESS_KEY_ID",
|
||||
"secret_access_key": "env.S3_SECRET_ACCESS_KEY"
|
||||
}
|
||||
},
|
||||
|
||||
"vector_store": {
|
||||
"enabled": true,
|
||||
"type": "weaviate",
|
||||
"config": {
|
||||
"scheme": "http",
|
||||
"host": "weaviate:8080"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
440
docs/deployment-guides/docker-tuning.mdx
Normal file
@@ -0,0 +1,440 @@
|
||||
---
|
||||
title: "Docker Performance Tuning"
|
||||
description: "Optimize Bifrost container performance with Go runtime tuning, resource limits, and system configuration"
|
||||
icon: "docker"
|
||||
---
|
||||
|
||||
This guide covers performance tuning for Bifrost when running in Docker containers. Proper tuning ensures Bifrost can fully utilize container resources and achieve optimal throughput.
|
||||
|
||||
<Note>
|
||||
These optimizations apply to Docker, Docker Compose, Kubernetes, and any container runtime using cgroups for resource management.
|
||||
</Note>
|
||||
|
||||
## Quick Start
|
||||
|
||||
For most production deployments, add these settings to your container:
|
||||
|
||||
```yaml
services:
  bifrost:
    image: maximhq/bifrost:latest
    environment:
      - GOGC=200
      - GOMEMLIMIT=3600MiB  # 90% of 4GB memory limit
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4G
```
|
||||
|
||||
---
|
||||
|
||||
## Go Runtime Tuning
|
||||
|
||||
### GOMAXPROCS (Automatic)
|
||||
|
||||
Bifrost automatically detects container CPU limits using [automaxprocs](https://github.com/uber-go/automaxprocs). This sets `GOMAXPROCS` to match your container's CPU quota from cgroups (v1 and v2).
|
||||
|
||||
**No configuration needed** — this works automatically. You'll see a log line at startup:
|
||||
|
||||
```
|
||||
maxprocs: Updating GOMAXPROCS=4: determined from CPU quota
|
||||
```
|
||||
|
||||
<Warning>
|
||||
Without automaxprocs, Go would detect all host CPUs (e.g., 64 on an EC2 instance) even when the container is limited to 4 CPUs, causing excessive context switching and degraded performance.
|
||||
</Warning>
|
||||
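To cross-check the value in that log line, the CPU quota can be read directly from the cgroup. The sketch below assumes cgroup v2, where `cpu.max` holds `<quota> <period>` (or `max <period>` when no limit is set), and assumes a fractional quota is rounded down to a whole number of CPUs with a minimum of 1. The helper name `cpus_from_cpu_max` is illustrative and is not part of Bifrost or automaxprocs.

```bash
# cpus_from_cpu_max QUOTA PERIOD
# Converts the two fields of a cgroup v2 cpu.max file into a whole CPU count.
# Assumption: fractional quotas round down, with a floor of 1.
cpus_from_cpu_max() {
  if [ "$1" = "max" ]; then
    echo "unlimited"   # no CPU quota configured for this cgroup
    return 0
  fi
  cpus=$(( $1 / $2 ))
  [ "$cpus" -ge 1 ] || cpus=1
  echo "$cpus"
}

# Inside a container on a cgroup v2 host (left unquoted on purpose so the
# two fields split into two arguments):
#   cpus_from_cpu_max $(cat /sys/fs/cgroup/cpu.max)
```

If the number printed here differs from the `GOMAXPROCS` value in the startup log, the container is probably not running with the CPU limit you expect.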
|
||||
### GOGC (Garbage Collection)
|
||||
|
||||
`GOGC` controls garbage collection frequency. The default is `100`, meaning a collection is triggered once the heap has grown by 100% since the previous collection.
|
||||
|
||||
| Scenario | Recommended GOGC | Trade-off |
|
||||
|----------|------------------|-----------|
|
||||
| Memory constrained | 50-100 | More frequent GC, lower memory |
|
||||
| High throughput, memory available | 200-400 | Less GC overhead, higher memory |
|
||||
| Latency sensitive | 50-100 | More predictable latency |
|
||||
|
||||
```yaml
|
||||
environment:
|
||||
- GOGC=200
|
||||
```
|
||||
|
||||
<Tip>
|
||||
For high-throughput API gateways, `GOGC=200` or `GOGC=400` typically provides the best balance of throughput and memory usage.
|
||||
</Tip>
|
||||
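As a rough illustration of the trade-off, the heap size at which the next collection starts can be approximated as the live heap multiplied by `1 + GOGC/100`. This sketch ignores GC roots and any memory limit, so treat the numbers as estimates; `gc_target_mib` is an illustrative helper, not a Go or Bifrost tool.

```bash
# gc_target_mib LIVE_HEAP_MIB GOGC
# Approximate heap size (MiB) at which the next GC cycle starts.
gc_target_mib() {
  echo $(( $1 * (100 + $2) / 100 ))
}

gc_target_mib 500 100   # 1000
gc_target_mib 500 200   # 1500
gc_target_mib 500 400   # 2500
```

With a 500 MiB live heap, moving from `GOGC=100` to `GOGC=200` lets the heap grow by roughly another 500 MiB between collections, which is the memory cost of the reduced GC work.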
|
||||
### GOMEMLIMIT (Memory Limit)
|
||||
|
||||
`GOMEMLIMIT` sets a soft memory limit for the Go runtime. When approaching this limit, Go becomes more aggressive about garbage collection.
|
||||
|
||||
**Best practice:** Set to ~90% of your container's memory limit to leave headroom for memory the Go runtime does not manage (CGO allocations, memory held by the operating system on behalf of the process, etc.).
|
||||
|
||||
| Container Memory | Recommended GOMEMLIMIT |
|
||||
|------------------|------------------------|
|
||||
| 512 MB | 450MiB |
|
||||
| 1 GB | 900MiB |
|
||||
| 2 GB | 1800MiB |
|
||||
| 4 GB | 3600MiB |
|
||||
| 8 GB | 7200MiB |
|
||||
|
||||
```yaml
|
||||
environment:
|
||||
- GOMEMLIMIT=3600MiB
|
||||
```
|
||||
|
||||
<Note>
|
||||
When both `GOGC` and `GOMEMLIMIT` are set, Go starts a collection as soon as either trigger is reached. For high-throughput workloads, set `GOGC=200` or higher and let `GOMEMLIMIT` be the primary constraint.
|
||||
</Note>
|
||||
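The 90% rule is easy to compute from the container limit. A minimal sketch, assuming the limit is known in MiB; `gomemlimit_for` is an illustrative helper name. It takes 90% of the exact MiB figure, so its result for a 4096 MiB container is slightly higher than the rounded 3600MiB shown in the table.

```bash
# gomemlimit_for CONTAINER_LIMIT_MIB
# Prints a GOMEMLIMIT value equal to 90% of the container memory limit.
gomemlimit_for() {
  echo "$(( $1 * 90 / 100 ))MiB"
}

gomemlimit_for 4096   # 3686MiB
gomemlimit_for 2048   # 1843MiB
```

Rounding the result down further, as the table does, only adds headroom.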
|
||||
---
|
||||
|
||||
## System Limits
|
||||
|
||||
### File Descriptor Limits (ulimits)
|
||||
|
||||
Each HTTP connection requires a file descriptor. The default container limit (often 1024) is too low for high-concurrency workloads.
|
||||
|
||||
```yaml
ulimits:
  nofile:
    soft: 65536
    hard: 65536
```
|
||||
|
||||
| Expected Concurrent Connections | Recommended nofile |
|
||||
|--------------------------------|-------------------|
|
||||
| < 1000 | 4096 |
|
||||
| 1000-5000 | 16384 |
|
||||
| 5000-10000 | 32768 |
|
||||
| > 10000 | 65536+ |
|
||||
|
||||
<Warning>
|
||||
If you see errors like `too many open files` or connections being refused under load, increase your `nofile` limit.
|
||||
</Warning>
|
||||
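The table above can be expressed as a small lookup, which is handy in provisioning scripts. This is a sketch: `recommended_nofile` is an illustrative helper, and because the table's ranges share their boundary values, it assumes a boundary such as 5000 belongs to the lower band.

```bash
# recommended_nofile EXPECTED_CONNECTIONS
# Maps an expected concurrent connection count to the nofile value from the table.
recommended_nofile() {
  if   [ "$1" -lt 1000 ];  then echo 4096
  elif [ "$1" -le 5000 ];  then echo 16384
  elif [ "$1" -le 10000 ]; then echo 32768
  else echo 65536
  fi
}

# Compare the result against the limit the current shell actually has:
#   ulimit -n
```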
|
||||
### Resource Limits
|
||||
|
||||
Set CPU and memory limits to match your expected workload:
|
||||
|
||||
```yaml
deploy:
  resources:
    limits:
      cpus: '4'
      memory: 4G
    reservations:
      cpus: '2'
      memory: 2G
```
|
||||
|
||||
**Sizing guidance:**
|
||||
|
||||
| Expected RPS | Recommended CPUs | Recommended Memory |
|
||||
|--------------|------------------|-------------------|
|
||||
| 100-500 | 1-2 | 512MB-1GB |
|
||||
| 500-2000 | 2-4 | 1-2GB |
|
||||
| 2000-5000 | 4-8 | 2-4GB |
|
||||
| 5000+ | 8+ | 4GB+ |
|
||||
|
||||
---
|
||||
|
||||
## Docker Compose Examples
|
||||
|
||||
### Development
|
||||
|
||||
```yaml
services:
  bifrost:
    image: maximhq/bifrost:latest
    ports:
      - "8080:8080"
    volumes:
      - ./data:/app/data
    environment:
      - LOG_LEVEL=debug
```
|
||||
|
||||
### Production (Single Node)
|
||||
|
||||
```yaml
services:
  bifrost:
    image: maximhq/bifrost:latest
    ports:
      - "8080:8080"
    volumes:
      - bifrost-data:/app/data
    environment:
      - LOG_LEVEL=info
      - LOG_STYLE=json
      - GOGC=200
      - GOMEMLIMIT=3600MiB
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4G
        reservations:
          cpus: '2'
          memory: 2G
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "-O", "/dev/null", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped

volumes:
  bifrost-data:
```
|
||||
|
||||
### Production (Multi-Node with PostgreSQL)
|
||||
|
||||
<Note>
|
||||
If you use PostgreSQL for Bifrost storage, ensure the database is UTF8 encoded. See [PostgreSQL UTF8 Requirement](../quickstart/gateway/setting-up#postgresql-utf8-requirement).
|
||||
</Note>
|
||||
|
||||
```yaml
services:
  bifrost-1:
    image: maximhq/bifrost:latest
    ports:
      - "8081:8080"
    environment:
      - LOG_LEVEL=info
      - GOGC=200
      - GOMEMLIMIT=1800MiB
      - BIFROST_DB_TYPE=postgres
      - BIFROST_DB_DSN=postgres://user:pass@postgres:5432/bifrost?sslmode=disable
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
    depends_on:
      - postgres

  bifrost-2:
    image: maximhq/bifrost:latest
    ports:
      - "8082:8080"
    environment:
      - LOG_LEVEL=info
      - GOGC=200
      - GOMEMLIMIT=1800MiB
      - BIFROST_DB_TYPE=postgres
      - BIFROST_DB_DSN=postgres://user:pass@postgres:5432/bifrost?sslmode=disable
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
    depends_on:
      - postgres

  postgres:
    image: postgres:16-alpine
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=bifrost
    volumes:
      - postgres-data:/var/lib/postgresql/data

volumes:
  postgres-data:
```
|
||||
|
||||
---
|
||||
|
||||
## Kubernetes Configuration
|
||||
|
||||
### Basic Deployment
|
||||
|
||||
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bifrost
spec:
  replicas: 3
  selector:
    matchLabels:
      app: bifrost
  template:
    metadata:
      labels:
        app: bifrost
    spec:
      containers:
        - name: bifrost
          image: maximhq/bifrost:latest
          ports:
            - containerPort: 8080
          env:
            - name: GOGC
              value: "200"
            - name: GOMEMLIMIT
              value: "3600MiB"
          resources:
            limits:
              cpu: "4"
              memory: "4Gi"
            requests:
              cpu: "2"
              memory: "2Gi"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
```
|
||||
|
||||
### File Descriptor Limits in Kubernetes
|
||||
|
||||
File descriptor limits in Kubernetes are typically set at the node level. Options include:
|
||||
|
||||
1. **Node-level configuration** (recommended): Set `fs.file-max` and ulimits in your node configuration
|
||||
2. **Init container**: Use an init container with elevated privileges to set limits
|
||||
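3. **Security context**: Some clusters allow adding the `SYS_RESOURCE` capability through the container's security context, which lets the process raise its own limits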
|
||||
|
||||
```yaml
securityContext:
  capabilities:
    add: ["SYS_RESOURCE"]
```
|
||||
|
||||
<Note>
|
||||
Check your current limits inside a container with: `cat /proc/sys/fs/file-max` and `ulimit -n`
|
||||
</Note>
|
||||
|
||||
---
|
||||
|
||||
## Bifrost Application Settings
|
||||
|
||||
Align Bifrost's internal settings with your container resources:
|
||||
|
||||
### Concurrency and Buffer Size
|
||||
|
||||
Configure per provider in `config.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"providers": {
|
||||
"openai": {
|
||||
"concurrency_and_buffer_size": {
|
||||
"concurrency": 1000,
|
||||
"buffer_size": 1500
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Formula:**
|
||||
- `concurrency` = expected RPS per provider
|
||||
- `buffer_size` = 1.5 × concurrency
|
||||
|
||||
### Initial Pool Size
|
||||
|
||||
Configure globally in `config.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"client": {
|
||||
"initial_pool_size": 3000
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Formula:** `initial_pool_size` = 1.5 × total expected RPS across all providers
|
||||
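Both sizing formulas in this section are simple enough to script. A minimal sketch using integer arithmetic; the helper names `buffer_size_for` and `initial_pool_size_for` are illustrative and not part of Bifrost.

```bash
# buffer_size_for CONCURRENCY
# Per-provider buffer size: 1.5 x concurrency.
buffer_size_for() {
  echo $(( $1 * 3 / 2 ))
}

# initial_pool_size_for RPS [RPS...]
# Global pool size: 1.5 x the sum of the expected RPS of every provider.
initial_pool_size_for() {
  total=0
  for rps in "$@"; do
    total=$(( total + rps ))
  done
  echo $(( total * 3 / 2 ))
}

buffer_size_for 1000              # 1500
initial_pool_size_for 1000 1000   # 3000
```

The two calls reproduce the values used in the examples above: a provider with `concurrency` 1000 gets a `buffer_size` of 1500, and two such providers together give an `initial_pool_size` of 3000.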
|
||||
<Tip>
|
||||
See the [Performance Tuning](/providers/performance) guide for detailed sizing recommendations.
|
||||
</Tip>
|
||||
|
||||
---
|
||||
|
||||
## Tuning Checklist
|
||||
|
||||
<Steps>
|
||||
<Step title="Set container resource limits">
|
||||
Define CPU and memory limits based on expected workload. Start with 2 CPUs / 2GB for moderate loads.
|
||||
</Step>
|
||||
<Step title="Configure GOMEMLIMIT">
|
||||
Set to 90% of container memory limit (e.g., `1800MiB` for 2GB container).
|
||||
</Step>
|
||||
<Step title="Tune GOGC">
|
||||
Start with `GOGC=200` for throughput; reduce to 100 if memory pressure is high.
|
||||
</Step>
|
||||
<Step title="Set file descriptor limits">
|
||||
Set `nofile` ulimit to at least 2× your expected concurrent connections.
|
||||
</Step>
|
||||
<Step title="Align Bifrost settings">
|
||||
Match `concurrency` and `buffer_size` to your container's CPU count and expected RPS.
|
||||
</Step>
|
||||
<Step title="Monitor and adjust">
|
||||
Watch memory usage, GC pause times, and request latencies. Adjust settings based on observed behavior.
|
||||
</Step>
|
||||
</Steps>
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### High Memory Usage
|
||||
|
||||
- Reduce `GOGC` (e.g., from 200 to 100)
|
||||
- Ensure `GOMEMLIMIT` is set
|
||||
- Reduce `buffer_size` and `initial_pool_size`
|
||||
|
||||
### High Latency Spikes
|
||||
|
||||
- May indicate GC pauses; try reducing `GOGC`
|
||||
- Check if container is hitting CPU limits
|
||||
- Verify `GOMAXPROCS` matches container CPU quota (check startup logs)
|
||||
|
||||
### Connection Errors Under Load
|
||||
|
||||
- Increase `nofile` ulimit
|
||||
- Ensure `buffer_size` is large enough for traffic spikes
|
||||
- Check provider rate limits
|
||||
|
||||
### Container OOM Killed
|
||||
|
||||
- Reduce `GOMEMLIMIT` to 85% of container memory
|
||||
- Reduce `GOGC` to trigger more frequent GC
|
||||
- Reduce `buffer_size` and `initial_pool_size`
|
||||
|
||||
---
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- **[Performance Tuning](/providers/performance)** - Bifrost-specific performance configuration
|
||||
- **[Helm Deployment](/deployment-guides/helm)** - Kubernetes deployment with Helm
|
||||
- **[Multi-Node Setup](/deployment-guides/how-to/multinode)** - Scaling across multiple instances
|
||||
1546
docs/deployment-guides/ecs.mdx
Normal file
File diff suppressed because it is too large
378
docs/deployment-guides/enterprise/aws.mdx
Normal file
@@ -0,0 +1,378 @@
|
||||
---
|
||||
title: "AWS Deployment"
|
||||
description: "Deploy Bifrost Enterprise on AWS using ECR with IRSA or IAM Task Roles"
|
||||
icon: "aws"
|
||||
---
|
||||
|
||||
Bifrost Enterprise images for AWS customers are distributed through AWS ECR, enabling native IAM integration for secure, credential-less authentication.
|
||||
|
||||
## Architecture
|
||||
|
||||
```mermaid
flowchart LR
    subgraph AWS[AWS Account]
        subgraph EKS[EKS Cluster]
            Pod[Bifrost Pod]
            KSA[K8s ServiceAccount]
        end
        IAMRole[IAM Role]
        ECR[AWS ECR<br/>Bifrost Images]
    end

    KSA -->|Annotated with| IAMRole
    Pod -->|Assumes| IAMRole
    IAMRole -->|Pull Permission| ECR
    ECR -->|Image| Pod
```
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- EKS cluster (v1.23+) or ECS cluster
|
||||
- AWS CLI configured with appropriate permissions
|
||||
- `kubectl` configured for your EKS cluster
|
||||
- Your AWS account ID allowlisted by the Bifrost team
|
||||
|
||||
<Note>
|
||||
Contact the Bifrost team to get your AWS account ID and IAM role ARN allowlisted for ECR access.
|
||||
</Note>
|
||||
|
||||
## IRSA (Recommended)
|
||||
|
||||
IAM Roles for Service Accounts (IRSA) provides the most secure authentication method for EKS deployments.
|
||||
|
||||
### Step 1: Create IAM Policy
|
||||
|
||||
Create an IAM policy that grants ECR pull access to the Bifrost repository.
|
||||
|
||||
```json
|
||||
{
|
||||
"Version": "2012-10-17",
|
||||
"Statement": [
|
||||
{
|
||||
"Sid": "ECRAuth",
|
||||
"Effect": "Allow",
|
||||
"Action": [
|
||||
"ecr:GetAuthorizationToken"
|
||||
],
|
||||
"Resource": "*"
|
||||
},
|
||||
{
|
||||
"Sid": "ECRPullFromBifrost",
|
||||
"Effect": "Allow",
|
||||
"Action": [
|
||||
"ecr:BatchGetImage",
|
||||
"ecr:GetDownloadUrlForLayer",
|
||||
"ecr:BatchCheckLayerAvailability"
|
||||
],
|
||||
"Resource": "arn:aws:ecr:us-east-1:BIFROST_ACCOUNT_ID:repository/YOUR_HUB_SLUG"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
<Warning>
|
||||
Replace `BIFROST_ACCOUNT_ID` and `YOUR_HUB_SLUG` with the values provided by the Bifrost team.
|
||||
</Warning>
|
||||
|
||||
Save this policy as `bifrost-ecr-pull-policy.json` and create it:
|
||||
|
||||
```bash
|
||||
aws iam create-policy \
|
||||
--policy-name BifrostECRPullPolicy \
|
||||
--policy-document file://bifrost-ecr-pull-policy.json
|
||||
```
|
||||
|
||||
### Step 2: Create IAM Role with OIDC Trust
|
||||
|
||||
Create an IAM role that can be assumed by your Kubernetes ServiceAccount.
|
||||
|
||||
First, get your OIDC provider URL:
|
||||
|
||||
```bash
|
||||
aws eks describe-cluster \
|
||||
--name YOUR_CLUSTER_NAME \
|
||||
--query "cluster.identity.oidc.issuer" \
|
||||
--output text
|
||||
```
|
||||
|
||||
Create the trust policy (`trust-policy.json`):
|
||||
|
||||
```json
|
||||
{
|
||||
"Version": "2012-10-17",
|
||||
"Statement": [
|
||||
{
|
||||
"Effect": "Allow",
|
||||
"Principal": {
|
||||
"Federated": "arn:aws:iam::YOUR_ACCOUNT_ID:oidc-provider/oidc.eks.REGION.amazonaws.com/id/OIDC_ID"
|
||||
},
|
||||
"Action": "sts:AssumeRoleWithWebIdentity",
|
||||
"Condition": {
|
||||
"StringEquals": {
|
||||
"oidc.eks.REGION.amazonaws.com/id/OIDC_ID:aud": "sts.amazonaws.com",
|
||||
"oidc.eks.REGION.amazonaws.com/id/OIDC_ID:sub": "system:serviceaccount:NAMESPACE:bifrost-sa"
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
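The `oidc.eks.REGION.amazonaws.com/id/OIDC_ID` portion of the trust policy is the issuer URL returned by `describe-cluster` with the `https://` scheme removed. A small sketch of that substitution; `oidc_provider_from_issuer` is an illustrative helper, and the issuer value in the example is a made-up placeholder.

```bash
# oidc_provider_from_issuer ISSUER_URL
# Strips the leading https:// so the result can be pasted into the trust policy.
oidc_provider_from_issuer() {
  echo "${1#https://}"
}

oidc_provider_from_issuer "https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE1234567890"
# oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE1234567890
```

The same string is used three times in the policy: once in the `Federated` principal ARN and once as the prefix of each condition key.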
|
||||
Create the role and attach the policy:
|
||||
|
||||
```bash
|
||||
# Create the role
|
||||
aws iam create-role \
|
||||
--role-name BifrostECRPullRole \
|
||||
--assume-role-policy-document file://trust-policy.json
|
||||
|
||||
# Attach the policy
|
||||
aws iam attach-role-policy \
|
||||
--role-name BifrostECRPullRole \
|
||||
--policy-arn arn:aws:iam::YOUR_ACCOUNT_ID:policy/BifrostECRPullPolicy
|
||||
```
|
||||
|
||||
### Step 3: Provide Role ARN to Bifrost
|
||||
|
||||
Send your IAM role ARN to the Bifrost team for allowlisting:
|
||||
|
||||
```
|
||||
arn:aws:iam::YOUR_ACCOUNT_ID:role/BifrostECRPullRole
|
||||
```
|
||||
|
||||
### Step 4: Create Namespace and ServiceAccount
|
||||
|
||||
```bash
|
||||
kubectl create namespace bifrost
|
||||
```
|
||||
|
||||
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: bifrost-sa
  namespace: bifrost
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::YOUR_ACCOUNT_ID:role/BifrostECRPullRole
```
|
||||
|
||||
### Step 5: Deploy Bifrost
|
||||
|
||||
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bifrost
  namespace: bifrost
spec:
  replicas: 2
  selector:
    matchLabels:
      app: bifrost
  template:
    metadata:
      labels:
        app: bifrost
    spec:
      serviceAccountName: bifrost-sa
      containers:
        - name: bifrost
          image: BIFROST_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/YOUR_HUB_SLUG:latest
          ports:
            - containerPort: 8080
              name: http
          resources:
            requests:
              cpu: "250m"
              memory: "512Mi"
            limits:
              cpu: "1000m"
              memory: "2Gi"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          volumeMounts:
            - name: config
              mountPath: /app/data/config.json
              subPath: config.json
      volumes:
        - name: config
          secret:
            secretName: bifrost-config
---
apiVersion: v1
kind: Service
metadata:
  name: bifrost
  namespace: bifrost
spec:
  selector:
    app: bifrost
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
  type: ClusterIP
```
|
||||
|
||||
## ECS Task Roles
|
||||
|
||||
For ECS deployments, authenticate to ECR with an IAM role attached to the task definition. Image pulls use the task execution role.
|
||||
|
||||
### Step 1: Create Task Execution Role
|
||||
|
||||
The task execution role allows ECS to pull images from ECR.
|
||||
|
||||
```json
|
||||
{
|
||||
"Version": "2012-10-17",
|
||||
"Statement": [
|
||||
{
|
||||
"Effect": "Allow",
|
||||
"Action": [
|
||||
"ecr:GetAuthorizationToken"
|
||||
],
|
||||
"Resource": "*"
|
||||
},
|
||||
{
|
||||
"Effect": "Allow",
|
||||
"Action": [
|
||||
"ecr:BatchCheckLayerAvailability",
|
||||
"ecr:GetDownloadUrlForLayer",
|
||||
"ecr:BatchGetImage"
|
||||
],
|
||||
"Resource": "arn:aws:ecr:us-east-1:BIFROST_ACCOUNT_ID:repository/YOUR_HUB_SLUG"
|
||||
},
|
||||
{
|
||||
"Effect": "Allow",
|
||||
"Action": [
|
||||
"logs:CreateLogStream",
|
||||
"logs:PutLogEvents"
|
||||
],
|
||||
"Resource": "*"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### Step 2: Create ECS Task Definition
|
||||
|
||||
```json
|
||||
{
|
||||
"family": "bifrost",
|
||||
"networkMode": "awsvpc",
|
||||
"requiresCompatibilities": ["FARGATE"],
|
||||
"cpu": "512",
|
||||
"memory": "1024",
|
||||
"executionRoleArn": "arn:aws:iam::YOUR_ACCOUNT_ID:role/BifrostECSExecutionRole",
|
||||
"containerDefinitions": [
|
||||
{
|
||||
"name": "bifrost",
|
||||
"image": "BIFROST_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/YOUR_HUB_SLUG:latest",
|
||||
"portMappings": [
|
||||
{
|
||||
"containerPort": 8080,
|
||||
"protocol": "tcp"
|
||||
}
|
||||
],
|
||||
"healthCheck": {
|
||||
"command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
|
||||
"interval": 30,
|
||||
"timeout": 5,
|
||||
"retries": 3,
|
||||
"startPeriod": 60
|
||||
},
|
||||
"logConfiguration": {
|
||||
"logDriver": "awslogs",
|
||||
"options": {
|
||||
"awslogs-group": "/ecs/bifrost",
|
||||
"awslogs-region": "us-east-1",
|
||||
"awslogs-stream-prefix": "bifrost"
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### Step 3: Create ECS Service
|
||||
|
||||
```bash
|
||||
aws ecs create-service \
|
||||
--cluster your-cluster \
|
||||
--service-name bifrost \
|
||||
--task-definition bifrost \
|
||||
--desired-count 2 \
|
||||
--launch-type FARGATE \
|
||||
--network-configuration "awsvpcConfiguration={subnets=[subnet-xxx],securityGroups=[sg-xxx],assignPublicIp=ENABLED}"
|
||||
```
|
||||
|
||||
## Verifying Access
|
||||
|
||||
### Test ECR Authentication
|
||||
|
||||
```bash
|
||||
# Get ECR login token
|
||||
aws ecr get-login-password --region us-east-1 | \
|
||||
docker login --username AWS --password-stdin \
|
||||
BIFROST_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com
|
||||
|
||||
# Pull test
|
||||
docker pull BIFROST_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/YOUR_HUB_SLUG:latest
|
||||
```
|
||||
|
||||
### Verify IRSA Configuration
|
||||
|
||||
```bash
|
||||
# Check ServiceAccount annotation
|
||||
kubectl get sa bifrost-sa -n bifrost -o yaml
|
||||
|
||||
# Verify pod can assume role (requires the AWS CLI inside the container image)
|
||||
kubectl exec -it deployment/bifrost -n bifrost -- \
|
||||
aws sts get-caller-identity
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### ImagePullBackOff Errors
|
||||
|
||||
1. **Check IAM Role trust policy**: Ensure the OIDC provider and ServiceAccount match
|
||||
2. **Verify ECR permissions**: Confirm the role has `ecr:BatchGetImage` permission
|
||||
3. **Check allowlisting**: Ensure your role ARN has been allowlisted by the Bifrost team
|
||||
|
||||
```bash
|
||||
# Check pod events
|
||||
kubectl describe pod -l app=bifrost -n bifrost
|
||||
|
||||
# Check IRSA token
|
||||
kubectl exec -it deployment/bifrost -n bifrost -- \
|
||||
cat /var/run/secrets/eks.amazonaws.com/serviceaccount/token
|
||||
```
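
The token printed above is a JWT, so its claims can be inspected before comparing them with the IAM role's trust policy. The following is a minimal sketch, assuming GNU `base64`, `cut`, and `tr` are available on your workstation; `decode_jwt_payload` is a hypothetical helper name, not part of Bifrost or the AWS CLI. It only decodes the payload and does not verify the signature.

```bash
# Hypothetical helper: print the JSON payload (claims) of a JWT.
# It decodes only; it does not validate the token's signature.
decode_jwt_payload() {
  local payload
  # The payload is the second dot-separated segment, base64url-encoded.
  payload=$(printf '%s' "$1" | cut -d '.' -f 2 | tr '_-' '/+')
  # base64url drops '=' padding; restore it so base64 -d accepts the input.
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  printf '%s' "$payload" | base64 -d
}

# Example wiring (not run here; needs cluster access):
#   TOKEN=$(kubectl exec deployment/bifrost -n bifrost -- \
#     cat /var/run/secrets/eks.amazonaws.com/serviceaccount/token)
#   decode_jwt_payload "$TOKEN"
```

In the decoded output, the `sub` claim should read `system:serviceaccount:bifrost:bifrost-sa` for the manifests in this guide, and it has to match the subject condition in the role's trust policy.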
|
||||
|
||||
### Authentication Errors
|
||||
|
||||
```bash
|
||||
# Verify OIDC provider is configured
|
||||
aws iam list-open-id-connect-providers
|
||||
|
||||
# Check role assumption
|
||||
aws sts assume-role-with-web-identity \
|
||||
--role-arn arn:aws:iam::YOUR_ACCOUNT_ID:role/BifrostECRPullRole \
|
||||
--role-session-name test \
|
||||
--web-identity-token file:///path/to/token
|
||||
```
|
||||
|
||||
## Next Steps
|
||||
|
||||
- Configure [Bifrost settings](/quickstart/gateway/setting-up) for your use case
|
||||
- Set up [observability](/features/observability/default) for monitoring
|
||||
- Enable [clustering](/enterprise/clustering) for high availability
|
||||
451
docs/deployment-guides/enterprise/azure.mdx
Normal file
@@ -0,0 +1,451 @@
|
||||
---
|
||||
title: "Azure Deployment"
|
||||
description: "Deploy Bifrost Enterprise on Azure AKS using Workload Identity Federation to GCP Artifact Registry"
|
||||
icon: "microsoft"
|
||||
---
|
||||
|
||||
Bifrost Enterprise images for Azure customers are distributed through GCP Artifact Registry, using Azure Workload Identity Federation for secure, credential-less authentication.
|
||||
|
||||
## Architecture
|
||||
|
||||
```mermaid
|
||||
flowchart LR
|
||||
subgraph Azure[Azure Subscription]
|
||||
subgraph AKS[AKS Cluster]
|
||||
Pod[Bifrost Pod]
|
||||
KSA[K8s ServiceAccount]
|
||||
end
|
||||
MI[Managed Identity]
|
||||
end
|
||||
|
||||
subgraph GCP[GCP Project]
|
||||
WIF[Workload Identity<br/>Federation Pool]
|
||||
GSA[GCP Service Account]
|
||||
AR[Artifact Registry<br/>Bifrost Images]
|
||||
end
|
||||
|
||||
KSA -->|Federated| MI
|
||||
MI -->|OIDC Token| WIF
|
||||
WIF -->|Exchange| GSA
|
||||
GSA -->|Pull Permission| AR
|
||||
AR -->|Image| Pod
|
||||
```
|
||||
|
||||
## How It Works
|
||||
|
||||
Azure Workload Identity Federation allows Azure Managed Identities to authenticate to GCP without storing long-lived credentials. Short-lived tokens are exchanged instead:
|
||||
|
||||
1. **AKS Pod** requests a token using its Kubernetes ServiceAccount
|
||||
2. **Azure AD** issues an OIDC token for the Managed Identity
|
||||
3. **GCP Workload Identity Federation** validates the Azure token
|
||||
4. **GCP STS** exchanges it for a GCP access token
|
||||
5. **Pod** uses the GCP token to pull images from Artifact Registry
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- AKS cluster (v1.24+) with Workload Identity enabled
|
||||
- Azure CLI configured with appropriate permissions
|
||||
- `kubectl` configured for your AKS cluster
|
||||
- Your Azure Tenant ID and Managed Identity Client ID shared with the Bifrost team
|
||||
|
||||
<Note>
|
||||
Contact the Bifrost team with your Azure Tenant ID and Managed Identity Client IDs to get access configured.
|
||||
</Note>
|
||||
|
||||
## Step 1: Enable Workload Identity on AKS
|
||||
|
||||
If not already enabled, enable Workload Identity on your AKS cluster:
|
||||
|
||||
```bash
|
||||
# For existing cluster
|
||||
az aks update \
|
||||
--resource-group YOUR_RESOURCE_GROUP \
|
||||
--name YOUR_CLUSTER_NAME \
|
||||
--enable-oidc-issuer \
|
||||
--enable-workload-identity
|
||||
|
||||
# Get the OIDC issuer URL
|
||||
az aks show \
|
||||
--resource-group YOUR_RESOURCE_GROUP \
|
||||
--name YOUR_CLUSTER_NAME \
|
||||
--query "oidcIssuerProfile.issuerUrl" -o tsv
|
||||
```
|
||||
|
||||
## Step 2: Create Azure Managed Identity
|
||||
|
||||
```bash
|
||||
# Create Managed Identity
|
||||
az identity create \
|
||||
--name bifrost-pull-identity \
|
||||
--resource-group YOUR_RESOURCE_GROUP \
|
||||
--location YOUR_LOCATION
|
||||
|
||||
# Get the Client ID
|
||||
CLIENT_ID=$(az identity show \
|
||||
--name bifrost-pull-identity \
|
||||
--resource-group YOUR_RESOURCE_GROUP \
|
||||
--query clientId -o tsv)
|
||||
|
||||
echo "Client ID: $CLIENT_ID"
|
||||
```
|
||||
|
||||
## Step 3: Create Federated Credential
|
||||
|
||||
Link the Kubernetes ServiceAccount to the Azure Managed Identity:
|
||||
|
||||
```bash
|
||||
# Get AKS OIDC issuer
|
||||
AKS_OIDC_ISSUER=$(az aks show \
|
||||
--resource-group YOUR_RESOURCE_GROUP \
|
||||
--name YOUR_CLUSTER_NAME \
|
||||
--query "oidcIssuerProfile.issuerUrl" -o tsv)
|
||||
|
||||
# Create federated credential
|
||||
az identity federated-credential create \
|
||||
--name bifrost-federated-credential \
|
||||
--identity-name bifrost-pull-identity \
|
||||
--resource-group YOUR_RESOURCE_GROUP \
|
||||
--issuer "$AKS_OIDC_ISSUER" \
|
||||
--subject "system:serviceaccount:bifrost:bifrost-sa" \
|
||||
--audience "api://AzureADTokenExchange"
|
||||
```
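
The `--subject` value must match the namespace and name of the Kubernetes ServiceAccount exactly, otherwise the token exchange is rejected. As a small sketch, the expected string can be built and compared against what Azure has stored; `federated_subject` is a hypothetical helper introduced here for illustration.

```bash
# Hypothetical helper: build the subject string Kubernetes uses for a
# ServiceAccount, in the form system:serviceaccount:<namespace>:<name>.
federated_subject() {
  local namespace="$1" service_account="$2"
  printf 'system:serviceaccount:%s:%s\n' "$namespace" "$service_account"
}

# Example wiring (not run here; needs Azure CLI access):
#   actual=$(az identity federated-credential show \
#     --name bifrost-federated-credential \
#     --identity-name bifrost-pull-identity \
#     --resource-group YOUR_RESOURCE_GROUP \
#     --query subject -o tsv)
#   [ "$actual" = "$(federated_subject bifrost bifrost-sa)" ] && echo "subject matches"
```

If you deploy into a different namespace or rename the ServiceAccount, recreate the federated credential with the new subject.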
|
||||
|
||||
## Step 4: Provide Details to Bifrost Team
|
||||
|
||||
Send the following information to the Bifrost team:
|
||||
|
||||
```bash
|
||||
# Get Tenant ID
|
||||
az account show --query tenantId -o tsv
|
||||
|
||||
# Get Client ID
|
||||
az identity show \
|
||||
--name bifrost-pull-identity \
|
||||
--resource-group YOUR_RESOURCE_GROUP \
|
||||
--query clientId -o tsv
|
||||
```
|
||||
|
||||
The Bifrost team will configure GCP Workload Identity Federation to trust your Azure Managed Identity.
|
||||
|
||||
## Step 5: Store GCP Credential Configuration
|
||||
|
||||
After the Bifrost team configures access, they will provide a credential configuration. Store it as a ConfigMap:
|
||||
|
||||
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: gcp-credential-config
  namespace: bifrost
data:
  credential-config.json: |
    {
      "type": "external_account",
      "audience": "//iam.googleapis.com/projects/BIFROST_PROJECT_NUMBER/locations/global/workloadIdentityPools/YOUR_HUB_SLUG-azure-pool/providers/YOUR_HUB_SLUG-azure-provider",
      "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
      "service_account_impersonation_url": "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/BIFROST_SA@BIFROST_PROJECT.iam.gserviceaccount.com:generateAccessToken",
      "token_url": "https://sts.googleapis.com/v1/token",
      "credential_source": {
        "file": "/var/run/secrets/azure/tokens/azure-identity-token",
        "format": {
          "type": "text"
        }
      }
    }
```
|
||||
|
||||
<Warning>
|
||||
The Bifrost team will provide the exact values for `BIFROST_PROJECT_NUMBER`, `BIFROST_PROJECT`, `YOUR_HUB_SLUG`, and `BIFROST_SA`.
|
||||
</Warning>
|
||||
|
||||
## Step 6: Create Kubernetes ServiceAccount
|
||||
|
||||
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: bifrost-sa
  namespace: bifrost
  annotations:
    azure.workload.identity/client-id: YOUR_MANAGED_IDENTITY_CLIENT_ID
  labels:
    azure.workload.identity/use: "true"
```
|
||||
|
||||
## Step 7: Create Image Pull Secret with Token Refresh
|
||||
|
||||
Create a CronJob to refresh the imagePullSecret using the federated identity:
|
||||
|
||||
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: refresh-ar-secret
  namespace: bifrost
spec:
  schedule: "*/30 * * * *" # Every 30 minutes
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            azure.workload.identity/use: "true"
        spec:
          serviceAccountName: bifrost-sa
          containers:
            - name: token-refresh
              image: google/cloud-sdk:slim
              command: ["/bin/bash", "-c"]
              args:
                - |
                  set -e

                  # Set GCP credential config
                  export GOOGLE_APPLICATION_CREDENTIALS=/etc/gcp/credential-config.json

                  # Get GCP access token via federation.
                  # gcloud's own commands do not pick up GOOGLE_APPLICATION_CREDENTIALS,
                  # so log in with the credential file explicitly first.
                  gcloud auth login --cred-file="$GOOGLE_APPLICATION_CREDENTIALS"
                  TOKEN=$(gcloud auth print-access-token)

                  # Delete existing secret if it exists
                  kubectl delete secret ar-pull-secret --ignore-not-found -n bifrost

                  # Create new imagePullSecret
                  kubectl create secret docker-registry ar-pull-secret \
                    --docker-server=REGION-docker.pkg.dev \
                    --docker-username=oauth2accesstoken \
                    --docker-password="$TOKEN" \
                    -n bifrost

                  echo "Secret refreshed at $(date)"
              volumeMounts:
                - name: gcp-credential-config
                  mountPath: /etc/gcp
                  readOnly: true
                - name: azure-identity-token
                  mountPath: /var/run/secrets/azure/tokens
                  readOnly: true
          volumes:
            - name: gcp-credential-config
              configMap:
                name: gcp-credential-config
            - name: azure-identity-token
              projected:
                sources:
                  - serviceAccountToken:
                      path: azure-identity-token
                      expirationSeconds: 3600
                      audience: api://AzureADTokenExchange
          restartPolicy: OnFailure
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: secret-manager
  namespace: bifrost
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: secret-manager-binding
  namespace: bifrost
subjects:
  - kind: ServiceAccount
    name: bifrost-sa
    namespace: bifrost
roleRef:
  kind: Role
  name: secret-manager
  apiGroup: rbac.authorization.k8s.io
```
|
||||
|
||||
## Step 8: Deploy Bifrost
|
||||
|
||||
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bifrost
  namespace: bifrost
spec:
  replicas: 2
  selector:
    matchLabels:
      app: bifrost
  template:
    metadata:
      labels:
        app: bifrost
        azure.workload.identity/use: "true"
    spec:
      serviceAccountName: bifrost-sa
      imagePullSecrets:
        - name: ar-pull-secret
      containers:
        - name: bifrost
          image: REGION-docker.pkg.dev/BIFROST_PROJECT/YOUR_HUB_SLUG/bifrost:latest
          ports:
            - containerPort: 8080
              name: http
          resources:
            requests:
              cpu: "250m"
              memory: "512Mi"
            limits:
              cpu: "1000m"
              memory: "2Gi"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          volumeMounts:
            - name: config
              mountPath: /app/data/config.json
              subPath: config.json
      volumes:
        - name: config
          secret:
            secretName: bifrost-config
---
apiVersion: v1
kind: Service
metadata:
  name: bifrost
  namespace: bifrost
spec:
  selector:
    app: bifrost
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
  type: ClusterIP
```
|
||||
|
||||
## Bootstrap: Initial Secret Creation
|
||||
|
||||
Before the first deployment, manually trigger the CronJob or create the secret:
|
||||
|
||||
```bash
|
||||
# Create namespace
|
||||
kubectl create namespace bifrost
|
||||
|
||||
# Apply all configurations
|
||||
kubectl apply -f configmap.yaml
|
||||
kubectl apply -f serviceaccount.yaml
|
||||
kubectl apply -f cronjob.yaml
|
||||
|
||||
# Manually trigger the CronJob
|
||||
kubectl create job --from=cronjob/refresh-ar-secret initial-refresh -n bifrost
|
||||
|
||||
# Wait for completion
|
||||
kubectl wait --for=condition=complete job/initial-refresh -n bifrost --timeout=120s
|
||||
|
||||
# Verify secret was created
|
||||
kubectl get secret ar-pull-secret -n bifrost
|
||||
```
|
||||
|
||||
## Verifying Access
|
||||
|
||||
### Check Workload Identity Configuration
|
||||
|
||||
```bash
|
||||
# Verify the AKS OIDC issuer is enabled (required for Workload Identity)
|
||||
az aks show \
|
||||
--resource-group YOUR_RESOURCE_GROUP \
|
||||
--name YOUR_CLUSTER_NAME \
|
||||
--query "oidcIssuerProfile.enabled" -o tsv
|
||||
|
||||
# Check federated credential
|
||||
az identity federated-credential show \
|
||||
--name bifrost-federated-credential \
|
||||
--identity-name bifrost-pull-identity \
|
||||
--resource-group YOUR_RESOURCE_GROUP
|
||||
```
|
||||
|
||||
### Verify Token Exchange
|
||||
|
||||
```bash
|
||||
# Check CronJob ran successfully
|
||||
kubectl get jobs -n bifrost
|
||||
|
||||
# View CronJob logs
|
||||
kubectl logs job/JOB_NAME -n bifrost  # JOB_NAME from 'kubectl get jobs', e.g. initial-refresh or refresh-ar-secret-<timestamp>
|
||||
|
||||
# Verify imagePullSecret exists
|
||||
kubectl get secret ar-pull-secret -n bifrost -o yaml
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### ImagePullBackOff Errors
|
||||
|
||||
1. **Check imagePullSecret exists**: `kubectl get secret ar-pull-secret -n bifrost`
|
||||
2. **Verify CronJob succeeded**: `kubectl get jobs -n bifrost`
|
||||
3. **Check Azure Workload Identity**: Ensure labels are set correctly
|
||||
|
||||
```bash
|
||||
# Check pod events
|
||||
kubectl describe pod -l app=bifrost -n bifrost
|
||||
|
||||
# Check ServiceAccount has correct annotations
|
||||
kubectl get sa bifrost-sa -n bifrost -o yaml
|
||||
```
|
||||
|
||||
### Token Exchange Failures
|
||||
|
||||
```bash
|
||||
# Check CronJob logs for errors
|
||||
kubectl logs job/JOB_NAME -n bifrost  # JOB_NAME from 'kubectl get jobs -n bifrost'
|
||||
|
||||
# Common issues:
|
||||
# - "audience mismatch": Check credential-config.json audience field
|
||||
# - "subject mismatch": Verify federated credential subject matches SA
|
||||
# - "permission denied": Contact Bifrost team to verify WIF configuration
|
||||
```
|
||||
|
||||
### Azure Workload Identity Issues
|
||||
|
||||
```bash
|
||||
# Verify Managed Identity exists
|
||||
az identity show \
|
||||
--name bifrost-pull-identity \
|
||||
--resource-group YOUR_RESOURCE_GROUP
|
||||
|
||||
# Check federated credentials
|
||||
az identity federated-credential list \
|
||||
--identity-name bifrost-pull-identity \
|
||||
--resource-group YOUR_RESOURCE_GROUP
|
||||
|
||||
# Verify pod has identity token mounted
|
||||
kubectl exec -it deployment/bifrost -n bifrost -- \
|
||||
ls -la /var/run/secrets/azure/tokens/
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
| Component | Value |
|
||||
|-----------|-------|
|
||||
| Registry | GCP Artifact Registry |
|
||||
| Authentication | Azure WIF -> GCP WIF -> GCP SA |
|
||||
| Token Lifetime | 60 minutes (auto-refreshed every 30 min) |
|
||||
| Secret Name | `ar-pull-secret` |
|
||||
|
||||
## Next Steps
|
||||
|
||||
- Configure [Bifrost settings](/quickstart/gateway/setting-up) for your use case
|
||||
- Set up [observability](/features/observability/default) for monitoring
|
||||
- Enable [clustering](/enterprise/clustering) for high availability
|
||||
386
docs/deployment-guides/enterprise/gcp.mdx
Normal file
@@ -0,0 +1,386 @@
|
||||
---
|
||||
title: "GCP Deployment"
|
||||
description: "Deploy Bifrost Enterprise on GCP using Artifact Registry with Workload Identity"
|
||||
icon: "google"
|
||||
---
|
||||
|
||||
Bifrost Enterprise images for GCP customers are distributed through GCP Artifact Registry, enabling native Workload Identity for secure, keyless authentication.
|
||||
|
||||
## Architecture
|
||||
|
||||
```mermaid
|
||||
flowchart LR
|
||||
subgraph GCP[GCP Project]
|
||||
subgraph GKE[GKE Cluster]
|
||||
Pod[Bifrost Pod]
|
||||
KSA[K8s ServiceAccount]
|
||||
end
|
||||
GSA[GCP Service Account]
|
||||
AR[Artifact Registry<br/>Bifrost Images]
|
||||
end
|
||||
|
||||
KSA -->|Workload Identity| GSA
|
||||
Pod -->|Impersonates| GSA
|
||||
GSA -->|Pull Permission| AR
|
||||
AR -->|Image| Pod
|
||||
```
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- GKE cluster (v1.24+) with Workload Identity enabled
|
||||
- `gcloud` CLI configured with appropriate permissions
|
||||
- `kubectl` configured for your GKE cluster
|
||||
- Your GCP project allowlisted by the Bifrost team
|
||||
|
||||
<Note>
|
||||
Contact the Bifrost team with your GCP project ID and service account email to get access configured.
|
||||
</Note>
|
||||
|
||||
## Workload Identity (Recommended)
|
||||
|
||||
Workload Identity provides the most secure authentication method for GKE deployments by eliminating the need for service account keys.
|
||||
|
||||
### Step 1: Enable Workload Identity on GKE
|
||||
|
||||
If not already enabled, enable Workload Identity on your cluster:
|
||||
|
||||
```bash
|
||||
# For existing cluster
|
||||
gcloud container clusters update YOUR_CLUSTER_NAME \
|
||||
--region=YOUR_REGION \
|
||||
--workload-pool=YOUR_PROJECT_ID.svc.id.goog
|
||||
|
||||
# Verify Workload Identity is enabled
|
||||
gcloud container clusters describe YOUR_CLUSTER_NAME \
|
||||
--region=YOUR_REGION \
|
||||
--format="value(workloadIdentityConfig.workloadPool)"
|
||||
```
|
||||
|
||||
### Step 2: Create GCP Service Account
|
||||
|
||||
Create a service account that will be used to pull images:
|
||||
|
||||
```bash
|
||||
# Create service account
|
||||
gcloud iam service-accounts create bifrost-pull-sa \
|
||||
--display-name="Bifrost Image Pull SA" \
|
||||
--project=YOUR_PROJECT_ID
|
||||
```
|
||||
|
||||
### Step 3: Request Access from Bifrost Team
|
||||
|
||||
Provide the following to the Bifrost team:
|
||||
- Your GCP project ID
|
||||
- Service account email: `bifrost-pull-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com`
|
||||
|
||||
The Bifrost team will grant the necessary permissions to pull images from the registry.
|
||||
|
||||
### Step 4: Create Namespace and ServiceAccount
|
||||
|
||||
```bash
|
||||
kubectl create namespace bifrost
|
||||
```
|
||||
|
||||
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: bifrost-sa
  namespace: bifrost
  annotations:
    iam.gke.io/gcp-service-account: bifrost-pull-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com
```
|
||||
|
||||
### Step 5: Bind Kubernetes SA to GCP SA
|
||||
|
||||
Allow the Kubernetes ServiceAccount to impersonate the GCP Service Account:
|
||||
|
||||
```bash
|
||||
gcloud iam service-accounts add-iam-policy-binding \
|
||||
bifrost-pull-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com \
|
||||
--role=roles/iam.workloadIdentityUser \
|
||||
--member="serviceAccount:YOUR_PROJECT_ID.svc.id.goog[bifrost/bifrost-sa]"
|
||||
```
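
The `--member` value encodes the workload pool, namespace, and Kubernetes ServiceAccount name, and a mismatch in any of the three leaves the binding ineffective. A minimal sketch of how the string is assembled, using a hypothetical helper named `workload_identity_member`:

```bash
# Hypothetical helper: build the IAM member string for a Kubernetes
# ServiceAccount in a GKE workload pool:
#   serviceAccount:<project>.svc.id.goog[<namespace>/<name>]
workload_identity_member() {
  local project_id="$1" namespace="$2" service_account="$3"
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' \
    "$project_id" "$namespace" "$service_account"
}

# Prints the member used in this guide for a project called "my-project".
workload_identity_member my-project bifrost bifrost-sa
```

Quote the result when passing it to `gcloud`, since the square brackets are glob characters in most shells.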
|
||||
|
||||
### Step 6: Create Image Pull Secret with Token Refresh
|
||||
|
||||
Artifact Registry tokens expire after 60 minutes. Use a CronJob to refresh the imagePullSecret:
|
||||
|
||||
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: refresh-ar-secret
  namespace: bifrost
spec:
  schedule: "*/30 * * * *" # Every 30 minutes
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: bifrost-sa
          containers:
            - name: token-refresh
              image: google/cloud-sdk:slim
              command: ["/bin/bash", "-c"]
              args:
                - |
                  set -e

                  # Get access token using Workload Identity
                  TOKEN=$(gcloud auth print-access-token)

                  # Delete existing secret if it exists
                  kubectl delete secret ar-pull-secret --ignore-not-found -n bifrost

                  # Create new imagePullSecret
                  kubectl create secret docker-registry ar-pull-secret \
                    --docker-server=REGION-docker.pkg.dev \
                    --docker-username=oauth2accesstoken \
                    --docker-password="$TOKEN" \
                    -n bifrost

                  echo "Secret refreshed at $(date)"
          restartPolicy: OnFailure
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: secret-manager
  namespace: bifrost
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: secret-manager-binding
  namespace: bifrost
subjects:
  - kind: ServiceAccount
    name: bifrost-sa
    namespace: bifrost
roleRef:
  kind: Role
  name: secret-manager
  apiGroup: rbac.authorization.k8s.io
```
|
||||
|
||||
<Warning>
|
||||
Replace `REGION` with your Artifact Registry region (e.g., `us-central1`).
|
||||
</Warning>
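
The same `REGION` value appears in both the registry host passed to `--docker-server` and the image reference in the Deployment, so the two must agree. As an illustration, a hypothetical helper `ar_image_ref` can assemble the reference from its parts; the project, repository, and tag values below are placeholders, and the real ones come from the Bifrost team.

```bash
# Hypothetical helper: assemble an Artifact Registry image reference in the
# form <region>-docker.pkg.dev/<project>/<repository>/<image>:<tag>.
ar_image_ref() {
  local region="$1" project="$2" repository="$3" image="$4" tag="${5:-latest}"
  printf '%s-docker.pkg.dev/%s/%s/%s:%s\n' \
    "$region" "$project" "$repository" "$image" "$tag"
}

# Placeholder values; the tag defaults to "latest" when omitted.
ar_image_ref us-central1 BIFROST_PROJECT YOUR_HUB_SLUG bifrost
```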
|
||||
|
||||
### Step 7: Deploy Bifrost
|
||||
|
||||
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bifrost
  namespace: bifrost
spec:
  replicas: 2
  selector:
    matchLabels:
      app: bifrost
  template:
    metadata:
      labels:
        app: bifrost
    spec:
      serviceAccountName: bifrost-sa
      imagePullSecrets:
        - name: ar-pull-secret
      containers:
        - name: bifrost
          image: REGION-docker.pkg.dev/BIFROST_PROJECT/YOUR_HUB_SLUG/bifrost:latest
          ports:
            - containerPort: 8080
              name: http
          resources:
            requests:
              cpu: "250m"
              memory: "512Mi"
            limits:
              cpu: "1000m"
              memory: "2Gi"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          volumeMounts:
            - name: config
              mountPath: /app/data/config.json
              subPath: config.json
      volumes:
        - name: config
          secret:
            secretName: bifrost-config
---
apiVersion: v1
kind: Service
metadata:
  name: bifrost
  namespace: bifrost
spec:
  selector:
    app: bifrost
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
  type: ClusterIP
```
|
||||
|
||||
### Bootstrap: Initial Secret Creation
|
||||
|
||||
Before the first deployment, manually create the initial imagePullSecret:
|
||||
|
||||
```bash
|
||||
# Authenticate gcloud
|
||||
gcloud auth login
|
||||
|
||||
# Create initial secret
|
||||
kubectl create secret docker-registry ar-pull-secret \
|
||||
--docker-server=REGION-docker.pkg.dev \
|
||||
--docker-username=oauth2accesstoken \
|
||||
--docker-password="$(gcloud auth print-access-token)" \
|
||||
-n bifrost
|
||||
```
|
||||
|
||||
## Service Account Impersonation
|
||||
|
||||
Use service account impersonation for cross-project deployments or when you need to use an existing service account.
|
||||
|
||||
### Configure Impersonation
|
||||
|
||||
```bash
|
||||
# Grant impersonation permission
|
||||
gcloud iam service-accounts add-iam-policy-binding \
|
||||
BIFROST_PROVIDED_SA@BIFROST_PROJECT.iam.gserviceaccount.com \
|
||||
--role=roles/iam.serviceAccountTokenCreator \
|
||||
--member="serviceAccount:bifrost-pull-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com"
|
||||
```
|
||||
|
||||
### Token Refresh with Impersonation
|
||||
|
||||
Update the CronJob to use impersonation:
|
||||
|
||||
```yaml
args:
  - |
    set -e

    # Get access token by impersonating the Bifrost SA
    TOKEN=$(gcloud auth print-access-token \
      --impersonate-service-account=BIFROST_PROVIDED_SA@BIFROST_PROJECT.iam.gserviceaccount.com)

    kubectl delete secret ar-pull-secret --ignore-not-found -n bifrost
    kubectl create secret docker-registry ar-pull-secret \
      --docker-server=REGION-docker.pkg.dev \
      --docker-username=oauth2accesstoken \
      --docker-password="$TOKEN" \
      -n bifrost
```
|
||||
|
||||
## Service Account Key (Legacy)
|
||||
|
||||
<Warning>
|
||||
Service account keys are not recommended for production. Use Workload Identity instead.
|
||||
</Warning>
|
||||
|
||||
For environments that cannot use Workload Identity:
|
||||
|
||||
```bash
|
||||
# The Bifrost team provides the service account key.
# Save it as sa-key.json and store it securely.
|
||||
|
||||
# Create imagePullSecret
|
||||
kubectl create secret docker-registry ar-pull-secret \
|
||||
--docker-server=REGION-docker.pkg.dev \
|
||||
--docker-username=_json_key \
|
||||
--docker-password="$(cat sa-key.json)" \
|
||||
-n bifrost
|
||||
```
|
||||
|
||||
## Verifying Access
|
||||
|
||||
### Test Artifact Registry Authentication
|
||||
|
||||
```bash
|
||||
# Configure docker for Artifact Registry
|
||||
gcloud auth configure-docker REGION-docker.pkg.dev
|
||||
|
||||
# Pull test (requires impersonation or direct access)
|
||||
docker pull REGION-docker.pkg.dev/BIFROST_PROJECT/YOUR_HUB_SLUG/bifrost:latest
|
||||
```
|
||||
|
||||
### Verify Workload Identity Configuration
|
||||
|
||||
```bash
|
||||
# Check ServiceAccount annotation
|
||||
kubectl get sa bifrost-sa -n bifrost -o yaml
|
||||
|
||||
# Verify pod can authenticate (requires gcloud inside the container image)
|
||||
kubectl exec -it deployment/bifrost -n bifrost -- \
|
||||
gcloud auth print-access-token
|
||||
|
||||
# Check token refresh CronJob
|
||||
kubectl get cronjob refresh-ar-secret -n bifrost
|
||||
kubectl get jobs -n bifrost
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### ImagePullBackOff Errors
|
||||
|
||||
1. **Check imagePullSecret exists**: `kubectl get secret ar-pull-secret -n bifrost`
|
||||
2. **Verify token is valid**: Check if CronJob ran successfully
|
||||
3. **Check Workload Identity binding**: Ensure GCP SA is bound to K8s SA
|
||||
|
||||
```bash
|
||||
# Check pod events
|
||||
kubectl describe pod -l app=bifrost -n bifrost
|
||||
|
||||
# Manually refresh token
|
||||
kubectl create job --from=cronjob/refresh-ar-secret manual-refresh -n bifrost
|
||||
```
|
||||
|
||||
### Workload Identity Issues
|
||||
|
||||
```bash
|
||||
# Verify Workload Identity pool
|
||||
gcloud container clusters describe YOUR_CLUSTER_NAME \
|
||||
--region=YOUR_REGION \
|
||||
--format="value(workloadIdentityConfig.workloadPool)"
|
||||
|
||||
# Check IAM binding
|
||||
gcloud iam service-accounts get-iam-policy \
|
||||
bifrost-pull-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com
|
||||
```
|
||||
|
||||
### Token Expiration
|
||||
|
||||
If pods fail to pull images after 60 minutes:
|
||||
|
||||
1. Verify CronJob is running: `kubectl get cronjob -n bifrost`
|
||||
2. Check CronJob logs: `kubectl logs job/JOB_NAME -n bifrost` (take `JOB_NAME` from `kubectl get jobs -n bifrost`; scheduled runs are named `refresh-ar-secret-<timestamp>`)
|
||||
3. Manually trigger refresh: `kubectl create job --from=cronjob/refresh-ar-secret manual-refresh -n bifrost`
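
Because the CronJob deletes and recreates `ar-pull-secret` on every run, the secret's creation time is a reasonable proxy for the age of the token it holds. The sketch below checks that age against the 60-minute lifetime; `secret_is_fresh` is a hypothetical helper, and the commented wiring assumes GNU `date` and cluster access.

```bash
# Hypothetical helper: succeed when a secret created at $1 (epoch seconds)
# is younger than $3 seconds (default 3600) at time $2 (epoch seconds).
secret_is_fresh() {
  local created="$1" now="$2" max_age="${3:-3600}"
  [ $(( now - created )) -lt "$max_age" ]
}

# Example wiring (not run here):
#   created=$(date -d "$(kubectl get secret ar-pull-secret -n bifrost \
#     -o jsonpath='{.metadata.creationTimestamp}')" +%s)
#   if secret_is_fresh "$created" "$(date +%s)"; then
#     echo "pull secret is less than 60 minutes old"
#   else
#     echo "pull secret is stale; trigger a manual refresh"
#   fi
```

With the 30-minute schedule used in this guide, a healthy setup should never report a secret much older than 30 minutes, so a stale result usually points to failing CronJob runs.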
|
||||
|
||||
## Next Steps
|
||||
|
||||
- Configure [Bifrost settings](/quickstart/gateway/setting-up) for your use case
|
||||
- Set up [observability](/features/observability/default) for monitoring
|
||||
- Enable [clustering](/enterprise/clustering) for high availability
|
||||
541
docs/deployment-guides/enterprise/on-premise.mdx
Normal file
@@ -0,0 +1,541 @@
|
||||
---
|
||||
title: "On-Premise Deployment"
|
||||
description: "Deploy Bifrost Enterprise in on-premise or air-gapped environments using Docker credentials"
|
||||
icon: "server"
|
||||
---
|
||||
|
||||
Bifrost Enterprise supports on-premise deployments for environments that cannot use cloud-native identity federation. Images are pulled from GCP Artifact Registry using username/password authentication.
|
||||
|
||||
## Architecture
|
||||
|
||||
```mermaid
|
||||
flowchart LR
|
||||
subgraph OnPrem[On-Premise Environment]
|
||||
subgraph K8s[Kubernetes Cluster]
|
||||
Pod[Bifrost Pod]
|
||||
Secret[imagePullSecret]
|
||||
end
|
||||
Docker[Docker Daemon]
|
||||
end
|
||||
|
||||
subgraph GCP[GCP]
|
||||
AR[Artifact Registry<br/>Bifrost Images]
|
||||
end
|
||||
|
||||
Secret -->|Credentials| Pod
|
||||
Pod -->|Pull| AR
|
||||
Docker -->|Pull| AR
|
||||
AR -->|Image| Pod
|
||||
AR -->|Image| Docker
|
||||
```
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Kubernetes cluster (v1.23+) or Docker runtime
|
||||
- Network access to `us-central1-docker.pkg.dev` (or your designated region)
|
||||
- Docker credentials provided by Bifrost team
|
||||
|
||||
<Note>
|
||||
Contact the Bifrost team to receive your Docker username and password credentials.
|
||||
</Note>
|
||||
|
||||
## Credentials
|
||||
|
||||
The Bifrost team will provide you with:
|
||||
|
||||
| Credential | Description |
|
||||
|------------|-------------|
|
||||
| **Username** | `_json_key` (fixed value for GCP Artifact Registry) |
|
||||
| **Password** | Service account JSON key (base64-encoded or raw JSON) |
|
||||
| **Registry** | `REGION-docker.pkg.dev` (e.g., `us-central1-docker.pkg.dev`) |
|
||||
| **Repository** | `REGION-docker.pkg.dev/BIFROST_PROJECT/YOUR_HUB_SLUG` |
|
||||
|
||||
<Warning>
|
||||
Store credentials securely. Never commit them to version control or expose them in logs.
|
||||
</Warning>
|
||||
|
||||
## Docker Deployment
|
||||
|
||||
### Step 1: Login to Registry
|
||||
|
||||
```bash
# Using the JSON key file (preferred: the key is read from stdin)
cat bifrost-credentials.json | docker login -u _json_key --password-stdin https://REGION-docker.pkg.dev

# Or using the password directly (Docker warns that -p is insecure: the key becomes visible in the process list)
docker login -u _json_key -p "$(cat bifrost-credentials.json)" https://REGION-docker.pkg.dev
```
|
||||
|
||||
### Step 2: Pull the Image
|
||||
|
||||
```bash
|
||||
docker pull REGION-docker.pkg.dev/BIFROST_PROJECT/YOUR_HUB_SLUG/bifrost:latest
|
||||
```
|
||||
|
||||
### Step 3: Run Bifrost
|
||||
|
||||
```bash
docker run -d \
  --name bifrost \
  -p 8080:8080 \
  -v /path/to/config.json:/app/data/config.json:ro \
  -v /path/to/data:/app/data \
  REGION-docker.pkg.dev/BIFROST_PROJECT/YOUR_HUB_SLUG/bifrost:latest
```
|
||||
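
After starting the container, it can help to wait until the gateway answers before sending traffic. The sketch below is a generic retry helper, not part of Bifrost. It assumes the `/health` path used by the Kubernetes probes later in this guide, and the attempt count and delay are arbitrary choices.

```bash
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Run the command quietly; success ends the loop immediately.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (not run here): wait up to about 60 seconds for the gateway.
# wait_until 30 2 curl -fsS http://localhost:8080/health && echo "Bifrost is up"
```

The helper returns a non-zero status when every attempt fails, so it can gate later steps in a deployment script.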
|
||||
## Kubernetes Deployment
|
||||
|
||||
### Step 1: Create Namespace
|
||||
|
||||
```bash
|
||||
kubectl create namespace bifrost
|
||||
```
|
||||
|
||||
### Step 2: Create imagePullSecret
|
||||
|
||||
<Tabs>
|
||||
<Tab title="From JSON Key File">
|
||||
```bash
kubectl create secret docker-registry bifrost-pull-secret \
  --docker-server=REGION-docker.pkg.dev \
  --docker-username=_json_key \
  --docker-password="$(cat bifrost-credentials.json)" \
  --namespace=bifrost
```
|
||||
</Tab>
|
||||
<Tab title="From Base64 Key">
|
||||
```bash
# If you received a base64-encoded key
kubectl create secret docker-registry bifrost-pull-secret \
  --docker-server=REGION-docker.pkg.dev \
  --docker-username=_json_key \
  --docker-password="$(echo 'BASE64_ENCODED_KEY' | base64 -d)" \
  --namespace=bifrost
```
|
||||
</Tab>
|
||||
<Tab title="Using YAML">
|
||||
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: bifrost-pull-secret
  namespace: bifrost
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <BASE64_ENCODED_DOCKER_CONFIG>
```
|
||||
|
||||
Generate the base64-encoded config:
|
||||
|
||||
```bash
# Create docker config. Only the "auth" field (base64 of "username:password")
# is written, because the raw JSON key contains double quotes that would
# break the file if pasted unescaped into a "password" field.
AUTH="$(printf '_json_key:%s' "$(tr -d '\n' < bifrost-credentials.json)" | base64 | tr -d '\n')"

cat <<EOF > docker-config.json
{
  "auths": {
    "REGION-docker.pkg.dev": {
      "auth": "${AUTH}"
    }
  }
}
EOF

# Base64 encode for secret
base64 < docker-config.json | tr -d '\n'
```
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
### Step 3: Create Bifrost Configuration
|
||||
|
||||
<Note>
|
||||
If you use PostgreSQL for `config_store` or `logs_store`, ensure the target database is UTF8 encoded. See [PostgreSQL UTF8 Requirement](../../quickstart/gateway/setting-up#postgresql-utf8-requirement).
|
||||
</Note>
|
||||
|
||||
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: bifrost-config
  namespace: bifrost
type: Opaque
stringData:
  config.json: |
    {
      "config_store": {
        "enabled": true,
        "type": "postgres",
        "config": {
          "host": "postgres.bifrost.svc.cluster.local",
          "port": "5432",
          "user": "bifrost",
          "password": "YOUR_PASSWORD",
          "db_name": "bifrost",
          "ssl_mode": "disable"
        }
      },
      "logs_store": {
        "enabled": true,
        "type": "postgres",
        "config": {
          "host": "postgres.bifrost.svc.cluster.local",
          "port": "5432",
          "user": "bifrost",
          "password": "YOUR_PASSWORD",
          "db_name": "bifrost",
          "ssl_mode": "disable"
        }
      }
    }
```
|
||||
|
||||
### Step 4: Deploy Bifrost
|
||||
|
||||
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bifrost
  namespace: bifrost
spec:
  replicas: 2
  selector:
    matchLabels:
      app: bifrost
  template:
    metadata:
      labels:
        app: bifrost
    spec:
      imagePullSecrets:
        - name: bifrost-pull-secret
      containers:
        - name: bifrost
          image: REGION-docker.pkg.dev/BIFROST_PROJECT/YOUR_HUB_SLUG/bifrost:latest
          ports:
            - containerPort: 8080
              name: http
          resources:
            requests:
              cpu: "250m"
              memory: "512Mi"
            limits:
              cpu: "1000m"
              memory: "2Gi"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          volumeMounts:
            - name: config
              mountPath: /app/data/config.json
              subPath: config.json
            - name: data
              mountPath: /app/data
      volumes:
        - name: config
          secret:
            secretName: bifrost-config
        - name: data
          persistentVolumeClaim:
            claimName: bifrost-data
---
apiVersion: v1
kind: Service
metadata:
  name: bifrost
  namespace: bifrost
spec:
  selector:
    app: bifrost
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
  type: ClusterIP
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: bifrost-data
  namespace: bifrost
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```
|
||||
|
||||
### Step 5: Expose Bifrost (Optional)
|
||||
|
||||
<Tabs>
|
||||
<Tab title="Ingress">
|
||||
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: bifrost
  namespace: bifrost
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
spec:
  ingressClassName: nginx
  rules:
    - host: bifrost.your-domain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: bifrost
                port:
                  number: 80
  tls:
    - hosts:
        - bifrost.your-domain.com
      secretName: bifrost-tls
```
|
||||
</Tab>
|
||||
<Tab title="LoadBalancer">
|
||||
```yaml
apiVersion: v1
kind: Service
metadata:
  name: bifrost-lb
  namespace: bifrost
spec:
  selector:
    app: bifrost
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
  type: LoadBalancer
```
|
||||
</Tab>
|
||||
<Tab title="NodePort">
|
||||
```yaml
apiVersion: v1
kind: Service
metadata:
  name: bifrost-nodeport
  namespace: bifrost
spec:
  selector:
    app: bifrost
  ports:
    - port: 80
      targetPort: 8080
      nodePort: 30080
      protocol: TCP
  type: NodePort
```
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
## Docker Compose Deployment
|
||||
|
||||
For simpler deployments without Kubernetes:
|
||||
|
||||
```yaml
version: '3.8'

services:
  bifrost:
    image: REGION-docker.pkg.dev/BIFROST_PROJECT/YOUR_HUB_SLUG/bifrost:latest
    container_name: bifrost
    ports:
      - "8080:8080"
    volumes:
      - ./config.json:/app/data/config.json:ro
      - bifrost-data:/app/data
    environment:
      - BIFROST_LOG_LEVEL=info
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    restart: unless-stopped

  postgres:
    image: postgres:15-alpine
    container_name: bifrost-postgres
    environment:
      - POSTGRES_USER=bifrost
      - POSTGRES_PASSWORD=YOUR_PASSWORD
      - POSTGRES_DB=bifrost
    volumes:
      - postgres-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U bifrost"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

volumes:
  bifrost-data:
  postgres-data:
```
|
||||
|
||||
Log in to the registry before running:
|
||||
|
||||
```bash
|
||||
cat bifrost-credentials.json | docker login -u _json_key --password-stdin https://REGION-docker.pkg.dev
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
## Air-Gapped Environments
|
||||
|
||||
For environments without internet access, you can mirror the image to your internal registry.
|
||||
|
||||
### Step 1: Pull Image (Internet-Connected Machine)
|
||||
|
||||
```bash
|
||||
# Login and pull
|
||||
cat bifrost-credentials.json | docker login -u _json_key --password-stdin https://REGION-docker.pkg.dev
|
||||
docker pull REGION-docker.pkg.dev/BIFROST_PROJECT/YOUR_HUB_SLUG/bifrost:latest
|
||||
|
||||
# Save to tar file
|
||||
docker save REGION-docker.pkg.dev/BIFROST_PROJECT/YOUR_HUB_SLUG/bifrost:latest > bifrost-image.tar
|
||||
```
|
||||
|
||||
### Step 2: Transfer and Load (Air-Gapped Machine)
|
||||
|
||||
```bash
# Load image
docker load < bifrost-image.tar

# Tag for internal registry
docker tag REGION-docker.pkg.dev/BIFROST_PROJECT/YOUR_HUB_SLUG/bifrost:latest \
  internal-registry.company.com/bifrost:latest

# Push to internal registry
docker push internal-registry.company.com/bifrost:latest
```
|
||||
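
When several tags are mirrored, the internal reference can be derived from the source reference instead of typed by hand. The helper below is a hypothetical sketch, not part of Bifrost or Docker: it keeps the final `name:tag` component and prefixes it with the internal registry host. `internal-registry.company.com` is the example host used in this guide.

```bash
# Map a source image reference to the same name:tag under an internal registry.
internal_ref() {
  src="$1"        # e.g. REGION-docker.pkg.dev/BIFROST_PROJECT/YOUR_HUB_SLUG/bifrost:latest
  registry="$2"   # e.g. internal-registry.company.com
  # ${src##*/} strips everything up to the last "/", leaving "bifrost:latest".
  printf '%s/%s\n' "$registry" "${src##*/}"
}

SRC="REGION-docker.pkg.dev/BIFROST_PROJECT/YOUR_HUB_SLUG/bifrost:latest"
DST="$(internal_ref "$SRC" internal-registry.company.com)"
echo "$DST"   # prints internal-registry.company.com/bifrost:latest

# Then, on the air-gapped machine (not run here):
# docker tag "$SRC" "$DST" && docker push "$DST"
```

This flattens the repository path, which matches the tagging shown above. If your internal registry uses nested projects, adjust the prefix accordingly.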
|
||||
### Step 3: Update Kubernetes Manifests
|
||||
|
||||
Update the image reference in your deployment:
|
||||
|
||||
```yaml
containers:
  - name: bifrost
    image: internal-registry.company.com/bifrost:latest
```
|
||||
|
||||
## Credential Rotation
|
||||
|
||||
When the Bifrost team rotates your credentials:
|
||||
|
||||
### Update Docker Login
|
||||
|
||||
```bash
|
||||
cat new-credentials.json | docker login -u _json_key --password-stdin https://REGION-docker.pkg.dev
|
||||
```
|
||||
|
||||
### Update Kubernetes Secret
|
||||
|
||||
```bash
# Delete old secret
kubectl delete secret bifrost-pull-secret -n bifrost

# Create new secret
kubectl create secret docker-registry bifrost-pull-secret \
  --docker-server=REGION-docker.pkg.dev \
  --docker-username=_json_key \
  --docker-password="$(cat new-credentials.json)" \
  --namespace=bifrost

# Restart deployment to pick up new secret
kubectl rollout restart deployment/bifrost -n bifrost
```
|
||||
|
||||
## Verifying Access
|
||||
|
||||
### Test Docker Authentication
|
||||
|
||||
```bash
|
||||
# Verify login
|
||||
docker login -u _json_key -p "$(cat bifrost-credentials.json)" https://REGION-docker.pkg.dev
|
||||
|
||||
# Test pull
|
||||
docker pull REGION-docker.pkg.dev/BIFROST_PROJECT/YOUR_HUB_SLUG/bifrost:latest
|
||||
```
|
||||
|
||||
### Verify Kubernetes Secret
|
||||
|
||||
```bash
# Check secret exists
kubectl get secret bifrost-pull-secret -n bifrost

# Inspect secret content (decodes the base64 data; the output contains the credentials)
kubectl get secret bifrost-pull-secret -n bifrost -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### ImagePullBackOff Errors
|
||||
|
||||
```bash
|
||||
# Check pod events
|
||||
kubectl describe pod -l app=bifrost -n bifrost
|
||||
|
||||
# Common issues:
|
||||
# - "unauthorized": Invalid credentials - check username/password
|
||||
# - "not found": Wrong repository path - verify with Bifrost team
|
||||
# - "connection refused": Network issue - check firewall rules
|
||||
```
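
The mapping from error text to likely cause can be sketched as a small shell helper. This is only an illustration of the list above: the function name is hypothetical, the matching is a plain substring test, and real messages vary by container runtime.

```bash
# Map a fragment of a pull error message to the hint listed above.
pull_error_hint() {
  case "$1" in
    *unauthorized*)         echo "Invalid credentials: check username/password" ;;
    *"not found"*)          echo "Wrong repository path: verify with Bifrost team" ;;
    *"connection refused"*) echo "Network issue: check firewall rules" ;;
    *)                      echo "Unrecognized error: inspect the full pod events" ;;
  esac
}

pull_error_hint "failed to pull image: unauthorized: authentication failed"
```

You might pass it the message text copied from the `kubectl describe pod` events.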
|
||||
|
||||
### Network Connectivity
|
||||
|
||||
```bash
|
||||
# Test DNS resolution
|
||||
nslookup REGION-docker.pkg.dev
|
||||
|
||||
# Test HTTPS connectivity
|
||||
curl -v https://REGION-docker.pkg.dev/v2/
|
||||
|
||||
# Required outbound access:
|
||||
# - REGION-docker.pkg.dev:443
|
||||
# - oauth2.googleapis.com:443 (for token refresh)
|
||||
```
|
||||
|
||||
### Credential Issues
|
||||
|
||||
```bash
# Verify JSON key format
cat bifrost-credentials.json | jq .

# Print the key ID (the file carries no expiry date; ask the Bifrost team whether this key is still active)
cat bifrost-credentials.json | jq '.private_key_id'

# Contact Bifrost team if credentials are invalid
```
|
||||
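
If `jq` is not available, a rough presence check can be done with `grep`. The sketch below assumes the raw (not base64-encoded) key file and looks for field names that Google service account keys normally contain. The function name is hypothetical, and this is not a validity check: a file can pass it and still be rejected by the registry.

```bash
# Report the first expected field that is missing from a service account key file.
check_key_file() {
  file="$1"
  for field in type private_key_id private_key client_email; do
    # Match the quoted field name, so "private_key" does not match "private_key_id".
    if ! grep -q "\"${field}\"" "$file"; then
      echo "missing field: ${field}"
      return 1
    fi
  done
  echo "all expected fields present"
}

# Example (not run here):
# check_key_file bifrost-credentials.json
```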
|
||||
## Security Best Practices
|
||||
|
||||
1. **Store credentials securely**: Use a secrets manager (Vault, AWS Secrets Manager) for credential storage
|
||||
2. **Limit access**: Only grant imagePullSecret access to required namespaces
|
||||
3. **Rotate regularly**: Request credential rotation from the Bifrost team periodically
|
||||
4. **Audit access**: Monitor image pull logs for unauthorized access attempts
|
||||
5. **Network isolation**: Restrict outbound access to only required registry endpoints
|
||||
|
||||
## Next Steps
|
||||
|
||||
- Configure [Bifrost settings](/quickstart/gateway/setting-up) for your use case
|
||||
- Set up [observability](/features/observability/default) for monitoring
|
||||
- Enable [clustering](/enterprise/clustering) for high availability
|
||||
141
docs/deployment-guides/enterprise/overview.mdx
Normal file
@@ -0,0 +1,141 @@
|
||||
---
|
||||
title: "Overview"
|
||||
description: "Deploy Bifrost Enterprise in your cloud environment with secure, private container image distribution"
|
||||
icon: "info-circle"
|
||||
---
|
||||
|
||||
Bifrost Enterprise provides private container image distribution through dedicated registries, enabling secure deployments in AWS, GCP, Azure, and on-premise environments.
|
||||
|
||||
## Architecture
|
||||
|
||||
Bifrost uses a hub-and-spoke model with two container registries, GCP Artifact Registry and AWS ECR. Each customer environment pulls from the registry that suits its platform:
|
||||
|
||||
```mermaid
flowchart TB
    subgraph BifrostInfra[Bifrost Infrastructure]
        CICD[CI/CD Pipeline]
        GCR[GCP Artifact Registry]
        ECR[AWS ECR]
    end

    subgraph Customers[Customer Environments]
        subgraph AWSCustomer[AWS Customers]
            EKS[EKS Cluster]
            ECS[ECS Service]
        end
        subgraph GCPCustomer[GCP Customers]
            GKE[GKE Cluster]
        end
        subgraph AzureCustomer[Azure Customers]
            AKS[AKS Cluster]
        end
        subgraph OnPrem[On-Premise]
            K8S[Kubernetes]
            Docker[Docker]
        end
    end

    CICD -->|Push| GCR
    CICD -->|Push| ECR

    ECR -->|IRSA| EKS
    ECR -->|Task Role| ECS
    GCR -->|Workload Identity| GKE
    GCR -->|Azure WIF| AKS
    GCR -->|Basic Auth| OnPrem
```
|
||||
|
||||
### Registry Distribution
|
||||
|
||||
| Customer Cloud | Registry Source | Why |
|----------------|-----------------|-----|
| AWS | AWS ECR | Native IAM integration, lowest latency within AWS |
| GCP | GCP Artifact Registry | Native Workload Identity, lowest latency within GCP |
| Azure | GCP Artifact Registry | Workload Identity Federation from Azure to GCP |
| On-Premise | GCP Artifact Registry | Basic auth with username/password credentials |
|
||||
|
||||
## Authentication Methods
|
||||
|
||||
Choose the authentication method based on your deployment environment:
|
||||
|
||||
| Environment | Method | Security Level | Setup Complexity |
|-------------|--------|----------------|------------------|
| AWS EKS | [IRSA](/deployment-guides/enterprise/aws#irsa-recommended) | High | Medium |
| AWS ECS | [IAM Task Roles](/deployment-guides/enterprise/aws#ecs-task-roles) | High | Low |
| GCP GKE | [Workload Identity](/deployment-guides/enterprise/gcp#workload-identity-recommended) | High | Low |
| Azure AKS | [Azure WIF](/deployment-guides/enterprise/azure) | High | Medium |
| On-Premise | [Basic Auth](/deployment-guides/enterprise/on-premise) | Medium | Low |
|
||||
|
||||
<Note>
|
||||
Cloud-native identity federation (IRSA, Workload Identity, Azure WIF) is recommended over static credentials for production deployments.
|
||||
</Note>
|
||||
|
||||
## Security Features
|
||||
|
||||
### Encryption
|
||||
- **In-Transit**: All registry communication uses TLS 1.3
|
||||
- **At-Rest**: Images encrypted using cloud-native encryption (AWS KMS, GCP CMEK)
|
||||
|
||||
### Access Control
|
||||
- **IAM-based**: Fine-grained permissions using cloud IAM policies
|
||||
- **Audit Logging**: All image pull operations are logged for compliance
|
||||
- **IP Restrictions**: Optional VPC Service Controls (GCP) or VPC endpoints (AWS)
|
||||
|
||||
### Image Security
|
||||
- **Vulnerability Scanning**: Automatic scanning on push
|
||||
- **Immutable Tags**: Optional tag immutability to prevent overwrites
|
||||
- **Signed Images**: Container image signatures for verification
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before deploying Bifrost Enterprise, ensure you have:
|
||||
|
||||
<Tabs>
|
||||
<Tab title="AWS">
|
||||
- AWS account with ECR access
|
||||
- EKS cluster (v1.23+) or ECS cluster
|
||||
- IAM permissions to create roles and policies
|
||||
- `kubectl` and `aws` CLI configured
|
||||
</Tab>
|
||||
<Tab title="GCP">
|
||||
- GCP project with Artifact Registry API enabled
|
||||
- GKE cluster (v1.24+) with Workload Identity enabled
|
||||
- IAM permissions for service account management
|
||||
- `kubectl` and `gcloud` CLI configured
|
||||
</Tab>
|
||||
<Tab title="Azure">
|
||||
- Azure subscription with AKS
|
||||
- AKS cluster (v1.24+) with Workload Identity enabled
|
||||
- Permissions to create Managed Identities
|
||||
- `kubectl` and `az` CLI configured
|
||||
</Tab>
|
||||
<Tab title="On-Premise">
|
||||
- Kubernetes cluster (v1.23+) or Docker runtime
|
||||
- Network access to `us-central1-docker.pkg.dev`
|
||||
- Docker credentials provided by the Bifrost team
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
## Getting Started
|
||||
|
||||
<CardGroup cols={2}>
|
||||
<Card title="AWS Deployment" icon="aws" href="/deployment-guides/enterprise/aws">
|
||||
Deploy on EKS or ECS with IRSA authentication
|
||||
</Card>
|
||||
<Card title="GCP Deployment" icon="google" href="/deployment-guides/enterprise/gcp">
|
||||
Deploy on GKE with Workload Identity
|
||||
</Card>
|
||||
<Card title="Azure Deployment" icon="microsoft" href="/deployment-guides/enterprise/azure">
|
||||
Deploy on AKS with Azure Workload Identity Federation
|
||||
</Card>
|
||||
<Card title="On-Premise" icon="server" href="/deployment-guides/enterprise/on-premise">
|
||||
Deploy anywhere with Docker credentials
|
||||
</Card>
|
||||
</CardGroup>
|
||||
|
||||
## Support
|
||||
|
||||
For enterprise deployment assistance:
|
||||
- **Email**: [contact@getmaxim.ai](mailto:contact@getmaxim.ai)
|
||||
- **Slack**: Connect via Slack Connect for real-time support
|
||||
- **Documentation**: Platform-specific guides linked above
|
||||
34
docs/deployment-guides/fly.mdx
Normal file
@@ -0,0 +1,34 @@
|
||||
---
|
||||
title: fly.io
|
||||
description: "This guide explains how to deploy Bifrost on fly.io"
|
||||
icon: "fly"
|
||||
---
|
||||
|
||||
Because Bifrost uses multiple sub-modules (`core`, `framework`, etc.) and embeds the front-end into a single binary (via `embed.FS`), deployment uses a custom Docker build step before handing over to flyctl.
|
||||
|
||||
There are two ways to deploy Bifrost on Fly.io:
|
||||
|
||||
1. By cloning the repo
|
||||
2. Using flyctl + Docker Hub image
|
||||
|
||||
## By cloning the repo
|
||||
|
||||
1. Clone https://github.com/maximhq/bifrost
|
||||
2. Ensure [Make](/deployment-guides/how-to/install-make) is installed.
|
||||
3. Run `make deploy-to-fly-io APP_NAME=<your-fly-app-name>`
|
||||
|
||||
|
||||
## Using flyctl + Docker Hub image
|
||||
|
||||
1. Update your `fly.toml` to specify the Bifrost Docker Hub image.
|
||||
|
||||
```toml
[build]
  image = "maximhq/bifrost:latest"
```
|
||||
|
||||
2. Alternatively, specify the Docker Hub image path on the command line:

```bash
fly deploy --app <your-app-name> --image docker.io/maximhq/bifrost:latest
```
|
||||
639
docs/deployment-guides/helm.mdx
Normal file
@@ -0,0 +1,639 @@
|
||||
---
|
||||
title: "Quick Start"
|
||||
description: "Deploy Bifrost on Kubernetes using the official Helm chart — quickstart for OSS and Enterprise"
|
||||
icon: "server"
|
||||
---
|
||||
|
||||
<Note>
|
||||
**Latest Chart Version**: [View on Artifact Hub](https://artifacthub.io/packages/helm/bifrost/bifrost)
|
||||
</Note>
|
||||
|
||||
<Tabs>
|
||||
|
||||
<Tab title="OSS">
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Kubernetes cluster (v1.19+)
|
||||
- `kubectl` configured
|
||||
- Helm 3.2.0+ installed
|
||||
- Persistent Volume provisioner (required for SQLite; optional for Postgres-only)
|
||||
|
||||
<Note>
|
||||
If you use PostgreSQL for Bifrost storage, ensure the database is UTF8 encoded. See [PostgreSQL UTF8 Requirement](../quickstart/gateway/setting-up#postgresql-utf8-requirement).
|
||||
</Note>
|
||||
|
||||
## Step 1 — Add the Helm Repository
|
||||
|
||||
```bash
|
||||
helm repo add bifrost https://maximhq.github.io/bifrost/helm-charts
|
||||
helm repo update
|
||||
```
|
||||
|
||||
## Step 2 — Install
|
||||
|
||||
<Note>
|
||||
The Helm chart ships ready-made values files under `helm-charts/bifrost/values-examples/`.
|
||||
For example: `sqlite-only.yaml`, `production-ha.yaml`, `external-postgres.yaml`, and `secrets-from-k8s.yaml`.
|
||||
See the full list here: https://github.com/maximhq/bifrost/tree/main/helm-charts/bifrost/values-examples
|
||||
</Note>
|
||||
|
||||
<Tabs>
|
||||
<Tab title="Minimal (SQLite)">
|
||||
|
||||
Fastest way to get running. Bifrost deploys as a StatefulSet with a 10Gi PVC for SQLite.
|
||||
|
||||
```bash
kubectl create secret generic bifrost-encryption-key \
  --from-literal=encryption-key="$(openssl rand -base64 32)"

helm install bifrost bifrost/bifrost \
  --set image.tag=v1.4.11 \
  --set bifrost.encryptionKeySecret.name="bifrost-encryption-key" \
  --set bifrost.encryptionKeySecret.key="encryption-key"
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="With a Provider Key">
|
||||
|
||||
Add your first provider key at install time:
|
||||
|
||||
```bash
kubectl create secret generic bifrost-encryption-key \
  --from-literal=encryption-key="$(openssl rand -base64 32)"

kubectl create secret generic provider-keys \
  --from-literal=openai-api-key='sk-your-key'

helm install bifrost bifrost/bifrost \
  --set image.tag=v1.4.11 \
  --set bifrost.encryptionKeySecret.name="bifrost-encryption-key" \
  --set bifrost.encryptionKeySecret.key="encryption-key" \
  --set 'bifrost.providers.openai.keys[0].name=primary' \
  --set 'bifrost.providers.openai.keys[0].value=env.OPENAI_API_KEY' \
  --set 'bifrost.providers.openai.keys[0].weight=1' \
  --set bifrost.providerSecrets.openai.existingSecret="provider-keys" \
  --set bifrost.providerSecrets.openai.key="openai-api-key" \
  --set bifrost.providerSecrets.openai.envVar="OPENAI_API_KEY"
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="Production (PostgreSQL + HA)">
|
||||
|
||||
High-availability setup — 3 replicas, PostgreSQL, autoscaling, ingress.
|
||||
|
||||
```bash
# 1. Create secrets
kubectl create secret generic bifrost-encryption-key \
  --from-literal=encryption-key="$(openssl rand -base64 32)"

kubectl create secret generic postgres-credentials \
  --from-literal=password="$(openssl rand -base64 32)"

kubectl create secret generic provider-keys \
  --from-literal=openai-api-key='sk-...'
```
|
||||
|
||||
```yaml
# production.yaml
image:
  tag: "v1.4.11"

replicaCount: 3

storage:
  mode: postgres

postgresql:
  enabled: true
  auth:
    username: bifrost
    database: bifrost
    existingSecret: "postgres-credentials"
    secretKeys:
      adminPasswordKey: "password"
  primary:
    persistence:
      size: 50Gi
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: 2000m
        memory: 2Gi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: bifrost.yourdomain.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: bifrost-tls
      hosts:
        - bifrost.yourdomain.com

resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: 2000m
    memory: 2Gi

bifrost:
  encryptionKeySecret:
    name: "bifrost-encryption-key"
    key: "encryption-key"

  client:
    initialPoolSize: 500
    dropExcessRequests: true
    enableLogging: true

  providers:
    openai:
      keys:
        - name: "openai-primary"
          value: "env.OPENAI_API_KEY"
          weight: 1

  providerSecrets:
    openai:
      existingSecret: "provider-keys"
      key: "openai-api-key"
      envVar: "OPENAI_API_KEY"

  plugins:
    telemetry:
      enabled: true
      version: 1
    logging:
      enabled: true
      version: 1
    governance:
      enabled: true
      version: 1
```
|
||||
|
||||
```bash
|
||||
# 2. Install
|
||||
helm install bifrost bifrost/bifrost -f production.yaml
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
<Note>
|
||||
`image.tag` is required — the chart will not start without it. Check [Docker Hub](https://hub.docker.com/r/maximhq/bifrost/tags) for available versions.
|
||||
</Note>
|
||||
|
||||
## Step 3 — Verify
|
||||
|
||||
```bash
# Check pods are running
kubectl get pods -l app.kubernetes.io/name=bifrost

# Port forward (keep this running and use a second terminal for the requests below)
kubectl port-forward svc/bifrost 8080:8080

# Hit the health endpoint
curl http://localhost:8080/health

# Check Prometheus metrics
curl http://localhost:8080/metrics
```
|
||||
|
||||
## Step 4 — Configure Providers & Plugins
|
||||
|
||||
```bash
# Make your first inference call
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello from Bifrost!"}]
  }'
```
|
||||
|
||||
Next steps: jump to [Next Steps](#next-steps).
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="Enterprise">
|
||||
|
||||
Enterprise customers receive dedicated container images in a private registry, along with additional features, SLAs, and compliance documentation.
|
||||
|
||||
<Note>
|
||||
[Book a demo](https://calendly.com/maximai/bifrost-demo) to learn more about our enterprise features.
|
||||
</Note>
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Kubernetes cluster (v1.19+)
|
||||
- `kubectl` configured
|
||||
- Helm 3.2.0+ installed
|
||||
- Enterprise registry credentials (provided by Maxim)
|
||||
|
||||
## Step 1 — Add the Helm Repository
|
||||
|
||||
```bash
|
||||
helm repo add bifrost https://maximhq.github.io/bifrost/helm-charts
|
||||
helm repo update
|
||||
```
|
||||
|
||||
## Step 2 — Create Pull Secret
|
||||
|
||||
Create a Kubernetes image pull secret for our private enterprise registry:
|
||||
|
||||
<Tabs>
|
||||
<Tab title="Google Artifact Registry">
|
||||
|
||||
```bash
kubectl create secret docker-registry enterprise-registry-secret \
  --docker-server=us-west1-docker.pkg.dev \
  --docker-username=_json_key \
  --docker-password="$(cat service-account-key.json)" \
  --docker-email=your-email@example.com
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="AWS ECR">
|
||||
|
||||
```bash
# 123456789012 is a placeholder 12-digit AWS account ID
kubectl create secret docker-registry enterprise-registry-secret \
  --docker-server=123456789012.dkr.ecr.us-east-1.amazonaws.com \
  --docker-username=AWS \
  --docker-password="$(aws ecr get-login-password --region us-east-1)"
```
|
||||
|
||||
<Note>
|
||||
ECR tokens expire after 12 hours. Use the [ECR Credential Helper](https://github.com/awslabs/amazon-ecr-credential-helper) or [ECR Registry Creds operator](https://github.com/upmc-enterprises/registry-creds) for automatic refresh.
|
||||
</Note>
|
||||
|
||||
</Tab>
|
||||
<Tab title="Azure ACR">
|
||||
|
||||
```bash
kubectl create secret docker-registry enterprise-registry-secret \
  --docker-server=yourregistry.azurecr.io \
  --docker-username=<service-principal-id> \
  --docker-password=<service-principal-password>
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="Self-Hosted Registry">
|
||||
|
||||
```bash
kubectl create secret docker-registry enterprise-registry-secret \
  --docker-server=registry.yourcompany.com \
  --docker-username=<username> \
  --docker-password=<password>
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
## Step 3 — Create Required Secrets
|
||||
|
||||
```bash
# Encryption key
kubectl create secret generic bifrost-encryption \
  --from-literal=key="$(openssl rand -base64 32)"

# Provider API keys
kubectl create secret generic provider-keys \
  --from-literal=openai-api-key='sk-...' \
  --from-literal=anthropic-api-key='sk-ant-...'

# Admin credentials (for dashboard + governance)
kubectl create secret generic bifrost-admin-credentials \
  --from-literal=username='admin' \
  --from-literal=password='secure-admin-password'
```
|
||||
|
||||
## Step 4 — Install
|
||||
|
||||
```yaml
# enterprise.yaml
image:
  # Registry URL provided by Maxim
  repository: us-west1-docker.pkg.dev/bifrost-enterprise/your-org/bifrost
  tag: "latest"

imagePullSecrets:
  - name: enterprise-registry-secret

replicaCount: 3

resources:
  requests:
    cpu: 1000m
    memory: 2Gi
  limits:
    cpu: 4000m
    memory: 8Gi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 20
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80

storage:
  mode: postgres

postgresql:
  enabled: true
  auth:
    password: "secure-password"  # use existingSecret in production
  primary:
    persistence:
      size: 100Gi
    resources:
      requests:
        cpu: 1000m
        memory: 2Gi
      limits:
        cpu: 4000m
        memory: 8Gi

vectorStore:
  enabled: true
  type: weaviate
  weaviate:
    enabled: true
    persistence:
      size: 100Gi

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
  hosts:
    - host: bifrost.yourcompany.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: bifrost-tls
      hosts:
        - bifrost.yourcompany.com

bifrost:
  encryptionKeySecret:
    name: "bifrost-encryption"
    key: "key"

  client:
    initialPoolSize: 1000
    dropExcessRequests: true
    enableLogging: true
    disableContentLogging: false  # set true for HIPAA/compliance
    logRetentionDays: 365
    enforceGovernanceHeader: true
    allowDirectKeys: false
    maxRequestBodySizeMb: 100
    allowedOrigins:
      - "https://yourcompany.com"
      - "https://*.yourcompany.com"

  providers:
    openai:
      keys:
        - name: "openai-primary"
          value: "env.OPENAI_API_KEY"
          weight: 1
    anthropic:
      keys:
        - name: "anthropic-primary"
          value: "env.ANTHROPIC_API_KEY"
          weight: 1

  providerSecrets:
    openai:
      existingSecret: "provider-keys"
      key: "openai-api-key"
      envVar: "OPENAI_API_KEY"
    anthropic:
      existingSecret: "provider-keys"
      key: "anthropic-api-key"
      envVar: "ANTHROPIC_API_KEY"

  governance:
    authConfig:
      isEnabled: true
      disableAuthOnInference: false
      existingSecret: "bifrost-admin-credentials"
      usernameKey: "username"
      passwordKey: "password"

  plugins:
    telemetry:
      enabled: true
      version: 1
    logging:
      enabled: true
      version: 1
    governance:
      enabled: true
      version: 1
      config:
        is_vk_mandatory: true
    semanticCache:
      enabled: true
      version: 1
      config:
        provider: "openai"
        embedding_model: "text-embedding-3-small"
        dimension: 1536
        threshold: 0.85
        ttl: "1h"

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: bifrost
        topologyKey: kubernetes.io/hostname
```
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f enterprise.yaml
|
||||
```
|
||||
|
||||
To continue, jump to [Next Steps](#next-steps).
|
||||
|
||||
<Note>
|
||||
For DB-backed deployments, built-in plugins (for example `telemetry`, `logging`, `governance`, `semanticCache`, `otel`, `maxim`, `datadog`) support a top-level `version` field. Increase this number when you want the config from Helm to overwrite an older plugin record in the DB.
|
||||
</Note>
|
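The version bump described in the note can also be passed on the command line. The sketch below is a hypothetical shell helper (`plugin_version_flag` is not part of Helm or the chart) that builds the `--set` flag for one plugin. It assumes the value path mirrors the `bifrost.plugins.<name>.version` nesting of the values file above; check the chart's values reference before relying on it.

```bash
# Hypothetical helper: build the --set flag that bumps a plugin's version.
# Assumes plugin values live under bifrost.plugins.<name>.version.
plugin_version_flag() {
  local plugin="$1" version="$2"
  printf '%s\n' "--set bifrost.plugins.${plugin}.version=${version}"
}

# Print the full upgrade command so it can be reviewed before running it.
echo "helm upgrade bifrost bifrost/bifrost --reuse-values $(plugin_version_flag telemetry 2)"
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; paste the output into a shell with cluster access once it looks right.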
||||
|
||||
## Enterprise Support
|
||||
|
||||
Enterprise customers have access to:
|
||||
- Dedicated Slack channel for support
|
||||
- Priority bug fixes and feature requests
|
||||
- Custom feature development
|
||||
- SLA guarantees
|
||||
- Compliance documentation (SOC2, HIPAA, etc.)
|
||||
|
||||
Contact [support@getmaxim.ai](mailto:support@getmaxim.ai) for support.
|
||||
|
||||
</Tab>
|
||||
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Operations
|
||||
|
||||
### Upgrade
|
||||
|
||||
```bash
|
||||
helm repo update
|
||||
|
||||
# Upgrade reusing all existing values
|
||||
helm upgrade bifrost bifrost/bifrost --reuse-values
|
||||
|
||||
# Upgrade with new values
|
||||
helm upgrade bifrost bifrost/bifrost -f your-values.yaml
|
||||
|
||||
# Upgrade and override a single field
|
||||
helm upgrade bifrost bifrost/bifrost \
|
||||
--reuse-values \
|
||||
--set image.tag=v1.4.11
|
||||
```
|
||||
|
||||
### Rollback
|
||||
|
||||
```bash
|
||||
helm history bifrost
|
||||
helm rollback bifrost # to previous revision
|
||||
helm rollback bifrost 2 # to specific revision
|
||||
```
|
||||
|
||||
### Scale
|
||||
|
||||
```bash
|
||||
kubectl scale deployment bifrost --replicas=5
|
||||
|
||||
# Or via Helm
|
||||
helm upgrade bifrost bifrost/bifrost \
|
||||
--reuse-values \
|
||||
--set replicaCount=5
|
||||
```
|
||||
|
||||
### Uninstall
|
||||
|
||||
```bash
|
||||
helm uninstall bifrost
|
||||
|
||||
# Also remove PVCs (permanently deletes all data)
|
||||
kubectl delete pvc -l app.kubernetes.io/instance=bifrost
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Monitoring
|
||||
|
||||
### Prometheus Metrics
|
||||
|
||||
Bifrost exposes Prometheus metrics at `/metrics`.
|
||||
|
||||
Enable ServiceMonitor for automatic scraping:
|
||||
|
||||
```yaml
|
||||
serviceMonitor:
|
||||
enabled: true
|
||||
interval: 30s
|
||||
scrapeTimeout: 10s
|
||||
```
|
||||
|
||||
### Health Checks
|
||||
|
||||
Check pod health:
|
||||
|
||||
```bash
|
||||
# View pod status
|
||||
kubectl get pods -l app.kubernetes.io/name=bifrost
|
||||
|
||||
# Check logs
|
||||
kubectl logs -l app.kubernetes.io/name=bifrost --tail=100
|
||||
|
||||
# Describe pod
|
||||
kubectl describe pod -l app.kubernetes.io/name=bifrost
|
||||
```
|
||||
|
||||
### Metrics Endpoints
|
||||
|
||||
```bash
|
||||
# Port forward
|
||||
kubectl port-forward svc/bifrost 8080:8080
|
||||
|
||||
# Check metrics
|
||||
curl http://localhost:8080/metrics
|
||||
|
||||
# Check health
|
||||
curl http://localhost:8080/health
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Configuration Guides
|
||||
|
||||
<CardGroup cols={3}>
|
||||
<Card title="Values Reference" icon="sliders" href="/deployment-guides/helm/values">
|
||||
All parameters, secret references, advanced config, example patterns
|
||||
</Card>
|
||||
<Card title="Client Configuration" icon="gear" href="/deployment-guides/helm/client">
|
||||
Pool size, logging, CORS, header filtering, compat shims, MCP settings
|
||||
</Card>
|
||||
<Card title="Provider Setup" icon="plug" href="/deployment-guides/helm/providers">
|
||||
OpenAI, Anthropic, Azure, Bedrock, Vertex, Groq, self-hosted
|
||||
</Card>
|
||||
<Card title="Storage" icon="database" href="/deployment-guides/helm/storage">
|
||||
SQLite, PostgreSQL, object storage for logs, vector stores
|
||||
</Card>
|
||||
<Card title="Plugins" icon="puzzle-piece" href="/deployment-guides/helm/plugins">
|
||||
Telemetry, logging, semantic cache, OTel, Datadog, governance
|
||||
</Card>
|
||||
<Card title="Governance" icon="shield" href="/deployment-guides/helm/governance">
|
||||
Budgets, rate limits, virtual keys, routing rules
|
||||
</Card>
|
||||
<Card title="Cluster Mode" icon="network-wired" href="/deployment-guides/helm/cluster">
|
||||
Multi-replica HA, gossip, peer discovery
|
||||
</Card>
|
||||
<Card title="Troubleshooting" icon="wrench" href="/deployment-guides/helm/troubleshooting">
|
||||
Pod startup, database, ingress, PVC, secrets, performance
|
||||
</Card>
|
||||
</CardGroup>
|
||||
|
||||
---
|
||||
|
||||
## Resources
|
||||
|
||||
- [Helm Chart Repository](https://github.com/maximhq/bifrost/tree/main/helm-charts)
|
||||
- [Artifact Hub](https://artifacthub.io/packages/helm/bifrost/bifrost)
|
||||
- [Example Configurations](https://github.com/maximhq/bifrost/tree/main/helm-charts/bifrost/values-examples)
|
||||
- [GitHub Issues](https://github.com/maximhq/bifrost/issues)
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. Configure [provider keys](/providers/supported-providers/overview)
|
||||
2. Enable [plugins](/plugins/getting-started)
|
||||
3. Set up [observability](/features/observability/default)
|
||||
4. Configure [governance](/features/governance/virtual-keys)
|
||||
316
docs/deployment-guides/helm/client.mdx
Normal file
@@ -0,0 +1,316 @@
|
||||
---
|
||||
title: "Client Configuration"
|
||||
description: "Configure the Bifrost client: connection pool, logging, CORS, header filtering, compat shims, and MCP settings"
|
||||
icon: "gear"
|
||||
---
|
||||
|
||||
The `bifrost.client` block controls how Bifrost manages its internal worker pool, request logging, authentication enforcement, header policies, SDK compatibility shims, and MCP agent behaviour. All settings map directly to the `client` section of the rendered `config.json`.
|
||||
|
||||
---
|
||||
|
||||
## Connection Pool
|
||||
|
||||
| Parameter | Description | Default |
|
||||
|-----------|-------------|---------|
|
||||
| `bifrost.client.initialPoolSize` | Pre-allocated worker goroutines per provider queue | `300` |
|
||||
| `bifrost.client.dropExcessRequests` | Drop requests when queue is full instead of waiting | `false` |
|
||||
|
||||
A larger pool reduces latency spikes under burst load at the cost of higher baseline memory. For production workloads with multiple providers, `1000` is a common starting point.
|
||||
|
||||
```yaml
|
||||
# client-pool.yaml
|
||||
image:
|
||||
tag: "v1.4.11"
|
||||
|
||||
bifrost:
|
||||
client:
|
||||
initialPoolSize: 1000
|
||||
dropExcessRequests: true # Return 429 instead of queuing indefinitely
|
||||
```
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f client-pool.yaml
|
||||
|
||||
# Or set inline
|
||||
helm upgrade bifrost bifrost/bifrost \
|
||||
--reuse-values \
|
||||
--set bifrost.client.initialPoolSize=1000 \
|
||||
--set bifrost.client.dropExcessRequests=true
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Request & Response Logging
|
||||
|
||||
| Parameter | Description | Default |
|
||||
|-----------|-------------|---------|
|
||||
| `bifrost.client.enableLogging` | Log all LLM requests and responses | `true` |
|
||||
| `bifrost.client.disableContentLogging` | Strip message content from logs (keeps metadata) | `false` |
|
||||
| `bifrost.client.logRetentionDays` | Days to retain log entries in the store | `365` |
|
||||
| `bifrost.client.loggingHeaders` | HTTP request headers to capture in log metadata | `[]` |
|
||||
|
||||
Set `disableContentLogging: true` for HIPAA / PCI compliance workloads where message content must not be persisted.
|
||||
|
||||
```yaml
|
||||
bifrost:
|
||||
client:
|
||||
enableLogging: true
|
||||
disableContentLogging: true # PII / compliance: store metadata only
|
||||
logRetentionDays: 90
|
||||
loggingHeaders:
|
||||
- "x-request-id"
|
||||
- "x-user-id"
|
||||
```
|
||||
|
||||
```bash
|
||||
helm upgrade bifrost bifrost/bifrost \
|
||||
--reuse-values \
|
||||
--set bifrost.client.disableContentLogging=true \
|
||||
--set bifrost.client.logRetentionDays=90
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Security & CORS
|
||||
|
||||
| Parameter | Description | Default |
|
||||
|-----------|-------------|---------|
|
||||
| `bifrost.client.allowedOrigins` | CORS allowed origins | `["*"]` |
|
||||
| `bifrost.client.allowDirectKeys` | Allow callers to pass provider keys directly in requests | `false` |
|
||||
| `bifrost.client.enforceGovernanceHeader` | Require `x-bf-vk` virtual-key header on every request | `false` |
|
||||
| `bifrost.client.maxRequestBodySizeMb` | Maximum allowed request body size | `100` |
|
||||
| `bifrost.client.whitelistedRoutes` | Routes that bypass auth middleware | `[]` |
|
||||
|
||||
```yaml
|
||||
bifrost:
|
||||
client:
|
||||
allowedOrigins:
|
||||
- "https://app.yourdomain.com"
|
||||
- "https://admin.yourdomain.com"
|
||||
allowDirectKeys: false # Prevent callers from supplying raw provider keys
|
||||
enforceGovernanceHeader: true # Every request must carry a virtual key
|
||||
maxRequestBodySizeMb: 50
|
||||
whitelistedRoutes:
|
||||
- "/health"
|
||||
- "/metrics"
|
||||
```
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost \
|
||||
--set image.tag=v1.4.11 \
|
||||
--set bifrost.client.enforceGovernanceHeader=true \
|
||||
--set bifrost.client.allowDirectKeys=false
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Header Filtering
|
||||
|
||||
Header filtering controls which `x-bf-eh-*` headers are forwarded to upstream LLM providers.
|
||||
|
||||
| Parameter | Description | Default |
|
||||
|-----------|-------------|---------|
|
||||
| `bifrost.client.headerFilterConfig.allowlist` | Only these headers are forwarded (whitelist mode) | `[]` |
|
||||
| `bifrost.client.headerFilterConfig.denylist` | These headers are always blocked | `[]` |
|
||||
| `bifrost.client.requiredHeaders` | Headers that must be present on every request | `[]` |
|
||||
| `bifrost.client.allowedHeaders` | Additional headers permitted for CORS and WebSocket | `[]` |
|
||||
|
||||
When both lists are empty, all `x-bf-eh-*` headers pass through. Specifying an `allowlist` enables strict whitelist mode — only listed headers are forwarded.
|
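The filtering rules above can be sketched as a small shell function. This is an illustration of the described behaviour, not Bifrost's implementation: `should_forward` is a hypothetical name, the lists are passed as space-separated strings, matching is exact and case-sensitive for simplicity, and the denylist is assumed to win when a header appears in both lists (the table says denylisted headers are "always blocked").

```bash
# Illustrative sketch only; not Bifrost's implementation.
# Usage: should_forward HEADER "ALLOWLIST" "DENYLIST"
# Lists are space-separated header names. Returns 0 if the header is forwarded.
should_forward() {
  local header="$1" allowlist="$2" denylist="$3" h

  # Denylisted headers are always blocked.
  for h in $denylist; do
    if [ "$h" = "$header" ]; then return 1; fi
  done

  # An empty allowlist places no restriction on the remaining headers.
  if [ -z "$allowlist" ]; then return 0; fi

  # With an allowlist set, only listed headers pass.
  for h in $allowlist; do
    if [ "$h" = "$header" ]; then return 0; fi
  done
  return 1
}

if should_forward "x-bf-eh-openai-beta" "x-bf-eh-anthropic-version x-bf-eh-openai-beta" ""; then
  echo "forwarded"
else
  echo "blocked"
fi
```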
||||
|
||||
```yaml
|
||||
bifrost:
|
||||
client:
|
||||
headerFilterConfig:
|
||||
allowlist:
|
||||
- "x-bf-eh-anthropic-version"
|
||||
- "x-bf-eh-openai-beta"
|
||||
denylist: []
|
||||
requiredHeaders:
|
||||
- "x-request-id"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Authentication
|
||||
|
||||
| Parameter | Description | Default |
|
||||
|-----------|-------------|---------|
|
||||
| `bifrost.authConfig.isEnabled` | Enable username/password auth for the API and dashboard | `false` |
|
||||
| `bifrost.authConfig.adminUsername` | Admin username (plain text, prefer secret) | `""` |
|
||||
| `bifrost.authConfig.adminPassword` | Admin password (plain text, prefer secret) | `""` |
|
||||
| `bifrost.authConfig.existingSecret` | Kubernetes Secret name for credentials | `""` |
|
||||
| `bifrost.authConfig.usernameKey` | Key within the secret for username | `"username"` |
|
||||
| `bifrost.authConfig.passwordKey` | Key within the secret for password | `"password"` |
|
||||
| `bifrost.authConfig.disableAuthOnInference` | Skip auth check on `/v1/*` inference routes | `false` |
|
||||
|
||||
```bash
|
||||
# Create secret first
|
||||
kubectl create secret generic bifrost-admin \
|
||||
--from-literal=username='admin' \
|
||||
--from-literal=password='your-secure-password'
|
||||
```
|
||||
|
||||
```yaml
|
||||
bifrost:
|
||||
authConfig:
|
||||
isEnabled: true
|
||||
disableAuthOnInference: false
|
||||
existingSecret: "bifrost-admin"
|
||||
usernameKey: "username"
|
||||
passwordKey: "password"
|
||||
```
|
||||
|
||||
```bash
|
||||
helm upgrade bifrost bifrost/bifrost \
|
||||
--reuse-values \
|
||||
-f auth-values.yaml
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Encryption
|
||||
|
||||
| Parameter | Description | Default |
|
||||
|-----------|-------------|---------|
|
||||
| `bifrost.encryptionKey` | Optional encryption key (plain text — use `encryptionKeySecret` in production). If omitted, data is stored in plaintext. | `""` |
|
||||
| `bifrost.encryptionKeySecret.name` | Kubernetes Secret name containing the key | `""` |
|
||||
| `bifrost.encryptionKeySecret.key` | Key within the secret | `"encryption-key"` |
|
||||
|
||||
Always use a Kubernetes Secret in production:
|
||||
|
||||
```bash
|
||||
kubectl create secret generic bifrost-encryption \
|
||||
--from-literal=encryption-key='your-32-byte-encryption-key-here'
|
||||
```
|
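The placeholder key in the command above should be replaced with a random value. A minimal sketch, assuming a 32-character string is acceptable as the key: this page only shows a placeholder and does not state the exact format Bifrost expects, so treat the length and encoding here as assumptions. `generate_key` is a hypothetical helper name.

```bash
# Sketch: generate a random 32-character hex string to use as the key.
# The required key format is an assumption; the doc only shows a placeholder.
generate_key() {
  # 16 random bytes rendered as hex give 32 ASCII characters.
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

key="$(generate_key)"
echo "generated a ${#key}-character key"

# Then create the secret as shown above (not run here):
# kubectl create secret generic bifrost-encryption \
#   --from-literal=encryption-key="$key"
```

Keep the generated value somewhere safe before creating the secret, since data encrypted with it presumably cannot be read back without it.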
||||
|
||||
```yaml
|
||||
bifrost:
|
||||
encryptionKeySecret:
|
||||
name: "bifrost-encryption"
|
||||
key: "encryption-key"
|
||||
```
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost \
|
||||
--set image.tag=v1.4.11 \
|
||||
-f encryption-values.yaml
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Async Jobs & Database Pings
|
||||
|
||||
| Parameter | Description | Default |
|
||||
|-----------|-------------|---------|
|
||||
| `bifrost.client.disableDbPingsInHealth` | Exclude DB connectivity from `/health` checks | `false` |
|
||||
| `bifrost.client.asyncJobResultTTL` | TTL (seconds) for async job results | `3600` |
|
||||
|
||||
---
|
||||
|
||||
## Compat Shims
|
||||
|
||||
Compatibility flags that let Bifrost silently adapt request/response shapes for SDK integrations:
|
||||
|
||||
| Parameter | Description | Default |
|
||||
|-----------|-------------|---------|
|
||||
| `bifrost.client.compat.convertTextToChat` | Wrap legacy text completions as chat messages | `false` |
|
||||
| `bifrost.client.compat.convertChatToResponses` | Translate chat completions to Responses API format | `false` |
|
||||
| `bifrost.client.compat.shouldDropParams` | Silently drop unsupported parameters instead of erroring | `false` |
|
||||
| `bifrost.client.compat.shouldConvertParams` | Auto-convert parameter names across provider schemas | `false` |
|
||||
|
||||
```yaml
|
||||
bifrost:
|
||||
client:
|
||||
compat:
|
||||
shouldDropParams: true # Useful when proxying mixed SDK traffic
|
||||
convertTextToChat: true # For clients using the legacy /v1/completions endpoint
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Prometheus Labels
|
||||
|
||||
Add custom labels to every Prometheus metric emitted by Bifrost:
|
||||
|
||||
```yaml
|
||||
bifrost:
|
||||
client:
|
||||
prometheusLabels:
|
||||
- name: "environment"
|
||||
value: "production"
|
||||
- name: "region"
|
||||
value: "us-east-1"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## MCP Agent Settings
|
||||
|
||||
| Parameter | Description | Default |
|
||||
|-----------|-------------|---------|
|
||||
| `bifrost.client.mcpAgentDepth` | Maximum tool-call recursion depth for MCP agent mode | `10` |
|
||||
| `bifrost.client.mcpToolExecutionTimeout` | Timeout per tool execution in seconds | `30` |
|
||||
| `bifrost.client.mcpCodeModeBindingLevel` | Code mode binding level (`server` or `tool`) | `""` |
|
||||
| `bifrost.client.mcpToolSyncInterval` | Global tool sync interval in minutes (`0` = disabled) | `0` |
|
||||
|
||||
```yaml
|
||||
bifrost:
|
||||
client:
|
||||
mcpAgentDepth: 15
|
||||
mcpToolExecutionTimeout: 60
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Full Example
|
||||
|
||||
```yaml
|
||||
# client-full.yaml
|
||||
image:
|
||||
tag: "v1.4.11"
|
||||
|
||||
bifrost:
|
||||
encryptionKeySecret:
|
||||
name: "bifrost-encryption"
|
||||
key: "encryption-key"
|
||||
|
||||
authConfig:
|
||||
isEnabled: true
|
||||
disableAuthOnInference: false
|
||||
existingSecret: "bifrost-admin"
|
||||
usernameKey: "username"
|
||||
passwordKey: "password"
|
||||
|
||||
client:
|
||||
initialPoolSize: 1000
|
||||
dropExcessRequests: true
|
||||
allowedOrigins:
|
||||
- "https://app.yourdomain.com"
|
||||
enableLogging: true
|
||||
disableContentLogging: false
|
||||
logRetentionDays: 90
|
||||
enforceGovernanceHeader: true
|
||||
allowDirectKeys: false
|
||||
maxRequestBodySizeMb: 100
|
||||
headerFilterConfig:
|
||||
allowlist: []
|
||||
denylist: []
|
||||
prometheusLabels:
|
||||
- name: "environment"
|
||||
value: "production"
|
||||
mcpAgentDepth: 10
|
||||
mcpToolExecutionTimeout: 30
|
||||
```
|
||||
|
||||
```bash
|
||||
# Create prerequisites
|
||||
kubectl create secret generic bifrost-encryption \
|
||||
--from-literal=encryption-key='your-32-byte-encryption-key-here'
|
||||
|
||||
kubectl create secret generic bifrost-admin \
|
||||
--from-literal=username='admin' \
|
||||
--from-literal=password='your-secure-password'
|
||||
|
||||
# Install
|
||||
helm install bifrost bifrost/bifrost -f client-full.yaml
|
||||
```
|
||||
523
docs/deployment-guides/helm/cluster.mdx
Normal file
@@ -0,0 +1,523 @@
|
||||
---
|
||||
title: "Cluster Mode & HA"
|
||||
description: "Run Bifrost in a multi-replica cluster with gossip-based peer discovery, distributed state sync, and high-availability configuration"
|
||||
icon: "network-wired"
|
||||
---
|
||||
|
||||
Cluster mode enables multiple Bifrost replicas to share state — rate limits, budget counters, and governance data — across pods. When `bifrost.cluster.enabled` is `false` (the default), each replica operates independently and state is only shared via the database.
|
||||
|
||||
<Note>
|
||||
Cluster mode requires **PostgreSQL** as the storage backend. SQLite is single-node only.
|
||||
</Note>
|
||||
|
||||
<Warning>
|
||||
`bifrost.cluster.*` is an enterprise capability. OSS images accept these values but do not run cluster mode at runtime.
|
||||
</Warning>
|
||||
|
||||
## When to Use Cluster Mode
|
||||
|
||||
| Scenario | Recommendation |
|
||||
|----------|---------------|
|
||||
| Single replica | Not needed |
|
||||
| Multiple replicas, shared DB only | Optional — DB provides eventual consistency |
|
||||
| Multiple replicas with strict per-minute rate limiting | **Enable cluster mode** — in-memory counters are synced via gossip |
|
||||
| Geographic multi-region | Enable cluster mode with DNS or Consul discovery |
|
||||
|
||||
---
|
||||
|
||||
## Basic Cluster Setup
|
||||
|
||||
```yaml
|
||||
# cluster-values.yaml
|
||||
image:
|
||||
tag: "v1.4.11"
|
||||
|
||||
replicaCount: 3
|
||||
|
||||
storage:
|
||||
mode: postgres
|
||||
|
||||
postgresql:
|
||||
external:
|
||||
enabled: true
|
||||
host: "your-postgres-host.example.com"
|
||||
port: 5432
|
||||
user: bifrost
|
||||
database: bifrost
|
||||
sslMode: require
|
||||
existingSecret: "postgres-credentials"
|
||||
passwordKey: "password"
|
||||
|
||||
bifrost:
|
||||
encryptionKeySecret:
|
||||
name: "bifrost-encryption"
|
||||
key: "encryption-key"
|
||||
|
||||
cluster:
|
||||
enabled: true
|
||||
gossip:
|
||||
port: 7946
|
||||
config:
|
||||
timeoutSeconds: 10
|
||||
successThreshold: 3
|
||||
failureThreshold: 3
|
||||
|
||||
# Spread replicas across nodes for true HA
|
||||
affinity:
|
||||
podAntiAffinity:
|
||||
requiredDuringSchedulingIgnoredDuringExecution:
|
||||
- labelSelector:
|
||||
matchLabels:
|
||||
app.kubernetes.io/name: bifrost
|
||||
topologyKey: kubernetes.io/hostname
|
||||
|
||||
# Conservative scale-down: avoid killing pods mid-stream
|
||||
autoscaling:
|
||||
enabled: true
|
||||
minReplicas: 3
|
||||
maxReplicas: 10
|
||||
targetCPUUtilizationPercentage: 70
|
||||
behavior:
|
||||
scaleDown:
|
||||
stabilizationWindowSeconds: 300
|
||||
policies:
|
||||
- type: Pods
|
||||
value: 1
|
||||
periodSeconds: 120
|
||||
|
||||
# Give in-flight SSE streams time to drain
|
||||
terminationGracePeriodSeconds: 90
|
||||
lifecycle:
|
||||
preStop:
|
||||
exec:
|
||||
command: ["sh", "-c", "sleep 20"]
|
||||
```
|
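To get a feel for how conservative the scale-down settings above are, the arithmetic can be sketched in shell. This is a rough lower bound under stated assumptions, not an exact model of the HPA: it assumes the first removal waits for the full `stabilizationWindowSeconds`, and that each further removal waits one `periodSeconds` because the policy allows one pod per period. `min_scale_down_seconds` is a hypothetical helper name.

```bash
# Rough lower bound on how long the HPA needs to scale down.
# Args: FROM_REPLICAS TO_REPLICAS STABILIZATION_WINDOW_S POLICY_PERIOD_S
# Assumes a policy of 1 pod per period, as in the values file above.
min_scale_down_seconds() {
  local from="$1" to="$2" window="$3" period="$4"
  local pods=$((from - to))
  if [ "$pods" -le 0 ]; then
    echo 0
    return
  fi
  # First removal after the window, then one more per period.
  echo $((window + (pods - 1) * period))
}

# maxReplicas 10 -> minReplicas 3 with a 300 s window and a 120 s period
min_scale_down_seconds 10 3 300 120
```

With the values from `cluster-values.yaml`, shrinking from 10 replicas to 3 takes on the order of 1020 seconds (about 17 minutes) at minimum, which leaves long-running streams plenty of time to finish before their pod is chosen for removal.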
||||
|
||||
```bash
|
||||
kubectl create secret generic postgres-credentials \
|
||||
--from-literal=password='your-postgres-password'
|
||||
|
||||
kubectl create secret generic bifrost-encryption \
|
||||
--from-literal=encryption-key='your-32-byte-encryption-key'
|
||||
|
||||
helm install bifrost bifrost/bifrost -f cluster-values.yaml
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Peer Discovery
|
||||
|
||||
Bifrost uses a gossip protocol (memberlist) for peer-to-peer state sync. Configure how peers find each other:
|
||||
|
||||
<Note>
|
||||
For `consul`, `etcd`, and `udp` discovery, set `bifrost.cluster.discovery.serviceName` so nodes register/discover under a stable service identity.
|
||||
</Note>
|
||||
|
||||
<Tabs>
|
||||
|
||||
<Tab title="Kubernetes (Recommended)">
|
||||
|
||||
Bifrost queries the Kubernetes API to find other Bifrost pods by label selector. No static peer list is needed, so this mode keeps working when the HPA adds or removes replicas.
|
||||
|
||||
```yaml
|
||||
bifrost:
|
||||
cluster:
|
||||
enabled: true
|
||||
discovery:
|
||||
enabled: true
|
||||
type: kubernetes
|
||||
k8sNamespace: "default" # namespace where Bifrost runs
|
||||
k8sLabelSelector: "app.kubernetes.io/name=bifrost"
|
||||
gossip:
|
||||
port: 7946
|
||||
```
|
||||
|
||||
The service account needs permission to list pods:
|
||||
|
||||
```yaml
|
||||
serviceAccount:
|
||||
create: true
|
||||
annotations: {}
|
||||
```
|
||||
|
||||
```bash
|
||||
# Create a namespaced Role and RoleBinding for pod discovery (apply once)
|
||||
kubectl apply -f - <<'EOF'
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: Role
|
||||
metadata:
|
||||
name: bifrost-pod-discovery
|
||||
namespace: default
|
||||
rules:
|
||||
- apiGroups: [""]
|
||||
resources: ["pods"]
|
||||
verbs: ["list", "get", "watch"]
|
||||
---
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: RoleBinding
|
||||
metadata:
|
||||
name: bifrost-pod-discovery
|
||||
namespace: default
|
||||
subjects:
|
||||
- kind: ServiceAccount
|
||||
name: bifrost
|
||||
namespace: default
|
||||
roleRef:
|
||||
kind: Role
|
||||
name: bifrost-pod-discovery
|
||||
apiGroup: rbac.authorization.k8s.io
|
||||
EOF
|
||||
```
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f cluster-k8s-discovery-values.yaml
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="DNS">
|
||||
|
||||
Uses a headless service DNS name to resolve peer IPs. Works well with StatefulSets (predictable pod DNS names).
|
||||
|
||||
```yaml
|
||||
bifrost:
|
||||
cluster:
|
||||
enabled: true
|
||||
discovery:
|
||||
enabled: true
|
||||
type: dns
|
||||
dnsNames:
|
||||
- "bifrost-headless.default.svc.cluster.local"
|
||||
gossip:
|
||||
port: 7946
|
||||
```
|
||||
|
||||
The chart automatically creates a headless service (`bifrost-headless`) when cluster mode is enabled with a StatefulSet. For Deployments, create it manually:
|
||||
|
||||
```bash
|
||||
kubectl apply -f - <<'EOF'
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: bifrost-headless
|
||||
spec:
|
||||
clusterIP: None
|
||||
selector:
|
||||
app.kubernetes.io/name: bifrost
|
||||
ports:
|
||||
- name: gossip
|
||||
port: 7946
|
||||
protocol: TCP
|
||||
EOF
|
||||
```
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f cluster-dns-discovery-values.yaml
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="Static Peers">
|
||||
|
||||
Enumerate peer addresses explicitly. Use when discovery mechanisms are unavailable or you want deterministic membership.
|
||||
|
||||
```yaml
|
||||
bifrost:
|
||||
cluster:
|
||||
enabled: true
|
||||
peers:
|
||||
- "bifrost-0.bifrost-headless.default.svc.cluster.local:7946"
|
||||
- "bifrost-1.bifrost-headless.default.svc.cluster.local:7946"
|
||||
- "bifrost-2.bifrost-headless.default.svc.cluster.local:7946"
|
||||
gossip:
|
||||
port: 7946
|
||||
```
|
||||
|
||||
<Note>
|
||||
Static peers rely on the stable pod names that a StatefulSet provides. This approach doesn't adapt to HPA-driven scaling; use Kubernetes or DNS discovery for dynamic replica counts.
|
||||
</Note>
|
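For larger replica counts, the peer list above can be generated rather than typed. A minimal sketch, assuming the StatefulSet is named `bifrost` and the headless service `bifrost-headless`, as in the example; `static_peers` is a hypothetical helper, and the namespace and port arguments default to the values used on this page.

```bash
# Sketch: print the static peer list for an N-replica StatefulSet.
# Usage: static_peers REPLICAS [NAMESPACE] [GOSSIP_PORT]
# Assumes pods are named bifrost-0..bifrost-(N-1) behind bifrost-headless.
static_peers() {
  local replicas="$1" namespace="${2:-default}" port="${3:-7946}" i=0
  while [ "$i" -lt "$replicas" ]; do
    printf '    - "bifrost-%d.bifrost-headless.%s.svc.cluster.local:%s"\n' \
      "$i" "$namespace" "$port"
    i=$((i + 1))
  done
}

# Emit YAML list items ready to paste under bifrost.cluster.peers
static_peers 3
```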
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="Consul">
|
||||
|
||||
```yaml
|
||||
bifrost:
|
||||
cluster:
|
||||
enabled: true
|
||||
discovery:
|
||||
enabled: true
|
||||
type: consul
|
||||
serviceName: "bifrost-cluster"
|
||||
consulAddress: "consul.consul.svc.cluster.local:8500"
|
||||
gossip:
|
||||
port: 7946
|
||||
```
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f cluster-consul-discovery-values.yaml
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="etcd">
|
||||
|
||||
```yaml
|
||||
bifrost:
|
||||
cluster:
|
||||
enabled: true
|
||||
discovery:
|
||||
enabled: true
|
||||
type: etcd
|
||||
serviceName: "bifrost-cluster"
|
||||
etcdEndpoints:
|
||||
- "http://etcd-0.etcd.default.svc.cluster.local:2379"
|
||||
- "http://etcd-1.etcd.default.svc.cluster.local:2379"
|
||||
- "http://etcd-2.etcd.default.svc.cluster.local:2379"
|
||||
gossip:
|
||||
port: 7946
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="mDNS">
|
||||
|
||||
Best for local development or bare-metal clusters where multicast is available.
|
||||
|
||||
```yaml
|
||||
bifrost:
|
||||
cluster:
|
||||
enabled: true
|
||||
discovery:
|
||||
enabled: true
|
||||
type: mdns
|
||||
mdnsService: "_bifrost._tcp"
|
||||
gossip:
|
||||
port: 7946
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Allowed Address Space
|
||||
|
||||
Restrict gossip to a specific subnet (useful in multi-tenant clusters):
|
||||
|
||||
```yaml
|
||||
bifrost:
|
||||
cluster:
|
||||
discovery:
|
||||
enabled: true
|
||||
type: kubernetes
|
||||
k8sNamespace: "default"
|
||||
k8sLabelSelector: "app.kubernetes.io/name=bifrost"
|
||||
allowedAddressSpace:
|
||||
- "10.0.0.0/8"
|
||||
- "172.16.0.0/12"
|
||||
```
|
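Before restricting gossip to a subnet, it helps to confirm that pod IPs actually fall inside the listed ranges. The sketch below shows plain IPv4 CIDR membership in shell; it illustrates the arithmetic only and makes no claim about how Bifrost evaluates `allowedAddressSpace`. `ip_to_int` and `ip_in_cidr` are hypothetical helper names, the sample IPs are made up, and octets are assumed to have no leading zeros.

```bash
# Sketch: test whether an IPv4 address falls inside a CIDR block.
ip_to_int() {
  local a b c d
  # Split the dotted quad on "." into four octets.
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Usage: ip_in_cidr IP NETWORK/BITS ; returns 0 when IP is inside the block.
ip_in_cidr() {
  local ip="$1" net="${2%/*}" bits="${2#*/}" mask
  if [ "$bits" -eq 0 ]; then
    mask=0
  else
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  fi
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Check sample pod IPs against the two ranges from the example above.
for ip in 10.42.3.7 192.168.1.20; do
  if ip_in_cidr "$ip" 10.0.0.0/8 || ip_in_cidr "$ip" 172.16.0.0/12; then
    echo "$ip is inside the allowed address space"
  else
    echo "$ip is outside the allowed address space"
  fi
done
```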
||||
|
||||
---
|
||||
|
||||
## Region-Aware Routing
|
||||
|
||||
Tag replicas with a region identifier for latency-aware routing:
|
||||
|
||||
```yaml
|
||||
bifrost:
|
||||
cluster:
|
||||
enabled: true
|
||||
region: "us-east-1"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Full HA Production Example
|
||||
|
||||
```yaml
|
||||
# ha-production-values.yaml
|
||||
image:
|
||||
tag: "v1.4.11"
|
||||
|
||||
replicaCount: 3
|
||||
|
||||
resources:
|
||||
requests:
|
||||
cpu: 1000m
|
||||
memory: 1Gi
|
||||
limits:
|
||||
cpu: 4000m
|
||||
memory: 4Gi
|
||||
|
||||
autoscaling:
|
||||
enabled: true
|
||||
minReplicas: 3
|
||||
maxReplicas: 15
|
||||
targetCPUUtilizationPercentage: 70
|
||||
targetMemoryUtilizationPercentage: 75
|
||||
behavior:
|
||||
scaleDown:
|
||||
stabilizationWindowSeconds: 300
|
||||
policies:
|
||||
- type: Pods
|
||||
value: 1
|
||||
periodSeconds: 120
|
||||
scaleUp:
|
||||
stabilizationWindowSeconds: 30
|
||||
|
||||
terminationGracePeriodSeconds: 90
|
||||
lifecycle:
|
||||
preStop:
|
||||
exec:
|
||||
command: ["sh", "-c", "sleep 20"]
|
||||
|
||||
ingress:
|
||||
enabled: true
|
||||
className: nginx
|
||||
annotations:
|
||||
cert-manager.io/cluster-issuer: letsencrypt-prod
|
||||
nginx.ingress.kubernetes.io/proxy-body-size: "100m"
|
||||
nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
|
||||
hosts:
|
||||
- host: bifrost.yourdomain.com
|
||||
paths:
|
||||
- path: /
|
||||
pathType: Prefix
|
||||
tls:
|
||||
- secretName: bifrost-tls
|
||||
hosts:
|
||||
- bifrost.yourdomain.com
|
||||
|
||||
storage:
|
||||
mode: postgres
|
||||
|
||||
postgresql:
|
||||
external:
|
||||
enabled: true
|
||||
host: "rds.us-east-1.amazonaws.com"
|
||||
port: 5432
|
||||
user: bifrost
|
||||
database: bifrost
|
||||
sslMode: require
|
||||
existingSecret: "postgres-credentials"
|
||||
passwordKey: "password"
|
||||
|
||||
bifrost:
|
||||
encryptionKeySecret:
|
||||
name: "bifrost-encryption"
|
||||
key: "encryption-key"
|
||||
|
||||
client:
|
||||
initialPoolSize: 1000
|
||||
dropExcessRequests: true
|
||||
enableLogging: true
|
||||
enforceGovernanceHeader: true
|
||||
|
||||
cluster:
|
||||
enabled: true
|
||||
region: "us-east-1"
|
||||
discovery:
|
||||
enabled: true
|
||||
type: kubernetes
|
||||
k8sNamespace: "default"
|
||||
k8sLabelSelector: "app.kubernetes.io/name=bifrost"
|
||||
gossip:
|
||||
port: 7946
|
||||
config:
|
||||
timeoutSeconds: 10
|
||||
successThreshold: 3
|
||||
failureThreshold: 3
|
||||
|
||||
plugins:
|
||||
telemetry:
|
||||
enabled: true
|
||||
config:
|
||||
push_gateway:
|
||||
enabled: true
|
||||
push_gateway_url: "http://prometheus-pushgateway.monitoring.svc.cluster.local:9091"
|
||||
push_interval: 15
|
||||
logging:
|
||||
enabled: true
|
||||
governance:
|
||||
enabled: true
|
||||
config:
|
||||
is_vk_mandatory: true
|
||||
|
||||
affinity:
|
||||
podAntiAffinity:
|
||||
requiredDuringSchedulingIgnoredDuringExecution:
|
||||
- labelSelector:
|
||||
matchLabels:
|
||||
app.kubernetes.io/name: bifrost
|
||||
topologyKey: kubernetes.io/hostname
|
||||
|
||||
serviceAccount:
|
||||
create: true
|
||||
annotations: {}
|
||||
```
|
||||
|
||||
```bash
|
||||
# Prerequisites
|
||||
kubectl create secret generic postgres-credentials \
|
||||
--from-literal=password='your-secure-postgres-password'
|
||||
|
||||
kubectl create secret generic bifrost-encryption \
|
||||
--from-literal=encryption-key='your-32-byte-encryption-key'
|
||||
|
||||
# RBAC for Kubernetes pod discovery
|
||||
kubectl apply -f - <<'EOF'
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: Role
|
||||
metadata:
|
||||
name: bifrost-pod-discovery
|
||||
namespace: default
|
||||
rules:
|
||||
- apiGroups: [""]
|
||||
resources: ["pods"]
|
||||
verbs: ["list", "get", "watch"]
|
||||
---
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: RoleBinding
|
||||
metadata:
|
||||
name: bifrost-pod-discovery
|
||||
namespace: default
|
||||
subjects:
|
||||
- kind: ServiceAccount
|
||||
name: bifrost
|
||||
namespace: default
|
||||
roleRef:
|
||||
kind: Role
|
||||
name: bifrost-pod-discovery
|
||||
apiGroup: rbac.authorization.k8s.io
|
||||
EOF
|
||||
|
||||
# Install
|
||||
helm install bifrost bifrost/bifrost -f ha-production-values.yaml
|
||||
|
||||
# Verify all peers have found each other (check logs)
|
||||
kubectl logs -l app.kubernetes.io/name=bifrost --tail=50 | grep -i gossip
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Verifying Cluster Health
|
||||
|
||||
```bash
|
||||
# Check all pods are running
|
||||
kubectl get pods -l app.kubernetes.io/name=bifrost
|
||||
|
||||
# Check gossip port is reachable between pods (pod names assume a StatefulSet; substitute real pod names otherwise)
|
||||
kubectl exec -it bifrost-0 -- nc -zv bifrost-1.bifrost-headless 7946
|
||||
|
||||
# Check health endpoint
|
||||
kubectl port-forward svc/bifrost 8080:8080 &
|
||||
curl http://localhost:8080/health
|
||||
|
||||
# View HPA status
|
||||
kubectl get hpa bifrost
|
||||
|
||||
# Scale manually during maintenance
|
||||
kubectl scale deployment bifrost --replicas=5
|
||||
```
|
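Since the examples on this page rely on pod anti-affinity to spread replicas across nodes, it can be worth checking that no node ended up with more than one Bifrost pod. A minimal sketch: `shared_nodes` is a hypothetical helper that reads `POD NODE` pairs on stdin, and the sample pod and node names below are made up for illustration.

```bash
# Sketch: report nodes that host more than one Bifrost pod.
# Reads "POD NODE" pairs on stdin, one pair per line.
shared_nodes() {
  awk '{ count[$2]++ } END { for (n in count) if (count[n] > 1) print n }'
}

# Usage against a live cluster (not run here):
# kubectl get pods -l app.kubernetes.io/name=bifrost --no-headers \
#   -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName | shared_nodes

# Demonstration with sample data: two pods share node-1.
printf '%s\n' "bifrost-a node-1" "bifrost-b node-2" "bifrost-c node-1" | shared_nodes
```

With the `requiredDuringSchedulingIgnoredDuringExecution` rule in place, the live command should print nothing; any node name in the output means two replicas share a host.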
||||
446
docs/deployment-guides/helm/governance.mdx
Normal file
@@ -0,0 +1,446 @@

---
title: "Governance"
description: "Configure Bifrost governance in Helm: budgets, rate limits, virtual keys, routing rules, and admin authentication"
icon: "shield"
---

Governance lets you control who can call which providers, how much they can spend, how fast they can go, and how traffic is routed. Everything is declared under `bifrost.governance` in your values file and seeded into the database at startup.

<Note>
The governance **plugin** must also be enabled for enforcement to take effect:

```yaml
bifrost:
  plugins:
    governance:
      enabled: true
```

See the [Plugins](/deployment-guides/helm/plugins) page for plugin configuration details.
</Note>

---

## Admin Authentication

Protect the Bifrost dashboard and management API with username/password auth.

```bash
kubectl create secret generic bifrost-admin-credentials \
  --from-literal=username='admin' \
  --from-literal=password='your-secure-admin-password'
```

```yaml
bifrost:
  governance:
    authConfig:
      isEnabled: true
      disableAuthOnInference: false   # keep auth on inference routes
      existingSecret: "bifrost-admin-credentials"
      usernameKey: "username"
      passwordKey: "password"
```

```bash
helm upgrade bifrost bifrost/bifrost --reuse-values -f governance-auth-values.yaml
```

---

## Budgets

Spending caps that reset on a configurable period. Budgets are referenced by ID from virtual keys, teams, customers, or providers.

| Reset duration | Syntax |
|----------------|--------|
| 30 seconds | `"30s"` |
| 5 minutes | `"5m"` |
| 1 hour | `"1h"` |
| 1 day | `"1d"` |
| 1 week | `"1w"` |
| 1 month | `"1M"` |
| 1 year | `"1Y"` |

```yaml
bifrost:
  governance:
    budgets:
      - id: "budget-dev"
        max_limit: 50          # $50 per month
        reset_duration: "1M"

      - id: "budget-production"
        max_limit: 500         # $500 per month
        reset_duration: "1M"

      - id: "budget-testing"
        max_limit: 10          # $10 per day
        reset_duration: "1d"

      - id: "budget-enterprise"
        max_limit: 5000        # $5000 per month
        reset_duration: "1M"
```
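
The reset-duration syntax is case-sensitive: `"1m"` is one minute, while `"1M"` is one month. As a minimal sketch, assuming the only valid units are the ones listed in the table above, a pre-flight check like the following could catch a mistyped value before running `helm upgrade`. The `describe_duration` helper is hypothetical and is not part of Bifrost or the chart.

```bash
# Hypothetical helper: validate a reset duration and spell out its unit.
# Assumes the only valid units are those in the table: s, m, h, d, w, M, Y.
describe_duration() {
  value="${1%?}"         # everything except the last character
  unit="${1#"$value"}"   # the last character
  case "$value" in
    ''|*[!0-9]*) echo "invalid duration: $1" >&2; return 1 ;;
  esac
  case "$unit" in
    s) name="second" ;;
    m) name="minute" ;;
    h) name="hour" ;;
    d) name="day" ;;
    w) name="week" ;;
    M) name="month" ;;
    Y) name="year" ;;
    *) echo "invalid duration: $1" >&2; return 1 ;;
  esac
  echo "$value $name(s)"
}

describe_duration "1m"   # prints "1 minute(s)"
describe_duration "1M"   # prints "1 month(s)"
```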
---

## Rate Limits

Token and request-count caps per time window. Referenced by ID from virtual keys, teams, customers, or providers.

```yaml
bifrost:
  governance:
    rateLimits:
      - id: "rate-limit-standard"
        token_max_limit: 100000        # 100K tokens per hour
        token_reset_duration: "1h"
        request_max_limit: 1000        # 1000 requests per hour
        request_reset_duration: "1h"

      - id: "rate-limit-high"
        token_max_limit: 500000        # 500K tokens per hour
        token_reset_duration: "1h"
        request_max_limit: 5000
        request_reset_duration: "1h"

      - id: "rate-limit-burst"
        token_max_limit: 50000         # 50K tokens per minute (burst)
        token_reset_duration: "1m"
        request_max_limit: 500
        request_reset_duration: "1m"

      - id: "rate-limit-testing"
        token_max_limit: 10000
        token_reset_duration: "1h"
        request_max_limit: 100
        request_reset_duration: "1h"
```

---

## Customers & Teams

Optional organizational hierarchy. Virtual keys can be assigned to customers or teams, inheriting their budgets and rate limits.

```yaml
bifrost:
  governance:
    customers:
      - id: "customer-acme"
        name: "Acme Corp"
        budget_id: "budget-production"
        rate_limit_id: "rate-limit-high"

      - id: "customer-startup"
        name: "Startup Inc"
        budget_id: "budget-dev"
        rate_limit_id: "rate-limit-standard"

    teams:
      - id: "team-platform"
        name: "Platform Team"
        customer_id: "customer-acme"
        budget_id: "budget-enterprise"
        rate_limit_id: "rate-limit-high"

      - id: "team-ml"
        name: "ML Team"
        customer_id: "customer-acme"
        budget_id: "budget-production"
        rate_limit_id: "rate-limit-standard"
```

---

## Virtual Keys

Virtual keys are the primary access tokens issued to callers. They scope which providers, models, and underlying API keys are accessible.

```yaml
bifrost:
  governance:
    virtualKeys:
      # 1. Unrestricted dev key: access to every provider
      - id: "vk-dev-all"
        name: "Dev: all providers"
        value: "vk-dev-all-secret-token"
        is_active: true
        budget_id: "budget-dev"
        rate_limit_id: "rate-limit-standard"
        # No provider_configs → all providers allowed

      # 2. OpenAI only: restricted to two models
      - id: "vk-openai-prod"
        name: "OpenAI Production"
        value: "vk-openai-prod-secret-token"
        is_active: true
        budget_id: "budget-production"
        rate_limit_id: "rate-limit-high"
        provider_configs:
          - provider: "openai"
            weight: 1
            allowed_models: ["gpt-4o", "gpt-4o-mini"]

      # 3. Multi-provider with weighted routing
      - id: "vk-multi"
        name: "Multi-provider weighted"
        value: "vk-multi-secret-token"
        is_active: true
        budget_id: "budget-production"
        rate_limit_id: "rate-limit-high"
        provider_configs:
          - provider: "openai"
            weight: 2          # 50%
            allowed_models: ["*"]
          - provider: "anthropic"
            weight: 1          # 25%
            allowed_models: ["*"]
          - provider: "groq"
            weight: 1          # 25%
            allowed_models: ["*"]

      # 4. Team-scoped key
      - id: "vk-platform-team"
        name: "Platform Team Key"
        value: "vk-platform-team-token"
        is_active: true
        team_id: "team-platform"   # inherits team budget/rate-limit
        provider_configs:
          - provider: "openai"
            weight: 1
            allowed_models: ["*"]
            key_ids: ["openai-primary"]   # pin to specific configured key by name

      # 5. Restricted testing key
      - id: "vk-testing"
        name: "Testing (gpt-4o-mini only)"
        value: "vk-testing-token"
        is_active: true
        budget_id: "budget-testing"
        rate_limit_id: "rate-limit-testing"
        provider_configs:
          - provider: "openai"
            weight: 1
            allowed_models: ["gpt-4o-mini"]

      # 6. Batch API key
      - id: "vk-batch"
        name: "Batch API workloads"
        value: "vk-batch-token"
        is_active: true
        budget_id: "budget-production"
        rate_limit_id: "rate-limit-burst"
        provider_configs:
          - provider: "openai"
            weight: 1
            allowed_models: ["*"]
            key_ids: ["openai-batch"]   # only the batch-flagged key
```
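
The percentage comments on `vk-multi` follow from dividing each weight by the sum of all weights in `provider_configs`. The sketch below reproduces that arithmetic. `weight_shares` is a hypothetical helper for illustration only, and it assumes traffic is split strictly in proportion to weight, as the comments in the example suggest.

```bash
# Hypothetical helper: print each weight's share of the total as a percentage.
weight_shares() {
  printf '%s\n' "$@" | awk '
    { w[NR] = $1; total += $1 }
    END {
      if (total <= 0) exit 1   # avoid dividing by zero
      for (i = 1; i <= NR; i++) printf "%.1f%%\n", 100 * w[i] / total
    }'
}

# Weights for openai, anthropic, groq from the vk-multi example.
weight_shares 2 1 1
# prints:
# 50.0%
# 25.0%
# 25.0%
```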
`provider_configs[].key_ids` and `provider_configs[].keys` are both supported in Helm values. Prefer `key_ids` for parity with `config.json`; its entries should be provider key names (for example, `"openai-primary"`).

**Use a virtual key in API calls:**

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "x-bf-vk: vk-openai-prod-secret-token" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'
```

---

## Model Configs

Apply budgets and rate limits at the model level, independent of virtual keys:

```yaml
bifrost:
  governance:
    modelConfigs:
      - id: "model-gpt4o"
        model_name: "gpt-4o"
        provider: "openai"
        budget_id: "budget-production"
        rate_limit_id: "rate-limit-high"

      - id: "model-claude"
        model_name: "claude-3-5-sonnet-20241022"
        provider: "anthropic"
        rate_limit_id: "rate-limit-standard"
```

---

## Provider Governance

Apply budgets and rate limits at the provider level:

```yaml
bifrost:
  governance:
    providers:
      - name: "openai"
        budget_id: "budget-production"
        rate_limit_id: "rate-limit-high"
        send_back_raw_request: false
        send_back_raw_response: false

      - name: "anthropic"
        budget_id: "budget-production"
        rate_limit_id: "rate-limit-standard"
```

---

## Routing Rules

CEL-expression-based routing rules redirect requests to different providers or models based on request attributes.

| Field | Description |
|-------|-------------|
| `cel_expression` | CEL expression evaluated against the request; if `true`, rule fires |
| `targets` | Provider/model targets with weights |
| `fallbacks` | Providers to try if all targets fail |
| `scope` | `global`, `team`, `customer`, or `virtual_key` |
| `scope_id` | Required for non-global scopes |
| `priority` | Lower number = evaluated first |

```yaml
bifrost:
  governance:
    routingRules:
      # Route all GPT requests to Azure
      - id: "route-gpt-to-azure"
        name: "GPT → Azure"
        description: "Route all GPT model requests to Azure OpenAI"
        enabled: true
        cel_expression: "model.startsWith('gpt-')"
        targets:
          - provider: "azure"
            model: ""            # empty = use original model name
            weight: 1.0
        fallbacks: ["openai"]
        scope: "global"
        priority: 0

      # Route gpt-4o requests with a large max_tokens to Groq
      - id: "route-heavy-to-groq"
        name: "Large context → Groq"
        enabled: true
        cel_expression: "model == 'gpt-4o' && request_body.max_tokens > 4000"
        targets:
          - provider: "groq"
            model: "llama-3.3-70b-versatile"
            weight: 1.0
        fallbacks: ["openai"]
        scope: "global"
        priority: 1

      # Team-scoped rule
      - id: "route-ml-team-bedrock"
        name: "ML Team → Bedrock"
        enabled: true
        cel_expression: "true"   # match all requests for this scope
        targets:
          - provider: "bedrock"
            model: ""
            weight: 1.0
        fallbacks: ["openai"]
        scope: "team"
        scope_id: "team-ml"
        priority: 0
```

---

## Full Example

```yaml
# governance-full-values.yaml
image:
  tag: "v1.4.11"

bifrost:
  encryptionKeySecret:
    name: "bifrost-encryption"
    key: "encryption-key"

  plugins:
    governance:
      enabled: true
      config:
        is_vk_mandatory: true

  governance:
    authConfig:
      isEnabled: true
      existingSecret: "bifrost-admin-credentials"
      usernameKey: "username"
      passwordKey: "password"

    budgets:
      - id: "budget-production"
        max_limit: 500
        reset_duration: "1M"
      - id: "budget-dev"
        max_limit: 50
        reset_duration: "1M"

    rateLimits:
      - id: "rate-limit-standard"
        token_max_limit: 100000
        token_reset_duration: "1h"
        request_max_limit: 1000
        request_reset_duration: "1h"

    virtualKeys:
      - id: "vk-production"
        name: "Production"
        value: "vk-prod-secret-token"
        is_active: true
        budget_id: "budget-production"
        rate_limit_id: "rate-limit-standard"
        provider_configs:
          - provider: "openai"
            weight: 1
            allowed_models: ["gpt-4o", "gpt-4o-mini"]
```

```bash
kubectl create secret generic bifrost-encryption \
  --from-literal=encryption-key='your-32-byte-key'

kubectl create secret generic bifrost-admin-credentials \
  --from-literal=username='admin' \
  --from-literal=password='secure-admin-password'

helm install bifrost bifrost/bifrost -f governance-full-values.yaml
```

---

## Access Profiles (Enterprise)

You can seed enterprise `access_profiles` directly from Helm values. The chart renders `bifrost.accessProfiles` into top-level `access_profiles` in `config.json`.

```yaml
bifrost:
  accessProfiles:
    - name: "platform-default"
      description: "Default profile for platform users"
      is_active: true
      tags: ["platform", "default"]
      provider_configs:
        - provider_name: "openai"
          all_models_allowed: false
          allowed_models: ["gpt-4o", "gpt-4o-mini"]
      mcp_servers:
        - mcp_server_id: "github"
      mcp_tool_overrides:
        - mcp_client_id: "github"
          tool_name: "create_pull_request"
          action: "include"
```
262
docs/deployment-guides/helm/guardrails.mdx
Normal file
@@ -0,0 +1,262 @@

---
title: "Guardrails"
description: "Configure guardrails providers and rules in Bifrost Helm deployments"
icon: "shield-halved"
---

<Note>
Guardrails are an **enterprise-only** feature. They require the enterprise Bifrost image.
</Note>

Guardrails are configured under `bifrost.guardrails` in your values file. The configuration has two parts:

- **`providers`**: the backend that performs the check. Rules link to providers by `id`.
- **`rules`**: CEL expressions that control when and where providers are invoked.

---

## Providers

<Tabs>
<Tab title="Regex">

Runs entirely in-process with no external dependency. Patterns use RE2 syntax. Supports optional per-pattern flags: `i` (case-insensitive), `m` (multiline), `s` (dot-all).

```yaml
bifrost:
  guardrails:
    providers:
      - id: 1
        provider_name: "regex"
        policy_name: "block-secrets"
        enabled: true
        timeout: 5
        config:
          patterns:
            - pattern: "sk-[A-Za-z0-9]{20,}"
              description: "OpenAI API key"
            - pattern: "AKIA[0-9A-Z]{16}"
              description: "AWS access key"
              flags: "i"
            - pattern: "gh[ps]_[A-Za-z0-9]{36}"
              description: "GitHub token"
```

</Tab>
<Tab title="AWS Bedrock">

```yaml
bifrost:
  guardrails:
    providers:
      - id: 2
        provider_name: "bedrock"
        policy_name: "content-filter"
        enabled: true
        timeout: 15
        config:
          guardrail_arn: "arn:aws:bedrock:us-east-1::guardrail/abc123"
          guardrail_version: "DRAFT"            # or a published version number
          region: "us-east-1"
          access_key: "env.AWS_ACCESS_KEY_ID"   # omit to use instance role
          secret_key: "env.AWS_SECRET_ACCESS_KEY"
```

</Tab>
<Tab title="Azure Content Safety">

```yaml
bifrost:
  guardrails:
    providers:
      - id: 3
        provider_name: "azure"
        policy_name: "azure-content-safety"
        enabled: true
        timeout: 10
        config:
          endpoint: "https://your-resource.cognitiveservices.azure.com"
          api_key: "env.AZURE_CONTENT_SAFETY_KEY"
          analyze_enabled: true
          analyze_severity_threshold: "medium"   # low | medium | high
          jailbreak_shield_enabled: true
          indirect_attack_shield_enabled: true
          copyright_enabled: false
          text_blocklist_enabled: false
          blocklist_names: []
```

</Tab>
<Tab title="Gray Swan">

```yaml
bifrost:
  guardrails:
    providers:
      - id: 4
        provider_name: "grayswan"
        policy_name: "grayswan-jailbreak"
        enabled: true
        timeout: 15
        config:
          api_key: "env.GRAYSWAN_API_KEY"
          violation_threshold: 0.7      # 0.0–1.0; higher = more permissive
          reasoning_mode: "standard"    # standard | fast
          policy_id: ""                 # optional: single policy ID
          policy_ids: []                # optional: multiple policy IDs
          rules: {}                     # optional: inline rule map
```

</Tab>
</Tabs>

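
Before deploying a regex provider, it can help to try the patterns against sample text locally. The sketch below runs the three patterns from the Regex tab through `grep -E`. Note the assumptions: `grep -E` implements POSIX extended regular expressions rather than RE2 (the two agree for these particular patterns, but not in general), and the per-pattern `flags` field is ignored here. `contains_secret` is a hypothetical helper name, not a Bifrost command.

```bash
# Hypothetical helper: succeed if the text matches any example pattern.
# Uses POSIX ERE via grep -E as a stand-in for RE2.
contains_secret() {
  printf '%s\n' "$1" | grep -Eq \
    -e 'sk-[A-Za-z0-9]{20,}' \
    -e 'AKIA[0-9A-Z]{16}' \
    -e 'gh[ps]_[A-Za-z0-9]{36}'
}

if contains_secret "key: AKIAABCDEFGHIJKLMNOP"; then
  echo "would match"
fi

if ! contains_secret "hello world"; then
  echo "no match"
fi
```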
---

## Rules

Rules are CEL expressions that fire when their condition is met. Available CEL variables:

| Variable | Type | Description |
|----------|------|-------------|
| `model` | `string` | Model name from the request |
| `provider` | `string` | Provider name (e.g. `"openai"`) |
| `headers` | `map<string,string>` | HTTP request headers |
| `params` | `map<string,string>` | Query parameters |
| `customer` | `string` | Customer ID |
| `team` | `string` | Team ID |
| `user` | `string` | User ID |

Rule fields:

| Field | Required | Description |
|-------|----------|-------------|
| `id` | Yes | Unique integer ID |
| `name` | Yes | Human-readable name |
| `description` | No | Optional description |
| `enabled` | Yes | `true` to activate |
| `cel_expression` | Yes | CEL boolean expression; `"true"` matches all requests |
| `apply_to` | Yes | `"input"`, `"output"`, or `"both"` |
| `sampling_rate` | No | `0`–`100`; percentage of requests to check (default: 100) |
| `timeout` | No | Rule timeout in seconds |
| `provider_config_ids` | No | Provider `id`s to invoke when this rule matches |

```yaml
bifrost:
  guardrails:
    rules:
      - id: 101
        name: "block-secrets-input"
        description: "Block prompts containing API keys"
        enabled: true
        cel_expression: "true"
        apply_to: "input"
        sampling_rate: 100
        timeout: 10
        provider_config_ids: [1]

      - id: 102
        name: "azure-output-gpt4o"
        description: "Scan GPT-4o responses"
        enabled: true
        cel_expression: "model == 'gpt-4o'"
        apply_to: "output"
        sampling_rate: 100
        timeout: 15
        provider_config_ids: [3]

      - id: 103
        name: "grayswan-openai-input"
        enabled: true
        cel_expression: "provider == 'openai'"
        apply_to: "input"
        sampling_rate: 50
        timeout: 20
        provider_config_ids: [4]

      - id: 104
        name: "strict-team-check"
        enabled: true
        cel_expression: "team == 'team-platform'"
        apply_to: "both"
        sampling_rate: 100
        timeout: 30
        provider_config_ids: [1, 3]   # multiple providers run in parallel
```
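
Each entry in `provider_config_ids` should correspond to a provider `id` defined under `bifrost.guardrails.providers`. As a rough sketch, the hypothetical `missing_provider_ids` helper below takes the two ID lists as plain space-separated strings (it does not parse YAML) and prints any referenced ID that has no matching provider.

```bash
# Hypothetical helper: list rule references that match no defined provider id.
# $1: space-separated provider ids; $2: space-separated ids referenced by rules.
missing_provider_ids() {
  defined=" $1 "                  # pad so every id is space-delimited
  for ref in $2; do
    case "$defined" in
      *" $ref "*) ;;              # reference resolves to a defined provider
      *) echo "$ref" ;;           # dangling reference
    esac
  done
}

# Providers 1 and 2 are defined; rules reference 1 and 3.
missing_provider_ids "1 2" "1 3"   # prints 3
```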
---

## Full example

```yaml
# guardrails-values.yaml
image:
  tag: "latest"

bifrost:
  encryptionKeySecret:
    name: "bifrost-encryption"
    key: "encryption-key"

  guardrails:
    providers:
      - id: 1
        provider_name: "regex"
        policy_name: "block-secrets"
        enabled: true
        timeout: 5
        config:
          patterns:
            - pattern: "sk-[A-Za-z0-9]{20,}"
              description: "OpenAI API key"
            - pattern: "AKIA[0-9A-Z]{16}"
              description: "AWS access key"
            - pattern: "gh[ps]_[A-Za-z0-9]{36}"
              description: "GitHub token"

      - id: 2
        provider_name: "azure"
        policy_name: "content-safety"
        enabled: true
        timeout: 10
        config:
          endpoint: "https://your-resource.cognitiveservices.azure.com"
          api_key: "env.AZURE_CONTENT_SAFETY_KEY"
          analyze_enabled: true
          analyze_severity_threshold: "medium"
          jailbreak_shield_enabled: true
          indirect_attack_shield_enabled: false
          copyright_enabled: false
          text_blocklist_enabled: false

    rules:
      - id: 101
        name: "block-secrets-input"
        description: "Block prompts leaking credentials"
        enabled: true
        cel_expression: "true"
        apply_to: "input"
        sampling_rate: 100
        timeout: 10
        provider_config_ids: [1]

      - id: 102
        name: "content-safety-both"
        description: "Azure content safety on input and output"
        enabled: true
        cel_expression: "true"
        apply_to: "both"
        sampling_rate: 100
        timeout: 15
        provider_config_ids: [2]
```

```bash
kubectl create secret generic azure-content-safety \
  --from-literal=key='your-azure-content-safety-api-key'

helm install bifrost bifrost/bifrost \
  -f guardrails-values.yaml \
  --set env[0].name=AZURE_CONTENT_SAFETY_KEY \
  --set env[0].valueFrom.secretKeyRef.name=azure-content-safety \
  --set env[0].valueFrom.secretKeyRef.key=key
```
549
docs/deployment-guides/helm/plugins.mdx
Normal file
@@ -0,0 +1,549 @@

---
title: "Plugins"
description: "Configure Bifrost plugins in Helm: telemetry, logging, semantic cache, OpenTelemetry, Datadog, governance, and custom plugins"
icon: "puzzle-piece"
---

Plugins are configured under `bifrost.plugins`. Each plugin is independently enabled/disabled. Pre-hooks run in registration order; post-hooks run in reverse order.

<Note>
**Telemetry, logging, and governance are auto-loaded built-ins**: they are always active and do not need to be explicitly enabled. Their configuration lives in `bifrost.client.*` and `bifrost.governance.*`, not in the `plugins` block.

The `plugins` block controls the opt-in plugins: `semanticCache`, `otel`, `datadog`, `maxim`, and custom plugins.
</Note>

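
Because pre-hooks run in registration order and post-hooks run in reverse, the first-registered plugin sees a request first and its response last. The sketch below only illustrates that ordering. The plugin names and the `hook_order` helper are hypothetical, and the real registration order depends on your configuration.

```bash
# Hypothetical helper: print the order in which hooks would run
# for plugins registered in the order given on the command line.
hook_order() {
  pre=""
  post=""
  for plugin in "$@"; do
    pre="$pre pre:$plugin"       # pre-hooks: appended, so registration order
    post="post:$plugin $post"    # post-hooks: prepended, so reverse order
  done
  echo "${pre# } ${post% }"
}

hook_order plugin-a plugin-b plugin-c
# prints: pre:plugin-a pre:plugin-b pre:plugin-c post:plugin-c post:plugin-b post:plugin-a
```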
```yaml
bifrost:
  plugins:
    semanticCache:
      enabled: false
    otel:
      enabled: false
    datadog:
      enabled: false
```

```bash
# Enable an opt-in plugin at install time
helm install bifrost bifrost/bifrost \
  --set image.tag=v1.4.11 \
  --set bifrost.plugins.otel.enabled=true

# Or upgrade to enable a plugin without touching other values
helm upgrade bifrost bifrost/bifrost \
  --reuse-values \
  --set bifrost.plugins.semanticCache.enabled=true
```

---

<Tabs>

<Tab title="Telemetry">

### Telemetry (Prometheus)

<Note>
Telemetry is **always active**; it cannot be disabled. You do not need to set `bifrost.plugins.telemetry.enabled`.
</Note>

Exposes Prometheus metrics at `GET /metrics`. Custom labels are set via `bifrost.client.prometheusLabels`:

```yaml
bifrost:
  client:
    prometheusLabels:
      - "environment=production"
      - "region=us-east-1"
```

```bash
# Verify metrics are exposed
kubectl port-forward svc/bifrost 8080:8080 &
curl http://localhost:8080/metrics | head -30
```

**With Prometheus Push Gateway** (recommended for multi-replica / HA setups where pull-based scraping can miss pods):

```yaml
bifrost:
  plugins:
    telemetry:
      enabled: true
      config:
        push_gateway:
          enabled: true
          push_gateway_url: "http://prometheus-pushgateway.monitoring.svc.cluster.local:9091"
          job_name: "bifrost"
          instance_id: ""        # auto-derived from pod name if empty
          push_interval: 15
          basic_auth:
            username: ""
            password: ""
```

**ServiceMonitor for Prometheus Operator:**

```yaml
serviceMonitor:
  enabled: true
  interval: 30s
  scrapeTimeout: 10s
  namespace: monitoring   # namespace where Prometheus is deployed
```

</Tab>

<Tab title="Logging">

### Request/Response Logging

<Note>
Logging is **auto-loaded** when `bifrost.client.enableLogging: true` and a log store is configured. You do not need to set `bifrost.plugins.logging.enabled`.
</Note>

Configure logging via the `client` block:

| Parameter | Description | Default |
|-----------|-------------|---------|
| `bifrost.client.enableLogging` | Enable request/response logging | `true` |
| `bifrost.client.disableContentLogging` | Strip message body from logs (HIPAA/PCI) | `false` |
| `bifrost.client.loggingHeaders` | HTTP headers to capture in log metadata | `[]` |

```yaml
bifrost:
  client:
    enableLogging: true
    disableContentLogging: false   # set true for HIPAA/compliance
    loggingHeaders:
      - "x-request-id"
      - "x-user-id"
      - "x-team-id"
```

```bash
# Verify logs are being written
kubectl port-forward svc/bifrost 8080:8080 &
curl -s "http://localhost:8080/api/logs?limit=5" | jq .
```

See [Client Configuration](/deployment-guides/helm/client) for the full reference.

</Tab>

<Tab title="Governance">

### Governance

<Note>
Governance is **always active** for OSS deployments. You do not need to set `bifrost.plugins.governance.enabled`.
</Note>

Virtual key enforcement is controlled by the `client` block:

| Parameter | Description | Default |
|-----------|-------------|---------|
| `bifrost.client.enforceAuthOnInference` | Require a virtual key (`x-bf-vk`) on every inference request | `false` |

```yaml
bifrost:
  client:
    enforceAuthOnInference: true   # require virtual key on all inference requests
```

Define virtual keys, budgets, rate limits, and routing rules in `bifrost.governance.*`. See the [Governance](/deployment-guides/helm/governance) page.

</Tab>

<Tab title="Semantic Cache">

### Semantic Cache

Caches LLM responses using vector similarity so semantically equivalent prompts return cached answers.

Two modes:
- **Semantic mode** (`dimension > 1`): uses an embedding model + vector store for similarity search
- **Direct / hash mode** (`dimension: 1`): exact-match hash-based caching, no embedding model needed

| Parameter | Description | Default |
|-----------|-------------|---------|
| `bifrost.plugins.semanticCache.enabled` | Enable semantic caching | `false` |
| `bifrost.plugins.semanticCache.version` | Plugin config version for DB-backed update tracking (`1` to `32767`) | `1` |
| `bifrost.plugins.semanticCache.config.provider` | Embedding provider | `"openai"` |
| `bifrost.plugins.semanticCache.config.embedding_model` | Embedding model name | `"text-embedding-3-small"` |
| `bifrost.plugins.semanticCache.config.dimension` | Embedding dimension (`1` = direct/hash mode) | `1536` |
| `bifrost.plugins.semanticCache.config.threshold` | Cosine similarity threshold (0–1) | `0.8` |
| `bifrost.plugins.semanticCache.config.ttl` | Cache entry TTL (Go duration) | `"5m"` |
| `bifrost.plugins.semanticCache.config.conversation_history_threshold` | Number of past messages to include in cache key | `3` |
| `bifrost.plugins.semanticCache.config.cache_by_model` | Include model name in cache key | `true` |
| `bifrost.plugins.semanticCache.config.cache_by_provider` | Include provider name in cache key | `true` |
| `bifrost.plugins.semanticCache.config.exclude_system_prompt` | Exclude system prompt from cache key | `false` |
| `bifrost.plugins.semanticCache.config.cleanup_on_shutdown` | Delete cache data on pod shutdown | `false` |

**Semantic mode (with OpenAI embeddings + Weaviate):**

```bash
kubectl create secret generic semantic-cache-secret \
  --from-literal=openai-key='sk-your-openai-embedding-key'
```

```yaml
# semantic-cache-values.yaml
image:
  tag: "v1.4.11"

vectorStore:
  enabled: true
  type: weaviate
  weaviate:
    enabled: true
    persistence:
      size: 20Gi

bifrost:
  plugins:
    semanticCache:
      enabled: true
      config:
        provider: "openai"
        keys:
          - value: "env.SEMANTIC_CACHE_OPENAI_KEY"
            weight: 1
        embedding_model: "text-embedding-3-small"
        dimension: 1536
        threshold: 0.85
        ttl: "1h"
        conversation_history_threshold: 5
        cache_by_model: true
        cache_by_provider: true

  providerSecrets:
    semantic-cache-key:
      existingSecret: "semantic-cache-secret"
      key: "openai-key"
      envVar: "SEMANTIC_CACHE_OPENAI_KEY"
```

```bash
helm install bifrost bifrost/bifrost -f semantic-cache-values.yaml
```
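
In semantic mode, `threshold` is a cosine similarity cutoff between the embedding of the incoming prompt and the embeddings of cached prompts. To make the number concrete, the sketch below computes cosine similarity for two small made-up vectors with `awk` and checks the result against the `0.85` threshold from the example. Real embeddings have `dimension` components (1536 above). `cosine_similarity` and `is_cache_hit` are hypothetical helpers, and treating a similarity equal to the threshold as a hit is an assumption.

```bash
# Hypothetical helper: cosine similarity of two space-separated vectors.
cosine_similarity() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, " "); m = split(b, y, " ")
    if (n == 0 || n != m) exit 1          # vectors must have equal, non-zero length
    for (i = 1; i <= n; i++) {
      dot += x[i] * y[i]; na += x[i] * x[i]; nb += y[i] * y[i]
    }
    if (na == 0 || nb == 0) exit 1        # undefined for a zero vector
    printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}

# Hypothetical helper: succeed when similarity ($1) >= threshold ($2).
is_cache_hit() {
  awk -v s="$1" -v t="$2" 'BEGIN { exit !(s >= t) }'
}

sim="$(cosine_similarity "3 4" "4 3")"   # 24 / (5 * 5)
echo "$sim"                               # prints 0.9600
is_cache_hit "$sim" 0.85 && echo "cache hit"
```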
**Direct / hash mode** (no embedding provider needed):

```yaml
bifrost:
  plugins:
    semanticCache:
      enabled: true
      config:
        dimension: 1   # triggers hash-based exact matching
        ttl: "30m"
        cache_by_model: true
        cache_by_provider: true
```

<Note>
The vector store (`vectorStore.*`) must be configured and enabled for semantic mode. Direct/hash mode works without a vector store but still requires a storage backend.
</Note>

</Tab>

<Tab title="OpenTelemetry">

### OpenTelemetry (OTel)

Sends distributed traces and push-based metrics to any OTLP-compatible collector (Jaeger, Tempo, Honeycomb, etc.).

| Parameter | Description | Default |
|-----------|-------------|---------|
| `bifrost.plugins.otel.enabled` | Enable OTel tracing | `false` |
| `bifrost.plugins.otel.version` | Plugin config version for DB-backed update tracking (`1` to `32767`) | `1` |
| `bifrost.plugins.otel.config.service_name` | Service name in traces | `"bifrost"` |
| `bifrost.plugins.otel.config.collector_url` | OTLP collector endpoint | `""` |
| `bifrost.plugins.otel.config.trace_type` | Trace type (`genai_extension`, `vercel`, or `open_inference`) | `"genai_extension"` |
| `bifrost.plugins.otel.config.protocol` | Transport protocol (`grpc` or `http`) | `"grpc"` |
| `bifrost.plugins.otel.config.metrics_enabled` | Enable OTLP push-based metrics | `false` |
| `bifrost.plugins.otel.config.metrics_endpoint` | OTLP metrics endpoint | `""` |
| `bifrost.plugins.otel.config.metrics_push_interval` | Push interval in seconds | `15` |
| `bifrost.plugins.otel.config.headers` | Custom headers for the collector | `{}` |
| `bifrost.plugins.otel.config.insecure` | Skip TLS verification | `false` |
| `bifrost.plugins.otel.config.tls_ca_cert` | Path to CA cert for TLS | `""` |

```yaml
# otel-values.yaml
image:
  tag: "v1.4.11"

bifrost:
  plugins:
    otel:
      enabled: true
      config:
        service_name: "bifrost-production"
        collector_url: "otel-collector.observability.svc.cluster.local:4317"
        trace_type: "genai_extension"
        protocol: "grpc"
        insecure: true   # set false in production with a proper cert
        metrics_enabled: true
        metrics_endpoint: "otel-collector.observability.svc.cluster.local:4317"
        metrics_push_interval: 15
        headers:
          x-honeycomb-team: "env.HONEYCOMB_API_KEY"
```

```bash
helm upgrade bifrost bifrost/bifrost --reuse-values -f otel-values.yaml
```

**With authentication headers from a Kubernetes Secret:**

```bash
kubectl create secret generic otel-credentials \
  --from-literal=api-key='your-honeycomb-or-grafana-key'
```

```yaml
bifrost:
  plugins:
    otel:
      enabled: true
      config:
        collector_url: "api.honeycomb.io:443"
        protocol: "grpc"
        headers:
          x-honeycomb-team: "env.OTEL_API_KEY"

  providerSecrets:
    otel-key:
      existingSecret: "otel-credentials"
      key: "api-key"
      envVar: "OTEL_API_KEY"
```

</Tab>

<Tab title="Datadog">
|
||||
|
||||
### Datadog APM
|
||||
|
||||
Sends traces to a Datadog Agent running in the cluster.
|
||||
|
||||
| Parameter | Description | Default |
|
||||
|-----------|-------------|---------|
|
||||
| `bifrost.plugins.datadog.enabled` | Enable Datadog tracing | `false` |
|
||||
| `bifrost.plugins.datadog.version` | Plugin config version for DB-backed update tracking (`1` to `32767`) | `1` |
|
||||
| `bifrost.plugins.datadog.config.service_name` | Service name | `"bifrost"` |
|
||||
| `bifrost.plugins.datadog.config.agent_addr` | Datadog Agent address | `"localhost:8126"` |
|
||||
| `bifrost.plugins.datadog.config.env` | Deployment environment tag | `""` |
|
||||
| `bifrost.plugins.datadog.config.version` | Version tag | `""` |
|
||||
| `bifrost.plugins.datadog.config.enable_traces` | Enable trace collection | `true` |
|
||||
| `bifrost.plugins.datadog.config.custom_tags` | Extra tags on all spans | `{}` |
|
||||
|
||||
The Datadog Agent is typically deployed via the [Datadog Helm chart](https://docs.datadoghq.com/containers/kubernetes/installation/) as a DaemonSet, making it available at the node's hostIP.
|
||||
|
||||
```yaml
|
||||
# datadog-values.yaml
image:
  tag: "v1.4.11"

bifrost:
  plugins:
    datadog:
      enabled: true
      config:
        service_name: "bifrost"
        agent_addr: "$(HOST_IP):8126"   # uses Datadog DaemonSet pattern
        env: "production"
        version: "v1.4.11"
        enable_traces: true
        custom_tags:
          team: "platform"
          region: "us-east-1"

# Inject HOST_IP so Bifrost can reach the DaemonSet agent on the same node
env:
  - name: HOST_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
|
||||
```
|
||||
|
||||
```bash
|
||||
helm upgrade bifrost bifrost/bifrost --reuse-values -f datadog-values.yaml
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="Maxim">
|
||||
|
||||
### Maxim Observability
|
||||
|
||||
Sends LLM request/response data to [Maxim](https://getmaxim.ai) for tracing, evaluation, and observability.
|
||||
|
||||
| Parameter | Description | Default |
|
||||
|-----------|-------------|---------|
|
||||
| `bifrost.plugins.maxim.enabled` | Enable Maxim plugin | `false` |
|
||||
| `bifrost.plugins.maxim.version` | Plugin config version for DB-backed update tracking (`1` to `32767`) | `1` |
|
||||
| `bifrost.plugins.maxim.config.api_key` | Maxim API key (plain text, prefer secret) | `""` |
|
||||
| `bifrost.plugins.maxim.config.log_repo_id` | Maxim log repository ID | `""` |
|
||||
| `bifrost.plugins.maxim.secretRef.name` | Kubernetes Secret name for API key | `""` |
|
||||
| `bifrost.plugins.maxim.secretRef.key` | Key within the secret | `"api-key"` |
|
||||
|
||||
```bash
|
||||
kubectl create secret generic maxim-credentials \
|
||||
--from-literal=api-key='your-maxim-api-key'
|
||||
```
|
||||
|
||||
```yaml
|
||||
# maxim-values.yaml
|
||||
image:
|
||||
tag: "v1.4.11"
|
||||
|
||||
bifrost:
|
||||
plugins:
|
||||
maxim:
|
||||
enabled: true
|
||||
config:
|
||||
log_repo_id: "your-log-repo-id"
|
||||
secretRef:
|
||||
name: "maxim-credentials"
|
||||
key: "api-key"
|
||||
```
|
||||
|
||||
```bash
|
||||
helm upgrade bifrost bifrost/bifrost --reuse-values -f maxim-values.yaml
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="Custom Plugin">
|
||||
|
||||
### Custom / Dynamic Plugins
|
||||
|
||||
Load a custom Go plugin (compiled `.so` file) at runtime.
|
||||
|
||||
| Parameter | Description | Default |
|
||||
|-----------|-------------|---------|
|
||||
| `bifrost.plugins.custom[].name` | Unique plugin name | `""` |
|
||||
| `bifrost.plugins.custom[].enabled` | Enable custom plugin | `false` |
|
||||
| `bifrost.plugins.custom[].path` | Path to compiled `.so` file in the container | `""` |
|
||||
| `bifrost.plugins.custom[].version` | Plugin config version (`1` to `32767`) | `1` |
|
||||
| `bifrost.plugins.custom[].config` | Arbitrary plugin-specific configuration | `{}` |
|
||||
|
||||
```yaml
|
||||
bifrost:
|
||||
plugins:
|
||||
custom:
|
||||
- name: "my-custom-plugin"
|
||||
enabled: true
|
||||
path: "/plugins/my-plugin.so"
|
||||
version: 1
|
||||
config:
|
||||
api_endpoint: "https://my-service.example.com"
|
||||
timeout: 5000
|
||||
```
|
||||
|
||||
Mount the `.so` file via a volume:
|
||||
|
||||
```yaml
|
||||
volumes:
|
||||
- name: custom-plugins
|
||||
configMap:
|
||||
name: bifrost-custom-plugins
|
||||
|
||||
volumeMounts:
|
||||
- name: custom-plugins
|
||||
mountPath: /plugins
|
||||
```
|
||||
|
||||
Or use an init container to download the plugin binary:
|
||||
|
||||
```yaml
|
||||
initContainers:
  - name: download-plugin
    image: curlimages/curl:8.6.0
    command:
      - sh
      - -c
      - |
        curl -fsSL https://plugins.example.com/my-plugin.so \
          -o /plugins/my-plugin.so
    volumeMounts:
      - name: plugin-dir
        mountPath: /plugins

volumes:
  - name: plugin-dir
    emptyDir: {}

volumeMounts:
  - name: plugin-dir
    mountPath: /plugins
|
||||
```
|
||||
|
||||
```bash
|
||||
helm upgrade bifrost bifrost/bifrost --reuse-values -f custom-plugin-values.yaml
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## All Plugins Together
|
||||
|
||||
```yaml
|
||||
# all-plugins-values.yaml
|
||||
image:
|
||||
tag: "v1.4.11"
|
||||
|
||||
bifrost:
|
||||
encryptionKeySecret:
|
||||
name: "bifrost-encryption"
|
||||
key: "encryption-key"
|
||||
|
||||
plugins:
|
||||
telemetry:
|
||||
enabled: true
|
||||
config:
|
||||
custom_labels:
|
||||
- name: "environment"
|
||||
value: "production"
|
||||
|
||||
logging:
|
||||
enabled: true
|
||||
config:
|
||||
disable_content_logging: false
|
||||
logging_headers:
|
||||
- "x-request-id"
|
||||
|
||||
governance:
|
||||
enabled: true
|
||||
config:
|
||||
is_vk_mandatory: true
|
||||
|
||||
semanticCache:
|
||||
enabled: true
|
||||
config:
|
||||
provider: "openai"
|
||||
keys:
|
||||
- value: "env.CACHE_OPENAI_KEY"
|
||||
weight: 1
|
||||
embedding_model: "text-embedding-3-small"
|
||||
dimension: 1536
|
||||
threshold: 0.85
|
||||
ttl: "1h"
|
||||
|
||||
otel:
|
||||
enabled: true
|
||||
config:
|
||||
service_name: "bifrost"
|
||||
collector_url: "otel-collector.observability.svc.cluster.local:4317"
|
||||
protocol: "grpc"
|
||||
insecure: true
|
||||
```
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f all-plugins-values.yaml
|
||||
```
|
||||
941
docs/deployment-guides/helm/providers.mdx
Normal file
@@ -0,0 +1,941 @@
|
||||
---
|
||||
title: "Provider Setup"
|
||||
description: "Configure LLM providers in the Bifrost Helm chart — API keys, cloud-native auth, and self-hosted endpoints"
|
||||
icon: "plug"
|
||||
---
|
||||
|
||||
All providers are configured under `bifrost.providers` in your values file. Each provider entry contains a `keys` list where each key has a `name`, `value`, `weight`, and optional provider-specific config.
|
||||
|
||||
**Two ways to supply credentials:**
|
||||
|
||||
- **Direct value** — `value: "sk-..."` (fine for dev; avoid in production)
|
||||
- **Kubernetes Secret + env var** — store the key in a Secret, inject as an env var, and reference it with `value: "env.VAR_NAME"`
|
||||
|
||||
The `providerSecrets` block handles the Secret → env var injection automatically:
|
||||
|
||||
```yaml
|
||||
bifrost:
  providers:
    openai:
      keys:
        - name: "primary"
          value: "env.OPENAI_API_KEY"   # resolved at runtime
          weight: 1

  providerSecrets:
    openai:
      existingSecret: "my-openai-secret"
      key: "api-key"
      envVar: "OPENAI_API_KEY"          # injected into the pod
|
||||
```
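
As a rough illustration of the `env.` convention described above (not Bifrost's actual resolver), the lookup can be read as a prefix check followed by an environment read. The helper name `resolve_value` and the choice to raise on a missing variable are assumptions made for this sketch.

```python
import os

ENV_PREFIX = "env."


def resolve_value(value: str, environ=None) -> str:
    """Return the literal value, or the environment variable it points to.

    Hypothetical helper: mirrors the `value: "env.VAR_NAME"` convention used
    in the values file. Raising on a missing variable is an assumption.
    """
    environ = os.environ if environ is None else environ
    if not value.startswith(ENV_PREFIX):
        return value  # direct value, e.g. "sk-..."
    var_name = value[len(ENV_PREFIX):]
    if var_name not in environ:
        raise KeyError(f"environment variable {var_name} is not set")
    return environ[var_name]


# Example pod environment, as providerSecrets would populate it.
pod_env = {"OPENAI_API_KEY": "sk-from-secret"}
print(resolve_value("env.OPENAI_API_KEY", pod_env))
print(resolve_value("sk-inline-dev-key", pod_env))
```

The practical point is that the values file only ever carries the variable name, while the secret material stays in the Kubernetes Secret.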
|
||||
|
||||
---
|
||||
|
||||
<Tabs>
|
||||
|
||||
<Tab title="OpenAI">
|
||||
|
||||
### OpenAI
|
||||
|
||||
Supports multiple keys with weighted load balancing. The key with `use_for_batch_api: true` is eligible for the Batch API.
|
||||
|
||||
**Step 1 — Create secret**
|
||||
|
||||
```bash
|
||||
kubectl create secret generic openai-credentials \
|
||||
--from-literal=api-key-1='sk-your-primary-key' \
|
||||
--from-literal=api-key-2='sk-your-secondary-key' \
|
||||
--from-literal=api-key-batch='sk-your-batch-key'
|
||||
```
|
||||
|
||||
**Step 2 — Values file**
|
||||
|
||||
```yaml
|
||||
# openai-values.yaml
image:
  tag: "v1.4.11"

bifrost:
  providers:
    openai:
      keys:
        - name: "openai-primary"
          value: "env.OPENAI_KEY_1"
          weight: 2                  # 50% of traffic
          models: ["*"]
        - name: "openai-secondary"
          value: "env.OPENAI_KEY_2"
          weight: 1                  # 25%
          models: ["gpt-4o-mini"]    # restrict to cheaper model
        - name: "openai-batch"
          value: "env.OPENAI_KEY_BATCH"
          weight: 1                  # 25%
          models: ["*"]
          use_for_batch_api: true

  providerSecrets:
    openai-key-1:
      existingSecret: "openai-credentials"
      key: "api-key-1"
      envVar: "OPENAI_KEY_1"
    openai-key-2:
      existingSecret: "openai-credentials"
      key: "api-key-2"
      envVar: "OPENAI_KEY_2"
    openai-key-batch:
      existingSecret: "openai-credentials"
      key: "api-key-batch"
      envVar: "OPENAI_KEY_BATCH"
|
||||
```
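
To see where the 50% / 25% / 25% comments come from, the sketch below normalises key weights into traffic shares. It assumes that selection is proportional to `weight` among the keys whose `models` list matches the requested model; that matching rule and the helper name `traffic_share` are assumptions for illustration, not a description of Bifrost's internals.

```python
from fractions import Fraction


def traffic_share(keys, model):
    """Share of requests for `model` per key, proportional to weight.

    Assumption: a key is eligible when its models list contains "*" or the
    requested model name.
    """
    eligible = [k for k in keys if "*" in k["models"] or model in k["models"]]
    total = sum(k["weight"] for k in eligible)
    return {k["name"]: Fraction(k["weight"], total) for k in eligible}


keys = [
    {"name": "openai-primary", "weight": 2, "models": ["*"]},
    {"name": "openai-secondary", "weight": 1, "models": ["gpt-4o-mini"]},
    {"name": "openai-batch", "weight": 1, "models": ["*"]},
]

for model in ("gpt-4o-mini", "gpt-4o"):
    shares = traffic_share(keys, model)
    print(model, {name: f"{float(s):.0%}" for name, s in shares.items()})
```

With all three keys eligible (`gpt-4o-mini`), the shares are 2/4, 1/4, and 1/4. Under the matching assumption above, a `gpt-4o` request would only see the two wildcard keys, giving a 2:1 split instead.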
|
||||
|
||||
**Step 3 — Install**
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f openai-values.yaml
|
||||
```
|
||||
|
||||
**Optional — per-provider network config**
|
||||
|
||||
```yaml
|
||||
bifrost:
|
||||
providers:
|
||||
openai:
|
||||
keys:
|
||||
- name: "primary"
|
||||
value: "env.OPENAI_KEY_1"
|
||||
weight: 1
|
||||
network_config:
|
||||
default_request_timeout_in_seconds: 120
|
||||
max_retries: 3
|
||||
retry_backoff_initial_ms: 500
|
||||
retry_backoff_max_ms: 5000
|
||||
max_conns_per_host: 5000
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="Anthropic">
|
||||
|
||||
### Anthropic
|
||||
|
||||
```bash
|
||||
kubectl create secret generic anthropic-credentials \
|
||||
--from-literal=api-key-1='sk-ant-your-primary-key' \
|
||||
--from-literal=api-key-2='sk-ant-your-secondary-key'
|
||||
```
|
||||
|
||||
```yaml
|
||||
# anthropic-values.yaml
|
||||
image:
|
||||
tag: "v1.4.11"
|
||||
|
||||
bifrost:
|
||||
providers:
|
||||
anthropic:
|
||||
keys:
|
||||
- name: "anthropic-primary"
|
||||
value: "env.ANTHROPIC_KEY_1"
|
||||
weight: 1
|
||||
models: ["*"]
|
||||
- name: "anthropic-secondary"
|
||||
value: "env.ANTHROPIC_KEY_2"
|
||||
weight: 1
|
||||
models: ["*"]
|
||||
|
||||
providerSecrets:
|
||||
anthropic-key-1:
|
||||
existingSecret: "anthropic-credentials"
|
||||
key: "api-key-1"
|
||||
envVar: "ANTHROPIC_KEY_1"
|
||||
anthropic-key-2:
|
||||
existingSecret: "anthropic-credentials"
|
||||
key: "api-key-2"
|
||||
envVar: "ANTHROPIC_KEY_2"
|
||||
```
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f anthropic-values.yaml
|
||||
```
|
||||
|
||||
**Override Anthropic beta headers** (optional):
|
||||
|
||||
```yaml
|
||||
bifrost:
|
||||
providers:
|
||||
anthropic:
|
||||
keys:
|
||||
- name: "primary"
|
||||
value: "env.ANTHROPIC_KEY_1"
|
||||
weight: 1
|
||||
network_config:
|
||||
beta_header_overrides:
|
||||
redact-thinking-: true
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="Azure OpenAI">
|
||||
|
||||
### Azure OpenAI
|
||||
|
||||
Azure requires `azure_key_config` on every key with `endpoint` and `api_version`. Use top-level `aliases` to map logical model names to Azure deployment names.
|
||||
|
||||
Two auth modes are supported:
|
||||
|
||||
<Tabs>
|
||||
<Tab title="API Key">
|
||||
|
||||
**Step 1 — Create secret**
|
||||
|
||||
```bash
|
||||
kubectl create secret generic azure-credentials \
|
||||
--from-literal=api-key='your-azure-openai-api-key' \
|
||||
--from-literal=endpoint='https://your-resource.openai.azure.com'
|
||||
```
|
||||
|
||||
**Step 2 — Values file**
|
||||
|
||||
```yaml
|
||||
# azure-apikey-values.yaml
|
||||
image:
|
||||
tag: "v1.4.11"
|
||||
|
||||
bifrost:
|
||||
providers:
|
||||
azure:
|
||||
keys:
|
||||
- name: "azure-primary"
|
||||
value: "env.AZURE_API_KEY"
|
||||
weight: 1
|
||||
models: ["gpt-4o", "gpt-4o-mini", "text-embedding-3-small"]
|
||||
azure_key_config:
|
||||
endpoint: "env.AZURE_ENDPOINT"
|
||||
api_version: "2024-10-21"
|
||||
aliases:
|
||||
gpt-4o: "gpt-4o-prod"
|
||||
gpt-4o-mini: "gpt-4o-mini-prod"
|
||||
text-embedding-3-small: "embeddings-prod"
|
||||
|
||||
providerSecrets:
|
||||
azure-api-key:
|
||||
existingSecret: "azure-credentials"
|
||||
key: "api-key"
|
||||
envVar: "AZURE_API_KEY"
|
||||
azure-endpoint:
|
||||
existingSecret: "azure-credentials"
|
||||
key: "endpoint"
|
||||
envVar: "AZURE_ENDPOINT"
|
||||
```
|
||||
|
||||
**Step 3 — Install**
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f azure-apikey-values.yaml
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="Managed Identity / Workload Identity">
|
||||
|
||||
When `value` is empty, Bifrost uses `DefaultAzureCredential` — which automatically resolves credentials from:
|
||||
- AKS Workload Identity (recommended for production)
|
||||
- Azure VM managed identity
|
||||
- `az login` (developer machines)
|
||||
|
||||
**Step 1 — Annotate the service account** (AKS Workload Identity)
|
||||
|
||||
```bash
|
||||
# Associate the Kubernetes service account with your Azure managed identity
|
||||
kubectl annotate serviceaccount bifrost \
|
||||
azure.workload.identity/client-id="<MANAGED_IDENTITY_CLIENT_ID>"
|
||||
```
|
||||
|
||||
```yaml
|
||||
serviceAccount:
|
||||
annotations:
|
||||
azure.workload.identity/client-id: "<MANAGED_IDENTITY_CLIENT_ID>"
|
||||
```
|
||||
|
||||
**Step 2 — Values file**
|
||||
|
||||
```bash
|
||||
kubectl create secret generic azure-config \
|
||||
--from-literal=endpoint='https://your-resource.openai.azure.com'
|
||||
```
|
||||
|
||||
```yaml
|
||||
# azure-msi-values.yaml
|
||||
image:
|
||||
tag: "v1.4.11"
|
||||
|
||||
serviceAccount:
|
||||
annotations:
|
||||
azure.workload.identity/client-id: "<MANAGED_IDENTITY_CLIENT_ID>"
|
||||
|
||||
bifrost:
|
||||
providers:
|
||||
azure:
|
||||
keys:
|
||||
- name: "azure-workload-identity"
|
||||
value: "" # empty = DefaultAzureCredential
|
||||
weight: 1
|
||||
models: ["gpt-4o"]
|
||||
azure_key_config:
|
||||
endpoint: "env.AZURE_ENDPOINT"
|
||||
api_version: "2024-10-21"
|
||||
aliases:
|
||||
gpt-4o: "gpt-4o-prod"
|
||||
|
||||
providerSecrets:
|
||||
azure-endpoint:
|
||||
existingSecret: "azure-config"
|
||||
key: "endpoint"
|
||||
envVar: "AZURE_ENDPOINT"
|
||||
```
|
||||
|
||||
**Step 3 — Install**
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f azure-msi-values.yaml
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
**Multi-region failover** (two deployments, different regions):
|
||||
|
||||
```yaml
|
||||
bifrost:
|
||||
providers:
|
||||
azure:
|
||||
keys:
|
||||
- name: "eastus"
|
||||
value: "env.AZURE_KEY_EAST"
|
||||
weight: 1
|
||||
azure_key_config:
|
||||
endpoint: "env.AZURE_ENDPOINT_EAST"
|
||||
api_version: "2024-10-21"
|
||||
aliases:
|
||||
gpt-4o: "gpt-4o-eastus"
|
||||
- name: "westus"
|
||||
value: "env.AZURE_KEY_WEST"
|
||||
weight: 1
|
||||
azure_key_config:
|
||||
endpoint: "env.AZURE_ENDPOINT_WEST"
|
||||
api_version: "2024-10-21"
|
||||
aliases:
|
||||
gpt-4o: "gpt-4o-westus"
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="AWS Bedrock">
|
||||
|
||||
### AWS Bedrock
|
||||
|
||||
Bedrock requires `bedrock_key_config` with at minimum a `region`. Three auth modes:
|
||||
|
||||
<Tabs>
|
||||
<Tab title="Static Credentials">
|
||||
|
||||
```bash
|
||||
kubectl create secret generic aws-credentials \
|
||||
--from-literal=access-key-id='AKIAIOSFODNN7EXAMPLE' \
|
||||
--from-literal=secret-access-key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
|
||||
```
|
||||
|
||||
```yaml
|
||||
# bedrock-static-values.yaml
|
||||
image:
|
||||
tag: "v1.4.11"
|
||||
|
||||
bifrost:
|
||||
providers:
|
||||
bedrock:
|
||||
keys:
|
||||
- name: "bedrock-static"
|
||||
value: ""
|
||||
weight: 1
|
||||
models: ["*"]
|
||||
bedrock_key_config:
|
||||
region: "us-east-1"
|
||||
access_key: "env.AWS_ACCESS_KEY_ID"
|
||||
secret_key: "env.AWS_SECRET_ACCESS_KEY"
|
||||
deployments:
|
||||
# Logical name -> Bedrock inference profile
|
||||
anthropic.claude-3-5-sonnet: "us.anthropic.claude-3-5-sonnet-20240620-v1:0"
|
||||
|
||||
providerSecrets:
|
||||
aws-access-key:
|
||||
existingSecret: "aws-credentials"
|
||||
key: "access-key-id"
|
||||
envVar: "AWS_ACCESS_KEY_ID"
|
||||
aws-secret-key:
|
||||
existingSecret: "aws-credentials"
|
||||
key: "secret-access-key"
|
||||
envVar: "AWS_SECRET_ACCESS_KEY"
|
||||
```
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f bedrock-static-values.yaml
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="IRSA / EKS Pod Identity">
|
||||
|
||||
When only `region` is set, Bifrost inherits credentials from the AWS SDK default chain — IRSA (IAM Roles for Service Accounts), EC2 instance profile, or `AWS_*` env vars.
|
||||
|
||||
**Step 1 — Annotate the service account with the IAM role**
|
||||
|
||||
```bash
|
||||
kubectl annotate serviceaccount bifrost \
|
||||
eks.amazonaws.com/role-arn="arn:aws:iam::123456789012:role/BifrostBedrockRole"
|
||||
```
|
||||
|
||||
```yaml
|
||||
serviceAccount:
|
||||
annotations:
|
||||
eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/BifrostBedrockRole"
|
||||
```
|
||||
|
||||
**Step 2 — Values file**
|
||||
|
||||
```yaml
|
||||
# bedrock-irsa-values.yaml
image:
  tag: "v1.4.11"

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/BifrostBedrockRole"

bifrost:
  providers:
    bedrock:
      keys:
        - name: "bedrock-irsa"
          value: ""
          weight: 1
          models: ["*"]
          bedrock_key_config:
            region: "us-east-1"
            # No access_key / secret_key: the SDK uses the IRSA token automatically
|
||||
```
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f bedrock-irsa-values.yaml
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="STS AssumeRole">
|
||||
|
||||
Assumes a cross-account role on top of the default credential chain.
|
||||
|
||||
```yaml
|
||||
# bedrock-assumerole-values.yaml
|
||||
image:
|
||||
tag: "v1.4.11"
|
||||
|
||||
bifrost:
|
||||
providers:
|
||||
bedrock:
|
||||
keys:
|
||||
- name: "bedrock-assumerole"
|
||||
value: ""
|
||||
weight: 1
|
||||
models: ["*"]
|
||||
bedrock_key_config:
|
||||
region: "us-west-2"
|
||||
# Source identity from pod's default chain, then assume this role
|
||||
role_arn: "env.AWS_ROLE_ARN"
|
||||
external_id: "env.AWS_EXTERNAL_ID"
|
||||
session_name: "bifrost-session"
|
||||
```
|
||||
|
||||
```bash
|
||||
kubectl create secret generic aws-role-config \
|
||||
--from-literal=role-arn='arn:aws:iam::999999999999:role/CrossAccountBedrockRole' \
|
||||
--from-literal=external-id='your-external-id'
|
||||
```
|
||||
|
||||
```yaml
|
||||
providerSecrets:
|
||||
aws-role-arn:
|
||||
existingSecret: "aws-role-config"
|
||||
key: "role-arn"
|
||||
envVar: "AWS_ROLE_ARN"
|
||||
aws-external-id:
|
||||
existingSecret: "aws-role-config"
|
||||
key: "external-id"
|
||||
envVar: "AWS_EXTERNAL_ID"
|
||||
```
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f bedrock-assumerole-values.yaml
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
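
The three modes above differ only in which `bedrock_key_config` fields are present. The sketch below classifies a config dict accordingly. The function name is hypothetical, and treating static credentials plus `role_arn` as a valid combination is an assumption; the text above only describes AssumeRole on top of the default chain.

```python
def describe_bedrock_auth(cfg: dict) -> str:
    """Name the credential path implied by a bedrock_key_config dict."""
    if not cfg.get("region"):
        raise ValueError("bedrock_key_config requires a region")

    # Static keys win over the SDK default chain when both halves are present.
    if cfg.get("access_key") and cfg.get("secret_key"):
        source = "static credentials"
    else:
        source = "default credential chain"

    # role_arn adds an STS AssumeRole hop on top of the source identity.
    if cfg.get("role_arn"):
        return f"{source} + STS AssumeRole"
    return source


print(describe_bedrock_auth({"region": "us-east-1"}))
print(describe_bedrock_auth({
    "region": "us-west-2",
    "role_arn": "env.AWS_ROLE_ARN",
    "external_id": "env.AWS_EXTERNAL_ID",
}))
```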
|
||||
|
||||
**Batch API — S3 configuration**
|
||||
|
||||
```yaml
|
||||
bedrock_key_config:
|
||||
region: "us-east-1"
|
||||
access_key: "env.AWS_ACCESS_KEY_ID"
|
||||
secret_key: "env.AWS_SECRET_ACCESS_KEY"
|
||||
batch_s3_config:
|
||||
buckets:
|
||||
- bucket_name: "my-bedrock-batch-bucket"
|
||||
prefix: "batch/"
|
||||
is_default: true
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="Google Vertex AI">
|
||||
|
||||
### Google Vertex AI
|
||||
|
||||
Vertex requires `vertex_key_config` with `project_id` and `region`. Two auth modes:
|
||||
|
||||
<Tabs>
|
||||
<Tab title="Service Account Key">
|
||||
|
||||
```bash
|
||||
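# Base64-encode the service account JSON
# (-w 0 is a GNU coreutils flag; on macOS use: base64 -i service-account-key.json)
SA_JSON=$(base64 -w 0 < service-account-key.json)

# The values file below reads both keys from this secret
kubectl create secret generic gcp-credentials \
  --from-literal=project-id='my-gcp-project' \
  --from-literal=service-account-json="${SA_JSON}"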
|
||||
```
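
Before creating the secret, it can help to confirm that the encoded value decodes back to valid JSON. The sketch below does that round trip in Python; the helper names and the placeholder key contents are assumptions for illustration only.

```python
import base64
import json


def encode_service_account(raw_json: str) -> str:
    """Base64-encode a service account JSON document on a single line."""
    json.loads(raw_json)  # fail early on malformed JSON
    return base64.b64encode(raw_json.encode("utf-8")).decode("ascii")


def decode_service_account(encoded: str) -> dict:
    """Decode the value back into a dict, rejecting non-base64 input."""
    return json.loads(base64.b64decode(encoded, validate=True))


# Placeholder document; a real key file has more fields.
raw = json.dumps({"type": "service_account", "project_id": "my-gcp-project"})
encoded = encode_service_account(raw)
print(decode_service_account(encoded)["project_id"])
```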
|
||||
|
||||
```yaml
|
||||
# vertex-sa-values.yaml
|
||||
image:
|
||||
tag: "v1.4.11"
|
||||
|
||||
bifrost:
|
||||
providers:
|
||||
vertex:
|
||||
keys:
|
||||
- name: "vertex-sa-key"
|
||||
value: ""
|
||||
weight: 1
|
||||
models: ["*"]
|
||||
vertex_key_config:
|
||||
project_id: "env.VERTEX_PROJECT_ID"
|
||||
region: "us-central1"
|
||||
auth_credentials: "env.VERTEX_AUTH_CREDENTIALS"
|
||||
|
||||
providerSecrets:
|
||||
vertex-project-id:
|
||||
existingSecret: "gcp-credentials"
|
||||
key: "project-id"
|
||||
envVar: "VERTEX_PROJECT_ID"
|
||||
vertex-sa:
|
||||
existingSecret: "gcp-credentials"
|
||||
key: "service-account-json"
|
||||
envVar: "VERTEX_AUTH_CREDENTIALS"
|
||||
```
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f vertex-sa-values.yaml
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="GKE Workload Identity / ADC">
|
||||
|
||||
When `auth_credentials` is omitted, Bifrost calls `google.FindDefaultCredentials` — which resolves to:
|
||||
- GKE Workload Identity (recommended)
|
||||
- GCE metadata server (on Compute Engine / Cloud Run)
|
||||
- `GOOGLE_APPLICATION_CREDENTIALS` path
|
||||
- `gcloud auth application-default login` (developer machines)
|
||||
|
||||
**Step 1 — Annotate the service account** (GKE Workload Identity)
|
||||
|
||||
```bash
|
||||
gcloud iam service-accounts add-iam-policy-binding \
|
||||
bifrost-sa@my-project.iam.gserviceaccount.com \
|
||||
--role roles/iam.workloadIdentityUser \
|
||||
--member "serviceAccount:my-project.svc.id.goog[default/bifrost]"
|
||||
```
|
||||
|
||||
```yaml
|
||||
serviceAccount:
|
||||
annotations:
|
||||
iam.gke.io/gcp-service-account: "bifrost-sa@my-project.iam.gserviceaccount.com"
|
||||
```
|
||||
|
||||
**Step 2 — Values file**
|
||||
|
||||
```yaml
|
||||
# vertex-wli-values.yaml
|
||||
image:
|
||||
tag: "v1.4.11"
|
||||
|
||||
serviceAccount:
|
||||
annotations:
|
||||
iam.gke.io/gcp-service-account: "bifrost-sa@my-project.iam.gserviceaccount.com"
|
||||
|
||||
bifrost:
|
||||
providers:
|
||||
vertex:
|
||||
keys:
|
||||
- name: "vertex-workload-identity"
|
||||
value: ""
|
||||
weight: 1
|
||||
models: ["*"]
|
||||
vertex_key_config:
|
||||
project_id: "my-gcp-project"
|
||||
region: "us-central1"
|
||||
# auth_credentials intentionally omitted → ADC lookup
|
||||
```
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f vertex-wli-values.yaml
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="Groq / Mistral / Gemini / Others">
|
||||
|
||||
### Standard API-Key Providers
|
||||
|
||||
These providers follow the same simple pattern — one or more keys with weights.
|
||||
|
||||
<Tabs>
|
||||
<Tab title="Groq">
|
||||
|
||||
```bash
|
||||
kubectl create secret generic groq-credentials \
|
||||
--from-literal=api-key='gsk_your_groq_api_key'
|
||||
```
|
||||
|
||||
```yaml
|
||||
bifrost:
|
||||
providers:
|
||||
groq:
|
||||
keys:
|
||||
- name: "groq-primary"
|
||||
value: "env.GROQ_API_KEY"
|
||||
weight: 1
|
||||
models: ["*"]
|
||||
|
||||
providerSecrets:
|
||||
groq-key:
|
||||
existingSecret: "groq-credentials"
|
||||
key: "api-key"
|
||||
envVar: "GROQ_API_KEY"
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="Gemini">
|
||||
|
||||
```bash
|
||||
kubectl create secret generic gemini-credentials \
|
||||
--from-literal=api-key='your-gemini-api-key'
|
||||
```
|
||||
|
||||
```yaml
|
||||
bifrost:
|
||||
providers:
|
||||
gemini:
|
||||
keys:
|
||||
- name: "gemini-main"
|
||||
value: "env.GEMINI_API_KEY"
|
||||
weight: 1
|
||||
models: ["*"]
|
||||
|
||||
providerSecrets:
|
||||
gemini-key:
|
||||
existingSecret: "gemini-credentials"
|
||||
key: "api-key"
|
||||
envVar: "GEMINI_API_KEY"
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="Mistral">
|
||||
|
||||
```bash
|
||||
kubectl create secret generic mistral-credentials \
|
||||
--from-literal=api-key='your-mistral-api-key'
|
||||
```
|
||||
|
||||
```yaml
|
||||
bifrost:
|
||||
providers:
|
||||
mistral:
|
||||
keys:
|
||||
- name: "mistral-main"
|
||||
value: "env.MISTRAL_API_KEY"
|
||||
weight: 1
|
||||
models: ["*"]
|
||||
|
||||
providerSecrets:
|
||||
mistral-key:
|
||||
existingSecret: "mistral-credentials"
|
||||
key: "api-key"
|
||||
envVar: "MISTRAL_API_KEY"
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="Cohere / Perplexity / xAI / Others">
|
||||
|
||||
All standard API-key providers follow the same pattern. Replace the provider name and env var name accordingly:
|
||||
|
||||
```yaml
|
||||
bifrost:
|
||||
providers:
|
||||
cohere:
|
||||
keys:
|
||||
- name: "cohere-main"
|
||||
value: "env.COHERE_API_KEY"
|
||||
weight: 1
|
||||
perplexity:
|
||||
keys:
|
||||
- name: "perplexity-main"
|
||||
value: "env.PERPLEXITY_API_KEY"
|
||||
weight: 1
|
||||
xai:
|
||||
keys:
|
||||
- name: "xai-main"
|
||||
value: "env.XAI_API_KEY"
|
||||
weight: 1
|
||||
cerebras:
|
||||
keys:
|
||||
- name: "cerebras-main"
|
||||
value: "env.CEREBRAS_API_KEY"
|
||||
weight: 1
|
||||
openrouter:
|
||||
keys:
|
||||
- name: "openrouter-main"
|
||||
value: "env.OPENROUTER_API_KEY"
|
||||
weight: 1
|
||||
nebius:
|
||||
keys:
|
||||
- name: "nebius-main"
|
||||
value: "env.NEBIUS_API_KEY"
|
||||
weight: 1
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
**Install command (any of the above)**
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost \
|
||||
--set image.tag=v1.4.11 \
|
||||
-f provider-values.yaml
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="Self-Hosted">
|
||||
|
||||
### Self-Hosted Providers
|
||||
|
||||
Self-hosted providers point to a URL you operate. No API key is typically required (`value: ""`).
|
||||
|
||||
<Tabs>
|
||||
<Tab title="Ollama">
|
||||
|
||||
```yaml
|
||||
# ollama-values.yaml
|
||||
image:
|
||||
tag: "v1.4.11"
|
||||
|
||||
bifrost:
|
||||
providers:
|
||||
ollama:
|
||||
keys:
|
||||
- name: "ollama-local"
|
||||
value: ""
|
||||
weight: 1
|
||||
models: ["*"]
|
||||
ollama_key_config:
|
||||
url: "http://ollama.default.svc.cluster.local:11434"
|
||||
```
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f ollama-values.yaml
|
||||
```
|
||||
|
||||
Using an env var for the URL (useful across environments):
|
||||
|
||||
```bash
|
||||
kubectl create secret generic ollama-config \
|
||||
--from-literal=url='http://ollama.default.svc.cluster.local:11434'
|
||||
```
|
||||
|
||||
```yaml
|
||||
ollama_key_config:
|
||||
url: "env.OLLAMA_URL"
|
||||
|
||||
providerSecrets:
|
||||
ollama-url:
|
||||
existingSecret: "ollama-config"
|
||||
key: "url"
|
||||
envVar: "OLLAMA_URL"
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="vLLM">
|
||||
|
||||
vLLM instances are model-specific — one key per served model.
|
||||
|
||||
```yaml
|
||||
# vllm-values.yaml
|
||||
image:
|
||||
tag: "v1.4.11"
|
||||
|
||||
bifrost:
|
||||
providers:
|
||||
vllm:
|
||||
keys:
|
||||
- name: "vllm-llama3-70b"
|
||||
value: ""
|
||||
weight: 1
|
||||
models: ["llama-3-70b"]
|
||||
vllm_key_config:
|
||||
url: "http://vllm.default.svc.cluster.local:8000"
|
||||
model_name: "meta-llama/Meta-Llama-3-70B-Instruct"
|
||||
- name: "vllm-mistral"
|
||||
value: ""
|
||||
weight: 1
|
||||
models: ["mistral-7b"]
|
||||
vllm_key_config:
|
||||
url: "http://vllm-mistral.default.svc.cluster.local:8000"
|
||||
model_name: "mistralai/Mistral-7B-Instruct-v0.3"
|
||||
```
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f vllm-values.yaml
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="SGLang">
|
||||
|
||||
```yaml
|
||||
# sgl-values.yaml
|
||||
image:
|
||||
tag: "v1.4.11"
|
||||
|
||||
bifrost:
|
||||
providers:
|
||||
sgl:
|
||||
keys:
|
||||
- name: "sgl-main"
|
||||
value: ""
|
||||
weight: 1
|
||||
models: ["*"]
|
||||
sgl_key_config:
|
||||
url: "http://sgl-router.default.svc.cluster.local:30000"
|
||||
```
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f sgl-values.yaml
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="HuggingFace / Replicate">
|
||||
|
||||
These providers use `aliases` to map logical model names to provider-specific IDs.
|
||||
|
||||
```yaml
|
||||
bifrost:
|
||||
providers:
|
||||
huggingface:
|
||||
keys:
|
||||
- name: "hf-main"
|
||||
value: "env.HF_API_KEY"
|
||||
weight: 1
|
||||
models: ["llama-3", "mixtral"]
|
||||
aliases:
|
||||
llama-3: "meta-llama/Meta-Llama-3-8B-Instruct"
|
||||
mixtral: "mistralai/Mixtral-8x7B-Instruct-v0.1"
|
||||
|
||||
replicate:
|
||||
keys:
|
||||
- name: "replicate-main"
|
||||
value: "env.REPLICATE_API_KEY"
|
||||
weight: 1
|
||||
models: ["llama-3"]
|
||||
aliases:
|
||||
llama-3: "meta/meta-llama-3-70b-instruct"
|
||||
replicate_key_config:
|
||||
use_deployments_endpoint: false
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
</Tab>
|
||||
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Multi-Provider Example
|
||||
|
||||
Combine providers in a single values file:
|
||||
|
||||
```yaml
|
||||
# multi-provider-values.yaml
image:
  tag: "v1.4.11"

bifrost:
  providers:
    openai:
      keys:
        - name: "openai-primary"
          value: "env.OPENAI_API_KEY"
          weight: 2
          models: ["*"]
    anthropic:
      keys:
        - name: "anthropic-primary"
          value: "env.ANTHROPIC_API_KEY"
          weight: 1
          models: ["*"]
    groq:
      keys:
        - name: "groq-primary"
          value: "env.GROQ_API_KEY"
          weight: 1
          models: ["*"]

  providerSecrets:
    openai-key:
      existingSecret: "provider-keys"
      key: "openai"
      envVar: "OPENAI_API_KEY"
    anthropic-key:
      existingSecret: "provider-keys"
      key: "anthropic"
      envVar: "ANTHROPIC_API_KEY"
    groq-key:
      existingSecret: "provider-keys"
      key: "groq"
      envVar: "GROQ_API_KEY"

  plugins:
    logging:
      enabled: true
    governance:
      enabled: true
|
||||
```
|
||||
|
||||
```bash
|
||||
# Create a single secret with all provider keys
|
||||
kubectl create secret generic provider-keys \
|
||||
--from-literal=openai='sk-your-openai-key' \
|
||||
--from-literal=anthropic='sk-ant-your-anthropic-key' \
|
||||
--from-literal=groq='gsk_your-groq-key'
|
||||
|
||||
helm install bifrost bifrost/bifrost -f multi-provider-values.yaml
|
||||
```
|
||||
550
docs/deployment-guides/helm/storage.mdx
Normal file
@@ -0,0 +1,550 @@
|
||||
---
|
||||
title: "Storage"
|
||||
description: "Configure Bifrost storage backends in Helm — SQLite, PostgreSQL (embedded and external), per-store overrides, and S3/GCS object storage for logs"
|
||||
icon: "database"
|
||||
---
|
||||
|
||||
Bifrost persists two types of data — **config** (providers, virtual keys, governance rules) and **logs** (request/response records). Each has its own store, both defaulting to the top-level `storage.mode`.
|
||||
|
||||
| Parameter | Description | Default |
|-----------|-------------|---------|
| `storage.mode` | Default backend for both stores (`sqlite` or `postgres`) | `sqlite` |
| `storage.configStore.type` | Override backend for the config store | `""` (inherits `storage.mode`) |
| `storage.logsStore.type` | Override backend for the logs store | `""` (inherits `storage.mode`) |
|
||||
|
||||
<Note>
|
||||
When any store uses SQLite, the chart deploys a **StatefulSet** with a PVC. When both stores use PostgreSQL, it deploys a **Deployment**. Mixing backends (e.g. config=postgres, logs=sqlite) still requires a StatefulSet.
|
||||
</Note>
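
The rule in the note above can be sketched as a small shell function. This is only an illustration of the decision logic as described on this page; `workload_kind` is a hypothetical helper, not something the chart ships, and the chart's templates remain the source of truth.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the chart): predict the workload kind
# from storage.mode and the two optional per-store overrides.
workload_kind() {
  local mode="$1"
  local config_type="${2:-$mode}"   # empty override inherits storage.mode
  local logs_type="${3:-$mode}"     # empty override inherits storage.mode
  if [ "$config_type" = "sqlite" ] || [ "$logs_type" = "sqlite" ]; then
    echo "StatefulSet"              # any SQLite store needs a PVC
  else
    echo "Deployment"               # PostgreSQL only
  fi
}

workload_kind sqlite                   # both stores on SQLite
workload_kind postgres                 # both stores on PostgreSQL
workload_kind sqlite postgres sqlite   # mixed backends
```

The three calls print `StatefulSet`, `Deployment`, and `StatefulSet` respectively, matching the three cases the note describes.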
|
||||
|
||||
---
|
||||
|
||||
<Tabs>
|
||||
|
||||
<Tab title="SQLite">
|
||||
|
||||
### SQLite (Default)
|
||||
|
||||
Simplest setup — no external database required. Bifrost runs as a StatefulSet with a persistent volume for the SQLite files.
|
||||
|
||||
| Parameter | Description | Default |
|-----------|-------------|---------|
| `storage.persistence.enabled` | Create a PVC for SQLite data | `true` |
| `storage.persistence.size` | PVC size | `10Gi` |
| `storage.persistence.accessMode` | PVC access mode | `ReadWriteOnce` |
| `storage.persistence.storageClass` | Storage class (leave empty for cluster default) | `""` |
| `storage.persistence.existingClaim` | Reuse an existing PVC | `""` |
|
||||
|
||||
```yaml
# sqlite-values.yaml
image:
  tag: "v1.4.11"

storage:
  mode: sqlite
  persistence:
    enabled: true
    size: 20Gi
    # storageClass: "gp3"   # uncomment to pin storage class

bifrost:
  encryptionKey: "your-32-byte-encryption-key-here"
```
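
The `encryptionKey` placeholder above should be replaced with a random value. A minimal sketch, assuming a 32-character hex string is acceptable (as the placeholder's length suggests); `gen_encryption_key` is a hypothetical helper, and the exact key requirements should be checked against the Bifrost documentation.

```bash
#!/usr/bin/env bash
# Sketch: derive a 32-character hex string from 16 random bytes.
# Assumption: a 32-character value is what the placeholder stands for.
gen_encryption_key() {
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

key="$(gen_encryption_key)"
printf 'generated key length: %s\n' "${#key}"
```

The value can then be supplied at install time, for example with `--set bifrost.encryptionKey="$key"`, rather than committed to a values file.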
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f sqlite-values.yaml
|
||||
```
|
||||
|
||||
**Reuse an existing PVC** (e.g. after a StatefulSet migration):
|
||||
|
||||
```yaml
storage:
  persistence:
    existingClaim: "bifrost-data"
```
|
||||
|
||||
<Warning>
|
||||
Switching from SQLite to PostgreSQL requires a data migration because the two stores are not compatible. Plan the migration before changing `storage.mode` on a running deployment.
|
||||
</Warning>
|
||||
|
||||
#### StatefulSet Migration (chart v2.0.0+)
|
||||
|
||||
Prior to v2.0.0, SQLite used a Deployment + manual PVC. v2.0.0 moved SQLite to a StatefulSet. If upgrading from an older chart:
|
||||
|
||||
```bash
|
||||
# 1. Scale down the old deployment
|
||||
kubectl scale deployment bifrost --replicas=0
|
||||
|
||||
# 2. Note the existing PVC name
|
||||
kubectl get pvc
|
||||
|
||||
# 3. Upgrade the chart, pointing at the existing claim
|
||||
helm upgrade bifrost bifrost/bifrost \
|
||||
--reuse-values \
|
||||
--set storage.persistence.existingClaim=<your-old-pvc-name> \
|
||||
--set image.tag=v1.4.11
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="Embedded PostgreSQL">
|
||||
|
||||
### Embedded PostgreSQL
|
||||
|
||||
The chart can deploy a PostgreSQL instance alongside Bifrost. Good for simple production setups where you don't have an existing database.
|
||||
|
||||
| Parameter | Description | Default |
|-----------|-------------|---------|
| `storage.mode` | Set to `postgres` | `sqlite` |
| `postgresql.enabled` | Deploy PostgreSQL as a sub-deployment | `false` |
| `postgresql.auth.username` | Database user | `bifrost` |
| `postgresql.auth.password` | Database password | `bifrost_password` |
| `postgresql.auth.database` | Database name | `bifrost` |
| `postgresql.primary.persistence.size` | PVC size for PostgreSQL data | `8Gi` |
|
||||
|
||||
<Note>
|
||||
Ensure the database is created with **UTF8 encoding**. The embedded PostgreSQL deployment handles this automatically. See [PostgreSQL UTF8 Requirement](/quickstart/gateway/setting-up#postgresql-utf8-requirement) for manual setups.
|
||||
</Note>
|
||||
|
||||
```bash
|
||||
kubectl create secret generic postgres-credentials \
|
||||
--from-literal=password='your-secure-postgres-password'
|
||||
```
|
||||
|
||||
```yaml
# embedded-postgres-values.yaml
image:
  tag: "v1.4.11"

storage:
  mode: postgres

postgresql:
  enabled: true
  auth:
    username: bifrost
    password: "your-secure-postgres-password"   # use existingSecret in production
    database: bifrost
  primary:
    persistence:
      enabled: true
      size: 50Gi
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: 2000m
        memory: 4Gi

bifrost:
  encryptionKey: "your-32-byte-encryption-key-here"
```
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f embedded-postgres-values.yaml
|
||||
```
|
||||
|
||||
**Verify the connection from Bifrost:**
|
||||
|
||||
```bash
|
||||
kubectl exec -it deployment/bifrost -- nc -zv bifrost-postgresql 5432
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="External PostgreSQL">
|
||||
|
||||
### External PostgreSQL
|
||||
|
||||
Point Bifrost at an existing PostgreSQL instance — RDS, Cloud SQL, Azure Database, or self-managed.
|
||||
|
||||
| Parameter | Description | Default |
|-----------|-------------|---------|
| `postgresql.enabled` | Must be `false` | `false` |
| `postgresql.external.enabled` | Enable external connection | `false` |
| `postgresql.external.host` | Hostname or IP | `""` |
| `postgresql.external.port` | Port | `5432` |
| `postgresql.external.user` | Username | `bifrost` |
| `postgresql.external.database` | Database name | `bifrost` |
| `postgresql.external.sslMode` | SSL mode (`disable`, `require`, `verify-ca`, `verify-full`) | `disable` |
| `postgresql.external.existingSecret` | Secret name for the password | `""` |
| `postgresql.external.passwordKey` | Key within the secret | `"password"` |
|
||||
|
||||
```bash
|
||||
kubectl create secret generic external-postgres-credentials \
|
||||
--from-literal=password='your-external-postgres-password'
|
||||
```
|
||||
|
||||
```yaml
# external-postgres-values.yaml
image:
  tag: "v1.4.11"

storage:
  mode: postgres

postgresql:
  enabled: false
  external:
    enabled: true
    host: "your-rds-endpoint.us-east-1.rds.amazonaws.com"
    port: 5432
    user: bifrost
    database: bifrost
    sslMode: require
    existingSecret: "external-postgres-credentials"
    passwordKey: "password"

bifrost:
  encryptionKey: "your-32-byte-encryption-key-here"
```
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f external-postgres-values.yaml
|
||||
```
|
||||
|
||||
**Test connectivity before installing:**
|
||||
|
||||
```bash
|
||||
kubectl run pg-test --image=postgres:16-alpine --rm -it --restart=Never -- \
|
||||
psql "host=your-rds-endpoint.us-east-1.rds.amazonaws.com dbname=bifrost user=bifrost sslmode=require" \
|
||||
-c "SELECT version();"
|
||||
```
|
||||
|
||||
</Tab>
|
||||
|
||||
<Tab title="Mixed (Config=Postgres, Logs=SQLite)">
|
||||
|
||||
### Mixed Backend
|
||||
|
||||
Run the config store on PostgreSQL (fast lookups, shared across replicas) while keeping logs on SQLite (simpler, cheaper for append-heavy workloads).
|
||||
|
||||
```yaml
# mixed-values.yaml
image:
  tag: "v1.4.11"

storage:
  mode: sqlite            # default fallback
  configStore:
    type: postgres        # override: config uses postgres
  logsStore:
    type: sqlite          # explicit: logs use sqlite
  persistence:
    enabled: true
    size: 20Gi            # for the SQLite logs store

postgresql:
  external:
    enabled: true
    host: "your-postgres-host.example.com"
    port: 5432
    user: bifrost
    database: bifrost
    sslMode: require
    existingSecret: "postgres-credentials"
    passwordKey: "password"

bifrost:
  encryptionKey: "your-32-byte-encryption-key-here"
```
|
||||
|
||||
```bash
|
||||
kubectl create secret generic postgres-credentials \
|
||||
--from-literal=password='your-postgres-password'
|
||||
|
||||
helm install bifrost bifrost/bifrost -f mixed-values.yaml
|
||||
```
|
||||
|
||||
<Note>
|
||||
In mixed mode, Bifrost deploys a StatefulSet (because SQLite is in use) with both a PostgreSQL connection and a local PVC for the SQLite log store.
|
||||
</Note>
|
||||
|
||||
**PostgreSQL connection pool tuning** (high log volume):
|
||||
|
||||
```yaml
storage:
  configStore:
    type: postgres
    maxIdleConns: 5
    maxOpenConns: 50
  logsStore:
    type: postgres
    maxIdleConns: 10
    maxOpenConns: 100
```
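
When sizing these pools, it helps to compare the worst-case connection count against the server's `max_connections` setting, which defaults to 100 in stock PostgreSQL. A rough sketch of the arithmetic, assuming each replica keeps a separate pool per store; `total_max_conns` is a hypothetical helper, not a chart feature.

```bash
#!/usr/bin/env bash
# Hypothetical helper: worst-case PostgreSQL connections opened by Bifrost,
# assuming every replica holds one pool per store.
total_max_conns() {
  local replicas="$1" config_max_open="$2" logs_max_open="$3"
  echo $(( replicas * (config_max_open + logs_max_open) ))
}

total_max_conns 3 50 100   # 3 replicas with the pool sizes shown above
```

With the values above, three replicas could open up to 450 connections, so the database's `max_connections` (or a pooler in front of it) would need to allow for that.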
|
||||
|
||||
</Tab>
|
||||
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Object Storage for Logs
|
||||
|
||||
Offload large request/response payloads from the database to S3 or GCS. The DB retains only lightweight index records; payloads are fetched on demand.
|
||||
|
||||
<Tabs>
|
||||
<Tab title="AWS S3">
|
||||
|
||||
```bash
|
||||
kubectl create secret generic s3-credentials \
|
||||
--from-literal=access-key-id='AKIAIOSFODNN7EXAMPLE' \
|
||||
--from-literal=secret-access-key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
|
||||
```
|
||||
|
||||
```yaml
storage:
  logsStore:
    objectStorage:
      enabled: true
      type: s3
      bucket: "bifrost-logs"
      prefix: "bifrost"
      compress: true   # gzip compression

      # S3 configuration
      region: us-east-1
      accessKeyId: "env.S3_ACCESS_KEY_ID"
      secretAccessKey: "env.S3_SECRET_ACCESS_KEY"
      # endpoint: ""             # Custom endpoint for MinIO / Cloudflare R2
      # forcePathStyle: false    # Set true for MinIO

bifrost:
  # inject S3 credentials as env vars
  providerSecrets:
    s3-access-key:
      existingSecret: "s3-credentials"
      key: "access-key-id"
      envVar: "S3_ACCESS_KEY_ID"
    s3-secret-key:
      existingSecret: "s3-credentials"
      key: "secret-access-key"
      envVar: "S3_SECRET_ACCESS_KEY"
```
|
||||
|
||||
**Using IAM role (IRSA / instance profile) instead of static keys:**
|
||||
|
||||
```yaml
storage:
  logsStore:
    objectStorage:
      enabled: true
      type: s3
      bucket: "bifrost-logs"
      region: us-east-1
      # No accessKeyId / secretAccessKey: uses SDK default chain
      roleArn: "arn:aws:iam::123456789012:role/BifrostS3Role"
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="Google Cloud Storage">
|
||||
|
||||
```bash
|
||||
kubectl create secret generic gcs-credentials \
|
||||
--from-literal=service-account-json="$(cat service-account-key.json)"
|
||||
```
|
||||
|
||||
```yaml
storage:
  logsStore:
    objectStorage:
      enabled: true
      type: gcs
      bucket: "bifrost-logs"
      prefix: "bifrost"
      compress: true

      # GCS configuration
      projectId: "my-gcp-project"
      credentialsJson: "env.GCS_CREDENTIALS_JSON"   # omit for Workload Identity

bifrost:
  providerSecrets:
    gcs-creds:
      existingSecret: "gcs-credentials"
      key: "service-account-json"
      envVar: "GCS_CREDENTIALS_JSON"
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="MinIO (Self-Hosted)">
|
||||
|
||||
```yaml
storage:
  logsStore:
    objectStorage:
      enabled: true
      type: s3
      bucket: "bifrost-logs"
      prefix: "bifrost"
      compress: false

      region: us-east-1   # can be any value for MinIO
      endpoint: "http://minio.minio-ns.svc.cluster.local:9000"
      accessKeyId: "env.MINIO_ACCESS_KEY"
      secretAccessKey: "env.MINIO_SECRET_KEY"
      forcePathStyle: true   # required for MinIO
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
```bash
|
||||
helm upgrade bifrost bifrost/bifrost \
|
||||
--reuse-values \
|
||||
-f object-storage-values.yaml
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Vector Store
|
||||
|
||||
A vector store is required for [semantic caching](/deployment-guides/helm/plugins). Choose from Weaviate, Redis, or Qdrant (embedded or external), or Pinecone (external only).
|
||||
|
||||
<Tabs>
|
||||
<Tab title="Weaviate">
|
||||
|
||||
```yaml
vectorStore:
  enabled: true
  type: weaviate
  weaviate:
    enabled: true   # deploy embedded Weaviate
    replicas: 1
    persistence:
      enabled: true
      size: 20Gi
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: 2000m
        memory: 4Gi
```
|
||||
|
||||
**External Weaviate:**
|
||||
|
||||
```yaml
vectorStore:
  enabled: true
  type: weaviate
  weaviate:
    enabled: false
    external:
      enabled: true
      scheme: https
      host: "weaviate.example.com"
      apiKey: "env.WEAVIATE_API_KEY"
      grpcHost: "weaviate-grpc.example.com"
      grpcSecured: true
      existingSecret: "weaviate-credentials"
      apiKeyKey: "api-key"
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="Redis / Valkey">
|
||||
|
||||
```yaml
vectorStore:
  enabled: true
  type: redis
  redis:
    enabled: true   # deploy embedded Redis
    auth:
      enabled: true
      password: "redis_password"
    master:
      persistence:
        size: 8Gi
```
|
||||
|
||||
**External Redis / AWS MemoryDB:**
|
||||
|
||||
```bash
|
||||
kubectl create secret generic redis-credentials \
|
||||
--from-literal=password='your-redis-password'
|
||||
```
|
||||
|
||||
```yaml
vectorStore:
  enabled: true
  type: redis
  redis:
    enabled: false
    external:
      enabled: true
      host: "your-redis.cache.amazonaws.com"
      port: 6379
      useTls: true
      clusterMode: true   # required for AWS MemoryDB
      existingSecret: "redis-credentials"
      passwordKey: "password"
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="Qdrant">
|
||||
|
||||
```yaml
vectorStore:
  enabled: true
  type: qdrant
  qdrant:
    enabled: true   # deploy embedded Qdrant
    persistence:
      size: 10Gi
```
|
||||
|
||||
**External Qdrant:**
|
||||
|
||||
```bash
|
||||
kubectl create secret generic qdrant-credentials \
|
||||
--from-literal=api-key='your-qdrant-api-key'
|
||||
```
|
||||
|
||||
```yaml
vectorStore:
  enabled: true
  type: qdrant
  qdrant:
    enabled: false
    external:
      enabled: true
      host: "qdrant.example.com"
      port: 6334
      useTls: true
      existingSecret: "qdrant-credentials"
      apiKeyKey: "api-key"
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="Pinecone">
|
||||
|
||||
Pinecone is external-only.
|
||||
|
||||
```bash
|
||||
kubectl create secret generic pinecone-credentials \
|
||||
--from-literal=api-key='your-pinecone-api-key'
|
||||
```
|
||||
|
||||
```yaml
vectorStore:
  enabled: true
  type: pinecone
  pinecone:
    external:
      enabled: true
      indexHost: "your-index.svc.us-east1-gcp.pinecone.io"
      existingSecret: "pinecone-credentials"
      apiKeyKey: "api-key"
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost \
|
||||
--set image.tag=v1.4.11 \
|
||||
-f storage-values.yaml
|
||||
```
|
||||
401
docs/deployment-guides/helm/troubleshooting.mdx
Normal file
@@ -0,0 +1,401 @@
|
||||
---
|
||||
title: "Troubleshooting"
|
||||
description: "Diagnose and fix common issues with Bifrost Helm deployments — pods, database, ingress, secrets, PVCs, and performance"
|
||||
icon: "wrench"
|
||||
---
|
||||
|
||||
This page covers the most common problems encountered when deploying Bifrost with Helm, along with diagnostic commands and fixes.
|
||||
|
||||
---
|
||||
|
||||
## Pod Not Starting
|
||||
|
||||
### Quick diagnostics
|
||||
|
||||
```bash
|
||||
# Show pod status
|
||||
kubectl get pods -l app.kubernetes.io/name=bifrost
|
||||
|
||||
# Show pod events (most useful first step)
|
||||
kubectl describe pod -l app.kubernetes.io/name=bifrost
|
||||
|
||||
# Show pod logs (use --previous if the pod has already crashed)
|
||||
kubectl logs -l app.kubernetes.io/name=bifrost
|
||||
kubectl logs -l app.kubernetes.io/name=bifrost --previous
|
||||
```
|
||||
|
||||
### Image pull errors (`ErrImagePull` / `ImagePullBackOff`)
|
||||
|
||||
```bash
|
||||
# Check which image is being pulled
|
||||
kubectl describe pod -l app.kubernetes.io/name=bifrost | grep "Image:"
|
||||
|
||||
# Verify imagePullSecrets are attached
|
||||
kubectl get pod -l app.kubernetes.io/name=bifrost -o jsonpath='{.items[0].spec.imagePullSecrets}'
|
||||
|
||||
# Test secret manually
|
||||
kubectl get secret <pull-secret-name> -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | jq .
|
||||
```
|
||||
|
||||
Common causes:
|
||||
- `image.tag` not set — the chart requires it; the pod will not start without it
|
||||
- Pull secret missing or expired (ECR tokens expire after 12 hours)
|
||||
- Incorrect `image.repository` for enterprise registry
|
||||
|
||||
```bash
|
||||
# Fix: set the correct tag
|
||||
helm upgrade bifrost bifrost/bifrost --reuse-values --set image.tag=v1.4.11
|
||||
```
|
||||
|
||||
### PVC not binding (`Pending`)
|
||||
|
||||
```bash
|
||||
# Check PVC status
|
||||
kubectl get pvc -l app.kubernetes.io/instance=bifrost
|
||||
|
||||
# Show binding events
|
||||
kubectl describe pvc -l app.kubernetes.io/instance=bifrost
|
||||
```
|
||||
|
||||
Common causes:
|
||||
- No Persistent Volume provisioner in the cluster
|
||||
- `storageClass` set to a class that doesn't exist
|
||||
- `ReadWriteOnce` access mode with multiple replicas (SQLite PVCs are single-node)
|
||||
|
||||
```bash
|
||||
# List available storage classes
|
||||
kubectl get storageclass
|
||||
|
||||
# Fix: pin to a valid storage class
|
||||
helm upgrade bifrost bifrost/bifrost \
|
||||
--reuse-values \
|
||||
--set storage.persistence.storageClass=standard
|
||||
```
|
||||
|
||||
### ConfigMap / Secret errors
|
||||
|
||||
```bash
|
||||
# View the generated ConfigMap (contains rendered config.json)
|
||||
kubectl get configmap bifrost-config -o yaml
|
||||
|
||||
# View secrets the pod depends on
|
||||
kubectl get secret -l app.kubernetes.io/instance=bifrost
|
||||
|
||||
# Decode a specific secret value
|
||||
kubectl get secret bifrost-encryption -o jsonpath='{.data.key}' | base64 -d
|
||||
```
|
||||
|
||||
### CrashLoopBackOff
|
||||
|
||||
```bash
|
||||
# Get last log lines before the crash
|
||||
kubectl logs -l app.kubernetes.io/name=bifrost --previous --tail=50
|
||||
|
||||
# Common causes shown in logs:
|
||||
# "encryption key is not initialized" → no key provided; optional, but data will be stored in plaintext
|
||||
# "failed to connect to database" → see Database section below
|
||||
# "image.tag is required" → set image.tag in values
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Database Connection Issues
|
||||
|
||||
### Embedded PostgreSQL
|
||||
|
||||
```bash
|
||||
# Check if the PostgreSQL pod is running
|
||||
kubectl get pods -l app.kubernetes.io/name=bifrost-postgresql
|
||||
|
||||
# Connect directly to inspect the database
|
||||
kubectl exec -it deployment/bifrost-postgresql -- psql -U bifrost -d bifrost
|
||||
|
||||
# Test connectivity from the Bifrost pod
|
||||
kubectl exec -it deployment/bifrost -- nc -zv bifrost-postgresql 5432
|
||||
|
||||
# Check PostgreSQL logs
|
||||
kubectl logs deployment/bifrost-postgresql --tail=50
|
||||
```
|
||||
|
||||
### External PostgreSQL
|
||||
|
||||
```bash
|
||||
# Test connectivity from within the cluster
|
||||
kubectl run pg-test --image=postgres:16-alpine --rm -it --restart=Never -- \
|
||||
psql "host=your-db-host dbname=bifrost user=bifrost sslmode=require"
|
||||
|
||||
# Verify the secret value is correct
|
||||
kubectl get secret postgres-credentials -o jsonpath='{.data.password}' | base64 -d
|
||||
|
||||
# Check that the external host/port is reachable
|
||||
kubectl exec -it deployment/bifrost -- nc -zv your-db-host 5432
|
||||
```
|
||||
|
||||
Common causes:
|
||||
- `sslMode: disable` when the database requires SSL — set `sslMode: require`
|
||||
- Password in secret doesn't match the database user
|
||||
- Network policy blocking pod → database traffic
|
||||
- Database not UTF8 encoded (see [PostgreSQL UTF8 Requirement](/quickstart/gateway/setting-up#postgresql-utf8-requirement))
|
||||
|
||||
```bash
|
||||
# Fix: update the secret and restart
|
||||
kubectl create secret generic postgres-credentials \
|
||||
--from-literal=password='correct-password' \
|
||||
--dry-run=client -o yaml | kubectl apply -f -
|
||||
|
||||
kubectl rollout restart deployment/bifrost
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Ingress Not Working
|
||||
|
||||
```bash
|
||||
# Check ingress resource status
|
||||
kubectl describe ingress bifrost
|
||||
|
||||
# Check if the ingress controller is running
|
||||
kubectl get pods -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx
|
||||
|
||||
# View ingress controller logs for routing errors
|
||||
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=50
|
||||
|
||||
# Verify DNS resolves to the correct load balancer IP
|
||||
nslookup bifrost.yourdomain.com
|
||||
kubectl get ingress bifrost -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
|
||||
|
||||
# Test without TLS first
|
||||
curl -v http://bifrost.yourdomain.com/health
|
||||
```
|
||||
|
||||
Common causes:
|
||||
- `ingress.className` not set or set to a class not installed in the cluster
|
||||
- TLS certificate not issued yet (cert-manager issues certificates asynchronously, so allow a minute or more after the ingress is created)
|
||||
- Service port mismatch — Bifrost listens on `8080` by default
|
||||
|
||||
```bash
|
||||
# Check cert-manager certificate status
|
||||
kubectl get certificate -l app.kubernetes.io/instance=bifrost
|
||||
kubectl describe certificate bifrost-tls
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Secret and Credential Issues
|
||||
|
||||
### Provider API key not resolving
|
||||
|
||||
If Bifrost logs show `env.OPENAI_API_KEY: not set` or similar:
|
||||
|
||||
```bash
|
||||
# Check the env var is present in the running pod
|
||||
kubectl exec -it deployment/bifrost -- env | grep OPENAI
|
||||
|
||||
# Verify the providerSecrets secret exists with the right key
|
||||
kubectl get secret provider-api-keys -o yaml
|
||||
|
||||
# Check the providerSecrets configuration rendered correctly
|
||||
kubectl get configmap bifrost-config -o yaml | grep -A5 providers
|
||||
```
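
To check several variables at once, the output of `env` can be piped through a small filter. This is a sketch only; `missing_env_vars` is a hypothetical helper that reads an `env` dump on stdin and prints each requested name that is absent or empty.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `env` output on stdin and print every
# variable name (from the arguments) that is missing or has an empty value.
missing_env_vars() {
  local env_dump name
  env_dump="$(cat)"
  for name in "$@"; do
    # A variable counts as present only if at least one character follows "=".
    printf '%s\n' "$env_dump" | grep -q "^${name}=." || echo "$name"
  done
}

# Local demonstration with a fake env dump:
printf 'OPENAI_API_KEY=sk-test\nGROQ_API_KEY=\n' \
  | missing_env_vars OPENAI_API_KEY ANTHROPIC_API_KEY GROQ_API_KEY
```

The demonstration prints `ANTHROPIC_API_KEY` and `GROQ_API_KEY`. Against a running pod, the same filter could be fed from `kubectl exec deployment/bifrost -- env` (without `-it`, since the output is piped).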
|
||||
|
||||
### Encryption key issues
|
||||
|
||||
```bash
|
||||
# Verify the secret exists and contains the right key name
|
||||
kubectl get secret bifrost-encryption -o yaml
|
||||
|
||||
# Check the exact key name matches encryptionKeySecret.key in values
|
||||
# Default key name is "encryption-key" — if you used "key", set:
|
||||
# bifrost.encryptionKeySecret.key: "key"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## High Memory Usage
|
||||
|
||||
```bash
|
||||
# Check current resource usage
|
||||
kubectl top pods -l app.kubernetes.io/name=bifrost
|
||||
|
||||
# Check if OOM kills are happening
|
||||
kubectl describe pod -l app.kubernetes.io/name=bifrost | grep -A3 "OOMKilled\|Limits"
|
||||
|
||||
# View resource requests/limits on running pods
|
||||
kubectl get pod -l app.kubernetes.io/name=bifrost \
|
||||
-o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].resources}{"\n"}{end}'
|
||||
```
|
||||
|
||||
**Increase resource limits:**
|
||||
|
||||
```bash
|
||||
helm upgrade bifrost bifrost/bifrost \
|
||||
--reuse-values \
|
||||
--set resources.limits.memory=4Gi \
|
||||
--set resources.requests.memory=1Gi
|
||||
```
|
||||
|
||||
**Tune Go runtime** (see [Docker Tuning](/deployment-guides/docker-tuning)):
|
||||
|
||||
```yaml
env:
  - name: GOGC
    value: "200"       # run GC less often
  - name: GOMEMLIMIT
    value: "3500MiB"   # soft memory limit for the Go runtime, slightly below the container limit
```
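
One way to pick a `GOMEMLIMIT` value is to take a fixed fraction of the container memory limit. The sketch below uses 85% as an assumed starting point, not a Bifrost recommendation; `gomemlimit_for` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: suggest a GOMEMLIMIT as a percentage of the
# container memory limit (given in MiB). The 85% default is an assumption.
gomemlimit_for() {
  local limit_mib="$1" percent="${2:-85}"
  echo "$(( limit_mib * percent / 100 ))MiB"
}

gomemlimit_for 4096   # 4Gi container limit
```

For a 4Gi limit this prints `3481MiB`, close to the `3500MiB` used in the example above. The headroom leaves space for memory the Go runtime does not account for under its limit.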
|
||||
|
||||
---
|
||||
|
||||
## High CPU Usage / Latency
|
||||
|
||||
```bash
|
||||
# Check CPU usage
|
||||
kubectl top pods -l app.kubernetes.io/name=bifrost
|
||||
|
||||
# Check if HPA is scaling correctly
|
||||
kubectl get hpa bifrost
|
||||
kubectl describe hpa bifrost
|
||||
```
|
||||
|
||||
Common causes:
|
||||
- `initialPoolSize` too small — goroutines queuing up; increase to `500`–`1000`
|
||||
- `dropExcessRequests: false` with a small pool — queue depth growing unboundedly
|
||||
|
||||
```bash
|
||||
helm upgrade bifrost bifrost/bifrost \
|
||||
--reuse-values \
|
||||
--set bifrost.client.initialPoolSize=1000 \
|
||||
--set bifrost.client.dropExcessRequests=true
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Autoscaling Issues
|
||||
|
||||
### HPA not scaling
|
||||
|
||||
```bash
|
||||
# Check HPA status and current metrics
|
||||
kubectl describe hpa bifrost
|
||||
|
||||
# Verify metrics server is installed
|
||||
kubectl top nodes
|
||||
kubectl top pods
|
||||
|
||||
# Common fix: metrics server not installed
|
||||
# Install with:
|
||||
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
|
||||
```
|
||||
|
||||
### Pods scaling down too aggressively (drops active SSE streams)
|
||||
|
||||
The default `scaleDown.stabilizationWindowSeconds: 300` and `preStop` sleep of 15 seconds should prevent this. If streams are still being cut:
|
||||
|
||||
```yaml
terminationGracePeriodSeconds: 120   # increase if streams run longer than ~90s (120s minus the 30s preStop sleep)

autoscaling:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600   # wait 10 min before scaling down
      policies:
        - type: Pods
          value: 1
          periodSeconds: 300            # remove at most 1 pod per 5 min

lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 30"]   # give load balancer more time to drain
```
|
||||
|
||||
```bash
|
||||
helm upgrade bifrost bifrost/bifrost --reuse-values -f graceful-shutdown-values.yaml
|
||||
```
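
Because the Kubernetes termination grace period also covers the time spent in the `preStop` hook, the time left for in-flight streams is the grace period minus the sleep. A small sketch of that arithmetic; `stream_budget` is a hypothetical helper for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: seconds left for active streams after the preStop
# sleep, given terminationGracePeriodSeconds and the sleep duration.
stream_budget() {
  local grace_seconds="$1" prestop_seconds="$2"
  echo $(( grace_seconds - prestop_seconds ))
}

stream_budget 120 15   # default 15s preStop sleep
stream_budget 120 30   # 30s preStop sleep
```

The two calls print `105` and `90`. If streams routinely run longer than the result, raise `terminationGracePeriodSeconds` accordingly.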
|
||||
|
||||
---
|
||||
|
||||
## SQLite / PVC Issues
|
||||
|
||||
### StatefulSet migration (upgrading from chart < v2.0.0)
|
||||
|
||||
Older chart versions used a Deployment + manual PVC. v2.0.0 moved SQLite to a StatefulSet. If upgrading:
|
||||
|
||||
```bash
|
||||
# 1. Scale down the old deployment
|
||||
kubectl scale deployment bifrost --replicas=0
|
||||
|
||||
# 2. Note the existing PVC name
|
||||
kubectl get pvc
|
||||
|
||||
# 3. Upgrade, pointing at the existing claim
|
||||
helm upgrade bifrost bifrost/bifrost \
|
||||
--reuse-values \
|
||||
--set storage.persistence.existingClaim=<your-old-pvc-name> \
|
||||
--set image.tag=v1.4.11
|
||||
```
|
||||
|
||||
### Data lost after upgrade
|
||||
|
||||
```bash
|
||||
# Check if PVCs still exist (they persist after helm uninstall)
|
||||
kubectl get pvc -l app.kubernetes.io/instance=bifrost
|
||||
|
||||
# Re-attach by setting existingClaim
|
||||
helm upgrade bifrost bifrost/bifrost \
|
||||
--reuse-values \
|
||||
--set storage.persistence.existingClaim=<pvc-name>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Cluster Mode Issues
|
||||
|
||||
### Peers not discovering each other
|
||||
|
||||
```bash
|
||||
# Check gossip port is reachable between pods
|
||||
kubectl exec -it bifrost-0 -- nc -zv bifrost-1.bifrost-headless 7946
|
||||
|
||||
# View gossip-related log lines
|
||||
kubectl logs -l app.kubernetes.io/name=bifrost --tail=100 | grep -i gossip
|
||||
|
||||
# Check the headless service exists
|
||||
kubectl get svc bifrost-headless
|
||||
```
|
||||
|
||||
For Kubernetes-based discovery, verify the service account has pod list permissions:
|
||||
|
||||
```bash
|
||||
kubectl auth can-i list pods --as=system:serviceaccount:default:bifrost
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Useful Diagnostic Commands
|
||||
|
||||
```bash
|
||||
# Full state dump for a support ticket
|
||||
kubectl get all -l app.kubernetes.io/instance=bifrost
|
||||
kubectl describe pod -l app.kubernetes.io/name=bifrost > pod-describe.txt
|
||||
kubectl logs -l app.kubernetes.io/name=bifrost --tail=200 > pod-logs.txt
|
||||
|
||||
# View the full rendered config.json
|
||||
kubectl get configmap bifrost-config -o jsonpath='{.data.config\.json}' | jq .
|
||||
|
||||
# Check current Helm values (shows all overrides)
|
||||
helm get values bifrost
|
||||
|
||||
# Check Helm release status
|
||||
helm status bifrost
|
||||
|
||||
# View Helm release history
|
||||
helm history bifrost
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Still Stuck?
|
||||
|
||||
- [GitHub Issues](https://github.com/maximhq/bifrost/issues) — search existing issues or open a new one
|
||||
- [Enterprise Support](mailto:support@getmaxim.ai) — for enterprise customers with SLA
|
||||
718
docs/deployment-guides/helm/values.mdx
Normal file
@@ -0,0 +1,718 @@
|
||||
---
|
||||
title: "Values Reference"
|
||||
description: "Complete reference for Bifrost Helm chart values — key parameters, how to supply them, and links to example files"
|
||||
icon: "sliders"
|
||||
---
|
||||
|
||||
This page covers every top-level parameter group in the Bifrost Helm chart's `values.yaml`, how to supply values via `--set` vs `-f`, and where to find ready-made example files.
|
||||
|
||||
<Note>
|
||||
The full values schema is available at [https://getbifrost.ai/schema](https://getbifrost.ai/schema). All `values.yaml` fields map directly to `config.json` fields generated by the chart.
|
||||
</Note>
|
||||
|
||||
## Supplying Values
|
||||
|
||||
### One-liner with `--set`
|
||||
|
||||
Good for a single field or quick experiments:
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost \
|
||||
--set image.tag=v1.4.11 \
|
||||
--set replicaCount=3 \
|
||||
--set bifrost.client.initialPoolSize=500
|
||||
```
|
||||
|
||||
### Values file with `-f`
|
||||
|
||||
Recommended for anything beyond a couple of fields:
|
||||
|
||||
```bash
# Create your values file
cat > my-values.yaml <<'EOF'
image:
  tag: "v1.4.11"

replicaCount: 2

bifrost:
  encryptionKey: "your-32-byte-encryption-key-here"
  client:
    initialPoolSize: 500
    enableLogging: true
EOF

# Install
helm install bifrost bifrost/bifrost -f my-values.yaml

# Upgrade later
helm upgrade bifrost bifrost/bifrost -f my-values.yaml

# Upgrade and reuse all previously set values, overriding only one field
helm upgrade bifrost bifrost/bifrost \
  --reuse-values \
  --set replicaCount=5
```
|
||||
|
||||
### Multiple values files
|
||||
|
||||
Later files override earlier ones — useful for a base + environment-specific overlay:
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost \
|
||||
-f base-values.yaml \
|
||||
-f production-overrides.yaml
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Key Parameters Reference
|
||||
|
||||
### Image
|
||||
|
||||
| Parameter | Description | Default |
|-----------|-------------|---------|
| `image.repository` | Container image repository | `docker.io/maximhq/bifrost` |
| `image.tag` | **Required.** Image version (e.g. `v1.4.11`) | `""` |
| `image.pullPolicy` | Image pull policy | `IfNotPresent` |
| `imagePullSecrets` | List of pull secret names for private registries | `[]` |
|
||||
|
||||
```bash
|
||||
# Always specify the tag: the chart requires it, and the pod will not start without it
|
||||
helm install bifrost bifrost/bifrost --set image.tag=v1.4.11
|
||||
```
|
||||
|
||||
### Replicas & Autoscaling
|
||||
|
||||
| Parameter | Description | Default |
|
||||
|-----------|-------------|---------|
|
||||
| `replicaCount` | Static replica count (ignored when HPA is enabled) | `1` |
|
||||
| `autoscaling.enabled` | Enable Horizontal Pod Autoscaler | `false` |
|
||||
| `autoscaling.minReplicas` | Minimum replicas | `1` |
|
||||
| `autoscaling.maxReplicas` | Maximum replicas | `10` |
|
||||
| `autoscaling.targetCPUUtilizationPercentage` | CPU target for scaling | `80` |
|
||||
| `autoscaling.targetMemoryUtilizationPercentage` | Memory target for scaling | `80` |
|
||||
| `autoscaling.behavior.scaleDown.stabilizationWindowSeconds` | Cooldown before scale-down (important for SSE streams) | `300` |
|
||||
| `autoscaling.behavior.scaleDown.policies[0].value` | Max pods removed per period | `1` |
|
||||
|
||||
### Resources
|
||||
|
||||
| Parameter | Description | Default |
|
||||
|-----------|-------------|---------|
|
||||
| `resources.requests.cpu` | CPU request | `500m` |
|
||||
| `resources.requests.memory` | Memory request | `512Mi` |
|
||||
| `resources.limits.cpu` | CPU limit | `2000m` |
|
||||
| `resources.limits.memory` | Memory limit | `2Gi` |
|
||||
|
||||
### Service
|
||||
|
||||
| Parameter | Description | Default |
|
||||
|-----------|-------------|---------|
|
||||
| `service.type` | `ClusterIP`, `LoadBalancer`, or `NodePort` | `ClusterIP` |
|
||||
| `service.port` | Service port | `8080` |
|
||||
|
||||
### Ingress
|
||||
|
||||
| Parameter | Description | Default |
|
||||
|-----------|-------------|---------|
|
||||
| `ingress.enabled` | Enable ingress | `false` |
|
||||
| `ingress.className` | Ingress class (e.g. `nginx`, `traefik`) | `""` |
|
||||
| `ingress.annotations` | Ingress annotations | `{}` |
|
||||
| `ingress.hosts` | Host rules | see values.yaml |
|
||||
| `ingress.tls` | TLS configuration | `[]` |
|
||||
|
||||
```yaml
|
||||
ingress:
|
||||
enabled: true
|
||||
className: nginx
|
||||
annotations:
|
||||
cert-manager.io/cluster-issuer: letsencrypt-prod
|
||||
nginx.ingress.kubernetes.io/proxy-body-size: "100m"
|
||||
hosts:
|
||||
- host: bifrost.yourdomain.com
|
||||
paths:
|
||||
- path: /
|
||||
pathType: Prefix
|
||||
tls:
|
||||
- secretName: bifrost-tls
|
||||
hosts:
|
||||
- bifrost.yourdomain.com
|
||||
```
|
||||
|
||||
### Probes
|
||||
|
||||
| Parameter | Description | Default |
|
||||
|-----------|-------------|---------|
|
||||
| `livenessProbe.initialDelaySeconds` | Seconds before first liveness check | `30` |
|
||||
| `livenessProbe.periodSeconds` | Liveness check interval | `30` |
|
||||
| `readinessProbe.initialDelaySeconds` | Seconds before first readiness check | `10` |
|
||||
| `readinessProbe.periodSeconds` | Readiness check interval | `10` |
|
||||
|
||||
Both probes hit `GET /health`.
|
||||
|
||||
### Graceful Shutdown
|
||||
|
||||
Bifrost supports long-lived SSE streaming connections. The default `preStop` hook and termination grace period let in-flight streams finish before the pod is killed:
|
||||
|
||||
| Parameter | Description | Default |
|
||||
|-----------|-------------|---------|
|
||||
| `terminationGracePeriodSeconds` | Total grace period | `60` |
|
||||
| `lifecycle.preStop.exec.command` | Sleep before SIGTERM so load balancer drains | `["sh", "-c", "sleep 15"]` |
|
||||
|
||||
Increase `terminationGracePeriodSeconds` if your typical stream responses take longer than 45 seconds (the 60-second grace period minus the 15-second `preStop` sleep).
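
To size the grace period for longer streams, the arithmetic can be sketched in shell. This assumes the `preStop` sleep counts against the grace period (as it does in Kubernetes) and that nothing else delays shutdown; the function names are illustrative and not part of the chart.

```bash
# Seconds left for in-flight streams once the preStop sleep has elapsed.
stream_budget() {
  grace_seconds="$1"    # terminationGracePeriodSeconds
  prestop_seconds="$2"  # sleep duration in lifecycle.preStop
  echo $(( grace_seconds - prestop_seconds ))
}

# Grace period needed so a stream of the given length can still finish.
required_grace() {
  stream_seconds="$1"
  prestop_seconds="$2"
  echo $(( stream_seconds + prestop_seconds ))
}

stream_budget 60 15     # chart defaults, prints 45
required_grace 120 15   # a two-minute stream, prints 135
```

With the defaults the budget is 45 seconds; a workload whose streams run for two minutes would need `terminationGracePeriodSeconds: 135` or higher.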
|
||||
|
||||
### Service Account
|
||||
|
||||
| Parameter | Description | Default |
|
||||
|-----------|-------------|---------|
|
||||
| `serviceAccount.create` | Create a dedicated service account | `true` |
|
||||
| `serviceAccount.annotations` | Annotations (e.g. for IRSA, Workload Identity) | `{}` |
|
||||
| `serviceAccount.name` | Override the generated name | `""` |
|
||||
|
||||
### Pod Scheduling
|
||||
|
||||
```yaml
|
||||
# Spread replicas across nodes
|
||||
affinity:
|
||||
podAntiAffinity:
|
||||
requiredDuringSchedulingIgnoredDuringExecution:
|
||||
- labelSelector:
|
||||
matchLabels:
|
||||
app.kubernetes.io/name: bifrost
|
||||
topologyKey: kubernetes.io/hostname
|
||||
|
||||
# Pin to specific node pool
|
||||
nodeSelector:
|
||||
node-type: ai-workload
|
||||
|
||||
# Tolerate GPU taints
|
||||
tolerations:
|
||||
- key: "gpu"
|
||||
operator: "Equal"
|
||||
value: "true"
|
||||
effect: "NoSchedule"
|
||||
```
|
||||
|
||||
### Extra Environment Variables
|
||||
|
||||
Three ways to inject env vars:
|
||||
|
||||
```yaml
|
||||
# Inline key/value pairs
|
||||
env:
|
||||
- name: HTTP_PROXY
|
||||
value: "http://proxy.corp.example.com:3128"
|
||||
|
||||
# Map syntax (appended after env)
|
||||
extraEnv:
|
||||
NO_PROXY: "169.254.169.254,10.0.0.0/8"
|
||||
|
||||
# Bulk-load from existing Secrets or ConfigMaps
|
||||
envFrom:
|
||||
- secretRef:
|
||||
name: my-corp-secrets
|
||||
- configMapRef:
|
||||
name: my-app-config
|
||||
```
|
||||
|
||||
### Init Containers
|
||||
|
||||
```yaml
|
||||
initContainers:
|
||||
- name: wait-for-db
|
||||
image: busybox:1.35
|
||||
command: ["sh", "-c", "until nc -z postgres-svc 5432; do sleep 2; done"]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Values Examples
|
||||
|
||||
The chart ships ready-made example files under [`helm-charts/bifrost/values-examples/`](https://github.com/maximhq/bifrost/tree/main/helm-charts/bifrost/values-examples):
|
||||
|
||||
| File | Use case |
|
||||
|------|----------|
|
||||
| `sqlite-only.yaml` | Minimal local/dev setup |
|
||||
| `postgres-only.yaml` | Single-store Postgres |
|
||||
| `production-ha.yaml` | HA: 3 replicas, Postgres, Weaviate, HPA, Ingress |
|
||||
| `providers-and-virtual-keys.yaml` | All 23 providers + 7 virtual key patterns |
|
||||
| `secrets-from-k8s.yaml` | All sensitive values from Kubernetes Secrets |
|
||||
| `external-postgres.yaml` | Point at an existing Postgres instance |
|
||||
| `postgres-redis.yaml` | Postgres + Redis vector store |
|
||||
| `postgres-weaviate.yaml` | Postgres + Weaviate vector store |
|
||||
| `postgres-qdrant.yaml` | Postgres + Qdrant vector store |
|
||||
| `semantic-cache-secret-example.yaml` | Semantic cache with secret injection |
|
||||
| `mixed-backend.yaml` | Config store = postgres, logs store = sqlite |
|
||||
|
||||
Install from an example file directly:
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost \
|
||||
-f https://raw.githubusercontent.com/maximhq/bifrost/main/helm-charts/bifrost/values-examples/production-ha.yaml \
|
||||
--set image.tag=v1.4.11
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Helm Operations
|
||||
|
||||
### View current values
|
||||
|
||||
```bash
|
||||
helm get values bifrost
|
||||
```
|
||||
|
||||
### Diff before upgrading (requires helm-diff plugin)
|
||||
|
||||
```bash
|
||||
helm diff upgrade bifrost bifrost/bifrost -f my-values.yaml
|
||||
```
|
||||
|
||||
### Rollback
|
||||
|
||||
```bash
|
||||
helm history bifrost
|
||||
helm rollback bifrost # to previous revision
|
||||
helm rollback bifrost 2 # to revision 2
|
||||
```
|
||||
|
||||
### Uninstall
|
||||
|
||||
```bash
|
||||
helm uninstall bifrost
|
||||
|
||||
# Also remove PVCs (deletes all data)
|
||||
kubectl delete pvc -l app.kubernetes.io/instance=bifrost
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## All Key Parameters
|
||||
|
||||
A quick-reference table of the most commonly used top-level parameters:
|
||||
|
||||
| Parameter | Description | Default |
|
||||
|-----------|-------------|---------|
|
||||
| `image.tag` | **Required.** Bifrost image version (e.g., `v1.4.11`) | `""` |
|
||||
| `replicaCount` | Number of replicas | `1` |
|
||||
| `storage.mode` | Storage backend (`sqlite` or `postgres`) | `sqlite` |
|
||||
| `storage.persistence.size` | PVC size for SQLite | `10Gi` |
|
||||
| `postgresql.enabled` | Deploy embedded PostgreSQL | `false` |
|
||||
| `vectorStore.enabled` | Enable vector store | `false` |
|
||||
| `vectorStore.type` | Vector store type (`weaviate`, `redis`, `qdrant`) | `none` |
|
||||
| `bifrost.encryptionKey` | Optional encryption key (use `encryptionKeySecret` in production). If omitted, data is stored in plaintext. | `""` |
|
||||
| `ingress.enabled` | Enable ingress | `false` |
|
||||
| `autoscaling.enabled` | Enable HPA | `false` |
|
||||
|
||||
### Secret Reference Parameters
|
||||
|
||||
Use existing Kubernetes Secrets instead of plain-text values. Every sensitive field in the chart has a corresponding `existingSecret` / `secretRef` alternative:
|
||||
|
||||
| Parameter | Description | Default |
|
||||
|-----------|-------------|---------|
|
||||
| `bifrost.encryptionKeySecret.name` | Secret name for encryption key | `""` |
|
||||
| `bifrost.encryptionKeySecret.key` | Key within the secret | `"encryption-key"` |
|
||||
| `postgresql.external.existingSecret` | Secret name for PostgreSQL password | `""` |
|
||||
| `postgresql.external.passwordKey` | Key within the secret | `"password"` |
|
||||
| `vectorStore.redis.external.existingSecret` | Secret name for Redis password | `""` |
|
||||
| `vectorStore.redis.external.passwordKey` | Key within the secret | `"password"` |
|
||||
| `vectorStore.weaviate.external.existingSecret` | Secret name for Weaviate API key | `""` |
|
||||
| `vectorStore.weaviate.external.apiKeyKey` | Key within the secret | `"api-key"` |
|
||||
| `vectorStore.qdrant.external.existingSecret` | Secret name for Qdrant API key | `""` |
|
||||
| `vectorStore.qdrant.external.apiKeyKey` | Key within the secret | `"api-key"` |
|
||||
| `bifrost.plugins.maxim.secretRef.name` | Secret name for Maxim API key | `""` |
|
||||
| `bifrost.plugins.maxim.secretRef.key` | Key within the secret | `"api-key"` |
|
||||
| `bifrost.providerSecrets.<provider>.existingSecret` | Secret name for provider API key | `""` |
|
||||
| `bifrost.providerSecrets.<provider>.key` | Key within the secret | `"api-key"` |
|
||||
| `bifrost.providerSecrets.<provider>.envVar` | Environment variable name to inject | `""` |
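
As a sketch of how the first two rows fit together, the script below generates a key, stores it in a Secret, and points an existing release at it. The Secret name `bifrost-encryption` follows the examples in this guide and `encryption-key` is the default key name from the table; the helper names are hypothetical, and the 32-character length is an assumption taken from the placeholder key, not a documented requirement.

```bash
# Generate a 32-character hex string from 16 random bytes.
generate_key() {
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Store the key in a Secret and reference it from the chart.
# Not called here; run it against a cluster you control.
apply_encryption_secret() {
  kubectl create secret generic bifrost-encryption \
    --from-literal=encryption-key="$(generate_key)"

  helm upgrade bifrost bifrost/bifrost \
    --reuse-values \
    --set bifrost.encryptionKeySecret.name=bifrost-encryption \
    --set bifrost.encryptionKeySecret.key=encryption-key
}
```

Calling `apply_encryption_secret` keeps the key out of the values file and out of shell history, since the generated value is passed straight to `kubectl`.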
|
||||
|
||||
---
|
||||
|
||||
## Advanced Configuration
|
||||
|
||||
### Comprehensive Example
|
||||
|
||||
A production-ready values file combining the most common settings:
|
||||
|
||||
```yaml
|
||||
# my-values.yaml
|
||||
image:
|
||||
tag: "v1.4.11"
|
||||
|
||||
replicaCount: 3
|
||||
|
||||
storage:
|
||||
mode: postgres
|
||||
|
||||
postgresql:
|
||||
enabled: true
|
||||
auth:
|
||||
password: "secure-password" # use existingSecret in production
|
||||
|
||||
autoscaling:
|
||||
enabled: true
|
||||
minReplicas: 3
|
||||
maxReplicas: 10
|
||||
|
||||
ingress:
|
||||
enabled: true
|
||||
className: nginx
|
||||
hosts:
|
||||
- host: bifrost.example.com
|
||||
paths:
|
||||
- path: /
|
||||
pathType: Prefix
|
||||
|
||||
bifrost:
|
||||
encryptionKeySecret:
|
||||
name: "bifrost-encryption"
|
||||
key: "key"
|
||||
providers:
|
||||
openai:
|
||||
keys:
|
||||
- name: "primary"
|
||||
value: "env.OPENAI_API_KEY"
|
||||
weight: 1
|
||||
providerSecrets:
|
||||
openai:
|
||||
existingSecret: "provider-api-keys"
|
||||
key: "openai-api-key"
|
||||
envVar: "OPENAI_API_KEY"
|
||||
```
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f my-values.yaml
|
||||
```
|
||||
|
||||
### Node Affinity & Scheduling
|
||||
|
||||
Deploy to specific nodes and spread replicas across hosts:
|
||||
|
||||
```yaml
|
||||
nodeSelector:
|
||||
node-type: ai-workload
|
||||
|
||||
affinity:
|
||||
podAntiAffinity:
|
||||
requiredDuringSchedulingIgnoredDuringExecution:
|
||||
- labelSelector:
|
||||
matchLabels:
|
||||
app.kubernetes.io/name: bifrost
|
||||
topologyKey: kubernetes.io/hostname
|
||||
|
||||
tolerations:
|
||||
- key: "gpu"
|
||||
operator: "Equal"
|
||||
value: "true"
|
||||
effect: "NoSchedule"
|
||||
```
|
||||
|
||||
### Deployment & Pod Annotations
|
||||
|
||||
Useful for tooling like [Keel](https://keel.sh) for automatic image updates or Datadog APM injection:
|
||||
|
||||
```yaml
|
||||
deploymentAnnotations:
|
||||
keel.sh/policy: force
|
||||
keel.sh/trigger: poll
|
||||
|
||||
podAnnotations:
|
||||
ad.datadoghq.com/bifrost.logs: '[{"source":"bifrost","service":"bifrost"}]'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Common Patterns
|
||||
|
||||
Ready-made values files for the most common deployment scenarios. Each pattern builds on the [quickstart](/deployment-guides/helm).
|
||||
|
||||
<Tabs>
|
||||
<Tab title="Development">
|
||||
|
||||
Simple setup for local testing. SQLite, single replica, no autoscaling.
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost \
|
||||
--set image.tag=v1.4.11 \
|
||||
--set 'bifrost.providers.openai.keys[0].name=dev-key' \
|
||||
--set 'bifrost.providers.openai.keys[0].value=sk-your-key' \
|
||||
--set 'bifrost.providers.openai.keys[0].weight=1'
|
||||
```
|
||||
|
||||
```bash
|
||||
# Access
|
||||
kubectl port-forward svc/bifrost 8080:8080
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="Multi-Provider">
|
||||
|
||||
Multiple LLM providers with weighted load balancing.
|
||||
|
||||
```bash
|
||||
kubectl create secret generic provider-keys \
|
||||
--from-literal=openai-api-key='sk-...' \
|
||||
--from-literal=anthropic-api-key='sk-ant-...' \
|
||||
--from-literal=gemini-api-key='your-gemini-key'
|
||||
```
|
||||
|
||||
```yaml
|
||||
# multi-provider.yaml
|
||||
image:
|
||||
tag: "v1.4.11"
|
||||
|
||||
bifrost:
|
||||
encryptionKey: "your-encryption-key"
|
||||
|
||||
client:
|
||||
enableLogging: true
|
||||
allowDirectKeys: false
|
||||
|
||||
providers:
|
||||
openai:
|
||||
keys:
|
||||
- name: "openai-primary"
|
||||
value: "env.OPENAI_API_KEY"
|
||||
weight: 2 # 50% of traffic
|
||||
anthropic:
|
||||
keys:
|
||||
- name: "anthropic-primary"
|
||||
value: "env.ANTHROPIC_API_KEY"
|
||||
weight: 1 # 25%
|
||||
gemini:
|
||||
keys:
|
||||
- name: "gemini-primary"
|
||||
value: "env.GEMINI_API_KEY"
|
||||
weight: 1 # 25%
|
||||
|
||||
providerSecrets:
|
||||
openai:
|
||||
existingSecret: "provider-keys"
|
||||
key: "openai-api-key"
|
||||
envVar: "OPENAI_API_KEY"
|
||||
anthropic:
|
||||
existingSecret: "provider-keys"
|
||||
key: "anthropic-api-key"
|
||||
envVar: "ANTHROPIC_API_KEY"
|
||||
gemini:
|
||||
existingSecret: "provider-keys"
|
||||
key: "gemini-api-key"
|
||||
envVar: "GEMINI_API_KEY"
|
||||
|
||||
plugins:
|
||||
telemetry:
|
||||
enabled: true
|
||||
logging:
|
||||
enabled: true
|
||||
```
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f multi-provider.yaml
|
||||
```
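
The percentages in the comments of `multi-provider.yaml` follow from each weight divided by the sum of all weights. A minimal sketch of that arithmetic, assuming weights are normalized across the listed keys the way the comments suggest (`weight_share` is an illustrative helper, not a Bifrost command):

```bash
# Percentage of the total weight held by the first argument.
# Usage: weight_share <weight> <all weights...>
weight_share() {
  weight="$1"; shift
  total=0
  for w in "$@"; do
    total=$(( total + w ))
  done
  # Integer division: fractional percentages are truncated.
  echo $(( 100 * weight / total ))
}

weight_share 2 2 1 1   # openai, prints 50
weight_share 1 2 1 1   # anthropic or gemini, prints 25
```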
|
||||
|
||||
</Tab>
|
||||
<Tab title="External Database">
|
||||
|
||||
Use an existing PostgreSQL instance — RDS, Cloud SQL, Azure Database, or self-managed.
|
||||
|
||||
```bash
|
||||
kubectl create secret generic postgres-credentials \
|
||||
--from-literal=password='your-external-postgres-password'
|
||||
```
|
||||
|
||||
```yaml
|
||||
# external-db.yaml
|
||||
image:
|
||||
tag: "v1.4.11"
|
||||
|
||||
storage:
|
||||
mode: postgres
|
||||
|
||||
postgresql:
|
||||
enabled: false
|
||||
external:
|
||||
enabled: true
|
||||
host: "your-rds-endpoint.us-east-1.rds.amazonaws.com"
|
||||
port: 5432
|
||||
user: "bifrost"
|
||||
database: "bifrost"
|
||||
sslMode: "require"
|
||||
existingSecret: "postgres-credentials"
|
||||
passwordKey: "password"
|
||||
|
||||
bifrost:
|
||||
encryptionKey: "your-encryption-key"
|
||||
|
||||
providers:
|
||||
openai:
|
||||
keys:
|
||||
- name: "openai-primary"
|
||||
value: "sk-..."
|
||||
weight: 1
|
||||
```
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f external-db.yaml
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="AI Workloads">
|
||||
|
||||
Semantic response caching for high-volume AI inference.
|
||||
|
||||
```bash
|
||||
kubectl create secret generic bifrost-encryption \
|
||||
--from-literal=key='your-32-byte-encryption-key'
|
||||
|
||||
kubectl create secret generic provider-keys \
|
||||
--from-literal=openai-api-key='sk-your-key'
|
||||
```
|
||||
|
||||
```yaml
|
||||
# ai-workload.yaml
|
||||
image:
|
||||
tag: "v1.4.11"
|
||||
|
||||
storage:
|
||||
mode: postgres
|
||||
|
||||
postgresql:
|
||||
enabled: true
|
||||
auth:
|
||||
password: "secure-password"
|
||||
primary:
|
||||
persistence:
|
||||
size: 50Gi
|
||||
|
||||
vectorStore:
|
||||
enabled: true
|
||||
type: weaviate
|
||||
weaviate:
|
||||
enabled: true
|
||||
persistence:
|
||||
size: 50Gi
|
||||
|
||||
bifrost:
|
||||
encryptionKeySecret:
|
||||
name: "bifrost-encryption"
|
||||
key: "key"
|
||||
|
||||
providers:
|
||||
openai:
|
||||
keys:
|
||||
- name: "openai-primary"
|
||||
value: "env.OPENAI_API_KEY"
|
||||
weight: 1
|
||||
|
||||
providerSecrets:
|
||||
openai:
|
||||
existingSecret: "provider-keys"
|
||||
key: "openai-api-key"
|
||||
envVar: "OPENAI_API_KEY"
|
||||
|
||||
plugins:
|
||||
semanticCache:
|
||||
enabled: true
|
||||
config:
|
||||
provider: "openai"
|
||||
keys:
|
||||
- value: "env.OPENAI_API_KEY"
|
||||
weight: 1
|
||||
embedding_model: "text-embedding-3-small"
|
||||
dimension: 1536
|
||||
threshold: 0.85
|
||||
ttl: "1h"
|
||||
cache_by_model: true
|
||||
cache_by_provider: true
|
||||
```
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f ai-workload.yaml
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="Kubernetes Secrets Only">
|
||||
|
||||
Zero credentials in values files — all sensitive data in Kubernetes Secrets.
|
||||
|
||||
```bash
|
||||
kubectl create secret generic postgres-credentials \
|
||||
--from-literal=password='your-postgres-password'
|
||||
|
||||
kubectl create secret generic bifrost-encryption \
|
||||
--from-literal=key='your-encryption-key'
|
||||
|
||||
kubectl create secret generic provider-keys \
|
||||
--from-literal=openai-api-key='sk-...' \
|
||||
--from-literal=anthropic-api-key='sk-ant-...'
|
||||
|
||||
kubectl create secret generic qdrant-credentials \
|
||||
--from-literal=api-key='your-qdrant-api-key'
|
||||
```
|
||||
|
||||
```yaml
|
||||
# secrets-only.yaml
|
||||
image:
|
||||
tag: "v1.4.11"
|
||||
|
||||
storage:
|
||||
mode: postgres
|
||||
|
||||
postgresql:
|
||||
enabled: false
|
||||
external:
|
||||
enabled: true
|
||||
host: "postgres.example.com"
|
||||
port: 5432
|
||||
user: "bifrost"
|
||||
database: "bifrost"
|
||||
sslMode: "require"
|
||||
existingSecret: "postgres-credentials"
|
||||
passwordKey: "password"
|
||||
|
||||
vectorStore:
|
||||
enabled: true
|
||||
type: qdrant
|
||||
qdrant:
|
||||
enabled: false
|
||||
external:
|
||||
enabled: true
|
||||
host: "qdrant.example.com"
|
||||
port: 6334
|
||||
existingSecret: "qdrant-credentials"
|
||||
apiKeyKey: "api-key"
|
||||
|
||||
bifrost:
|
||||
encryptionKeySecret:
|
||||
name: "bifrost-encryption"
|
||||
key: "key"
|
||||
|
||||
providers:
|
||||
openai:
|
||||
keys:
|
||||
- name: "openai-primary"
|
||||
value: "env.OPENAI_API_KEY"
|
||||
weight: 1
|
||||
anthropic:
|
||||
keys:
|
||||
- name: "anthropic-primary"
|
||||
value: "env.ANTHROPIC_API_KEY"
|
||||
weight: 1
|
||||
|
||||
providerSecrets:
|
||||
openai:
|
||||
existingSecret: "provider-keys"
|
||||
key: "openai-api-key"
|
||||
envVar: "OPENAI_API_KEY"
|
||||
anthropic:
|
||||
existingSecret: "provider-keys"
|
||||
key: "anthropic-api-key"
|
||||
envVar: "ANTHROPIC_API_KEY"
|
||||
```
|
||||
|
||||
```bash
|
||||
helm install bifrost bifrost/bifrost -f secrets-only.yaml
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
77
docs/deployment-guides/how-to/install-make.mdx
Normal file
@@ -0,0 +1,77 @@
|
||||
---
|
||||
title: "Install make command"
|
||||
description: "This guide explains how to install the make command on Windows, Ubuntu/Debian, and macOS."
|
||||
icon: "compact-disc"
|
||||
---
|
||||
|
||||
|
||||
## Windows
|
||||
|
||||
### Option A: Chocolatey (easy)
|
||||
|
||||
```powershell
|
||||
# Run in an elevated PowerShell (Run as Administrator)
|
||||
choco install make
|
||||
# verify
|
||||
make --version
|
||||
```
|
||||
|
||||
### Option B: Scoop (no admin needed)
|
||||
```powershell
|
||||
# In a normal PowerShell
|
||||
Set-ExecutionPolicy -Scope CurrentUser RemoteSigned
|
||||
iwr get.scoop.sh -useb | iex
|
||||
scoop install make
|
||||
make --version
|
||||
```
|
||||
|
||||
### Option C: MSYS2 (full Unix-like env)
|
||||
|
||||
```bash
|
||||
# 1) Install MSYS2 from https://www.msys2.org/
|
||||
# 2) In "MSYS2 MSYS" terminal:
|
||||
pacman -Syu # then reopen terminal if asked
|
||||
pacman -S make
|
||||
make --version
|
||||
```
|
||||
|
||||
<Note> Visual Studio’s nmake is a different tool (not GNU make). </Note>
|
||||
|
||||
## Ubuntu / Debian
|
||||
|
||||
```bash
|
||||
sudo apt update
|
||||
# Pulls in compilers and common build tools, including make
|
||||
sudo apt install build-essential
|
||||
# (or just) sudo apt install make
|
||||
make --version
|
||||
```
|
||||
|
||||
## macOS
|
||||
|
||||
### Option A: Xcode Command Line Tools (most common)
|
||||
|
||||
```bash
|
||||
xcode-select --install # follow the prompt
|
||||
make --version
|
||||
```
|
||||
|
||||
This provides the GNU make that Apple ships (version 3.81), which is fine for most projects.
|
||||
|
||||
### Option B: Homebrew (get GNU make ≥ 4.x as gmake)
|
||||
|
||||
```bash
|
||||
# Install Homebrew if needed: https://brew.sh
|
||||
brew install make
|
||||
gmake --version
|
||||
```
|
||||
|
||||
If a project specifically requires a recent GNU make (4.x) to be invoked as `make`, you can add an alias:
|
||||
|
||||
`echo 'alias make="gmake"' >> ~/.zshrc && source ~/.zshrc`
|
||||
|
||||
## Troubleshooting tips
|
||||
|
||||
- If `make` isn’t found, restart your terminal (or on Windows, open a new PowerShell) so your `PATH` updates.
|
||||
- Run `which make` (`where make` on Windows) to confirm which binary you’re using.
|
||||
- For Windows builds that depend on Unix tools (sed, grep, etc.), prefer MSYS2 or WSL for a smoother experience.
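
The checks above can be combined into one short script for Unix-like shells. This is a sketch: `make_flavor` is a hypothetical helper that only inspects the first line of `make --version`, so a non-GNU make that rejects that flag is reported as `unknown`.

```bash
# Classify a make binary from the first line of its --version output.
make_flavor() {
  case "$1" in
    "GNU Make "*) echo "gnu ${1#GNU Make }" ;;
    "")           echo "unknown" ;;
    *)            echo "other" ;;
  esac
}

# Report which make binary is first on PATH and what it is.
if make_path="$(command -v make)"; then
  echo "make found at: $make_path"
  make_flavor "$(make --version 2>/dev/null | head -n 1)"
else
  echo "make not found on PATH"
fi
```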
|
||||
444
docs/deployment-guides/how-to/multinode.mdx
Normal file
@@ -0,0 +1,444 @@
|
||||
---
|
||||
title: "Multinode Deployment"
|
||||
description: "Deploy multiple Bifrost nodes with shared configuration for high availability in OSS deployments"
|
||||
icon: "layer-group"
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Running multiple Bifrost nodes provides high availability, load distribution, and fault tolerance for your AI gateway. This guide covers the recommended approach for deploying multiple Bifrost nodes in OSS deployments.
|
||||
|
||||
<Warning>
|
||||
Running multiple OSS Bifrost nodes with a Postgres backend is not supported.
|
||||
|
||||
Here is the short technical explanation:
|
||||
|
||||
- Bifrost is designed to keep all critical information in memory, including provider configs, API keys, budgets, usage, and traffic distribution.
|
||||
- Once a node is initialized, it does not read this information back from the database.
|
||||
- In the Enterprise version, we use a slightly modified version of RAFT to synchronize this state in real time across nodes, while the database acts only as a dumb store.
|
||||
- Based on our current view, OSS is sufficient for startups and medium-scale teams, and can easily handle around 3,000–5,000 RPS on a single instance.
|
||||
- If you need high availability and enterprise capabilities such as real-time synchronization, the Enterprise plan is the right fit.
|
||||
- And yes, that is part of how we draw the OSS vs Enterprise line 💰.
|
||||
</Warning>
|
||||
|
||||
### OSS vs Enterprise
|
||||
|
||||
| Aspect | OSS Approach | Enterprise Approach |
|
||||
|--------|--------------|---------------------|
|
||||
| **Configuration Source** | Shared `config.json` file | Database with P2P sync |
|
||||
| **Sync Mechanism** | File sharing (ConfigMap, volumes) | Gossip protocol (real-time) |
|
||||
| **Config Updates** | Modify file + restart nodes | UI/API with automatic propagation |
|
||||
|
||||
---
|
||||
|
||||
## How It Works
|
||||
|
||||
All configuration in Bifrost is loaded into memory at startup. For OSS multinode deployments, the recommended approach is to use `config.json` **without** `config_store` enabled.
|
||||
|
||||
### `config.json` as Single Source of Truth
|
||||
|
||||
When you deploy without `config_store`:
|
||||
|
||||
- **No database involved** - `config.json` is the only configuration source
|
||||
- **Shared file** - All nodes read from the same `config.json` file
|
||||
- **Identical configuration** - Since the source is shared, all nodes automatically have the same configuration
|
||||
- **No sync needed** - The shared file itself ensures consistency
|
||||
|
||||
<Frame>
|
||||
<img src="/media/oss-multinode.png" alt="OSS multi-node setup" />
|
||||
</Frame>
|
||||
---
|
||||
|
||||
## Why Not Use `config_store` for Multinode OSS?
|
||||
|
||||
Using `config_store` (database-backed configuration) with multiple nodes in OSS creates a **synchronization problem**:
|
||||
|
||||
1. **Config changes are local** - When you update configuration via the UI or API, it updates the database and the in-memory config on that specific node only
|
||||
2. **No propagation mechanism** - Other nodes don't know about the change; they keep their existing in-memory configuration
|
||||
3. **Nodes become out of sync** - Different nodes end up with different configurations
|
||||
4. **Restart required** - You'd have to restart all nodes after every config change to bring them back in sync
|
||||
|
||||
This defeats the purpose of having database-backed configuration with real-time updates.
|
||||
|
||||
<Warning>
|
||||
Without P2P clustering (Enterprise feature), there's no mechanism to notify other nodes of configuration changes. For OSS multinode deployments, use the shared `config.json` approach instead.
|
||||
</Warning>
|
||||
|
||||
### Enterprise Solution
|
||||
|
||||
Bifrost Enterprise includes **P2P clustering** with gossip protocol that automatically syncs configuration changes across all nodes in real-time. See the [Clustering documentation](/enterprise/clustering) for details.
|
||||
|
||||
---
|
||||
|
||||
## Setting Up Multinode OSS Deployment
|
||||
|
||||
### Example config.json
|
||||
|
||||
Create a `config.json` with `config_store` disabled. `logs_store` is optional; the example below enables it with PostgreSQL:
|
||||
|
||||
<Note>
|
||||
If you use PostgreSQL for `logs_store`, ensure the target database is UTF8 encoded. See [PostgreSQL UTF8 Requirement](../../quickstart/gateway/setting-up#postgresql-utf8-requirement).
|
||||
</Note>
|
||||
|
||||
```json
|
||||
{
|
||||
"$schema": "https://www.getbifrost.ai/schema",
|
||||
"client": {
|
||||
"drop_excess_requests": false,
|
||||
"enable_logging": false
|
||||
},
|
||||
"config_store": {
|
||||
"enabled": false
|
||||
},
|
||||
"logs_store": {
|
||||
"enabled": true,
|
||||
"type": "postgres",
|
||||
"config": {...}
|
||||
},
|
||||
"providers": {
|
||||
"openai": {
|
||||
"keys": [
|
||||
{
|
||||
"name": "openai-primary",
|
||||
"value": "env.OPENAI_API_KEY",
|
||||
"models": ["gpt-4o", "gpt-4o-mini"],
|
||||
"weight": 1.0
|
||||
}
|
||||
]
|
||||
},
|
||||
"anthropic": {
|
||||
"keys": [
|
||||
{
|
||||
"name": "anthropic-primary",
|
||||
"value": "env.ANTHROPIC_API_KEY",
|
||||
"models": ["claude-sonnet-4-20250514", "claude-3-5-haiku-20241022"],
|
||||
"weight": 1.0
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
<Note>
|
||||
Notice `config_store` is disabled. This ensures all configuration comes from the file only.
|
||||
</Note>
|
||||
|
||||
### Kubernetes Deployment
|
||||
|
||||
Use a ConfigMap to share the same configuration across all pods:
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: ConfigMap
|
||||
metadata:
|
||||
name: bifrost-config
|
||||
namespace: default
|
||||
data:
|
||||
config.json: |
|
||||
{
|
||||
"$schema": "https://www.getbifrost.ai/schema",
|
||||
"client": {
|
||||
"drop_excess_requests": false,
|
||||
"enable_logging": false
|
||||
},
|
||||
"config_store": {
|
||||
"enabled": false
|
||||
},
|
||||
"logs_store": {
|
||||
"enabled": true,
|
||||
"type": "postgres",
|
||||
"config": {...}
|
||||
},
|
||||
"providers": {
|
||||
"openai": {
|
||||
"keys": [
|
||||
{
|
||||
"name": "openai-primary",
|
||||
"value": "env.OPENAI_API_KEY",
|
||||
"models": ["gpt-4o", "gpt-4o-mini"],
|
||||
"weight": 1.0
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
---
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: bifrost
|
||||
namespace: default
|
||||
spec:
|
||||
replicas: 3
|
||||
selector:
|
||||
matchLabels:
|
||||
app: bifrost
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
app: bifrost
|
||||
spec:
|
||||
containers:
|
||||
- name: bifrost
|
||||
image: maximhq/bifrost:latest
|
||||
ports:
|
||||
- containerPort: 8080
|
||||
name: http
|
||||
env:
|
||||
- name: OPENAI_API_KEY
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: provider-secrets
|
||||
key: openai-api-key
|
||||
volumeMounts:
|
||||
- name: config
|
||||
mountPath: /app
|
||||
readOnly: true
|
||||
resources:
|
||||
requests:
|
||||
cpu: 250m
|
||||
memory: 256Mi
|
||||
limits:
|
||||
cpu: 1000m
|
||||
memory: 1Gi
|
||||
livenessProbe:
|
||||
httpGet:
|
||||
path: /health
|
||||
port: 8080
|
||||
initialDelaySeconds: 10
|
||||
periodSeconds: 10
|
||||
readinessProbe:
|
||||
httpGet:
|
||||
path: /health
|
||||
port: 8080
|
||||
initialDelaySeconds: 5
|
||||
periodSeconds: 5
|
||||
volumes:
|
||||
- name: config
|
||||
configMap:
|
||||
name: bifrost-config
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: bifrost
|
||||
namespace: default
|
||||
spec:
|
||||
type: LoadBalancer
|
||||
selector:
|
||||
app: bifrost
|
||||
ports:
|
||||
- port: 80
|
||||
targetPort: 8080
|
||||
protocol: TCP
|
||||
name: http
|
||||
```
|
||||
|
||||
### Docker Compose
|
||||
|
||||
Share the configuration using a bind mount:
|
||||
|
||||
```yaml
|
||||
version: '3.8'
|
||||
|
||||
services:
|
||||
nginx:
|
||||
image: nginx:alpine
|
||||
ports:
|
||||
- "80:80"
|
||||
volumes:
|
||||
- ./nginx.conf:/etc/nginx/nginx.conf:ro
|
||||
depends_on:
|
||||
- bifrost-1
|
||||
- bifrost-2
|
||||
- bifrost-3
|
||||
|
||||
bifrost-1:
|
||||
image: maximhq/bifrost:latest
|
||||
environment:
|
||||
- OPENAI_API_KEY=${OPENAI_API_KEY}
|
||||
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
|
||||
volumes:
|
||||
- ./config.json:/app/config.json:ro
|
||||
expose:
|
||||
- "8080"
|
||||
|
||||
bifrost-2:
|
||||
image: maximhq/bifrost:latest
|
||||
environment:
|
||||
- OPENAI_API_KEY=${OPENAI_API_KEY}
|
||||
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
|
||||
volumes:
|
||||
- ./config.json:/app/config.json:ro
|
||||
expose:
|
||||
- "8080"
|
||||
|
||||
bifrost-3:
|
||||
image: maximhq/bifrost:latest
|
||||
environment:
|
||||
- OPENAI_API_KEY=${OPENAI_API_KEY}
|
||||
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
|
||||
volumes:
|
||||
- ./config.json:/app/config.json:ro
|
||||
expose:
|
||||
- "8080"
|
||||
```
|
||||
|
||||
**nginx.conf** for load balancing:
|
||||
|
||||
```nginx
|
||||
events {
|
||||
worker_connections 1024;
|
||||
}
|
||||
|
||||
http {
|
||||
upstream bifrost {
|
||||
least_conn;
|
||||
server bifrost-1:8080;
|
||||
server bifrost-2:8080;
|
||||
server bifrost-3:8080;
|
||||
}
|
||||
|
||||
server {
|
||||
listen 80;
|
||||
|
||||
location / {
|
||||
proxy_pass http://bifrost;
|
||||
proxy_set_header Host $host;
|
||||
proxy_set_header X-Real-IP $remote_addr;
|
||||
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
|
||||
proxy_connect_timeout 60s;
|
||||
proxy_send_timeout 60s;
|
||||
proxy_read_timeout 60s;
|
||||
}
|
||||
|
||||
location /health {
|
||||
access_log off;
|
||||
return 200 "healthy\n";
|
||||
}
|
||||
}
|
||||
}
|
||||
```

### Bare Metal / VM Deployment

For bare metal or VM deployments, distribute the configuration file using one of:

- **NFS mount** - Mount a shared NFS directory containing `config.json`
- **rsync** - Sync the config file from a central location to all nodes
- **Configuration management** - Use Ansible, Chef, or Puppet to deploy identical configs

Example with rsync:

```bash
# On the config server: push the file to every node
for node in node1 node2 node3; do
  rsync -avz /etc/bifrost/config.json "$node":/etc/bifrost/config.json
done

# Restart each node so it loads the updated config
for node in node1 node2 node3; do
  ssh "$node" "systemctl restart bifrost"
done
```
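After a push like the one above, it can help to confirm that every node ended up with the same file. The following is a minimal sketch, not part of Bifrost: `all_same` is a hypothetical helper that succeeds only when all checksums passed to it are identical, and the commented usage assumes the node names and path from the rsync example.

```bash
#!/usr/bin/env bash
# Sketch: succeed only if every checksum passed as an argument is identical.
# all_same is a hypothetical helper, not a Bifrost command.
all_same() {
  first="$1"
  for sum in "$@"; do
    [ "$sum" = "$first" ] || return 1
  done
  return 0
}

# Hypothetical usage (node names and path taken from the rsync example):
#   sums=$(for node in node1 node2 node3; do
#     ssh "$node" "sha256sum /etc/bifrost/config.json" | cut -d' ' -f1
#   done)
#   all_same $sums && echo "configs in sync" || echo "config drift detected"
```

Comparing checksums rather than timestamps avoids false alarms when a file was re-copied without changing.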

---

## Updating Configuration

To update configuration in a multinode OSS deployment:

1. **Modify the shared `config.json` file**
   - Update the ConfigMap (Kubernetes)
   - Edit the shared file (Docker Compose / bare metal)

2. **Restart the nodes**
   - Rolling restart is supported - nodes can be restarted one at a time
   - Each node picks up the new configuration on startup

### Kubernetes Rolling Restart

```bash
# Update ConfigMap
kubectl apply -f configmap.yaml

# Trigger rolling restart
kubectl rollout restart deployment/bifrost

# Watch the rollout
kubectl rollout status deployment/bifrost
```

### Docker Compose Restart

```bash
# After updating config.json, restart the nodes one at a time
docker-compose restart bifrost-1
docker-compose restart bifrost-2
docker-compose restart bifrost-3
```
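To keep capacity available during a restart, each node can be given time to come back before the next one goes down. The sketch below illustrates that pattern under stated assumptions: `rolling_restart`, `restart_node`, and `node_healthy` are hypothetical names, and you supply the last two for your environment.

```bash
#!/usr/bin/env bash
# Sketch: restart nodes one at a time, waiting for each to report healthy.
# rolling_restart, restart_node and node_healthy are hypothetical names;
# define restart_node and node_healthy for your own environment.
rolling_restart() {
  max_attempts="$1"; shift
  for node in "$@"; do
    restart_node "$node" || return 1
    attempt=0
    # Poll until the node is healthy, giving up after max_attempts failures.
    until node_healthy "$node"; do
      attempt=$((attempt + 1))
      [ "$attempt" -ge "$max_attempts" ] && return 1
      sleep "${POLL_INTERVAL:-2}"
    done
  done
  return 0
}

# Example definitions (assumptions, adapt to your setup):
#   restart_node() { docker-compose restart "$1"; }
#   node_healthy() { curl -fsS "http://$1:8080/health" >/dev/null; }
#   rolling_restart 30 bifrost-1 bifrost-2 bifrost-3
```

The example `node_healthy` only works from a machine that can reach each node on port 8080. In the Compose stack above the ports are only exposed on the internal network, so the check would have to run from inside that network.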

---

## Best Practices

### Use Environment Variables for Secrets

Never put API keys directly in `config.json`. Use the `env.` prefix to reference environment variables:

```json
{
  "providers": {
    "openai": {
      "keys": [
        {
          "value": "env.OPENAI_API_KEY"
        }
      ]
    }
  }
}
```

Then provide the actual keys via environment variables or Kubernetes secrets.
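
A quick preflight check can catch a missing variable before a node starts. The sketch below is not a Bifrost command: `missing_env_refs` is a hypothetical helper, and it assumes every reference appears as a whole JSON string value of the form `"env.NAME"`, as in the example above.

```bash
#!/usr/bin/env bash
# Sketch: print the names of env.* references in a config file whose
# variables are unset or empty in the current environment.
# missing_env_refs is a hypothetical helper, not a Bifrost command.
missing_env_refs() {
  config_file="$1"
  # Extract every "env.NAME" string, strip quotes and prefix, de-duplicate.
  grep -o '"env\.[A-Za-z_][A-Za-z0-9_]*"' "$config_file" \
    | sed 's/^"env\.//; s/"$//' \
    | sort -u \
    | while read -r name; do
        # printenv only sees exported variables, which is what a child process inherits.
        [ -n "$(printenv "$name")" ] || echo "$name"
      done
}

# Hypothetical usage:
#   missing=$(missing_env_refs /etc/bifrost/config.json)
#   [ -z "$missing" ] || { echo "unset variables: $missing" >&2; exit 1; }
```

Because the check only reads variable names, it never prints a secret value.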

### Load Balancer Configuration

Always put a load balancer in front of your Bifrost nodes:

- **Kubernetes**: Use a Service with `type: LoadBalancer` or an Ingress
- **Docker/VMs**: Use nginx, HAProxy, or a cloud load balancer

### Health Checks

Configure health checks so that traffic only goes to healthy nodes. Both probes use the same endpoint:

- **Liveness endpoint**: `GET /health`
- **Readiness endpoint**: `GET /health`

### Resource Allocation

For production deployments, set explicit resource requests and limits:

```yaml
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 2000m
    memory: 2Gi
```

---

## Summary

| Scenario | Recommendation |
|----------|----------------|
| Single node | Use `config_store` for UI access |
| Multinode OSS | Use shared `config.json` without `config_store` |
| Multinode Enterprise | Use P2P clustering with `config_store` |

For OSS multinode deployments, the shared `config.json` approach provides a simple, reliable way to keep all nodes in sync without the complexity of database synchronization.
185
docs/deployment-guides/how-to/nginx-reverse-proxy.mdx
Normal file
@@ -0,0 +1,185 @@
---
title: "NGINX reverse proxy"
description: "Run Bifrost behind NGINX with streaming-safe settings for SSE and WebSocket traffic"
icon: "shuffle"
---

This guide shows how to put NGINX in front of Bifrost for TLS termination, centralized routing, and load balancing.

<Note>
Incoming reverse-proxy behavior is configured in your infrastructure layer (NGINX/Ingress), not in `config.json`.
</Note>

---

## When to use this setup

- You want HTTPS termination in front of Bifrost.
- You run multiple Bifrost replicas and want L7 load balancing.
- You need one stable gateway URL for SDKs and agent clients.

---

## Docker Compose deployment

Use this when Bifrost and NGINX run as services in the same Compose project.

```yaml
services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - bifrost-1
      - bifrost-2
      - bifrost-3

  bifrost-1:
    image: maximhq/bifrost:latest
    expose:
      - "8080"

  bifrost-2:
    image: maximhq/bifrost:latest
    expose:
      - "8080"

  bifrost-3:
    image: maximhq/bifrost:latest
    expose:
      - "8080"
```

```nginx
events {
    worker_connections 1024;
}

http {
    upstream bifrost_backend {
        least_conn;
        server bifrost-1:8080;
        server bifrost-2:8080;
        server bifrost-3:8080;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://bifrost_backend;

            # Preserve original request context
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

            # Keep streaming responses stable
            proxy_http_version 1.1;
            proxy_buffering off;
            proxy_request_buffering off;
            proxy_read_timeout 300s;
            proxy_send_timeout 300s;
        }
    }
}
```

If you expose WebSocket traffic through the same endpoint, add upgrade headers in the same `location /` block:

```nginx
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
```

---

## VM or bare-metal deployment

Use the same NGINX `location /` settings as above, and point the `upstream` servers to hostnames or IPs reachable from that VM.

If you terminate TLS directly on NGINX, add the following to the `server` block:

```nginx
listen 443 ssl;
server_name bifrost.example.com;
ssl_certificate /etc/nginx/certs/fullchain.pem;
ssl_certificate_key /etc/nginx/certs/privkey.pem;
```

---

## Kubernetes (NGINX Ingress)

If you deploy with Helm, use Ingress values instead of a standalone NGINX config:

```yaml
ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-buffering: "off"
  hosts:
    - host: bifrost.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: bifrost-tls
      hosts:
        - bifrost.example.com
```

---

## Verify the proxy path

```bash
# Docker Compose: render the final Compose file and validate its syntax
docker compose config

# Kubernetes: validate the Ingress manifest locally
kubectl apply --dry-run=client -f ingress.yaml
```

```bash
# Health check through the reverse proxy
curl -i http://bifrost.example.com/health

# Streaming check through NGINX (-N disables curl's own output buffering)
curl -N http://bifrost.example.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "stream": true,
    "messages": [{"role": "user", "content": "test stream"}]
  }'
```

If streaming responses arrive in delayed bursts, confirm that buffering is disabled in the NGINX config or in the Ingress annotations.

---

## Related guides

- [Helm quick start](/deployment-guides/helm)
- [Helm values reference](/deployment-guides/helm/values)
- [Multinode deployment](/deployment-guides/how-to/multinode)

---

## Runnable example files

Use the complete Docker Compose and Helm/Kubernetes examples in the repository:

- [docker-compose.yml](https://github.com/maximhq/bifrost/blob/main/examples/configs/withnginxreverseproxy/docker-compose.yml)
- [helm-values.yaml](https://github.com/maximhq/bifrost/blob/main/examples/configs/withnginxreverseproxy/helm-values.yaml)
- [k8s-ingress.yaml](https://github.com/maximhq/bifrost/blob/main/examples/configs/withnginxreverseproxy/k8s-ingress.yaml)
1691
docs/deployment-guides/k8s.mdx
Normal file
File diff suppressed because it is too large