---
title: "Prometheus"
description: "Monitor Bifrost metrics with Prometheus scraping or Push Gateway for multi-node deployments"
icon: "chart-line"
---

## Overview

Bifrost makes its Prometheus metrics available in two ways:

1. **Pull-based (Scraping)**: a standard `/metrics` endpoint that Prometheus scrapes.
2. **Push-based (Push Gateway)**: Bifrost pushes metrics to a Prometheus Push Gateway, which suits clustered deployments.

<Note>
**For multi-node deployments**: use the Push Gateway method so that metrics from every node are collected. When Prometheus scrapes through a load balancer, each scrape reaches only one node, so metrics from the other nodes can be missed.
</Note>

---

## Pull-based Scraping

Bifrost exposes a `/metrics` endpoint automatically whenever the telemetry plugin is enabled, which it is by default. No additional Bifrost configuration is needed.

<Info>
When Bifrost authentication is enabled (`auth_config.is_enabled = true`), the `/metrics` endpoint requires Basic auth. Configure Prometheus with the same `admin_username` and `admin_password` that are set in your `auth_config`. Otherwise every scrape receives `401 Unauthorized` and no Bifrost metrics are collected; the only visible symptom is the target showing as down (`up == 0`) in Prometheus.
</Info>

### Prometheus Configuration

Add Bifrost to your Prometheus `prometheus.yml`:

```yaml
scrape_configs:
  - job_name: 'bifrost'
    static_configs:
      - targets: ['bifrost-host:8080']
    scrape_interval: 15s
```

If Bifrost authentication is enabled, add `basic_auth` to your scrape config:

```yaml
scrape_configs:
  - job_name: 'bifrost'
    static_configs:
      - targets: ['bifrost-host:8080']
    scrape_interval: 15s
    basic_auth:
      username: '<admin_username>'
      password: '<admin_password>'
```
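
If scrapes fail with `401 Unauthorized`, you can test the credentials outside Prometheus by requesting `/metrics` yourself. The minimal Python sketch below uses only the standard library; the host, port and credentials are the placeholders from the config above, and the helper names are illustrative rather than part of any Bifrost tooling.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Return an HTTP Basic auth header value: "Basic " + base64("username:password")."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def fetch_metrics(url: str, username: str, password: str, timeout: float = 5.0) -> str:
    """GET a /metrics endpoint with Basic auth and return the exposition text.

    urllib raises urllib.error.HTTPError (e.g. code 401) if the credentials are rejected.
    """
    request = urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(username, password)}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example (placeholder host and credentials; replace with your own):
#   text = fetch_metrics("http://bifrost-host:8080/metrics", "<admin_username>", "<admin_password>")
#   print(text[:500])
```

`basic_auth_header` produces the same standard `Authorization: Basic ...` header that Prometheus sends for a `basic_auth` block, so a successful request here means the scrape config credentials should work too.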

### Endpoint

```
GET /metrics
```

Returns metrics in the Prometheus text exposition format.
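
The text exposition format is line-oriented: optional `# HELP` and `# TYPE` comment lines, then one sample per line in the form `name{label="value",...} value [timestamp]`. For ad-hoc checks in scripts, a minimal Python parser is sketched below. It is only an illustration: it skips comments, does not unescape label values and ignores rarer edge cases, so a maintained parser such as `prometheus_client.parser.text_string_to_metric_families` is the better choice for anything serious. The sample input uses metric names from this page with made-up values.

```python
import re

# One sample line: metric_name{label="value",...} value [timestamp]
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$')
# One label pair inside the braces; escaped characters are kept as-is (not unescaped).
LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text: str) -> list:
    """Return (metric_name, labels, value) tuples for every sample line in `text`."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and # HELP / # TYPE comments carry no samples
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # not a sample line this simple parser understands
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_PAIR.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))  # float() accepts "+Inf" and "NaN"
    return samples


# Illustrative input: metric names from this page, made-up values.
SAMPLE = """\
# TYPE bifrost_upstream_requests_total counter
bifrost_upstream_requests_total{provider="openai",model="gpt-4o"} 42
http_request_duration_seconds_bucket{le="+Inf"} 7
"""

for name, labels, value in parse_exposition(SAMPLE):
    print(name, labels, value)
```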

---

## Push-based (Push Gateway)

For multi-node cluster deployments, the telemetry plugin can push metrics to a [Prometheus Push Gateway](https://github.com/prometheus/pushgateway). Because every node pushes its own metrics, all nodes are captured regardless of how the load balancer routes traffic.
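
A Push Gateway stores pushed metrics in groups, each identified by a grouping key that is encoded in the push URL as `/metrics/job/<job>{/<label>/<value>}` (see the Push Gateway README). Assuming Bifrost uses `job_name` and `instance_id` as that grouping key, which the `instance_id` description in the configuration table suggests but this page does not state, each node lands in its own group. The hypothetical Python helper below only builds such a URL; it is not Bifrost's implementation.

```python
from typing import Optional
from urllib.parse import quote


def _path_segment(value: str) -> str:
    """Percent-encode one grouping-key value for use as a URL path segment."""
    if not value or "/" in value:
        # The Push Gateway README describes a separate base64 ("@base64") form for
        # empty values and values containing "/"; this sketch does not implement it.
        raise ValueError(f"unsupported grouping-key value: {value!r}")
    return quote(value, safe="")


def grouping_key_url(
    gateway_url: str, job_name: str = "bifrost", instance_id: Optional[str] = None
) -> str:
    """Build <gateway>/metrics/job/<job_name>[/instance/<instance_id>]."""
    parts = [gateway_url.rstrip("/"), "metrics", "job", _path_segment(job_name)]
    if instance_id:
        parts += ["instance", _path_segment(instance_id)]
    return "/".join(parts)


# prints http://pushgateway:9091/metrics/job/bifrost/instance/bifrost-node-1
print(grouping_key_url("http://pushgateway:9091", "bifrost", "bifrost-node-1"))
```

Keep in mind that the Push Gateway keeps the last values pushed to a group until that group is deleted, so a node that goes away keeps exposing its final values.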

### Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `push_gateway_url` | `string` | ✅ Yes | - | Push Gateway URL (e.g., `http://pushgateway:9091`) |
| `job_name` | `string` | ❌ No | `bifrost` | Job label for pushed metrics |
| `instance_id` | `string` | ❌ No | hostname | Instance identifier for metric grouping |
| `push_interval` | `integer` | ❌ No | `15` | Push interval in seconds (1-300) |
| `basic_auth` | `object` | ❌ No | - | Basic auth credentials |
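
The defaults and limits in this table can be read as a validation step. The Python sketch below is hypothetical (it is not how Bifrost parses its config) and applies only what the tables on this page state: `push_gateway_url` is required, `job_name` defaults to `bifrost`, `instance_id` to the host name, `push_interval` to 15 with an allowed range of 1 to 300 seconds, and `basic_auth` needs both fields.

```python
import socket
from dataclasses import dataclass
from typing import Optional


@dataclass
class PushGatewayConfig:
    push_gateway_url: str
    job_name: str = "bifrost"
    instance_id: str = ""
    push_interval: int = 15
    basic_auth: Optional[dict] = None


def load_push_gateway_config(raw: dict) -> PushGatewayConfig:
    """Validate a `push_gateway` config object and fill in the documented defaults."""
    url = raw.get("push_gateway_url")
    if not url:
        raise ValueError("push_gateway_url is required")

    interval = raw.get("push_interval", 15)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(interval, bool) or not isinstance(interval, int) or not 1 <= interval <= 300:
        raise ValueError("push_interval must be an integer from 1 to 300 (seconds)")

    auth = raw.get("basic_auth")
    if auth is not None and not (auth.get("username") and auth.get("password")):
        raise ValueError("basic_auth needs both username and password")

    return PushGatewayConfig(
        push_gateway_url=url,
        job_name=raw.get("job_name") or "bifrost",
        # Assumption: "hostname" in the table means the OS host name.
        instance_id=raw.get("instance_id") or socket.gethostname(),
        push_interval=interval,
        basic_auth=auth,
    )


config = load_push_gateway_config({"push_gateway_url": "http://pushgateway:9091"})
print(config)
```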

### Basic Auth Configuration

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `username` | `string` | ✅ Yes | Basic auth username |
| `password` | `string` | ✅ Yes | Basic auth password |

---

## Setup

<Tabs group="setup-method">
<Tab title="UI">

1. Navigate to **Observability** → **Prometheus** in the Bifrost UI.
2. The `/metrics` endpoint is shown at the top of the page for use in your scrape configuration.
3. To enable the Push Gateway:
   - Enter the **Push Gateway URL**
   - Configure **Job Name** and **Push Interval** as needed
   - Optionally set a custom **Instance ID**
   - Enable **Basic Authentication** if your Push Gateway requires it
   - Toggle **Enable Push Gateway** on
   - Click **Save Prometheus Configuration**

</Tab>
<Tab title="Config File">

```json
{
  "plugins": [
    {
      "name": "telemetry",
      "enabled": true,
      "config": {
        "push_gateway": {
          "enabled": true,
          "push_gateway_url": "http://pushgateway:9091",
          "job_name": "bifrost",
          "push_interval": 15
        }
      }
    }
  ]
}
```

### With Basic Auth

```json
{
  "plugins": [
    {
      "name": "telemetry",
      "enabled": true,
      "config": {
        "push_gateway": {
          "enabled": true,
          "push_gateway_url": "http://pushgateway:9091",
          "job_name": "bifrost",
          "push_interval": 15,
          "instance_id": "bifrost-node-1",
          "basic_auth": {
            "username": "admin",
            "password": "secret"
          }
        }
      }
    }
  ]
}
```

</Tab>
</Tabs>

---

## Available Metrics

The following metrics are available from both the `/metrics` endpoint and the Push Gateway:

### HTTP Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `http_requests_total` | Counter | Total HTTP requests by path, method, status |
| `http_request_duration_seconds` | Histogram | HTTP request latency |
| `http_request_size_bytes` | Histogram | Request body size |
| `http_response_size_bytes` | Histogram | Response body size |

### Bifrost LLM Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `bifrost_upstream_requests_total` | Counter | Total requests to LLM providers |
| `bifrost_upstream_latency_seconds` | Histogram | Provider request latency |
| `bifrost_success_requests_total` | Counter | Successful provider requests |
| `bifrost_error_requests_total` | Counter | Failed provider requests |
| `bifrost_input_tokens_total` | Counter | Total input tokens processed |
| `bifrost_output_tokens_total` | Counter | Total output tokens generated |
| `bifrost_cost_total` | Counter | Total cost in USD |
| `bifrost_cache_hits_total` | Counter | Cache hits by type |
| `bifrost_stream_first_token_latency_seconds` | Histogram | Time to first token (streaming) |
| `bifrost_stream_inter_token_latency_seconds` | Histogram | Inter-token latency (streaming) |
| `bifrost_key_rotation_events_total` | Counter | Per-attempt retry/rotation events with key identifiers (see below) <sup>v1.5.0-prerelease4+</sup> |
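
Several of the metrics above are histograms. Prometheus exposes each one as cumulative `_bucket` series labelled with an upper bound `le`, and PromQL's `histogram_quantile()` estimates a percentile by finding the bucket that contains the target rank and interpolating linearly inside it. The Python function below is a simplified re-implementation for illustration only, and the bucket counts are made up.

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with the +Inf bucket, like the
    `_bucket` series of a Prometheus histogram (upper bound = the `le` label).
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket plus the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for index, (upper_bound, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return buckets[index - 1][0]
            if index == 0 and upper_bound <= 0:
                return upper_bound
            # Linear interpolation within the bucket that contains the rank.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan


# Made-up latency buckets: 10 requests <= 0.1s, 60 <= 0.5s, 90 <= 1s, 100 in total.
latency_buckets = [(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]
print(histogram_quantile(0.5, latency_buckets))  # about 0.42
```

In PromQL the same estimate is normally taken over rates, for example `histogram_quantile(0.95, sum by (le, provider) (rate(bifrost_upstream_latency_seconds_bucket[5m])))` for the 95th-percentile provider latency.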

### Default Labels

Bifrost LLM metrics carry these labels (`bifrost_key_rotation_events_total` uses its own label set, described below):

- `provider` - LLM provider name
- `model` - Model identifier
- `method` - Request type (chat, completion, embedding, etc.)
- `virtual_key_id` / `virtual_key_name` - Virtual key identifiers
- `selected_key_id` / `selected_key_name` - API key that successfully served the request (`""` when all attempts failed)
- `number_of_retries` - Total attempts minus one (across all keys)
- `fallback_index` - Fallback position
- `team_id` / `team_name` - Team identifiers (if governance is enabled)
- `customer_id` / `customer_name` - Customer identifiers (if governance is enabled)

<Note>
**v1.5.0-prerelease4+**: `selected_key_id` / `selected_key_name` are populated only when the request succeeds. When the request ultimately fails, both are empty; use `bifrost_key_rotation_events_total` or the `attempt_trail` log field to see which keys were tried.
</Note>

### Key Rotation Events <sup>v1.5.0-prerelease4+</sup>

`bifrost_key_rotation_events_total` is incremented once per **failed attempt** (not once per request), which gives you time-series visibility into retry pressure:

| Label | Values | Description |
|-------|--------|-------------|
| `provider` | e.g. `openai` | LLM provider |
| `requested_model` | e.g. `gpt-4o` | Model as requested (before any alias resolution) |
| `key_id` | UUID | The provider API key that failed on this attempt |
| `key_name` | string | Human-readable name of the provider API key |
| `fail_reason` | error type string | Provider error type (e.g. `rate_limit_error`, `network_error`) |
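
The difference between the per-request labels and this per-attempt counter is easiest to see on a single request. The Python sketch below is an illustrative model, not Bifrost's code: the `Attempt` record and `summarize` helper are hypothetical, and they encode only the rules stated on this page (`number_of_retries` is attempts minus one, `selected_key_*` is empty when every attempt fails, and each failed attempt adds one rotation event). Key IDs and names are placeholders.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attempt:
    """Hypothetical record of one attempt: which key was used and, if it failed, why."""
    key_id: str
    key_name: str
    fail_reason: Optional[str] = None  # None means the attempt succeeded


def summarize(provider: str, requested_model: str, attempts: list):
    """Derive per-request labels and per-attempt rotation events for one request."""
    success = next((a for a in attempts if a.fail_reason is None), None)
    request_labels = {
        "selected_key_id": success.key_id if success else "",
        "selected_key_name": success.key_name if success else "",
        "number_of_retries": str(len(attempts) - 1),
    }
    # One bifrost_key_rotation_events_total increment per failed attempt.
    rotation_events = Counter(
        (provider, requested_model, a.key_id, a.key_name, a.fail_reason)
        for a in attempts
        if a.fail_reason is not None
    )
    return request_labels, rotation_events


attempts = [
    Attempt("k1", "primary", "rate_limit_error"),
    Attempt("k1", "primary", "rate_limit_error"),
    Attempt("k2", "backup"),
]
labels, events = summarize("openai", "gpt-4o", attempts)
print(labels)
print(dict(events))
```

Here the request succeeds on the third attempt, so it is reported once with `number_of_retries="2"` and `selected_key_name="backup"`, while the rotation counter grows by two for the failing `primary` key.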

**Example queries:**

```promql
# Failed-attempt rate per provider, broken down by failure reason
sum by (provider, fail_reason) (
  rate(bifrost_key_rotation_events_total[5m])
)

# Which specific keys are hitting rate limits most often
topk(5, sum by (provider, key_name, fail_reason) (
  rate(bifrost_key_rotation_events_total{fail_reason="rate_limit_error"}[1h])
))
```

---

## Push Gateway Setup

If you don't have a Push Gateway running, deploy one:

### Docker

```bash
docker run -d -p 9091:9091 prom/pushgateway
```

### Kubernetes (Helm)

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install pushgateway prometheus-community/prometheus-pushgateway
```

### Configure Prometheus to Scrape Push Gateway

Add to your `prometheus.yml`:

```yaml
scrape_configs:
  - job_name: 'pushgateway'
    honor_labels: true
    static_configs:
      - targets: ['pushgateway:9091']
```

<Note>
The `honor_labels: true` setting is important: it preserves the `job` and `instance` labels pushed by Bifrost instead of overwriting them with the labels of the Push Gateway scrape target (`job="pushgateway"`, `instance="pushgateway:9091"`).
</Note>
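
Prometheus documents how `honor_labels` resolves a clash between labels in the scraped data and the labels it attaches to the target: with `true` the scraped values win, and with `false` the scraped values are kept under `exported_<name>` (for example `exported_job`) while the target's values take the original names. The small Python model below illustrates that rule; the function name is hypothetical and the real logic lives inside Prometheus.

```python
def attach_target_labels(scraped: dict, target: dict, honor_labels: bool) -> dict:
    """Merge target labels into a scraped series' labels, resolving conflicts.

    honor_labels=True: keep the scraped value and drop the conflicting target label.
    honor_labels=False: move the scraped value to exported_<name>, then apply the target label.
    Non-conflicting target labels are attached in both cases.
    """
    result = dict(scraped)
    for name, value in target.items():
        if name in result:
            if honor_labels:
                continue
            result[f"exported_{name}"] = result[name]
        result[name] = value
    return result


pushed = {"job": "bifrost", "instance": "bifrost-node-1", "provider": "openai"}
target = {"job": "pushgateway", "instance": "pushgateway:9091"}
print(attach_target_labels(pushed, target, honor_labels=True))
print(attach_target_labels(pushed, target, honor_labels=False))
```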

---

## Pull vs Push: When to Use Each

| Scenario | Recommended Method |
|----------|-------------------|
| Single Bifrost instance | Pull (scraping) |
| Multiple instances, each reachable directly by Prometheus | Pull (scraping) |
| Multiple instances behind a load balancer | **Push (Push Gateway)** |
| Kubernetes with service mesh | Pull or Push |
| Serverless / ephemeral instances | **Push (Push Gateway)** |

### Why Push for Clusters?

When multiple Bifrost instances run behind a load balancer:

1. **Scraping randomness**: each scrape is answered by whichever node the load balancer picks, so a single scraped series mixes values from different nodes and misses the rest.
2. **Instance tracking**: each node pushes under its own `instance` label, so the Push Gateway keeps per-instance metrics separate.
3. **Aggregation**: downstream tools (Grafana, Datadog) can then aggregate across all instances.
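
Point 1 can be illustrated with a toy model: two nodes behind a load balancer, each with its own request counter, scraped through a single address. The Python below is a hypothetical simulation (round-robin routing and the counter values are assumptions); the scraped series alternates between nodes, and every drop in it looks like a counter reset to PromQL's `rate()`.

```python
import itertools


def scrape_through_load_balancer(node_counters: list) -> list:
    """Simulate scraping one LB address; scrape i is answered by node i % N (round robin).

    node_counters[n][i] is node n's counter value at scrape time i.
    """
    nodes = itertools.cycle(range(len(node_counters)))
    return [node_counters[next(nodes)][i] for i in range(len(node_counters[0]))]


def apparent_resets(series: list) -> int:
    """Count decreases, which rate() treats as counter resets."""
    return sum(1 for prev, cur in zip(series, series[1:]) if cur < prev)


node_a = [100, 110, 120, 130]  # busy node
node_b = [5, 6, 7, 8]          # quiet node
scraped = scrape_through_load_balancer([node_a, node_b])
print(scraped, apparent_resets(scraped))
```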

---

## Troubleshooting

### Push Gateway Connection Failed

```
failed to push metrics to push gateway: connection refused
```

- Verify that the Push Gateway URL is correct and reachable from the Bifrost host
- Check firewall rules between Bifrost and the Push Gateway
- Ensure the Push Gateway is running: `curl http://pushgateway:9091/metrics`

### Metrics Not Appearing

- Verify that the telemetry plugin is enabled (it is required for metrics collection)
- Check the Bifrost logs for push errors
- Verify that Prometheus scrapes the Push Gateway with `honor_labels: true`

### Authentication Failed

- Double-check the username and password
- Ensure basic auth is actually configured on the Push Gateway side
- Check whether the credentials contain special characters that need escaping, such as `"` or `\` inside a JSON config file