---
title: "Telemetry"
description: "Comprehensive Prometheus-based monitoring for Bifrost Gateway with custom metrics and labels."
icon: "gauge"
---
## Overview
Bifrost provides built-in telemetry and monitoring capabilities through Prometheus metrics collection. The telemetry system tracks both HTTP-level performance metrics and upstream provider interactions, giving you visibility into your AI gateway's performance and usage patterns.

**Key Features:**

- **Prometheus Integration** - Native metrics collection at the `/metrics` endpoint
- **Comprehensive Tracking** - Success/error rates, token usage, costs, and cache performance
- **Custom Labels** - Configurable dimensions for detailed analysis
- **Dynamic Headers** - Runtime label injection via `x-bf-prom-*` headers
- **Cost Monitoring** - Real-time tracking of AI provider costs in USD
- **Cache Analytics** - Direct and semantic cache hit tracking
- **Async Collection** - Metrics are recorded asynchronously, off the request path
- **Multi-Level Tracking** - HTTP transport + upstream provider metrics

The telemetry plugin operates asynchronously so that metrics collection doesn't add to request latency or affect connection performance.

---
## Default Metrics
### HTTP Transport Metrics
These metrics track all incoming HTTP requests to Bifrost:

| Metric | Type | Description |
|--------|------|-------------|
| `http_requests_total` | Counter | Total number of HTTP requests |
| `http_request_duration_seconds` | Histogram | Duration of HTTP requests |
| `http_request_size_bytes` | Histogram | Size of incoming HTTP requests |
| `http_response_size_bytes` | Histogram | Size of outgoing HTTP responses |

Labels:

- `path`: HTTP endpoint path
- `method`: HTTP verb (e.g., `GET`, `POST`, `PUT`, `DELETE`)
- `status`: HTTP status code
- custom labels: Custom labels configured in the Bifrost configuration
### Upstream Provider Metrics
These metrics track requests forwarded to AI providers:

| Metric | Type | Description | Labels |
|--------|------|-------------|---------|
| `bifrost_upstream_requests_total` | Counter | Total requests forwarded to upstream providers | Base Labels, custom labels |
| `bifrost_success_requests_total` | Counter | Total successful requests to upstream providers | Base Labels, custom labels |
| `bifrost_error_requests_total` | Counter | Total failed requests to upstream providers | Base Labels, `status_code`, custom labels |
| `bifrost_upstream_latency_seconds` | Histogram | Latency of upstream provider requests | Base Labels, `is_success`, custom labels |
| `bifrost_input_tokens_total` | Counter | Total input tokens sent to upstream providers | Base Labels, custom labels |
| `bifrost_output_tokens_total` | Counter | Total output tokens received from upstream providers | Base Labels, custom labels |
| `bifrost_cache_hits_total` | Counter | Total cache hits by type (direct/semantic) | Base Labels, `cache_type`, custom labels |
| `bifrost_cost_total` | Counter | Total cost in USD for upstream provider requests | Base Labels, custom labels |

Base Labels:

- `provider`: AI provider name (e.g., `openai`, `anthropic`, `azure`)
- `model`: Model name (e.g., `gpt-4o-mini`, `claude-3-sonnet`)
- `method`: Request type (`chat`, `text`, `embedding`, `speech`, `transcription`)
- `virtual_key_id`: Virtual key ID
- `virtual_key_name`: Virtual key name
- `routing_engines_used`: Comma-separated routing engines used (`routing-rule`, `governance`, `loadbalancing`)
- `routing_rule_id`: Routing rule ID that matched the request
- `routing_rule_name`: Routing rule name that matched the request
- `selected_key_id`: ID of the key that successfully served the request (`null` on final errors)
- `selected_key_name`: Name of the key that successfully served the request (`null` on final errors)
- `number_of_retries`: Number of retries
- `fallback_index`: Fallback index (0 for first attempt, 1 for second attempt, etc.)
- custom labels: Custom labels configured in the Bifrost configuration
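
Because `bifrost_upstream_latency_seconds` (like the other histograms) is exposed as cumulative buckets, percentiles are estimated rather than measured exactly. In PromQL this is usually done with `histogram_quantile`, e.g. `histogram_quantile(0.95, sum by (le, provider) (rate(bifrost_upstream_latency_seconds_bucket[5m])))`, following the standard Prometheus `_bucket`/`le` convention. The Python sketch below is a simplified model of that linear interpolation for illustration only; `estimate_quantile` and its `buckets` input format are hypothetical, not part of Bifrost.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) sorted by upper_bound and
    ending with (math.inf, total_count), like a Prometheus histogram's `le` buckets.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be in [0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # No observations: nothing to estimate.
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            # Assume observations are spread evenly inside the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound
```

With buckets `[(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]`, the 95th percentile lands in the `0.5`-`1.0` bucket and is interpolated to roughly 0.78 seconds, which shows why bucket boundaries limit percentile accuracy.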
### Streaming Metrics
These metrics capture latency characteristics specific to streaming responses:

| Metric | Type | Description | Labels |
|--------|------|-------------|---------|
| `bifrost_stream_first_token_latency_seconds` | Histogram | Time from request start to first streamed token | Base Labels |
| `bifrost_stream_inter_token_latency_seconds` | Histogram | Latency between subsequent streamed tokens | Base Labels |
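
To make the two streaming metrics concrete, here is a minimal sketch of how they relate to chunk arrival times. It assumes one timestamp per streamed chunk and treats each gap between consecutive chunks as one inter-token sample; Bifrost's exact observation points may differ, and `stream_latencies` is a hypothetical helper.

```python
def stream_latencies(request_start, token_times):
    """Split token arrival timestamps (seconds) into first-token and inter-token latencies."""
    if not token_times:
        return None, []
    # Time from request start to the first streamed token.
    first_token_latency = token_times[0] - request_start
    # Gap between each pair of consecutive tokens.
    inter_token = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    return first_token_latency, inter_token
```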
---
## Monitoring Examples
### Success Rate Monitoring
Track the success rate of requests to different providers:
```promql
# Success rate (%) by provider
sum by (provider) (rate(bifrost_success_requests_total[5m])) /
sum by (provider) (rate(bifrost_upstream_requests_total[5m])) * 100
```
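
For intuition, the query above divides two counter increases over the same window. This hypothetical helper does the same with two snapshots of the counters (for example, read from `/metrics` a few minutes apart), treating a counter that went down as a restart. It skips PromQL's extrapolation at window edges, so results can differ slightly from `rate()`.

```python
def counter_increase(before, after):
    """Increase of a monotonic counter between two samples; a drop means it was reset."""
    return after if after < before else after - before


def success_rate(before, after):
    """Success rate (%) per provider from two counter snapshots.

    Each snapshot maps provider -> (success_requests_total, upstream_requests_total).
    """
    rates = {}
    for provider, (succ_after, total_after) in after.items():
        succ_before, total_before = before.get(provider, (0, 0))
        requests = counter_increase(total_before, total_after)
        if requests == 0:
            continue  # No traffic in the window (PromQL would return NaN for 0/0).
        successes = counter_increase(succ_before, succ_after)
        rates[provider] = successes / requests * 100
    return rates
```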
### Token Usage Analysis
Monitor token consumption across different models:
```promql
# Input tokens per minute by model (averaged over 5m)
sum by (model) (rate(bifrost_input_tokens_total[5m])) * 60
# Output tokens per minute by model (averaged over 5m)
sum by (model) (rate(bifrost_output_tokens_total[5m])) * 60
# Token efficiency (output/input ratio) by model
sum by (model) (rate(bifrost_output_tokens_total[5m])) /
sum by (model) (rate(bifrost_input_tokens_total[5m]))
```
### Cost Tracking
Monitor spending across providers and models:
```promql
# Cost per second by provider
sum by (provider) (rate(bifrost_cost_total[1m]))
# Total cost over the last 24 hours by provider
sum by (provider) (increase(bifrost_cost_total[1d]))
# Cost per request by provider and model
sum by (provider, model) (rate(bifrost_cost_total[5m])) /
sum by (provider, model) (rate(bifrost_upstream_requests_total[5m]))
```
### Cache Performance
Track cache effectiveness:
```promql
# Cache hit rate (%) by cache type
sum by (cache_type) (rate(bifrost_cache_hits_total[5m])) /
ignoring(cache_type) group_left
sum(rate(bifrost_upstream_requests_total[5m])) * 100
# Direct vs semantic cache hits per second
sum by (cache_type) (rate(bifrost_cache_hits_total[5m]))
```
### Error Rate Analysis
Monitor error patterns:
```promql
# Error rate (%) by provider
sum by (provider) (rate(bifrost_error_requests_total[5m])) /
sum by (provider) (rate(bifrost_upstream_requests_total[5m])) * 100
# Errors per second by model
sum by (model) (rate(bifrost_error_requests_total[5m]))
```
---
## Configuration
Configure custom Prometheus labels to add dimensions for filtering and analysis. You can set them in the UI, through the API, or in the configuration file.

**Using the UI:**

1. **Navigate to Configuration**
   - Open the Bifrost UI at `http://localhost:8080`
   - Go to the **Config** tab
2. **Prometheus Labels** - enter the label names as a comma-separated list:
   ```
   Custom Labels: team, environment, organization, project
   ```

**Using the API:**

```bash
# Update prometheus labels via API
curl -X PATCH http://localhost:8080/config \
  -H "Content-Type: application/json" \
  -d '{
    "client": {
      "prometheus_labels": ["team", "environment", "organization", "project"]
    }
  }'
```

**Using the configuration file:**

```json
{
  "client": {
    "prometheus_labels": ["team", "environment", "organization", "project"],
    "drop_excess_requests": false,
    "initial_pool_size": 300
  }
}
```
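
If you script configuration changes, the same API call can be made from Python's standard library. This sketch only builds the request that mirrors the `curl` call above (same `PATCH /config` endpoint and payload); `build_label_update` is a hypothetical helper name. Send the request with `urllib.request.urlopen(req)` once Bifrost is running.

```python
import json
import urllib.request


def build_label_update(base_url, labels):
    """Build (but do not send) a PATCH /config request that sets prometheus_labels."""
    body = json.dumps({"client": {"prometheus_labels": list(labels)}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/config",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )
```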
### Dynamic Label Injection
Add custom label values at runtime using `x-bf-prom-*` headers:
```bash
# Add custom labels to specific requests
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-prom-team: engineering" \
  -H "x-bf-prom-environment: production" \
  -H "x-bf-prom-organization: my-org" \
  -H "x-bf-prom-project: my-project" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
**Header Format:**
- Prefix: `x-bf-prom-`
- Label name: Any string after the prefix
- Value: String value for the label
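
From application code, the headers can be generated from a plain dictionary. Below is a minimal sketch using Python's standard library that reuses the endpoint, model, and label names from the `curl` example; the helper names are hypothetical, and the request is built but not sent (pass it to `urllib.request.urlopen` to send it).

```python
import json
import urllib.request

PROM_HEADER_PREFIX = "x-bf-prom-"


def prom_label_headers(labels):
    """Turn {"team": "engineering"} into {"x-bf-prom-team": "engineering"}."""
    return {f"{PROM_HEADER_PREFIX}{name}": str(value) for name, value in labels.items()}


def build_chat_request(base_url, model, prompt, labels):
    """Build (but do not send) a chat completion request carrying custom label headers."""
    headers = {"Content-Type": "application/json", **prom_label_headers(labels)}
    body = json.dumps({"model": model, "messages": [{"role": "user", "content": prompt}]})
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=body.encode("utf-8"),
        headers=headers,
        method="POST",
    )
```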
---
## Infrastructure Setup
### Development & Testing
For local development and testing, use the provided Docker Compose setup:
```bash
# Navigate to telemetry plugin directory
cd plugins/telemetry
# Start Prometheus and Grafana
docker-compose up -d
# Access endpoints
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000 (admin/admin)
# Bifrost metrics: http://localhost:8080/metrics
```
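
To inspect the raw metrics without Prometheus, you can fetch `http://localhost:8080/metrics` (for example with `urllib.request.urlopen("http://localhost:8080/metrics").read().decode()`) and filter the Prometheus text exposition format yourself, e.g. with `parse_metrics(text, "bifrost_")`. The parser below is a deliberately simplified, hypothetical sketch: it does not unescape label values or validate the format. For real tooling, the official `prometheus_client` package provides `prometheus_client.parser.text_string_to_metric_families`.

```python
import re

# A sample line in the text format: name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text, prefix=""):
    """Yield (name, labels, value) for sample lines whose metric name starts with prefix."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # Skip blank lines and # HELP / # TYPE comments.
        match = SAMPLE_RE.match(line)
        if not match or not match.group(1).startswith(prefix):
            continue
        name, raw_labels, value = match.groups()
        # Escape sequences inside label values are left as-is.
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)
```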
**Development Only**: The provided Docker Compose setup is for testing purposes only. Do not use it in production without proper security, scaling, and persistence configuration.

You can use the Prometheus scraping endpoint to build your own Grafana dashboards. A few example dashboards created with the Docker Compose setup are shown below.

### Production Deployment
For production environments:
1. **Deploy Prometheus** with proper persistence, retention, and security
2. **Configure scraping** to target your Bifrost instances at `/metrics`
3. **Set up Grafana** with authentication and dashboards
4. **Configure alerts** based on your SLA requirements

**Prometheus Scrape Configuration:**

```yaml
scrape_configs:
  - job_name: "bifrost-gateway"
    static_configs:
      - targets: ["bifrost-instance-1:8080", "bifrost-instance-2:8080"]
    scrape_interval: 30s
    metrics_path: /metrics
    # If Bifrost auth is enabled, add:
    # basic_auth:
    #   username: "<admin_username>"
    #   password: "<admin_password>"
```
If you have Bifrost authentication enabled (`auth_config`), you must include `basic_auth` in the scrape config with your `admin_username` and `admin_password`. See the [Prometheus docs](/features/observability/prometheus#pull-based-scraping) for details.
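
Before adding credentials to the scrape config, it can help to check them by hand. Prometheus's `basic_auth` sends a standard HTTP `Authorization: Basic` header, which this hypothetical helper reproduces for the same `admin_username`/`admin_password` pair. It only builds the request; pass it to `urllib.request.urlopen` to try the credentials against a running instance.

```python
import base64
import urllib.request


def build_metrics_request(base_url, username=None, password=None):
    """Build a GET /metrics request, adding HTTP Basic auth when a username is given."""
    req = urllib.request.Request(f"{base_url.rstrip('/')}/metrics", method="GET")
    if username is not None:
        # Basic auth is base64("username:password").
        token = base64.b64encode(f"{username}:{password or ''}".encode("utf-8")).decode("ascii")
        req.add_header("Authorization", f"Basic {token}")
    return req
```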
### Production Alerting Examples
Example alerting rules for critical scenarios. Each rule below is a single entry for the `rules:` list of a Prometheus rule group:

**High Error Rate Alert:**

```yaml
- alert: BifrostHighErrorRate
  expr: sum by (provider) (rate(bifrost_error_requests_total[5m])) / sum by (provider) (rate(bifrost_upstream_requests_total[5m])) > 0.05
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "High error rate detected for provider {{ $labels.provider }} ({{ $value | humanizePercentage }})"
```

**High Cost Alert:**

```yaml
- alert: BifrostHighCosts
  expr: sum by (provider) (increase(bifrost_cost_total[1d])) > 100 # $100/day threshold
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Daily cost for provider {{ $labels.provider }} exceeds $100 ({{ $value | printf \"%.2f\" }})"
```

**Cache Performance Alert:**

```yaml
- alert: BifrostLowCacheHitRate
  expr: sum by (provider) (rate(bifrost_cache_hits_total[15m])) / sum by (provider) (rate(bifrost_upstream_requests_total[15m])) < 0.1
  for: 5m
  labels:
    severity: info
  annotations:
    summary: "Cache hit rate for provider {{ $labels.provider }} below 10% ({{ $value | humanizePercentage }})"
```
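
The `for:` field in these rules means an alert only fires after its expression has been true on every evaluation for at least that long; until then it is *pending*. Here is a simplified model of that state machine (it ignores other rule settings and evaluation timing details), using a hypothetical `alert_states` helper:

```python
def alert_states(samples, for_seconds):
    """Simulate alert states for one series.

    samples: list of (timestamp_seconds, condition_true) in evaluation order.
    Returns "inactive", "pending", or "firing" for each evaluation.
    """
    states = []
    active_since = None
    for ts, condition in samples:
        if not condition:
            active_since = None  # Condition cleared: the pending timer resets.
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts  # First evaluation where the condition holds.
        states.append("firing" if ts - active_since >= for_seconds else "pending")
    return states
```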
---
## Next Steps
- **[Prometheus Documentation](https://prometheus.io/docs/)** - Official Prometheus guides
- **[Grafana Setup](https://grafana.com/docs/)** - Dashboard creation and management
- **[Tracing](./observability/default)** - Request/response logging for detailed analysis