first commit
322
docs/features/telemetry.mdx
Normal file
@@ -0,0 +1,322 @@
---
title: "Telemetry"
description: "Comprehensive Prometheus-based monitoring for Bifrost Gateway with custom metrics and labels."
icon: "gauge"
---

## Overview

Bifrost provides built-in telemetry and monitoring through Prometheus metrics collection. The telemetry system tracks both HTTP-level performance metrics and upstream provider interactions, giving you visibility into your AI gateway's performance and usage patterns.

**Key Features:**
- **Prometheus Integration** - Native metrics collection at the `/metrics` endpoint
- **Comprehensive Tracking** - Success/error rates, token usage, costs, and cache performance
- **Custom Labels** - Configurable dimensions for detailed analysis
- **Dynamic Headers** - Runtime label injection via `x-bf-prom-*` headers
- **Cost Monitoring** - Real-time tracking of AI provider costs in USD
- **Cache Analytics** - Direct and semantic cache hit tracking
- **Async Collection** - Metrics are recorded asynchronously, off the request path
- **Multi-Level Tracking** - HTTP transport + upstream provider metrics

The telemetry plugin operates asynchronously so that metrics collection doesn't impact request latency or connection performance.

---

## Default Metrics

### HTTP Transport Metrics

These metrics track all incoming HTTP requests to Bifrost:

| Metric | Type | Description |
|--------|------|-------------|
| `http_requests_total` | Counter | Total number of HTTP requests |
| `http_request_duration_seconds` | Histogram | Duration of HTTP requests |
| `http_request_size_bytes` | Histogram | Size of incoming HTTP requests |
| `http_response_size_bytes` | Histogram | Size of outgoing HTTP responses |

Labels:
- `path`: HTTP endpoint path
- `method`: HTTP verb (e.g., `GET`, `POST`, `PUT`, `DELETE`)
- `status`: HTTP status code
- custom labels: Custom labels configured in the Bifrost configuration

### Upstream Provider Metrics

These metrics track requests forwarded to AI providers:

| Metric | Type | Description | Labels |
|--------|------|-------------|---------|
| `bifrost_upstream_requests_total` | Counter | Total requests forwarded to upstream providers | Base Labels, custom labels |
| `bifrost_success_requests_total` | Counter | Total successful requests to upstream providers | Base Labels, custom labels |
| `bifrost_error_requests_total` | Counter | Total failed requests to upstream providers | Base Labels, `status_code`, custom labels |
| `bifrost_upstream_latency_seconds` | Histogram | Latency of upstream provider requests | Base Labels, `is_success`, custom labels |
| `bifrost_input_tokens_total` | Counter | Total input tokens sent to upstream providers | Base Labels, custom labels |
| `bifrost_output_tokens_total` | Counter | Total output tokens received from upstream providers | Base Labels, custom labels |
| `bifrost_cache_hits_total` | Counter | Total cache hits by type (direct/semantic) | Base Labels, `cache_type`, custom labels |
| `bifrost_cost_total` | Counter | Total cost in USD for upstream provider requests | Base Labels, custom labels |

Base Labels:
- `provider`: AI provider name (e.g., `openai`, `anthropic`, `azure`)
- `model`: Model name (e.g., `gpt-4o-mini`, `claude-3-sonnet`)
- `method`: Request type (`chat`, `text`, `embedding`, `speech`, `transcription`)
- `virtual_key_id`: Virtual key ID
- `virtual_key_name`: Virtual key name
- `routing_engines_used`: Comma-separated routing engines used (`routing-rule`, `governance`, `loadbalancing`)
- `routing_rule_id`: Routing rule ID that matched the request
- `routing_rule_name`: Routing rule name that matched the request
- `selected_key_id`: ID of the key that successfully served the request (`null` on final errors)
- `selected_key_name`: Name of the key that successfully served the request (`null` on final errors)
- `number_of_retries`: Number of retries
- `fallback_index`: Fallback index (0 for first attempt, 1 for second attempt, etc.)
- custom labels: Custom labels configured in the Bifrost configuration

### Streaming Metrics

These metrics capture latency characteristics specific to streaming responses:

| Metric | Type | Description | Labels |
|--------|------|-------------|---------|
| `bifrost_stream_first_token_latency_seconds` | Histogram | Time from request start to first streamed token | Base Labels |
| `bifrost_stream_inter_token_latency_seconds` | Histogram | Latency between subsequent streamed tokens | Base Labels |
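
Because these metrics are histograms, percentiles such as a p95 time-to-first-token have to be estimated from cumulative bucket counts. The sketch below illustrates the linear-interpolation idea behind PromQL's `histogram_quantile()` in simplified form. The bucket boundaries and counts are hypothetical, not Bifrost's actual bucket layout, and `estimate_quantile` is an illustrative helper rather than part of any library.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound_seconds, cumulative_count), sorted by
    upper bound, with a final +Inf bucket holding the total count.
    """
    total = buckets[-1][1]
    if total == 0:
        return None  # no observations recorded
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            in_bucket = count - lower_count
            if in_bucket == 0:
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical buckets for bifrost_stream_first_token_latency_seconds.
first_token_buckets = [(0.1, 20), (0.25, 60), (0.5, 90), (1.0, 98), (math.inf, 100)]
p95 = estimate_quantile(0.95, first_token_buckets)
print(f"estimated p95 first-token latency: {p95:.4f}s")
```

The estimate is only as precise as the bucket boundaries allow: every observation inside a bucket is treated as evenly distributed between its lower and upper bound.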

---

## Monitoring Examples

### Success Rate Monitoring
Track the success rate of requests to different providers:

```promql
# Success rate (%) by provider
sum by (provider) (rate(bifrost_success_requests_total[5m])) /
sum by (provider) (rate(bifrost_upstream_requests_total[5m])) * 100
```
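
As a rough illustration of what the ratio in the query above computes, the sketch below derives the same percentage from two counter readings taken some time apart. The readings and the `success_rate_percent` helper are hypothetical. Prometheus' `rate()` also handles counter resets and extrapolation, which this sketch ignores.

```python
def success_rate_percent(success_start, success_end, total_start, total_end):
    """Percentage of upstream requests that succeeded between two counter readings.

    Simplified stand-in for the PromQL ratio: no counter-reset handling.
    """
    succeeded = success_end - success_start
    total = total_end - total_start
    if total <= 0:
        return None  # no traffic in the window, so the ratio is undefined
    return 100 * succeeded / total


# Hypothetical readings of bifrost_success_requests_total and
# bifrost_upstream_requests_total, taken five minutes apart.
print(success_rate_percent(1_000, 1_190, 1_050, 1_250))  # 95.0
```

Both counters must be read over the same window. Dividing increases from different windows gives a ratio that does not correspond to any real time period.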

### Token Usage Analysis
Monitor token consumption across different models:

```promql
# Input tokens per minute by model
sum by (model) (increase(bifrost_input_tokens_total[1m]))

# Output tokens per minute by model
sum by (model) (increase(bifrost_output_tokens_total[1m]))

# Token efficiency (output/input ratio) by model
sum by (model) (rate(bifrost_output_tokens_total[5m])) /
sum by (model) (rate(bifrost_input_tokens_total[5m]))
```

### Cost Tracking
Monitor spending across providers and models:

```promql
# Cost per second by provider
sum by (provider) (rate(bifrost_cost_total[1m]))

# Daily cost estimate
sum by (provider) (increase(bifrost_cost_total[1d]))

# Cost per request by provider and model
sum by (provider, model) (rate(bifrost_cost_total[5m])) /
sum by (provider, model) (rate(bifrost_upstream_requests_total[5m]))
```
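
The first query returns USD per second, which is hard to read at a glance. A minimal sketch of the arithmetic for turning such a rate into a projected daily figure and a per-request cost follows. The input numbers and helper names are hypothetical, and the projection assumes the current rate holds for the whole day.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def projected_daily_cost(usd_per_second):
    """Extrapolate a per-second spend rate to a full day."""
    return usd_per_second * SECONDS_PER_DAY


def cost_per_request(usd_per_second, requests_per_second):
    """Average USD per request, or None when there is no traffic."""
    if requests_per_second == 0:
        return None
    return usd_per_second / requests_per_second


# Hypothetical values as returned by the rate() queries above.
spend_rate = 0.001    # USD per second
request_rate = 2.0    # requests per second
print(f"projected daily cost: ${projected_daily_cost(spend_rate):.2f}")
print(f"cost per request: ${cost_per_request(spend_rate, request_rate):.4f}")
```

Unlike this projection, `increase(bifrost_cost_total[1d])` reports what was actually spent over the last 24 hours.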

### Cache Performance
Track cache effectiveness. `bifrost_cache_hits_total` carries an extra `cache_type` label, so the request counter is collapsed to a single total before dividing:

```promql
# Cache hit rate (%) by type
sum by (cache_type) (rate(bifrost_cache_hits_total[5m])) /
scalar(sum(rate(bifrost_upstream_requests_total[5m]))) * 100

# Direct vs semantic cache hits
sum by (cache_type) (rate(bifrost_cache_hits_total[5m]))
```

### Error Rate Analysis
Monitor error patterns. `bifrost_error_requests_total` carries an extra `status_code` label, so aggregate both sides before dividing:

```promql
# Error rate (%) by provider
sum by (provider) (rate(bifrost_error_requests_total[5m])) /
sum by (provider) (rate(bifrost_upstream_requests_total[5m])) * 100

# Errors by model
sum by (model) (rate(bifrost_error_requests_total[5m]))
```

---

## Configuration

Configure custom Prometheus labels to add dimensions for filtering and analysis:

<Tabs group="config-method">
<Tab title="Web UI">

1. **Navigate to Configuration**
   - Open Bifrost UI at `http://localhost:8080`
   - Go to **Config** tab

2. **Prometheus Labels**
   ```
   Custom Labels: team, environment, organization, project
   ```

</Tab>
<Tab title="API">

```bash
# Update prometheus labels via API
curl -X PATCH http://localhost:8080/config \
  -H "Content-Type: application/json" \
  -d '{
    "client": {
      "prometheus_labels": ["team", "environment", "organization", "project"]
    }
  }'
```

</Tab>
<Tab title="config.json">

```json
{
  "client": {
    "prometheus_labels": ["team", "environment", "organization", "project"],
    "drop_excess_requests": false,
    "initial_pool_size": 300
  }
}
```

</Tab>
</Tabs>

### Dynamic Label Injection

Add custom label values at runtime using `x-bf-prom-*` headers:

```bash
# Add custom labels to specific requests
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-prom-team: engineering" \
  -H "x-bf-prom-environment: production" \
  -H "x-bf-prom-organization: my-org" \
  -H "x-bf-prom-project: my-project" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

**Header Format:**
- Prefix: `x-bf-prom-`
- Label name: Any string after the prefix
- Value: String value for the label
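
When requests are sent from application code rather than `curl`, the header format above can be applied with a small helper. The following is a minimal sketch: `prom_label_headers` is a hypothetical function name, and the label names simply mirror the ones used in this page's examples.

```python
PROM_HEADER_PREFIX = "x-bf-prom-"


def prom_label_headers(labels):
    """Turn {"team": "engineering"} into {"x-bf-prom-team": "engineering"}."""
    return {f"{PROM_HEADER_PREFIX}{name}": str(value) for name, value in labels.items()}


# Merge the label headers with the regular request headers.
headers = {
    "Content-Type": "application/json",
    **prom_label_headers({"team": "engineering", "environment": "production"}),
}
print(headers)
```

The resulting dictionary can be passed to whichever HTTP client the application already uses.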

---

## Infrastructure Setup

### Development & Testing

For local development and testing, use the provided Docker Compose setup:

```bash
# Navigate to telemetry plugin directory
cd plugins/telemetry

# Start Prometheus and Grafana
docker-compose up -d

# Access endpoints
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000 (admin/admin)
# Bifrost metrics: http://localhost:8080/metrics
```

<Warning>
**Development Only**: The provided Docker Compose setup is for testing purposes only. Do not use it in production without proper security, scaling, and persistence configuration.
</Warning>

You can use the Prometheus scraping endpoint to create your own Grafana dashboards. Below are a few examples created using the Docker Compose setup.

### Production Deployment

For production environments:

1. **Deploy Prometheus** with proper persistence, retention, and security
2. **Configure scraping** to target your Bifrost instances at `/metrics`
3. **Set up Grafana** with authentication and dashboards
4. **Configure alerts** based on your SLA requirements

**Prometheus Scrape Configuration:**
```yaml
scrape_configs:
  - job_name: "bifrost-gateway"
    static_configs:
      - targets: ["bifrost-instance-1:8080", "bifrost-instance-2:8080"]
    scrape_interval: 30s
    metrics_path: /metrics
    # If Bifrost auth is enabled, add:
    # basic_auth:
    #   username: '<admin_username>'
    #   password: '<admin_password>'
```

<Info>
If you have Bifrost authentication enabled (`auth_config`), you must include `basic_auth` in the scrape config with your `admin_username` and `admin_password`. See the [Prometheus docs](/features/observability/prometheus#pull-based-scraping) for details.
</Info>
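
To check by hand that the endpoint accepts the same credentials Prometheus will use, a request with an HTTP Basic `Authorization` header can be built as sketched below. The host, username, and password are placeholders, and `build_metrics_request` is a hypothetical helper. The sketch only constructs the request, and the commented line shows how it could be sent.

```python
import base64
import urllib.request


def build_metrics_request(base_url, username=None, password=None):
    """Build a GET request for /metrics, adding Basic auth when credentials are given."""
    request = urllib.request.Request(f"{base_url.rstrip('/')}/metrics")
    if username is not None and password is not None:
        token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder credentials; substitute your admin_username and admin_password.
request = build_metrics_request("http://localhost:8080", "admin", "secret")
print(request.full_url)
# To send it: urllib.request.urlopen(request, timeout=5).read().decode("utf-8")
```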

### Production Alerting Examples

Configure alerts for critical scenarios using the metrics described above:

**High Error Rate Alert:**
```yaml
- alert: BifrostHighErrorRate
  expr: sum by (provider) (rate(bifrost_error_requests_total[5m])) / sum by (provider) (rate(bifrost_upstream_requests_total[5m])) > 0.05
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "High error rate detected for provider {{ $labels.provider }} ({{ $value | humanizePercentage }})"
```

**High Cost Alert:**
```yaml
- alert: BifrostHighCosts
  expr: sum by (provider) (increase(bifrost_cost_total[1d])) > 100 # $100/day threshold
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Daily cost for provider {{ $labels.provider }} exceeds $100 ({{ $value | printf \"%.2f\" }})"
```

**Cache Performance Alert:**
```yaml
- alert: BifrostLowCacheHitRate
  expr: sum by (provider) (rate(bifrost_cache_hits_total[15m])) / sum by (provider) (rate(bifrost_upstream_requests_total[15m])) < 0.1
  for: 5m
  labels:
    severity: info
  annotations:
    summary: "Cache hit rate for provider {{ $labels.provider }} below 10% ({{ $value | humanizePercentage }})"
```

---

## Next Steps

- **[Prometheus Documentation](https://prometheus.io/docs/)** - Official Prometheus guides
- **[Grafana Setup](https://grafana.com/docs/)** - Dashboard creation and management
- **[Tracing](./observability/default)** - Request/response logging for detailed analysis