first commit

2026-04-26 21:52:23 +03:00
commit 880f412e2c
2662 changed files with 866266 additions and 0 deletions
--- a/docs/enterprise/datadog-connector.mdx
+++ b/docs/enterprise/datadog-connector.mdx
@@ -0,0 +1,548 @@
+---
+title: "Datadog"
+description: "Native Datadog integration for APM traces, LLM Observability, and metrics"
+icon: "dog"
+---
+
+## Overview
+
+<Frame>
+  <img src="/media/dd-trace.png" alt="Datadog LLM Observability dashboard" />
+</Frame>
+
+The **Datadog plugin** provides native integration with the Datadog observability platform, offering three pillars of observability for your LLM operations:
+
+- **APM Traces** - Distributed tracing via dd-trace-go v2 with W3C Trace Context support for end-to-end request visibility
+- **LLM Observability** - Native Datadog LLM Obs integration for AI/ML-specific monitoring
+- **Metrics** - Operational metrics via DogStatsD or the Metrics API
+
+Unlike the [OTel plugin](/features/observability/otel), which sends generic OpenTelemetry data, the Datadog plugin uses Datadog's native SDKs for tighter integration with Datadog-specific features such as LLM Observability dashboards and ML App grouping.
+
+---
+
+## Deployment Modes
+
+<Frame>
+    <img src="/media/dd-mode.png" alt="Datadog plugin deployment modes" />
+</Frame>
+
+The plugin supports two deployment modes:
+
+| Mode | Description | Requirements | Best For |
+|------|-------------|--------------|----------|
+| **Agent** (default) | Sends data through a local Datadog Agent | Datadog Agent running on host | Production deployments with existing agent infrastructure |
+| **Agentless** | Sends data directly to Datadog APIs | API key only | Serverless, containers, or simplified deployments |
+
+### Agent Mode
+
+In agent mode, the plugin communicates with a locally running Datadog Agent:
+
+- **APM Traces** → Agent at `localhost:8126`
+- **Metrics** → DogStatsD at `localhost:8125`
+
+The agent handles batching and retries, and provides lower latency. This is the recommended mode for production deployments where the Datadog Agent is already installed.
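+
+If the agent is reachable at a different address than the defaults above, the two address fields can be overridden. The fragment below is a minimal sketch of the plugin's `config` object; the hostname `datadog-agent` is a hypothetical placeholder for wherever your agent listens, not a value the plugin defines.
+
+```json
+{
+  "agent_addr": "datadog-agent:8126",
+  "dogstatsd_addr": "datadog-agent:8125"
+}
+```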
+
+### Agentless Mode
+
+In agentless mode, the plugin sends data directly to Datadog's intake APIs:
+
+- **APM Traces** → `https://trace.agent.{site}`
+- **LLM Observability** → Direct API submission
+- **Metrics** → Datadog Metrics API
+
+This mode requires an API key but simplifies deployment by eliminating the need for a local agent. It suits serverless environments, Kubernetes pods, and quick testing.
+
+---
+
+## Configuration
+
+### Configuration Fields
+
+| Field | Type | Required | Default | Description |
+|-------|------|----------|---------|-------------|
+| `service_name` | `string` | No | `bifrost` | Service name displayed in Datadog APM |
+| `ml_app` | `string` | No | (uses `service_name`) | ML application name for LLM Observability grouping |
+| `agent_addr` | `string` | No | `localhost:8126` | Datadog Agent address (agent mode only) |
+| `dogstatsd_addr` | `string` | No | `localhost:8125` | DogStatsD server address (agent mode only) |
+| `env` | `string` | No | - | Environment tag (e.g., `production`, `staging`) |
+| `version` | `string` | No | - | Service version tag |
+| `custom_tags` | `object` | No | - | Additional tags for all traces and metrics |
+| `enable_metrics` | `bool` | No | `true` | Enable metrics emission |
+| `enable_traces` | `bool` | No | `true` | Enable APM traces |
+| `enable_llm_obs` | `bool` | No | `true` | Enable LLM Observability |
+| `agentless` | `bool` | No | `false` | Use agentless mode (direct API) |
+| `api_key` | `EnvVar` | Agentless only | - | Datadog API key (supports `env.VAR_NAME`) |
+| `site` | `string` | No | `datadoghq.com` | Datadog site/region |
+
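+As an illustration of how the `enable_*` flags combine, the following `config` fragment keeps APM traces and LLM Observability on while turning metrics off. It is a sketch with illustrative values and assumes that omitted fields fall back to the defaults listed in the table.
+
+```json
+{
+  "service_name": "bifrost",
+  "enable_metrics": false,
+  "enable_traces": true,
+  "enable_llm_obs": true
+}
+```
+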
+### Environment Variable Substitution
+
+The `api_key` and `custom_tags` fields support environment variable substitution using the `env.` prefix:
+
+```json
+{
+  "api_key": "env.DD_API_KEY",
+  "custom_tags": {
+    "team": "env.TEAM_NAME",
+    "cost_center": "env.COST_CENTER"
+  }
+}
+```
+
+---
+
+## Setup
+
+<Tabs group="setup-method">
+<Tab title="UI">
+
+<Frame>
+    <img src="/media/dd-config-page.png" alt="Datadog plugin configuration page in the Bifrost UI" />
+</Frame>
+
+Configure the Datadog plugin through the Bifrost UI:
+
+1. Navigate to **Settings** → **Plugins**
+2. Enable the **Datadog** plugin
+3. Configure the required fields based on your deployment mode
+
+</Tab>
+<Tab title="Go SDK">
+
+```go
+package main
+
+import (
+    "context"
+    bifrost "github.com/maximhq/bifrost/core"
+    "github.com/maximhq/bifrost/core/schemas"
+    "github.com/maximhq/bifrost/framework/modelcatalog"
+    datadog "github.com/maximhq/bifrost-enterprise/plugins/datadog"
+)
+
+func main() {
+    ctx := context.Background()
+    logger := schemas.NewLogger()
+    
+    // Initialize model catalog (required for cost calculation)
+    modelCatalog := modelcatalog.NewModelCatalog(logger)
+    
+    // Agent mode configuration
+    ddPlugin, err := datadog.Init(ctx, &datadog.Config{
+        ServiceName: "my-llm-service",
+        Env:         "production",
+        Version:     "1.0.0",
+        CustomTags: map[string]string{
+            "team": "platform",
+        },
+    }, logger, modelCatalog, "1.0.0")
+    if err != nil {
+        panic(err)
+    }
+    
+    // Initialize Bifrost with the plugin.
+    // yourAccount is a placeholder for your own account implementation.
+    client, err := bifrost.Init(ctx, schemas.BifrostConfig{
+        Account: &yourAccount,
+        Plugins: []schemas.Plugin{ddPlugin},
+    })
+    if err != nil {
+        panic(err)
+    }
+    defer client.Shutdown()
+    
+    // All requests are now traced to Datadog
+}
+```
+
+For agentless mode:
+
+```go
+// Agentless mode configuration
+enableAgentless := true
+ddPlugin, err := datadog.Init(ctx, &datadog.Config{
+    ServiceName: "my-llm-service",
+    Env:         "production",
+    Agentless:   &enableAgentless,
+    APIKey:      &schemas.EnvVar{EnvVarName: "DD_API_KEY"},
+    Site:        "datadoghq.com",
+}, logger, modelCatalog, "1.0.0")
+```
+
+</Tab>
+<Tab title="config.json">
+
+### Agent Mode (Minimal)
+
+```json
+{
+  "plugins": [
+    {
+      "enabled": true,
+      "name": "datadog",
+      "config": {
+        "service_name": "bifrost",
+        "env": "production"
+      }
+    }
+  ]
+}
+```
+
+### Agent Mode (Full Configuration)
+
+```json
+{
+  "plugins": [
+    {
+      "enabled": true,
+      "name": "datadog",
+      "config": {
+        "service_name": "my-llm-gateway",
+        "ml_app": "my-ml-application",
+        "agent_addr": "localhost:8126",
+        "dogstatsd_addr": "localhost:8125",
+        "env": "production",
+        "version": "1.2.3",
+        "custom_tags": {
+          "team": "platform",
+          "cost_center": "env.COST_CENTER"
+        },
+        "enable_metrics": true,
+        "enable_traces": true,
+        "enable_llm_obs": true
+      }
+    }
+  ]
+}
+```
+
+### Agentless Mode
+
+```json
+{
+  "plugins": [
+    {
+      "enabled": true,
+      "name": "datadog",
+      "config": {
+        "service_name": "my-llm-gateway",
+        "env": "production",
+        "agentless": true,
+        "api_key": "env.DD_API_KEY",
+        "site": "datadoghq.com"
+      }
+    }
+  ]
+}
+```
+
+Set the environment variable:
+
+```bash
+export DD_API_KEY="your-datadog-api-key"
+```
+
+</Tab>
+</Tabs>
+
+---
+
+## Datadog Sites
+
+The plugin supports all Datadog regional sites. Set the `site` field to match your Datadog account region:
+
+| Site | Region | Value |
+|------|--------|-------|
+| US1 (default) | United States | `datadoghq.com` |
+| US3 | United States | `us3.datadoghq.com` |
+| US5 | United States | `us5.datadoghq.com` |
+| EU1 | Europe | `datadoghq.eu` |
+| AP1 | Asia Pacific (Japan) | `ap1.datadoghq.com` |
+| AP2 | Asia Pacific (Australia) | `ap2.datadoghq.com` |
+| US1-FED | US Government | `ddog-gov.com` |
+
+<Note>
+Ensure your API key corresponds to the selected site. API keys from one region will not work with another.
+</Note>
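+
+For example, an account hosted on the EU1 site might use a `config` fragment along these lines in agentless mode. This is a sketch assembled from the fields documented above; it assumes `DD_API_KEY` holds a key created in that same EU1 account.
+
+```json
+{
+  "agentless": true,
+  "api_key": "env.DD_API_KEY",
+  "site": "datadoghq.eu"
+}
+```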
+
+---
+
+## LLM Observability
+
+<Frame>
+    <img src="/media/dd-llmobs.png" alt="Datadog LLM Observability dashboard" />
+</Frame>
+
+The Datadog plugin integrates with [Datadog LLM Observability](https://docs.datadoghq.com/llm_observability/) to provide AI/ML-specific monitoring capabilities.
+
+### ML App Grouping
+
+LLM traces are grouped under an **ML App** in Datadog. By default, this uses your `service_name`, but you can specify a dedicated ML App name:
+
+```json
+{
+  "service_name": "bifrost-gateway",
+  "ml_app": "customer-support-ai"
+}
+```
+
+This allows you to:
+- Group related LLM operations across multiple services
+- Track costs and performance by application
+- Apply ML-specific alerts and dashboards
+
+### Session Tracking
+
+The plugin supports session tracking via the `x-bf-session-id` header. Include this header in your requests to group related LLM calls into a conversation session:
+
+```bash
+curl -X POST https://your-bifrost-gateway/v1/chat/completions \
+  -H "Authorization: Bearer $API_KEY" \
+  -H "Content-Type: application/json" \
+  -H "x-bf-session-id: user-123-session-456" \
+  -d '{...}'
+```
+
+Sessions appear in Datadog LLM Observability, allowing you to trace entire conversation flows.
+
+### W3C Distributed Tracing
+
+The plugin supports [W3C Trace Context](https://www.w3.org/TR/trace-context/) for distributed tracing across services. When your upstream service sends a `traceparent` header, Bifrost automatically links its spans as children of the parent trace.
+
+```bash
+curl -X POST https://your-bifrost-gateway/v1/chat/completions \
+  -H "Authorization: Bearer $API_KEY" \
+  -H "Content-Type: application/json" \
+  -H "traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01" \
+  -d '{...}'
+```
+
+This enables:
+- **End-to-end visibility** - See LLM calls in the context of your full application trace
+- **Cross-service correlation** - Link frontend requests → backend services → Bifrost → LLM providers
+- **Latency attribution** - Understand how LLM latency contributes to overall request time
+
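+The `traceparent` header format follows the W3C standard: a 2-hex-digit version (`00`), a 32-hex-digit trace ID, a 16-hex-digit parent span ID, and 2-hex-digit trace flags (`01` means sampled):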
+```
+traceparent: {version}-{trace-id}-{parent-id}-{trace-flags}
+```
+
+All Datadog APM spans created by Bifrost will be linked to the parent span, appearing as children in the Datadog trace view.
+
+### What's Captured
+
+For each LLM operation, the plugin sends the following to LLM Observability:
+
+- **Input/Output Messages** - Full conversation history with role attribution
+- **Token Usage** - Input, output, and total token counts
+- **Cost** - Calculated cost in USD based on model pricing
+- **Latency** - Request duration and time-to-first-token for streaming
+- **Model Info** - Provider, model name, and request parameters
+- **Tool Calls** - Function/tool call details for agentic workflows
+
+---
+
+## Metrics Reference
+
+The plugin emits the following metrics to Datadog:
+
+| Metric | Type | Description | Tags |
+|--------|------|-------------|------|
+| `bifrost.requests.total` | Counter | Total LLM requests | provider, model, request_type |
+| `bifrost.success.total` | Counter | Successful requests | provider, model, request_type |
+| `bifrost.errors.total` | Counter | Failed requests | provider, model, request_type, reason |
+| `bifrost.latency.seconds` | Histogram | Request latency distribution | provider, model, request_type |
+| `bifrost.tokens.input` | Counter | Input/prompt tokens consumed | provider, model |
+| `bifrost.tokens.output` | Counter | Output/completion tokens generated | provider, model |
+| `bifrost.tokens.total` | Counter | Total tokens (input + output) | provider, model |
+| `bifrost.cost.usd` | Gauge | Request cost in USD | provider, model |
+| `bifrost.cache.hits` | Counter | Cache hits | provider, model, cache_type |
+| `bifrost.stream.first_token_latency` | Histogram | Time to first token (streaming) | provider, model |
+| `bifrost.stream.inter_token_latency` | Histogram | Inter-token latency (streaming) | provider, model |
+
+### Custom Tags
+
+All metrics include your configured `custom_tags`. Depending on the metric (see the Tags column above), they also carry automatic tags such as:
+- `provider` - LLM provider (openai, anthropic, etc.)
+- `model` - Model name
+- `request_type` - Type of request (chat, embedding, etc.)
+- `env` - Environment from configuration
+
+---
+
+## Captured Data
+
+Each APM trace includes comprehensive LLM operation metadata:
+
+### Span Attributes
+
+- **Span Name** - Based on request type (`genai.chat`, `genai.embedding`, etc.)
+- **Service Info** - `service.name`, `service.version`, `env`
+- **Provider & Model** - `gen_ai.provider.name`, `gen_ai.request.model`
+
+### Request Parameters
+
+- Temperature, max_tokens, top_p, stop sequences
+- Presence/frequency penalties
+- Tool configurations and parallel tool calls
+- Custom parameters via `ExtraParams`
+
+### Input/Output Data
+
+- Complete chat history with role-based messages
+- Prompt text for completions
+- Response content with role attribution
+- Tool calls and results
+- Reasoning and refusal content (when present)
+
+### Performance Metrics
+
+- Token usage (prompt, completion, total)
+- Cost calculations in USD
+- Latency and timing (start/end timestamps)
+- Time to first token (streaming)
+- Error details with status codes
+
+### Bifrost Context
+
+- Virtual key ID and name
+- Selected key ID and name
+- Team ID and name
+- Customer ID and name
+- Retry count and fallback index
+
+---
+
+## Supported Request Types
+
+The Datadog plugin captures all Bifrost request types:
+
+| Request Type | Span Name | LLM Obs Type |
+|--------------|-----------|--------------|
+| Chat Completion | `genai.chat` | LLM Span |
+| Chat Completion (streaming) | `genai.chat` | LLM Span |
+| Text Completion | `genai.text` | LLM Span |
+| Text Completion (streaming) | `genai.text` | LLM Span |
+| Embeddings | `genai.embedding` | Embedding Span |
+| Speech Generation | `genai.speech` | Task Span |
+| Speech Generation (streaming) | `genai.speech` | Task Span |
+| Transcription | `genai.transcription` | Task Span |
+| Transcription (streaming) | `genai.transcription` | Task Span |
+| Responses API | `genai.responses` | LLM Span |
+| Responses API (streaming) | `genai.responses` | LLM Span |
+
+---
+
+## When to Use
+
+### Datadog Plugin
+
+Choose the Datadog plugin when you:
+
+- Use Datadog as your primary observability platform
+- Want native LLM Observability integration with ML App grouping
+- Need seamless correlation with existing Datadog APM traces via W3C distributed tracing
+- Require Datadog-specific features like notebooks and dashboards
+- Want session tracking for conversation flows
+
+### vs. OTel Plugin
+
+Use the [OTel plugin](/features/observability/otel) when you:
+
+- Need multi-vendor observability (send to multiple backends)
+- Are using Datadog via an OpenTelemetry Collector
+- Want vendor flexibility to switch backends without code changes
+- Prefer standardized OpenTelemetry semantic conventions
+
+<Note>
+You can use both plugins simultaneously if needed. The Datadog plugin provides native integration while OTel can send to additional backends.
+</Note>
+
+### vs. Built-in Observability
+
+Use [Built-in Observability](/features/observability/default) for:
+
+- Local development and testing
+- Simple self-hosted deployments
+- Setups that cannot take on external dependencies
+- Direct database access to logs
+
+---
+
+## Troubleshooting
+
+### Agent Connectivity Issues
+
+Verify the Datadog Agent is running and accessible:
+
+```bash
+# Check agent status
+datadog-agent status
+
+# Test APM endpoint
+curl -v http://localhost:8126/info
+
+# Test DogStatsD (should accept UDP packets)
+echo "test.metric:1|c" | nc -u -w1 localhost 8125
+```
+
+### Agentless Mode Not Working
+
+1. Verify your API key is valid (for a non-US1 account, replace `datadoghq.com` in the URL with your `site` value):
+```bash
+curl -X GET "https://api.datadoghq.com/api/v1/validate" \
+  -H "DD-API-KEY: $DD_API_KEY"
+```
+
+2. Ensure the `site` matches your API key's region
+
+3. Check that the API key environment variable is set:
+```bash
+[ -n "$DD_API_KEY" ] && echo "DD_API_KEY is set" || echo "DD_API_KEY is not set"
+```
+
+### Missing Traces
+
+1. Enable debug logging in Bifrost:
+```bash
+bifrost-http --log-level debug
+```
+
+2. Verify traces are enabled in your configuration:
+```json
+{
+  "enable_traces": true,
+  "enable_llm_obs": true
+}
+```
+
+3. Check for errors in the Bifrost logs related to the Datadog plugin
+
+### Missing Metrics
+
+1. Verify DogStatsD is running (agent mode):
+```bash
+datadog-agent status | grep DogStatsD
+```
+
+2. Ensure metrics are enabled:
+```json
+{
+  "enable_metrics": true
+}
+```
+
+3. For agentless mode, verify your API key has metrics submission permissions
+
+### LLM Observability Not Appearing
+
+1. Confirm that `enable_llm_obs` is `true` (the default)
+2. Verify your Datadog plan includes LLM Observability
+3. Check the ML App name in Datadog under **LLM Observability** → **Applications**
+
+---
+
+## Next Steps
+
+- **[OTel Plugin](/features/observability/otel)** - OpenTelemetry integration for multi-vendor observability
+- **[Built-in Observability](/features/observability/default)** - Local logging for development
+- **[Telemetry](/features/telemetry)** - Prometheus metrics and dashboards