bifrost/docs/enterprise/datadog-connector.mdx
Beyhan Oğur 880f412e2c first commit
2026-04-26 21:52:23 +03:00


---
title: "Datadog"
description: "Native Datadog integration for APM traces, LLM Observability, and metrics"
icon: "dog"
---
## Overview
<Frame>
<img src="/media/dd-trace.png" alt="Datadog LLM Observability dashboard" />
</Frame>
The **Datadog plugin** provides native integration with the Datadog observability platform, offering three pillars of observability for your LLM operations:
- **APM Traces** - Distributed tracing via dd-trace-go v2 with W3C Trace Context support for end-to-end request visibility
- **LLM Observability** - Native Datadog LLM Obs integration for AI/ML-specific monitoring
- **Metrics** - Operational metrics via DogStatsD or the Metrics API
Unlike the [OTel plugin](/features/observability/otel), which sends generic OpenTelemetry data, the Datadog plugin uses Datadog's native SDKs for richer integration with Datadog-specific features such as LLM Observability dashboards and ML App grouping.
---
## Deployment Modes
<Frame>
<img src="/media/dd-mode.png" alt="Datadog plugin deployment modes" />
</Frame>
The plugin supports two deployment modes:
| Mode | Description | Requirements | Best For |
|------|-------------|--------------|----------|
| **Agent** (default) | Sends data through a local Datadog Agent | Datadog Agent running on host | Production deployments with existing agent infrastructure |
| **Agentless** | Sends data directly to Datadog APIs | API key only | Serverless, containers, or simplified deployments |
### Agent Mode
In agent mode, the plugin communicates with a locally running Datadog Agent:
- **APM Traces** → Agent at `localhost:8126`
- **Metrics** → DogStatsD at `localhost:8125`
The agent handles batching and retries and provides lower latency. This is the recommended mode for production deployments where the Datadog Agent is already installed.
### Agentless Mode
In agentless mode, the plugin sends data directly to Datadog's intake APIs:
- **APM Traces** → `https://trace.agent.{site}`
- **LLM Observability** → Direct API submission
- **Metrics** → Datadog Metrics API
This mode requires an API key but simplifies deployment by eliminating the need for a local agent. It is well suited to serverless environments, Kubernetes pods, and quick testing.
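As a minimal sketch of how the `{site}` placeholder in the trace intake address resolves, the Go snippet below fills it from a configured site and falls back to the documented default. The helper name `traceIntakeURL` is hypothetical and is not part of the plugin's API.

```go
package main

import (
	"fmt"
	"strings"
)

// traceIntakeURL is a hypothetical helper (not part of the plugin). It fills
// the {site} placeholder of the agentless trace intake pattern
// https://trace.agent.{site} with the configured Datadog site.
func traceIntakeURL(site string) string {
	site = strings.TrimSpace(site)
	if site == "" {
		site = "datadoghq.com" // documented default for the `site` field
	}
	return "https://trace.agent." + site
}

func main() {
	fmt.Println(traceIntakeURL(""))
	fmt.Println(traceIntakeURL("datadoghq.eu"))
}
```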
---
## Configuration
### Configuration Fields
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `service_name` | `string` | No | `bifrost` | Service name displayed in Datadog APM |
| `ml_app` | `string` | No | (uses `service_name`) | ML application name for LLM Observability grouping |
| `agent_addr` | `string` | No | `localhost:8126` | Datadog Agent address (agent mode only) |
| `dogstatsd_addr` | `string` | No | `localhost:8125` | DogStatsD server address (agent mode only) |
| `env` | `string` | No | - | Environment tag (e.g., `production`, `staging`) |
| `version` | `string` | No | - | Service version tag |
| `custom_tags` | `object` | No | - | Additional tags for all traces and metrics |
| `enable_metrics` | `bool` | No | `true` | Enable metrics emission |
| `enable_traces` | `bool` | No | `true` | Enable APM traces |
| `enable_llm_obs` | `bool` | No | `true` | Enable LLM Observability |
| `agentless` | `bool` | No | `false` | Use agentless mode (direct API) |
| `api_key` | `EnvVar` | Agentless only | - | Datadog API key (supports `env.VAR_NAME`) |
| `site` | `string` | No | `datadoghq.com` | Datadog site/region |
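The defaults and the single conditional requirement in the table can be sketched in Go as follows. The `Config` struct and `applyDefaults` function here are hypothetical stand-ins that model only a few fields; they are not the plugin's real types and only illustrate the documented behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical stand-in for a subset of the fields in the table.
// It is not the plugin's actual configuration struct.
type Config struct {
	ServiceName   string
	MLApp         string
	AgentAddr     string
	DogStatsDAddr string
	Site          string
	Agentless     bool
	APIKey        string
}

// applyDefaults fills in the documented defaults and enforces the one
// conditional requirement: api_key must be set in agentless mode.
func applyDefaults(c Config) (Config, error) {
	if c.ServiceName == "" {
		c.ServiceName = "bifrost"
	}
	if c.MLApp == "" {
		c.MLApp = c.ServiceName // ml_app falls back to service_name
	}
	if c.AgentAddr == "" {
		c.AgentAddr = "localhost:8126"
	}
	if c.DogStatsDAddr == "" {
		c.DogStatsDAddr = "localhost:8125"
	}
	if c.Site == "" {
		c.Site = "datadoghq.com"
	}
	if c.Agentless && c.APIKey == "" {
		return c, errors.New("api_key is required in agentless mode")
	}
	return c, nil
}

func main() {
	cfg, err := applyDefaults(Config{ServiceName: "my-llm-gateway"})
	fmt.Printf("%+v %v\n", cfg, err)
}
```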
### Environment Variable Substitution
The `api_key` and `custom_tags` fields support environment variable substitution using the `env.` prefix:
```json
{
  "api_key": "env.DD_API_KEY",
  "custom_tags": {
    "team": "env.TEAM_NAME",
    "cost_center": "env.COST_CENTER"
  }
}
```
---
## Setup
<Tabs group="setup-method">
<Tab title="UI">
<Frame>
<img src="/media/dd-config-page.png" alt="Datadog plugin configuration page in the Bifrost UI" />
</Frame>
Configure the Datadog plugin through the Bifrost UI:
1. Navigate to **Settings** → **Plugins**
2. Enable the **Datadog** plugin
3. Configure the required fields based on your deployment mode
</Tab>
<Tab title="Go SDK">
```go
package main

import (
	"context"

	bifrost "github.com/maximhq/bifrost/core"
	"github.com/maximhq/bifrost/core/schemas"
	"github.com/maximhq/bifrost/framework/modelcatalog"
	datadog "github.com/maximhq/bifrost-enterprise/plugins/datadog"
)

func main() {
	ctx := context.Background()
	logger := schemas.NewLogger()

	// Initialize model catalog (required for cost calculation)
	modelCatalog := modelcatalog.NewModelCatalog(logger)

	// Agent mode configuration
	ddPlugin, err := datadog.Init(ctx, &datadog.Config{
		ServiceName: "my-llm-service",
		Env:         "production",
		Version:     "1.0.0",
		CustomTags: map[string]string{
			"team": "platform",
		},
	}, logger, modelCatalog, "1.0.0")
	if err != nil {
		panic(err)
	}

	// Initialize Bifrost with the plugin.
	// yourAccount is a placeholder for your own account implementation.
	client, err := bifrost.Init(ctx, schemas.BifrostConfig{
		Account: &yourAccount,
		Plugins: []schemas.Plugin{ddPlugin},
	})
	if err != nil {
		panic(err)
	}
	defer client.Shutdown()

	// All requests are now traced to Datadog
}
```
For agentless mode:
```go
// Agentless mode configuration
enableAgentless := true
ddPlugin, err := datadog.Init(ctx, &datadog.Config{
	ServiceName: "my-llm-service",
	Env:         "production",
	Agentless:   &enableAgentless,
	APIKey:      &schemas.EnvVar{EnvVarName: "DD_API_KEY"},
	Site:        "datadoghq.com",
}, logger, modelCatalog, "1.0.0")
```
</Tab>
<Tab title="config.json">
### Agent Mode (Minimal)
```json
{
  "plugins": [
    {
      "enabled": true,
      "name": "datadog",
      "config": {
        "service_name": "bifrost",
        "env": "production"
      }
    }
  ]
}
```
### Agent Mode (Full Configuration)
```json
{
  "plugins": [
    {
      "enabled": true,
      "name": "datadog",
      "config": {
        "service_name": "my-llm-gateway",
        "ml_app": "my-ml-application",
        "agent_addr": "localhost:8126",
        "dogstatsd_addr": "localhost:8125",
        "env": "production",
        "version": "1.2.3",
        "custom_tags": {
          "team": "platform",
          "cost_center": "env.COST_CENTER"
        },
        "enable_metrics": true,
        "enable_traces": true,
        "enable_llm_obs": true
      }
    }
  ]
}
```
### Agentless Mode
```json
{
  "plugins": [
    {
      "enabled": true,
      "name": "datadog",
      "config": {
        "service_name": "my-llm-gateway",
        "env": "production",
        "agentless": true,
        "api_key": "env.DD_API_KEY",
        "site": "datadoghq.com"
      }
    }
  ]
}
```
Set the environment variable:
```bash
export DD_API_KEY="your-datadog-api-key"
```
</Tab>
</Tabs>
---
## Datadog Sites
The plugin supports all Datadog regional sites. Set the `site` field to match your Datadog account region:
| Site | Region | Value |
|------|--------|-------|
| US1 (default) | United States | `datadoghq.com` |
| US3 | United States | `us3.datadoghq.com` |
| US5 | United States | `us5.datadoghq.com` |
| EU1 | Europe | `datadoghq.eu` |
| AP1 | Asia Pacific (Japan) | `ap1.datadoghq.com` |
| AP2 | Asia Pacific (Australia) | `ap2.datadoghq.com` |
| US1-FED | US Government | `ddog-gov.com` |
<Note>
Ensure your API key corresponds to the selected site. API keys from one region will not work with another.
</Note>
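A small Go sketch of validating the `site` value against the table above before starting the gateway. The `knownSites` map and `siteName` function are hypothetical names used for illustration; whether the plugin itself performs this check is not stated here.

```go
package main

import "fmt"

// knownSites mirrors the site table: configuration value -> site name.
var knownSites = map[string]string{
	"datadoghq.com":     "US1",
	"us3.datadoghq.com": "US3",
	"us5.datadoghq.com": "US5",
	"datadoghq.eu":      "EU1",
	"ap1.datadoghq.com": "AP1",
	"ap2.datadoghq.com": "AP2",
	"ddog-gov.com":      "US1-FED",
}

// siteName returns the site name for a configured value, or an error when
// the value is not one of the documented sites (for example, a typo).
func siteName(site string) (string, error) {
	name, ok := knownSites[site]
	if !ok {
		return "", fmt.Errorf("unknown Datadog site %q", site)
	}
	return name, nil
}

func main() {
	for _, s := range []string{"datadoghq.eu", "datadog.com"} {
		name, err := siteName(s)
		fmt.Println(name, err)
	}
}
```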
---
## LLM Observability
<Frame>
<img src="/media/dd-llmobs.png" alt="Datadog LLM Observability dashboard" />
</Frame>
The Datadog plugin integrates with [Datadog LLM Observability](https://docs.datadoghq.com/llm_observability/) to provide AI/ML-specific monitoring capabilities.
### ML App Grouping
LLM traces are grouped under an **ML App** in Datadog. By default, this uses your `service_name`, but you can specify a dedicated ML App name:
```json
{
  "service_name": "bifrost-gateway",
  "ml_app": "customer-support-ai"
}
```
This allows you to:
- Group related LLM operations across multiple services
- Track costs and performance by application
- Apply ML-specific alerts and dashboards
### Session Tracking
The plugin supports session tracking via the `x-bf-session-id` header. Include this header in your requests to group related LLM calls into a conversation session:
```bash
curl -X POST https://your-bifrost-gateway/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-bf-session-id: user-123-session-456" \
  -d '{...}'
```
Sessions appear in Datadog LLM Observability, allowing you to trace entire conversation flows.
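The same header can be attached from application code. The following is a minimal Go sketch using only the standard library. The function name `newSessionRequest` is hypothetical, the gateway URL is the placeholder from the curl example, and the empty JSON body stands in for a real chat completion payload.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// newSessionRequest builds (but does not send) a chat completion request
// that carries the x-bf-session-id header, so related calls can be grouped
// into one session.
func newSessionRequest(baseURL, apiKey, sessionID string, body []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("x-bf-session-id", sessionID)
	return req, nil
}

func main() {
	// Placeholder values; replace the body with a real request payload.
	req, err := newSessionRequest("https://your-bifrost-gateway", "example-key", "user-123-session-456", []byte(`{}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String(), req.Header.Get("x-bf-session-id"))
}
```

Reusing one session ID for every turn of a conversation keeps those calls grouped in the same session.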
### W3C Distributed Tracing
The plugin supports [W3C Trace Context](https://www.w3.org/TR/trace-context/) for distributed tracing across services. When your upstream service sends a `traceparent` header, Bifrost automatically links its spans as children of the parent trace.
```bash
curl -X POST https://your-bifrost-gateway/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -H "traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01" \
  -d '{...}'
```
This enables:
- **End-to-end visibility** - See LLM calls in the context of your full application trace
- **Cross-service correlation** - Link frontend requests → backend services → Bifrost → LLM providers
- **Latency attribution** - Understand how LLM latency contributes to overall request time
The `traceparent` header format follows the W3C standard:
```
traceparent: {version}-{trace-id}-{parent-id}-{trace-flags}
```
All Datadog APM spans created by Bifrost will be linked to the parent span, appearing as children in the Datadog trace view.
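To show how the four fields map onto a header value, here is a short Go sketch that parses a version `00` `traceparent` string. The `TraceParent` type and `parseTraceparent` function are hypothetical names and are not part of Bifrost. The checks follow the W3C rules that the trace ID and parent ID must not be all zeros and that version `ff` is invalid.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// traceparentRe matches {version}-{trace-id}-{parent-id}-{trace-flags}
// as lowercase hex with lengths 2, 32, 16, and 2.
var traceparentRe = regexp.MustCompile(`^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$`)

// TraceParent holds the four fields of a traceparent header.
type TraceParent struct {
	Version, TraceID, ParentID, Flags string
}

// parseTraceparent validates a header value and splits it into its fields.
func parseTraceparent(h string) (TraceParent, error) {
	m := traceparentRe.FindStringSubmatch(h)
	if m == nil {
		return TraceParent{}, fmt.Errorf("malformed traceparent %q", h)
	}
	if m[1] == "ff" {
		return TraceParent{}, fmt.Errorf("invalid version ff")
	}
	if m[2] == strings.Repeat("0", 32) || m[3] == strings.Repeat("0", 16) {
		return TraceParent{}, fmt.Errorf("trace-id and parent-id must not be all zeros")
	}
	return TraceParent{Version: m[1], TraceID: m[2], ParentID: m[3], Flags: m[4]}, nil
}

func main() {
	tp, err := parseTraceparent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
	fmt.Printf("%+v %v\n", tp, err)
}
```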
### What's Captured
For each LLM operation, the plugin sends to LLM Observability:
- **Input/Output Messages** - Full conversation history with role attribution
- **Token Usage** - Input, output, and total token counts
- **Cost** - Calculated cost in USD based on model pricing
- **Latency** - Request duration and time-to-first-token for streaming
- **Model Info** - Provider, model name, and request parameters
- **Tool Calls** - Function/tool call details for agentic workflows
---
## Metrics Reference
The plugin emits the following metrics to Datadog:
| Metric | Type | Description | Tags |
|--------|------|-------------|------|
| `bifrost.requests.total` | Counter | Total LLM requests | provider, model, request_type |
| `bifrost.success.total` | Counter | Successful requests | provider, model, request_type |
| `bifrost.errors.total` | Counter | Failed requests | provider, model, request_type, reason |
| `bifrost.latency.seconds` | Histogram | Request latency distribution | provider, model, request_type |
| `bifrost.tokens.input` | Counter | Input/prompt tokens consumed | provider, model |
| `bifrost.tokens.output` | Counter | Output/completion tokens generated | provider, model |
| `bifrost.tokens.total` | Counter | Total tokens (input + output) | provider, model |
| `bifrost.cost.usd` | Gauge | Request cost in USD | provider, model |
| `bifrost.cache.hits` | Counter | Cache hits | provider, model, cache_type |
| `bifrost.stream.first_token_latency` | Histogram | Time to first token (streaming) | provider, model |
| `bifrost.stream.inter_token_latency` | Histogram | Inter-token latency (streaming) | provider, model |
### Custom Tags
All metrics include your configured `custom_tags` plus automatic tags for:
- `provider` - LLM provider (openai, anthropic, etc.)
- `model` - Model name
- `request_type` - Type of request (chat, embedding, etc.)
- `env` - Environment from configuration
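For readers debugging tags on the wire, the sketch below formats a counter increment in the DogStatsD datagram format (`name:value|c|#tag:value,...`). It only illustrates the wire format. The `formatCounter` helper is hypothetical, the plugin emits metrics through Datadog's SDKs rather than code like this, and the tag values shown are examples.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatCounter renders a counter increment as a DogStatsD datagram.
// Tags are sorted so the output is stable across runs.
func formatCounter(name string, value int64, tags map[string]string) string {
	pairs := make([]string, 0, len(tags))
	for k, v := range tags {
		pairs = append(pairs, k+":"+v)
	}
	sort.Strings(pairs)

	line := fmt.Sprintf("%s:%d|c", name, value)
	if len(pairs) > 0 {
		line += "|#" + strings.Join(pairs, ",")
	}
	return line
}

func main() {
	fmt.Println(formatCounter("bifrost.requests.total", 1, map[string]string{
		"provider":     "openai",
		"request_type": "chat",
		"env":          "production",
		"team":         "platform", // example custom tag
	}))
}
```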
---
## Captured Data
Each APM trace includes comprehensive LLM operation metadata:
### Span Attributes
- **Span Name** - Based on request type (`genai.chat`, `genai.embedding`, etc.)
- **Service Info** - `service.name`, `service.version`, `env`
- **Provider & Model** - `gen_ai.provider.name`, `gen_ai.request.model`
### Request Parameters
- Temperature, max_tokens, top_p, stop sequences
- Presence/frequency penalties
- Tool configurations and parallel tool calls
- Custom parameters via `ExtraParams`
### Input/Output Data
- Complete chat history with role-based messages
- Prompt text for completions
- Response content with role attribution
- Tool calls and results
- Reasoning and refusal content (when present)
### Performance Metrics
- Token usage (prompt, completion, total)
- Cost calculations in USD
- Latency and timing (start/end timestamps)
- Time to first token (streaming)
- Error details with status codes
### Bifrost Context
- Virtual key ID and name
- Selected key ID and name
- Team ID and name
- Customer ID and name
- Retry count and fallback index
---
## Supported Request Types
The Datadog plugin captures all Bifrost request types:
| Request Type | Span Name | LLM Obs Type |
|--------------|-----------|--------------|
| Chat Completion | `genai.chat` | LLM Span |
| Chat Completion (streaming) | `genai.chat` | LLM Span |
| Text Completion | `genai.text` | LLM Span |
| Text Completion (streaming) | `genai.text` | LLM Span |
| Embeddings | `genai.embedding` | Embedding Span |
| Speech Generation | `genai.speech` | Task Span |
| Speech Generation (streaming) | `genai.speech` | Task Span |
| Transcription | `genai.transcription` | Task Span |
| Transcription (streaming) | `genai.transcription` | Task Span |
| Responses API | `genai.responses` | LLM Span |
| Responses API (streaming) | `genai.responses` | LLM Span |
---
## When to Use
### Datadog Plugin
Choose the Datadog plugin when you:
- Use Datadog as your primary observability platform
- Want native LLM Observability integration with ML App grouping
- Need seamless correlation with existing Datadog APM traces via W3C distributed tracing
- Require Datadog-specific features like notebooks and dashboards
- Want session tracking for conversation flows
### vs. OTel Plugin
Use the [OTel plugin](/features/observability/otel) when you:
- Need multi-vendor observability (send to multiple backends)
- Are using Datadog via an OpenTelemetry Collector
- Want vendor flexibility to switch backends without code changes
- Prefer standardized OpenTelemetry semantic conventions
<Note>
You can use both plugins simultaneously if needed. The Datadog plugin provides native integration while OTel can send to additional backends.
</Note>
### vs. Built-in Observability
Use [Built-in Observability](/features/observability/default) for:
- Local development and testing
- Simple self-hosted deployments
- Setups that must avoid external dependencies
- Direct database access to logs
---
## Troubleshooting
### Agent Connectivity Issues
Verify the Datadog Agent is running and accessible:
```bash
# Check agent status
datadog-agent status
# Test APM endpoint
curl -v http://localhost:8126/info
# Test DogStatsD (should accept UDP packets)
echo "test.metric:1|c" | nc -u -w1 localhost 8125
```
### Agentless Mode Not Working
1. Verify your API key is valid:
```bash
# US1 shown; for other sites, use api.<your site> (for example, api.datadoghq.eu)
curl -X GET "https://api.datadoghq.com/api/v1/validate" \
  -H "DD-API-KEY: $DD_API_KEY"
```
2. Ensure the `site` matches your API key's region
3. Check that the API key environment variable is set:
```bash
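# Confirms the variable is set without printing the secret
[ -n "$DD_API_KEY" ] && echo "DD_API_KEY is set" || echo "DD_API_KEY is not set"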
```
### Missing Traces
1. Enable debug logging in Bifrost:
```bash
bifrost-http --log-level debug
```
2. Verify traces are enabled in your configuration:
```json
{
  "enable_traces": true,
  "enable_llm_obs": true
}
```
3. Check for errors in the Bifrost logs related to the Datadog plugin
### Missing Metrics
1. Verify DogStatsD is running (agent mode):
```bash
datadog-agent status | grep DogStatsD
```
2. Ensure metrics are enabled:
```json
{
  "enable_metrics": true
}
```
3. For agentless mode, verify your API key has metrics submission permissions
### LLM Observability Not Appearing
1. Confirm that `enable_llm_obs` is `true` (the default)
2. Verify your Datadog plan includes LLM Observability
3. Check the ML App name in Datadog under **LLM Observability** → **Applications**
---
## Next Steps
- **[OTel Plugin](/features/observability/otel)** - OpenTelemetry integration for multi-vendor observability
- **[Built-in Observability](/features/observability/default)** - Local logging for development
- **[Telemetry](/features/telemetry)** - Prometheus metrics and dashboards