first commit
This commit is contained in:
548
docs/enterprise/datadog-connector.mdx
Normal file
548
docs/enterprise/datadog-connector.mdx
Normal file
@@ -0,0 +1,548 @@
|
||||
---
|
||||
title: "Datadog"
|
||||
description: "Native Datadog integration for APM traces, LLM Observability, and metrics"
|
||||
icon: "dog"
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
<Frame>
|
||||
<img src="/media/dd-trace.png" alt="Datadog LLM Observability dashboard" />
|
||||
</Frame>
|
||||
|
||||
The **Datadog plugin** provides native integration with the Datadog observability platform, offering three pillars of observability for your LLM operations:
|
||||
|
||||
- **APM Traces** - Distributed tracing via dd-trace-go v2 with W3C Trace Context support for end-to-end request visibility
|
||||
- **LLM Observability** - Native Datadog LLM Obs integration for AI/ML-specific monitoring
|
||||
- **Metrics** - Operational metrics via DogStatsD or the Metrics API
|
||||
|
||||
Unlike the [OTel plugin](/features/observability/otel) which sends generic OpenTelemetry data, the Datadog plugin leverages Datadog's native SDKs for richer integration with Datadog-specific features like LLM Observability dashboards and ML App grouping.
|
||||
|
||||
---
|
||||
|
||||
## Deployment Modes
|
||||
|
||||
<Frame>
|
||||
<img src="/media/dd-mode.png" alt="Datadog LLM Observability dashboard" />
|
||||
</Frame>
|
||||
|
||||
The plugin supports two deployment modes:
|
||||
|
||||
| Mode | Description | Requirements | Best For |
|
||||
|------|-------------|--------------|----------|
|
||||
| **Agent** (default) | Sends data through a local Datadog Agent | Datadog Agent running on host | Production deployments with existing agent infrastructure |
|
||||
| **Agentless** | Sends data directly to Datadog APIs | API key only | Serverless, containers, or simplified deployments |
|
||||
|
||||
### Agent Mode
|
||||
|
||||
In agent mode, the plugin communicates with a locally running Datadog Agent:
|
||||
|
||||
- **APM Traces** → Agent at `localhost:8126`
|
||||
- **Metrics** → DogStatsD at `localhost:8125`
|
||||
|
||||
The agent handles batching, retries, and provides lower latency. This is the recommended mode for production deployments where you already have the Datadog Agent installed.
|
||||
|
||||
### Agentless Mode
|
||||
|
||||
In agentless mode, the plugin sends data directly to Datadog's intake APIs:
|
||||
|
||||
- **APM Traces** → `https://trace.agent.{site}`
|
||||
- **LLM Observability** → Direct API submission
|
||||
- **Metrics** → Datadog Metrics API
|
||||
|
||||
This mode requires an API key but simplifies deployment by eliminating the need for a local agent. Ideal for serverless environments, Kubernetes pods, or quick testing.
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
### Required Fields
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `service_name` | `string` | No | `bifrost` | Service name displayed in Datadog APM |
|
||||
| `ml_app` | `string` | No | (uses `service_name`) | ML application name for LLM Observability grouping |
|
||||
| `agent_addr` | `string` | No | `localhost:8126` | Datadog Agent address (agent mode only) |
|
||||
| `dogstatsd_addr` | `string` | No | `localhost:8125` | DogStatsD server address (agent mode only) |
|
||||
| `env` | `string` | No | - | Environment tag (e.g., `production`, `staging`) |
|
||||
| `version` | `string` | No | - | Service version tag |
|
||||
| `custom_tags` | `object` | No | - | Additional tags for all traces and metrics |
|
||||
| `enable_metrics` | `bool` | No | `true` | Enable metrics emission |
|
||||
| `enable_traces` | `bool` | No | `true` | Enable APM traces |
|
||||
| `enable_llm_obs` | `bool` | No | `true` | Enable LLM Observability |
|
||||
| `agentless` | `bool` | No | `false` | Use agentless mode (direct API) |
|
||||
| `api_key` | `EnvVar` | Agentless only | - | Datadog API key (supports `env.VAR_NAME`) |
|
||||
| `site` | `string` | No | `datadoghq.com` | Datadog site/region |
|
||||
|
||||
### Environment Variable Substitution
|
||||
|
||||
The `api_key` and `custom_tags` fields support environment variable substitution using the `env.` prefix:
|
||||
|
||||
```json
|
||||
{
|
||||
"api_key": "env.DD_API_KEY",
|
||||
"custom_tags": {
|
||||
"team": "env.TEAM_NAME",
|
||||
"cost_center": "env.COST_CENTER"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Setup
|
||||
|
||||
<Tabs group="setup-method">
|
||||
<Tab title="UI">
|
||||
|
||||
<Frame>
|
||||
<img src="/media/dd-config-page.png" alt="Datadog LLM Observability dashboard" />
|
||||
</Frame>
|
||||
|
||||
Configure the Datadog plugin through the Bifrost UI:
|
||||
|
||||
1. Navigate to **Settings** → **Plugins**
|
||||
2. Enable the **Datadog** plugin
|
||||
3. Configure the required fields based on your deployment mode
|
||||
|
||||
{/* Screenshot placeholder - user will add */}
|
||||
|
||||
</Tab>
|
||||
<Tab title="Go SDK">
|
||||
|
||||
```go
|
||||
package main
|
||||
|
||||
import (
|
||||
"context"
|
||||
bifrost "github.com/maximhq/bifrost/core"
|
||||
"github.com/maximhq/bifrost/core/schemas"
|
||||
"github.com/maximhq/bifrost/framework/modelcatalog"
|
||||
datadog "github.com/maximhq/bifrost-enterprise/plugins/datadog"
|
||||
)
|
||||
|
||||
func main() {
|
||||
ctx := context.Background()
|
||||
logger := schemas.NewLogger()
|
||||
|
||||
// Initialize model catalog (required for cost calculation)
|
||||
modelCatalog := modelcatalog.NewModelCatalog(logger)
|
||||
|
||||
// Agent mode configuration
|
||||
ddPlugin, err := datadog.Init(ctx, &datadog.Config{
|
||||
ServiceName: "my-llm-service",
|
||||
Env: "production",
|
||||
Version: "1.0.0",
|
||||
CustomTags: map[string]string{
|
||||
"team": "platform",
|
||||
},
|
||||
}, logger, modelCatalog, "1.0.0")
|
||||
if err != nil {
|
||||
panic(err)
|
||||
}
|
||||
|
||||
// Initialize Bifrost with the plugin
|
||||
client, err := bifrost.Init(ctx, schemas.BifrostConfig{
|
||||
Account: &yourAccount,
|
||||
Plugins: []schemas.Plugin{ddPlugin},
|
||||
})
|
||||
if err != nil {
|
||||
panic(err)
|
||||
}
|
||||
defer client.Shutdown()
|
||||
|
||||
// All requests are now traced to Datadog
|
||||
}
|
||||
```
|
||||
|
||||
For agentless mode:
|
||||
|
||||
```go
|
||||
// Agentless mode configuration
|
||||
enableAgentless := true
|
||||
ddPlugin, err := datadog.Init(ctx, &datadog.Config{
|
||||
ServiceName: "my-llm-service",
|
||||
Env: "production",
|
||||
Agentless: &enableAgentless,
|
||||
APIKey: &schemas.EnvVar{EnvVarName: "DD_API_KEY"},
|
||||
Site: "datadoghq.com",
|
||||
}, logger, modelCatalog, "1.0.0")
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="config.json">
|
||||
|
||||
### Agent Mode (Minimal)
|
||||
|
||||
```json
|
||||
{
|
||||
"plugins": [
|
||||
{
|
||||
"enabled": true,
|
||||
"name": "datadog",
|
||||
"config": {
|
||||
"service_name": "bifrost",
|
||||
"env": "production"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### Agent Mode (Full Configuration)
|
||||
|
||||
```json
|
||||
{
|
||||
"plugins": [
|
||||
{
|
||||
"enabled": true,
|
||||
"name": "datadog",
|
||||
"config": {
|
||||
"service_name": "my-llm-gateway",
|
||||
"ml_app": "my-ml-application",
|
||||
"agent_addr": "localhost:8126",
|
||||
"dogstatsd_addr": "localhost:8125",
|
||||
"env": "production",
|
||||
"version": "1.2.3",
|
||||
"custom_tags": {
|
||||
"team": "platform",
|
||||
"cost_center": "env.COST_CENTER"
|
||||
},
|
||||
"enable_metrics": true,
|
||||
"enable_traces": true,
|
||||
"enable_llm_obs": true
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### Agentless Mode
|
||||
|
||||
```json
|
||||
{
|
||||
"plugins": [
|
||||
{
|
||||
"enabled": true,
|
||||
"name": "datadog",
|
||||
"config": {
|
||||
"service_name": "my-llm-gateway",
|
||||
"env": "production",
|
||||
"agentless": true,
|
||||
"api_key": "env.DD_API_KEY",
|
||||
"site": "datadoghq.com"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Set the environment variable:
|
||||
|
||||
```bash
|
||||
export DD_API_KEY="your-datadog-api-key"
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Datadog Sites
|
||||
|
||||
The plugin supports all Datadog regional sites. Set the `site` field to match your Datadog account region:
|
||||
|
||||
| Site | Region | Value |
|
||||
|------|--------|-------|
|
||||
| US1 (default) | United States | `datadoghq.com` |
|
||||
| US3 | United States | `us3.datadoghq.com` |
|
||||
| US5 | United States | `us5.datadoghq.com` |
|
||||
| EU1 | Europe | `datadoghq.eu` |
|
||||
| AP1 | Asia Pacific (Japan) | `ap1.datadoghq.com` |
|
||||
| AP2 | Asia Pacific (Australia) | `ap2.datadoghq.com` |
|
||||
| US1-FED | US Government | `ddog-gov.com` |
|
||||
|
||||
<Note>
|
||||
Ensure your API key corresponds to the selected site. API keys from one region will not work with another.
|
||||
</Note>
|
||||
|
||||
---
|
||||
|
||||
## LLM Observability
|
||||
|
||||
<Frame>
|
||||
<img src="/media/dd-llmobs.png" alt="Datadog LLM Observability dashboard" />
|
||||
</Frame>
|
||||
|
||||
The Datadog plugin integrates with [Datadog LLM Observability](https://docs.datadoghq.com/llm_observability/) to provide AI/ML-specific monitoring capabilities.
|
||||
|
||||
### ML App Grouping
|
||||
|
||||
LLM traces are grouped under an **ML App** in Datadog. By default, this uses your `service_name`, but you can specify a dedicated ML App name:
|
||||
|
||||
```json
|
||||
{
|
||||
"service_name": "bifrost-gateway",
|
||||
"ml_app": "customer-support-ai"
|
||||
}
|
||||
```
|
||||
|
||||
This allows you to:
|
||||
- Group related LLM operations across multiple services
|
||||
- Track costs and performance by application
|
||||
- Apply ML-specific alerts and dashboards
|
||||
|
||||
### Session Tracking
|
||||
|
||||
The plugin supports session tracking via the `x-bf-session-id` header. Include this header in your requests to group related LLM calls into a conversation session:
|
||||
|
||||
```bash
|
||||
curl -X POST https://your-bifrost-gateway/v1/chat/completions \
|
||||
-H "Authorization: Bearer $API_KEY" \
|
||||
-H "x-bf-session-id: user-123-session-456" \
|
||||
-d '{...}'
|
||||
```
|
||||
|
||||
Sessions appear in Datadog LLM Observability, allowing you to trace entire conversation flows.
|
||||
|
||||
### W3C Distributed Tracing
|
||||
|
||||
The plugin supports [W3C Trace Context](https://www.w3.org/TR/trace-context/) for distributed tracing across services. When your upstream service sends a `traceparent` header, Bifrost automatically links its spans as children of the parent trace.
|
||||
|
||||
```bash
|
||||
curl -X POST https://your-bifrost-gateway/v1/chat/completions \
|
||||
-H "Authorization: Bearer $API_KEY" \
|
||||
-H "traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01" \
|
||||
-d '{...}'
|
||||
```
|
||||
|
||||
This enables:
|
||||
- **End-to-end visibility** - See LLM calls in the context of your full application trace
|
||||
- **Cross-service correlation** - Link frontend requests → backend services → Bifrost → LLM providers
|
||||
- **Latency attribution** - Understand how LLM latency contributes to overall request time
|
||||
|
||||
The `traceparent` header format follows the W3C standard:
|
||||
```
|
||||
traceparent: {version}-{trace-id}-{parent-id}-{trace-flags}
|
||||
```
|
||||
|
||||
All Datadog APM spans created by Bifrost will be linked to the parent span, appearing as children in the Datadog trace view.
|
||||
|
||||
### What's Captured
|
||||
|
||||
For each LLM operation, the plugin sends to LLM Observability:
|
||||
|
||||
- **Input/Output Messages** - Full conversation history with role attribution
|
||||
- **Token Usage** - Input, output, and total token counts
|
||||
- **Cost** - Calculated cost in USD based on model pricing
|
||||
- **Latency** - Request duration and time-to-first-token for streaming
|
||||
- **Model Info** - Provider, model name, and request parameters
|
||||
- **Tool Calls** - Function/tool call details for agentic workflows
|
||||
|
||||
---
|
||||
|
||||
## Metrics Reference
|
||||
|
||||
The plugin emits the following metrics to Datadog:
|
||||
|
||||
| Metric | Type | Description | Tags |
|
||||
|--------|------|-------------|------|
|
||||
| `bifrost.requests.total` | Counter | Total LLM requests | provider, model, request_type |
|
||||
| `bifrost.success.total` | Counter | Successful requests | provider, model, request_type |
|
||||
| `bifrost.errors.total` | Counter | Failed requests | provider, model, request_type, reason |
|
||||
| `bifrost.latency.seconds` | Histogram | Request latency distribution | provider, model, request_type |
|
||||
| `bifrost.tokens.input` | Counter | Input/prompt tokens consumed | provider, model |
|
||||
| `bifrost.tokens.output` | Counter | Output/completion tokens generated | provider, model |
|
||||
| `bifrost.tokens.total` | Counter | Total tokens (input + output) | provider, model |
|
||||
| `bifrost.cost.usd` | Gauge | Request cost in USD | provider, model |
|
||||
| `bifrost.cache.hits` | Counter | Cache hits | provider, model, cache_type |
|
||||
| `bifrost.stream.first_token_latency` | Histogram | Time to first token (streaming) | provider, model |
|
||||
| `bifrost.stream.inter_token_latency` | Histogram | Inter-token latency (streaming) | provider, model |
|
||||
|
||||
### Custom Tags
|
||||
|
||||
All metrics include your configured `custom_tags` plus automatic tags for:
|
||||
- `provider` - LLM provider (openai, anthropic, etc.)
|
||||
- `model` - Model name
|
||||
- `request_type` - Type of request (chat, embedding, etc.)
|
||||
- `env` - Environment from configuration
|
||||
|
||||
---
|
||||
|
||||
## Captured Data
|
||||
|
||||
Each APM trace includes comprehensive LLM operation metadata:
|
||||
|
||||
### Span Attributes
|
||||
|
||||
- **Span Name** - Based on request type (`genai.chat`, `genai.embedding`, etc.)
|
||||
- **Service Info** - `service.name`, `service.version`, `env`
|
||||
- **Provider & Model** - `gen_ai.provider.name`, `gen_ai.request.model`
|
||||
|
||||
### Request Parameters
|
||||
|
||||
- Temperature, max_tokens, top_p, stop sequences
|
||||
- Presence/frequency penalties
|
||||
- Tool configurations and parallel tool calls
|
||||
- Custom parameters via `ExtraParams`
|
||||
|
||||
### Input/Output Data
|
||||
|
||||
- Complete chat history with role-based messages
|
||||
- Prompt text for completions
|
||||
- Response content with role attribution
|
||||
- Tool calls and results
|
||||
- Reasoning and refusal content (when present)
|
||||
|
||||
### Performance Metrics
|
||||
|
||||
- Token usage (prompt, completion, total)
|
||||
- Cost calculations in USD
|
||||
- Latency and timing (start/end timestamps)
|
||||
- Time to first token (streaming)
|
||||
- Error details with status codes
|
||||
|
||||
### Bifrost Context
|
||||
|
||||
- Virtual key ID and name
|
||||
- Selected key ID and name
|
||||
- Team ID and name
|
||||
- Customer ID and name
|
||||
- Retry count and fallback index
|
||||
|
||||
---
|
||||
|
||||
## Supported Request Types
|
||||
|
||||
The Datadog plugin captures all Bifrost request types:
|
||||
|
||||
| Request Type | Span Name | LLM Obs Type |
|
||||
|--------------|-----------|--------------|
|
||||
| Chat Completion | `genai.chat` | LLM Span |
|
||||
| Chat Completion (streaming) | `genai.chat` | LLM Span |
|
||||
| Text Completion | `genai.text` | LLM Span |
|
||||
| Text Completion (streaming) | `genai.text` | LLM Span |
|
||||
| Embeddings | `genai.embedding` | Embedding Span |
|
||||
| Speech Generation | `genai.speech` | Task Span |
|
||||
| Speech Generation (streaming) | `genai.speech` | Task Span |
|
||||
| Transcription | `genai.transcription` | Task Span |
|
||||
| Transcription (streaming) | `genai.transcription` | Task Span |
|
||||
| Responses API | `genai.responses` | LLM Span |
|
||||
| Responses API (streaming) | `genai.responses` | LLM Span |
|
||||
|
||||
---
|
||||
|
||||
## When to Use
|
||||
|
||||
### Datadog Plugin
|
||||
|
||||
Choose the Datadog plugin when you:
|
||||
|
||||
- Use Datadog as your primary observability platform
|
||||
- Want native LLM Observability integration with ML App grouping
|
||||
- Need seamless correlation with existing Datadog APM traces via W3C distributed tracing
|
||||
- Require Datadog-specific features like notebooks and dashboards
|
||||
- Want session tracking for conversation flows
|
||||
|
||||
### vs. OTel Plugin
|
||||
|
||||
Use the [OTel plugin](/features/observability/otel) when you:
|
||||
|
||||
- Need multi-vendor observability (send to multiple backends)
|
||||
- Are using Datadog via an OpenTelemetry Collector
|
||||
- Want vendor flexibility to switch backends without code changes
|
||||
- Prefer standardized OpenTelemetry semantic conventions
|
||||
|
||||
<Note>
|
||||
You can use both plugins simultaneously if needed. The Datadog plugin provides native integration while OTel can send to additional backends.
|
||||
</Note>
|
||||
|
||||
### vs. Built-in Observability
|
||||
|
||||
Use [Built-in Observability](/features/observability/default) for:
|
||||
|
||||
- Local development and testing
|
||||
- Simple self-hosted deployments
|
||||
- No external dependencies required
|
||||
- Direct database access to logs
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Agent Connectivity Issues
|
||||
|
||||
Verify the Datadog Agent is running and accessible:
|
||||
|
||||
```bash
|
||||
# Check agent status
|
||||
datadog-agent status
|
||||
|
||||
# Test APM endpoint
|
||||
curl -v http://localhost:8126/info
|
||||
|
||||
# Test DogStatsD (should accept UDP packets)
|
||||
echo "test.metric:1|c" | nc -u -w1 localhost 8125
|
||||
```
|
||||
|
||||
### Agentless Mode Not Working
|
||||
|
||||
1. Verify your API key is valid:
|
||||
```bash
|
||||
curl -X GET "https://api.datadoghq.com/api/v1/validate" \
|
||||
-H "DD-API-KEY: $DD_API_KEY"
|
||||
```
|
||||
|
||||
2. Ensure the `site` matches your API key's region
|
||||
|
||||
3. Check that the API key environment variable is set:
|
||||
```bash
|
||||
echo $DD_API_KEY
|
||||
```
|
||||
|
||||
### Missing Traces
|
||||
|
||||
1. Enable debug logging in Bifrost:
|
||||
```bash
|
||||
bifrost-http --log-level debug
|
||||
```
|
||||
|
||||
2. Verify traces are enabled in your configuration:
|
||||
```json
|
||||
{
|
||||
"enable_traces": true,
|
||||
"enable_llm_obs": true
|
||||
}
|
||||
```
|
||||
|
||||
3. Check for errors in the Bifrost logs related to the Datadog plugin
|
||||
|
||||
### Missing Metrics
|
||||
|
||||
1. Verify DogStatsD is running (agent mode):
|
||||
```bash
|
||||
datadog-agent status | grep DogStatsD
|
||||
```
|
||||
|
||||
2. Ensure metrics are enabled:
|
||||
```json
|
||||
{
|
||||
"enable_metrics": true
|
||||
}
|
||||
```
|
||||
|
||||
3. For agentless mode, verify your API key has metrics submission permissions
|
||||
|
||||
### LLM Observability Not Appearing
|
||||
|
||||
1. LLM Observability requires `enable_llm_obs: true` (default)
|
||||
2. Verify your Datadog plan includes LLM Observability
|
||||
3. Check the ML App name in Datadog under **LLM Observability** → **Applications**
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
- **[OTel Plugin](/features/observability/otel)** - OpenTelemetry integration for multi-vendor observability
|
||||
- **[Built-in Observability](/features/observability/default)** - Local logging for development
|
||||
- **[Telemetry](/features/telemetry)** - Prometheus metrics and dashboards
|
||||
Reference in New Issue
Block a user