---
title: "Datadog"
description: "Native Datadog integration for APM traces, LLM Observability, and metrics"
icon: "dog"
---

## Overview

<Frame>
  <img src="/media/dd-trace.png" alt="Datadog LLM Observability dashboard" />
</Frame>

The **Datadog plugin** provides native integration with the Datadog observability platform, offering three pillars of observability for your LLM operations:

- **APM Traces** - Distributed tracing via dd-trace-go v2 with W3C Trace Context support for end-to-end request visibility
- **LLM Observability** - Native Datadog LLM Obs integration for AI/ML-specific monitoring
- **Metrics** - Operational metrics via DogStatsD or the Metrics API

Unlike the [OTel plugin](/features/observability/otel), which sends generic OpenTelemetry data, the Datadog plugin uses Datadog's native SDKs for richer integration with Datadog-specific features such as LLM Observability dashboards and ML App grouping.

---

## Deployment Modes

<Frame>
    <img src="/media/dd-mode.png" alt="Datadog plugin deployment modes" />
</Frame>

The plugin supports two deployment modes:

| Mode | Description | Requirements | Best For |
|------|-------------|--------------|----------|
| **Agent** (default) | Sends data through a local Datadog Agent | Datadog Agent running on host | Production deployments with existing agent infrastructure |
| **Agentless** | Sends data directly to Datadog APIs | API key only | Serverless, containers, or simplified deployments |

### Agent Mode

In agent mode, the plugin communicates with a locally running Datadog Agent:

- **APM Traces** → Agent at `localhost:8126`
- **Metrics** → DogStatsD at `localhost:8125`

The agent handles batching and retries, and it provides lower latency. This is the recommended mode for production deployments where the Datadog Agent is already installed.

### Agentless Mode

In agentless mode, the plugin sends data directly to Datadog's intake APIs:

- **APM Traces** → `https://trace.agent.{site}`
- **LLM Observability** → Direct API submission
- **Metrics** → Datadog Metrics API

This mode requires an API key but simplifies deployment by eliminating the need for a local agent. It is well suited to serverless environments, Kubernetes pods, and quick testing.
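
The routing difference between the two modes can be sketched in a few lines of Go. This is an illustration only, not the plugin's code: `traceDestination` is a hypothetical helper, and it covers only the trace destination, using the documented defaults (`localhost:8126` and `datadoghq.com`) and the `https://trace.agent.{site}` pattern listed above.

```go
package main

import "fmt"

// traceDestination returns where APM traces would be sent for a given mode.
// Hypothetical helper for illustration; it is not part of the plugin's API.
func traceDestination(agentless bool, agentAddr, site string) string {
    if agentless {
        if site == "" {
            site = "datadoghq.com" // documented default site
        }
        return "https://trace.agent." + site
    }
    if agentAddr == "" {
        agentAddr = "localhost:8126" // documented default agent address
    }
    return agentAddr
}

func main() {
    fmt.Println(traceDestination(false, "", ""))            // agent mode, defaults
    fmt.Println(traceDestination(true, "", "datadoghq.eu")) // agentless, EU site
}
```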

---

## Configuration

### Configuration Fields

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `service_name` | `string` | No | `bifrost` | Service name displayed in Datadog APM |
| `ml_app` | `string` | No | (uses `service_name`) | ML application name for LLM Observability grouping |
| `agent_addr` | `string` | No | `localhost:8126` | Datadog Agent address (agent mode only) |
| `dogstatsd_addr` | `string` | No | `localhost:8125` | DogStatsD server address (agent mode only) |
| `env` | `string` | No | - | Environment tag (e.g., `production`, `staging`) |
| `version` | `string` | No | - | Service version tag |
| `custom_tags` | `object` | No | - | Additional tags for all traces and metrics |
| `enable_metrics` | `bool` | No | `true` | Enable metrics emission |
| `enable_traces` | `bool` | No | `true` | Enable APM traces |
| `enable_llm_obs` | `bool` | No | `true` | Enable LLM Observability |
| `agentless` | `bool` | No | `false` | Use agentless mode (direct API) |
| `api_key` | `EnvVar` | Agentless only | - | Datadog API key (supports `env.VAR_NAME`) |
| `site` | `string` | No | `datadoghq.com` | Datadog site/region |
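
To make the defaults and the single conditional requirement in the table concrete, here is a minimal Go sketch. The `pluginConfig` struct and `normalize` function are hypothetical names that mirror a few documented fields; they are not the plugin's real types, and the plugin may apply its defaults differently.

```go
package main

import (
    "errors"
    "fmt"
)

// pluginConfig is a hypothetical mirror of a few documented fields.
type pluginConfig struct {
    ServiceName string
    MLApp       string
    Agentless   bool
    APIKey      string
    Site        string
}

// normalize fills in the documented defaults and checks the one
// conditional requirement: api_key must be set in agentless mode.
func normalize(c pluginConfig) (pluginConfig, error) {
    if c.ServiceName == "" {
        c.ServiceName = "bifrost"
    }
    if c.MLApp == "" {
        c.MLApp = c.ServiceName // ml_app falls back to service_name
    }
    if c.Site == "" {
        c.Site = "datadoghq.com"
    }
    if c.Agentless && c.APIKey == "" {
        return c, errors.New("api_key is required when agentless is true")
    }
    return c, nil
}

func main() {
    c, err := normalize(pluginConfig{ServiceName: "bifrost-gateway"})
    fmt.Println(c.MLApp, c.Site, err)

    _, err = normalize(pluginConfig{Agentless: true})
    fmt.Println(err)
}
```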

### Environment Variable Substitution

The `api_key` and `custom_tags` fields support environment variable substitution using the `env.` prefix:

```json
{
  "api_key": "env.DD_API_KEY",
  "custom_tags": {
    "team": "env.TEAM_NAME",
    "cost_center": "env.COST_CENTER"
  }
}
```

---

## Setup

<Tabs group="setup-method">
<Tab title="UI">

<Frame>
    <img src="/media/dd-config-page.png" alt="Datadog plugin configuration page in the Bifrost UI" />
</Frame>

Configure the Datadog plugin through the Bifrost UI:

1. Navigate to **Settings** → **Plugins**
2. Enable the **Datadog** plugin
3. Configure the required fields based on your deployment mode

</Tab>
<Tab title="Go SDK">

```go
package main

import (
    "context"
    bifrost "github.com/maximhq/bifrost/core"
    "github.com/maximhq/bifrost/core/schemas"
    "github.com/maximhq/bifrost/framework/modelcatalog"
    datadog "github.com/maximhq/bifrost-enterprise/plugins/datadog"
)

func main() {
    ctx := context.Background()
    logger := schemas.NewLogger()

    // Initialize model catalog (required for cost calculation)
    modelCatalog := modelcatalog.NewModelCatalog(logger)

    // Agent mode configuration
    ddPlugin, err := datadog.Init(ctx, &datadog.Config{
        ServiceName: "my-llm-service",
        Env:         "production",
        Version:     "1.0.0",
        CustomTags: map[string]string{
            "team": "platform",
        },
    }, logger, modelCatalog, "1.0.0")
    if err != nil {
        panic(err)
    }

    // Initialize Bifrost with the plugin.
    // yourAccount is a placeholder for your own account implementation, defined elsewhere.
    client, err := bifrost.Init(ctx, schemas.BifrostConfig{
        Account: &yourAccount,
        Plugins: []schemas.Plugin{ddPlugin},
    })
    if err != nil {
        panic(err)
    }
    defer client.Shutdown()

    // All requests are now traced to Datadog
}
```

For agentless mode:

```go
// Agentless mode configuration
enableAgentless := true
ddPlugin, err := datadog.Init(ctx, &datadog.Config{
    ServiceName: "my-llm-service",
    Env:         "production",
    Agentless:   &enableAgentless,
    APIKey:      &schemas.EnvVar{EnvVarName: "DD_API_KEY"},
    Site:        "datadoghq.com",
}, logger, modelCatalog, "1.0.0")
```

</Tab>
<Tab title="config.json">

### Agent Mode (Minimal)

```json
{
  "plugins": [
    {
      "enabled": true,
      "name": "datadog",
      "config": {
        "service_name": "bifrost",
        "env": "production"
      }
    }
  ]
}
```

### Agent Mode (Full Configuration)

```json
{
  "plugins": [
    {
      "enabled": true,
      "name": "datadog",
      "config": {
        "service_name": "my-llm-gateway",
        "ml_app": "my-ml-application",
        "agent_addr": "localhost:8126",
        "dogstatsd_addr": "localhost:8125",
        "env": "production",
        "version": "1.2.3",
        "custom_tags": {
          "team": "platform",
          "cost_center": "env.COST_CENTER"
        },
        "enable_metrics": true,
        "enable_traces": true,
        "enable_llm_obs": true
      }
    }
  ]
}
```

### Agentless Mode

```json
{
  "plugins": [
    {
      "enabled": true,
      "name": "datadog",
      "config": {
        "service_name": "my-llm-gateway",
        "env": "production",
        "agentless": true,
        "api_key": "env.DD_API_KEY",
        "site": "datadoghq.com"
      }
    }
  ]
}
```

Set the environment variable:

```bash
export DD_API_KEY="your-datadog-api-key"
```

</Tab>
</Tabs>

---

## Datadog Sites

The plugin supports all Datadog regional sites. Set the `site` field to match your Datadog account region:

| Site | Region | Value |
|------|--------|-------|
| US1 (default) | United States | `datadoghq.com` |
| US3 | United States | `us3.datadoghq.com` |
| US5 | United States | `us5.datadoghq.com` |
| EU1 | Europe | `datadoghq.eu` |
| AP1 | Asia Pacific (Japan) | `ap1.datadoghq.com` |
| AP2 | Asia Pacific (Australia) | `ap2.datadoghq.com` |
| US1-FED | US Government | `ddog-gov.com` |

<Note>
Ensure your API key corresponds to the selected site. API keys from one region will not work with another.
</Note>
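
A typo in `site` is easy to miss, so it can help to check the value before deploying. The Go sketch below validates a site against the table above and builds the key-validation URL for it. `knownSites` and `validateURL` are hypothetical names, and the sketch assumes the API host is `api.` followed by the site value, as in the `api.datadoghq.com` URL used under Troubleshooting.

```go
package main

import "fmt"

// knownSites lists the site values from the table above.
var knownSites = map[string]string{
    "datadoghq.com":     "US1",
    "us3.datadoghq.com": "US3",
    "us5.datadoghq.com": "US5",
    "datadoghq.eu":      "EU1",
    "ap1.datadoghq.com": "AP1",
    "ap2.datadoghq.com": "AP2",
    "ddog-gov.com":      "US1-FED",
}

// validateURL returns the API key validation endpoint for a known site.
// Assumption: the API host is "api." + site.
func validateURL(site string) (string, error) {
    if _, ok := knownSites[site]; !ok {
        return "", fmt.Errorf("unknown Datadog site %q", site)
    }
    return "https://api." + site + "/api/v1/validate", nil
}

func main() {
    for _, site := range []string{"datadoghq.eu", "datadog.com"} {
        url, err := validateURL(site)
        fmt.Println(url, err)
    }
}
```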

---

## LLM Observability

<Frame>
    <img src="/media/dd-llmobs.png" alt="Datadog LLM Observability dashboard" />
</Frame>

The Datadog plugin integrates with [Datadog LLM Observability](https://docs.datadoghq.com/llm_observability/) to provide AI/ML-specific monitoring capabilities.

### ML App Grouping

LLM traces are grouped under an **ML App** in Datadog. By default, this uses your `service_name`, but you can specify a dedicated ML App name:

```json
{
  "service_name": "bifrost-gateway",
  "ml_app": "customer-support-ai"
}
```

This allows you to:
- Group related LLM operations across multiple services
- Track costs and performance by application
- Apply ML-specific alerts and dashboards

### Session Tracking

The plugin supports session tracking via the `x-bf-session-id` header. Include this header in your requests to group related LLM calls into a conversation session:

```bash
curl -X POST https://your-bifrost-gateway/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "x-bf-session-id: user-123-session-456" \
  -d '{...}'
```

Sessions appear in Datadog LLM Observability, allowing you to trace entire conversation flows.
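
From application code, the same header can be attached with Go's standard `net/http` package. This is a sketch under assumptions: `newChatRequest` is a hypothetical helper, the `Content-Type` header is assumed, and the request body is left as an opaque placeholder because the payload is elided in the curl example above.

```go
package main

import (
    "bytes"
    "fmt"
    "net/http"
)

// newChatRequest builds a chat completion request tagged with a session ID.
// Hypothetical helper; the body is passed through untouched.
func newChatRequest(baseURL, apiKey, sessionID string, body []byte) (*http.Request, error) {
    req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
    if err != nil {
        return nil, err
    }
    req.Header.Set("Authorization", "Bearer "+apiKey)
    req.Header.Set("Content-Type", "application/json") // assumed content type
    req.Header.Set("x-bf-session-id", sessionID)
    return req, nil
}

func main() {
    // Every turn of one conversation reuses the same session ID.
    for turn := 1; turn <= 2; turn++ {
        req, err := newChatRequest("https://your-bifrost-gateway", "example-key", "user-123-session-456", []byte("{}"))
        if err != nil {
            panic(err)
        }
        fmt.Println(turn, req.Method, req.URL.Path, req.Header.Get("x-bf-session-id"))
    }
}
```

The requests are only constructed here, not sent; pass them to an `http.Client` to issue them.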

### W3C Distributed Tracing

The plugin supports [W3C Trace Context](https://www.w3.org/TR/trace-context/) for distributed tracing across services. When your upstream service sends a `traceparent` header, Bifrost automatically links its spans as children of the parent trace.

```bash
curl -X POST https://your-bifrost-gateway/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01" \
  -d '{...}'
```

This enables:
- **End-to-end visibility** - See LLM calls in the context of your full application trace
- **Cross-service correlation** - Link frontend requests → backend services → Bifrost → LLM providers
- **Latency attribution** - Understand how LLM latency contributes to overall request time

The `traceparent` header format follows the W3C standard:
```text
traceparent: {version}-{trace-id}-{parent-id}-{trace-flags}
```

All Datadog APM spans created by Bifrost will be linked to the parent span, appearing as children in the Datadog trace view.
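
When a parent trace does not show up, a malformed `traceparent` value is a common cause. The Go sketch below checks a header against the version `00` field layout (2, 32, 16, and 2 lowercase hex characters, with non-zero trace and parent IDs). `parseTraceParent` is a hypothetical helper for debugging upstream services, not the parser the plugin uses.

```go
package main

import (
    "encoding/hex"
    "errors"
    "fmt"
    "strings"
)

// traceParent holds the four fields of a version 00 traceparent header.
type traceParent struct {
    Version  string
    TraceID  string
    ParentID string
    Sampled  bool
}

// parseTraceParent validates the field lengths, hex encoding, and
// non-zero IDs of a traceparent header value.
func parseTraceParent(h string) (traceParent, error) {
    parts := strings.Split(h, "-")
    if len(parts) != 4 {
        return traceParent{}, errors.New("traceparent must have four dash-separated fields")
    }
    wantLen := []int{2, 32, 16, 2}
    for i, p := range parts {
        if len(p) != wantLen[i] {
            return traceParent{}, fmt.Errorf("field %d has length %d, want %d", i, len(p), wantLen[i])
        }
        if _, err := hex.DecodeString(p); err != nil || p != strings.ToLower(p) {
            return traceParent{}, fmt.Errorf("field %d is not lowercase hex", i)
        }
    }
    if strings.Trim(parts[1], "0") == "" || strings.Trim(parts[2], "0") == "" {
        return traceParent{}, errors.New("trace-id and parent-id must not be all zeros")
    }
    flags, _ := hex.DecodeString(parts[3]) // already validated above
    return traceParent{
        Version:  parts[0],
        TraceID:  parts[1],
        ParentID: parts[2],
        Sampled:  flags[0]&0x01 == 0x01, // lowest bit is the sampled flag
    }, nil
}

func main() {
    tp, err := parseTraceParent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
    fmt.Printf("%+v %v\n", tp, err)
}
```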

### What's Captured

For each LLM operation, the plugin sends to LLM Observability:

- **Input/Output Messages** - Full conversation history with role attribution
- **Token Usage** - Input, output, and total token counts
- **Cost** - Calculated cost in USD based on model pricing
- **Latency** - Request duration and time-to-first-token for streaming
- **Model Info** - Provider, model name, and request parameters
- **Tool Calls** - Function/tool call details for agentic workflows

---

## Metrics Reference

The plugin emits the following metrics to Datadog:

| Metric | Type | Description | Tags |
|--------|------|-------------|------|
| `bifrost.requests.total` | Counter | Total LLM requests | provider, model, request_type |
| `bifrost.success.total` | Counter | Successful requests | provider, model, request_type |
| `bifrost.errors.total` | Counter | Failed requests | provider, model, request_type, reason |
| `bifrost.latency.seconds` | Histogram | Request latency distribution | provider, model, request_type |
| `bifrost.tokens.input` | Counter | Input/prompt tokens consumed | provider, model |
| `bifrost.tokens.output` | Counter | Output/completion tokens generated | provider, model |
| `bifrost.tokens.total` | Counter | Total tokens (input + output) | provider, model |
| `bifrost.cost.usd` | Gauge | Request cost in USD | provider, model |
| `bifrost.cache.hits` | Counter | Cache hits | provider, model, cache_type |
| `bifrost.stream.first_token_latency` | Histogram | Time to first token (streaming) | provider, model |
| `bifrost.stream.inter_token_latency` | Histogram | Inter-token latency (streaming) | provider, model |

### Custom Tags

All metrics include your configured `custom_tags` plus automatic tags for:
- `provider` - LLM provider (openai, anthropic, etc.)
- `model` - Model name
- `request_type` - Type of request (chat, embedding, etc.)
- `env` - Environment from configuration
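
To see how tags travel with a metric in agent mode, it helps to look at the DogStatsD datagram format, `name:value|type|#tag:value,...`. The Go sketch below formats a counter increment in that shape. It illustrates the wire format only: `counterLine` is a hypothetical helper, the model name is a made-up example, and the plugin itself emits metrics through Datadog's client libraries rather than code like this.

```go
package main

import (
    "fmt"
    "sort"
    "strings"
)

// counterLine formats one counter increment as a DogStatsD datagram:
// <name>:<value>|c|#<tag>:<value>,...
func counterLine(name string, value int64, tags map[string]string) string {
    pairs := make([]string, 0, len(tags))
    for k, v := range tags {
        pairs = append(pairs, k+":"+v)
    }
    sort.Strings(pairs) // fixed order keeps the output readable and testable
    line := fmt.Sprintf("%s:%d|c", name, value)
    if len(pairs) > 0 {
        line += "|#" + strings.Join(pairs, ",")
    }
    return line
}

func main() {
    fmt.Println(counterLine("bifrost.requests.total", 1, map[string]string{
        "provider":     "openai",
        "model":        "example-model", // hypothetical model name
        "request_type": "chat",
        "env":          "production",
        "team":         "platform", // from custom_tags
    }))
}
```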

---

## Captured Data

Each APM trace includes comprehensive LLM operation metadata:

### Span Attributes

- **Span Name** - Based on request type (`genai.chat`, `genai.embedding`, etc.)
- **Service Info** - `service.name`, `service.version`, `env`
- **Provider & Model** - `gen_ai.provider.name`, `gen_ai.request.model`

### Request Parameters

- Temperature, max_tokens, top_p, stop sequences
- Presence/frequency penalties
- Tool configurations and parallel tool calls
- Custom parameters via `ExtraParams`

### Input/Output Data

- Complete chat history with role-based messages
- Prompt text for completions
- Response content with role attribution
- Tool calls and results
- Reasoning and refusal content (when present)

### Performance Metrics

- Token usage (prompt, completion, total)
- Cost calculations in USD
- Latency and timing (start/end timestamps)
- Time to first token (streaming)
- Error details with status codes

### Bifrost Context

- Virtual key ID and name
- Selected key ID and name
- Team ID and name
- Customer ID and name
- Retry count and fallback index

---

## Supported Request Types

The Datadog plugin captures all Bifrost request types:

| Request Type | Span Name | LLM Obs Type |
|--------------|-----------|--------------|
| Chat Completion | `genai.chat` | LLM Span |
| Chat Completion (streaming) | `genai.chat` | LLM Span |
| Text Completion | `genai.text` | LLM Span |
| Text Completion (streaming) | `genai.text` | LLM Span |
| Embeddings | `genai.embedding` | Embedding Span |
| Speech Generation | `genai.speech` | Task Span |
| Speech Generation (streaming) | `genai.speech` | Task Span |
| Transcription | `genai.transcription` | Task Span |
| Transcription (streaming) | `genai.transcription` | Task Span |
| Responses API | `genai.responses` | LLM Span |
| Responses API (streaming) | `genai.responses` | LLM Span |

---

## When to Use

### Datadog Plugin

Choose the Datadog plugin when you:

- Use Datadog as your primary observability platform
- Want native LLM Observability integration with ML App grouping
- Need seamless correlation with existing Datadog APM traces via W3C distributed tracing
- Require Datadog-specific features like notebooks and dashboards
- Want session tracking for conversation flows

### vs. OTel Plugin

Use the [OTel plugin](/features/observability/otel) when you:

- Need multi-vendor observability (send to multiple backends)
- Are using Datadog via an OpenTelemetry Collector
- Want vendor flexibility to switch backends without code changes
- Prefer standardized OpenTelemetry semantic conventions

<Note>
You can use both plugins simultaneously if needed. The Datadog plugin provides native integration while OTel can send to additional backends.
</Note>

### vs. Built-in Observability

Use [Built-in Observability](/features/observability/default) for:

- Local development and testing
- Simple self-hosted deployments
- Setups that must avoid external dependencies
- Direct database access to logs

---

## Troubleshooting

### Agent Connectivity Issues

Verify the Datadog Agent is running and accessible:

```bash
# Check agent status
datadog-agent status

# Test APM endpoint
curl -v http://localhost:8126/info

# Test DogStatsD (should accept UDP packets)
echo "test.metric:1|c" | nc -u -w1 localhost 8125
```

### Agentless Mode Not Working

1. Verify your API key is valid (for sites other than US1, replace `datadoghq.com` in the URL with your `site` value):
```bash
curl -X GET "https://api.datadoghq.com/api/v1/validate" \
  -H "DD-API-KEY: $DD_API_KEY"
```

2. Ensure the `site` matches your API key's region

3. Check that the API key environment variable is set:
```bash
echo $DD_API_KEY
```

### Missing Traces

1. Enable debug logging in Bifrost:
```bash
bifrost-http --log-level debug
```

2. Verify traces are enabled in your configuration:
```json
{
  "enable_traces": true,
  "enable_llm_obs": true
}
```

3. Check for errors in the Bifrost logs related to the Datadog plugin

### Missing Metrics

1. Verify DogStatsD is running (agent mode):
```bash
datadog-agent status | grep DogStatsD
```

2. Ensure metrics are enabled:
```json
{
  "enable_metrics": true
}
```

3. For agentless mode, verify your API key has metrics submission permissions

### LLM Observability Not Appearing

1. Confirm that `enable_llm_obs` is set to `true` (the default)
2. Verify your Datadog plan includes LLM Observability
3. Check the ML App name in Datadog under **LLM Observability** → **Applications**

---

## Next Steps

- **[OTel Plugin](/features/observability/otel)** - OpenTelemetry integration for multi-vendor observability
- **[Built-in Observability](/features/observability/default)** - Local logging for development
- **[Telemetry](/features/telemetry)** - Prometheus metrics and dashboards