---
title: "Datadog"
description: "Native Datadog integration for APM traces, LLM Observability, and metrics"
icon: "dog"
---

## Overview

Datadog LLM Observability dashboard

The **Datadog plugin** provides native integration with the Datadog observability platform, offering three pillars of observability for your LLM operations:

- **APM Traces** - Distributed tracing via dd-trace-go v2 with W3C Trace Context support for end-to-end request visibility
- **LLM Observability** - Native Datadog LLM Obs integration for AI/ML-specific monitoring
- **Metrics** - Operational metrics via DogStatsD or the Metrics API

Unlike the [OTel plugin](/features/observability/otel), which sends generic OpenTelemetry data, the Datadog plugin uses Datadog's native SDKs for richer integration with Datadog-specific features such as LLM Observability dashboards and ML App grouping.

---

## Deployment Modes

The plugin supports two deployment modes:

| Mode | Description | Requirements | Best For |
|------|-------------|--------------|----------|
| **Agent** (default) | Sends data through a local Datadog Agent | Datadog Agent running on host | Production deployments with existing agent infrastructure |
| **Agentless** | Sends data directly to Datadog APIs | API key only | Serverless, containers, or simplified deployments |

### Agent Mode

In agent mode, the plugin communicates with a locally running Datadog Agent:

- **APM Traces** → Agent at `localhost:8126`
- **Metrics** → DogStatsD at `localhost:8125`

The agent handles batching and retries, and provides lower latency. This is the recommended mode for production deployments where you already have the Datadog Agent installed.
### Agentless Mode

In agentless mode, the plugin sends data directly to Datadog's intake APIs:

- **APM Traces** → `https://trace.agent.{site}`
- **LLM Observability** → Direct API submission
- **Metrics** → Datadog Metrics API

This mode requires an API key but simplifies deployment by eliminating the need for a local agent. It is well suited to serverless environments, Kubernetes pods, or quick testing.

---

## Configuration

### Required Fields

Only `api_key` is required, and only in agentless mode. Every other field is optional and falls back to the default shown.

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `service_name` | `string` | No | `bifrost` | Service name displayed in Datadog APM |
| `ml_app` | `string` | No | (uses `service_name`) | ML application name for LLM Observability grouping |
| `agent_addr` | `string` | No | `localhost:8126` | Datadog Agent address (agent mode only) |
| `dogstatsd_addr` | `string` | No | `localhost:8125` | DogStatsD server address (agent mode only) |
| `env` | `string` | No | - | Environment tag (e.g., `production`, `staging`) |
| `version` | `string` | No | - | Service version tag |
| `custom_tags` | `object` | No | - | Additional tags for all traces and metrics |
| `enable_metrics` | `bool` | No | `true` | Enable metrics emission |
| `enable_traces` | `bool` | No | `true` | Enable APM traces |
| `enable_llm_obs` | `bool` | No | `true` | Enable LLM Observability |
| `agentless` | `bool` | No | `false` | Use agentless mode (direct API) |
| `api_key` | `EnvVar` | Agentless only | - | Datadog API key (supports `env.VAR_NAME`) |
| `site` | `string` | No | `datadoghq.com` | Datadog site/region |

### Environment Variable Substitution

The `api_key` and `custom_tags` fields support environment variable substitution using the `env.` prefix:

```json
{
  "api_key": "env.DD_API_KEY",
  "custom_tags": {
    "team": "env.TEAM_NAME",
    "cost_center": "env.COST_CENTER"
  }
}
```

---

## Setup

Configure the Datadog plugin through the Bifrost UI:

1. Navigate to **Settings** → **Plugins**
2. Enable the **Datadog** plugin
3. Configure the required fields based on your deployment mode

To register the plugin with the Go SDK (agent mode):

```go
package main

import (
	"context"

	bifrost "github.com/maximhq/bifrost/core"
	"github.com/maximhq/bifrost/core/schemas"
	"github.com/maximhq/bifrost/framework/modelcatalog"
	datadog "github.com/maximhq/bifrost-enterprise/plugins/datadog"
)

func main() {
	ctx := context.Background()
	logger := schemas.NewLogger()

	// Initialize model catalog (required for cost calculation)
	modelCatalog := modelcatalog.NewModelCatalog(logger)

	// Agent mode configuration
	ddPlugin, err := datadog.Init(ctx, &datadog.Config{
		ServiceName: "my-llm-service",
		Env:         "production",
		Version:     "1.0.0",
		CustomTags: map[string]string{
			"team": "platform",
		},
	}, logger, modelCatalog, "1.0.0")
	if err != nil {
		panic(err)
	}

	// Initialize Bifrost with the plugin.
	// yourAccount is a placeholder for your own account implementation.
	client, err := bifrost.Init(ctx, schemas.BifrostConfig{
		Account: &yourAccount,
		Plugins: []schemas.Plugin{ddPlugin},
	})
	if err != nil {
		panic(err)
	}
	defer client.Shutdown()

	// All requests are now traced to Datadog
}
```

For agentless mode:

```go
// Agentless mode configuration
enableAgentless := true
ddPlugin, err := datadog.Init(ctx, &datadog.Config{
	ServiceName: "my-llm-service",
	Env:         "production",
	Agentless:   &enableAgentless,
	APIKey:      &schemas.EnvVar{EnvVarName: "DD_API_KEY"},
	Site:        "datadoghq.com",
}, logger, modelCatalog, "1.0.0")
```

### Agent Mode (Minimal)

```json
{
  "plugins": [
    {
      "enabled": true,
      "name": "datadog",
      "config": {
        "service_name": "bifrost",
        "env": "production"
      }
    }
  ]
}
```

### Agent Mode (Full Configuration)

```json
{
  "plugins": [
    {
      "enabled": true,
      "name": "datadog",
      "config": {
        "service_name": "my-llm-gateway",
        "ml_app": "my-ml-application",
        "agent_addr": "localhost:8126",
        "dogstatsd_addr": "localhost:8125",
        "env": "production",
        "version": "1.2.3",
        "custom_tags": {
          "team": "platform",
          "cost_center": "env.COST_CENTER"
        },
        "enable_metrics": true,
        "enable_traces": true,
        "enable_llm_obs": true
      }
    }
  ]
}
```

### Agentless Mode

```json
{
  "plugins": [
    {
      "enabled": true,
      "name": "datadog",
      "config": {
        "service_name": "my-llm-gateway",
        "env": "production",
        "agentless": true,
        "api_key": "env.DD_API_KEY",
        "site": "datadoghq.com"
      }
    }
  ]
}
```

Set the environment variable:

```bash
export DD_API_KEY="your-datadog-api-key"
```

---

## Datadog Sites

The plugin supports all Datadog regional sites. Set the `site` field to match your Datadog account region:

| Site | Region | Value |
|------|--------|-------|
| US1 (default) | United States | `datadoghq.com` |
| US3 | United States | `us3.datadoghq.com` |
| US5 | United States | `us5.datadoghq.com` |
| EU1 | Europe | `datadoghq.eu` |
| AP1 | Asia Pacific (Japan) | `ap1.datadoghq.com` |
| AP2 | Asia Pacific (Australia) | `ap2.datadoghq.com` |
| US1-FED | US Government | `ddog-gov.com` |

Ensure your API key corresponds to the selected site. API keys from one region will not work with another.

---

## LLM Observability

The Datadog plugin integrates with [Datadog LLM Observability](https://docs.datadoghq.com/llm_observability/) to provide AI/ML-specific monitoring capabilities.

### ML App Grouping

LLM traces are grouped under an **ML App** in Datadog. By default, this uses your `service_name`, but you can specify a dedicated ML App name:

```json
{
  "service_name": "bifrost-gateway",
  "ml_app": "customer-support-ai"
}
```

This allows you to:

- Group related LLM operations across multiple services
- Track costs and performance by application
- Apply ML-specific alerts and dashboards

### Session Tracking

The plugin supports session tracking via the `x-bf-session-id` header.
Include this header in your requests to group related LLM calls into a conversation session:

```bash
curl -X POST https://your-bifrost-gateway/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "x-bf-session-id: user-123-session-456" \
  -d '{...}'
```

Sessions appear in Datadog LLM Observability, allowing you to trace entire conversation flows.

### W3C Distributed Tracing

The plugin supports [W3C Trace Context](https://www.w3.org/TR/trace-context/) for distributed tracing across services. When your upstream service sends a `traceparent` header, Bifrost automatically links its spans as children of the parent trace.

```bash
curl -X POST https://your-bifrost-gateway/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01" \
  -d '{...}'
```

This enables:

- **End-to-end visibility** - See LLM calls in the context of your full application trace
- **Cross-service correlation** - Link frontend requests → backend services → Bifrost → LLM providers
- **Latency attribution** - Understand how LLM latency contributes to overall request time

The `traceparent` header format follows the W3C standard:

```
traceparent: {version}-{trace-id}-{parent-id}-{trace-flags}
```

All Datadog APM spans created by Bifrost will be linked to the parent span, appearing as children in the Datadog trace view.
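If your upstream service is not already instrumented with a tracing library, the header can be built and checked by hand. The following Go sketch is illustrative only and assumes version `00` of the format. `parseTraceparent`, `newTraceparent`, and `isLowerHex` are hypothetical helpers, not part of Bifrost or the Datadog SDK. In practice a tracing library would normally generate this header for you.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// isLowerHex reports whether s is exactly n lowercase hexadecimal characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, c := range s {
		if !((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')) {
			return false
		}
	}
	return true
}

// parseTraceparent validates a version-00 traceparent value and returns the
// trace ID, the parent span ID, and whether the sampled flag is set.
func parseTraceparent(h string) (string, string, bool, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return "", "", false, errors.New("malformed or unsupported traceparent")
	}
	traceID, parentID, flags := parts[1], parts[2], parts[3]
	if !isLowerHex(traceID, 32) || traceID == strings.Repeat("0", 32) {
		return "", "", false, errors.New("invalid trace-id")
	}
	if !isLowerHex(parentID, 16) || parentID == strings.Repeat("0", 16) {
		return "", "", false, errors.New("invalid parent-id")
	}
	if !isLowerHex(flags, 2) {
		return "", "", false, errors.New("invalid trace-flags")
	}
	b, _ := hex.DecodeString(flags) // cannot fail: flags is two hex characters
	return traceID, parentID, b[0]&0x01 == 0x01, nil
}

// newTraceparent builds a fresh version-00 header with random IDs.
func newTraceparent(sampled bool) (string, error) {
	buf := make([]byte, 24) // 16-byte trace-id followed by 8-byte parent-id
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	flags := "00"
	if sampled {
		flags = "01"
	}
	return fmt.Sprintf("00-%s-%s-%s",
		hex.EncodeToString(buf[:16]), hex.EncodeToString(buf[16:]), flags), nil
}

func main() {
	// Parse the example header used in the curl command above.
	traceID, parentID, sampled, err := parseTraceparent(
		"00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
	fmt.Println(traceID, parentID, sampled, err)

	// Generate a new header to attach to an outgoing request.
	header, err := newTraceparent(true)
	fmt.Println(header, err)
}
```

The generated value goes in the `traceparent` request header exactly as in the curl example.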
### What's Captured

For each LLM operation, the plugin sends to LLM Observability:

- **Input/Output Messages** - Full conversation history with role attribution
- **Token Usage** - Input, output, and total token counts
- **Cost** - Calculated cost in USD based on model pricing
- **Latency** - Request duration and time-to-first-token for streaming
- **Model Info** - Provider, model name, and request parameters
- **Tool Calls** - Function/tool call details for agentic workflows

---

## Metrics Reference

The plugin emits the following metrics to Datadog:

| Metric | Type | Description | Tags |
|--------|------|-------------|------|
| `bifrost.requests.total` | Counter | Total LLM requests | provider, model, request_type |
| `bifrost.success.total` | Counter | Successful requests | provider, model, request_type |
| `bifrost.errors.total` | Counter | Failed requests | provider, model, request_type, reason |
| `bifrost.latency.seconds` | Histogram | Request latency distribution | provider, model, request_type |
| `bifrost.tokens.input` | Counter | Input/prompt tokens consumed | provider, model |
| `bifrost.tokens.output` | Counter | Output/completion tokens generated | provider, model |
| `bifrost.tokens.total` | Counter | Total tokens (input + output) | provider, model |
| `bifrost.cost.usd` | Gauge | Request cost in USD | provider, model |
| `bifrost.cache.hits` | Counter | Cache hits | provider, model, cache_type |
| `bifrost.stream.first_token_latency` | Histogram | Time to first token (streaming) | provider, model |
| `bifrost.stream.inter_token_latency` | Histogram | Inter-token latency (streaming) | provider, model |

### Custom Tags

All metrics include your configured `custom_tags` plus automatic tags for:

- `provider` - LLM provider (openai, anthropic, etc.)
- `model` - Model name
- `request_type` - Type of request (chat, embedding, etc.)
- `env` - Environment from configuration

---

## Captured Data

Each APM trace includes comprehensive LLM operation metadata:

### Span Attributes

- **Span Name** - Based on request type (`genai.chat`, `genai.embedding`, etc.)
- **Service Info** - `service.name`, `service.version`, `env`
- **Provider & Model** - `gen_ai.provider.name`, `gen_ai.request.model`

### Request Parameters

- Temperature, max_tokens, top_p, stop sequences
- Presence/frequency penalties
- Tool configurations and parallel tool calls
- Custom parameters via `ExtraParams`

### Input/Output Data

- Complete chat history with role-based messages
- Prompt text for completions
- Response content with role attribution
- Tool calls and results
- Reasoning and refusal content (when present)

### Performance Metrics

- Token usage (prompt, completion, total)
- Cost calculations in USD
- Latency and timing (start/end timestamps)
- Time to first token (streaming)
- Error details with status codes

### Bifrost Context

- Virtual key ID and name
- Selected key ID and name
- Team ID and name
- Customer ID and name
- Retry count and fallback index

---

## Supported Request Types

The Datadog plugin captures all Bifrost request types:

| Request Type | Span Name | LLM Obs Type |
|--------------|-----------|--------------|
| Chat Completion | `genai.chat` | LLM Span |
| Chat Completion (streaming) | `genai.chat` | LLM Span |
| Text Completion | `genai.text` | LLM Span |
| Text Completion (streaming) | `genai.text` | LLM Span |
| Embeddings | `genai.embedding` | Embedding Span |
| Speech Generation | `genai.speech` | Task Span |
| Speech Generation (streaming) | `genai.speech` | Task Span |
| Transcription | `genai.transcription` | Task Span |
| Transcription (streaming) | `genai.transcription` | Task Span |
| Responses API | `genai.responses` | LLM Span |
| Responses API (streaming) | `genai.responses` | LLM Span |

---

## When to Use

### Datadog Plugin

Choose the Datadog plugin when you:

- Use Datadog as your primary
observability platform
- Want native LLM Observability integration with ML App grouping
- Need seamless correlation with existing Datadog APM traces via W3C distributed tracing
- Require Datadog-specific features like notebooks and dashboards
- Want session tracking for conversation flows

### vs. OTel Plugin

Use the [OTel plugin](/features/observability/otel) when you:

- Need multi-vendor observability (send to multiple backends)
- Are using Datadog via an OpenTelemetry Collector
- Want vendor flexibility to switch backends without code changes
- Prefer standardized OpenTelemetry semantic conventions

You can use both plugins simultaneously if needed. The Datadog plugin provides native integration while OTel can send to additional backends.

### vs. Built-in Observability

Use [Built-in Observability](/features/observability/default) for:

- Local development and testing
- Simple self-hosted deployments
- Setups that must avoid external dependencies
- Direct database access to logs

---

## Troubleshooting

### Agent Connectivity Issues

Verify the Datadog Agent is running and accessible:

```bash
# Check agent status
datadog-agent status

# Test APM endpoint
curl -v http://localhost:8126/info

# Test DogStatsD (should accept UDP packets)
echo "test.metric:1|c" | nc -u -w1 localhost 8125
```

### Agentless Mode Not Working

1. Verify your API key is valid (for sites other than US1, replace the host with `api.` followed by your `site` value):
   ```bash
   curl -X GET "https://api.datadoghq.com/api/v1/validate" \
     -H "DD-API-KEY: $DD_API_KEY"
   ```
2. Ensure the `site` matches your API key's region
3. Check that the API key environment variable is set:
   ```bash
   echo $DD_API_KEY
   ```

### Missing Traces

1. Enable debug logging in Bifrost:
   ```bash
   bifrost-http --log-level debug
   ```
2. Verify traces are enabled in your configuration:
   ```json
   {
     "enable_traces": true,
     "enable_llm_obs": true
   }
   ```
3. Check for errors in the Bifrost logs related to the Datadog plugin

### Missing Metrics

1. Verify DogStatsD is running (agent mode):
   ```bash
   datadog-agent status | grep DogStatsD
   ```
2. Ensure metrics are enabled:
   ```json
   {
     "enable_metrics": true
   }
   ```
3. For agentless mode, verify your API key has metrics submission permissions

### LLM Observability Not Appearing

1. LLM Observability requires `enable_llm_obs: true` (default)
2. Verify your Datadog plan includes LLM Observability
3. Check the ML App name in Datadog under **LLM Observability** → **Applications**

---

## Next Steps

- **[OTel Plugin](/features/observability/otel)** - OpenTelemetry integration for multi-vendor observability
- **[Built-in Observability](/features/observability/default)** - Local logging for development
- **[Telemetry](/features/telemetry)** - Prometheus metrics and dashboards