first commit

This commit is contained in:
Beyhan Oğur
2026-04-26 21:52:23 +03:00
commit 880f412e2c
2662 changed files with 866266 additions and 0 deletions

@@ -0,0 +1,764 @@
---
title: "Concurrency"
description: "Deep dive into Bifrost's advanced concurrency architecture - worker pools, goroutine management, channel-based communication, and resource isolation patterns."
icon: "traffic-light"
---
## Concurrency Philosophy
### **Core Principles**
| Principle | Implementation | Benefit |
| ---------------------------------- | -------------------------------------- | -------------------------------------- |
| **Provider Isolation** | Independent worker pools per provider | Fault tolerance, no cascade failures |
| **Channel-Based Communication**    | Go channels for all async operations   | Type-safe handoff without shared locks |
| **Resource Pooling** | Object pools with lifecycle management | Predictable memory usage, minimal GC |
| **Non-Blocking Operations** | Async processing throughout pipeline | Maximum concurrency, no blocking waits |
| **Backpressure Handling** | Configurable buffers and flow control | Graceful degradation under load |
### **Threading Architecture Overview**
```mermaid
graph TB
subgraph "Main Thread"
Main[Main Process<br/>HTTP Server]
Router[Request Router<br/>Goroutine]
PluginMgr[Plugin Manager<br/>Goroutine]
end
subgraph "Provider Worker Pools"
subgraph "OpenAI Pool"
OAI1[Worker 1<br/>Goroutine]
OAI2[Worker 2<br/>Goroutine]
OAIN[Worker N<br/>Goroutine]
end
subgraph "Anthropic Pool"
ANT1[Worker 1<br/>Goroutine]
ANT2[Worker 2<br/>Goroutine]
ANTN[Worker N<br/>Goroutine]
end
subgraph "Bedrock Pool"
BED1[Worker 1<br/>Goroutine]
BED2[Worker 2<br/>Goroutine]
BEDN[Worker N<br/>Goroutine]
end
end
subgraph "Memory Pools"
ChannelPool[Channel Pool<br/>sync.Pool]
MessagePool[Message Pool<br/>sync.Pool]
ResponsePool[Response Pool<br/>sync.Pool]
end
Main --> Router
Router --> PluginMgr
PluginMgr --> OAI1
PluginMgr --> ANT1
PluginMgr --> BED1
OAI1 --> ChannelPool
ANT1 --> MessagePool
BED1 --> ResponsePool
```
---
## Worker Pool Architecture
### **Provider-Isolated Worker Pools**
```mermaid
stateDiagram-v2
[*] --> PoolInit: Worker Pool Creation
PoolInit --> WorkerSpawn: Spawn Worker Goroutines
WorkerSpawn --> Listening: Workers Listen on Channels
Listening --> Processing: Job Received
Processing --> API_Call: Provider API Request
API_Call --> Response: Process Response
Response --> Listening: Job Complete
Listening --> Shutdown: Graceful Shutdown
Processing --> Shutdown: Complete Current Job
Shutdown --> [*]: Pool Destroyed
```
**Worker Pool Architecture:**
The worker pool system balances resource efficiency against performance isolation between providers:
**Key Components:**
- **Worker Pool Management** - Pre-spawned workers reduce startup latency
- **Job Queue System** - Buffered channels provide smooth load balancing
- **Resource Pools** - HTTP clients and API keys are pooled for efficiency
- **Health Monitoring** - Circuit breakers detect and isolate failing providers
- **Graceful Shutdown** - Workers complete current jobs before terminating
**Startup Process:**
1. **Worker Pre-spawning** - Workers are created during pool initialization
2. **Channel Setup** - Job queues and worker channels are established
3. **Resource Allocation** - HTTP clients and API keys are distributed
4. **Health Checks** - Initial connectivity tests verify provider availability
5. **Ready State** - Pool becomes available for request processing
**Job Dispatch Logic:**
- **Round-Robin Assignment** - Jobs are distributed evenly across available workers
- **Load Balancing** - Worker availability determines job assignment
- **Overflow Handling** - Excess jobs are queued or dropped based on configuration
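
**Illustrative Sketch:**
The startup and dispatch steps above can be sketched as a small pool with pre-spawned workers, a buffered job queue, and a graceful shutdown. This is a minimal sketch, not Bifrost's implementation: `Pool`, `Job`, `Result`, `NewPool`, `Submit`, and `Shutdown` are hypothetical names, and workers here pull from one shared queue instead of being assigned round-robin.
```go
package main

import (
	"fmt"
	"sync"
)

// Job and Result are hypothetical types used only for this sketch.
type Job struct {
	ID     int
	Result chan Result // buffered (cap 1) so a worker never blocks on the reply
}

type Result struct {
	JobID  int
	Worker int
}

// Pool is an illustrative provider-scoped worker pool, not Bifrost's actual type.
type Pool struct {
	jobs chan *Job
	wg   sync.WaitGroup
}

// NewPool pre-spawns the workers; all of them read from one buffered queue.
func NewPool(workers, queueSize int) *Pool {
	p := &Pool{jobs: make(chan *Job, queueSize)}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go p.run(i)
	}
	return p
}

// run processes jobs until the queue is closed and drained.
func (p *Pool) run(id int) {
	defer p.wg.Done()
	for job := range p.jobs {
		job.Result <- Result{JobID: job.ID, Worker: id}
	}
}

// Submit enqueues a job without blocking; it reports false when the queue is full.
func (p *Pool) Submit(job *Job) bool {
	select {
	case p.jobs <- job:
		return true
	default:
		return false
	}
}

// Shutdown stops intake and waits for queued and in-flight jobs to finish.
// Submit must not be called after Shutdown.
func (p *Pool) Shutdown() {
	close(p.jobs)
	p.wg.Wait()
}

func main() {
	pool := NewPool(3, 10)
	job := &Job{ID: 1, Result: make(chan Result, 1)}
	if pool.Submit(job) {
		res := <-job.Result
		fmt.Println("job", res.JobID, "completed")
	}
	pool.Shutdown()
}
```
Giving each provider its own `Pool` value is what keeps a stalled provider from consuming another provider's workers or queue space.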
### **Worker Lifecycle Management**
```mermaid
sequenceDiagram
participant Pool
participant Worker
participant HTTPClient
participant Provider
participant Metrics
Pool->>Worker: Start()
Worker->>Worker: Initialize HTTP Client
Worker->>Pool: Ready Signal
loop Job Processing
Pool->>Worker: Job Assignment
Worker->>HTTPClient: Prepare Request
HTTPClient->>Provider: API Call
Provider-->>HTTPClient: Response
HTTPClient-->>Worker: Parsed Response
Worker->>Metrics: Record Performance
Worker->>Pool: Job Complete
end
Pool->>Worker: Shutdown Signal
Worker->>Worker: Complete Current Job
Worker-->>Pool: Shutdown Confirmed
```
---
## Channel-Based Communication
### **Channel Architecture**
```mermaid
graph TB
subgraph "Channel Types"
JobQueue[Job Queue<br/>Buffered Channel]
WorkerPool[Worker Pool<br/>Buffered Channel]
ResultChan[Result Channel<br/>Buffered Channel]
QuitChan[Quit Channel<br/>Unbuffered]
end
subgraph "Flow Control"
BackPressure[Backpressure<br/>Buffer Limits]
Timeout[Timeout<br/>Context Cancellation]
Graceful[Graceful Shutdown<br/>Channel Closing]
end
JobQueue --> BackPressure
WorkerPool --> Timeout
ResultChan --> Graceful
```
**Channel Configuration Principles:**
Bifrost's channel system balances throughput and memory usage through careful buffer sizing:
**Job Queuing Configuration:**
- **Job Queue Buffer** - Sized based on expected burst traffic (100-1000 jobs)
- **Worker Pool Size** - Matches provider concurrency limits (10-100 workers)
- **Result Buffer** - Accommodates response processing delays (50-500 responses)
**Flow Control Parameters:**
- **Queue Wait Limits** - Maximum time jobs wait before timeout (1-10 seconds)
- **Processing Timeouts** - Per-job execution limits (30-300 seconds)
- **Shutdown Timeouts** - Graceful termination periods (5-30 seconds)
**Backpressure Policies:**
- **Drop Policy** - Discard excess jobs when queues are full
- **Block Policy** - Wait for queue space with timeout
- **Error Policy** - Immediately return error for full queues
**Channel Type Selection:**
- **Buffered Channels** - Used for async job processing and result handling
- **Unbuffered Channels** - Used for synchronization signals (quit, done)
- **Context Cancellation** - Used for timeout and cancellation propagation
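
**Illustrative Sketch:**
The three mechanisms listed above can be combined in a single `select`. The sketch below assumes a hypothetical helper `waitForResult` and a hypothetical `ErrShutdown` value. It reads from a buffered result channel, treats a closed quit channel as a shutdown signal, and uses a context deadline as the timeout.
```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ErrShutdown is returned when the quit channel is closed while waiting.
var ErrShutdown = errors.New("pool is shutting down")

// waitForResult is a hypothetical helper that waits on a buffered result
// channel but gives up on shutdown or when the timeout expires.
func waitForResult(results <-chan string, quit <-chan struct{}, timeout time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	select {
	case r := <-results:
		return r, nil
	case <-quit:
		return "", ErrShutdown
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

func main() {
	results := make(chan string, 1) // buffered: the sender does not wait for a reader
	quit := make(chan struct{})     // unbuffered: closed once to broadcast shutdown

	results <- "response"
	r, err := waitForResult(results, quit, time.Second)
	fmt.Println(r, err)

	// Nothing is pending now, so the context deadline ends the wait.
	_, err = waitForResult(results, quit, 20*time.Millisecond)
	fmt.Println(errors.Is(err, context.DeadlineExceeded))

	close(quit)
	_, err = waitForResult(results, quit, time.Second)
	fmt.Println(errors.Is(err, ErrShutdown))
}
```
Closing the quit channel, rather than sending on it, wakes every goroutine that is selecting on it, which is why a single unbuffered channel is enough for the shutdown signal.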
### **Backpressure and Flow Control**
```mermaid
flowchart TD
Request[Incoming Request] --> QueueCheck{Queue Full?}
QueueCheck -->|No| Queue[Add to Queue]
QueueCheck -->|Yes| Policy{Backpressure Policy?}
Policy -->|Drop| Drop[Drop Request<br/>Return Error]
Policy -->|Block| Block[Block Until Space<br/>With Timeout]
Policy -->|Error| Error[Return Queue Full Error]
Queue --> Worker[Assign to Worker]
Block --> TimeoutCheck{Timeout?}
TimeoutCheck -->|Yes| Error
TimeoutCheck -->|No| Queue
Worker --> Processing[Process Request]
Processing --> Complete[Complete]
Drop --> Client[Client Response]
Error --> Client
Complete --> Client
```
**Backpressure Implementation Strategy:**
The backpressure system protects Bifrost from being overwhelmed while maintaining service availability:
**Non-Blocking Job Submission:**
- **Immediate Queue Check** - Jobs are submitted without blocking on queue space
- **Success Path** - Available queue space allows immediate job acceptance
- **Overflow Detection** - Full queues trigger backpressure policies
- **Metrics Collection** - All queue operations are tracked for monitoring
**Backpressure Policy Execution:**
- **Drop Policy** - Immediately rejects excess jobs with meaningful error messages
- **Block Policy** - Waits for queue space with configurable timeout limits
- **Error Policy** - Returns queue full errors for immediate client feedback
- **Metrics Tracking** - Dropped, blocked, and successful submissions are measured
**Timeout Management:**
- **Context-Based Timeouts** - All blocking operations respect timeout boundaries
- **Graceful Degradation** - Timeouts result in controlled error responses
- **Resource Protection** - Prevents goroutine leaks from infinite waits
```go
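// SubmitJob enqueues a job and applies the backpressure policy on a full queue.
// The function head and the "drop" branch are reconstructed from the policy list;
// the signature, pool.config.QueuePolicy, and IncDroppedJobs are assumed names.
func (pool *ProviderWorkerPool) SubmitJob(ctx context.Context, job *Job) error {
	select {
	case pool.jobQueue <- job: // fast path: queue has space
		pool.metrics.IncQueuedJobs()
		return nil
	default: // queue full: apply the configured policy
		switch pool.config.QueuePolicy {
		case "drop":
			pool.metrics.IncDroppedJobs()
			return errors.New("queue full, job dropped")
		case "block":
			select {
			case pool.jobQueue <- job:
				pool.metrics.IncQueuedJobs()
				return nil
			case <-ctx.Done():
				pool.metrics.IncTimeoutJobs()
				return errors.New("queue full, timeout waiting")
			}
		case "error":
			pool.metrics.IncRejectedJobs()
			return errors.New("queue full, job rejected")
		default:
			return errors.New("unknown queue policy")
		}
	}
}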
```
---
## Memory Pool Concurrency
### **Thread-Safe Object Pools**
```mermaid
graph TD
subgraph "sync.Pool Lifecycle"
direction LR
GetObject[Get Object<br/>sync.Pool.Get]
PoolCheck{Is Pool Empty?}
NewObject[New Object<br/>Factory Function]
UseObject[Use Object<br/>Application Logic]
ResetObject[Reset Object<br/>Clear State]
ReturnObject[Return Object<br/>sync.Pool.Put]
GetObject --> PoolCheck
PoolCheck -- Yes --> NewObject
PoolCheck -- No --> UseObject
NewObject --> UseObject
UseObject --> ResetObject
ResetObject --> ReturnObject
ReturnObject --> GetObject
end
subgraph "GC Interaction"
direction TB
GCRun[GC Runs]
PoolCleanup[Pool Cleanup<br/>Removes idle objects]
GCRun --> PoolCleanup
end
```
**Thread-Safe Pool Architecture:**
Bifrost's memory pool system ensures thread-safe object reuse across multiple goroutines:
**Pool Structure Design:**
- **Multiple Pool Types** - Separate pools for channels, messages, responses, and buffers
- **Factory Functions** - Dynamic object creation when pools are empty
- **Statistics Tracking** - Comprehensive metrics for pool performance monitoring
- **Thread Safety** - Synchronized access using Go's sync.Pool and read-write mutexes
**Object Lifecycle Management:**
- **Pool Initialization** - Factory functions define object creation patterns
- **Unique Identification** - Each pooled object gets a unique ID for tracking
- **Timestamp Tracking** - Creation, acquisition, and return times are recorded
- **Reusability Flags** - Objects can be marked as non-reusable for single-use scenarios
**Acquisition Strategy:**
- **Request Tracking** - All pool requests are counted for monitoring
- **Hit/Miss Tracking** - Pool effectiveness is measured through hit ratios
- **Fallback Creation** - New objects are created when pools are empty
- **Performance Metrics** - Acquisition times and patterns are monitored
**Return and Reset Process:**
- **State Validation** - Only reusable objects are returned to pools
- **Object Reset** - All object state is cleared before returning to pool
- **Return Tracking** - Return operations are counted and timed
- **Pool Replenishment** - Returned objects become available for reuse
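
**Illustrative Sketch:**
The acquire, reset, and return cycle above maps directly onto `sync.Pool`. The sketch below is an assumption-laden example, not Bifrost's pool code: `BufferPool` and its methods are hypothetical, and it pools `bytes.Buffer` values as a stand-in for messages or responses. A miss is counted whenever the factory function runs.
```go
package main

import (
	"bytes"
	"fmt"
	"sync"
	"sync/atomic"
)

// BufferPool is a hypothetical wrapper around sync.Pool that counts how
// often Get had to fall back to the factory function.
type BufferPool struct {
	pool   sync.Pool
	gets   atomic.Int64
	misses atomic.Int64
}

func NewBufferPool() *BufferPool {
	bp := &BufferPool{}
	bp.pool.New = func() any {
		bp.misses.Add(1) // factory ran: nothing reusable was available
		return new(bytes.Buffer)
	}
	return bp
}

// Get returns a buffer, reusing a pooled one when possible.
func (bp *BufferPool) Get() *bytes.Buffer {
	bp.gets.Add(1)
	return bp.pool.Get().(*bytes.Buffer)
}

// Put clears the buffer's state before returning it to the pool.
func (bp *BufferPool) Put(b *bytes.Buffer) {
	b.Reset()
	bp.pool.Put(b)
}

// Stats reports total requests and factory fallbacks.
func (bp *BufferPool) Stats() (gets, misses int64) {
	return bp.gets.Load(), bp.misses.Load()
}

func main() {
	bp := NewBufferPool()
	buf := bp.Get()
	buf.WriteString("request payload")
	bp.Put(buf)

	// sync.Pool does not guarantee that the same object comes back,
	// but whatever is returned must start empty.
	again := bp.Get()
	fmt.Println("length after reuse:", again.Len())
	gets, misses := bp.Stats()
	fmt.Println("gets:", gets, "misses:", misses)
}
```
Resetting inside `Put` rather than inside `Get` means stale request data never sits in the pool between uses.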
### **Pool Performance Monitoring**
Comprehensive metrics provide insights into pool efficiency and system health:
**Usage Statistics Collection:**
- **Request Counting** - Track total pool requests by object type
- **Creation Tracking** - Monitor new object allocations when pools are empty
- **Hit/Miss Ratios** - Measure pool effectiveness through reuse rates
- **Return Monitoring** - Track successful object returns to pools
**Performance Metrics Analysis:**
- **Acquisition Times** - Measure how long it takes to get objects from pools
- **Reset Performance** - Track time spent cleaning objects for reuse
- **Hit Ratio Calculation** - Determine percentage of requests served from pools
- **Memory Efficiency** - Calculate memory savings from object reuse
**Key Performance Indicators:**
- **Channel Pool Hit Ratio** - Typically 85-95% in steady state
- **Message Pool Efficiency** - Usually 80-90% reuse rate
- **Response Pool Utilization** - Often 70-85% hit ratio
- **Total Memory Savings** - Measured reduction in garbage collection pressure
**Monitoring Integration:**
- **Thread-Safe Access** - All metrics collection is synchronized
- **Real-Time Updates** - Statistics are updated with each pool operation
- **Export Capability** - Metrics are available in JSON format for monitoring systems
- **Alerting Support** - Low hit ratios can trigger performance alerts
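
**Illustrative Sketch:**
The hit ratio and JSON export described above reduce to a small calculation: requests served from the pool divided by total requests. The following sketch assumes a hypothetical `PoolStats` type and field names; the threshold passed to `BelowThreshold` is an example value, not a Bifrost default.
```go
package main

import (
	"encoding/json"
	"fmt"
)

// PoolStats is a hypothetical snapshot of one pool's counters.
type PoolStats struct {
	Name     string  `json:"name"`
	Requests int64   `json:"requests"`
	Misses   int64   `json:"misses"`
	HitRatio float64 `json:"hit_ratio"`
}

// NewPoolStats derives the hit ratio: requests served from the pool
// divided by total requests. An unused pool reports 0.
func NewPoolStats(name string, requests, misses int64) PoolStats {
	s := PoolStats{Name: name, Requests: requests, Misses: misses}
	if requests > 0 {
		s.HitRatio = float64(requests-misses) / float64(requests)
	}
	return s
}

// BelowThreshold reports whether the hit ratio should raise an alert.
// Pools with no traffic never alert.
func (s PoolStats) BelowThreshold(min float64) bool {
	return s.Requests > 0 && s.HitRatio < min
}

func main() {
	s := NewPoolStats("channel", 1000, 100)
	out, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	fmt.Println("alert:", s.BelowThreshold(0.85))
}
```
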
---
## Goroutine Management
### **Goroutine Lifecycle Patterns**
```mermaid
stateDiagram-v2
[*] --> Created: go routine()
Created --> Running: Execute Function
Running --> Waiting: Channel/Mutex Block
Waiting --> Running: Unblocked
Running --> Syscall: Network I/O
Syscall --> Running: I/O Complete
Running --> GCAssist: GC Triggered
GCAssist --> Running: GC Complete
Running --> Terminated: Function Exit
Terminated --> [*]: Cleanup
```
**Goroutine Pool Management Strategy:**
Bifrost's goroutine management ensures optimal resource usage while preventing goroutine leaks:
**Pool Configuration Management:**
- **Goroutine Limits** - Maximum concurrent goroutines prevent resource exhaustion
- **Active Counting** - Atomic counters track currently running goroutines
- **Idle Timeouts** - Unused goroutines are cleaned up after configured periods
- **Resource Boundaries** - Hard limits prevent runaway goroutine creation
**Lifecycle Orchestration:**
- **Spawn Channels** - New goroutine creation is tracked through channels
- **Completion Monitoring** - Finished goroutines signal completion for cleanup
- **Shutdown Coordination** - Graceful shutdown ensures all goroutines complete properly
- **Health Monitoring** - Continuous monitoring tracks goroutine health and performance
**Worker Creation Process:**
- **Limit Enforcement** - Creation fails when maximum goroutine count is reached
- **Unique Identification** - Each goroutine gets a unique ID for tracking and debugging
- **Lifecycle Tracking** - Start times and names enable performance analysis
- **Atomic Operations** - Thread-safe counters prevent race conditions
**Panic Recovery and Error Handling:**
- **Panic Isolation** - Goroutine panics don't crash the entire system
- **Error Logging** - Panic details are logged with goroutine context
- **Metrics Updates** - Panic counts are tracked for monitoring and alerting
- **Resource Cleanup** - Failed goroutines are properly cleaned up and counted
**Health Monitoring System:**
- **Periodic Health Checks** - Regular intervals check goroutine pool health
- **Completion Tracking** - Finished goroutines are recorded for performance analysis
- **Shutdown Handling** - Clean shutdown process ensures no goroutine leaks
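
**Illustrative Sketch:**
Limit enforcement, atomic counting, and panic isolation can be shown together in a few lines. The sketch below is hypothetical (`Limiter`, `NewLimiter`, `Go`, and `ErrLimitReached` are names chosen for this example) and it omits the idle timeouts and health checks described above.
```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

var ErrLimitReached = errors.New("goroutine limit reached")

// Limiter is a hypothetical guard that caps concurrently running goroutines
// and turns panics into counted events instead of process crashes.
type Limiter struct {
	limit  int64
	active atomic.Int64
	panics atomic.Int64
	wg     sync.WaitGroup
}

func NewLimiter(limit int64) *Limiter { return &Limiter{limit: limit} }

// Go starts fn in a new goroutine unless the limit is already reached.
func (l *Limiter) Go(fn func()) error {
	if l.active.Add(1) > l.limit {
		l.active.Add(-1) // roll back the reservation
		return ErrLimitReached
	}
	l.wg.Add(1)
	go func() {
		// Deferred calls run last-in, first-out: recover, then release the slot.
		defer l.wg.Done()
		defer l.active.Add(-1)
		defer func() {
			if r := recover(); r != nil {
				l.panics.Add(1)
			}
		}()
		fn()
	}()
	return nil
}

// Wait blocks until every started goroutine has returned.
func (l *Limiter) Wait() { l.wg.Wait() }

// Panics returns how many goroutines ended in a recovered panic.
func (l *Limiter) Panics() int64 { return l.panics.Load() }

func main() {
	l := NewLimiter(2)
	release := make(chan struct{})
	_ = l.Go(func() { <-release })
	_ = l.Go(func() { <-release })
	fmt.Println(l.Go(func() {})) // both slots are in use

	close(release)
	l.Wait()

	_ = l.Go(func() { panic("boom") })
	l.Wait()
	fmt.Println("recovered panics:", l.Panics())
}
```
Reserving the slot with `Add(1)` before spawning, and rolling back on failure, avoids the check-then-increment race that a separate `Load` followed by `Add` would have.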
### **Resource Leak Prevention**
```mermaid
flowchart TD
GoroutineStart[Goroutine Start] --> ResourceCheck[Resource Allocation Check]
ResourceCheck --> Timeout[Set Timeout Context]
Timeout --> Work[Execute Work]
Work --> Complete{Work Complete?}
Complete -->|Yes| Cleanup[Cleanup Resources]
Complete -->|No| TimeoutCheck{Timeout?}
TimeoutCheck -->|Yes| ForceCleanup[Force Cleanup]
TimeoutCheck -->|No| Work
Cleanup --> Return[Return Resources to Pool]
ForceCleanup --> Return
Return --> End[Goroutine End]
```
**Resource Leak Prevention:**
```go
func (worker *Worker) ExecuteWithCleanup(job *Job) {
	// Set timeout context
	ctx, cancel := context.WithTimeout(
		context.Background(),
		worker.config.ProcessTimeout,
	)
	defer cancel()

	// Acquire resources with timeout
	resources, err := worker.acquireResources(ctx)
	if err != nil {
		job.resultChan <- &Result{Error: err}
		return
	}

	// Ensure cleanup happens
	defer func() {
		// Always return resources
		worker.returnResources(resources)

		// Handle panics
		if r := recover(); r != nil {
			worker.metrics.IncPanics()
			job.resultChan <- &Result{
				Error: fmt.Errorf("worker panic: %v", r),
			}
		}
	}()

	// Execute job with context
	result := worker.processJob(ctx, job, resources)

	// Return result
	select {
	case job.resultChan <- result:
		// Success
	case <-ctx.Done():
		// Timeout: nobody received the result in time, so it is dropped.
		// Note that select does not protect against a closed resultChan;
		// sending on a closed channel panics.
		worker.metrics.IncTimeouts()
	}
}
```
---
## Concurrency Optimization Strategies
### **Load-Based Worker Scaling** (Planned)
```mermaid
graph TB
subgraph "Load Monitoring"
QueueDepth[Queue Depth<br/>Monitoring]
ResponseTime[Response Time<br/>Tracking]
WorkerUtil[Worker Utilization<br/>Metrics]
end
subgraph "Scaling Decisions"
ScaleUp{Scale Up?<br/>Load > 80%}
ScaleDown{Scale Down?<br/>Load < 30%}
Maintain[Maintain<br/>Current Size]
end
subgraph "Actions"
AddWorkers[Spawn Additional<br/>Workers]
RemoveWorkers[Graceful Worker<br/>Shutdown]
NoAction[No Action<br/>Monitor Continue]
end
QueueDepth --> ScaleUp
ResponseTime --> ScaleUp
WorkerUtil --> ScaleDown
ScaleUp -->|Yes| AddWorkers
ScaleUp -->|No| ScaleDown
ScaleDown -->|Yes| RemoveWorkers
ScaleDown -->|No| Maintain
Maintain --> NoAction
```
**Adaptive Scaling Implementation:**
```go
type AdaptiveScaler struct {
	pool          *ProviderWorkerPool
	config        ScalingConfig
	metrics       *ScalingMetrics
	lastScaleTime time.Time
	scalingMutex  sync.Mutex
}

func (scaler *AdaptiveScaler) EvaluateScaling() {
	scaler.scalingMutex.Lock()
	defer scaler.scalingMutex.Unlock()

	// Prevent frequent scaling
	if time.Since(scaler.lastScaleTime) < scaler.config.MinScaleInterval {
		return
	}

	current := scaler.getCurrentMetrics()

	// Scale up conditions
	if current.QueueUtilization > scaler.config.ScaleUpThreshold ||
		current.AvgResponseTime > scaler.config.MaxResponseTime {
		scaler.scaleUp(current)
		return
	}

	// Scale down conditions
	if current.QueueUtilization < scaler.config.ScaleDownThreshold &&
		current.AvgResponseTime < scaler.config.TargetResponseTime {
		scaler.scaleDown(current)
		return
	}
}

func (scaler *AdaptiveScaler) scaleUp(metrics *CurrentMetrics) {
	currentWorkers := scaler.pool.GetWorkerCount()
	targetWorkers := int(float64(currentWorkers) * scaler.config.ScaleUpFactor)

	// Respect maximum limits
	if targetWorkers > scaler.config.MaxWorkers {
		targetWorkers = scaler.config.MaxWorkers
	}

	additionalWorkers := targetWorkers - currentWorkers
	if additionalWorkers > 0 {
		scaler.pool.AddWorkers(additionalWorkers)
		scaler.lastScaleTime = time.Now()
		scaler.metrics.RecordScaleUp(additionalWorkers)
	}
}
```
### **Provider-Specific Optimization**
```go
type ProviderOptimization struct {
	// Provider characteristics
	ProviderName string        `json:"provider_name"`
	RateLimit    int           `json:"rate_limit"`  // Requests per second
	AvgLatency   time.Duration `json:"avg_latency"` // Average response time
	ErrorRate    float64       `json:"error_rate"`  // Historical error rate

	// Optimal configuration
	OptimalWorkers int           `json:"optimal_workers"`
	OptimalBuffer  int           `json:"optimal_buffer"`
	TimeoutConfig  time.Duration `json:"timeout_config"`
	RetryStrategy  RetryConfig   `json:"retry_strategy"`
}

func CalculateOptimalConcurrency(provider ProviderOptimization) ConcurrencyConfig {
	// Calculate based on rate limits and latency (Little's law):
	// in-flight requests = rate (req/s) x latency (s).
	// Multiply as floats so a sub-second latency is not truncated to zero.
	workers := float64(provider.RateLimit) * provider.AvgLatency.Seconds()

	// Adjust for error rate (more workers for higher error rate)
	workers *= 1.0 + provider.ErrorRate

	// Round up and keep at least one worker (uses the "math" package)
	optimalWorkers := int(math.Ceil(workers))
	if optimalWorkers < 1 {
		optimalWorkers = 1
	}

	// Buffer should be 2-3x worker count for smooth operation
	optimalBuffer := optimalWorkers * 3

	return ConcurrencyConfig{
		Concurrency: optimalWorkers,
		BufferSize:  optimalBuffer,
		Timeout:     provider.AvgLatency * 2, // 2x avg latency for timeout
	}
}
```
---
## Concurrency Monitoring & Metrics
### **Key Concurrency Metrics**
```mermaid
graph TB
subgraph "Worker Metrics"
ActiveWorkers[Active Workers<br/>Current Count]
IdleWorkers[Idle Workers<br/>Available Count]
BusyWorkers[Busy Workers<br/>Processing Count]
end
subgraph "Queue Metrics"
QueueDepth[Queue Depth<br/>Pending Jobs]
QueueThroughput[Queue Throughput<br/>Jobs/Second]
QueueWaitTime[Queue Wait Time<br/>Average Delay]
end
subgraph "Performance Metrics"
GoroutineCount[Goroutine Count<br/>Total Active]
MemoryUsage[Memory Usage<br/>Pool Utilization]
GCPressure[GC Pressure<br/>Collection Frequency]
end
subgraph "Health Metrics"
ErrorRate[Error Rate<br/>Failed Jobs %]
PanicCount[Panic Count<br/>Crashed Goroutines]
DeadlockDetection[Deadlock Detection<br/>Blocked Operations]
end
```
**Metrics Collection Strategy:**
Comprehensive concurrency monitoring provides operational insights and performance optimization data:
**Worker Pool Monitoring:**
- **Total Worker Tracking** - Monitor configured vs actual worker counts
- **Active Worker Monitoring** - Track workers currently processing requests
- **Idle Worker Analysis** - Identify unused capacity and optimization opportunities
- **Queue Depth Monitoring** - Track pending job backlog and processing delays
**Performance Data Collection:**
- **Throughput Metrics** - Measure jobs processed per second across all pools
- **Wait Time Analysis** - Track how long jobs wait in queues before processing
- **Memory Pool Performance** - Monitor hit/miss ratios for memory pool effectiveness
- **Goroutine Count Tracking** - Ensure goroutine counts remain within healthy limits
**Health and Reliability Metrics:**
- **Panic Recovery Tracking** - Count and analyze worker panic occurrences
- **Timeout Monitoring** - Track jobs that exceed processing time limits
- **Circuit Breaker Events** - Monitor provider isolation events and recoveries
- **Error Rate Analysis** - Track failure patterns for capacity planning
**Real-Time Updates:**
- **Live Metric Updates** - Worker metrics are updated continuously during operation
- **Processing Event Recording** - Each job completion updates relevant metrics
- **Performance Correlation** - Queue times and processing times are correlated for analysis
- **Success/Failure Tracking** - All job outcomes are recorded for reliability analysis
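
**Illustrative Sketch:**
The worker and reliability metrics above can be collected with a few atomic counters updated around each job. This is a minimal sketch under assumptions: `WorkerMetrics`, `Track`, `Idle`, `ErrorRate`, and `ErrSample` are hypothetical names, and queue depth, wait times, and circuit-breaker events are left out.
```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// ErrSample is a placeholder failure used by the example below.
var ErrSample = errors.New("provider call failed")

// WorkerMetrics is a hypothetical set of counters for one worker pool.
type WorkerMetrics struct {
	total     int64 // configured worker count
	busy      atomic.Int64
	succeeded atomic.Int64
	failed    atomic.Int64
}

func NewWorkerMetrics(total int64) *WorkerMetrics {
	return &WorkerMetrics{total: total}
}

// Track runs one job and records its outcome; the busy gauge is raised
// only while the job is executing.
func (m *WorkerMetrics) Track(job func() error) error {
	m.busy.Add(1)
	defer m.busy.Add(-1)
	err := job()
	if err != nil {
		m.failed.Add(1)
	} else {
		m.succeeded.Add(1)
	}
	return err
}

// Idle returns configured workers minus those currently processing.
func (m *WorkerMetrics) Idle() int64 { return m.total - m.busy.Load() }

// ErrorRate returns failed jobs as a fraction of finished jobs.
func (m *WorkerMetrics) ErrorRate() float64 {
	failed := m.failed.Load()
	done := m.succeeded.Load() + failed
	if done == 0 {
		return 0
	}
	return float64(failed) / float64(done)
}

func main() {
	m := NewWorkerMetrics(4)
	_ = m.Track(func() error { return nil })
	_ = m.Track(func() error { return ErrSample })
	fmt.Println("idle:", m.Idle(), "error rate:", m.ErrorRate())
}
```
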
---
## Deadlock Prevention & Detection
### **Deadlock Prevention Strategies**
```mermaid
flowchart TD
Strategy1[Lock Ordering<br/>Consistent Acquisition]
Strategy2[Timeout-Based Locks<br/>Context Cancellation]
Strategy3[Channel Select<br/>Non-blocking Operations]
Strategy4[Resource Hierarchy<br/>Layered Locking]
Prevention[Deadlock Prevention<br/>Design Patterns]
Prevention --> Strategy1
Prevention --> Strategy2
Prevention --> Strategy3
Prevention --> Strategy4
Strategy1 --> Success[No Deadlocks<br/>Guaranteed Order]
Strategy2 --> Success
Strategy3 --> Success
Strategy4 --> Success
```
**Deadlock Prevention Implementation Strategy:**
Bifrost employs multiple complementary strategies to prevent deadlocks in concurrent operations:
**Lock Ordering Management:**
- **Consistent Acquisition Order** - All locks are acquired in a predetermined order
- **Global Lock Registry** - Centralized registry maintains lock ordering relationships
- **Order Enforcement** - Lock acquisition automatically sorts by predetermined order
- **Dependency Tracking** - Lock dependencies are mapped to prevent circular waits
**Timeout-Based Protection:**
- **Default Timeouts** - All lock acquisitions have reasonable timeout limits
- **Context Cancellation** - Operations respect context cancellation for cleanup
- **Maximum Timeout Limits** - Upper bounds prevent indefinite blocking
- **Graceful Timeout Handling** - Timeout errors provide meaningful context
**Multi-Lock Acquisition Process:**
- **Ordered Sorting** - Multiple locks are sorted before acquisition attempts
- **Progressive Acquisition** - Locks are acquired one by one in sorted order
- **Failure Recovery** - Failed acquisitions trigger automatic cleanup of held locks
- **Resource Tracking** - All acquired locks are tracked for proper release
**Lock Acquisition Safety:**
- **Non-Blocking Detection** - Channel-based lock attempts prevent indefinite blocking
- **Timeout Enforcement** - All lock attempts respect configured timeout limits
- **Error Propagation** - Lock failures are properly propagated with context
- **Cleanup Guarantees** - Failed operations always clean up partially acquired resources
**Deadlock Detection and Recovery:**
- **Active Monitoring** - Continuous monitoring for potential deadlock conditions
- **Automatic Recovery** - Detected deadlocks trigger automatic resolution procedures
- **Resource Release** - Deadlock resolution involves strategic resource release
- **Prevention Learning** - Deadlock patterns inform prevention strategy improvements
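
**Illustrative Sketch:**
Ordered acquisition, timeouts, and cleanup of partially held locks can be sketched with channel-backed locks. Everything here is hypothetical (`TimedLock`, `NewTimedLock`, and `AcquireAll` are names invented for this example), and the numeric ID stands in for whatever ordering a lock registry would define.
```go
package main

import (
	"context"
	"fmt"
	"sort"
	"time"
)

// TimedLock is a hypothetical channel-backed lock: taking the single token
// out of the channel means holding the lock.
type TimedLock struct {
	ID    int
	token chan struct{}
}

func NewTimedLock(id int) *TimedLock {
	l := &TimedLock{ID: id, token: make(chan struct{}, 1)}
	l.token <- struct{}{}
	return l
}

// Acquire waits for the lock until the context is done.
func (l *TimedLock) Acquire(ctx context.Context) error {
	select {
	case <-l.token:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// Release puts the token back; releasing an unheld lock is a no-op.
func (l *TimedLock) Release() {
	select {
	case l.token <- struct{}{}:
	default:
	}
}

// AcquireAll takes every lock in ascending ID order within the timeout and
// releases whatever it already holds if any acquisition fails.
func AcquireAll(timeout time.Duration, locks ...*TimedLock) (func(), error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	ordered := append([]*TimedLock(nil), locks...)
	sort.Slice(ordered, func(i, j int) bool { return ordered[i].ID < ordered[j].ID })

	held := make([]*TimedLock, 0, len(ordered))
	releaseHeld := func() {
		for i := len(held) - 1; i >= 0; i-- {
			held[i].Release()
		}
	}
	for _, l := range ordered {
		if err := l.Acquire(ctx); err != nil {
			releaseHeld()
			return nil, fmt.Errorf("lock %d: %w", l.ID, err)
		}
		held = append(held, l)
	}
	return releaseHeld, nil
}

func main() {
	a, b := NewTimedLock(1), NewTimedLock(2)
	release, err := AcquireAll(time.Second, b, a) // acquired as a, then b
	fmt.Println("first attempt:", err)

	_, err = AcquireAll(20*time.Millisecond, a) // a is held, so this times out
	fmt.Println("second attempt:", err)
	release()
}
```
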
---
## Related Architecture Documentation
- **[Request Flow](./request-flow)** - How concurrency fits in request processing
- **[Benchmarks](../../benchmarking/getting-started)** - Concurrency performance characteristics
- **[Plugin System](./plugins)** - Plugin concurrency considerations
- **[MCP System](./mcp)** - MCP concurrency and worker integration
## Usage Documentation
- **[Provider Configuration](../../quickstart/gateway/provider-configuration)** - Configure concurrency settings per provider
- **[Performance Analysis](../../benchmarking/getting-started)** - Memory pool configuration and optimization
- **[Performance Monitoring](../../features/telemetry)** - Monitor concurrency metrics and health
- **[Go SDK Usage](../../quickstart/go-sdk/setting-up)** - Use Bifrost concurrency in Go applications
- **[Gateway Setup](../../quickstart/gateway/setting-up)** - Deploy Bifrost with optimal concurrency settings
---
**🎯 Next Step:** Understand how plugins integrate with the concurrency model in **[Plugin System](./plugins)**.

@@ -0,0 +1,985 @@
---
title: "Model Context Protocol (MCP)"
description: "Deep dive into Bifrost's Model Context Protocol (MCP) integration - how external tool discovery, execution, and integration work internally."
icon: "toolbox"
---
## MCP Architecture Overview
### **What is MCP in Bifrost?**
The Model Context Protocol (MCP) system in Bifrost enables AI models to seamlessly discover and execute external tools, transforming static chat models into dynamic, action-capable agents. This architecture bridges the gap between AI reasoning and real-world tool execution.
**Core MCP Principles:**
- **Dynamic Discovery** - Tools are discovered at runtime, not hardcoded
- **Client-Side Execution** - Bifrost controls all tool execution for security
- **Multi-Protocol Support** - InProcess, STDIO, HTTP, and SSE connection types
- **Request-Level Filtering** - Granular control over tool availability
- **Async Execution** - Non-blocking tool invocation and response handling
### **MCP System Components**
```mermaid
graph TB
subgraph "MCP Management Layer"
MCPMgr[MCP Manager<br/>Central Controller]
ClientRegistry[Client Registry<br/>Connection Management]
ToolDiscovery[Tool Discovery<br/>Runtime Registration]
end
subgraph "MCP Execution Layer"
ToolFilter[Tool Filter<br/>Access Control]
ToolExecutor[Tool Executor<br/>Invocation Engine]
ResultProcessor[Result Processor<br/>Response Handling]
end
subgraph "Connection Types"
STDIOConn[STDIO Connections<br/>Command-line Tools]
HTTPConn[HTTP Connections<br/>Web Services]
SSEConn[SSE Connections<br/>Real-time Streams]
end
subgraph "External MCP Servers"
FileSystem[Filesystem Tools<br/>File Operations]
WebSearch[Web Search<br/>Information Retrieval]
Database[Database Tools<br/>Data Access]
Custom[Custom Tools<br/>Business Logic]
end
MCPMgr --> ClientRegistry
ClientRegistry --> ToolDiscovery
ToolDiscovery --> ToolFilter
ToolFilter --> ToolExecutor
ToolExecutor --> ResultProcessor
ClientRegistry --> STDIOConn
ClientRegistry --> HTTPConn
ClientRegistry --> SSEConn
STDIOConn --> FileSystem
HTTPConn --> WebSearch
HTTPConn --> Database
STDIOConn --> Custom
```
---
## MCP Connection Architecture
### **Multi-Protocol Connection System**
Bifrost supports four MCP connection types, each optimized for different tool deployment patterns:
```mermaid
graph TB
subgraph "InProcess Connections"
InProcess[In-Memory Tools<br/>Same Process]
InProcessEx[Examples:<br/>• Embedded tools<br/>• High-perf operations<br/>• Testing tools]
end
subgraph "STDIO Connections"
STDIO[Command Line Tools<br/>Local Execution]
STDIOEx[Examples:<br/>• Filesystem tools<br/>• Local scripts<br/>• CLI utilities]
end
subgraph "HTTP Connections"
HTTP[Web Service Tools<br/>Remote APIs]
HTTPEx[Examples:<br/>• Web search APIs<br/>• Database services<br/>• External integrations]
end
subgraph "SSE Connections"
SSE[Real-time Tools<br/>Streaming Data]
SSEEx[Examples:<br/>• Live data feeds<br/>• Real-time monitoring<br/>• Event streams]
end
subgraph "Connection Characteristics"
Latency[Latency:<br/>InProcess < STDIO < HTTP < SSE]
Security[Security:<br/>InProcess/Local > HTTP > SSE]
Scalability[Scalability:<br/>HTTP > SSE > STDIO > InProcess]
Complexity[Complexity:<br/>InProcess < STDIO < HTTP < SSE]
end
InProcess --> Latency
STDIO --> Latency
HTTP --> Security
SSE --> Scalability
HTTP --> Complexity
```
### **Connection Type Details**
**InProcess Connections (In-Memory Tools):**
- **Use Case:** Embedded tools, high-performance operations, testing
- **Performance:** Lowest possible latency (~0.1ms) with no IPC overhead
- **Security:** Highest security as tools run in the same process
- **Limitations:** Go package only, cannot be configured via JSON
**STDIO Connections (Local Tools):**
- **Use Case:** Command-line tools, local scripts, filesystem operations
- **Performance:** Low latency (~1-10ms) due to local execution
- **Security:** High security with full local control
- **Limitations:** Single-server deployment, resource sharing
**HTTP Connections (Remote Services):**
- **Use Case:** Web APIs, microservices, cloud functions
- **Performance:** Network-dependent latency (~10-500ms)
- **Security:** Configurable with authentication and encryption
- **Advantages:** Scalable, multi-server deployment, service isolation
**SSE Connections (Streaming Tools):**
- **Use Case:** Real-time data feeds, live monitoring, event streams
- **Performance:** Variable latency depending on stream frequency
- **Security:** Similar to HTTP with streaming capabilities
- **Benefits:** Real-time updates, persistent connections, event-driven
> **MCP Configuration:** [MCP Setup Guide →](../../mcp/overview)
---
## Tool Discovery & Registration
### **Dynamic Tool Discovery Process**
The MCP system discovers tools at runtime rather than requiring static configuration, enabling flexible and adaptive tool availability:
```mermaid
sequenceDiagram
participant Bifrost
participant MCPManager
participant MCPServer
participant ToolRegistry
participant AIModel
Note over Bifrost: System Startup
Bifrost->>MCPManager: Initialize MCP System
MCPManager->>MCPServer: Establish Connection
MCPServer-->>MCPManager: Connection Ready
MCPManager->>MCPServer: List Available Tools
MCPServer-->>MCPManager: Tool Definitions
MCPManager->>ToolRegistry: Register Tools
Note over Bifrost: Runtime Request Processing
AIModel->>MCPManager: Request Available Tools
MCPManager->>ToolRegistry: Query Tools
ToolRegistry-->>MCPManager: Filtered Tool List
MCPManager-->>AIModel: Available Tools
AIModel->>MCPManager: Execute Tool Call
MCPManager->>MCPServer: Tool Invocation
MCPServer->>MCPServer: Execute Tool Logic
MCPServer-->>MCPManager: Tool Result
MCPManager-->>AIModel: Enhanced Response
```
### **Tool Registry Management**
**Registration Process:**
1. **Connection Establishment** - MCP client connects to configured servers
2. **Capability Exchange** - Server announces available tools and schemas
3. **Tool Validation** - Bifrost validates tool definitions and security
4. **Registry Update** - Tools are registered in the internal tool registry
5. **Availability Notification** - Tools become available for AI model use
**Registry Features:**
- **Dynamic Updates** - Tools can be added/removed during runtime
- **Version Management** - Support for tool versioning and compatibility
- **Access Control** - Request-level tool filtering and permissions
- **Health Monitoring** - Continuous tool availability checking
**Tool Metadata Structure:**
- **Name & Description** - Human-readable tool identification
- **Parameters Schema** - JSON schema for tool input validation
- **Return Schema** - Expected response format definition
- **Capabilities** - Tool feature flags and limitations
- **Authentication** - Required credentials and permissions
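
**Illustrative Sketch:**
The registration steps and metadata fields above suggest a registry along the following lines. This is a sketch under assumptions rather than Bifrost's actual schema: the `Tool` field names, the `Registry` type, and its methods are hypothetical. The `client-tool` key format follows the tool names used in Bifrost's request headers (for example `filesystem-read_file`).
```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// Tool mirrors the metadata fields listed above; the field names are
// assumptions for illustration.
type Tool struct {
	Name         string
	Description  string
	ParamsSchema map[string]any // JSON schema for tool input validation
	ReturnSchema map[string]any // expected response format
	Capabilities []string
	RequiresAuth bool
}

// Registry is a hypothetical concurrency-safe tool registry keyed by
// "client-tool".
type Registry struct {
	mu    sync.RWMutex
	tools map[string]Tool
}

func NewRegistry() *Registry {
	return &Registry{tools: make(map[string]Tool)}
}

// Register validates a tool definition and stores it under "client-name".
func (r *Registry) Register(client string, t Tool) error {
	if client == "" || t.Name == "" {
		return errors.New("client and tool name are required")
	}
	if t.ParamsSchema == nil {
		return errors.New("tool is missing a parameters schema")
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	r.tools[client+"-"+t.Name] = t
	return nil
}

// Remove deletes a tool at runtime, for example when its server disconnects.
func (r *Registry) Remove(key string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.tools, key)
}

// List returns the registered tool keys in sorted order.
func (r *Registry) List() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	keys := make([]string, 0, len(r.tools))
	for k := range r.tools {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	reg := NewRegistry()
	err := reg.Register("filesystem", Tool{
		Name:         "read_file",
		Description:  "Read a file from disk",
		ParamsSchema: map[string]any{"type": "object"},
	})
	fmt.Println(err, reg.List())
}
```
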
---
## Tool Filtering & Access Control
### **Multi-Level Filtering System**
Bifrost controls tool availability through layered filters: a global MCP switch, client-level include and exclude lists, and tool-level include and exclude lists:
```mermaid
flowchart TD
Request[Incoming Request] --> GlobalFilter{Global MCP Filter}
GlobalFilter -->|Enabled| ClientFilter[MCP Client Filtering]
GlobalFilter -->|Disabled| NoMCP[No MCP Tools]
ClientFilter --> IncludeClients{Include Clients?}
IncludeClients -->|Yes| IncludeList[Include Specified<br/>MCP Clients]
IncludeClients -->|No| AllClients[All MCP Clients]
IncludeList --> ExcludeClients{Exclude Clients?}
AllClients --> ExcludeClients
ExcludeClients -->|Yes| RemoveClients[Remove Excluded<br/>MCP Clients]
ExcludeClients -->|No| ClientsFiltered[Filtered Clients]
RemoveClients --> ToolFilter[Tool-Level Filtering]
ClientsFiltered --> ToolFilter
ToolFilter --> IncludeTools{Include Tools?}
IncludeTools -->|Yes| IncludeSpecific[Include Specified<br/>Tools Only]
IncludeTools -->|No| AllTools[All Available Tools]
IncludeSpecific --> ExcludeTools{Exclude Tools?}
AllTools --> ExcludeTools
ExcludeTools -->|Yes| RemoveTools[Remove Excluded<br/>Tools]
ExcludeTools -->|No| FinalTools[Final Tool Set]
RemoveTools --> FinalTools
FinalTools --> AIModel[Available to AI Model]
NoMCP --> AIModel
```
### **Filtering Configuration Levels**
**Request-Level Filtering:**
```bash
# Include only specific MCP clients
curl -X POST http://localhost:8080/v1/chat/completions \
-H "x-bf-mcp-include-clients: filesystem,websearch" \
-d '{"model": "gpt-4o-mini", "messages": [...]}'
# Include only specific tools
curl -X POST http://localhost:8080/v1/chat/completions \
-H "x-bf-mcp-include-tools: filesystem-read_file,websearch-search" \
-d '{"model": "gpt-4o-mini", "messages": [...]}'
```
**Configuration-Level Filtering:**
- **Client Selection** - Choose which MCP servers to connect to
- **Tool Blacklisting** - Permanently disable dangerous or unwanted tools
- **Permission Mapping** - Map user roles to available tool sets
- **Environment-Based** - Different tool sets for development vs production
**Security Benefits:**
- **Principle of Least Privilege** - Only necessary tools are exposed
- **Dynamic Access Control** - Per-request tool availability
- **Audit Trail** - Track which tools are used by which requests
- **Risk Mitigation** - Prevent access to dangerous operations
> **📖 Tool Filtering:** [MCP Tool Control →](../../mcp/filtering)
---
## Tool Execution Engine
### **Async Tool Execution Architecture**
The MCP execution engine handles tool invocation asynchronously to maintain system responsiveness and enable complex multi-tool workflows:
```mermaid
sequenceDiagram
participant AIModel
participant ExecutionEngine
participant ToolInvoker
participant MCPServer
participant ResultProcessor
AIModel->>ExecutionEngine: Tool Call Request
ExecutionEngine->>ExecutionEngine: Validate Tool Call
ExecutionEngine->>ToolInvoker: Queue Tool Execution
Note over ToolInvoker: Async Tool Execution
ToolInvoker->>MCPServer: Invoke Tool
MCPServer->>MCPServer: Execute Tool Logic
MCPServer-->>ToolInvoker: Raw Tool Result
ToolInvoker->>ResultProcessor: Process Result
ResultProcessor->>ResultProcessor: Format & Validate
ResultProcessor-->>ExecutionEngine: Processed Result
ExecutionEngine-->>AIModel: Tool Execution Complete
Note over AIModel: Multi-turn Conversation
AIModel->>ExecutionEngine: Continue with Tool Results
ExecutionEngine->>ExecutionEngine: Merge Results into Context
ExecutionEngine-->>AIModel: Enhanced Response
```
### **Execution Flow Characteristics**
**Validation Phase:**
- **Parameter Validation** - Ensure tool arguments match expected schema
- **Permission Checking** - Verify tool access permissions for the request
- **Rate Limiting** - Apply per-tool and per-user rate limits
- **Security Scanning** - Check for potentially dangerous operations
**Execution Phase:**
- **Timeout Management** - Bounded execution time to prevent hanging
- **Error Handling** - Graceful handling of tool failures and timeouts
- **Result Streaming** - Support for tools that return streaming responses
- **Resource Monitoring** - Track tool resource usage and performance
**Response Phase:**
- **Result Formatting** - Convert tool outputs to consistent format
- **Error Enrichment** - Add context and suggestions for tool failures
- **Multi-Result Aggregation** - Combine multiple tool outputs coherently
- **Context Integration** - Merge tool results into conversation context
### **Multi-Turn Conversation Support**
A multi-turn conversation with MCP tools typically proceeds through the following stages:
1. **Initial Tool Discovery** - Request available tools for a given context
2. **Tool Execution** - Execute one or more tools based on user request
3. **Result Analysis** - Analyze tool outputs and determine next steps
4. **Follow-up Actions** - Execute additional tools based on previous results
5. **Response Synthesis** - Combine tool results into coherent user response
**Example Multi-Turn Flow:**
```
User: "Find recent news about AI and save interesting articles"
AI: → Execute web_search("AI news recent")
AI: → Analyze search results
AI: → Execute save_article() for each interesting result
AI: → Respond with summary of saved articles
```
### **Complete User-Controlled Tool Execution Flow**
The following diagram shows the end-to-end user experience with MCP tool execution, highlighting the critical user control points and decision-making process:
```mermaid
flowchart TD
A["👤 User Message<br/>\"List files in current directory\""] --> B["🤖 Bifrost Core"]
B --> C["🔧 MCP Manager<br/>Auto-discovers and adds<br/>available tools to request"]
C --> D["🌐 LLM Provider<br/>(OpenAI, Anthropic, etc.)"]
D --> E{"🔍 Response contains<br/>tool_calls?"}
E -->|No| F["✅ Final Response<br/>Display to user"]
E -->|Yes| G["📝 Add assistant message<br/>with tool_calls to history"]
G --> H["🛡️ YOUR EXECUTION LOGIC<br/>(Security, Approval, Logging)"]
H --> I{"🤔 User Decision Point<br/>Execute this tool?"}
I -->|Deny| J["❌ Create denial result<br/>Add to conversation history"]
I -->|Approve| K["⚙️ client.ExecuteMCPTool()<br/>Bifrost executes via MCP"]
K --> L["📊 Tool Result<br/>Add to conversation history"]
J --> M["🔄 Continue conversation loop<br/>Send updated history back to LLM"]
L --> M
M --> D
style A fill:#e1f5fe
style F fill:#e8f5e8
style H fill:#fff3e0
style I fill:#fce4ec
style K fill:#f3e5f5
```
**Key Flow Characteristics:**
**User Control Points:**
- **Security Layer** - Your application controls all tool execution decisions
- **Approval Gate** - Users can approve or deny each tool execution
- **Transparency** - Full visibility into what tools will be executed and why
- **Conversation Continuity** - Tool results seamlessly integrate into conversation flow
**Security Benefits:**
- **No Automatic Execution** - In this flow, tools never execute without explicit approval (Agent Mode, described later, is the opt-in exception)
- **Audit Trail** - Complete logging of all tool execution decisions
- **Contextual Security** - Approval decisions can consider full conversation context
- **Graceful Denials** - Denied tools result in informative responses, not errors
**Implementation Patterns:**
```go
// Example tool execution control in your application.
// client is an initialized Bifrost client. UserContext and the helper functions
// (requiresApproval, promptUserForApproval, createDenialResponse, handleToolError,
// addToolResultToHistory) are placeholders for your own application code.
func handleToolExecution(ctx context.Context, toolCall schemas.ChatToolCall, userContext UserContext) error {
	// YOUR SECURITY AND APPROVAL LOGIC HERE
	if !userContext.HasPermission(toolCall.Function.Name) {
		return createDenialResponse("Tool not permitted for user role")
	}

	if requiresApproval(toolCall) {
		approved := promptUserForApproval(toolCall)
		if !approved {
			return createDenialResponse("User denied tool execution")
		}
	}

	// Execute the tool via Bifrost
	result, err := client.ExecuteMCPTool(ctx, toolCall)
	if err != nil {
		return handleToolError(err)
	}

	return addToolResultToHistory(result)
}
```
This flow ensures that while AI models can discover and request tools, every actual execution remains under the control of your application and its users.
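The conversation loop in the diagram above can be sketched independently of any SDK. The types and function below (`Message`, `ToolCall`, `RunConversation`) are simplified, hypothetical stand-ins; the LLM call, the approval decision, and the tool execution are passed in as functions so the loop itself stays visible.

```go
package main

import "fmt"

const deniedMessage = "Tool execution denied by user"

// ToolCall and Message are simplified, hypothetical stand-ins for chat schema types.
type ToolCall struct {
	ID   string
	Name string
}

type Message struct {
	Role       string
	Content    string
	ToolCalls  []ToolCall
	ToolCallID string
}

// RunConversation loops until the model replies without tool calls or maxTurns is reached.
// Denied tools still produce a result message, so the model can respond gracefully.
func RunConversation(
	history []Message,
	llm func([]Message) Message,
	approve func(ToolCall) bool,
	execute func(ToolCall) string,
	maxTurns int,
) []Message {
	for turn := 0; turn < maxTurns; turn++ {
		reply := llm(history)
		history = append(history, reply) // assistant message, with any tool_calls
		if len(reply.ToolCalls) == 0 {
			break // final response
		}
		for _, call := range reply.ToolCalls {
			result := deniedMessage
			if approve(call) {
				result = execute(call)
			}
			history = append(history, Message{Role: "tool", Content: result, ToolCallID: call.ID})
		}
	}
	return history
}

func main() {
	// Stub LLM: asks for one tool, then answers once it sees a tool result.
	llm := func(h []Message) Message {
		last := h[len(h)-1]
		if last.Role == "tool" {
			return Message{Role: "assistant", Content: "Found: " + last.Content}
		}
		return Message{Role: "assistant", ToolCalls: []ToolCall{{ID: "call_1", Name: "list_directory"}}}
	}
	approve := func(c ToolCall) bool { return c.Name == "list_directory" }
	execute := func(c ToolCall) string { return "file1.txt" }

	start := []Message{{Role: "user", Content: "List files in current directory"}}
	for _, m := range RunConversation(start, llm, approve, execute, 5) {
		fmt.Printf("%s: %q (%d tool calls)\n", m.Role, m.Content, len(m.ToolCalls))
	}
}
```

The `maxTurns` bound is an addition in this sketch; it guards against a model that keeps requesting tools indefinitely.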
---
## Agent Mode Architecture
Agent Mode transforms Bifrost into an autonomous agent runtime by automatically executing pre-approved tools. This section details the internal architecture of the agent execution loop.
### **Agent Execution Loop**
The agent mode operates as an iterative loop that continues until one of the termination conditions is met:
```mermaid
flowchart TD
subgraph "Agent Mode Entry"
A["📥 Incoming Chat Request"] --> B{"🔍 Check MCP Config<br/>Any tools_to_auto_execute?"}
B -->|No| C["📤 Standard Flow<br/>Return tool_calls for manual execution"]
B -->|Yes| D["🤖 Enter Agent Loop"]
end
subgraph "Agent Execution Loop"
D --> E["🌐 Send to LLM Provider<br/>With available tools"]
E --> F{"🔧 Response has<br/>tool_calls?"}
F -->|No| G["✅ Return Final Response<br/>No more tools needed"]
F -->|Yes| H["📋 Classify Tool Calls"]
H --> I{"🔐 Separate by<br/>auto-execute status"}
I --> J["⚡ Auto-Executable Tools"]
I --> K["🛡️ Non-Auto-Executable Tools"]
J --> L["🔄 Execute in Parallel<br/>Via ToolsManager"]
L --> M["📊 Collect Results"]
K --> N{"Any non-auto<br/>tools found?"}
N -->|Yes| O["🛑 Exit Loop Early<br/>Return mixed response"]
N -->|No| P{"⏱️ Max depth<br/>reached?"}
M --> P
P -->|Yes| Q["⚠️ Return Current State<br/>May have pending tools"]
P -->|No| R["📝 Add results to history"]
R --> E
end
subgraph "Response Handling"
O --> S["📦 Create Mixed Response<br/>• Content: executed results JSON<br/>• tool_calls: pending tools<br/>• finish_reason: stop"]
G --> T["📦 Standard Response<br/>Final answer from LLM"]
Q --> U["📦 Depth Limit Response<br/>Current state with any pending"]
end
style D fill:#e3f2fd
style L fill:#e8f5e9
style O fill:#fff3e0
style S fill:#fce4ec
```
### **Tool Classification System**
When the LLM returns tool calls, Bifrost classifies each tool based on the client configuration:
```mermaid
flowchart LR
subgraph "Tool Call Classification"
TC["🔧 Tool Call<br/>from LLM Response"] --> CHECK{"Tool in<br/>tools_to_execute?"}
CHECK -->|No| SKIP["❌ Skip<br/>Not allowed"]
CHECK -->|Yes| AUTO{"Tool in<br/>tools_to_auto_execute?"}
AUTO -->|Yes| EXEC["⚡ Auto-Execute<br/>Run immediately"]
AUTO -->|No| MANUAL["🛡️ Manual<br/>Return to caller"]
end
subgraph "Configuration Example"
CONFIG["MCPClientConfig"]
CONFIG --> TE["tools_to_execute: [*]<br/>All tools available"]
CONFIG --> TAE["tools_to_auto_execute:<br/>[read_file, list_dir]"]
end
style EXEC fill:#c8e6c9
style MANUAL fill:#fff9c4
style SKIP fill:#ffcdd2
```
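The classification rule in the diagram reduces to two membership checks. The sketch below uses hypothetical names (`Decision`, `Classify`) and assumes that `*` acts as a wildcard in both lists; the source only shows the wildcard for `tools_to_execute`.

```go
package main

import "fmt"

// Decision is the outcome of classifying one tool call.
type Decision string

const (
	Skip        Decision = "skip"
	AutoExecute Decision = "auto-execute"
	Manual      Decision = "manual"
)

// contains reports whether list holds name or the "*" wildcard.
func contains(list []string, name string) bool {
	for _, item := range list {
		if item == "*" || item == name {
			return true
		}
	}
	return false
}

// Classify applies the two checks from the diagram in order.
func Classify(tool string, toolsToExecute, toolsToAutoExecute []string) Decision {
	if !contains(toolsToExecute, tool) {
		return Skip // not allowed at all
	}
	if contains(toolsToAutoExecute, tool) {
		return AutoExecute // run immediately
	}
	return Manual // return to caller for approval
}

func main() {
	allowed := []string{"*"}
	auto := []string{"read_file", "list_dir"}
	for _, tool := range []string{"read_file", "write_file"} {
		fmt.Printf("%s -> %s\n", tool, Classify(tool, allowed, auto))
	}
}
```

Because the allow-list check comes first, a tool listed only in `tools_to_auto_execute` is still skipped.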
### **Mixed Tool Response Format**
When a response contains both auto-executable and non-auto-executable tools, the agent creates a special response format:
<AccordionGroup>
<Accordion title="Chat API Response Format" icon="message" defaultOpen>
```json
{
  "id": "chatcmpl-abc123",
  "choices": [{
    "index": 0,
    "finish_reason": "stop",
    "message": {
      "role": "assistant",
      "content": "The Output from allowed tools calls is - {\"filesystem_read_file\":\"file contents here\",\"filesystem_list_directory\":\"[\\\"file1.txt\\\",\\\"file2.txt\\\"]\"}\n\nNow I shall call these tools next...",
      "tool_calls": [
        {
          "id": "call_write_123",
          "type": "function",
          "function": {
            "name": "filesystem_write_file",
            "arguments": "{\"path\":\"output.txt\",\"content\":\"...\"}"
          }
        }
      ]
    }
  }]
}
```
<Note>
The `content` field contains JSON-formatted results from auto-executed tools. The `tool_calls` array contains only non-auto-executable tools awaiting approval. Setting `finish_reason` to `"stop"` ensures the agent loop exits.
</Note>
</Accordion>
<Accordion title="Responses API Format" icon="code">
```json
{
  "id": "resp-abc123",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [{
        "type": "text",
        "text": "The Output from allowed tools calls is - {...}\n\nNow I shall call these tools next..."
      }]
    },
    {
      "type": "function_call",
      "role": "assistant",
      "call_id": "call_write_123",
      "name": "filesystem_write_file",
      "arguments": "{\"path\":\"output.txt\",\"content\":\"...\"}"
    }
  ]
}
```
</Accordion>
</AccordionGroup>
### **Agent Depth Control**
The `max_agent_depth` setting prevents infinite loops and controls resource usage:
```mermaid
graph LR
subgraph "Depth Tracking"
D0["Depth 0<br/>Initial Request"] --> D1["Depth 1<br/>First tool execution"]
D1 --> D2["Depth 2<br/>Second iteration"]
D2 --> D3["Depth 3<br/>..."]
D3 --> DN["Depth N<br/>Max reached"]
end
DN --> EXIT["🛑 Force Exit<br/>Return current state"]
subgraph "Configuration"
CFG["MCPToolManagerConfig"]
CFG --> MAX["max_agent_depth: 10<br/>(default)"]
CFG --> TIMEOUT["tool_execution_timeout:<br/>30s per tool"]
end
```
<Warning>
When max depth is reached, the response may contain pending tool calls that weren't executed. Your application should handle this gracefully.
</Warning>
---
## Code Mode Architecture
Code Mode enables AI models to write and execute Starlark code (a Python-like language) that orchestrates multiple MCP tools in a single request. This adds a meta-layer for complex multi-tool workflows.
### **Code Mode System Overview**
```mermaid
graph TB
subgraph "Code Mode Components"
VM["🖥️ Starlark Interpreter<br/>Python-like Runtime"]
VFS["📁 Virtual File System<br/>Tool Definitions as .pyi"]
EXEC["⚙️ Code Executor<br/>Sandboxed Execution"]
end
subgraph "Meta Tools"
LIST["listToolFiles()<br/>Discover available servers"]
READ["readToolFile(fileName)<br/>Get tool signatures"]
DOCS["getToolDocs(server, tool)<br/>Get detailed docs"]
CODE["executeToolCode(code)<br/>Run Python code"]
end
subgraph "MCP Integration"
TOOLS["🔧 Connected MCP Tools"]
RESULTS["📊 Tool Results"]
end
LLM["🤖 LLM"] --> LIST
LIST --> VFS
VFS --> LLM
LLM --> READ
READ --> VFS
VFS --> LLM
LLM --> DOCS
DOCS --> VFS
VFS --> LLM
LLM --> CODE
CODE --> VM
VM --> EXEC
EXEC --> TOOLS
TOOLS --> RESULTS
RESULTS --> LLM
style VM fill:#e8eaf6
style VFS fill:#e3f2fd
style CODE fill:#e8f5e9
```
### **Virtual File System (VFS)**
Code Mode generates Python stub files (`.pyi`) for all connected MCP tools, providing compact function signatures:
<Tabs>
<Tab title="Server-Level Binding">
When `code_mode_binding_level: "server"` (default), tools are grouped by MCP client:
```
servers/
├── filesystem.pyi → All filesystem tools
├── web_search.pyi → All web search tools
└── database.pyi → All database tools
```
**Generated Stub Example:**
```python
# servers/filesystem.pyi
# Usage: filesystem.tool_name(param=value)
# For detailed docs: use getToolDocs(server="filesystem", tool="tool_name")
def read_file(path: str) -> dict: # Read contents of a file
def write_file(path: str, content: str) -> dict: # Write content to a file
def list_directory(path: str) -> dict: # List directory contents
```
**Usage in Code:**
```python
files = filesystem.list_directory(path=".")
content = filesystem.read_file(path=files["entries"][0])
result = content
```
</Tab>
<Tab title="Tool-Level Binding">
When `code_mode_binding_level: "tool"`, each tool gets its own file:
```
servers/
├── filesystem/
│   ├── read_file.pyi
│   ├── write_file.pyi
│   └── list_directory.pyi
├── web_search/
│   └── search.pyi
└── database/
    └── query.pyi
```
**Generated Stub Example:**
```python
# servers/filesystem/read_file.pyi
# Usage: filesystem.read_file(param=value)
def read_file(path: str) -> dict: # Read contents of a file
```
**Usage in Code:**
```python
content = filesystem.read_file(path="config.json")
result = content
```
</Tab>
</Tabs>
### **Code Execution Flow**
```mermaid
sequenceDiagram
participant LLM as 🤖 LLM
participant CM as 📝 Code Mode Handler
participant VM as 🖥️ Starlark Interpreter
participant TM as 🔧 Tools Manager
participant MCP as 🌐 MCP Servers
LLM->>CM: executeToolCode({ code: "..." })
CM->>VM: Initialize sandbox
CM->>VM: Inject tool bindings
CM->>VM: Execute Python code
loop For each tool call in code
VM->>TM: server.tool(param=value)
TM->>MCP: Execute tool
MCP-->>TM: Tool result
TM-->>VM: Return result
end
VM-->>CM: Execution result
CM-->>LLM: { result, logs }
```
### **Starlark Sandbox**
The code execution environment is carefully sandboxed using Starlark, a Python-like language designed for configuration and embedded scripting:
<AccordionGroup>
<Accordion title="Available Features" icon="check" defaultOpen>
- ✅ **Python-like syntax** - Familiar Python syntax and semantics
- ✅ **Synchronous calls** - No async/await needed, direct function calls
- ✅ **List comprehensions** - `[x for x in items if condition]`
- ✅ **print()** - Output captured and returned in logs
- ✅ **Dict/List operations** - Standard Python data structures
- ✅ **Tool bindings** - All connected MCP tools as globals
</Accordion>
<Accordion title="Restricted Features" icon="ban">
- ❌ **Imports** - No `import` statements (tools are pre-bound)
- ❌ **Classes** - Use dicts and functions instead
- ❌ **File I/O** - No direct filesystem access (use MCP tools)
- ❌ **Network** - No direct network access (use MCP tools)
- ❌ **Randomness/Time** - Deterministic execution only
</Accordion>
</AccordionGroup>
### **Code Mode Security Model**
```mermaid
graph TB
subgraph "Security Layers"
L1["🔒 Code Validation<br/>Syntax checking before execution"]
L2["🛡️ Sandboxed Runtime<br/>No external module access"]
L3["⏱️ Execution Timeout<br/>Bounded runtime"]
L4["🔐 Tool ACL<br/>Only allowed tools accessible"]
end
subgraph "Execution Boundaries"
B1["No filesystem access<br/>(except via MCP tools)"]
B2["No network access<br/>(except via MCP tools)"]
B3["No process spawning"]
B4["Memory isolation enforced"]
end
L1 --> L2 --> L3 --> L4
L4 --> B1
L4 --> B2
L4 --> B3
L4 --> B4
```
### **Code Mode Configuration**
<Tabs>
<Tab title="Gateway (config.json)">
```json
{
  "mcp": {
    "client_configs": [
      {
        "name": "filesystem",
        "is_code_mode_client": true,
        "connection_type": "stdio",
        "stdio_config": {
          "command": "npx",
          "args": ["-y", "@anthropic/mcp-filesystem"]
        },
        "tools_to_execute": ["*"]
      }
    ],
    "tool_manager_config": {
      "code_mode_binding_level": "server",
      "tool_execution_timeout": "30s"
    }
  }
}
```
</Tab>
<Tab title="Go SDK">
```go
mcpConfig := &schemas.MCPConfig{
	ClientConfigs: []schemas.MCPClientConfig{
		{
			Name:             "filesystem",
			IsCodeModeClient: true,
			ConnectionType:   schemas.MCPConnectionTypeSTDIO,
			StdioConfig: &schemas.MCPStdioConfig{
				Command: "npx",
				Args:    []string{"-y", "@anthropic/mcp-filesystem"},
			},
			ToolsToExecute: []string{"*"},
		},
	},
	ToolManagerConfig: &schemas.MCPToolManagerConfig{
		CodeModeBindingLevel: schemas.CodeModeBindingLevelServer,
		ToolExecutionTimeout: 30 * time.Second,
	},
}
```
</Tab>
</Tabs>
### **Code Mode vs Agent Mode**
| Aspect | Agent Mode | Code Mode |
|--------|------------|-----------|
| **Execution Model** | LLM decides one tool at a time | LLM writes code orchestrating multiple tools |
| **Iterations** | Multiple LLM round-trips | Single LLM call, code handles orchestration |
| **Complexity** | Simple tool chains | Complex workflows with conditionals/loops |
| **Latency** | Higher (multiple LLM calls) | Lower (single LLM call + code execution) |
| **Control** | Per-tool approval possible | Code runs atomically |
| **Best For** | Interactive agents | Batch operations, complex data processing |
---
## MCP Integration Patterns
### **Common Integration Scenarios**
**1. Filesystem Operations**
- **Tools:** `list_files`, `read_file`, `write_file`, `create_directory`
- **Use Cases:** Code analysis, document processing, file management
- **Security:** Sandboxed file access, path validation, permission checks
- **Performance:** Local execution for fast file operations
**2. Web Search & Information Retrieval**
- **Tools:** `web_search`, `fetch_url`, `extract_content`, `summarize`
- **Use Cases:** Research assistance, fact-checking, content gathering
- **Integration:** External search APIs, content parsing services
- **Caching:** Response caching for repeated queries
**3. Database Operations**
- **Tools:** `query_database`, `insert_record`, `update_record`, `schema_info`
- **Use Cases:** Data analysis, report generation, database administration
- **Security:** Read-only access by default, query validation, injection prevention
- **Performance:** Connection pooling, query optimization
**4. API Integrations**
- **Tools:** Custom business logic tools, third-party service integration
- **Use Cases:** CRM operations, payment processing, notification sending
- **Authentication:** API key management, OAuth token handling
- **Error Handling:** Retry logic, fallback mechanisms
### **MCP Server Development Patterns**
**Simple STDIO Server:**
- **Language:** Any language that can read/write JSON to stdin/stdout
- **Deployment:** Single executable, minimal dependencies
- **Use Case:** Local tools, development utilities, simple scripts
**HTTP Service Server:**
- **Architecture:** RESTful API with MCP protocol endpoints
- **Scalability:** Horizontal scaling, load balancing
- **Use Case:** Shared tools, enterprise integrations, cloud services
**Hybrid Approach:**
- **Local + Remote:** Combine STDIO tools for local operations with HTTP for remote services
- **Failover:** Use local fallbacks when remote services are unavailable
- **Optimization:** Route tool calls to most appropriate execution environment
> **📖 MCP Development:** [Tool Development Guide →](../../mcp/overview)
---
## Security & Safety Considerations
### **MCP Security Architecture**
```mermaid
graph TB
subgraph "Security Layers"
L1[Connection Security<br/>Authentication & Encryption]
L2[Tool Validation<br/>Schema & Permission Checks]
L3[Execution Security<br/>Sandboxing & Limits]
L4[Result Security<br/>Output Validation & Filtering]
end
subgraph "Threat Mitigation"
T1[Malicious Tools<br/>Code Injection Prevention]
T2[Resource Abuse<br/>Rate Limiting & Quotas]
T3[Data Exposure<br/>Output Sanitization]
T4[System Access<br/>Privilege Isolation]
end
L1 --> T1
L2 --> T2
L3 --> T4
L4 --> T3
```
**Security Measures:**
**Connection Security:**
- **Authentication** - API keys, certificates, or token-based auth for HTTP/SSE
- **Encryption** - TLS for HTTP connections, secure pipes for STDIO
- **Network Isolation** - Firewall rules and network segmentation
**Execution Security:**
- **Sandboxing** - Isolated execution environments for tools
- **Resource Limits** - CPU, memory, and time constraints
- **Permission Model** - Principle of least privilege for tool access
**Operational Security:**
- **Regular Updates** - Keep MCP servers and tools updated
- **Monitoring** - Continuous security monitoring and alerting
- **Incident Response** - Procedures for security incidents involving tools
---
## Related Architecture Documentation
- **[Request Flow](./request-flow)** - MCP integration in request processing
- **[Concurrency Model](./concurrency)** - MCP concurrency and worker integration
- **[Plugin System](./plugins)** - Integration between MCP and plugin systems
- **[Benchmarks](../../benchmarking/getting-started)** - MCP performance impact and optimization


@@ -0,0 +1,552 @@
---
title: "Plugins"
description: "Deep dive into Bifrost's extensible plugin architecture - how plugins work internally, lifecycle management, execution model, and integration patterns."
icon: "puzzle-piece"
---
## Plugin Architecture Philosophy
### **Core Design Principles**
Bifrost's plugin system is built around five key principles that ensure extensibility without compromising performance or reliability:
| Principle | Implementation | Benefit |
| ----------------------------- | ------------------------------------------------ | ------------------------------------------------ |
| **Plugin-First Design** | Core logic designed around plugin hook points | Maximum extensibility without core modifications |
| **Zero-Copy Integration** | Direct memory access to request/response objects | Minimal performance overhead |
| **Lifecycle Management** | Complete plugin lifecycle with automatic cleanup | Resource safety and leak prevention |
| **Interface-Based Safety** | Well-defined interfaces for type safety | Compile-time validation and consistency |
| **Failure Isolation** | Plugin errors don't crash the core system | Fault tolerance and system stability |
### **Plugin System Overview**
```mermaid
graph TB
subgraph "Plugin Management Layer"
PluginMgr[Plugin Manager<br/>Central Controller]
Registry[Plugin Registry<br/>Discovery & Loading]
Lifecycle[Lifecycle Manager<br/>State Management]
end
subgraph "Plugin Execution Layer"
Pipeline[Plugin Pipeline<br/>Execution Orchestrator]
PreHooks[Pre-Processing Hooks<br/>Request Modification]
PostHooks[Post-Processing Hooks<br/>Response Enhancement]
end
subgraph "Plugin Categories"
Auth[Authentication<br/>& Authorization]
RateLimit[Rate Limiting<br/>& Throttling]
Transform[Data Transformation<br/>& Validation]
Monitor[Monitoring<br/>& Analytics]
Custom[Custom Business<br/>Logic]
end
PluginMgr --> Registry
Registry --> Lifecycle
Lifecycle --> Pipeline
Pipeline --> PreHooks
Pipeline --> PostHooks
PreHooks --> Auth
PreHooks --> RateLimit
PostHooks --> Transform
PostHooks --> Monitor
PostHooks --> Custom
```
---
## Plugin Lifecycle Management
### **Complete Lifecycle States**
Every plugin goes through a well-defined lifecycle that ensures proper resource management and error handling:
```mermaid
stateDiagram-v2
[*] --> PluginInit: Plugin Creation
PluginInit --> Registered: Add to BifrostConfig
Registered --> PreHookCall: Request Received
PreHookCall --> ModifyRequest: Normal Flow
PreHookCall --> ShortCircuitResponse: Return Response
PreHookCall --> ShortCircuitError: Return Error
ModifyRequest --> ProviderCall: Send to Provider
ProviderCall --> PostHookCall: Receive Response
ShortCircuitResponse --> PostHookCall: Skip Provider
ShortCircuitError --> PostHookCall: Pipeline Symmetry
PostHookCall --> ModifyResponse: Process Result
PostHookCall --> RecoverError: Error Recovery
PostHookCall --> FallbackCheck: Check AllowFallbacks
PostHookCall --> ResponseReady: Pass Through
FallbackCheck --> TryFallback: AllowFallbacks=true/nil
FallbackCheck --> ResponseReady: AllowFallbacks=false
TryFallback --> PreHookCall: Next Provider
ModifyResponse --> ResponseReady: Modified
RecoverError --> ResponseReady: Recovered
ResponseReady --> [*]: Return to Client
Registered --> CleanupCall: Bifrost Shutdown
CleanupCall --> [*]: Plugin Destroyed
```
### **Lifecycle Phase Details**
**Discovery Phase:**
- **Purpose:** Find and catalog available plugins
- **Sources:** Command line, environment variables, JSON configuration, directory scanning
- **Validation:** Basic existence and format checks
- **Output:** Plugin descriptors with metadata
**Loading Phase:**
- **Purpose:** Load plugin binaries into memory
- **Security:** Digital signature verification and checksum validation
- **Compatibility:** Interface implementation validation
- **Resource:** Memory and capability assessment
**Initialization Phase:**
- **Purpose:** Configure plugin with runtime settings
- **Timeout:** Bounded initialization time to prevent hanging
- **Dependencies:** External service connectivity verification
- **State:** Internal state setup and resource allocation
**Runtime Phase:**
- **Purpose:** Active request processing
- **Monitoring:** Continuous health checking and performance tracking
- **Recovery:** Automatic error recovery and degraded mode handling
- **Metrics:** Real-time performance and health metrics collection
> **Plugin Lifecycle:** [Plugin Management →](../../enterprise/custom-plugins)
---
## Plugin Execution Pipeline
### **Request Processing Flow**
The plugin pipeline ensures consistent, predictable execution while maintaining high performance:
#### **Normal Execution Flow (No Short-Circuit)**
```mermaid
sequenceDiagram
participant Client
participant Bifrost
participant Plugin1
participant Plugin2
participant Provider
Client->>Bifrost: Request
Bifrost->>Plugin1: PreLLMHook(request)
Plugin1-->>Bifrost: modified request
Bifrost->>Plugin2: PreLLMHook(request)
Plugin2-->>Bifrost: modified request
Bifrost->>Provider: API Call
Provider-->>Bifrost: response
Bifrost->>Plugin2: PostLLMHook(response)
Plugin2-->>Bifrost: modified response
Bifrost->>Plugin1: PostLLMHook(response)
Plugin1-->>Bifrost: modified response
Bifrost-->>Client: Final Response
```
**Execution Order:**
1. **PreHooks:** Execute in registration order (1 → 2 → N)
2. **Provider Call:** If no short-circuit occurred
3. **PostHooks:** Execute in reverse order (N → 2 → 1)
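The execution order can be sketched with a simplified plugin interface. The `Plugin` interface below is a hypothetical reduction to strings for illustration; Bifrost's real hooks operate on request and response objects and have richer signatures.

```go
package main

import "fmt"

// Plugin is a simplified, hypothetical interface for illustrating hook ordering only.
type Plugin interface {
	PreHook(req string) string
	PostHook(resp string) string
}

// tagPlugin appends its name so the call order is visible in the output.
type tagPlugin struct{ name string }

func (p tagPlugin) PreHook(req string) string   { return req + " >" + p.name }
func (p tagPlugin) PostHook(resp string) string { return resp + " <" + p.name }

// RunPipeline runs pre-hooks in registration order, calls the provider,
// then runs post-hooks in reverse registration order.
func RunPipeline(plugins []Plugin, req string, provider func(string) string) string {
	for _, p := range plugins {
		req = p.PreHook(req)
	}
	resp := provider(req)
	for i := len(plugins) - 1; i >= 0; i-- {
		resp = plugins[i].PostHook(resp)
	}
	return resp
}

func main() {
	plugins := []Plugin{tagPlugin{"plugin1"}, tagPlugin{"plugin2"}}
	provider := func(req string) string { return "[" + req + "]" }
	fmt.Println(RunPipeline(plugins, "request", provider))
}
```

The reverse order means the first plugin to see the request is the last to see the response, the same nesting used by middleware stacks.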
#### **Short-Circuit Response Flow (Cache Hit)**
```mermaid
sequenceDiagram
participant Client
participant Bifrost
participant Cache
participant Auth
participant Provider
Client->>Bifrost: Request
Bifrost->>Auth: PreLLMHook(request)
Auth-->>Bifrost: modified request
Bifrost->>Cache: PreLLMHook(request)
Cache-->>Bifrost: LLMPluginShortCircuit{Response}
Note over Provider: Provider call skipped
Bifrost->>Cache: PostLLMHook(response)
Cache-->>Bifrost: modified response
Bifrost->>Auth: PostLLMHook(response)
Auth-->>Bifrost: modified response
Bifrost-->>Client: Cached Response
```
#### **Streaming Response Flow**
For streaming responses, the plugin pipeline executes post-hooks for every delta/chunk received from the provider:
```mermaid
sequenceDiagram
participant Client
participant Bifrost
participant Plugin1
participant Plugin2
participant Provider
Client->>Bifrost: Stream Request
Bifrost->>Plugin1: PreLLMHook(request)
Plugin1-->>Bifrost: modified request
Bifrost->>Plugin2: PreLLMHook(request)
Plugin2-->>Bifrost: modified request
Bifrost->>Provider: Stream API Call
loop For Each Delta
Provider-->>Bifrost: stream delta
Bifrost->>Plugin2: PostLLMHook(delta)
Plugin2-->>Bifrost: modified delta
Bifrost->>Plugin1: PostLLMHook(delta)
Plugin1-->>Bifrost: modified delta
Bifrost-->>Client: Send Delta
end
Provider-->>Bifrost: final chunk (finish reason)
Bifrost->>Plugin2: PostLLMHook(final)
Plugin2-->>Bifrost: modified final
Bifrost->>Plugin1: PostLLMHook(final)
Plugin1-->>Bifrost: modified final
Bifrost-->>Client: Final Chunk
```
**Streaming Execution Characteristics:**
1. **Delta Processing:**
- Each stream delta (chunk) goes through all post-hooks
- Plugins can modify/transform each delta before it reaches the client
- Deltas can contain: text content, tool calls, role changes, or usage info
2. **Special Delta Types:**
- **Start Event:** Initial delta with role information
- **Content Delta:** Regular text or tool call content
- **Usage Update:** Token usage statistics (if enabled)
- **Final Chunk:** Contains finish reason and any final metadata
3. **Plugin Considerations:**
- Plugins must handle streaming responses efficiently
- Each delta should be processed quickly to maintain stream responsiveness
- Plugins can track state across deltas using context
- Heavy processing should be done asynchronously
4. **Error Handling:**
- If a post-hook returns an error, it's sent as an error stream chunk
- Stream is terminated after error chunks
- Plugins can recover from errors by providing valid responses
5. **Performance Optimization:**
- Lightweight delta processing to minimize latency
- Object pooling for common data structures
- Non-blocking operations for logging and metrics
- Efficient memory management for stream processing
> **Streaming Details:** [Streaming Guide →](../../quickstart/gateway/streaming)
**Short-Circuit Rules:**
- **Provider Skipped:** When plugin returns short-circuit response/error
- **PostLLMHook Guarantee:** All executed PreHooks get corresponding PostLLMHook calls
- **Reverse Order:** PostHooks execute in reverse order of PreHooks
#### **Short-Circuit Error Flow (Allow Fallbacks)**
```mermaid
sequenceDiagram
participant Client
participant Bifrost
participant Plugin1
participant Provider1
participant Provider2
Client->>Bifrost: Request (Provider1 + Fallback Provider2)
Bifrost->>Plugin1: PreLLMHook(request)
Plugin1-->>Bifrost: LLMPluginShortCircuit{Error, AllowFallbacks=true}
Note over Provider1: Provider1 call skipped
Bifrost->>Plugin1: PostLLMHook(error)
Plugin1-->>Bifrost: error unchanged
Note over Bifrost: Try fallback provider
Bifrost->>Plugin1: PreLLMHook(request for Provider2)
Plugin1-->>Bifrost: modified request
Bifrost->>Provider2: API Call
Provider2-->>Bifrost: response
Bifrost->>Plugin1: PostLLMHook(response)
Plugin1-->>Bifrost: modified response
Bifrost-->>Client: Final Response
```
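The fallback decision shown above depends on a three-state flag: `true` and unset (`nil`) both allow fallbacks, and only an explicit `false` blocks them. A minimal sketch, assuming a hypothetical `PluginError` type with a `*bool` field named after the flag in the diagram:

```go
package main

import "fmt"

// PluginError is a hypothetical error type carrying the AllowFallbacks flag.
type PluginError struct {
	Message        string
	AllowFallbacks *bool // nil means "not set", which is treated as allowed
}

// ShouldTryFallback returns true unless fallbacks were explicitly disabled.
func ShouldTryFallback(allow *bool) bool {
	return allow == nil || *allow
}

// CallWithFallbacks tries providers in order and stops on the first success
// or on the first error that forbids fallbacks.
func CallWithFallbacks(providers []string, attempt func(provider string) (string, *PluginError)) (string, *PluginError) {
	var lastErr *PluginError
	for _, p := range providers {
		resp, err := attempt(p)
		if err == nil {
			return resp, nil
		}
		lastErr = err
		if !ShouldTryFallback(err.AllowFallbacks) {
			break
		}
	}
	return "", lastErr
}

func main() {
	attempt := func(provider string) (string, *PluginError) {
		if provider == "provider1" {
			return "", &PluginError{Message: "rate limited"} // AllowFallbacks unset
		}
		return "response from " + provider, nil
	}
	resp, err := CallWithFallbacks([]string{"provider1", "provider2"}, attempt)
	fmt.Println(resp, err == nil)
}
```

Using a pointer rather than a plain `bool` is what makes the unset state distinguishable from an explicit `false`.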
#### **Error Recovery Flow**
```mermaid
sequenceDiagram
participant Client
participant Bifrost
participant Plugin1
participant Plugin2
participant Provider
participant RecoveryPlugin
Client->>Bifrost: Request
Bifrost->>Plugin1: PreLLMHook(request)
Plugin1-->>Bifrost: modified request
Bifrost->>Plugin2: PreLLMHook(request)
Plugin2-->>Bifrost: modified request
Bifrost->>RecoveryPlugin: PreLLMHook(request)
RecoveryPlugin-->>Bifrost: modified request
Bifrost->>Provider: API Call
Provider-->>Bifrost: error
Bifrost->>RecoveryPlugin: PostLLMHook(error)
RecoveryPlugin-->>Bifrost: recovered response
Bifrost->>Plugin2: PostLLMHook(response)
Plugin2-->>Bifrost: modified response
Bifrost->>Plugin1: PostLLMHook(response)
Plugin1-->>Bifrost: modified response
Bifrost-->>Client: Recovered Response
```
**Error Recovery Features:**
- **Error Transformation:** Plugins can convert errors to successful responses
- **Graceful Degradation:** Provide fallback responses for service failures
- **Context Preservation:** Error context is maintained through recovery process
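As a rough sketch of error transformation, a recovery plugin can be modeled as a post-hook that receives either a response or an error and returns a replacement. The `PostHook` signature, `Response` type, and `recoverWithFallback` helper below are assumptions made for illustration; Bifrost's real hook signature differs.

```go
package main

import (
	"errors"
	"fmt"
)

// Response is a hypothetical stand-in for a provider response.
type Response struct{ Text string }

// PostHook is a hypothetical, simplified post-hook signature: it receives
// either a response or an error and may replace both.
type PostHook func(resp *Response, err error) (*Response, error)

// errProviderTimeout is a sample provider failure used by the demo.
var errProviderTimeout = errors.New("provider timeout")

// recoverWithFallback converts any error into a canned successful response.
func recoverWithFallback(fallback string) PostHook {
	return func(resp *Response, err error) (*Response, error) {
		if err == nil {
			return resp, nil // nothing to recover
		}
		return &Response{Text: fallback}, nil
	}
}

// applyPostHooks runs hooks in reverse registration order.
func applyPostHooks(hooks []PostHook, resp *Response, err error) (*Response, error) {
	for i := len(hooks) - 1; i >= 0; i-- {
		resp, err = hooks[i](resp, err)
	}
	return resp, err
}

func main() {
	// bracket only touches successful responses, like Plugin1/Plugin2 in the diagram.
	bracket := func(resp *Response, err error) (*Response, error) {
		if resp != nil {
			resp = &Response{Text: "[" + resp.Text + "]"}
		}
		return resp, err
	}
	hooks := []PostHook{bracket, recoverWithFallback("service unavailable, try again later")}
	resp, err := applyPostHooks(hooks, nil, errProviderTimeout)
	fmt.Println(resp.Text, err)
}
```

Because the recovery hook is registered last, it runs first on the way back, so the earlier plugins see an ordinary response rather than the provider error.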
### **Complex Plugin Decision Flow**
Real-world plugin interactions involving authentication, rate limiting, and caching with different decision paths:
```mermaid
graph TD
A["Client Request"] --> B["Bifrost"]
B --> C["Auth Plugin PreLLMHook"]
C --> D{"Authenticated?"}
D -->|No| E["Return Auth Error<br/>AllowFallbacks=false"]
D -->|Yes| F["RateLimit Plugin PreLLMHook"]
F --> G{"Rate Limited?"}
G -->|Yes| H["Return Rate Error<br/>AllowFallbacks=nil"]
G -->|No| I["Cache Plugin PreLLMHook"]
I --> J{"Cache Hit?"}
J -->|Yes| K["Return Cached Response"]
J -->|No| L["Provider API Call"]
L --> M["Cache Plugin PostLLMHook"]
M --> N["Store in Cache"]
N --> O["RateLimit Plugin PostLLMHook"]
O --> P["Auth Plugin PostLLMHook"]
P --> Q["Final Response"]
E --> R["Skip Fallbacks"]
H --> S["Try Fallback Provider"]
K --> T["Skip Provider Call"]
```
### **Execution Characteristics**
**Symmetric Execution Pattern:**
- **Pre-processing:** Plugins execute in priority order (high to low)
- **Post-processing:** Plugins execute in reverse order (low to high)
- **Rationale:** Ensures proper cleanup and state management (last in, first out)
**Performance Optimizations:**
- **Timeout Boundaries:** Each plugin has configurable execution timeouts
- **Panic Recovery:** Plugin panics are caught and logged without crashing the system
- **Resource Limits:** Memory and CPU limits prevent runaway plugins
- **Circuit Breaking:** Repeated failures trigger plugin isolation
**Error Handling Strategies:**
- **Continue:** Use original request/response if plugin fails
- **Fail Fast:** Return error immediately if critical plugin fails
- **Retry:** Attempt plugin execution with exponential backoff
- **Fallback:** Use alternative plugin or default behavior
> **Plugin Execution:** [Request Flow →](./request-flow#stage-3-plugin-pipeline-processing)
---
## Security & Validation
### **Multi-Layer Security Model**
Plugin security operates at multiple layers to ensure system integrity:
```mermaid
graph TB
subgraph "Security Validation Layers"
L1[Layer 1: Binary Validation<br/>Signature & Checksum]
L2[Layer 2: Interface Validation<br/>Type Safety & Compatibility]
L3[Layer 3: Runtime Validation<br/>Resource Limits & Timeouts]
L4[Layer 4: Execution Isolation<br/>Panic Recovery & Error Handling]
end
subgraph "Security Benefits"
Integrity[Code Integrity<br/>Verified Authenticity]
Safety[Type Safety<br/>Compile-time Checks]
Stability[System Stability<br/>Isolated Failures]
Performance[Performance Protection<br/>Resource Limits]
end
L1 --> Integrity
L2 --> Safety
L3 --> Performance
L4 --> Stability
```
### **Validation Process**
**Binary Security:**
- **Digital Signatures:** Cryptographic verification of plugin authenticity
- **Checksum Validation:** File integrity verification
- **Source Verification:** Trusted source requirements
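Checksum validation can be illustrated with the standard library alone. The following is a sketch under assumptions, not Bifrost's loader: `verifyChecksum`, `digestHex`, and `writeTemp` are hypothetical helpers, and SHA-256 is assumed as the digest algorithm.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// digestHex returns the hex-encoded SHA-256 digest of data.
func digestHex(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// verifyChecksum reports whether the SHA-256 digest of the file at path
// matches the expected hex-encoded digest.
func verifyChecksum(path, wantHex string) (bool, error) {
	want, err := hex.DecodeString(wantHex)
	if err != nil {
		return false, fmt.Errorf("invalid expected checksum: %w", err)
	}
	f, err := os.Open(path)
	if err != nil {
		return false, err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil { // stream the file, do not load it whole
		return false, err
	}
	return subtle.ConstantTimeCompare(h.Sum(nil), want) == 1, nil
}

// writeTemp writes content to a temporary file and returns its path.
func writeTemp(content string) (string, error) {
	f, err := os.CreateTemp("", "plugin-*.so")
	if err != nil {
		return "", err
	}
	defer f.Close()
	_, err = f.WriteString(content)
	return f.Name(), err
}

func main() {
	path, err := writeTemp("plugin-bytes")
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer os.Remove(path)
	ok, err := verifyChecksum(path, digestHex([]byte("plugin-bytes")))
	fmt.Println(ok, err)
}
```

A checksum only detects accidental or unauthorized modification of the file. Proving who produced the binary is the job of the digital signature check listed above.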
**Interface Security:**
- **Type Safety:** Interface implementation verification
- **Version Compatibility:** Plugin API version checking
- **Memory Safety:** Safe memory access patterns
**Runtime Security:**
- **Resource Quotas:** Memory and CPU usage limits
- **Execution Timeouts:** Bounded execution time
- **Sandbox Execution:** Isolated execution environment
**Operational Security:**
- **Health Monitoring:** Continuous plugin health assessment
- **Error Tracking:** Plugin error rate monitoring
- **Automatic Recovery:** Failed plugin restart and recovery
---
## Plugin Performance & Monitoring
### **Comprehensive Metrics System**
Bifrost provides detailed metrics for plugin performance and health monitoring:
```mermaid
graph TB
subgraph "Execution Metrics"
ExecTime[Execution Time<br/>Latency per Plugin]
ExecCount[Execution Count<br/>Request Volume]
SuccessRate[Success Rate<br/>Error Percentage]
Throughput[Throughput<br/>Requests/Second]
end
subgraph "Resource Metrics"
MemoryUsage[Memory Usage<br/>Per Plugin Instance]
CPUUsage[CPU Utilization<br/>Processing Time]
IOMetrics[I/O Operations<br/>Network/Disk Activity]
PoolUtilization[Pool Utilization<br/>Resource Efficiency]
end
subgraph "Health Metrics"
ErrorRate[Error Rate<br/>Failed Executions]
PanicCount[Panic Recovery<br/>Crash Events]
TimeoutCount[Timeout Events<br/>Slow Executions]
RecoveryRate[Recovery Success<br/>Failure Handling]
end
subgraph "Business Metrics"
AddedLatency[Added Latency<br/>Plugin Overhead]
SystemImpact[System Impact<br/>Overall Performance]
FeatureUsage[Feature Usage<br/>Plugin Utilization]
CostImpact[Cost Impact<br/>Resource Consumption]
end
```
### **Performance Characteristics**
**Plugin Execution Performance:**
- **Typical Overhead:** 1-10μs per plugin for simple operations
- **Authentication Plugins:** 1-5μs for key validation
- **Rate Limiting Plugins:** 500ns for quota checks
- **Monitoring Plugins:** 200ns for metric collection
- **Transformation Plugins:** 2-10μs depending on complexity
**Resource Usage Patterns:**
- **Memory Efficiency:** Object pooling reduces allocations
- **CPU Optimization:** Minimal processing overhead
- **Network Impact:** Configurable external service calls
- **Storage Overhead:** Minimal for stateless plugins
---
## Plugin Integration Patterns
### **Common Integration Scenarios**
**1. Authentication & Authorization**
- **Pre-processing Hook:** Validate API keys or JWT tokens
- **Configuration:** External identity provider integration
- **Error Handling:** Return 401/403 responses for invalid credentials
- **Performance:** Sub-5μs validation with caching
**2. Rate Limiting & Quotas**
- **Pre-processing Hook:** Check request quotas and limits
- **Storage:** Redis or in-memory rate limit tracking
- **Algorithms:** Token bucket, sliding window, fixed window
- **Responses:** 429 Too Many Requests with retry headers
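Of the algorithms listed, the token bucket is the simplest to sketch. The version below is a minimal, single-goroutine illustration with hypothetical names (`TokenBucket`, `Allow`, `simulate`); a production plugin would add locking and keep one bucket per key, possibly in Redis.

```go
package main

import (
	"fmt"
	"time"
)

// TokenBucket is a minimal token bucket. It is not safe for concurrent use.
type TokenBucket struct {
	capacity float64   // maximum burst size
	rate     float64   // tokens added per second
	tokens   float64   // tokens currently available
	last     time.Time // time of the previous refill
}

// NewTokenBucket returns a bucket that starts full.
func NewTokenBucket(capacity, rate float64, now time.Time) *TokenBucket {
	return &TokenBucket{capacity: capacity, rate: rate, tokens: capacity, last: now}
}

// Allow refills the bucket for the time elapsed since the last call and
// consumes one token if available. A false result would map to HTTP 429.
func (b *TokenBucket) Allow(now time.Time) bool {
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens += elapsed * b.rate
		if b.tokens > b.capacity {
			b.tokens = b.capacity
		}
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// simulate replays requests arriving at the given millisecond offsets.
func simulate(capacity, rate float64, offsetsMs []int) []bool {
	start := time.Unix(0, 0)
	b := NewTokenBucket(capacity, rate, start)
	out := make([]bool, len(offsetsMs))
	for i, ms := range offsetsMs {
		out[i] = b.Allow(start.Add(time.Duration(ms) * time.Millisecond))
	}
	return out
}

func main() {
	// Burst of three requests at t=0, then one more a second later.
	fmt.Println(simulate(2, 1, []int{0, 0, 0, 1000}))
}
```

With a capacity of 2 and a refill rate of 1 token per second, the first two requests in the burst pass, the third is rejected, and the request one second later passes again.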
**3. Request/Response Transformation**
- **Dual Hooks:** Pre-processing for requests, post-processing for responses
- **Use Cases:** Data format conversion, field mapping, content filtering
- **Performance:** Streaming transformations for large payloads
- **Compatibility:** Provider-specific format adaptations
**4. Monitoring & Analytics**
- **Post-processing Hook:** Collect metrics and logs after request completion
- **Destinations:** Prometheus, DataDog, custom analytics systems
- **Data:** Request/response metadata, performance metrics, error tracking
- **Privacy:** Configurable data sanitization and filtering
### **Plugin Communication Patterns**
**Plugin-to-Plugin Communication:**
- **Shared Context:** Plugins can store data in request context for downstream plugins
- **Event System:** Plugins can emit events for other plugins to consume
- **Data Passing:** Structured data exchange between related plugins
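The shared-context pattern can be sketched with the standard `context` package. This is an assumption for illustration: Bifrost plugins receive their own context type, and `ctxKey`, `authPreHook`, `rateLimitPreHook`, and `pipeline` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
)

// ctxKey is an unexported key type so values cannot collide with other packages.
type ctxKey string

const userIDKey ctxKey = "user-id"

// authPreHook stands in for an upstream plugin that stores data in the context.
func authPreHook(ctx context.Context, apiKey string) context.Context {
	return context.WithValue(ctx, userIDKey, "user-for-"+apiKey)
}

// rateLimitPreHook stands in for a downstream plugin that reads that data.
// The boolean is false when no upstream plugin set the value.
func rateLimitPreHook(ctx context.Context) (string, bool) {
	id, ok := ctx.Value(userIDKey).(string)
	return id, ok
}

// pipeline chains the two hooks the way a plugin manager would.
func pipeline(apiKey string) (string, bool) {
	ctx := authPreHook(context.Background(), apiKey)
	return rateLimitPreHook(ctx)
}

func main() {
	id, ok := pipeline("key-1")
	fmt.Println(id, ok)
}
```

Checking the `ok` result keeps the downstream plugin usable even when the upstream plugin is disabled or reordered.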
**Plugin-to-External Service Communication:**
- **HTTP Clients:** Built-in HTTP client pools for external API calls
- **Database Connections:** Connection pooling for database access
- **Message Queues:** Integration with message queue systems
- **Caching Systems:** Redis, Memcached integration for state storage
> **📖 Integration Examples:** [Plugin Development Guide →](../../enterprise/custom-plugins)
---
## Related Architecture Documentation
- **[Request Flow](./request-flow)** - Plugin execution in request processing pipeline
- **[Concurrency Model](./concurrency)** - Plugin concurrency and threading considerations
- **[Benchmarks](../../benchmarking/getting-started)** - Plugin performance characteristics and optimization
- **[MCP System](./mcp)** - Integration between plugins and MCP system

@@ -0,0 +1,527 @@
---
title: "Request Flow"
description: "Deep dive into Bifrost's request processing pipeline - from transport layer ingestion through provider execution to response delivery."
icon: "route"
---
## Stage 1: Transport Layer Processing
### **HTTP Transport Flow**
```mermaid
sequenceDiagram
participant Client
participant HTTPTransport
participant Router
participant Validation
Client->>HTTPTransport: POST /v1/chat/completions
HTTPTransport->>HTTPTransport: Parse Headers
HTTPTransport->>HTTPTransport: Extract Body
HTTPTransport->>Validation: Validate JSON Schema
Validation->>Router: BifrostRequest
Router-->>HTTPTransport: Processing Started
HTTPTransport-->>Client: HTTP 200 (async processing)
```
**Key Processing Steps:**
1. **Request Reception** - FastHTTP server receives request
2. **Header Processing** - Extract authentication, content-type, custom headers
3. **Body Parsing** - JSON unmarshaling with schema validation
4. **Request Transformation** - Convert to internal `BifrostRequest` schema
5. **Context Creation** - Build request context with metadata
**Performance Characteristics:**
- **Parsing Time:** ~2.1μs for typical requests
- **Validation Overhead:** ~400ns for schema checks
- **Memory Allocation:** Zero-copy where possible
### **Go SDK Flow**
```mermaid
sequenceDiagram
participant Application
participant SDK
participant Core
participant Validation
Application->>SDK: bifrost.ChatCompletion(req)
SDK->>SDK: Type Validation
SDK->>Core: Direct Function Call
Core->>Validation: Schema Validation
Validation-->>Core: Validated Request
Core-->>SDK: Processing Result
SDK-->>Application: Typed Response
```
**Advantages:**
- **Zero Serialization** - Direct Go struct passing
- **Type Safety** - Compile-time validation
- **Lower Latency** - No HTTP/JSON overhead
- **Memory Efficiency** - No intermediate allocations
---
## Stage 2: Request Routing & Load Balancing
### **Provider Selection Logic**
```mermaid
flowchart TD
Request[Incoming Request] --> ModelCheck{Model Available?}
ModelCheck -->|Yes| ProviderDirect[Use Specified Provider]
ModelCheck -->|No| ModelMapping[Model → Provider Mapping]
ProviderDirect --> KeyPool[API Key Pool]
ModelMapping --> KeyPool
KeyPool --> WeightedSelect[Weighted Random Selection]
WeightedSelect --> HealthCheck{Provider Healthy?}
HealthCheck -->|Yes| AssignWorker[Assign Worker]
HealthCheck -->|No| CircuitBreaker[Circuit Breaker]
CircuitBreaker --> FallbackCheck{Fallback Available?}
FallbackCheck -->|Yes| FallbackProvider[Try Fallback]
FallbackCheck -->|No| ErrorResponse[Return Error]
FallbackProvider --> KeyPool
```
**Key Selection Algorithm:**
```go
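// Weighted random key selection (requires "math/rand")
type KeySelector struct {
	keys    []APIKey
	weights []float64 // weights[i] belongs to keys[i]
	total   float64   // sum of all weights
}

// SelectKey picks a key with probability proportional to its weight.
// It returns nil when no keys are configured.
func (ks *KeySelector) SelectKey() *APIKey {
	if len(ks.keys) == 0 {
		return nil
	}
	r := rand.Float64() * ks.total
	cumulative := 0.0
	for i, weight := range ks.weights {
		cumulative += weight
		if r <= cumulative {
			return &ks.keys[i]
		}
	}
	// Floating-point rounding can leave r just above the final sum
	return &ks.keys[len(ks.keys)-1]
}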
```
**Performance Metrics:**
- **Key Selection Time:** ~10ns (linear scan over the key weights, effectively constant for small key pools)
- **Health Check Overhead:** ~50ns (cached results)
- **Fallback Decision:** ~25ns (configuration lookup)
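The `SelectKey` loop above walks the weights linearly, which is fine for a handful of keys. For larger pools, one possible alternative is to precompute cumulative weights once and binary-search them on each selection. The sketch below illustrates that variant; `WeightedPicker` and its methods are hypothetical names, not part of Bifrost.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// WeightedPicker stores running totals so each pick is a binary search.
type WeightedPicker struct {
	cumulative []float64 // cumulative[i] = weights[0] + ... + weights[i]
}

// NewWeightedPicker precomputes the cumulative weights once.
func NewWeightedPicker(weights []float64) *WeightedPicker {
	cum := make([]float64, len(weights))
	total := 0.0
	for i, w := range weights {
		total += w
		cum[i] = total
	}
	return &WeightedPicker{cumulative: cum}
}

// pickAt maps a value r in [0, total] to an index, or -1 if there are no weights.
func (p *WeightedPicker) pickAt(r float64) int {
	n := len(p.cumulative)
	if n == 0 {
		return -1
	}
	// Smallest index whose cumulative weight is >= r.
	i := sort.SearchFloat64s(p.cumulative, r)
	if i >= n {
		i = n - 1 // guard against r slightly above the total
	}
	return i
}

// Pick returns a random index with probability proportional to its weight.
func (p *WeightedPicker) Pick() int {
	n := len(p.cumulative)
	if n == 0 {
		return -1
	}
	return p.pickAt(rand.Float64() * p.cumulative[n-1])
}

func main() {
	p := NewWeightedPicker([]float64{1, 3, 6})
	counts := make([]int, 3)
	for i := 0; i < 10000; i++ {
		counts[p.Pick()]++
	}
	fmt.Println(counts) // roughly proportional to 1:3:6
}
```

The trade-off is that the cumulative slice must be rebuilt whenever keys or weights change, so this only pays off when selections vastly outnumber configuration updates.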
---
## Stage 3: Plugin Pipeline Processing
### **Pre-Processing Hooks**
```mermaid
sequenceDiagram
participant Request
participant AuthPlugin
participant RateLimitPlugin
participant TransformPlugin
participant Core
Request->>AuthPlugin: ProcessRequest()
AuthPlugin->>AuthPlugin: Validate API Key
AuthPlugin->>RateLimitPlugin: Authorized Request
RateLimitPlugin->>RateLimitPlugin: Check Rate Limits
RateLimitPlugin->>TransformPlugin: Allowed Request
TransformPlugin->>TransformPlugin: Modify Request
TransformPlugin->>Core: Final Request
```
**Plugin Execution Model:**
```go
type PluginManager struct {
	plugins []Plugin
}

func (pm *PluginManager) ExecutePreHooks(
	ctx BifrostContext,
	req *BifrostRequest,
) (*BifrostRequest, *BifrostError) {
	for _, plugin := range pm.plugins {
		modifiedReq, err := plugin.ProcessRequest(ctx, req)
		if err != nil {
			return nil, err
		}
		req = modifiedReq
	}
	return req, nil
}
```
**Plugin Types & Performance:**
| Plugin Type | Processing Time | Memory Impact | Failure Mode |
| --------------------- | --------------- | ------------- | ---------------------- |
| **Authentication** | ~1-5μs | Minimal | Reject request |
| **Rate Limiting** | ~500ns | Cache-based | Throttle/reject |
| **Request Transform** | ~2-10μs | Copy-on-write | Continue with original |
| **Monitoring** | ~200ns | Append-only | Continue silently |
---
## Stage 4: MCP Tool Discovery & Integration
### **Tool Discovery Process**
```mermaid
flowchart TD
Request[Request with Model] --> MCPCheck{MCP Enabled?}
MCPCheck -->|No| SkipMCP[Skip MCP Processing]
MCPCheck -->|Yes| ClientLookup[MCP Client Lookup]
ClientLookup --> ToolFilter[Tool Filtering]
ToolFilter --> ToolInject[Inject Tools into Request]
ToolFilter --> IncludeCheck{Include Filter?}
ToolFilter --> ExcludeCheck{Exclude Filter?}
IncludeCheck -->|Yes| IncludeTools[Include Specified Tools]
IncludeCheck -->|No| AllTools[Include All Tools]
ExcludeCheck -->|Yes| RemoveTools[Remove Excluded Tools]
ExcludeCheck -->|No| KeepFiltered[Keep Filtered Tools]
IncludeTools --> ToolInject
AllTools --> ToolInject
RemoveTools --> ToolInject
KeepFiltered --> ToolInject
ToolInject --> EnhancedRequest[Request with Tools]
SkipMCP --> EnhancedRequest
```
**Tool Integration Algorithm:**
```go
func (mcpm *MCPManager) EnhanceRequest(
	ctx BifrostContext,
	req *BifrostChatRequest,
) (*BifrostChatRequest, error) {
	// Extract tool filtering from context
	includeClients := ctx.GetStringSlice("mcp-include-clients")
	includeTools := ctx.GetStringSlice("mcp-include-tools")

	// Get available tools
	availableTools := mcpm.getAvailableTools(includeClients)

	// Filter tools
	filteredTools := mcpm.filterTools(availableTools, includeTools)

	// Inject into request, creating the parameter block if it is missing
	if req.Params == nil {
		req.Params = &ChatParameters{}
	}
	req.Params.Tools = append(req.Params.Tools, filteredTools...)
	return req, nil
}
```
**MCP Performance Impact:**
- **Tool Discovery:** ~100-500μs (cached after first request)
- **Tool Filtering:** ~50-200ns per tool
- **Request Enhancement:** ~1-5μs depending on tool count
---
## Stage 5: Memory Pool Management
### **Object Pool Lifecycle**
```mermaid
stateDiagram-v2
[*] --> PoolInit: System Startup
PoolInit --> Available: Objects Pre-allocated
Available --> Acquired: Request Processing
Acquired --> InUse: Object Populated
InUse --> Processing: Worker Processing
Processing --> Completed: Processing Done
Completed --> Reset: Object Cleanup
Reset --> Available: Return to Pool
Available --> Expansion: Pool Exhaustion
Expansion --> Available: New Objects Created
Reset --> GC: Pool Full
GC --> [*]: Garbage Collection
```
**Memory Pool Implementation:**
```go
type MemoryPools struct {
	channelPool  sync.Pool
	messagePool  sync.Pool
	responsePool sync.Pool
	bufferPool   sync.Pool
}

func (mp *MemoryPools) GetChannel() *ProcessingChannel {
	if ch := mp.channelPool.Get(); ch != nil {
		return ch.(*ProcessingChannel)
	}
	return NewProcessingChannel()
}

func (mp *MemoryPools) ReturnChannel(ch *ProcessingChannel) {
	ch.Reset() // Clear previous data
	mp.channelPool.Put(ch)
}
```
---
## Stage 6: Worker Pool Processing
### **Worker Assignment & Execution**
```mermaid
sequenceDiagram
participant Queue
participant WorkerPool
participant Worker
participant Provider
participant Circuit
Queue->>WorkerPool: Enqueue Request
WorkerPool->>Worker: Assign Available Worker
Worker->>Circuit: Check Circuit Breaker
Circuit->>Provider: Forward Request
Provider-->>Circuit: Response/Error
Circuit->>Circuit: Update Health Metrics
Circuit-->>Worker: Provider Response
Worker-->>WorkerPool: Release Worker
WorkerPool-->>Queue: Request Completed
```
**Worker Pool Architecture:**
```go
type ProviderWorkerPool struct {
	workers chan *Worker
	queue   chan *ProcessingJob
	config  WorkerPoolConfig
	metrics *PoolMetrics
}

func (pwp *ProviderWorkerPool) ProcessRequest(job *ProcessingJob) {
	// Get worker from pool (blocks until one is free)
	worker := <-pwp.workers
	go func() {
		defer func() {
			// Return worker to pool
			pwp.workers <- worker
		}()
		// Process request
		result := worker.Execute(job)
		job.ResultChan <- result
	}()
}
```
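For comparison, a self-contained variant of the same idea uses a fixed set of worker goroutines draining a bounded job channel, with a non-blocking submit that signals backpressure when the queue is full. This is a sketch under assumptions: `Job`, `startPool`, and `trySubmit` are hypothetical names and the work is reduced to integer arithmetic.

```go
package main

import (
	"fmt"
	"sync"
)

// Job carries an input and a channel on which the worker sends the result.
type Job struct {
	Input  int
	Result chan int // should be buffered so workers never block on slow readers
}

// startPool launches n workers that drain jobs until the channel is closed.
func startPool(n int, jobs <-chan Job, handle func(int) int) *sync.WaitGroup {
	var wg sync.WaitGroup
	for w := 0; w < n; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range jobs {
				job.Result <- handle(job.Input)
			}
		}()
	}
	return &wg
}

// trySubmit enqueues without blocking; false means the queue is full and the
// caller should shed load or retry later.
func trySubmit(jobs chan<- Job, job Job) bool {
	select {
	case jobs <- job:
		return true
	default:
		return false
	}
}

func main() {
	jobs := make(chan Job, 1) // tiny buffer to make backpressure visible
	first := Job{Input: 21, Result: make(chan int, 1)}

	fmt.Println(trySubmit(jobs, first))                                    // queue has room
	fmt.Println(trySubmit(jobs, Job{Input: 1, Result: make(chan int, 1)})) // queue is full

	wg := startPool(2, jobs, func(n int) int { return n * 2 })
	fmt.Println(<-first.Result)

	close(jobs) // no more work: workers exit their range loops
	wg.Wait()
}
```

The buffer size of the job channel is the knob that trades queueing latency against rejected requests under load.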
---
## Stage 7: Provider API Communication
### **HTTP Request Execution**
```mermaid
sequenceDiagram
participant Worker
participant HTTPClient
participant Provider
participant CircuitBreaker
participant Metrics
Worker->>HTTPClient: PrepareRequest()
HTTPClient->>HTTPClient: Add Headers & Auth
HTTPClient->>CircuitBreaker: CheckHealth()
CircuitBreaker->>Provider: HTTP Request
Provider-->>CircuitBreaker: HTTP Response
CircuitBreaker->>Metrics: Record Metrics
CircuitBreaker-->>HTTPClient: Response/Error
HTTPClient-->>Worker: Parsed Response
```
**Request Preparation Pipeline:**
```go
func (w *ProviderWorker) ExecuteRequest(job *ProcessingJob) *ProviderResponse {
	// Prepare HTTP request
	httpReq := w.prepareHTTPRequest(job.Request)

	// Add authentication
	w.addAuthentication(httpReq, job.APIKey)

	// Execute with timeout
	ctx, cancel := context.WithTimeout(context.Background(), job.Timeout)
	defer cancel()

	httpResp, err := w.httpClient.Do(httpReq.WithContext(ctx))
	if err != nil {
		return w.handleError(err, job)
	}

	// Parse response
	return w.parseResponse(httpResp, job)
}
```
---
## Stage 8: Tool Execution & Response Processing
### **MCP Tool Execution Flow**
```mermaid
sequenceDiagram
participant Provider
participant MCPProcessor
participant MCPServer
participant ToolExecutor
participant ResponseBuilder
Provider->>MCPProcessor: Response with Tool Calls
MCPProcessor->>MCPProcessor: Extract Tool Calls
loop For each tool call
MCPProcessor->>MCPServer: Execute Tool
MCPServer->>ToolExecutor: Tool Invocation
ToolExecutor-->>MCPServer: Tool Result
MCPServer-->>MCPProcessor: Tool Response
end
MCPProcessor->>ResponseBuilder: Combine Results
ResponseBuilder-->>Provider: Enhanced Response
```
**Tool Execution Pipeline:**
```go
func (mcp *MCPProcessor) ProcessToolCalls(
	response *ProviderResponse,
) (*ProviderResponse, error) {
	toolCalls := mcp.extractToolCalls(response)
	if len(toolCalls) == 0 {
		return response, nil
	}

	// Execute tools concurrently
	results := make(chan ToolResult, len(toolCalls))
	for _, toolCall := range toolCalls {
		go func(tc ToolCall) {
			result := mcp.executeTool(tc)
			results <- result
		}(toolCall)
	}

	// Collect results (in completion order)
	toolResults := make([]ToolResult, 0, len(toolCalls))
	for i := 0; i < len(toolCalls); i++ {
		toolResults = append(toolResults, <-results)
	}

	// Enhance response
	return mcp.enhanceResponse(response, toolResults), nil
}
```
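The channel-based collection above gathers results in completion order, which may differ from the order of the tool calls. If the caller needs results aligned with the calls, one option is to write each result into a pre-sized slice at the call's index. The sketch below shows that variant with hypothetical `ToolCall`, `ToolResult`, and `executeAll` definitions.

```go
package main

import (
	"fmt"
	"sync"
)

// ToolCall and ToolResult are simplified stand-ins for this sketch.
type ToolCall struct{ Name string }
type ToolResult struct{ Output string }

// executeAll runs every call concurrently and stores each result at the index
// of its call, so the output order always matches the input order.
func executeAll(calls []ToolCall, exec func(ToolCall) ToolResult) []ToolResult {
	results := make([]ToolResult, len(calls))
	var wg sync.WaitGroup
	for i, call := range calls {
		wg.Add(1)
		go func(i int, call ToolCall) {
			defer wg.Done()
			results[i] = exec(call) // each goroutine writes a distinct element
		}(i, call)
	}
	wg.Wait() // all writes happen before this returns
	return results
}

func main() {
	calls := []ToolCall{{Name: "search"}, {Name: "fetch"}, {Name: "summarize"}}
	results := executeAll(calls, func(c ToolCall) ToolResult {
		return ToolResult{Output: "ran " + c.Name}
	})
	for i, r := range results {
		fmt.Println(calls[i].Name, "->", r.Output)
	}
}
```

Writing to distinct slice elements needs no mutex, and `wg.Wait()` provides the synchronization that makes the results safe to read afterwards.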
---
## Stage 9: Post-Processing & Response Formation
### **Plugin Post-Processing**
```mermaid
sequenceDiagram
participant CoreResponse
participant LoggingPlugin
participant CachePlugin
participant MetricsPlugin
participant Transport
CoreResponse->>LoggingPlugin: ProcessResponse()
LoggingPlugin->>LoggingPlugin: Log Request/Response
LoggingPlugin->>CachePlugin: Response + Logs
CachePlugin->>CachePlugin: Cache Response
CachePlugin->>MetricsPlugin: Cached Response
MetricsPlugin->>MetricsPlugin: Record Metrics
MetricsPlugin->>Transport: Final Response
```
**Response Enhancement Pipeline:**
```go
func (pm *PluginManager) ExecutePostHooks(
	ctx BifrostContext,
	req *BifrostRequest,
	resp *BifrostResponse,
) (*BifrostResponse, error) {
	// Post hooks run in reverse registration order, mirroring the pre hooks
	for i := len(pm.plugins) - 1; i >= 0; i-- {
		plugin := pm.plugins[i]
		enhancedResp, err := plugin.ProcessResponse(ctx, req, resp)
		if err != nil {
			// Log error but continue processing with the previous response
			pm.logger.Warn("Plugin post-processing error", "plugin", plugin.Name(), "error", err)
			continue
		}
		resp = enhancedResp
	}
	return resp, nil
}
```
### **Response Serialization**
```mermaid
flowchart TD
Response[BifrostResponse] --> Format{Response Format}
Format -->|HTTP| JSONSerialize[JSON Serialization]
Format -->|SDK| DirectReturn[Direct Go Struct]
JSONSerialize --> Compress[Compression]
DirectReturn --> TypeCheck[Type Validation]
Compress --> Headers[Set Headers]
TypeCheck --> Return[Return Response]
Headers --> HTTPResponse[HTTP Response]
HTTPResponse --> Client[Client Response]
Return --> Client
```
---
## Related Architecture Documentation
- **[Concurrency Model](./concurrency)** - Worker pools and threading details
- **[Plugin System](./plugins)** - Plugin execution and lifecycle
- **[MCP System](./mcp)** - Tool discovery and execution internals
- **[Benchmarks](../../benchmarking/getting-started)** - Detailed performance analysis

@@ -0,0 +1,161 @@
---
title: "Config Store"
description: "A persistent and flexible configuration management system for Bifrost, supporting multiple database backends."
icon: "gear"
---
The ConfigStore is a critical component of the Bifrost framework, providing a centralized and persistent storage solution for all gateway configurations. It abstracts the underlying database, offering a unified API for managing everything from provider settings and virtual keys to governance policies and plugin configurations.
## Core Features
- **Unified Configuration API**: A single interface (`ConfigStore`) for all configuration CRUD (Create, Read, Update, Delete) operations.
- **Multiple Backend Support**: Out-of-the-box support for SQLite and PostgreSQL, with an extensible architecture for adding new database backends.
- **Comprehensive Data Management**: Manages a wide range of configuration data, including:
- Provider and key settings
- Virtual keys and governance rules (budgets, rate limits)
- Customer and team information for multi-tenancy
- Plugin configurations
- Vector store and log store settings
- Model pricing information
- **Transactional Operations**: Ensures data consistency by supporting atomic transactions for complex configuration changes.
- **Database Migrations**: Integrated migration system to manage schema evolution across different versions of Bifrost.
- **Environment Variable Handling**: Securely manages sensitive data like API keys by storing references to environment variables instead of raw values.
## Architecture
The ConfigStore is designed around the `ConfigStore` interface, which defines all the methods for interacting with the configuration data. The primary implementation is `RDBConfigStore`, which uses [GORM](https://gorm.io/) as an ORM to communicate with relational databases.
### Supported Backends
- **SQLite**: The default, file-based database, perfect for local development, testing, and single-node deployments. It requires no external services.
- **PostgreSQL**: A robust, production-grade database suitable for large-scale, high-availability deployments.
The backend is selected and configured in Bifrost's main configuration file.
### Initialization
The ConfigStore is initialized at startup based on the provided configuration.
```go
import (
	"context"

	"github.com/maximhq/bifrost/core/schemas"
	"github.com/maximhq/bifrost/framework/configstore"
)

// Example: Initialize a SQLite-based ConfigStore
config := &configstore.Config{
	Enabled: true,
	Type:    configstore.ConfigStoreTypeSQLite,
	Config: &configstore.SQLiteConfig{
		File: "/path/to/config.db",
	},
}

var logger schemas.Logger // Assume logger is initialized
store, err := configstore.NewConfigStore(context.Background(), config, logger)
if err != nil {
	// Handle error
}
```
Here is an example for initializing a PostgreSQL-based `ConfigStore`:
```go
// Example: Initialize a PostgreSQL-based ConfigStore
pgConfig := &configstore.Config{
	Enabled: true,
	Type:    configstore.ConfigStoreTypePostgres,
	Config: &configstore.PostgresConfig{
		Host:         "localhost",
		Port:         "5432",
		User:         "postgres",
		Password:     "secret",
		DBName:       "bifrost",
		SSLMode:      "disable",
		MaxIdleConns: 5,  // Optional: Maximum idle connections (default: 5)
		MaxOpenConns: 50, // Optional: Maximum open connections (default: 50)
	},
}

store, err = configstore.NewConfigStore(context.Background(), pgConfig, logger)
if err != nil {
	// Handle error
}
```
<Note>
PostgreSQL databases used by Bifrost stores must be UTF8 encoded. See [PostgreSQL UTF8 Requirement](../../quickstart/gateway/setting-up#postgresql-utf8-requirement).
</Note>
### Connection Pool Configuration
For PostgreSQL backends, you can configure the database connection pool to optimize performance based on your workload:
- **MaxIdleConns**: Maximum number of idle connections in the pool (default: 5)
- **MaxOpenConns**: Maximum number of open connections to the database (default: 50)
These parameters help manage database connection resources effectively. Increase them for high-traffic deployments or decrease them for resource-constrained environments.
## Data Models
The ConfigStore manages a variety of data models, which are defined as GORM tables in the `framework/configstore/tables` directory. Some of the key models include:
- `TableVirtualKey`: Represents a virtual key with its associated governance rules, keys, and metadata.
- `TableProvider` & `TableKey`: Store provider-specific configurations and the physical API keys.
- `TableBudget` & `TableRateLimit`: Define spending limits and request rate limits for governance.
- `TableCustomer` & `TableTeam`: Enable multi-tenant configurations.
- `TableModelPricing`: Caches model pricing information for cost calculation.
- `TablePlugin`: Stores configuration for loaded plugins.
## Usage
The `ConfigStore` interface provides a rich set of methods for managing Bifrost's configuration.
### Managing Virtual Keys
```go
// Create a new virtual key
newKey := &tables.TableVirtualKey{
	ID:   "vk-12345",
	Name: "My Test Key",
	// ... other fields
}
err := store.CreateVirtualKey(ctx, newKey)

// Retrieve a virtual key
virtualKey, err := store.GetVirtualKey(ctx, "vk-12345")
```
### Managing Providers
```go
// Get all provider configurations
providers, err := store.GetProvidersConfig(ctx)
// Update a specific provider
providerConfig := providers[schemas.OpenAI]
providerConfig.NetworkConfig.TimeoutSeconds = 120
err = store.UpdateProvider(ctx, schemas.OpenAI, providerConfig, envKeys)
```
### Executing Transactions
For operations that require multiple database writes, you can use a transaction to ensure atomicity.
```go
err := store.ExecuteTransaction(ctx, func(tx *gorm.DB) error {
	// Perform multiple operations within this transaction
	if err := store.CreateBudget(ctx, budget1, tx); err != nil {
		return err // Rollback
	}
	if err := store.UpdateRateLimit(ctx, limit1, tx); err != nil {
		return err // Rollback
	}
	return nil // Commit
})
```
## Migrations
The ConfigStore includes a migration system to handle database schema changes between Bifrost versions. Migrations are automatically applied at startup, ensuring the database schema is always up-to-date. This process is managed by the `migrator` package and is transparent to the user.
The ConfigStore provides the backbone for Bifrost's dynamic configuration capabilities. Its support for multiple backends and transactional operations makes it suitable for both small-scale deployments and large-scale production environments.

@@ -0,0 +1,176 @@
---
title: "Log Store"
description: "A robust and queryable system for persisting API request and response logs, with support for multiple database backends."
icon: "clipboard-list"
---
The LogStore is a core component of the Bifrost framework responsible for capturing, storing, and retrieving detailed logs of API requests and responses. It provides a persistent, queryable audit trail of all activity passing through the gateway, which is essential for debugging, monitoring, analytics, and compliance.
## Core Features
- **Persistent Logging**: Automatically saves detailed information about each API request, including input, output, status, latency, and cost.
- **Multiple Backend Support**: Comes with built-in support for SQLite and PostgreSQL, allowing you to choose the best storage solution for your deployment needs.
- **Rich Querying and Filtering**: A powerful search API allows you to filter and sort logs based on a wide range of criteria such as provider, model, status, latency, cost, and content.
- **Performance Analytics**: The search functionality also provides aggregated statistics, including total requests, success rate, average latency, total tokens, and total cost for the queried data.
- **Structured Data Model**: Logs are stored in a structured format, with complex objects like message history and tool calls serialized as JSON for efficient storage and retrieval.
- **Automatic Data Management**: Includes GORM hooks to automatically handle JSON serialization/deserialization and to build a searchable content summary.
## Architecture
The LogStore is built around the `LogStore` interface, which defines the standard methods for interacting with the log database. The primary implementation, `RDBLogStore`, uses GORM to provide an abstraction over relational databases.
### Supported Backends
- **SQLite**: The default, file-based database, ideal for local development and smaller, single-node deployments.
- **PostgreSQL**: A production-ready database for scalable and high-availability deployments.
The backend is configured in Bifrost's main configuration file.
### Initialization
The LogStore is initialized at startup based on the provided configuration.
```go
import (
	"context"

	"github.com/maximhq/bifrost/core/schemas"
	"github.com/maximhq/bifrost/framework/logstore"
)

// Example: Initialize a SQLite-based LogStore
config := &logstore.Config{
	Enabled: true,
	Type:    logstore.LogStoreTypeSQLite,
	Config: &logstore.SQLiteConfig{
		File: "/path/to/logs.db",
	},
}

var logger schemas.Logger // Assume logger is initialized
store, err := logstore.NewLogStore(context.Background(), config, logger)
if err != nil {
	// Handle error
}
```
Here is an example for initializing a PostgreSQL-based `LogStore`:
```go
// Example: Initialize a PostgreSQL-based LogStore
pgConfig := &logstore.Config{
	Enabled: true,
	Type:    logstore.LogStoreTypePostgres,
	Config: &logstore.PostgresConfig{
		Host:         "localhost",
		Port:         "5432",
		User:         "postgres",
		Password:     "secret",
		DBName:       "bifrost_logs",
		SSLMode:      "disable",
		MaxIdleConns: 5,  // Optional: Maximum idle connections (default: 5)
		MaxOpenConns: 50, // Optional: Maximum open connections (default: 50)
	},
}

store, err = logstore.NewLogStore(context.Background(), pgConfig, logger)
if err != nil {
	// Handle error
}
```
<Note>
PostgreSQL databases used by Bifrost stores must be UTF8 encoded. See [PostgreSQL UTF8 Requirement](../../quickstart/gateway/setting-up#postgresql-utf8-requirement).
</Note>
### Connection Pool Configuration
For PostgreSQL backends, you can configure the database connection pool to optimize performance based on your workload:
- **MaxIdleConns**: Maximum number of idle connections in the pool (default: 5)
- **MaxOpenConns**: Maximum number of open connections to the database (default: 50)
These parameters help manage database connection resources effectively. Increase them for high-traffic deployments or decrease them for resource-constrained environments.
## Data Model
The core of the LogStore is the `Log` struct, which represents a single log entry in the `logs` table.
```go
// Log represents a complete log entry for a request/response cycle
type Log struct {
	ID        string    `gorm:"primaryKey;type:varchar(255)"`
	Timestamp time.Time `gorm:"index;not null"`
	Object    string    `gorm:"type:varchar(255);index;not null;column:object_type"`
	Provider  string    `gorm:"type:varchar(255);index;not null"`
	Model     string    `gorm:"type:varchar(255);index;not null"`
	Latency   *float64
	Cost      *float64 `gorm:"index"`
	Status    string   `gorm:"type:varchar(50);index;not null"` // "processing", "success", or "error"
	Stream    bool     `gorm:"default:false"`

	// Denormalized token fields for easier querying
	PromptTokens     int `gorm:"default:0"`
	CompletionTokens int `gorm:"default:0"`
	TotalTokens      int `gorm:"default:0"`

	// JSON serialized fields
	InputHistory  string `gorm:"type:text"`
	OutputMessage string `gorm:"type:text"`
	TokenUsage    string `gorm:"type:text"`
	ErrorDetails  string `gorm:"type:text"`

	// ... and many more for different data types
}
```
Complex data like message arrays and tool calls are serialized into JSON strings for storage and are automatically deserialized back into their struct forms when retrieved.
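The serialize-on-write, deserialize-on-read pattern described above can be sketched with `encoding/json` alone. In a GORM model this logic would typically live in hooks such as `BeforeCreate` and `AfterFind`; the sketch calls plain methods instead to stay dependency-free, and `LogRow`, `Message`, `serialize`, and `deserialize` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message is a simplified chat message used only in this sketch.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// LogRow mimics a row with a JSON text column plus a parsed, in-memory view.
type LogRow struct {
	InputHistory       string    // stored column (JSON text)
	InputHistoryParsed []Message // working form, not stored
}

// serialize fills the text column from the parsed form before a write.
func (l *LogRow) serialize() error {
	b, err := json.Marshal(l.InputHistoryParsed)
	if err != nil {
		return err
	}
	l.InputHistory = string(b)
	return nil
}

// deserialize rebuilds the parsed form from the text column after a read.
func (l *LogRow) deserialize() error {
	if l.InputHistory == "" {
		return nil // nothing stored
	}
	return json.Unmarshal([]byte(l.InputHistory), &l.InputHistoryParsed)
}

func main() {
	row := LogRow{InputHistoryParsed: []Message{{Role: "user", Content: "hi"}}}
	if err := row.serialize(); err != nil {
		fmt.Println("serialize failed:", err)
		return
	}
	fmt.Println(row.InputHistory)

	loaded := LogRow{InputHistory: row.InputHistory}
	if err := loaded.deserialize(); err != nil {
		fmt.Println("deserialize failed:", err)
		return
	}
	fmt.Println(loaded.InputHistoryParsed[0].Content)
}
```

Keeping the denormalized scalar columns (tokens, cost, status) alongside the JSON text is what allows filtering and sorting without parsing the JSON in SQL.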
## Usage
### Creating Log Entries
A log entry is created by populating a `Log` struct and passing it to the `Create` method. This is typically handled internally by Bifrost's logging plugins.
```go
logEntry := &logstore.Log{
	ID:        "req-xyz123",
	Timestamp: time.Now(),
	Provider:  "openai",
	Model:     "gpt-4",
	Status:    "success",
	// ... other fields
}

err := store.Create(ctx, logEntry)
```
### Searching and Filtering Logs
The `SearchLogs` method provides a powerful way to query logs with fine-grained filters and pagination.
```go
// Define search criteria
filters := logstore.SearchFilters{
Providers: []string{"openai", "anthropic"},
Status: []string{"error"},
StartTime: &startTime, // time.Time pointer
}
pagination := logstore.PaginationOptions{
Limit: 50,
Offset: 0,
SortBy: "timestamp",
Order: "desc",
}
// Execute the search
results, err := store.SearchLogs(ctx, filters, pagination)
if err != nil {
// Handle error
}
// Process the results
for _, log := range results.Logs {
fmt.Printf("Found log: %s\n", log.ID)
}
// Access aggregated stats
fmt.Printf("Total requests: %d\n", results.Stats.TotalRequests)
```
The LogStore is an indispensable tool for observability in Bifrost, providing the detailed audit trail needed to monitor, debug, and analyze AI application performance and behavior effectively.


@@ -0,0 +1,412 @@
---
title: "Model Catalog"
description: "A centralized system for managing model information, pricing, and capabilities across all supported AI providers."
icon: "book-open"
---
The Model Catalog is a foundational component of Bifrost that provides a unified interface for managing AI models, including their pricing, capabilities, and availability. It serves as a centralized repository for all model-related information, enabling dynamic cost calculation, intelligent model routing, and efficient resource management.
<Info>
**Related Documentation**: The Model Catalog powers Bifrost's intelligent routing system. See [Provider Routing](/providers/provider-routing) for detailed examples of how governance and load balancing use the catalog to make routing decisions, including cross-provider scenarios and weighted routing via proxy providers.
</Info>
## Core Features
### **1. Automatic Pricing Synchronization**
The Model Catalog manages pricing data through a two-phase approach:
**Startup Behavior:**
- **With ConfigStore**: Downloads the pricing sheet from the Maxim-hosted datasheet URL, persists it to the config store, and then loads it into memory for fast lookups.
- **Without ConfigStore**: Downloads the pricing sheet directly into memory on every startup.
**Ongoing Synchronization:**
- When ConfigStore is available, an automatic sync occurs every 24 hours to keep pricing data current.
- All pricing data is cached in memory for O(1) lookup performance during cost calculations.
With ConfigStore, this keeps cost calculations within one sync interval (24 hours by default) of the published pricing sheet; without it, pricing is as fresh as the last startup. In both cases lookups are served from memory.
### **2. Multi-Modal Cost Calculation**
It supports diverse pricing models across different AI operation types:
- **Text Operations**: Token-based pricing for chat completions, text completions, responses, and embeddings. Cache-read/cache-write pricing applies to chat/text/responses when providers surface prompt cache token details.
- **Audio Processing**: Character-based, token-based, and duration-based pricing for speech synthesis and transcription, with audio token detail breakdown. Speech responses populate `usage.input_chars` so speech can be billed by input characters in addition to tokens/duration.
- **Image Processing**: Per-image (`input_cost_per_image`/`output_cost_per_image`), per-pixel (`input_cost_per_pixel`/`output_cost_per_pixel`), or token-based pricing with text/image token breakdown.
- **Video Processing**: Token-based or duration-based pricing. Input can use prompt tokens or `input_cost_per_video_per_second`; output can use completion tokens or fall back to `output_cost_per_video_per_second` / `output_cost_per_second`.
- **Reranking**: Input/output token pricing with search query cost support.
- **Prompt Caching**: Separate rates for cache-read tokens (`cached_read_tokens`) and cache-creation tokens (`cached_write_tokens`), both surfaced under `prompt_tokens_details` (see [Prompt Cache Cost Calculation](#prompt-cache-cost-calculation)).
### **3. Model Information Management**
The Model Catalog maintains a pool of available models for each provider, populated from both pricing data and provider list models APIs. This enables:
- **Model Discovery**: Listing all available models for a given provider
- **Provider Discovery**: Finding all providers that support a specific model with intelligent cross-provider resolution (OpenRouter, Vertex, Groq, Bedrock)
- **Model Validation**: Checking if a model is allowed for a provider based on allowed models lists (supports provider-prefixed entries)
### **4. Intelligent Cache Cost Handling**
It integrates with semantic caching to provide accurate cost calculations:
- **Cache Hits**: Zero cost for direct cache hits, and embedding cost only for semantic matches.
- **Cache Misses**: Combined cost of the base model usage plus the embedding generation cost for cache storage.
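The three outcomes above reduce to a small decision table. The following sketch uses hypothetical names (`cacheOutcome`, `requestCost`) and takes the model and embedding costs as already computed inputs; it illustrates the rule, not the catalog's actual code.

```go
package main

import "fmt"

// cacheOutcome is a hypothetical label for how the semantic cache handled a request.
type cacheOutcome int

const (
	cacheMiss   cacheOutcome = iota // no cached answer; the model was called
	directHit                       // exact match; nothing was computed
	semanticHit                     // matched by embedding similarity
)

// requestCost applies the rules listed above:
//   - direct hit: zero cost
//   - semantic hit: only the embedding lookup is billed
//   - miss: model usage plus the embedding generated for cache storage
func requestCost(outcome cacheOutcome, modelCost, embeddingCost float64) float64 {
	switch outcome {
	case directHit:
		return 0
	case semanticHit:
		return embeddingCost
	default:
		return modelCost + embeddingCost
	}
}

func main() {
	fmt.Println(requestCost(directHit, 0.5, 0.25))
	fmt.Println(requestCost(semanticHit, 0.5, 0.25))
	fmt.Println(requestCost(cacheMiss, 0.5, 0.25))
}
```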
### **5. Tiered Pricing Support**
The system automatically applies different pricing rates for high-token contexts, reflecting real provider pricing models. Two tiers are supported: above 128k tokens and above 200k tokens, with the higher tier taking precedence when both are configured.
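A minimal sketch of the tier selection follows. It assumes that the tier is chosen from the request's total token count, that "128k" and "200k" mean 128,000 and 200,000, and that the selected rate applies to every token in the request; none of these details are confirmed by this document. `tieredRate`, `rateFor`, and `ptr` are hypothetical names.

```go
package main

import "fmt"

// tieredRate is a hypothetical per-token price with optional high-context tiers.
// A nil tier means that tier is not configured for the model.
type tieredRate struct {
	Base      float64
	Above128k *float64
	Above200k *float64
}

// rateFor picks the per-token rate for a request of the given size.
// The 200k tier is checked first so it wins when both tiers are configured.
func (r tieredRate) rateFor(totalTokens int) float64 {
	if r.Above200k != nil && totalTokens > 200_000 {
		return *r.Above200k
	}
	if r.Above128k != nil && totalTokens > 128_000 {
		return *r.Above128k
	}
	return r.Base
}

// ptr is a small helper for building optional values.
func ptr(v float64) *float64 { return &v }

func main() {
	rate := tieredRate{Base: 1, Above128k: ptr(2), Above200k: ptr(4)}
	for _, tokens := range []int{50_000, 150_000, 250_000} {
		fmt.Printf("%d tokens -> rate %.0f\n", tokens, rate.rateFor(tokens))
	}
}
```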
## Configuration
The `ModelCatalog` can be configured during initialization by passing a `Config` struct.
```go
type Config struct {
PricingURL *string `json:"pricing_url,omitempty"`
PricingSyncInterval *time.Duration `json:"pricing_sync_interval,omitempty"`
}
```
- **`PricingURL`**: Overrides the default URL (`https://getbifrost.ai/datasheet`) for downloading the pricing sheet.
- **`PricingSyncInterval`**: Customizes the interval for periodic pricing data synchronization. The default is 24 hours.
This configuration is passed during the initialization of the `ModelCatalog`:
```go
// PricingURL is a *string, so take the address of a variable
pricingURL := "https://my-custom-url.com/pricing.json"
config := &modelcatalog.Config{
PricingURL: &pricingURL,
}
modelCatalog, err := modelcatalog.Init(context.Background(), config, configStore, logger)
```
## Architecture
### ModelCatalog
The `ModelCatalog` is the central component that handles all model and pricing operations:
```go
type ModelCatalog struct {
configStore configstore.ConfigStore
logger schemas.Logger
pricingURL string
pricingSyncInterval time.Duration
// In-memory cache for fast access
pricingData map[string]configstoreTables.TableModelPricing
mu sync.RWMutex
modelPool map[schemas.ModelProvider][]string
// Background sync worker
syncTicker *time.Ticker
done chan struct{}
wg sync.WaitGroup
syncCtx context.Context
syncCancel context.CancelFunc
}
```
### Pricing Data Structure
Each model's pricing information includes comprehensive cost metrics, supporting various modalities and tiered pricing:
```go
// PricingEntry represents a single model's pricing information.
// The fields below are an excerpt — see framework/modelcatalog/main.go for the full definition.
type PricingEntry struct {
BaseModel string `json:"base_model,omitempty"`
Provider string `json:"provider"`
Mode string `json:"mode"`
// Costs - Text
InputCostPerToken float64 `json:"input_cost_per_token"`
OutputCostPerToken float64 `json:"output_cost_per_token"`
InputCostPerTokenBatches *float64 `json:"input_cost_per_token_batches,omitempty"`
OutputCostPerTokenBatches *float64 `json:"output_cost_per_token_batches,omitempty"`
InputCostPerTokenPriority *float64 `json:"input_cost_per_token_priority,omitempty"`
OutputCostPerTokenPriority *float64 `json:"output_cost_per_token_priority,omitempty"`
InputCostPerTokenAbove200kTokens *float64 `json:"input_cost_per_token_above_200k_tokens,omitempty"`
OutputCostPerTokenAbove200kTokens *float64 `json:"output_cost_per_token_above_200k_tokens,omitempty"`
// Costs - Cache
CacheCreationInputTokenCost *float64 `json:"cache_creation_input_token_cost,omitempty"`
CacheReadInputTokenCost *float64 `json:"cache_read_input_token_cost,omitempty"`
CacheCreationInputTokenCostAbove200kTokens *float64 `json:"cache_creation_input_token_cost_above_200k_tokens,omitempty"`
CacheReadInputTokenCostAbove200kTokens *float64 `json:"cache_read_input_token_cost_above_200k_tokens,omitempty"`
CacheCreationInputTokenCostAbove1hr *float64 `json:"cache_creation_input_token_cost_above_1hr,omitempty"`
CacheCreationInputTokenCostAbove1hrAbove200kTokens *float64 `json:"cache_creation_input_token_cost_above_1hr_above_200k_tokens,omitempty"`
CacheCreationInputAudioTokenCost *float64 `json:"cache_creation_input_audio_token_cost,omitempty"`
CacheReadInputTokenCostPriority *float64 `json:"cache_read_input_token_cost_priority,omitempty"`
// Costs - Image
InputCostPerImage *float64 `json:"input_cost_per_image,omitempty"`
InputCostPerPixel *float64 `json:"input_cost_per_pixel,omitempty"`
OutputCostPerImage *float64 `json:"output_cost_per_image,omitempty"`
OutputCostPerPixel *float64 `json:"output_cost_per_pixel,omitempty"`
OutputCostPerImagePremiumImage *float64 `json:"output_cost_per_image_premium_image,omitempty"`
OutputCostPerImageAbove512x512Pixels *float64 `json:"output_cost_per_image_above_512_and_512_pixels,omitempty"`
OutputCostPerImageAbove512x512PixelsPremium *float64 `json:"output_cost_per_image_above_512_and_512_pixels_and_premium_image,omitempty"`
OutputCostPerImageAbove1024x1024Pixels *float64 `json:"output_cost_per_image_above_1024_and_1024_pixels,omitempty"`
OutputCostPerImageAbove1024x1024PixelsPremium *float64 `json:"output_cost_per_image_above_1024_and_1024_pixels_and_premium_image,omitempty"`
OutputCostPerImageAbove2048x2048Pixels *float64 `json:"output_cost_per_image_above_2048_and_2048_pixels,omitempty"`
OutputCostPerImageAbove4096x4096Pixels *float64 `json:"output_cost_per_image_above_4096_and_4096_pixels,omitempty"`
OutputCostPerImageLowQuality *float64 `json:"output_cost_per_image_low_quality,omitempty"`
OutputCostPerImageMediumQuality *float64 `json:"output_cost_per_image_medium_quality,omitempty"`
OutputCostPerImageHighQuality *float64 `json:"output_cost_per_image_high_quality,omitempty"`
OutputCostPerImageAutoQuality *float64 `json:"output_cost_per_image_auto_quality,omitempty"`
// Costs - Audio/Video
InputCostPerAudioToken *float64 `json:"input_cost_per_audio_token,omitempty"`
InputCostPerAudioPerSecond *float64 `json:"input_cost_per_audio_per_second,omitempty"`
InputCostPerSecond *float64 `json:"input_cost_per_second,omitempty"`
InputCostPerVideoPerSecond *float64 `json:"input_cost_per_video_per_second,omitempty"`
OutputCostPerAudioToken *float64 `json:"output_cost_per_audio_token,omitempty"`
OutputCostPerVideoPerSecond *float64 `json:"output_cost_per_video_per_second,omitempty"`
OutputCostPerSecond *float64 `json:"output_cost_per_second,omitempty"`
// Costs - Other
SearchContextCostPerQuery *float64 `json:"search_context_cost_per_query,omitempty"`
CodeInterpreterCostPerSession *float64 `json:"code_interpreter_cost_per_session,omitempty"`
}
```
## Usage in Plugins
The Model Catalog is designed to be shared across all Bifrost plugins, providing consistent model information and validation logic for governance, load balancing, and other routing mechanisms.
<Note>
**Governance & Load Balancing**: Both plugins delegate model validation to the Model Catalog's `IsModelAllowedForProvider` method, ensuring consistent handling of cross-provider scenarios and provider-prefixed allowed models. See [Provider Routing](/providers/provider-routing) for configuration examples.
</Note>
### Initialization
In Bifrost's gateway, the `ModelCatalog` is initialized once at startup and shared across all plugins:
```go
import "github.com/maximhq/bifrost/framework/modelcatalog"
// Initialize model catalog with config store and logger
modelCatalog, err := modelcatalog.Init(context.Background(), &modelcatalog.Config{}, configStore, logger)
if err != nil {
return fmt.Errorf("failed to initialize model catalog: %w", err)
}
```
### Basic Cost Calculation
Calculate costs from a Bifrost response:
```go
// Calculate cost for a completed request
cost := modelCatalog.CalculateCost(
result, // *schemas.BifrostResponse
nil, // *PricingLookupScopes (nil = no scoped overrides)
)
logger.Info("Request cost: $%.6f", cost)
```
### Unified Cost Calculation
`CalculateCost` is the single entry point for all cost calculations. It handles all request types, semantic cache billing, and tiered pricing automatically:
```go
// CalculateCost handles all cost scenarios including cache-aware pricing
cost := modelCatalog.CalculateCost(result, nil) // *schemas.BifrostResponse, *PricingLookupScopes
// Cache hits return 0 for direct hits, embedding cost for semantic matches
// Cache misses return base model cost + embedding generation cost
// Returns 0.0 if pricing data is not found (logs a debug message)
```
### Model Discovery
The `ModelCatalog` provides several methods to query for model and provider information.
#### Get Models for a Provider
Retrieve a list of all models supported by a specific provider.
```go
openaiModels := modelCatalog.GetModelsForProvider(schemas.OpenAI)
for _, model := range openaiModels {
logger.Info("Found OpenAI model: %s", model)
}
```
**Thread-safe**: Uses a read lock for concurrent access.
#### Get Providers for a Model
Find all providers that offer a specific model, including cross-provider resolution.
```go
gpt4Providers := modelCatalog.GetProvidersForModel("gpt-4o")
for _, provider := range gpt4Providers {
logger.Info("gpt-4o is available from: %s", provider)
}
// Result: [openai, azure, groq] (includes cross-provider mappings)
```
**Cross-Provider Resolution**:
This method implements intelligent cross-provider routing logic to discover all providers that can serve a model:
1. **Direct Match**: Checks each provider's model list in `modelPool` for the exact model name
2. **OpenRouter Format**: For models found in other providers, checks if `provider/model` exists in OpenRouter
- Example: `claude-3-5-sonnet` found in Anthropic → checks OpenRouter for `anthropic/claude-3-5-sonnet`
3. **Vertex Format**: Similar check for Vertex with `provider/model` format
4. **Groq OpenAI Compatibility**: For GPT models, checks if `openai/model` exists in Groq's catalog
5. **Bedrock Claude Models**: For Claude models, flexible matching against Bedrock's full ARN format
**Example**:
```go
providers := modelCatalog.GetProvidersForModel("claude-3-5-sonnet")
// Returns: [anthropic, vertex, bedrock, openrouter]
// Even though request was just "claude-3-5-sonnet" without provider prefix!
```
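The first two resolution steps (direct match, then the `provider/model` form used by proxy providers) can be sketched over a plain map. This is an illustration under stated assumptions: `pool`, `contains`, and `providersFor` are hypothetical names, the data is invented for the example, and the Groq and Bedrock rules are left out.

```go
package main

import (
	"fmt"
	"sort"
)

// pool maps a provider name to the model names it lists (hypothetical data shape).
type pool map[string][]string

// contains reports whether name appears in models.
func contains(models []string, name string) bool {
	for _, m := range models {
		if m == name {
			return true
		}
	}
	return false
}

// providersFor returns providers that list the model directly, plus proxy
// providers that list it as "owner/model" for some directly matching owner.
func providersFor(p pool, model string, proxies []string) []string {
	found := map[string]bool{}
	// Step 1: direct match against each provider's model list.
	for provider, models := range p {
		if contains(models, model) {
			found[provider] = true
		}
	}
	// Snapshot the direct matches so proxies are only checked against them.
	direct := make([]string, 0, len(found))
	for provider := range found {
		direct = append(direct, provider)
	}
	// Step 2: look for "owner/model" in each proxy provider's list.
	for _, proxy := range proxies {
		for _, owner := range direct {
			if contains(p[proxy], owner+"/"+model) {
				found[proxy] = true
			}
		}
	}
	out := make([]string, 0, len(found))
	for provider := range found {
		out = append(out, provider)
	}
	sort.Strings(out) // deterministic order for display
	return out
}

func main() {
	p := pool{
		"anthropic":  {"claude-3-5-sonnet"},
		"openai":     {"gpt-4o"},
		"openrouter": {"anthropic/claude-3-5-sonnet"},
	}
	fmt.Println(providersFor(p, "claude-3-5-sonnet", []string{"openrouter"}))
}
```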
<Note>
This cross-provider logic powers Bifrost's intelligent routing capabilities. See [Provider Routing](/providers/provider-routing#the-model-catalog) for detailed examples of how this enables features like weighted routing via proxy providers.
</Note>
#### Check Model Allowance for Provider
Validate if a model is allowed for a specific provider based on an allowed models list. This method is used internally by governance and load balancing plugins.
```go
// ["*"] wildcard - uses catalog to determine support
isAllowed := modelCatalog.IsModelAllowedForProvider(
schemas.OpenRouter,
"gpt-4o",
schemas.WhiteList{"*"}, // wildcard = check catalog
)
// Returns: true (catalog knows OpenRouter supports openai/gpt-4o)
// Explicit allowedModels with provider prefix
isAllowed = modelCatalog.IsModelAllowedForProvider(
schemas.OpenRouter,
"gpt-4o",
schemas.WhiteList{"openai/gpt-4o", "anthropic/claude-3-5-sonnet"},
)
// Returns: true (strips "openai/" prefix and matches "gpt-4o")
// Explicit allowedModels without prefix
isAllowed = modelCatalog.IsModelAllowedForProvider(
schemas.OpenAI,
"gpt-4o",
schemas.WhiteList{"gpt-4o", "gpt-4o-mini"},
)
// Returns: true (direct match)
```
**Behavior**:
- **`["*"]` wildcard**: Delegates to `GetProvidersForModel` (includes cross-provider logic) — this is the "allow all via catalog" mode
- **Non-empty explicit list**: Checks for both direct matches and provider-prefixed entries
- **Empty slice (`[]string{}` / empty `schemas.WhiteList`)**: Returns `false` (deny-all) — mirrors the config deny-by-default semantics
<Note>
In `config.json` and the governance API, `allowed_models: []` (empty array) means **deny all models** (deny-by-default, v1.5.0+). The Go helper `IsModelAllowedForProvider` behaves the same way: an empty `allowedModels` slice also returns `false`. Use `["*"]` to allow all models validated through the catalog.
</Note>
Entries in an explicit list match in two ways:
- Direct: `"gpt-4o"` matches `"gpt-4o"`
- Prefixed: `"openai/gpt-4o"` matches request for `"gpt-4o"` (prefix stripped)
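The behavior rules above can be condensed into a short function. This sketch uses a plain `[]string` instead of `schemas.WhiteList`, and `catalogSupports` is a hypothetical callback standing in for the catalog lookup; stripping everything up to the first `/` is also an assumption about how prefixes are handled.

```go
package main

import (
	"fmt"
	"strings"
)

// isAllowed mirrors the documented rules:
//   - empty list: deny all
//   - ["*"]: defer to the catalog
//   - explicit list: direct match, or match after stripping a "provider/" prefix
func isAllowed(model string, allowed []string, catalogSupports func(string) bool) bool {
	if len(allowed) == 0 {
		return false // deny-by-default
	}
	if len(allowed) == 1 && allowed[0] == "*" {
		return catalogSupports(model)
	}
	for _, entry := range allowed {
		if entry == model {
			return true // direct match
		}
		// Prefixed entry such as "openai/gpt-4o": compare the part after the first slash.
		if i := strings.Index(entry, "/"); i >= 0 && entry[i+1:] == model {
			return true
		}
	}
	return false
}

func main() {
	inCatalog := func(model string) bool { return model == "gpt-4o" }

	fmt.Println(isAllowed("gpt-4o", []string{"*"}, inCatalog))
	fmt.Println(isAllowed("gpt-4o", []string{"openai/gpt-4o"}, inCatalog))
	fmt.Println(isAllowed("gpt-4o", []string{}, inCatalog))
}
```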
**Use Cases**:
- **Governance Routing**: Validate if a model request is allowed for a provider configuration
- **Load Balancing**: Filter providers based on allowed models before performance scoring
- **Virtual Key Validation**: Check if a model can be used with a specific virtual key's provider configs
<Tip>
This method is the central validation point for both governance and load balancing plugins, ensuring consistent model allowance logic across all routing mechanisms. It handles all edge cases including proxy providers (OpenRouter, Vertex) and provider-prefixed model entries.
</Tip>
#### Dynamically Add Models
You can dynamically add models to the catalog's pool from a `v1/models` compatible response structure. This is useful for providers that expose a model list endpoint.
```go
// response is *schemas.BifrostListModelsResponse
modelCatalog.AddModelDataToPool(response)
```
The Bifrost gateway does this automatically during initialization for every supported provider.
**When to use**:
- After fetching models from a provider's `/v1/models` endpoint
- When a new provider is dynamically added at runtime
- For testing with custom model lists
### Reloading Configuration
You can reload the pricing configuration at runtime if you need to change the pricing URL or sync interval.
```go
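// PricingSyncInterval is a *time.Duration, so take the address of a variable
syncInterval := 12 * time.Hour
newConfig := &modelcatalog.Config{
PricingSyncInterval: &syncInterval,
}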
err := modelCatalog.UpdateSyncConfig(ctx, newConfig)
```
## Error Handling and Fallbacks
The Model Catalog handles missing pricing data gracefully with intelligent fallbacks:
```go
// resolvePricing resolves the pricing entry for a model, trying deployment as fallback.
func (mc *ModelCatalog) resolvePricing(provider, model, deployment string, requestType schemas.RequestType) *configstoreTables.TableModelPricing {
pricing, exists := mc.getPricing(model, provider, requestType)
if exists {
return pricing
}
// If pricing not found for model, try the deployment name
if deployment != "" {
pricing, exists = mc.getPricing(deployment, provider, requestType)
if exists {
return pricing
}
}
return nil
}
// getPricing returns pricing information for a model (thread-safe).
// It implements a multi-step fallback chain:
// 1. Direct lookup by model + provider + mode
// 2. Gemini → Vertex provider fallback
// 3. Vertex "provider/model" prefix stripping
// 4. Bedrock "anthropic." prefix addition for Claude models
// 5. Responses → Chat mode fallback (at each step)
// 6. ImageEdit / ImageVariation → ImageGeneration mode fallback
func (mc *ModelCatalog) getPricing(model, provider string, requestType schemas.RequestType) (*configstoreTables.TableModelPricing, bool) {
mc.mu.RLock()
defer mc.mu.RUnlock()
mode := normalizeRequestType(requestType)
pricing, ok := mc.pricingData[makeKey(model, provider, mode)]
if ok {
return &pricing, true
}
// Provider-specific fallbacks (Gemini→Vertex, Vertex prefix strip, Bedrock anthropic. prefix)
// Each fallback also tries Responses→Chat mode if applicable
// ...
// Final fallback: Responses → Chat mode for any provider
if requestType == schemas.ResponsesRequest || requestType == schemas.ResponsesStreamRequest {
pricing, ok = mc.pricingData[makeKey(model, provider, normalizeRequestType(schemas.ChatCompletionRequest))]
if ok {
return &pricing, true
}
}
return nil, false
}
// When pricing is not found, CalculateCost returns 0.0 and logs a debug message.
// This ensures operations continue smoothly without billing failures.
```
## Cleanup and Lifecycle Management
Properly clean up resources when shutting down:
```go
// Cleanup model catalog resources
defer func() {
if err := modelCatalog.Cleanup(); err != nil {
logger.Error("Failed to cleanup model catalog: %v", err)
}
}()
```
## Thread Safety
All `ModelCatalog` operations are thread-safe, making it suitable for concurrent usage across multiple plugins and goroutines. The internal pricing data cache uses read-write mutexes for optimal performance during frequent lookups.
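The read-write mutex pattern mentioned above can be sketched as follows. `priceCache`, its methods, and the `model|provider` key format are hypothetical; the point is only that many readers share the read lock while a sync replaces the whole map under the write lock.

```go
package main

import (
	"fmt"
	"sync"
)

// priceCache is a hypothetical in-memory price table guarded by a RWMutex.
type priceCache struct {
	mu     sync.RWMutex
	prices map[string]float64
}

// get takes the read lock, so concurrent lookups do not block each other.
func (c *priceCache) get(key string) (float64, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.prices[key]
	return v, ok
}

// replace swaps in a freshly synced table under the write lock.
func (c *priceCache) replace(next map[string]float64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.prices = next
}

func main() {
	cache := &priceCache{prices: map[string]float64{"gpt-4o|openai": 0.5}}

	// Several readers run while a sync replaces the table.
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			cache.get("gpt-4o|openai")
		}()
	}
	cache.replace(map[string]float64{"gpt-4o|openai": 0.25})
	wg.Wait()

	v, _ := cache.get("gpt-4o|openai")
	fmt.Println(v)
}
```

Replacing the map wholesale keeps the write lock held only for a pointer swap, which suits a workload with frequent lookups and rare syncs.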
## Best Practices
1. **Shared Instance**: Use a single `ModelCatalog` instance across all plugins to avoid redundant data synchronization.
2. **Error Handling**: Always handle the case where `CalculateCost` returns 0.0 because pricing data for the model is missing.
3. **Logging**: Monitor pricing sync failures and missing model warnings in production.
4. **Cache Awareness**: Use `CalculateCost` which automatically handles cache hits/misses and embedding costs.
5. **Resource Cleanup**: Always call `Cleanup()` during application shutdown to prevent resource leaks.
The Model Catalog provides a robust, production-ready foundation for implementing billing, budgeting, and cost monitoring features in Bifrost plugins.


@@ -0,0 +1,130 @@
---
title: "Streaming"
description: "Framework utility for aggregating and processing real-time stream chunks from AI providers"
icon: "water"
---
## Overview
The **Streaming** package (`framework/streaming`) is a core utility within Bifrost designed to handle real-time data streams from AI providers. It provides a robust and efficient mechanism for plugins like [Logging](/features/observability/default), [OTel](/features/observability/otel), and [Maxim](/features/observability/maxim) to process, aggregate, and format streaming responses for chat completions, transcriptions, and other real-time AI interactions.
```mermaid
sequenceDiagram
participant Plugin
participant BC as Bifrost Core
participant Accumulator
BC->>Plugin: PreLLMHook(StreamingRequest)
activate Plugin
Plugin->>Accumulator: CreateStreamAccumulator(requestID)
activate Accumulator
Accumulator-->>Plugin: ack
deactivate Accumulator
Plugin-->>BC: return
deactivate Plugin
loop For each response chunk
BC->>Plugin: PostLLMHook(StreamChunk)
activate Plugin
Plugin->>Accumulator: ProcessStreamingResponse(StreamChunk)
activate Accumulator
alt Is NOT Final Chunk
Accumulator-->>Plugin: return {Type: Delta}
else Is Final Chunk
Accumulator->>Accumulator: buildCompleteResponse()
Accumulator-->>Plugin: return {Type: Final, CompleteData}
end
deactivate Accumulator
Plugin-->>BC: return
deactivate Plugin
end
```
Its primary purpose is to simplify the complexity of handling chunked data, ensuring that plugins can work with complete, well-structured responses without needing to implement their own aggregation logic.
## How It Works
The streaming package uses an `Accumulator` to manage the lifecycle of a streaming operation. This process is designed to be highly efficient, using `sync.Pool` to reuse objects and minimize memory allocations.
1. **Initialization**: When a plugin that needs to process streams (like `logging` or `otel`) is initialized, it creates a new `streaming.Accumulator`.
2. **Stream Start**: In the `PreLLMHook` phase of a request, if the request is identified as a streaming type, the plugin calls `accumulator.CreateStreamAccumulator(requestID, timestamp)` to prepare a dedicated buffer for the incoming chunks of that request.
3. **Chunk Processing**: In the `PostLLMHook` phase, as each chunk of the streaming response arrives, the plugin passes it to `accumulator.ProcessStreamingResponse()`.
* For each `delta` chunk, the accumulator appends it to the buffer associated with the request ID.
* The accumulator handles different types of streams, including chat, audio, and transcriptions, using specialized logic to correctly piece together the data. For example, it accumulates text deltas, tool call argument deltas, and other parts of the message.
4. **Finalization**: When the final chunk of the stream is received (indicated by a `finish_reason` or other provider-specific signal), `ProcessStreamingResponse` performs the final assembly.
* It reconstructs the complete `ChatMessage` or other response object from all the stored chunks.
* It calculates total token usage, cost, and latency.
* It returns a `ProcessedStreamResponse` object with `StreamResponseTypeFinal` and the complete, structured `AccumulatedData`.
5. **Cleanup**: Once the final response is processed, the accumulator cleans up all buffered chunks for that request ID, returning them to the `sync.Pool` for reuse.
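The lifecycle above can be sketched with a `sync.Map` keyed by request ID and a `sync.Pool` for chunk objects. This is a simplified, text-only illustration: `accumulator`, `stream`, `chunk`, `create`, and `process` are hypothetical names, and the real package also tracks usage, cost, latency, and several stream types.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// chunk is a hypothetical stored piece of a streamed response.
type chunk struct {
	Delta string
}

// stream buffers the chunks of one request.
type stream struct {
	mu     sync.Mutex
	chunks []*chunk
}

// accumulator tracks concurrent streams and recycles chunk objects.
type accumulator struct {
	streams sync.Map // requestID -> *stream
	pool    sync.Pool
}

func newAccumulator() *accumulator {
	return &accumulator{pool: sync.Pool{New: func() any { return new(chunk) }}}
}

// create prepares a buffer for a request (the "Stream Start" step).
func (a *accumulator) create(requestID string) {
	a.streams.LoadOrStore(requestID, &stream{})
}

// process stores one delta. On the final chunk it returns the assembled text
// and true, then releases the buffer and returns chunks to the pool.
func (a *accumulator) process(requestID, delta string, final bool) (string, bool) {
	v, ok := a.streams.Load(requestID)
	if !ok {
		return "", false
	}
	s := v.(*stream)

	c := a.pool.Get().(*chunk)
	c.Delta = delta

	s.mu.Lock()
	defer s.mu.Unlock()
	s.chunks = append(s.chunks, c)
	if !final {
		return "", false
	}

	var b strings.Builder
	for _, stored := range s.chunks {
		b.WriteString(stored.Delta)
		*stored = chunk{} // reset before reuse
		a.pool.Put(stored)
	}
	s.chunks = nil
	a.streams.Delete(requestID)
	return b.String(), true
}

func main() {
	acc := newAccumulator()
	acc.create("req-1")
	acc.process("req-1", "Hel", false)
	text, done := acc.process("req-1", "lo", true)
	fmt.Println(text, done)
}
```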
## Key Components
### `Accumulator`
The central component of the package. It is a thread-safe manager that:
- Tracks stream chunks for multiple concurrent requests using a `sync.Map`.
- Uses `sync.Pool` to recycle `*StreamChunk` objects, reducing garbage collection overhead.
- Provides methods to add chunks (`addChatStreamChunk`, `addAudioStreamChunk`, etc.).
- Includes a periodic cleanup worker to remove stale accumulators for incomplete or orphaned requests.
### `ProcessStreamingResponse`
This is the main entry point for plugins to process stream data. It inspects the response type and delegates to the appropriate handler:
- `processChatStreamingResponse`
- `processAudioStreamingResponse`
- `processTranscriptionStreamingResponse`
- `processResponsesStreamingResponse`
It returns a `ProcessedStreamResponse`, which indicates whether the chunk is a `delta` or the `final` aggregated response.
### Stream-Specific Builders
The package includes internal logic to correctly build complete messages from chunks. For example, `buildCompleteMessageFromChatStreamChunks` iterates through the collected `ChatStreamChunk` objects, appending content deltas and assembling tool calls into a final, coherent `schemas.ChatMessage`.
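As an illustration of the tool-call assembly mentioned above, the sketch below groups streamed fragments by index and concatenates their argument text. `toolCallDelta`, `toolCall`, and `assembleToolCalls` are hypothetical names, and the assumption that fragments carry an index and that the name arrives on one fragment follows the common OpenAI-style streaming shape rather than anything stated in this document.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// toolCallDelta is a hypothetical fragment of a streamed tool call.
type toolCallDelta struct {
	Index     int    // which tool call this fragment belongs to
	Name      string // assumed to be set on only one fragment per call
	Arguments string // partial JSON text
}

// toolCall is the assembled result.
type toolCall struct {
	Name      string
	Arguments string
}

// assembleToolCalls groups fragments by index and joins their argument text
// in arrival order.
func assembleToolCalls(deltas []toolCallDelta) []toolCall {
	names := map[int]string{}
	args := map[int]*strings.Builder{}
	for _, d := range deltas {
		if _, ok := args[d.Index]; !ok {
			args[d.Index] = &strings.Builder{}
		}
		if d.Name != "" {
			names[d.Index] = d.Name
		}
		args[d.Index].WriteString(d.Arguments)
	}

	// Emit calls in index order so the result is deterministic.
	indexes := make([]int, 0, len(args))
	for i := range args {
		indexes = append(indexes, i)
	}
	sort.Ints(indexes)

	calls := make([]toolCall, 0, len(indexes))
	for _, i := range indexes {
		calls = append(calls, toolCall{Name: names[i], Arguments: args[i].String()})
	}
	return calls
}

func main() {
	calls := assembleToolCalls([]toolCallDelta{
		{Index: 0, Name: "get_weather", Arguments: `{"city":`},
		{Index: 0, Arguments: `"Paris"}`},
	})
	fmt.Printf("%+v\n", calls)
}
```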
## Usage Example
The following snippet from the `logging` plugin shows how the `streaming` package is used in practice within a plugin's `PostLLMHook`.
```go
// In plugins/logging/main.go
func (p *LoggerPlugin) PostLLMHook(ctx *schemas.BifrostContext, result *schemas.BifrostResponse, bifrostErr *schemas.BifrostError) (*schemas.BifrostResponse, *schemas.BifrostError, error) {
// ... setup, get requestID ...
go func() {
// ...
if bifrost.IsStreamRequestType(requestType) {
p.logger.Debug("[logging] processing streaming response")
// 1. Pass the response chunk to the accumulator
streamResponse, err := p.accumulator.ProcessStreamingResponse(ctx, result, bifrostErr)
if err != nil {
p.logger.Error("failed to process streaming response: %v", err)
// 2. Check if this is the final, aggregated response
} else if streamResponse != nil && streamResponse.Type == streaming.StreamResponseTypeFinal {
// Prepare final log data
logMsg.Operation = LogOperationStreamUpdate
logMsg.StreamResponse = streamResponse
// 3. Update the log entry with the complete data
processingErr := retryOnNotFound(p.ctx, func() error {
return p.updateStreamingLogEntry(p.ctx, logMsg.RequestID, logMsg.SemanticCacheDebug, logMsg.StreamResponse, true)
})
// ... handle errors and callbacks ...
}
}
// ... handle non-streaming responses ...
}()
return result, bifrostErr, nil
}
```
This demonstrates how a plugin can remain agnostic to the details of stream aggregation and simply react to the final, complete data returned by the `streaming` package. This greatly simplifies plugin development and ensures consistent data handling across the framework.


@@ -0,0 +1,185 @@
---
title: "Vector Store"
description: "Vector database implementations for semantic search, embeddings storage, and AI-powered features in Bifrost."
icon: "diagram-project"
---
## Overview
The VectorStore is a core component of Bifrost's framework package that provides a unified interface for vector database operations. It enables plugins to store embeddings, perform similarity searches, and build AI-powered features like semantic caching, content recommendations, and knowledge retrieval.
**Key Capabilities:**
- **Vector Similarity Search**: Find semantically similar content using embeddings
- **Namespace Management**: Organize data into separate collections with custom schemas
- **Flexible Filtering**: Query data with complex filters and pagination
- **Multiple Backends**: Support for Weaviate, Redis/Valkey-compatible, Qdrant, and Pinecone vector stores
- **High Performance**: Optimized for production workloads
- **Scalable Storage**: Handle millions of vectors with efficient indexing
## VectorStore Interface Usage
### Creating Namespaces
Create collections (namespaces) with custom schemas:
```go
// Define properties for your data
properties := map[string]vectorstore.VectorStoreProperties{
"content": {
DataType: vectorstore.VectorStorePropertyTypeString,
Description: "The main content text",
},
"category": {
DataType: vectorstore.VectorStorePropertyTypeString,
Description: "Content category",
},
"tags": {
DataType: vectorstore.VectorStorePropertyTypeStringArray,
Description: "Content tags",
},
}
// Create namespace
err := store.CreateNamespace(ctx, "my_content", 1536, properties)
if err != nil {
log.Fatal("Failed to create namespace:", err)
}
```
### Storing Data with Embeddings
Add data with vector embeddings for similarity search:
```go
// Your embedding data (typically from an embedding model)
embedding := []float32{0.1, 0.2, 0.3} // shortened for illustration; the real vector length must match the namespace dimension (1536 above)
// Metadata associated with this vector
metadata := map[string]interface{}{
"content": "This is my content text",
"category": "documentation",
"tags": []string{"guide", "tutorial"},
}
// Store in vector database
err := store.Add(ctx, "my_content", "unique-id-123", embedding, metadata)
if err != nil {
log.Fatal("Failed to add data:", err)
}
```
### Similarity Search
Find similar content using vector similarity:
```go
// Query embedding (from user query)
queryEmbedding := []float32{0.15, 0.25, 0.35 /* ... remaining dimensions */}
// Optional filters
filters := []vectorstore.Query{
{
Field: "category",
Operator: vectorstore.QueryOperatorEqual,
Value: "documentation",
},
}
// Perform similarity search
results, err := store.GetNearest(
ctx,
"my_content", // namespace
queryEmbedding, // query vector
filters, // optional filters
[]string{"content", "category"}, // fields to return
0.7, // similarity threshold (0-1)
10, // limit
)
if err != nil {
log.Fatal("Failed to search:", err)
}
for _, result := range results {
fmt.Printf("Score: %.3f, Content: %s\n", *result.Score, result.Properties["content"])
}
```
### Data Retrieval and Management
Query and manage stored data:
```go
// Get specific item by ID
item, err := store.GetChunk(ctx, "my_content", "unique-id-123")
if err != nil {
log.Fatal("Failed to get item:", err)
}
// Get all items with filtering and pagination
allResults, cursor, err := store.GetAll(
ctx,
"my_content",
[]vectorstore.Query{
{Field: "category", Operator: vectorstore.QueryOperatorEqual, Value: "documentation"},
},
[]string{"content", "tags"}, // select fields
nil, // cursor for pagination
50, // limit
)
// Delete items
err = store.Delete(ctx, "my_content", "unique-id-123")
```
## Supported Vector Stores
<CardGroup cols={2}>
<Card title="Weaviate" icon="database" href="/integrations/vector-databases/weaviate">
Production-ready vector database with gRPC support.
</Card>
<Card title="Redis / Valkey" icon="database" href="/integrations/vector-databases/redis">
High-performance in-memory vector store.
</Card>
<Card title="Qdrant" icon="database" href="/integrations/vector-databases/qdrant">
Rust-based vector search engine with advanced filtering.
</Card>
<Card title="Pinecone" icon="database" href="/integrations/vector-databases/pinecone">
Managed vector database with serverless options.
</Card>
</CardGroup>
---
## Use Cases
### [Semantic Caching](../../features/semantic-caching)
Build intelligent caching systems that understand query intent rather than just exact matches.
**Applications:**
- Customer support systems with FAQ matching
- Code completion and documentation search
- Content management with semantic deduplication
### Knowledge Base & Search
Create intelligent search systems that understand user queries contextually.
**Applications:**
- Document search and retrieval systems
- Product recommendation engines
- Research paper and knowledge discovery platforms
### Content Classification
Automatically categorize and tag content based on semantic similarity.
**Applications:**
- Email classification and routing
- Content moderation and filtering
- News article categorization and clustering
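One simple way to categorize by semantic similarity is nearest-centroid classification: average the embeddings of labeled examples per category, then assign a new item to the closest average. The sketch below assumes equal-length vectors and squared Euclidean distance. All names and the two-dimensional toy embeddings are hypothetical, and a vector store would normally perform the nearest-neighbor step.

```go
package main

import (
	"fmt"
	"math"
)

// centroid returns the element-wise mean of the given vectors.
// ASSUMPTION: all vectors have the same length.
func centroid(vectors [][]float64) []float64 {
	if len(vectors) == 0 {
		return nil
	}
	mean := make([]float64, len(vectors[0]))
	for _, v := range vectors {
		for i, x := range v {
			mean[i] += x
		}
	}
	for i := range mean {
		mean[i] /= float64(len(vectors))
	}
	return mean
}

// sqDist returns the squared Euclidean distance between a and b.
func sqDist(a, b []float64) float64 {
	var sum float64
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return sum
}

// classify returns the label whose centroid is closest to item.
func classify(centroids map[string][]float64, item []float64) string {
	bestLabel, bestDist := "", math.Inf(1)
	for label, c := range centroids {
		if d := sqDist(c, item); d < bestDist {
			bestLabel, bestDist = label, d
		}
	}
	return bestLabel
}

func main() {
	// Toy embeddings of already-labeled emails.
	centroids := map[string][]float64{
		"billing": centroid([][]float64{{1, 0}, {0.8, 0.2}}),
		"support": centroid([][]float64{{0, 1}, {0.2, 0.8}}),
	}
	fmt.Println(classify(centroids, []float64{0.7, 0.3})) // billing
}
```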
### Recommendation Systems
Build personalized recommendation engines using vector similarity.
**Applications:**
- Product recommendations based on user preferences
- Content suggestions for media platforms
- Similar document or article recommendations
## Related Documentation
| Topic | Documentation | Description |
|-------|---------------|-------------|
| **Framework Overview** | [What is Framework](./what-is-framework) | Understanding the framework package and VectorStore interface |
| **Semantic Caching** | [Semantic Caching](../../features/semantic-caching) | Using VectorStore for AI response caching |

View File

@@ -0,0 +1,49 @@
---
title: "What is framework?"
description: "Framework is Bifrost's shared storage and utilities SDK package that provides common database interfaces and logic for the plugin ecosystem."
icon: "play"
---
Framework serves as the foundation layer that enables plugins to implement consistent data management patterns without reinventing storage solutions.
## Installation
```bash
go get github.com/maximhq/bifrost/framework
```
## Purpose
The framework package was designed to solve a fundamental challenge in plugin development: providing standardized, reliable storage and utility interfaces that plugins can depend on. Instead of each plugin implementing its own database logic, configuration management, or logging systems, framework offers battle-tested, shared implementations.
## Core Components
### ConfigStore
A unified configuration persistence layer that provides consistent storage patterns for plugin settings, provider configurations, and system state. Plugins can leverage `ConfigStore` to manage their configuration data with built-in CRUD operations, transaction support, and schema management.
### LogStore
Standardized logging and audit trail capabilities that enable plugins to implement observability features. `LogStore` provides structured logging, search and filtering capabilities, pagination support, and automated data retention policies.
### VectorStore
Vector database operations designed for AI-powered plugins that need semantic capabilities. `VectorStore` handles embeddings management, similarity search operations, and namespace isolation, making it easy for plugins to add features like semantic caching, content search, and AI-powered recommendations.
### Pricing Module
Cost calculation and model pricing management tools that help plugins implement billing and usage tracking features. The pricing system supports multi-tier pricing models, real-time usage tracking, and dynamic pricing updates.
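To make the idea of multi-tier pricing concrete, here is a minimal sketch of a graduated cost calculation. It assumes each tier's rate applies only to the tokens that fall inside that tier. The `tier` type, the `cost` function, and the rates are hypothetical and are not taken from the framework's pricing module; integer micro-USD values are used to avoid floating-point rounding.

```go
package main

import "fmt"

// tier describes one pricing band. Hypothetical type for illustration.
type tier struct {
	upTo             int64 // cumulative token count where the tier ends; 0 means no upper bound
	microUSDPerToken int64 // price per token in millionths of a dollar
}

// cost charges each token at the rate of the tier it falls into.
// ASSUMPTION: tiers are sorted by upTo, with an unbounded tier last.
func cost(tiers []tier, tokens int64) int64 {
	var total, prev int64
	for _, t := range tiers {
		if tokens <= prev {
			break
		}
		upper := tokens
		if t.upTo != 0 && t.upTo < tokens {
			upper = t.upTo
		}
		total += (upper - prev) * t.microUSDPerToken
		prev = upper
	}
	return total
}

func main() {
	tiers := []tier{
		{upTo: 1000, microUSDPerToken: 3}, // first 1000 tokens
		{upTo: 0, microUSDPerToken: 2},    // every token after that
	}
	// 1000 tokens at 3 plus 500 tokens at 2.
	fmt.Println(cost(tiers, 1500), "micro-USD") // 4000 micro-USD
}
```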
## Benefits for Plugin Developers
**Shared Logic**: Common patterns for configuration, logging, and data management are provided out-of-the-box, reducing development time and ensuring consistency across plugins.
**Standardized Interfaces**: All framework components use consistent APIs, making it easier for developers to work across different plugins and maintain code quality.
**Pluggable Architecture**: The interface-based design allows different storage backends to be used without changing plugin code, providing flexibility for different deployment scenarios.
**Transaction Support**: Built-in transaction management and error handling ensure data integrity and provide reliable rollback capabilities.
**Production Ready**: Framework components are battle-tested in production environments and include features like connection pooling, retry logic, and performance optimizations.
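The pluggable architecture described above rests on a standard Go pattern: plugin code depends on an interface, and the backend is chosen at construction time. The sketch below illustrates that pattern only. `KVStore`, `memoryStore`, `saveSetting`, and `loadSetting` are hypothetical names and do not correspond to the actual `ConfigStore`, `LogStore`, or `VectorStore` interfaces.

```go
package main

import (
	"errors"
	"fmt"
)

// KVStore is a hypothetical storage interface that plugin code depends on.
type KVStore interface {
	Put(key, value string) error
	Get(key string) (string, error)
}

// ErrNotFound is returned when a key has no stored value.
var ErrNotFound = errors.New("key not found")

// memoryStore is one possible backend; a SQL- or Redis-backed type could
// satisfy the same interface without any change to the plugin code below.
type memoryStore struct {
	data map[string]string
}

func newMemoryStore() *memoryStore {
	return &memoryStore{data: map[string]string{}}
}

func (m *memoryStore) Put(key, value string) error {
	m.data[key] = value
	return nil
}

func (m *memoryStore) Get(key string) (string, error) {
	v, ok := m.data[key]
	if !ok {
		return "", ErrNotFound
	}
	return v, nil
}

// saveSetting and loadSetting play the role of plugin code: they see only
// the interface, never the concrete backend.
func saveSetting(s KVStore, name, value string) error {
	return s.Put("settings/"+name, value)
}

func loadSetting(s KVStore, name string) (string, error) {
	return s.Get("settings/" + name)
}

func main() {
	var store KVStore = newMemoryStore()
	if err := saveSetting(store, "threshold", "0.7"); err != nil {
		panic(err)
	}
	value, err := loadSetting(store, "threshold")
	if err != nil {
		panic(err)
	}
	fmt.Println("threshold =", value)
}
```

Swapping the backend then means changing only the constructor call in `main`, which is the property the interface-based design is meant to provide.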
## Integration with Bifrost
Framework provides the storage foundation behind core Bifrost features such as provider management, request logging, semantic caching, and governance. Plugins that build on framework components use the same storage layer and conventions as these core features.
The framework package enables plugin developers to focus on their core business logic while relying on robust, shared infrastructure for all storage and utility needs.

View File