first commit

Beyhan Oğur
2026-04-26 21:52:23 +03:00
commit 880f412e2c
2662 changed files with 866266 additions and 0 deletions

@@ -0,0 +1,764 @@
---
title: "Concurrency"
description: "Deep dive into Bifrost's advanced concurrency architecture - worker pools, goroutine management, channel-based communication, and resource isolation patterns."
icon: "traffic-light"
---
## Concurrency Philosophy
### **Core Principles**
| Principle | Implementation | Benefit |
| ---------------------------------- | -------------------------------------- | -------------------------------------- |
| **Provider Isolation** | Independent worker pools per provider | Fault tolerance, no cascade failures |
| **Channel-Based Communication** | Go channels for all async operations | Type-safe message passing between goroutines |
| **Resource Pooling** | Object pools with lifecycle management | Predictable memory usage, minimal GC |
| **Non-Blocking Operations** | Async processing throughout pipeline | Maximum concurrency, no blocking waits |
| **Backpressure Handling** | Configurable buffers and flow control | Graceful degradation under load |
### **Threading Architecture Overview**
```mermaid
graph TB
subgraph "Main Thread"
Main[Main Process<br/>HTTP Server]
Router[Request Router<br/>Goroutine]
PluginMgr[Plugin Manager<br/>Goroutine]
end
subgraph "Provider Worker Pools"
subgraph "OpenAI Pool"
OAI1[Worker 1<br/>Goroutine]
OAI2[Worker 2<br/>Goroutine]
OAIN[Worker N<br/>Goroutine]
end
subgraph "Anthropic Pool"
ANT1[Worker 1<br/>Goroutine]
ANT2[Worker 2<br/>Goroutine]
ANTN[Worker N<br/>Goroutine]
end
subgraph "Bedrock Pool"
BED1[Worker 1<br/>Goroutine]
BED2[Worker 2<br/>Goroutine]
BEDN[Worker N<br/>Goroutine]
end
end
subgraph "Memory Pools"
ChannelPool[Channel Pool<br/>sync.Pool]
MessagePool[Message Pool<br/>sync.Pool]
ResponsePool[Response Pool<br/>sync.Pool]
end
Main --> Router
Router --> PluginMgr
PluginMgr --> OAI1
PluginMgr --> ANT1
PluginMgr --> BED1
OAI1 --> ChannelPool
ANT1 --> MessagePool
BED1 --> ResponsePool
```
---
## Worker Pool Architecture
### **Provider-Isolated Worker Pools**
```mermaid
stateDiagram-v2
[*] --> PoolInit: Worker Pool Creation
PoolInit --> WorkerSpawn: Spawn Worker Goroutines
WorkerSpawn --> Listening: Workers Listen on Channels
Listening --> Processing: Job Received
Processing --> API_Call: Provider API Request
API_Call --> Response: Process Response
Response --> Listening: Job Complete
Listening --> Shutdown: Graceful Shutdown
Processing --> Shutdown: Complete Current Job
Shutdown --> [*]: Pool Destroyed
```
**Worker Pool Architecture:**
Each provider has its own worker pool, so a slow or failing provider cannot exhaust the workers that serve other providers:
**Key Components:**
- **Worker Pool Management** - Pre-spawned workers reduce startup latency
- **Job Queue System** - Buffered channels provide smooth load balancing
- **Resource Pools** - HTTP clients and API keys are pooled for efficiency
- **Health Monitoring** - Circuit breakers detect and isolate failing providers
- **Graceful Shutdown** - Workers complete current jobs before terminating
**Startup Process:**
1. **Worker Pre-spawning** - Workers are created during pool initialization
2. **Channel Setup** - Job queues and worker channels are established
3. **Resource Allocation** - HTTP clients and API keys are distributed
4. **Health Checks** - Initial connectivity tests verify provider availability
5. **Ready State** - Pool becomes available for request processing
**Job Dispatch Logic:**
- **Round-Robin Assignment** - Jobs are distributed evenly across available workers
- **Load Balancing** - Worker availability determines job assignment
- **Overflow Handling** - Excess jobs are queued or dropped based on configuration
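The pool behaviour described above can be sketched in a few lines of Go. This is a minimal illustration, not Bifrost's implementation: the `Job`, `Pool`, `NewPool`, `Submit`, and `Shutdown` names are hypothetical, and workers pull from one shared buffered queue rather than being assigned jobs round-robin.

```go
package main

import (
	"fmt"
	"sync"
)

// Job and Pool are hypothetical types for illustration only.
type Job struct {
	ID     int
	Result chan string // buffered (size 1) so a worker never blocks on send
}

type Pool struct {
	jobs chan *Job      // buffered job queue
	wg   sync.WaitGroup // tracks running workers for graceful shutdown
}

// NewPool pre-spawns workers that all read from one buffered queue.
func NewPool(provider string, workers, queueSize int) *Pool {
	p := &Pool{jobs: make(chan *Job, queueSize)}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go func(id int) {
			defer p.wg.Done()
			// range exits once the queue is closed and drained.
			for job := range p.jobs {
				job.Result <- fmt.Sprintf("%s worker %d handled job %d", provider, id, job.ID)
			}
		}(i)
	}
	return p
}

// Submit enqueues a job, or reports false when the queue is full.
func (p *Pool) Submit(job *Job) bool {
	select {
	case p.jobs <- job:
		return true
	default:
		return false
	}
}

// Shutdown stops intake and waits for workers to finish queued jobs.
func (p *Pool) Shutdown() {
	close(p.jobs)
	p.wg.Wait()
}

func main() {
	// One pool per provider: a stalled pool cannot starve the other.
	openai := NewPool("openai", 2, 4)
	anthropic := NewPool("anthropic", 2, 4)

	a := &Job{ID: 1, Result: make(chan string, 1)}
	b := &Job{ID: 2, Result: make(chan string, 1)}
	openai.Submit(a)
	anthropic.Submit(b)

	fmt.Println(<-a.Result)
	fmt.Println(<-b.Result)

	openai.Shutdown()
	anthropic.Shutdown()
}
```

A shared queue balances load implicitly, because whichever worker is idle picks up the next job. Explicit round-robin would need a per-worker channel and a dispatcher.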
### **Worker Lifecycle Management**
```mermaid
sequenceDiagram
participant Pool
participant Worker
participant HTTPClient
participant Provider
participant Metrics
Pool->>Worker: Start()
Worker->>Worker: Initialize HTTP Client
Worker->>Pool: Ready Signal
loop Job Processing
Pool->>Worker: Job Assignment
Worker->>HTTPClient: Prepare Request
HTTPClient->>Provider: API Call
Provider-->>HTTPClient: Response
HTTPClient-->>Worker: Parsed Response
Worker->>Metrics: Record Performance
Worker->>Pool: Job Complete
end
Pool->>Worker: Shutdown Signal
Worker->>Worker: Complete Current Job
Worker-->>Pool: Shutdown Confirmed
```
---
## Channel-Based Communication
### **Channel Architecture**
```mermaid
graph TB
subgraph "Channel Types"
JobQueue[Job Queue<br/>Buffered Channel]
WorkerPool[Worker Pool<br/>Buffered Channel]
ResultChan[Result Channel<br/>Buffered Channel]
QuitChan[Quit Channel<br/>Unbuffered]
end
subgraph "Flow Control"
BackPressure[Backpressure<br/>Buffer Limits]
Timeout[Timeout<br/>Context Cancellation]
Graceful[Graceful Shutdown<br/>Channel Closing]
end
JobQueue --> BackPressure
WorkerPool --> Timeout
ResultChan --> Graceful
```
**Channel Configuration Principles:**
Bifrost's channel system balances throughput and memory usage through careful buffer sizing:
**Job Queuing Configuration:**
- **Job Queue Buffer** - Sized based on expected burst traffic (100-1000 jobs)
- **Worker Pool Size** - Matches provider concurrency limits (10-100 workers)
- **Result Buffer** - Accommodates response processing delays (50-500 responses)
**Flow Control Parameters:**
- **Queue Wait Limits** - Maximum time jobs wait before timeout (1-10 seconds)
- **Processing Timeouts** - Per-job execution limits (30-300 seconds)
- **Shutdown Timeouts** - Graceful termination periods (5-30 seconds)
**Backpressure Policies:**
- **Drop Policy** - Discard excess jobs when queues are full
- **Block Policy** - Wait for queue space with timeout
- **Error Policy** - Immediately return error for full queues
**Channel Type Selection:**
- **Buffered Channels** - Used for async job processing and result handling
- **Unbuffered Channels** - Used for synchronization signals (quit, done)
- **Context Cancellation** - Used for timeout and cancellation propagation
### **Backpressure and Flow Control**
```mermaid
flowchart TD
Request[Incoming Request] --> QueueCheck{Queue Full?}
QueueCheck -->|No| Queue[Add to Queue]
QueueCheck -->|Yes| Policy{Backpressure Policy?}
Policy -->|Drop| Drop[Drop Request<br/>Return Error]
Policy -->|Block| Block[Block Until Space<br/>With Timeout]
Policy -->|Error| Error[Return Queue Full Error]
Queue --> Worker[Assign to Worker]
Block --> TimeoutCheck{Timeout?}
TimeoutCheck -->|Yes| Error
TimeoutCheck -->|No| Queue
Worker --> Processing[Process Request]
Processing --> Complete[Complete]
Drop --> Client[Client Response]
Error --> Client
Complete --> Client
```
**Backpressure Implementation Strategy:**
The backpressure system protects Bifrost from being overwhelmed while maintaining service availability:
**Non-Blocking Job Submission:**
- **Immediate Queue Check** - Jobs are submitted without blocking on queue space
- **Success Path** - Available queue space allows immediate job acceptance
- **Overflow Detection** - Full queues trigger backpressure policies
- **Metrics Collection** - All queue operations are tracked for monitoring
**Backpressure Policy Execution:**
- **Drop Policy** - Immediately rejects excess jobs with meaningful error messages
- **Block Policy** - Waits for queue space with configurable timeout limits
- **Error Policy** - Returns queue full errors for immediate client feedback
- **Metrics Tracking** - Dropped, blocked, and successful submissions are measured
**Timeout Management:**
- **Context-Based Timeouts** - All blocking operations respect timeout boundaries
- **Graceful Degradation** - Timeouts result in controlled error responses
- **Resource Protection** - Prevents goroutine leaks from infinite waits
```go
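// SubmitJob enqueues a job without blocking and falls back to the configured
// backpressure policy when the queue is full. The function signature, the
// "drop" case, and the pool.config.QueuePolicy field are assumptions
// reconstructed from the flowchart above.
func (pool *ProviderWorkerPool) SubmitJob(ctx context.Context, job *Job) error {
	select {
	case pool.jobQueue <- job:
		pool.metrics.IncQueuedJobs()
		return nil
	default:
		// Queue is full: apply the backpressure policy.
		switch pool.config.QueuePolicy {
		case "drop":
			pool.metrics.IncDroppedJobs()
			return errors.New("queue full, job dropped")
		case "block":
			select {
			case pool.jobQueue <- job:
				pool.metrics.IncQueuedJobs()
				return nil
			case <-ctx.Done():
				pool.metrics.IncTimeoutJobs()
				return errors.New("queue full, timeout waiting")
			}
		case "error":
			pool.metrics.IncRejectedJobs()
			return errors.New("queue full, job rejected")
		default:
			return errors.New("unknown queue policy")
		}
	}
}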
```
---
## Memory Pool Concurrency
### **Thread-Safe Object Pools**
```mermaid
graph TD
subgraph "sync.Pool Lifecycle"
direction LR
GetObject[Get Object<br/>sync.Pool.Get]
PoolCheck{Is Pool Empty?}
NewObject[New Object<br/>Factory Function]
UseObject[Use Object<br/>Application Logic]
ResetObject[Reset Object<br/>Clear State]
ReturnObject[Return Object<br/>sync.Pool.Put]
GetObject --> PoolCheck
PoolCheck -- Yes --> NewObject
PoolCheck -- No --> UseObject
NewObject --> UseObject
UseObject --> ResetObject
ResetObject --> ReturnObject
ReturnObject --> GetObject
end
subgraph "GC Interaction"
direction TB
GCRun[GC Runs]
PoolCleanup[Pool Cleanup<br/>Removes idle objects]
GCRun --> PoolCleanup
end
```
**Thread-Safe Pool Architecture:**
Bifrost's memory pool system ensures thread-safe object reuse across multiple goroutines:
**Pool Structure Design:**
- **Multiple Pool Types** - Separate pools for channels, messages, responses, and buffers
- **Factory Functions** - Dynamic object creation when pools are empty
- **Statistics Tracking** - Comprehensive metrics for pool performance monitoring
- **Thread Safety** - Synchronized access using Go's sync.Pool and read-write mutexes
**Object Lifecycle Management:**
- **Pool Initialization** - Factory functions define object creation patterns
- **Unique Identification** - Each pooled object gets a unique ID for tracking
- **Timestamp Tracking** - Creation, acquisition, and return times are recorded
- **Reusability Flags** - Objects can be marked as non-reusable for single-use scenarios
**Acquisition Strategy:**
- **Request Tracking** - All pool requests are counted for monitoring
- **Hit/Miss Tracking** - Pool effectiveness is measured through hit ratios
- **Fallback Creation** - New objects are created when pools are empty
- **Performance Metrics** - Acquisition times and patterns are monitored
**Return and Reset Process:**
- **State Validation** - Only reusable objects are returned to pools
- **Object Reset** - All object state is cleared before returning to pool
- **Return Tracking** - Return operations are counted and timed
- **Pool Replenishment** - Returned objects become available for reuse
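The acquire, reset, and return cycle above, together with hit/miss tracking, might look like the following. This is a sketch under assumptions: `Message`, `MessagePool`, and `HitRatio` are hypothetical names, and a "miss" is counted whenever the `sync.Pool` factory function runs.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Message is a hypothetical pooled object.
type Message struct {
	Role    string
	Content []byte
}

// Reset clears state but keeps the backing array for reuse.
func (m *Message) Reset() {
	m.Role = ""
	m.Content = m.Content[:0]
}

// MessagePool wraps sync.Pool and counts requests and fresh allocations.
type MessagePool struct {
	pool     sync.Pool
	requests atomic.Int64
	created  atomic.Int64
}

func NewMessagePool() *MessagePool {
	mp := &MessagePool{}
	// The factory only runs when the pool has nothing to hand out.
	mp.pool.New = func() any {
		mp.created.Add(1)
		return &Message{Content: make([]byte, 0, 1024)}
	}
	return mp
}

// Get returns a pooled message, or a new one when the pool is empty.
func (mp *MessagePool) Get() *Message {
	mp.requests.Add(1)
	return mp.pool.Get().(*Message)
}

// Put resets the message before returning it, so no state leaks between users.
func (mp *MessagePool) Put(m *Message) {
	m.Reset()
	mp.pool.Put(m)
}

// HitRatio is the share of requests served without calling the factory.
func (mp *MessagePool) HitRatio() float64 {
	req := mp.requests.Load()
	if req == 0 {
		return 0
	}
	return float64(req-mp.created.Load()) / float64(req)
}

func main() {
	mp := NewMessagePool()
	for i := 0; i < 100; i++ {
		m := mp.Get()
		m.Role = "user"
		m.Content = append(m.Content, "hello"...)
		mp.Put(m)
	}
	fmt.Printf("hit ratio: %.2f\n", mp.HitRatio())
}
```

The printed ratio is not deterministic: `sync.Pool` may discard idle objects during garbage collection, which is why hit ratios are best read as steady-state averages.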
### **Pool Performance Monitoring**
Comprehensive metrics provide insights into pool efficiency and system health:
**Usage Statistics Collection:**
- **Request Counting** - Track total pool requests by object type
- **Creation Tracking** - Monitor new object allocations when pools are empty
- **Hit/Miss Ratios** - Measure pool effectiveness through reuse rates
- **Return Monitoring** - Track successful object returns to pools
**Performance Metrics Analysis:**
- **Acquisition Times** - Measure how long it takes to get objects from pools
- **Reset Performance** - Track time spent cleaning objects for reuse
- **Hit Ratio Calculation** - Determine percentage of requests served from pools
- **Memory Efficiency** - Calculate memory savings from object reuse
**Key Performance Indicators:**
- **Channel Pool Hit Ratio** - Typically 85-95% in steady state
- **Message Pool Efficiency** - Usually 80-90% reuse rate
- **Response Pool Utilization** - Often 70-85% hit ratio
- **Total Memory Savings** - Measured reduction in garbage collection pressure
**Monitoring Integration:**
- **Thread-Safe Access** - All metrics collection is synchronized
- **Real-Time Updates** - Statistics are updated with each pool operation
- **Export Capability** - Metrics are available in JSON format for monitoring systems
- **Alerting Support** - Low hit ratios can trigger performance alerts
---
## Goroutine Management
### **Goroutine Lifecycle Patterns**
```mermaid
stateDiagram-v2
[*] --> Created: go routine()
Created --> Running: Execute Function
Running --> Waiting: Channel/Mutex Block
Waiting --> Running: Unblocked
Running --> Syscall: Network I/O
Syscall --> Running: I/O Complete
Running --> GCAssist: GC Triggered
GCAssist --> Running: GC Complete
Running --> Terminated: Function Exit
Terminated --> [*]: Cleanup
```
**Goroutine Pool Management Strategy:**
Bifrost's goroutine management ensures optimal resource usage while preventing goroutine leaks:
**Pool Configuration Management:**
- **Goroutine Limits** - Maximum concurrent goroutines prevent resource exhaustion
- **Active Counting** - Atomic counters track currently running goroutines
- **Idle Timeouts** - Unused goroutines are cleaned up after configured periods
- **Resource Boundaries** - Hard limits prevent runaway goroutine creation
**Lifecycle Orchestration:**
- **Spawn Channels** - New goroutine creation is tracked through channels
- **Completion Monitoring** - Finished goroutines signal completion for cleanup
- **Shutdown Coordination** - Graceful shutdown ensures all goroutines complete properly
- **Health Monitoring** - Continuous monitoring tracks goroutine health and performance
**Worker Creation Process:**
- **Limit Enforcement** - Creation fails when maximum goroutine count is reached
- **Unique Identification** - Each goroutine gets a unique ID for tracking and debugging
- **Lifecycle Tracking** - Start times and names enable performance analysis
- **Atomic Operations** - Thread-safe counters prevent race conditions
**Panic Recovery and Error Handling:**
- **Panic Isolation** - Goroutine panics don't crash the entire system
- **Error Logging** - Panic details are logged with goroutine context
- **Metrics Updates** - Panic counts are tracked for monitoring and alerting
- **Resource Cleanup** - Failed goroutines are properly cleaned up and counted
**Health Monitoring System:**
- **Periodic Health Checks** - Regular intervals check goroutine pool health
- **Completion Tracking** - Finished goroutines are recorded for performance analysis
- **Shutdown Handling** - Clean shutdown process ensures no goroutine leaks
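Limit enforcement, atomic counting, and panic isolation can be sketched together. The `Spawner` type below is hypothetical and only illustrates the pattern; it is not Bifrost's goroutine manager.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// ErrLimitReached is returned when no goroutine slot is free.
var ErrLimitReached = errors.New("goroutine limit reached")

// Spawner is a hypothetical helper that caps concurrent goroutines.
type Spawner struct {
	max    int64
	active atomic.Int64
	panics atomic.Int64
	wg     sync.WaitGroup
}

// Go runs fn in a new goroutine unless the limit has been reached.
func (s *Spawner) Go(fn func()) error {
	// Reserve a slot atomically; roll back if that exceeds the limit.
	if s.active.Add(1) > s.max {
		s.active.Add(-1)
		return ErrLimitReached
	}
	s.wg.Add(1)
	go func() {
		// Deferred calls run last-in first-out: recover, free the slot, signal done.
		defer s.wg.Done()
		defer s.active.Add(-1)
		defer func() {
			if r := recover(); r != nil {
				s.panics.Add(1) // isolate the panic and count it
			}
		}()
		fn()
	}()
	return nil
}

// Wait blocks until every spawned goroutine has exited.
func (s *Spawner) Wait() { s.wg.Wait() }

func main() {
	s := &Spawner{max: 2}
	release := make(chan struct{})

	_ = s.Go(func() { <-release })
	_ = s.Go(func() { <-release; panic("boom") })
	err := s.Go(func() {}) // third call exceeds the limit
	fmt.Println(err)       // prints "goroutine limit reached"

	close(release)
	s.Wait()
	fmt.Println("panics:", s.panics.Load(), "active:", s.active.Load()) // prints "panics: 1 active: 0"
}
```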
### **Resource Leak Prevention**
```mermaid
flowchart TD
GoroutineStart[Goroutine Start] --> ResourceCheck[Resource Allocation Check]
ResourceCheck --> Timeout[Set Timeout Context]
Timeout --> Work[Execute Work]
Work --> Complete{Work Complete?}
Complete -->|Yes| Cleanup[Cleanup Resources]
Complete -->|No| TimeoutCheck{Timeout?}
TimeoutCheck -->|Yes| ForceCleanup[Force Cleanup]
TimeoutCheck -->|No| Work
Cleanup --> Return[Return Resources to Pool]
ForceCleanup --> Return
Return --> End[Goroutine End]
```
**Resource Leak Prevention:**
```go
func (worker *Worker) ExecuteWithCleanup(job *Job) {
	// Bound the whole job, including resource acquisition, with a timeout.
	ctx, cancel := context.WithTimeout(
		context.Background(),
		worker.config.ProcessTimeout,
	)
	defer cancel()

	// Acquire resources with timeout
	resources, err := worker.acquireResources(ctx)
	if err != nil {
		job.resultChan <- &Result{Error: err}
		return
	}

	// Ensure cleanup happens, even if processJob panics.
	// This runs before the deferred cancel(), so ctx has not been cancelled by us yet.
	defer func() {
		// Always return resources
		worker.returnResources(resources)

		// Convert a panic into an error result instead of crashing the process.
		if r := recover(); r != nil {
			worker.metrics.IncPanics()
			select {
			case job.resultChan <- &Result{Error: fmt.Errorf("worker panic: %v", r)}:
			case <-ctx.Done():
				// Nobody is receiving; give up rather than leak this goroutine.
				worker.metrics.IncTimeouts()
			}
		}
	}()

	// Execute job with context
	result := worker.processJob(ctx, job, resources)

	// Return result
	select {
	case job.resultChan <- result:
		// Success
	case <-ctx.Done():
		// Timeout: the receiver may have stopped waiting, so do not block on the send.
		worker.metrics.IncTimeouts()
	}
}
```
---
## Concurrency Optimization Strategies
### **Load-Based Worker Scaling** (Planned)
```mermaid
graph TB
subgraph "Load Monitoring"
QueueDepth[Queue Depth<br/>Monitoring]
ResponseTime[Response Time<br/>Tracking]
WorkerUtil[Worker Utilization<br/>Metrics]
end
subgraph "Scaling Decisions"
ScaleUp{Scale Up?<br/>Load > 80%}
ScaleDown{Scale Down?<br/>Load < 30%}
Maintain[Maintain<br/>Current Size]
end
subgraph "Actions"
AddWorkers[Spawn Additional<br/>Workers]
RemoveWorkers[Graceful Worker<br/>Shutdown]
NoAction[No Action<br/>Monitor Continue]
end
QueueDepth --> ScaleUp
ResponseTime --> ScaleUp
WorkerUtil --> ScaleDown
ScaleUp -->|Yes| AddWorkers
ScaleUp -->|No| ScaleDown
ScaleDown -->|Yes| RemoveWorkers
ScaleDown -->|No| Maintain
Maintain --> NoAction
```
**Adaptive Scaling Implementation:**
```go
type AdaptiveScaler struct {
	pool          *ProviderWorkerPool
	config        ScalingConfig
	metrics       *ScalingMetrics
	lastScaleTime time.Time
	scalingMutex  sync.Mutex
}

func (scaler *AdaptiveScaler) EvaluateScaling() {
	scaler.scalingMutex.Lock()
	defer scaler.scalingMutex.Unlock()

	// Prevent frequent scaling
	if time.Since(scaler.lastScaleTime) < scaler.config.MinScaleInterval {
		return
	}

	current := scaler.getCurrentMetrics()

	// Scale up conditions
	if current.QueueUtilization > scaler.config.ScaleUpThreshold ||
		current.AvgResponseTime > scaler.config.MaxResponseTime {
		scaler.scaleUp(current)
		return
	}

	// Scale down conditions
	if current.QueueUtilization < scaler.config.ScaleDownThreshold &&
		current.AvgResponseTime < scaler.config.TargetResponseTime {
		scaler.scaleDown(current)
	}
}

func (scaler *AdaptiveScaler) scaleUp(metrics *CurrentMetrics) {
	currentWorkers := scaler.pool.GetWorkerCount()
	targetWorkers := int(float64(currentWorkers) * scaler.config.ScaleUpFactor)

	// Respect maximum limits
	if targetWorkers > scaler.config.MaxWorkers {
		targetWorkers = scaler.config.MaxWorkers
	}

	additionalWorkers := targetWorkers - currentWorkers
	if additionalWorkers > 0 {
		scaler.pool.AddWorkers(additionalWorkers)
		scaler.lastScaleTime = time.Now()
		scaler.metrics.RecordScaleUp(additionalWorkers)
	}
}
```
### **Provider-Specific Optimization**
```go
type ProviderOptimization struct {
	// Provider characteristics
	ProviderName string        `json:"provider_name"`
	RateLimit    int           `json:"rate_limit"`  // Requests per second
	AvgLatency   time.Duration `json:"avg_latency"` // Average response time
	ErrorRate    float64       `json:"error_rate"`  // Historical error rate

	// Optimal configuration
	OptimalWorkers int           `json:"optimal_workers"`
	OptimalBuffer  int           `json:"optimal_buffer"`
	TimeoutConfig  time.Duration `json:"timeout_config"`
	RetryStrategy  RetryConfig   `json:"retry_strategy"`
}

func CalculateOptimalConcurrency(provider ProviderOptimization) ConcurrencyConfig {
	// In-flight requests = rate limit x latency. Use float math so that
	// sub-second latencies are not truncated to zero by an integer conversion.
	inFlight := float64(provider.RateLimit) * provider.AvgLatency.Seconds()

	// Adjust for error rate (more workers for higher error rate)
	errorAdjustment := 1.0 + provider.ErrorRate
	optimalWorkers := int(math.Round(inFlight * errorAdjustment)) // requires the "math" package
	if optimalWorkers < 1 {
		optimalWorkers = 1 // always keep at least one worker
	}

	// Buffer is 3x the worker count for smooth operation
	optimalBuffer := optimalWorkers * 3

	return ConcurrencyConfig{
		Concurrency: optimalWorkers,
		BufferSize:  optimalBuffer,
		Timeout:     provider.AvgLatency * 2, // 2x avg latency for timeout
	}
}
```
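The worker calculation above can be read as an application of Little's Law, $L = \lambda W$: the average number of requests in flight equals the arrival rate times the time each request spends in the system. As a worked example with hypothetical numbers (a rate limit of 50 requests per second, an average latency of 2 seconds, and an error rate $e = 0.1$):

```latex
\begin{aligned}
L &= \lambda W = 50\ \text{req/s} \times 2\ \text{s} = 100 \\
\text{workers} &= L\,(1 + e) = 100 \times 1.1 = 110 \\
\text{buffer} &= 3 \times \text{workers} = 330 \\
\text{timeout} &= 2 \times W = 4\ \text{s}
\end{aligned}
```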
---
## Concurrency Monitoring & Metrics
### **Key Concurrency Metrics**
```mermaid
graph TB
subgraph "Worker Metrics"
ActiveWorkers[Active Workers<br/>Current Count]
IdleWorkers[Idle Workers<br/>Available Count]
BusyWorkers[Busy Workers<br/>Processing Count]
end
subgraph "Queue Metrics"
QueueDepth[Queue Depth<br/>Pending Jobs]
QueueThroughput[Queue Throughput<br/>Jobs/Second]
QueueWaitTime[Queue Wait Time<br/>Average Delay]
end
subgraph "Performance Metrics"
GoroutineCount[Goroutine Count<br/>Total Active]
MemoryUsage[Memory Usage<br/>Pool Utilization]
GCPressure[GC Pressure<br/>Collection Frequency]
end
subgraph "Health Metrics"
ErrorRate[Error Rate<br/>Failed Jobs %]
PanicCount[Panic Count<br/>Crashed Goroutines]
DeadlockDetection[Deadlock Detection<br/>Blocked Operations]
end
```
**Metrics Collection Strategy:**
Comprehensive concurrency monitoring provides operational insights and performance optimization data:
**Worker Pool Monitoring:**
- **Total Worker Tracking** - Monitor configured vs actual worker counts
- **Active Worker Monitoring** - Track workers currently processing requests
- **Idle Worker Analysis** - Identify unused capacity and optimization opportunities
- **Queue Depth Monitoring** - Track pending job backlog and processing delays
**Performance Data Collection:**
- **Throughput Metrics** - Measure jobs processed per second across all pools
- **Wait Time Analysis** - Track how long jobs wait in queues before processing
- **Memory Pool Performance** - Monitor hit/miss ratios for memory pool effectiveness
- **Goroutine Count Tracking** - Ensure goroutine counts remain within healthy limits
**Health and Reliability Metrics:**
- **Panic Recovery Tracking** - Count and analyze worker panic occurrences
- **Timeout Monitoring** - Track jobs that exceed processing time limits
- **Circuit Breaker Events** - Monitor provider isolation events and recoveries
- **Error Rate Analysis** - Track failure patterns for capacity planning
**Real-Time Updates:**
- **Live Metric Updates** - Worker metrics are updated continuously during operation
- **Processing Event Recording** - Each job completion updates relevant metrics
- **Performance Correlation** - Queue times and processing times are correlated for analysis
- **Success/Failure Tracking** - All job outcomes are recorded for reliability analysis
---
## Deadlock Prevention & Detection
### **Deadlock Prevention Strategies**
```mermaid
flowchart TD
Strategy1[Lock Ordering<br/>Consistent Acquisition]
Strategy2[Timeout-Based Locks<br/>Context Cancellation]
Strategy3[Channel Select<br/>Non-blocking Operations]
Strategy4[Resource Hierarchy<br/>Layered Locking]
Prevention[Deadlock Prevention<br/>Design Patterns]
Prevention --> Strategy1
Prevention --> Strategy2
Prevention --> Strategy3
Prevention --> Strategy4
Strategy1 --> Success[No Deadlocks<br/>Guaranteed Order]
Strategy2 --> Success
Strategy3 --> Success
Strategy4 --> Success
```
**Deadlock Prevention Implementation Strategy:**
Bifrost employs multiple complementary strategies to prevent deadlocks in concurrent operations:
**Lock Ordering Management:**
- **Consistent Acquisition Order** - All locks are acquired in a predetermined order
- **Global Lock Registry** - Centralized registry maintains lock ordering relationships
- **Order Enforcement** - Lock acquisition automatically sorts by predetermined order
- **Dependency Tracking** - Lock dependencies are mapped to prevent circular waits
**Timeout-Based Protection:**
- **Default Timeouts** - All lock acquisitions have reasonable timeout limits
- **Context Cancellation** - Operations respect context cancellation for cleanup
- **Maximum Timeout Limits** - Upper bounds prevent indefinite blocking
- **Graceful Timeout Handling** - Timeout errors provide meaningful context
**Multi-Lock Acquisition Process:**
- **Ordered Sorting** - Multiple locks are sorted before acquisition attempts
- **Progressive Acquisition** - Locks are acquired one by one in sorted order
- **Failure Recovery** - Failed acquisitions trigger automatic cleanup of held locks
- **Resource Tracking** - All acquired locks are tracked for proper release
**Lock Acquisition Safety:**
- **Non-Blocking Detection** - Channel-based lock attempts prevent indefinite blocking
- **Timeout Enforcement** - All lock attempts respect configured timeout limits
- **Error Propagation** - Lock failures are properly propagated with context
- **Cleanup Guarantees** - Failed operations always clean up partially acquired resources
**Deadlock Detection and Recovery:**
- **Active Monitoring** - Continuous monitoring for potential deadlock conditions
- **Automatic Recovery** - Detected deadlocks trigger automatic resolution procedures
- **Resource Release** - Deadlock resolution involves strategic resource release
- **Prevention Learning** - Deadlock patterns inform prevention strategy improvements
---
## Related Architecture Documentation
- **[Request Flow](./request-flow)** - How concurrency fits in request processing
- **[Benchmarks](../../benchmarking/getting-started)** - Concurrency performance characteristics
- **[Plugin System](./plugins)** - Plugin concurrency considerations
- **[MCP System](./mcp)** - MCP concurrency and worker integration
## Usage Documentation
- **[Provider Configuration](../../quickstart/gateway/provider-configuration)** - Configure concurrency settings per provider
- **[Performance Analysis](../../benchmarking/getting-started)** - Memory pool configuration and optimization
- **[Performance Monitoring](../../features/telemetry)** - Monitor concurrency metrics and health
- **[Go SDK Usage](../../quickstart/go-sdk/setting-up)** - Use Bifrost concurrency in Go applications
- **[Gateway Setup](../../quickstart/gateway/setting-up)** - Deploy Bifrost with optimal concurrency settings
---
**🎯 Next Step:** Understand how plugins integrate with the concurrency model in **[Plugin System](./plugins)**.

@@ -0,0 +1,985 @@
---
title: "Model Context Protocol (MCP)"
description: "Deep dive into Bifrost's Model Context Protocol (MCP) integration - how external tool discovery, execution, and integration work internally."
icon: "toolbox"
---
## MCP Architecture Overview
### **What is MCP in Bifrost?**
The Model Context Protocol (MCP) system in Bifrost enables AI models to seamlessly discover and execute external tools, transforming static chat models into dynamic, action-capable agents. This architecture bridges the gap between AI reasoning and real-world tool execution.
**Core MCP Principles:**
- **Dynamic Discovery** - Tools are discovered at runtime, not hardcoded
- **Client-Side Execution** - Bifrost controls all tool execution for security
- **Multi-Protocol Support** - InProcess, STDIO, HTTP, and SSE connection types
- **Request-Level Filtering** - Granular control over tool availability
- **Async Execution** - Non-blocking tool invocation and response handling
### **MCP System Components**
```mermaid
graph TB
subgraph "MCP Management Layer"
MCPMgr[MCP Manager<br/>Central Controller]
ClientRegistry[Client Registry<br/>Connection Management]
ToolDiscovery[Tool Discovery<br/>Runtime Registration]
end
subgraph "MCP Execution Layer"
ToolFilter[Tool Filter<br/>Access Control]
ToolExecutor[Tool Executor<br/>Invocation Engine]
ResultProcessor[Result Processor<br/>Response Handling]
end
subgraph "Connection Types"
STDIOConn[STDIO Connections<br/>Command-line Tools]
HTTPConn[HTTP Connections<br/>Web Services]
SSEConn[SSE Connections<br/>Real-time Streams]
end
subgraph "External MCP Servers"
FileSystem[Filesystem Tools<br/>File Operations]
WebSearch[Web Search<br/>Information Retrieval]
Database[Database Tools<br/>Data Access]
Custom[Custom Tools<br/>Business Logic]
end
MCPMgr --> ClientRegistry
ClientRegistry --> ToolDiscovery
ToolDiscovery --> ToolFilter
ToolFilter --> ToolExecutor
ToolExecutor --> ResultProcessor
ClientRegistry --> STDIOConn
ClientRegistry --> HTTPConn
ClientRegistry --> SSEConn
STDIOConn --> FileSystem
HTTPConn --> WebSearch
HTTPConn --> Database
STDIOConn --> Custom
```
---
## MCP Connection Architecture
### **Multi-Protocol Connection System**
Bifrost supports four MCP connection types, each optimized for different tool deployment patterns:
```mermaid
graph TB
subgraph "InProcess Connections"
InProcess[In-Memory Tools<br/>Same Process]
InProcessEx[Examples:<br/>• Embedded tools<br/>• High-perf operations<br/>• Testing tools]
end
subgraph "STDIO Connections"
STDIO[Command Line Tools<br/>Local Execution]
STDIOEx[Examples:<br/>• Filesystem tools<br/>• Local scripts<br/>• CLI utilities]
end
subgraph "HTTP Connections"
HTTP[Web Service Tools<br/>Remote APIs]
HTTPEx[Examples:<br/>• Web search APIs<br/>• Database services<br/>• External integrations]
end
subgraph "SSE Connections"
SSE[Real-time Tools<br/>Streaming Data]
SSEEx[Examples:<br/>• Live data feeds<br/>• Real-time monitoring<br/>• Event streams]
end
subgraph "Connection Characteristics"
Latency[Latency:<br/>InProcess < STDIO < HTTP < SSE]
Security[Security:<br/>InProcess/Local > HTTP > SSE]
Scalability[Scalability:<br/>HTTP > SSE > STDIO > InProcess]
Complexity[Complexity:<br/>InProcess < STDIO < HTTP < SSE]
end
InProcess --> Latency
STDIO --> Latency
HTTP --> Security
SSE --> Scalability
HTTP --> Complexity
```
### **Connection Type Details**
**InProcess Connections (In-Memory Tools):**
- **Use Case:** Embedded tools, high-performance operations, testing
- **Performance:** Lowest possible latency (~0.1ms) with no IPC overhead
- **Security:** Highest security as tools run in the same process
- **Limitations:** Go package only, cannot be configured via JSON
**STDIO Connections (Local Tools):**
- **Use Case:** Command-line tools, local scripts, filesystem operations
- **Performance:** Low latency (~1-10ms) due to local execution
- **Security:** High security with full local control
- **Limitations:** Single-server deployment, resource sharing
**HTTP Connections (Remote Services):**
- **Use Case:** Web APIs, microservices, cloud functions
- **Performance:** Network-dependent latency (~10-500ms)
- **Security:** Configurable with authentication and encryption
- **Advantages:** Scalable, multi-server deployment, service isolation
**SSE Connections (Streaming Tools):**
- **Use Case:** Real-time data feeds, live monitoring, event streams
- **Performance:** Variable latency depending on stream frequency
- **Security:** Similar to HTTP with streaming capabilities
- **Benefits:** Real-time updates, persistent connections, event-driven
> **MCP Configuration:** [MCP Setup Guide →](../../mcp/overview)
---
## Tool Discovery & Registration
### **Dynamic Tool Discovery Process**
The MCP system discovers tools at runtime rather than requiring static configuration, enabling flexible and adaptive tool availability:
```mermaid
sequenceDiagram
participant Bifrost
participant MCPManager
participant MCPServer
participant ToolRegistry
participant AIModel
Note over Bifrost: System Startup
Bifrost->>MCPManager: Initialize MCP System
MCPManager->>MCPServer: Establish Connection
MCPServer-->>MCPManager: Connection Ready
MCPManager->>MCPServer: List Available Tools
MCPServer-->>MCPManager: Tool Definitions
MCPManager->>ToolRegistry: Register Tools
Note over Bifrost: Runtime Request Processing
AIModel->>MCPManager: Request Available Tools
MCPManager->>ToolRegistry: Query Tools
ToolRegistry-->>MCPManager: Filtered Tool List
MCPManager-->>AIModel: Available Tools
AIModel->>MCPManager: Execute Tool Call
MCPManager->>MCPServer: Tool Invocation
MCPServer->>MCPServer: Execute Tool Logic
MCPServer-->>MCPManager: Tool Result
MCPManager-->>AIModel: Enhanced Response
```
### **Tool Registry Management**
**Registration Process:**
1. **Connection Establishment** - MCP client connects to configured servers
2. **Capability Exchange** - Server announces available tools and schemas
3. **Tool Validation** - Bifrost validates tool definitions and security
4. **Registry Update** - Tools are registered in the internal tool registry
5. **Availability Notification** - Tools become available for AI model use
**Registry Features:**
- **Dynamic Updates** - Tools can be added/removed during runtime
- **Version Management** - Support for tool versioning and compatibility
- **Access Control** - Request-level tool filtering and permissions
- **Health Monitoring** - Continuous tool availability checking
**Tool Metadata Structure:**
- **Name & Description** - Human-readable tool identification
- **Parameters Schema** - JSON schema for tool input validation
- **Return Schema** - Expected response format definition
- **Capabilities** - Tool feature flags and limitations
- **Authentication** - Required credentials and permissions
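A registry with dynamic updates and concurrent reads could be structured as below. This is a sketch, not Bifrost's registry: the `Tool` and `Registry` types and their fields are assumptions, and tools are keyed as `client-tool` to match the naming used in the request headers shown in this document.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Tool and Registry are hypothetical shapes; field names are assumptions.
type Tool struct {
	Client      string // MCP client that provides the tool
	Name        string
	Description string
}

type Registry struct {
	mu    sync.RWMutex
	tools map[string]Tool // keyed by "client-name"
}

func NewRegistry() *Registry { return &Registry{tools: make(map[string]Tool)} }

func key(client, name string) string { return client + "-" + name }

// Register validates and stores a tool, replacing any previous definition.
func (r *Registry) Register(t Tool) error {
	if t.Client == "" || t.Name == "" {
		return fmt.Errorf("tool needs a client and a name")
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	r.tools[key(t.Client, t.Name)] = t
	return nil
}

// RemoveClient drops every tool of a disconnected client.
func (r *Registry) RemoveClient(client string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for k, t := range r.tools {
		if t.Client == client {
			delete(r.tools, k)
		}
	}
}

// List returns tool keys in sorted order; many readers may call it at once.
func (r *Registry) List() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	keys := make([]string, 0, len(r.tools))
	for k := range r.tools {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	reg := NewRegistry()
	_ = reg.Register(Tool{Client: "filesystem", Name: "read_file"})
	_ = reg.Register(Tool{Client: "websearch", Name: "search"})
	fmt.Println(reg.List()) // prints [filesystem-read_file websearch-search]

	reg.RemoveClient("websearch")
	fmt.Println(reg.List()) // prints [filesystem-read_file]
}
```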
---
## Tool Filtering & Access Control
### **Multi-Level Filtering System**
Bifrost provides granular control over tool availability through a sophisticated filtering system:
```mermaid
flowchart TD
Request[Incoming Request] --> GlobalFilter{Global MCP Filter}
GlobalFilter -->|Enabled| ClientFilter[MCP Client Filtering]
GlobalFilter -->|Disabled| NoMCP[No MCP Tools]
ClientFilter --> IncludeClients{Include Clients?}
IncludeClients -->|Yes| IncludeList[Include Specified<br/>MCP Clients]
IncludeClients -->|No| AllClients[All MCP Clients]
IncludeList --> ExcludeClients{Exclude Clients?}
AllClients --> ExcludeClients
ExcludeClients -->|Yes| RemoveClients[Remove Excluded<br/>MCP Clients]
ExcludeClients -->|No| ClientsFiltered[Filtered Clients]
RemoveClients --> ToolFilter[Tool-Level Filtering]
ClientsFiltered --> ToolFilter
ToolFilter --> IncludeTools{Include Tools?}
IncludeTools -->|Yes| IncludeSpecific[Include Specified<br/>Tools Only]
IncludeTools -->|No| AllTools[All Available Tools]
IncludeSpecific --> ExcludeTools{Exclude Tools?}
AllTools --> ExcludeTools
ExcludeTools -->|Yes| RemoveTools[Remove Excluded<br/>Tools]
ExcludeTools -->|No| FinalTools[Final Tool Set]
RemoveTools --> FinalTools
FinalTools --> AIModel[Available to AI Model]
NoMCP --> AIModel
```
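The decision path in the flowchart can be sketched as a single filtering function. This is a minimal illustration under stated assumptions: the `Tool` and `Filter` types are hypothetical, an empty include list is treated as "no restriction", and tools are identified as `client-tool`, following the `filesystem-read_file` style of the header example in this document.

```go
package main

import "fmt"

// Tool is a hypothetical (client, tool) pair used only for this sketch.
type Tool struct {
	Client string
	Name   string
}

// ID joins client and tool name, e.g. "filesystem-read_file".
func (t Tool) ID() string {
	return fmt.Sprintf("%s-%s", t.Client, t.Name)
}

// Filter holds the four lists from the flowchart.
// An empty include list means "no restriction".
type Filter struct {
	IncludeClients []string
	ExcludeClients []string
	IncludeTools   []string
	ExcludeTools   []string
}

func toSet(items []string) map[string]bool {
	set := make(map[string]bool, len(items))
	for _, item := range items {
		set[item] = true
	}
	return set
}

// Apply performs client-level filtering first, then tool-level filtering.
func (f Filter) Apply(mcpEnabled bool, tools []Tool) []Tool {
	if !mcpEnabled {
		return nil // global filter disabled: no MCP tools at all
	}
	incClients, excClients := toSet(f.IncludeClients), toSet(f.ExcludeClients)
	incTools, excTools := toSet(f.IncludeTools), toSet(f.ExcludeTools)

	var kept []Tool
	for _, t := range tools {
		if len(incClients) > 0 && !incClients[t.Client] {
			continue
		}
		if excClients[t.Client] {
			continue
		}
		if len(incTools) > 0 && !incTools[t.ID()] {
			continue
		}
		if excTools[t.ID()] {
			continue
		}
		kept = append(kept, t)
	}
	return kept
}

func main() {
	tools := []Tool{
		{"filesystem", "read_file"},
		{"filesystem", "write_file"},
		{"websearch", "search"},
	}
	f := Filter{
		IncludeClients: []string{"filesystem"},
		ExcludeTools:   []string{"filesystem-write_file"},
	}
	for _, t := range f.Apply(true, tools) {
		fmt.Println(t.ID()) // prints filesystem-read_file
	}
}
```

Exclusions are checked after inclusions at each level, so an excluded client or tool is removed even when it also appears in an include list.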
### **Filtering Configuration Levels**
**Request-Level Filtering:**
```bash
# Include only specific MCP clients
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-mcp-include-clients: filesystem,websearch" \
  -d '{"model": "gpt-4o-mini", "messages": [...]}'

# Include only specific tools
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-mcp-include-tools: filesystem-read_file,websearch-search" \
  -d '{"model": "gpt-4o-mini", "messages": [...]}'
```
**Configuration-Level Filtering:**
- **Client Selection** - Choose which MCP servers to connect to
- **Tool Blacklisting** - Permanently disable dangerous or unwanted tools
- **Permission Mapping** - Map user roles to available tool sets
- **Environment-Based** - Different tool sets for development vs production
**Security Benefits:**
- **Principle of Least Privilege** - Only necessary tools are exposed
- **Dynamic Access Control** - Per-request tool availability
- **Audit Trail** - Track which tools are used by which requests
- **Risk Mitigation** - Prevent access to dangerous operations
> **📖 Tool Filtering:** [MCP Tool Control →](../../mcp/filtering)
---
## Tool Execution Engine
### **Async Tool Execution Architecture**
The MCP execution engine handles tool invocation asynchronously to maintain system responsiveness and enable complex multi-tool workflows:
```mermaid
sequenceDiagram
participant AIModel
participant ExecutionEngine
participant ToolInvoker
participant MCPServer
participant ResultProcessor
AIModel->>ExecutionEngine: Tool Call Request
ExecutionEngine->>ExecutionEngine: Validate Tool Call
ExecutionEngine->>ToolInvoker: Queue Tool Execution
Note over ToolInvoker: Async Tool Execution
ToolInvoker->>MCPServer: Invoke Tool
MCPServer->>MCPServer: Execute Tool Logic
MCPServer-->>ToolInvoker: Raw Tool Result
ToolInvoker->>ResultProcessor: Process Result
ResultProcessor->>ResultProcessor: Format & Validate
ResultProcessor-->>ExecutionEngine: Processed Result
ExecutionEngine-->>AIModel: Tool Execution Complete
Note over AIModel: Multi-turn Conversation
AIModel->>ExecutionEngine: Continue with Tool Results
ExecutionEngine->>ExecutionEngine: Merge Results into Context
ExecutionEngine-->>AIModel: Enhanced Response
```
### **Execution Flow Characteristics**
**Validation Phase:**
- **Parameter Validation** - Ensure tool arguments match expected schema
- **Permission Checking** - Verify tool access permissions for the request
- **Rate Limiting** - Apply per-tool and per-user rate limits
- **Security Scanning** - Check for potentially dangerous operations
**Execution Phase:**
- **Timeout Management** - Bounded execution time to prevent hanging
- **Error Handling** - Graceful handling of tool failures and timeouts
- **Result Streaming** - Support for tools that return streaming responses
- **Resource Monitoring** - Track tool resource usage and performance
**Response Phase:**
- **Result Formatting** - Convert tool outputs to consistent format
- **Error Enrichment** - Add context and suggestions for tool failures
- **Multi-Result Aggregation** - Combine multiple tool outputs coherently
- **Context Integration** - Merge tool results into conversation context
### **Multi-Turn Conversation Support**
The MCP system supports multi-turn conversations in which the AI model typically works through the following stages:
1. **Initial Tool Discovery** - Request available tools for a given context
2. **Tool Execution** - Execute one or more tools based on user request
3. **Result Analysis** - Analyze tool outputs and determine next steps
4. **Follow-up Actions** - Execute additional tools based on previous results
5. **Response Synthesis** - Combine tool results into coherent user response
**Example Multi-Turn Flow:**
```
User: "Find recent news about AI and save interesting articles"
AI: → Execute web_search("AI news recent")
AI: → Analyze search results
AI: → Execute save_article() for each interesting result
AI: → Respond with summary of saved articles
```
### **Complete User-Controlled Tool Execution Flow**
The following diagram shows the end-to-end user experience with MCP tool execution, highlighting the critical user control points and decision-making process:
```mermaid
flowchart TD
A["👤 User Message<br/>\"List files in current directory\""] --> B["🤖 Bifrost Core"]
B --> C["🔧 MCP Manager<br/>Auto-discovers and adds<br/>available tools to request"]
C --> D["🌐 LLM Provider<br/>(OpenAI, Anthropic, etc.)"]
D --> E{"🔍 Response contains<br/>tool_calls?"}
E -->|No| F["✅ Final Response<br/>Display to user"]
E -->|Yes| G["📝 Add assistant message<br/>with tool_calls to history"]
G --> H["🛡️ YOUR EXECUTION LOGIC<br/>(Security, Approval, Logging)"]
H --> I{"🤔 User Decision Point<br/>Execute this tool?"}
I -->|Deny| J["❌ Create denial result<br/>Add to conversation history"]
I -->|Approve| K["⚙️ client.ExecuteMCPTool()<br/>Bifrost executes via MCP"]
K --> L["📊 Tool Result<br/>Add to conversation history"]
J --> M["🔄 Continue conversation loop<br/>Send updated history back to LLM"]
L --> M
M --> D
style A fill:#e1f5fe
style F fill:#e8f5e8
style H fill:#fff3e0
style I fill:#fce4ec
style K fill:#f3e5f5
```
**Key Flow Characteristics:**
**User Control Points:**
- **Security Layer** - Your application controls all tool execution decisions
- **Approval Gate** - Users can approve or deny each tool execution
- **Transparency** - Full visibility into what tools will be executed and why
- **Conversation Continuity** - Tool results seamlessly integrate into conversation flow
**Security Benefits:**
- **No Automatic Execution** - In this flow, tools never execute without explicit approval; automatic execution is opt-in through Agent Mode
- **Audit Trail** - Complete logging of all tool execution decisions
- **Contextual Security** - Approval decisions can consider full conversation context
- **Graceful Denials** - Denied tools result in informative responses, not errors
**Implementation Patterns:**
```go
// Example tool execution control in your application.
// Assumes `client` is your initialized Bifrost client and that the helpers
// (createDenialResponse, requiresApproval, promptUserForApproval,
// handleToolError, addToolResultToHistory) are defined by your application.
func handleToolExecution(ctx context.Context, toolCall schemas.ChatToolCall, userContext UserContext) error {
	// YOUR SECURITY AND APPROVAL LOGIC HERE
	if !userContext.HasPermission(toolCall.Function.Name) {
		return createDenialResponse("Tool not permitted for user role")
	}

	if requiresApproval(toolCall) {
		approved := promptUserForApproval(toolCall)
		if !approved {
			return createDenialResponse("User denied tool execution")
		}
	}

	// Execute the tool via Bifrost
	result, err := client.ExecuteMCPTool(ctx, toolCall)
	if err != nil {
		return handleToolError(err)
	}

	return addToolResultToHistory(result)
}
```
This flow ensures that while AI models can discover and request tools, every actual execution remains under user control, balancing AI capability with human oversight.
---
## Agent Mode Architecture
Agent Mode transforms Bifrost into an autonomous agent runtime by automatically executing pre-approved tools. This section details the internal architecture of the agent execution loop.
### **Agent Execution Loop**
The agent mode operates as an iterative loop that continues until one of the termination conditions is met:
```mermaid
flowchart TD
subgraph "Agent Mode Entry"
A["📥 Incoming Chat Request"] --> B{"🔍 Check MCP Config<br/>Any tools_to_auto_execute?"}
B -->|No| C["📤 Standard Flow<br/>Return tool_calls for manual execution"]
B -->|Yes| D["🤖 Enter Agent Loop"]
end
subgraph "Agent Execution Loop"
D --> E["🌐 Send to LLM Provider<br/>With available tools"]
E --> F{"🔧 Response has<br/>tool_calls?"}
F -->|No| G["✅ Return Final Response<br/>No more tools needed"]
F -->|Yes| H["📋 Classify Tool Calls"]
H --> I{"🔐 Separate by<br/>auto-execute status"}
I --> J["⚡ Auto-Executable Tools"]
I --> K["🛡️ Non-Auto-Executable Tools"]
J --> L["🔄 Execute in Parallel<br/>Via ToolsManager"]
L --> M["📊 Collect Results"]
K --> N{"Any non-auto<br/>tools found?"}
N -->|Yes| O["🛑 Exit Loop Early<br/>Return mixed response"]
N -->|No| P{"⏱️ Max depth<br/>reached?"}
M --> P
P -->|Yes| Q["⚠️ Return Current State<br/>May have pending tools"]
P -->|No| R["📝 Add results to history"]
R --> E
end
subgraph "Response Handling"
O --> S["📦 Create Mixed Response<br/>• Content: executed results JSON<br/>• tool_calls: pending tools<br/>• finish_reason: stop"]
G --> T["📦 Standard Response<br/>Final answer from LLM"]
Q --> U["📦 Depth Limit Response<br/>Current state with any pending"]
end
style D fill:#e3f2fd
style L fill:#e8f5e9
style O fill:#fff3e0
style S fill:#fce4ec
```
### **Tool Classification System**
When the LLM returns tool calls, Bifrost classifies each tool based on the client configuration:
```mermaid
flowchart LR
subgraph "Tool Call Classification"
TC["🔧 Tool Call<br/>from LLM Response"] --> CHECK{"Tool in<br/>tools_to_execute?"}
CHECK -->|No| SKIP["❌ Skip<br/>Not allowed"]
CHECK -->|Yes| AUTO{"Tool in<br/>tools_to_auto_execute?"}
AUTO -->|Yes| EXEC["⚡ Auto-Execute<br/>Run immediately"]
AUTO -->|No| MANUAL["🛡️ Manual<br/>Return to caller"]
end
subgraph "Configuration Example"
CONFIG["MCPClientConfig"]
CONFIG --> TE["tools_to_execute: [*]<br/>All tools available"]
CONFIG --> TAE["tools_to_auto_execute:<br/>[read_file, list_dir]"]
end
style EXEC fill:#c8e6c9
style MANUAL fill:#fff9c4
style SKIP fill:#ffcdd2
```
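The classification rule in the diagram reduces to two list lookups. The sketch below uses hypothetical names (`Decision`, `Classify`) and assumes that `*` acts as a wildcard in either list; the diagram only shows the wildcard for `tools_to_execute`.

```go
package main

import "fmt"

// Decision is the outcome of classifying one tool call.
type Decision int

const (
	Skip        Decision = iota // not in tools_to_execute
	AutoExecute                 // allowed and listed in tools_to_auto_execute
	Manual                      // allowed, but returned to the caller for approval
)

func (d Decision) String() string {
	switch d {
	case Skip:
		return "skip"
	case AutoExecute:
		return "auto-execute"
	case Manual:
		return "manual"
	default:
		return fmt.Sprintf("Decision(%d)", int(d))
	}
}

// matches reports whether name is listed, treating "*" as a wildcard
// (an assumption of this sketch).
func matches(list []string, name string) bool {
	for _, item := range list {
		if item == "*" || item == name {
			return true
		}
	}
	return false
}

// Classify applies the two checks from the diagram in order.
func Classify(name string, toExecute, toAutoExecute []string) Decision {
	if !matches(toExecute, name) {
		return Skip
	}
	if matches(toAutoExecute, name) {
		return AutoExecute
	}
	return Manual
}

func main() {
	toExecute := []string{"*"}
	toAuto := []string{"read_file", "list_dir"}
	for _, name := range []string{"read_file", "write_file"} {
		fmt.Printf("%s: %s\n", name, Classify(name, toExecute, toAuto))
	}
}
```

With the configuration from the diagram, `read_file` is auto-executed and `write_file` is returned to the caller.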
### **Mixed Tool Response Format**
When a response contains both auto-executable and non-auto-executable tools, the agent creates a special response format:
<AccordionGroup>
<Accordion title="Chat API Response Format" icon="message" defaultOpen>
```json
{
"id": "chatcmpl-abc123",
"choices": [{
"index": 0,
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "The Output from allowed tools calls is - {\"filesystem_read_file\":\"file contents here\",\"filesystem_list_directory\":\"[\\\"file1.txt\\\",\\\"file2.txt\\\"]\"}\n\nNow I shall call these tools next...",
"tool_calls": [
{
"id": "call_write_123",
"type": "function",
"function": {
"name": "filesystem_write_file",
"arguments": "{\"path\":\"output.txt\",\"content\":\"...\"}"
}
}
]
}
}]
}
```
<Note>
The `content` field contains JSON-formatted results from auto-executed tools. The `tool_calls` array contains only non-auto-executable tools awaiting approval. Setting `finish_reason` to `"stop"` ensures the agent loop exits.
</Note>
</Accordion>
<Accordion title="Responses API Format" icon="code">
```json
{
"id": "resp-abc123",
"output": [
{
"type": "message",
"role": "assistant",
"content": [{
"type": "text",
"text": "The Output from allowed tools calls is - {...}\n\nNow I shall call these tools next..."
}]
},
{
"type": "function_call",
"role": "assistant",
"call_id": "call_write_123",
"name": "filesystem_write_file",
"arguments": "{\"path\":\"output.txt\",\"content\":\"...\"}"
}
]
}
```
</Accordion>
</AccordionGroup>
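On the client side, a mixed Chat API response can be handled by decoding it and collecting the tool calls that still await approval. The struct definitions below are a hand-written subset that mirrors the JSON shown above, not types taken from the Bifrost SDK, and `PendingToolCalls` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ToolCall mirrors one entry of the tool_calls array.
type ToolCall struct {
	ID       string `json:"id"`
	Type     string `json:"type"`
	Function struct {
		Name      string `json:"name"`
		Arguments string `json:"arguments"` // JSON document encoded as a string
	} `json:"function"`
}

// ChatResponse covers only the fields this sketch needs.
type ChatResponse struct {
	Choices []struct {
		FinishReason string `json:"finish_reason"`
		Message      struct {
			Content   string     `json:"content"`
			ToolCalls []ToolCall `json:"tool_calls"`
		} `json:"message"`
	} `json:"choices"`
}

// PendingToolCalls returns the tool calls that were not auto-executed.
func PendingToolCalls(raw []byte) ([]ToolCall, error) {
	var resp ChatResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, fmt.Errorf("decode chat response: %w", err)
	}
	var pending []ToolCall
	for _, choice := range resp.Choices {
		pending = append(pending, choice.Message.ToolCalls...)
	}
	return pending, nil
}

const sample = `{
  "id": "chatcmpl-abc123",
  "choices": [{
    "index": 0,
    "finish_reason": "stop",
    "message": {
      "role": "assistant",
      "content": "The Output from allowed tools calls is - {}",
      "tool_calls": [{
        "id": "call_write_123",
        "type": "function",
        "function": {
          "name": "filesystem_write_file",
          "arguments": "{\"path\":\"output.txt\",\"content\":\"...\"}"
        }
      }]
    }
  }]
}`

func main() {
	calls, err := PendingToolCalls([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range calls {
		// Arguments is itself JSON, so it needs a second decode.
		var args map[string]string
		if err := json.Unmarshal([]byte(c.Function.Arguments), &args); err != nil {
			fmt.Println("bad arguments:", err)
			continue
		}
		fmt.Println(c.Function.Name, args["path"])
	}
}
```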
### **Agent Depth Control**
The `max_agent_depth` setting prevents infinite loops and controls resource usage:
```mermaid
graph LR
subgraph "Depth Tracking"
D0["Depth 0<br/>Initial Request"] --> D1["Depth 1<br/>First tool execution"]
D1 --> D2["Depth 2<br/>Second iteration"]
D2 --> D3["Depth 3<br/>..."]
D3 --> DN["Depth N<br/>Max reached"]
end
DN --> EXIT["🛑 Force Exit<br/>Return current state"]
subgraph "Configuration"
CFG["MCPToolManagerConfig"]
CFG --> MAX["max_agent_depth: 10<br/>(default)"]
CFG --> TIMEOUT["tool_execution_timeout:<br/>30s per tool"]
end
```
<Warning>
When max depth is reached, the response may contain pending tool calls that weren't executed. Your application should check the response for remaining tool calls and either execute them manually or surface them to the user.
</Warning>
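A depth-limited agent loop can be sketched as follows. This is a simplified model under stated assumptions: `Step`, `Outcome`, and `RunAgent` are hypothetical names, the LLM is replaced by a plain function, and tools run sequentially here even though the architecture above executes auto-executable tools in parallel.

```go
package main

import "fmt"

// Step is what one (hypothetical) LLM call returns: either a final answer
// or a list of requested tool names.
type Step struct {
	Answer    string
	ToolCalls []string
}

// Outcome summarises how the loop ended.
type Outcome struct {
	Answer      string
	Pending     []string // non-auto tools handed back to the caller
	Iterations  int
	HitMaxDepth bool
}

// RunAgent loops until the LLM stops requesting tools, a non-auto tool
// appears, or maxDepth iterations have been used.
func RunAgent(llm func(history []string) Step, run func(tool string) string, isAuto func(tool string) bool, maxDepth int) Outcome {
	var history []string
	for depth := 0; depth < maxDepth; depth++ {
		step := llm(history)
		if len(step.ToolCalls) == 0 {
			return Outcome{Answer: step.Answer, Iterations: depth}
		}
		var pending []string
		for _, tool := range step.ToolCalls {
			if isAuto(tool) {
				history = append(history, fmt.Sprintf("%s => %s", tool, run(tool)))
			} else {
				pending = append(pending, tool)
			}
		}
		if len(pending) > 0 {
			return Outcome{Pending: pending, Iterations: depth + 1}
		}
	}
	return Outcome{Iterations: maxDepth, HitMaxDepth: true}
}

func main() {
	// A fake LLM that keeps asking for read_file until it has seen two results.
	llm := func(history []string) Step {
		if len(history) >= 2 {
			return Step{Answer: "done"}
		}
		return Step{ToolCalls: []string{"read_file"}}
	}
	run := func(tool string) string { return "result of " + tool }
	isAuto := func(tool string) bool { return tool == "read_file" }

	fmt.Printf("%+v\n", RunAgent(llm, run, isAuto, 10)) // finishes with an answer
	fmt.Printf("%+v\n", RunAgent(llm, run, isAuto, 1))  // stops at the depth limit
}
```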
---
## Code Mode Architecture
Code Mode enables AI models to write and execute Python-like code, written in Starlark, that orchestrates multiple MCP tools in a single request. This provides a meta-layer for complex multi-tool workflows.
### **Code Mode System Overview**
```mermaid
graph TB
subgraph "Code Mode Components"
VM["🖥️ Starlark Interpreter<br/>Python-like Runtime"]
VFS["📁 Virtual File System<br/>Tool Definitions as .pyi"]
EXEC["⚙️ Code Executor<br/>Sandboxed Execution"]
end
subgraph "Meta Tools"
LIST["listToolFiles()<br/>Discover available servers"]
READ["readToolFile(fileName)<br/>Get tool signatures"]
DOCS["getToolDocs(server, tool)<br/>Get detailed docs"]
CODE["executeToolCode(code)<br/>Run Python code"]
end
subgraph "MCP Integration"
TOOLS["🔧 Connected MCP Tools"]
RESULTS["📊 Tool Results"]
end
LLM["🤖 LLM"] --> LIST
LIST --> VFS
VFS --> LLM
LLM --> READ
READ --> VFS
VFS --> LLM
LLM --> DOCS
DOCS --> VFS
VFS --> LLM
LLM --> CODE
CODE --> VM
VM --> EXEC
EXEC --> TOOLS
TOOLS --> RESULTS
RESULTS --> LLM
style VM fill:#e8eaf6
style VFS fill:#e3f2fd
style CODE fill:#e8f5e9
```
### **Virtual File System (VFS)**
Code Mode generates Python stub files (`.pyi`) for all connected MCP tools, providing compact function signatures:
<Tabs>
<Tab title="Server-Level Binding">
When `code_mode_binding_level: "server"` (default), tools are grouped by MCP client:
```
servers/
├── filesystem.pyi → All filesystem tools
├── web_search.pyi → All web search tools
└── database.pyi → All database tools
```
**Generated Stub Example:**
```python
# servers/filesystem.pyi
# Usage: filesystem.tool_name(param=value)
# For detailed docs: use getToolDocs(server="filesystem", tool="tool_name")
def read_file(path: str) -> dict: # Read contents of a file
def write_file(path: str, content: str) -> dict: # Write content to a file
def list_directory(path: str) -> dict: # List directory contents
```
**Usage in Code:**
```python
files = filesystem.list_directory(path=".")
content = filesystem.read_file(path=files["entries"][0])
result = content
```
</Tab>
<Tab title="Tool-Level Binding">
When `code_mode_binding_level: "tool"`, each tool gets its own file:
```
servers/
├── filesystem/
│ ├── read_file.pyi
│ ├── write_file.pyi
│ └── list_directory.pyi
├── web_search/
│ └── search.pyi
└── database/
└── query.pyi
```
**Generated Stub Example:**
```python
# servers/filesystem/read_file.pyi
# Usage: filesystem.read_file(param=value)
def read_file(path: str) -> dict: # Read contents of a file
```
**Usage in Code:**
```python
content = filesystem.read_file(path="config.json")
result = content
```
</Tab>
</Tabs>
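To make the stub format concrete, the following sketch renders a server-level stub from a list of tool definitions. `ToolDef`, `Param`, and `RenderStub` are hypothetical names, and the output only imitates the stub examples shown in the tabs above; the actual generator may emit additional header comments.

```go
package main

import (
	"fmt"
	"strings"
)

// Param is one typed parameter of a tool.
type Param struct {
	Name string
	Type string // Python-style type name, e.g. "str"
}

// ToolDef is a hypothetical, minimal tool definition.
type ToolDef struct {
	Name        string
	Description string
	Params      []Param
}

// RenderStub produces one compact signature line per tool.
func RenderStub(server string, tools []ToolDef) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# servers/%s.pyi\n", server)
	for _, t := range tools {
		parts := make([]string, 0, len(t.Params))
		for _, p := range t.Params {
			parts = append(parts, p.Name+": "+p.Type)
		}
		fmt.Fprintf(&b, "def %s(%s) -> dict: # %s\n", t.Name, strings.Join(parts, ", "), t.Description)
	}
	return b.String()
}

func main() {
	tools := []ToolDef{
		{Name: "read_file", Description: "Read contents of a file", Params: []Param{{"path", "str"}}},
		{Name: "write_file", Description: "Write content to a file", Params: []Param{{"path", "str"}, {"content", "str"}}},
	}
	fmt.Print(RenderStub("filesystem", tools))
}
```

Keeping each tool to a single signature line is what makes the stubs compact enough to place in the model's context.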
### **Code Execution Flow**
```mermaid
sequenceDiagram
participant LLM as 🤖 LLM
participant CM as 📝 Code Mode Handler
participant VM as 🖥️ Starlark Interpreter
participant TM as 🔧 Tools Manager
participant MCP as 🌐 MCP Servers
LLM->>CM: executeToolCode({ code: "..." })
CM->>VM: Initialize sandbox
CM->>VM: Inject tool bindings
CM->>VM: Execute Python code
loop For each tool call in code
VM->>TM: server.tool(param=value)
TM->>MCP: Execute tool
MCP-->>TM: Tool result
TM-->>VM: Return result
end
VM-->>CM: Execution result
CM-->>LLM: { result, logs }
```
### **Starlark Sandbox**
The code execution environment is carefully sandboxed using Starlark, a Python-like language designed for configuration and embedded scripting:
<AccordionGroup>
<Accordion title="Available Features" icon="check" defaultOpen>
- ✅ **Python-like syntax** - Familiar Python syntax and semantics
- ✅ **Synchronous calls** - No async/await needed, direct function calls
- ✅ **List comprehensions** - `[x for x in items if condition]`
- ✅ **print()** - Output captured and returned in logs
- ✅ **Dict/List operations** - Standard Python data structures
- ✅ **Tool bindings** - All connected MCP tools as globals
</Accordion>
<Accordion title="Restricted Features" icon="ban">
- ❌ **Imports** - No `import` statements (tools are pre-bound)
- ❌ **Classes** - Use dicts and functions instead
- ❌ **File I/O** - No direct filesystem access (use MCP tools)
- ❌ **Network** - No direct network access (use MCP tools)
- ❌ **Randomness/Time** - Deterministic execution only
</Accordion>
</AccordionGroup>
### **Code Mode Security Model**
```mermaid
graph TB
subgraph "Security Layers"
L1["🔒 Code Validation<br/>Syntax checking before execution"]
L2["🛡️ Sandboxed Runtime<br/>No external module access"]
L3["⏱️ Execution Timeout<br/>Bounded runtime"]
L4["🔐 Tool ACL<br/>Only allowed tools accessible"]
end
subgraph "Execution Boundaries"
B1["No filesystem access<br/>(except via MCP tools)"]
B2["No network access<br/>(except via MCP tools)"]
B3["No process spawning"]
B4["Memory isolation enforced"]
end
L1 --> L2 --> L3 --> L4
L4 --> B1
L4 --> B2
L4 --> B3
L4 --> B4
```
### **Code Mode Configuration**
<Tabs>
<Tab title="Gateway (config.json)">
```json
{
"mcp": {
"client_configs": [
{
"name": "filesystem",
"is_code_mode_client": true,
"connection_type": "stdio",
"stdio_config": {
"command": "npx",
"args": ["-y", "@anthropic/mcp-filesystem"]
},
"tools_to_execute": ["*"]
}
],
"tool_manager_config": {
"code_mode_binding_level": "server",
"tool_execution_timeout": "30s"
}
}
}
```
</Tab>
<Tab title="Go SDK">
```go
mcpConfig := &schemas.MCPConfig{
ClientConfigs: []schemas.MCPClientConfig{
{
Name: "filesystem",
IsCodeModeClient: true,
ConnectionType: schemas.MCPConnectionTypeSTDIO,
StdioConfig: &schemas.MCPStdioConfig{
Command: "npx",
Args: []string{"-y", "@anthropic/mcp-filesystem"},
},
ToolsToExecute: []string{"*"},
},
},
ToolManagerConfig: &schemas.MCPToolManagerConfig{
CodeModeBindingLevel: schemas.CodeModeBindingLevelServer,
ToolExecutionTimeout: 30 * time.Second,
},
}
```
</Tab>
</Tabs>
### **Code Mode vs Agent Mode**
| Aspect | Agent Mode | Code Mode |
|--------|------------|-----------|
| **Execution Model** | LLM decides one tool at a time | LLM writes code orchestrating multiple tools |
| **Iterations** | Multiple LLM round-trips | Single LLM call, code handles orchestration |
| **Complexity** | Simple tool chains | Complex workflows with conditionals/loops |
| **Latency** | Higher (multiple LLM calls) | Lower (single LLM call + code execution) |
| **Control** | Per-tool approval possible | Code runs atomically |
| **Best For** | Interactive agents | Batch operations, complex data processing |
---
## MCP Integration Patterns
### **Common Integration Scenarios**
**1. Filesystem Operations**
- **Tools:** `list_files`, `read_file`, `write_file`, `create_directory`
- **Use Cases:** Code analysis, document processing, file management
- **Security:** Sandboxed file access, path validation, permission checks
- **Performance:** Local execution for fast file operations
**2. Web Search & Information Retrieval**
- **Tools:** `web_search`, `fetch_url`, `extract_content`, `summarize`
- **Use Cases:** Research assistance, fact-checking, content gathering
- **Integration:** External search APIs, content parsing services
- **Caching:** Response caching for repeated queries
**3. Database Operations**
- **Tools:** `query_database`, `insert_record`, `update_record`, `schema_info`
- **Use Cases:** Data analysis, report generation, database administration
- **Security:** Read-only access by default, query validation, injection prevention
- **Performance:** Connection pooling, query optimization
**4. API Integrations**
- **Tools:** Custom business logic tools, third-party service integration
- **Use Cases:** CRM operations, payment processing, notification sending
- **Authentication:** API key management, OAuth token handling
- **Error Handling:** Retry logic, fallback mechanisms
### **MCP Server Development Patterns**
**Simple STDIO Server:**
- **Language:** Any language that can read/write JSON to stdin/stdout
- **Deployment:** Single executable, minimal dependencies
- **Use Case:** Local tools, development utilities, simple scripts
**HTTP Service Server:**
- **Architecture:** RESTful API with MCP protocol endpoints
- **Scalability:** Horizontal scaling, load balancing
- **Use Case:** Shared tools, enterprise integrations, cloud services
**Hybrid Approach:**
- **Local + Remote:** Combine STDIO tools for local operations with HTTP for remote services
- **Failover:** Use local fallbacks when remote services are unavailable
- **Optimization:** Route tool calls to most appropriate execution environment
> **📖 MCP Development:** [Tool Development Guide →](../../mcp/overview)
---
## Security & Safety Considerations
### **MCP Security Architecture**
```mermaid
graph TB
subgraph "Security Layers"
L1[Connection Security<br/>Authentication & Encryption]
L2[Tool Validation<br/>Schema & Permission Checks]
L3[Execution Security<br/>Sandboxing & Limits]
L4[Result Security<br/>Output Validation & Filtering]
end
subgraph "Threat Mitigation"
T1[Malicious Tools<br/>Code Injection Prevention]
T2[Resource Abuse<br/>Rate Limiting & Quotas]
T3[Data Exposure<br/>Output Sanitization]
T4[System Access<br/>Privilege Isolation]
end
L1 --> T1
L2 --> T2
L3 --> T4
L4 --> T3
```
**Security Measures:**
**Connection Security:**
- **Authentication** - API keys, certificates, or token-based auth for HTTP/SSE
- **Encryption** - TLS for HTTP connections, secure pipes for STDIO
- **Network Isolation** - Firewall rules and network segmentation
**Execution Security:**
- **Sandboxing** - Isolated execution environments for tools
- **Resource Limits** - CPU, memory, and time constraints
- **Permission Model** - Principle of least privilege for tool access
**Operational Security:**
- **Regular Updates** - Keep MCP servers and tools updated
- **Monitoring** - Continuous security monitoring and alerting
- **Incident Response** - Procedures for security incidents involving tools
---
## Related Architecture Documentation
- **[Request Flow](./request-flow)** - MCP integration in request processing
- **[Concurrency Model](./concurrency)** - MCP concurrency and worker integration
- **[Plugin System](./plugins)** - Integration between MCP and plugin systems
- **[Benchmarks](../../benchmarking/getting-started)** - MCP performance impact and optimization

View File

@@ -0,0 +1,552 @@
---
title: "Plugins"
description: "Deep dive into Bifrost's extensible plugin architecture - how plugins work internally, lifecycle management, execution model, and integration patterns."
icon: "puzzle-piece"
---
## Plugin Architecture Philosophy
### **Core Design Principles**
Bifrost's plugin system is built around five key principles that ensure extensibility without compromising performance or reliability:
| Principle | Implementation | Benefit |
| ----------------------------- | ------------------------------------------------ | ------------------------------------------------ |
| **Plugin-First Design** | Core logic designed around plugin hook points | Maximum extensibility without core modifications |
| **Zero-Copy Integration** | Direct memory access to request/response objects | Minimal performance overhead |
| **Lifecycle Management** | Complete plugin lifecycle with automatic cleanup | Resource safety and leak prevention |
| **Interface-Based Safety** | Well-defined interfaces for type safety | Compile-time validation and consistency |
| **Failure Isolation** | Plugin errors don't crash the core system | Fault tolerance and system stability |
### **Plugin System Overview**
```mermaid
graph TB
subgraph "Plugin Management Layer"
PluginMgr[Plugin Manager<br/>Central Controller]
Registry[Plugin Registry<br/>Discovery & Loading]
Lifecycle[Lifecycle Manager<br/>State Management]
end
subgraph "Plugin Execution Layer"
Pipeline[Plugin Pipeline<br/>Execution Orchestrator]
PreHooks[Pre-Processing Hooks<br/>Request Modification]
PostHooks[Post-Processing Hooks<br/>Response Enhancement]
end
subgraph "Plugin Categories"
Auth[Authentication<br/>& Authorization]
RateLimit[Rate Limiting<br/>& Throttling]
Transform[Data Transformation<br/>& Validation]
Monitor[Monitoring<br/>& Analytics]
Custom[Custom Business<br/>Logic]
end
PluginMgr --> Registry
Registry --> Lifecycle
Lifecycle --> Pipeline
Pipeline --> PreHooks
Pipeline --> PostHooks
PreHooks --> Auth
PreHooks --> RateLimit
PostHooks --> Transform
PostHooks --> Monitor
PostHooks --> Custom
```
---
## Plugin Lifecycle Management
### **Complete Lifecycle States**
Every plugin goes through a well-defined lifecycle that ensures proper resource management and error handling:
```mermaid
stateDiagram-v2
[*] --> PluginInit: Plugin Creation
PluginInit --> Registered: Add to BifrostConfig
Registered --> PreHookCall: Request Received
PreHookCall --> ModifyRequest: Normal Flow
PreHookCall --> ShortCircuitResponse: Return Response
PreHookCall --> ShortCircuitError: Return Error
ModifyRequest --> ProviderCall: Send to Provider
ProviderCall --> PostHookCall: Receive Response
ShortCircuitResponse --> PostHookCall: Skip Provider
ShortCircuitError --> PostHookCall: Pipeline Symmetry
PostHookCall --> ModifyResponse: Process Result
PostHookCall --> RecoverError: Error Recovery
PostHookCall --> FallbackCheck: Check AllowFallbacks
PostHookCall --> ResponseReady: Pass Through
FallbackCheck --> TryFallback: AllowFallbacks=true/nil
FallbackCheck --> ResponseReady: AllowFallbacks=false
TryFallback --> PreHookCall: Next Provider
ModifyResponse --> ResponseReady: Modified
RecoverError --> ResponseReady: Recovered
ResponseReady --> [*]: Return to Client
Registered --> CleanupCall: Bifrost Shutdown
CleanupCall --> [*]: Plugin Destroyed
```
### **Lifecycle Phase Details**
**Discovery Phase:**
- **Purpose:** Find and catalog available plugins
- **Sources:** Command line, environment variables, JSON configuration, directory scanning
- **Validation:** Basic existence and format checks
- **Output:** Plugin descriptors with metadata
**Loading Phase:**
- **Purpose:** Load plugin binaries into memory
- **Security:** Digital signature verification and checksum validation
- **Compatibility:** Interface implementation validation
- **Resource:** Memory and capability assessment
**Initialization Phase:**
- **Purpose:** Configure plugin with runtime settings
- **Timeout:** Bounded initialization time to prevent hanging
- **Dependencies:** External service connectivity verification
- **State:** Internal state setup and resource allocation
**Runtime Phase:**
- **Purpose:** Active request processing
- **Monitoring:** Continuous health checking and performance tracking
- **Recovery:** Automatic error recovery and degraded mode handling
- **Metrics:** Real-time performance and health metrics collection
> **Plugin Lifecycle:** [Plugin Management →](../../enterprise/custom-plugins)
---
## Plugin Execution Pipeline
### **Request Processing Flow**
The plugin pipeline ensures consistent, predictable execution while maintaining high performance:
#### **Normal Execution Flow (No Short-Circuit)**
```mermaid
sequenceDiagram
participant Client
participant Bifrost
participant Plugin1
participant Plugin2
participant Provider
Client->>Bifrost: Request
Bifrost->>Plugin1: PreLLMHook(request)
Plugin1-->>Bifrost: modified request
Bifrost->>Plugin2: PreLLMHook(request)
Plugin2-->>Bifrost: modified request
Bifrost->>Provider: API Call
Provider-->>Bifrost: response
Bifrost->>Plugin2: PostLLMHook(response)
Plugin2-->>Bifrost: modified response
Bifrost->>Plugin1: PostLLMHook(response)
Plugin1-->>Bifrost: modified response
Bifrost-->>Client: Final Response
```
**Execution Order:**
1. **PreHooks:** Execute in registration order (1 → 2 → N)
2. **Provider Call:** If no short-circuit occurred
3. **PostHooks:** Execute in reverse order (N → 2 → 1)
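The ordering rules can be sketched with a simplified pipeline. The `Plugin` struct below is a hypothetical stand-in whose hook signatures are much simpler than Bifrost's real plugin interface; it only demonstrates pre-hooks running in registration order, the provider call being skipped on a short-circuit, and post-hooks running in reverse for every plugin whose pre-hook ran.

```go
package main

import "fmt"

// Plugin is a simplified, hypothetical stand-in for a Bifrost plugin.
type Plugin struct {
	Name string
	// Pre returns the (possibly modified) request. A non-nil second value
	// short-circuits the pipeline with that response.
	Pre  func(req string) (string, *string)
	Post func(resp string) string
}

// Run executes pre-hooks in registration order, calls the provider unless a
// plugin short-circuited, then runs post-hooks in reverse order.
func Run(plugins []Plugin, req string, provider func(string) string) string {
	var resp string
	executed, shortCircuited := 0, false
	for _, p := range plugins {
		executed++
		var early *string
		req, early = p.Pre(req)
		if early != nil {
			resp, shortCircuited = *early, true
			break
		}
	}
	if !shortCircuited {
		resp = provider(req)
	}
	// Only plugins whose pre-hook ran receive a post-hook call.
	for i := executed - 1; i >= 0; i-- {
		resp = plugins[i].Post(resp)
	}
	return resp
}

// Tracer builds a pass-through plugin that records its hook calls in trace.
func Tracer(name string, trace *[]string) Plugin {
	return Plugin{
		Name: name,
		Pre: func(req string) (string, *string) {
			*trace = append(*trace, fmt.Sprintf("pre:%s", name))
			return req, nil
		},
		Post: func(resp string) string {
			*trace = append(*trace, fmt.Sprintf("post:%s", name))
			return resp
		},
	}
}

func main() {
	var trace []string
	plugins := []Plugin{Tracer("1", &trace), Tracer("2", &trace)}
	Run(plugins, "hello", func(req string) string {
		trace = append(trace, "provider")
		return "response to " + req
	})
	fmt.Println(trace) // prints [pre:1 pre:2 provider post:2 post:1]
}
```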
#### **Short-Circuit Response Flow (Cache Hit)**
```mermaid
sequenceDiagram
participant Client
participant Bifrost
participant Cache
participant Auth
participant Provider
Client->>Bifrost: Request
Bifrost->>Auth: PreLLMHook(request)
Auth-->>Bifrost: modified request
Bifrost->>Cache: PreLLMHook(request)
Cache-->>Bifrost: LLMPluginShortCircuit{Response}
Note over Provider: Provider call skipped
Bifrost->>Cache: PostLLMHook(response)
Cache-->>Bifrost: modified response
Bifrost->>Auth: PostLLMHook(response)
Auth-->>Bifrost: modified response
Bifrost-->>Client: Cached Response
```
#### **Streaming Response Flow**
For streaming responses, the plugin pipeline executes post-hooks for every delta/chunk received from the provider:
```mermaid
sequenceDiagram
participant Client
participant Bifrost
participant Plugin1
participant Plugin2
participant Provider
Client->>Bifrost: Stream Request
Bifrost->>Plugin1: PreLLMHook(request)
Plugin1-->>Bifrost: modified request
Bifrost->>Plugin2: PreLLMHook(request)
Plugin2-->>Bifrost: modified request
Bifrost->>Provider: Stream API Call
loop For Each Delta
Provider-->>Bifrost: stream delta
Bifrost->>Plugin2: PostLLMHook(delta)
Plugin2-->>Bifrost: modified delta
Bifrost->>Plugin1: PostLLMHook(delta)
Plugin1-->>Bifrost: modified delta
Bifrost-->>Client: Send Delta
end
Provider-->>Bifrost: final chunk (finish reason)
Bifrost->>Plugin2: PostLLMHook(final)
Plugin2-->>Bifrost: modified final
Bifrost->>Plugin1: PostLLMHook(final)
Plugin1-->>Bifrost: modified final
Bifrost-->>Client: Final Chunk
```
**Streaming Execution Characteristics:**
1. **Delta Processing:**
- Each stream delta (chunk) goes through all post-hooks
- Plugins can modify/transform each delta before it reaches the client
- Deltas can contain: text content, tool calls, role changes, or usage info
2. **Special Delta Types:**
- **Start Event:** Initial delta with role information
- **Content Delta:** Regular text or tool call content
- **Usage Update:** Token usage statistics (if enabled)
- **Final Chunk:** Contains finish reason and any final metadata
3. **Plugin Considerations:**
- Plugins must handle streaming responses efficiently
- Each delta should be processed quickly to maintain stream responsiveness
- Plugins can track state across deltas using context
- Heavy processing should be done asynchronously
4. **Error Handling:**
- If a post-hook returns an error, it's sent as an error stream chunk
- Stream is terminated after error chunks
- Plugins can recover from errors by providing valid responses
5. **Performance Optimization:**
- Lightweight delta processing to minimize latency
- Object pooling for common data structures
- Non-blocking operations for logging and metrics
- Efficient memory management for stream processing
> **Streaming Details:** [Streaming Guide →](../../quickstart/gateway/streaming)
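Per-delta post-hook processing can be pictured as a small channel pipeline. The `Delta` and `PostHook` types are hypothetical simplifications of the real streaming types; the sketch shows each chunk passing through the hooks in reverse registration order before being forwarded.

```go
package main

import (
	"fmt"
	"strings"
)

// Delta is a simplified stream chunk; FinishReason is set only on the last one.
type Delta struct {
	Content      string
	FinishReason string
}

func (d Delta) String() string {
	return fmt.Sprintf("content=%q finish=%q", d.Content, d.FinishReason)
}

// PostHook transforms one delta.
type PostHook func(Delta) Delta

// Upper is a sample hook that upper-cases delta content.
func Upper(d Delta) Delta {
	d.Content = strings.ToUpper(d.Content)
	return d
}

// Stream applies the hooks in reverse registration order to every delta
// read from in, and forwards the result on the returned channel.
func Stream(in <-chan Delta, hooks []PostHook) <-chan Delta {
	out := make(chan Delta)
	go func() {
		defer close(out)
		for d := range in {
			for i := len(hooks) - 1; i >= 0; i-- {
				d = hooks[i](d)
			}
			out <- d
		}
	}()
	return out
}

func main() {
	in := make(chan Delta, 3)
	in <- Delta{Content: "hel"}
	in <- Delta{Content: "lo"}
	in <- Delta{FinishReason: "stop"}
	close(in)

	for d := range Stream(in, []PostHook{Upper}) {
		fmt.Println(d)
	}
}
```

Because every hook runs once per chunk, any slow work inside a hook adds latency to each delta, which is why heavy processing belongs off the hot path.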
**Short-Circuit Rules:**
- **Provider Skipped:** When plugin returns short-circuit response/error
- **PostLLMHook Guarantee:** All executed PreHooks get corresponding PostLLMHook calls
- **Reverse Order:** PostHooks execute in reverse order of PreHooks
#### **Short-Circuit Error Flow (Allow Fallbacks)**
```mermaid
sequenceDiagram
participant Client
participant Bifrost
participant Plugin1
participant Provider1
participant Provider2
Client->>Bifrost: Request (Provider1 + Fallback Provider2)
Bifrost->>Plugin1: PreLLMHook(request)
Plugin1-->>Bifrost: LLMPluginShortCircuit{Error, AllowFallbacks=true}
Note over Provider1: Provider1 call skipped
Bifrost->>Plugin1: PostLLMHook(error)
Plugin1-->>Bifrost: error unchanged
Note over Bifrost: Try fallback provider
Bifrost->>Plugin1: PreLLMHook(request for Provider2)
Plugin1-->>Bifrost: modified request
Bifrost->>Provider2: API Call
Provider2-->>Bifrost: response
Bifrost->>Plugin1: PostLLMHook(response)
Plugin1-->>Bifrost: modified response
Bifrost-->>Client: Final Response
```
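The fallback decision shown in this diagram can be expressed with a pointer-valued flag, where an unset value behaves like `true`. The `ShortCircuitError` type and `TryProviders` function are hypothetical; treating ordinary provider errors as eligible for fallback is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// ShortCircuitError is a hypothetical error carrying the AllowFallbacks flag.
type ShortCircuitError struct {
	Message        string
	AllowFallbacks *bool // nil is treated like true
}

func (e *ShortCircuitError) Error() string { return e.Message }

// shouldFallback reports whether the next provider may be tried.
func shouldFallback(err error) bool {
	var sc *ShortCircuitError
	if errors.As(err, &sc) {
		return sc.AllowFallbacks == nil || *sc.AllowFallbacks
	}
	return true // assumption: plain provider errors may fall back
}

// TryProviders calls each provider in order until one succeeds or an error
// forbids further fallbacks.
func TryProviders(providers []string, call func(provider string) (string, error)) (string, error) {
	var lastErr error
	for _, p := range providers {
		resp, err := call(p)
		if err == nil {
			return resp, nil
		}
		lastErr = fmt.Errorf("%s: %w", p, err)
		if !shouldFallback(err) {
			break
		}
	}
	if lastErr == nil {
		lastErr = errors.New("no providers configured")
	}
	return "", lastErr
}

func main() {
	call := func(provider string) (string, error) {
		if provider == "provider1" {
			// AllowFallbacks left nil, so the fallback provider is tried.
			return "", &ShortCircuitError{Message: "rate limited"}
		}
		return "response from " + provider, nil
	}
	fmt.Println(TryProviders([]string{"provider1", "provider2"}, call))
}
```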
#### **Error Recovery Flow**
```mermaid
sequenceDiagram
participant Client
participant Bifrost
participant Plugin1
participant Plugin2
participant Provider
participant RecoveryPlugin
Client->>Bifrost: Request
Bifrost->>Plugin1: PreLLMHook(request)
Plugin1-->>Bifrost: modified request
Bifrost->>Plugin2: PreLLMHook(request)
Plugin2-->>Bifrost: modified request
Bifrost->>RecoveryPlugin: PreLLMHook(request)
RecoveryPlugin-->>Bifrost: modified request
Bifrost->>Provider: API Call
Provider-->>Bifrost: error
Bifrost->>RecoveryPlugin: PostLLMHook(error)
RecoveryPlugin-->>Bifrost: recovered response
Bifrost->>Plugin2: PostLLMHook(response)
Plugin2-->>Bifrost: modified response
Bifrost->>Plugin1: PostLLMHook(response)
Plugin1-->>Bifrost: modified response
Bifrost-->>Client: Recovered Response
```
**Error Recovery Features:**
- **Error Transformation:** Plugins can convert errors to successful responses
- **Graceful Degradation:** Provide fallback responses for service failures
- **Context Preservation:** Error context is maintained through the recovery process
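
As a rough sketch of error transformation, a recovery hook can inspect the provider error and decide whether to substitute a degraded response. The `Response` type, the sentinel errors, and `RecoverHook` below are assumptions made for illustration; they are not part of Bifrost's plugin interface.

```go
package main

import "errors"

// Response is a hypothetical response type; Degraded marks a fallback answer.
type Response struct {
	Text     string
	Degraded bool
}

// Hypothetical sentinel errors used only in this sketch.
var (
	ErrProviderUnavailable = errors.New("provider unavailable")
	ErrInvalidRequest      = errors.New("invalid request")
)

// RecoverHook converts a provider outage into a successful, degraded
// response. Successful responses and all other errors pass through unchanged.
func RecoverHook(resp *Response, err error) (*Response, error) {
	if err == nil {
		return resp, nil
	}
	if errors.Is(err, ErrProviderUnavailable) {
		return &Response{
			Text:     "The service is temporarily unavailable; please retry shortly.",
			Degraded: true,
		}, nil
	}
	return nil, err
}
```

Keeping unrecoverable errors (here `ErrInvalidRequest`) intact means downstream post-hooks still see the original failure.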
### **Complex Plugin Decision Flow**
Real-world plugin interactions involving authentication, rate limiting, and caching with different decision paths:
```mermaid
graph TD
A["Client Request"] --> B["Bifrost"]
B --> C["Auth Plugin PreLLMHook"]
C --> D{"Authenticated?"}
D -->|No| E["Return Auth Error<br/>AllowFallbacks=false"]
D -->|Yes| F["RateLimit Plugin PreLLMHook"]
F --> G{"Rate Limited?"}
G -->|Yes| H["Return Rate Error<br/>AllowFallbacks=nil"]
G -->|No| I["Cache Plugin PreLLMHook"]
I --> J{"Cache Hit?"}
J -->|Yes| K["Return Cached Response"]
J -->|No| L["Provider API Call"]
L --> M["Cache Plugin PostLLMHook"]
M --> N["Store in Cache"]
N --> O["RateLimit Plugin PostLLMHook"]
O --> P["Auth Plugin PostLLMHook"]
P --> Q["Final Response"]
E --> R["Skip Fallbacks"]
H --> S["Try Fallback Provider"]
K --> T["Skip Provider Call"]
```
### **Execution Characteristics**
**Symmetric Execution Pattern:**
- **Pre-processing:** Plugins execute in priority order (high to low)
- **Post-processing:** Plugins execute in reverse order (low to high)
- **Rationale:** Ensures proper cleanup and state management (last in, first out)
**Performance Optimizations:**
- **Timeout Boundaries:** Each plugin has configurable execution timeouts
- **Panic Recovery:** Plugin panics are caught and logged without crashing the system
- **Resource Limits:** Memory and CPU limits prevent runaway plugins
- **Circuit Breaking:** Repeated failures trigger plugin isolation
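
Timeout boundaries and panic recovery can be combined in one wrapper. The sketch below assumes a simplified hook signature (`HookFunc`) and uses the standard `context` package; `SafeRun`, `runWithDefaults`, and the example hooks are hypothetical names, and Bifrost's real plugin runner may differ.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// HookFunc is a hypothetical, simplified plugin hook signature.
type HookFunc func(ctx context.Context, input string) (string, error)

// SafeRun executes hook with a deadline and converts panics into errors,
// so one misbehaving plugin cannot crash or stall the caller.
func SafeRun(ctx context.Context, timeout time.Duration, hook HookFunc, input string) (string, error) {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	type result struct {
		out string
		err error
	}
	// Buffered so the goroutine can always send, even after a timeout.
	done := make(chan result, 1)

	go func() {
		defer func() {
			if r := recover(); r != nil {
				done <- result{err: fmt.Errorf("plugin panicked: %v", r)}
			}
		}()
		out, err := hook(ctx, input)
		done <- result{out: out, err: err}
	}()

	select {
	case res := <-done:
		return res.out, res.err
	case <-ctx.Done():
		return "", fmt.Errorf("plugin timed out: %w", ctx.Err())
	}
}

// Example hooks used to exercise SafeRun.
func echoHook(_ context.Context, in string) (string, error) { return in, nil }

func panicHook(_ context.Context, _ string) (string, error) { panic("boom") }

func slowHook(ctx context.Context, _ string) (string, error) {
	<-ctx.Done() // simulates a hook that outlives its deadline
	return "", ctx.Err()
}

// runWithDefaults applies an arbitrary 50ms budget chosen for this example.
func runWithDefaults(hook HookFunc, input string) (string, error) {
	return SafeRun(context.Background(), 50*time.Millisecond, hook, input)
}
```

Note that the timeout only stops the caller from waiting; a hook that ignores `ctx` keeps running in its goroutine, which is why resource limits are listed separately.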
**Error Handling Strategies:**
- **Continue:** Use original request/response if plugin fails
- **Fail Fast:** Return error immediately if critical plugin fails
- **Retry:** Attempt plugin execution with exponential backoff
- **Fallback:** Use alternative plugin or default behavior
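
The retry strategy can be sketched as a loop that doubles its delay after each failed attempt. `RetryWithBackoff`, `ErrTransient`, and `baseDelay` are hypothetical names for this example, and the doubling factor is an assumption; the source does not specify Bifrost's backoff parameters.

```go
package main

import (
	"errors"
	"time"
)

// ErrTransient is a hypothetical error representing a retryable failure.
var ErrTransient = errors.New("transient plugin failure")

// baseDelay is an arbitrary starting delay chosen for this example.
const baseDelay = 1 * time.Millisecond

// RetryWithBackoff calls fn up to maxAttempts times, sleeping between
// attempts and doubling the delay each time. It returns nil on the first
// success, or the last error if every attempt fails.
func RetryWithBackoff(maxAttempts int, base time.Duration, fn func() error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	delay := base
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = fn(); err == nil {
			return nil
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return err
}
```

A production version would usually add jitter and a maximum delay, and would retry only errors known to be transient.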
> **Plugin Execution:** [Request Flow →](./request-flow#stage-3-plugin-pipeline-processing)
---
## Security & Validation
### **Multi-Layer Security Model**
Plugin security operates at multiple layers to ensure system integrity:
```mermaid
graph TB
subgraph "Security Validation Layers"
L1[Layer 1: Binary Validation<br/>Signature & Checksum]
L2[Layer 2: Interface Validation<br/>Type Safety & Compatibility]
L3[Layer 3: Runtime Validation<br/>Resource Limits & Timeouts]
L4[Layer 4: Execution Isolation<br/>Panic Recovery & Error Handling]
end
subgraph "Security Benefits"
Integrity[Code Integrity<br/>Verified Authenticity]
Safety[Type Safety<br/>Compile-time Checks]
Stability[System Stability<br/>Isolated Failures]
Performance[Performance Protection<br/>Resource Limits]
end
L1 --> Integrity
L2 --> Safety
L3 --> Performance
L4 --> Stability
```
### **Validation Process**
**Binary Security:**
- **Digital Signatures:** Cryptographic verification of plugin authenticity
- **Checksum Validation:** File integrity verification
- **Source Verification:** Trusted source requirements
**Interface Security:**
- **Type Safety:** Interface implementation verification
- **Version Compatibility:** Plugin API version checking
- **Memory Safety:** Safe memory access patterns
**Runtime Security:**
- **Resource Quotas:** Memory and CPU usage limits
- **Execution Timeouts:** Bounded execution time
- **Sandbox Execution:** Isolated execution environment
**Operational Security:**
- **Health Monitoring:** Continuous plugin health assessment
- **Error Tracking:** Plugin error rate monitoring
- **Automatic Recovery:** Failed plugin restart and recovery
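
The checksum validation step listed under Binary Security can be illustrated with SHA-256 from the Go standard library. This is a sketch under assumptions: the source does not say which hash Bifrost uses, and `VerifyChecksum` is a hypothetical helper that takes the plugin bytes directly, whereas a real loader would more likely stream the file.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// VerifyChecksum compares the SHA-256 digest of data against a
// hex-encoded expected digest and returns an error on any mismatch.
func VerifyChecksum(data []byte, expectedHex string) error {
	expected, err := hex.DecodeString(expectedHex)
	if err != nil {
		return fmt.Errorf("invalid expected checksum: %w", err)
	}
	sum := sha256.Sum256(data)
	// ConstantTimeCompare returns 1 only when both slices are equal.
	if subtle.ConstantTimeCompare(sum[:], expected) != 1 {
		return fmt.Errorf("checksum mismatch: got %s", hex.EncodeToString(sum[:]))
	}
	return nil
}
```

A checksum only detects corruption or tampering relative to a trusted reference value; authenticity still depends on the digital signature check.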
---
## Plugin Performance & Monitoring
### **Comprehensive Metrics System**
Bifrost provides detailed metrics for plugin performance and health monitoring:
```mermaid
graph TB
subgraph "Execution Metrics"
ExecTime[Execution Time<br/>Latency per Plugin]
ExecCount[Execution Count<br/>Request Volume]
SuccessRate[Success Rate<br/>Error Percentage]
Throughput[Throughput<br/>Requests/Second]
end
subgraph "Resource Metrics"
MemoryUsage[Memory Usage<br/>Per Plugin Instance]
CPUUsage[CPU Utilization<br/>Processing Time]
IOMetrics[I/O Operations<br/>Network/Disk Activity]
PoolUtilization[Pool Utilization<br/>Resource Efficiency]
end
subgraph "Health Metrics"
ErrorRate[Error Rate<br/>Failed Executions]
PanicCount[Panic Recovery<br/>Crash Events]
TimeoutCount[Timeout Events<br/>Slow Executions]
RecoveryRate[Recovery Success<br/>Failure Handling]
end
subgraph "Business Metrics"
AddedLatency[Added Latency<br/>Plugin Overhead]
SystemImpact[System Impact<br/>Overall Performance]
FeatureUsage[Feature Usage<br/>Plugin Utilization]
CostImpact[Cost Impact<br/>Resource Consumption]
end
```
### **Performance Characteristics**
**Plugin Execution Performance:**
- **Typical Overhead:** 1-10μs per plugin for simple operations
- **Authentication Plugins:** 1-5μs for key validation
- **Rate Limiting Plugins:** 500ns for quota checks
- **Monitoring Plugins:** 200ns for metric collection
- **Transformation Plugins:** 2-10μs depending on complexity
**Resource Usage Patterns:**
- **Memory Efficiency:** Object pooling reduces allocations
- **CPU Optimization:** Minimal processing overhead
- **Network Impact:** Configurable external service calls
- **Storage Overhead:** Minimal for stateless plugins
---
## Plugin Integration Patterns
### **Common Integration Scenarios**
**1. Authentication & Authorization**
- **Pre-processing Hook:** Validate API keys or JWT tokens
- **Configuration:** External identity provider integration
- **Error Handling:** Return 401/403 responses for invalid credentials
- **Performance:** Sub-5μs validation with caching
**2. Rate Limiting & Quotas**
- **Pre-processing Hook:** Check request quotas and limits
- **Storage:** Redis or in-memory rate limit tracking
- **Algorithms:** Token bucket, sliding window, fixed window
- **Responses:** 429 Too Many Requests with retry headers
**3. Request/Response Transformation**
- **Dual Hooks:** Pre-processing for requests, post-processing for responses
- **Use Cases:** Data format conversion, field mapping, content filtering
- **Performance:** Streaming transformations for large payloads
- **Compatibility:** Provider-specific format adaptations
**4. Monitoring & Analytics**
- **Post-processing Hook:** Collect metrics and logs after request completion
- **Destinations:** Prometheus, DataDog, custom analytics systems
- **Data:** Request/response metadata, performance metrics, error tracking
- **Privacy:** Configurable data sanitization and filtering
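
Of the rate limiting algorithms named in scenario 2, the token bucket is the simplest to sketch. The `TokenBucket` type below is an in-memory illustration with hypothetical names; it is not the limiter shipped with Bifrost, and a Redis-backed variant would need atomic updates on the server side.

```go
package main

import (
	"sync"
	"time"
)

// TokenBucket allows bursts of up to capacity requests and refills at a
// steady rate (tokens per second).
type TokenBucket struct {
	mu       sync.Mutex
	capacity float64
	tokens   float64
	rate     float64
	last     time.Time
}

// epoch is a fixed reference time, handy for deterministic examples.
var epoch = time.Unix(0, 0)

// NewTokenBucket returns a full bucket whose clock starts at now.
func NewTokenBucket(capacity, ratePerSecond float64, now time.Time) *TokenBucket {
	return &TokenBucket{capacity: capacity, tokens: capacity, rate: ratePerSecond, last: now}
}

// AllowAt refills the bucket for the time elapsed since the last call,
// then consumes one token if available. It reports whether the request
// is allowed.
func (b *TokenBucket) AllowAt(now time.Time) bool {
	b.mu.Lock()
	defer b.mu.Unlock()

	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens += elapsed * b.rate
		if b.tokens > b.capacity {
			b.tokens = b.capacity
		}
		b.last = now
	}
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

// Allow is the wall-clock convenience wrapper around AllowAt.
func (b *TokenBucket) Allow() bool { return b.AllowAt(time.Now()) }
```

Passing the clock in through `AllowAt` keeps the limiter testable; a pre-processing hook would call `Allow()` and return a 429 response when it reports false.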
### **Plugin Communication Patterns**
**Plugin-to-Plugin Communication:**
- **Shared Context:** Plugins can store data in request context for downstream plugins
- **Event System:** Plugins can emit events for other plugins to consume
- **Data Passing:** Structured data exchange between related plugins
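
The shared-context pattern can be sketched with Go's standard `context` package. This assumes the request context behaves like a `context.Context`; the key type and the helpers `WithUserID`, `UserIDFrom`, and `newRequestContext` are hypothetical and only illustrate the idea.

```go
package main

import "context"

// ctxKey is an unexported key type, which avoids collisions with keys
// defined by other packages or plugins.
type ctxKey string

const userIDKey ctxKey = "user-id"

// newRequestContext stands in for the per-request context created by the
// gateway.
func newRequestContext() context.Context { return context.Background() }

// WithUserID is what an upstream plugin (for example authentication)
// would call to publish data for downstream plugins.
func WithUserID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, userIDKey, id)
}

// UserIDFrom is what a downstream plugin (for example rate limiting)
// would call. The boolean is false when no upstream plugin set the value.
func UserIDFrom(ctx context.Context) (string, bool) {
	id, ok := ctx.Value(userIDKey).(string)
	return id, ok
}
```

Wrapping the key behind typed accessor functions keeps the contract between the two plugins explicit and avoids stringly-typed lookups scattered across the code.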
**Plugin-to-External Service Communication:**
- **HTTP Clients:** Built-in HTTP client pools for external API calls
- **Database Connections:** Connection pooling for database access
- **Message Queues:** Integration with message queue systems
- **Caching Systems:** Redis, Memcached integration for state storage
> **📖 Integration Examples:** [Plugin Development Guide →](../../enterprise/custom-plugins)
---
## Related Architecture Documentation
- **[Request Flow](./request-flow)** - Plugin execution in request processing pipeline
- **[Concurrency Model](./concurrency)** - Plugin concurrency and threading considerations
- **[Benchmarks](../../benchmarking/getting-started)** - Plugin performance characteristics and optimization
- **[MCP System](./mcp)** - Integration between plugins and MCP system

@@ -0,0 +1,527 @@
---
title: "Request Flow"
description: "Deep dive into Bifrost's request processing pipeline - from transport layer ingestion through provider execution to response delivery."
icon: "route"
---
## Stage 1: Transport Layer Processing
### **HTTP Transport Flow**
```mermaid
sequenceDiagram
participant Client
participant HTTPTransport
participant Router
participant Validation
Client->>HTTPTransport: POST /v1/chat/completions
HTTPTransport->>HTTPTransport: Parse Headers
HTTPTransport->>HTTPTransport: Extract Body
HTTPTransport->>Validation: Validate JSON Schema
Validation->>Router: BifrostRequest
Router-->>HTTPTransport: Processing Started
HTTPTransport-->>Client: HTTP 200 (async processing)
```
**Key Processing Steps:**
1. **Request Reception** - FastHTTP server receives request
2. **Header Processing** - Extract authentication, content-type, custom headers
3. **Body Parsing** - JSON unmarshaling with schema validation
4. **Request Transformation** - Convert to internal `BifrostRequest` schema
5. **Context Creation** - Build request context with metadata
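
Steps 3 and 4 can be sketched with `encoding/json` from the standard library. The `ChatRequest` and `Message` structs and the `ParseChatRequest` helper are simplified, hypothetical stand-ins for the internal `BifrostRequest` schema, and the validation rules are assumptions chosen for the example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Message and ChatRequest are simplified, hypothetical request types.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type ChatRequest struct {
	Model    string    `json:"model"`
	Messages []Message `json:"messages"`
}

// ParseChatRequest unmarshals the raw body and applies minimal
// validation before the request enters the rest of the pipeline.
func ParseChatRequest(body []byte) (*ChatRequest, error) {
	var req ChatRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, fmt.Errorf("invalid JSON body: %w", err)
	}
	if req.Model == "" {
		return nil, errors.New("model is required")
	}
	if len(req.Messages) == 0 {
		return nil, errors.New("messages must not be empty")
	}
	return &req, nil
}
```

Rejecting malformed input at the transport layer keeps later stages free of defensive checks.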
**Performance Characteristics:**
- **Parsing Time:** ~2.1μs for typical requests
- **Validation Overhead:** ~400ns for schema checks
- **Memory Allocation:** Zero-copy where possible
### **Go SDK Flow**
```mermaid
sequenceDiagram
participant Application
participant SDK
participant Core
participant Validation
Application->>SDK: bifrost.ChatCompletion(req)
SDK->>SDK: Type Validation
SDK->>Core: Direct Function Call
Core->>Validation: Schema Validation
Validation-->>Core: Validated Request
Core-->>SDK: Processing Result
SDK-->>Application: Typed Response
```
**Advantages:**
- **Zero Serialization** - Direct Go struct passing
- **Type Safety** - Compile-time validation
- **Lower Latency** - No HTTP/JSON overhead
- **Memory Efficiency** - No intermediate allocations
---
## Stage 2: Request Routing & Load Balancing
### **Provider Selection Logic**
```mermaid
flowchart TD
Request[Incoming Request] --> ModelCheck{Model Available?}
ModelCheck -->|Yes| ProviderDirect[Use Specified Provider]
ModelCheck -->|No| ModelMapping[Model → Provider Mapping]
ProviderDirect --> KeyPool[API Key Pool]
ModelMapping --> KeyPool
KeyPool --> WeightedSelect[Weighted Random Selection]
WeightedSelect --> HealthCheck{Provider Healthy?}
HealthCheck -->|Yes| AssignWorker[Assign Worker]
HealthCheck -->|No| CircuitBreaker[Circuit Breaker]
CircuitBreaker --> FallbackCheck{Fallback Available?}
FallbackCheck -->|Yes| FallbackProvider[Try Fallback]
FallbackCheck -->|No| ErrorResponse[Return Error]
FallbackProvider --> KeyPool
```
**Key Selection Algorithm:**
```go
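import "math/rand"

// KeySelector performs weighted random key selection.
type KeySelector struct {
	keys    []APIKey
	weights []float64 // weights[i] belongs to keys[i]
	total   float64   // sum of all weights
}

// SelectKey returns a key chosen in proportion to its weight,
// or nil when no keys are configured.
func (ks *KeySelector) SelectKey() *APIKey {
	if len(ks.keys) == 0 {
		return nil
	}

	r := rand.Float64() * ks.total
	cumulative := 0.0

	for i, weight := range ks.weights {
		cumulative += weight
		// Strict comparison so zero-weight keys are never selected.
		if r < cumulative {
			return &ks.keys[i]
		}
	}

	// Floating-point rounding can leave r at or above the final sum.
	return &ks.keys[len(ks.keys)-1]
}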
```
**Performance Metrics:**
- **Key Selection Time:** ~10ns (linear scan over the key list; effectively constant for small key sets)
- **Health Check Overhead:** ~50ns (cached results)
- **Fallback Decision:** ~25ns (configuration lookup)
---
## Stage 3: Plugin Pipeline Processing
### **Pre-Processing Hooks**
```mermaid
sequenceDiagram
participant Request
participant AuthPlugin
participant RateLimitPlugin
participant TransformPlugin
participant Core
Request->>AuthPlugin: ProcessRequest()
AuthPlugin->>AuthPlugin: Validate API Key
AuthPlugin->>RateLimitPlugin: Authorized Request
RateLimitPlugin->>RateLimitPlugin: Check Rate Limits
RateLimitPlugin->>TransformPlugin: Allowed Request
TransformPlugin->>TransformPlugin: Modify Request
TransformPlugin->>Core: Final Request
```
**Plugin Execution Model:**
```go
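// PluginManager runs registered plugins in registration order.
type PluginManager struct {
	plugins []Plugin
}

// ExecutePreHooks passes the request through each plugin in turn.
// The first plugin error aborts the chain and is returned to the caller.
func (pm *PluginManager) ExecutePreHooks(
	ctx BifrostContext,
	req *BifrostRequest,
) (*BifrostRequest, *BifrostError) {
	for _, plugin := range pm.plugins {
		modifiedReq, err := plugin.ProcessRequest(ctx, req)
		if err != nil {
			return nil, err
		}
		req = modifiedReq
	}
	return req, nil
}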
```
**Plugin Types & Performance:**
| Plugin Type | Processing Time | Memory Impact | Failure Mode |
| --------------------- | --------------- | ------------- | ---------------------- |
| **Authentication** | ~1-5μs | Minimal | Reject request |
| **Rate Limiting** | ~500ns | Cache-based | Throttle/reject |
| **Request Transform** | ~2-10μs | Copy-on-write | Continue with original |
| **Monitoring** | ~200ns | Append-only | Continue silently |
---
## Stage 4: MCP Tool Discovery & Integration
### **Tool Discovery Process**
```mermaid
flowchart TD
Request[Request with Model] --> MCPCheck{MCP Enabled?}
MCPCheck -->|No| SkipMCP[Skip MCP Processing]
MCPCheck -->|Yes| ClientLookup[MCP Client Lookup]
ClientLookup --> ToolFilter[Tool Filtering]
ToolFilter --> ToolInject[Inject Tools into Request]
ToolFilter --> IncludeCheck{Include Filter?}
ToolFilter --> ExcludeCheck{Exclude Filter?}
IncludeCheck -->|Yes| IncludeTools[Include Specified Tools]
IncludeCheck -->|No| AllTools[Include All Tools]
ExcludeCheck -->|Yes| RemoveTools[Remove Excluded Tools]
ExcludeCheck -->|No| KeepFiltered[Keep Filtered Tools]
IncludeTools --> ToolInject
AllTools --> ToolInject
RemoveTools --> ToolInject
KeepFiltered --> ToolInject
ToolInject --> EnhancedRequest[Request with Tools]
SkipMCP --> EnhancedRequest
```
**Tool Integration Algorithm:**
```go
func (mcpm *MCPManager) EnhanceRequest(
	ctx BifrostContext,
	req *BifrostChatRequest,
) (*BifrostChatRequest, error) {
	// Extract tool filtering from context
	includeClients := ctx.GetStringSlice("mcp-include-clients")
	includeTools := ctx.GetStringSlice("mcp-include-tools")

	// Get available tools
	availableTools := mcpm.getAvailableTools(includeClients)

	// Filter tools
	filteredTools := mcpm.filterTools(availableTools, includeTools)

	// Inject into request
	if req.Params == nil {
		req.Params = &ChatParameters{}
	}
	req.Params.Tools = append(req.Params.Tools, filteredTools...)

	return req, nil
}
```
**MCP Performance Impact:**
- **Tool Discovery:** ~100-500μs (cached after first request)
- **Tool Filtering:** ~50-200ns per tool
- **Request Enhancement:** ~1-5μs depending on tool count
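
The include/exclude decisions in the tool discovery flowchart can be sketched as a single filtering pass. `FilterTools` is a hypothetical helper that works on tool names only; the real filter presumably operates on richer tool definitions and per-client settings.

```go
package main

// FilterTools applies the include filter first and the exclude filter
// second. An empty include list means every available tool is a candidate.
// The relative order of the available tools is preserved.
func FilterTools(available, include, exclude []string) []string {
	toSet := func(names []string) map[string]bool {
		set := make(map[string]bool, len(names))
		for _, n := range names {
			set[n] = true
		}
		return set
	}
	included, excluded := toSet(include), toSet(exclude)

	filtered := make([]string, 0, len(available))
	for _, tool := range available {
		if len(included) > 0 && !included[tool] {
			continue // include filter is active and this tool is not listed
		}
		if excluded[tool] {
			continue // explicitly excluded
		}
		filtered = append(filtered, tool)
	}
	return filtered
}
```

In this sketch exclusion wins when a tool appears in both lists, which is the more conservative choice for tool exposure.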
---
## Stage 5: Memory Pool Management
### **Object Pool Lifecycle**
```mermaid
stateDiagram-v2
[*] --> PoolInit: System Startup
PoolInit --> Available: Objects Pre-allocated
Available --> Acquired: Request Processing
Acquired --> InUse: Object Populated
InUse --> Processing: Worker Processing
Processing --> Completed: Processing Done
Completed --> Reset: Object Cleanup
Reset --> Available: Return to Pool
Available --> Expansion: Pool Exhaustion
Expansion --> Available: New Objects Created
Reset --> GC: Pool Full
GC --> [*]: Garbage Collection
```
**Memory Pool Implementation:**
```go
type MemoryPools struct {
	channelPool  sync.Pool
	messagePool  sync.Pool
	responsePool sync.Pool
	bufferPool   sync.Pool
}

func (mp *MemoryPools) GetChannel() *ProcessingChannel {
	if ch := mp.channelPool.Get(); ch != nil {
		return ch.(*ProcessingChannel)
	}
	return NewProcessingChannel()
}

func (mp *MemoryPools) ReturnChannel(ch *ProcessingChannel) {
	ch.Reset() // Clear previous data
	mp.channelPool.Put(ch)
}
```
---
## Stage 6: Worker Pool Processing
### **Worker Assignment & Execution**
```mermaid
sequenceDiagram
participant Queue
participant WorkerPool
participant Worker
participant Provider
participant Circuit
Queue->>WorkerPool: Enqueue Request
WorkerPool->>Worker: Assign Available Worker
Worker->>Circuit: Check Circuit Breaker
Circuit->>Provider: Forward Request
Provider-->>Circuit: Response/Error
Circuit->>Circuit: Update Health Metrics
Circuit-->>Worker: Provider Response
Worker-->>WorkerPool: Release Worker
WorkerPool-->>Queue: Request Completed
```
**Worker Pool Architecture:**
```go
type ProviderWorkerPool struct {
	workers chan *Worker
	queue   chan *ProcessingJob
	config  WorkerPoolConfig
	metrics *PoolMetrics
}

func (pwp *ProviderWorkerPool) ProcessRequest(job *ProcessingJob) {
	// Get worker from pool
	worker := <-pwp.workers

	go func() {
		defer func() {
			// Return worker to pool
			pwp.workers <- worker
		}()

		// Process request
		result := worker.Execute(job)
		job.ResultChan <- result
	}()
}
```
---
## Stage 7: Provider API Communication
### **HTTP Request Execution**
```mermaid
sequenceDiagram
participant Worker
participant HTTPClient
participant Provider
participant CircuitBreaker
participant Metrics
Worker->>HTTPClient: PrepareRequest()
HTTPClient->>HTTPClient: Add Headers & Auth
HTTPClient->>CircuitBreaker: CheckHealth()
CircuitBreaker->>Provider: HTTP Request
Provider-->>CircuitBreaker: HTTP Response
CircuitBreaker->>Metrics: Record Metrics
CircuitBreaker-->>HTTPClient: Response/Error
HTTPClient-->>Worker: Parsed Response
```
**Request Preparation Pipeline:**
```go
func (w *ProviderWorker) ExecuteRequest(job *ProcessingJob) *ProviderResponse {
	// Prepare HTTP request
	httpReq := w.prepareHTTPRequest(job.Request)

	// Add authentication
	w.addAuthentication(httpReq, job.APIKey)

	// Execute with timeout
	ctx, cancel := context.WithTimeout(context.Background(), job.Timeout)
	defer cancel()

	httpResp, err := w.httpClient.Do(httpReq.WithContext(ctx))
	if err != nil {
		return w.handleError(err, job)
	}

	// Parse response
	return w.parseResponse(httpResp, job)
}
```
---
## Stage 8: Tool Execution & Response Processing
### **MCP Tool Execution Flow**
```mermaid
sequenceDiagram
participant Provider
participant MCPProcessor
participant MCPServer
participant ToolExecutor
participant ResponseBuilder
Provider->>MCPProcessor: Response with Tool Calls
MCPProcessor->>MCPProcessor: Extract Tool Calls
loop For each tool call
MCPProcessor->>MCPServer: Execute Tool
MCPServer->>ToolExecutor: Tool Invocation
ToolExecutor-->>MCPServer: Tool Result
MCPServer-->>MCPProcessor: Tool Response
end
MCPProcessor->>ResponseBuilder: Combine Results
ResponseBuilder-->>Provider: Enhanced Response
```
**Tool Execution Pipeline:**
```go
func (mcp *MCPProcessor) ProcessToolCalls(
	response *ProviderResponse,
) (*ProviderResponse, error) {
	toolCalls := mcp.extractToolCalls(response)
	if len(toolCalls) == 0 {
		return response, nil
	}

	// Execute tools concurrently; the buffered channel lets every
	// goroutine send its result without blocking.
	results := make(chan ToolResult, len(toolCalls))
	for _, toolCall := range toolCalls {
		go func(tc ToolCall) {
			result := mcp.executeTool(tc)
			results <- result
		}(toolCall)
	}

	// Collect results (they arrive in completion order, not call order)
	toolResults := make([]ToolResult, 0, len(toolCalls))
	for i := 0; i < len(toolCalls); i++ {
		toolResults = append(toolResults, <-results)
	}

	// Enhance response
	return mcp.enhanceResponse(response, toolResults), nil
}
```
---
## Stage 9: Post-Processing & Response Formation
### **Plugin Post-Processing**
```mermaid
sequenceDiagram
participant CoreResponse
participant LoggingPlugin
participant CachePlugin
participant MetricsPlugin
participant Transport
CoreResponse->>LoggingPlugin: ProcessResponse()
LoggingPlugin->>LoggingPlugin: Log Request/Response
LoggingPlugin->>CachePlugin: Response + Logs
CachePlugin->>CachePlugin: Cache Response
CachePlugin->>MetricsPlugin: Cached Response
MetricsPlugin->>MetricsPlugin: Record Metrics
MetricsPlugin->>Transport: Final Response
```
**Response Enhancement Pipeline:**
```go
func (pm *PluginManager) ExecutePostHooks(
	ctx BifrostContext,
	req *BifrostRequest,
	resp *BifrostResponse,
) (*BifrostResponse, error) {
	// Post-hooks run in reverse order of the pre-hooks (last in, first out)
	for i := len(pm.plugins) - 1; i >= 0; i-- {
		plugin := pm.plugins[i]
		enhancedResp, err := plugin.ProcessResponse(ctx, req, resp)
		if err != nil {
			// Log error but continue processing with the previous response
			pm.logger.Warn("Plugin post-processing error", "plugin", plugin.Name(), "error", err)
			continue
		}
		resp = enhancedResp
	}
	return resp, nil
}
```
### **Response Serialization**
```mermaid
flowchart TD
Response[BifrostResponse] --> Format{Response Format}
Format -->|HTTP| JSONSerialize[JSON Serialization]
Format -->|SDK| DirectReturn[Direct Go Struct]
JSONSerialize --> Compress[Compression]
DirectReturn --> TypeCheck[Type Validation]
Compress --> Headers[Set Headers]
TypeCheck --> Return[Return Response]
Headers --> HTTPResponse[HTTP Response]
HTTPResponse --> Client[Client Response]
Return --> Client
```
---
## Related Architecture Documentation
- **[Concurrency Model](./concurrency)** - Worker pools and threading details
- **[Plugin System](./plugins)** - Plugin execution and lifecycle
- **[MCP System](./mcp)** - Tool discovery and execution internals
- **[Benchmarks](../../benchmarking/getting-started)** - Detailed performance analysis