first commit
764
docs/architecture/core/concurrency.mdx
Normal file
@@ -0,0 +1,764 @@
|
||||
---
|
||||
title: "Concurrency"
|
||||
description: "Deep dive into Bifrost's advanced concurrency architecture - worker pools, goroutine management, channel-based communication, and resource isolation patterns."
|
||||
icon: "traffic-light"
|
||||
---
|
||||
|
||||
## Concurrency Philosophy
|
||||
|
||||
### **Core Principles**
|
||||
|
||||
| Principle | Implementation | Benefit |
|
||||
| ---------------------------------- | -------------------------------------- | -------------------------------------- |
|
||||
| **Provider Isolation** | Independent worker pools per provider | Fault tolerance, no cascade failures |
|
||||
| **Channel-Based Communication**    | Go channels for all async operations   | Type-safe messaging, less shared state |
|
||||
| **Resource Pooling** | Object pools with lifecycle management | Predictable memory usage, minimal GC |
|
||||
| **Non-Blocking Operations** | Async processing throughout pipeline | Maximum concurrency, no blocking waits |
|
||||
| **Backpressure Handling** | Configurable buffers and flow control | Graceful degradation under load |
|
||||
|
||||
### **Threading Architecture Overview**
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph "Main Thread"
|
||||
Main[Main Process<br/>HTTP Server]
|
||||
Router[Request Router<br/>Goroutine]
|
||||
PluginMgr[Plugin Manager<br/>Goroutine]
|
||||
end
|
||||
|
||||
subgraph "Provider Worker Pools"
|
||||
subgraph "OpenAI Pool"
|
||||
OAI1[Worker 1<br/>Goroutine]
|
||||
OAI2[Worker 2<br/>Goroutine]
|
||||
OAIN[Worker N<br/>Goroutine]
|
||||
end
|
||||
subgraph "Anthropic Pool"
|
||||
ANT1[Worker 1<br/>Goroutine]
|
||||
ANT2[Worker 2<br/>Goroutine]
|
||||
ANTN[Worker N<br/>Goroutine]
|
||||
end
|
||||
subgraph "Bedrock Pool"
|
||||
BED1[Worker 1<br/>Goroutine]
|
||||
BED2[Worker 2<br/>Goroutine]
|
||||
BEDN[Worker N<br/>Goroutine]
|
||||
end
|
||||
end
|
||||
|
||||
subgraph "Memory Pools"
|
||||
ChannelPool[Channel Pool<br/>sync.Pool]
|
||||
MessagePool[Message Pool<br/>sync.Pool]
|
||||
ResponsePool[Response Pool<br/>sync.Pool]
|
||||
end
|
||||
|
||||
Main --> Router
|
||||
Router --> PluginMgr
|
||||
PluginMgr --> OAI1
|
||||
PluginMgr --> ANT1
|
||||
PluginMgr --> BED1
|
||||
|
||||
OAI1 --> ChannelPool
|
||||
ANT1 --> MessagePool
|
||||
BED1 --> ResponsePool
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Worker Pool Architecture
|
||||
|
||||
### **Provider-Isolated Worker Pools**
|
||||
|
||||
```mermaid
|
||||
stateDiagram-v2
|
||||
[*] --> PoolInit: Worker Pool Creation
|
||||
PoolInit --> WorkerSpawn: Spawn Worker Goroutines
|
||||
WorkerSpawn --> Listening: Workers Listen on Channels
|
||||
|
||||
Listening --> Processing: Job Received
|
||||
Processing --> API_Call: Provider API Request
|
||||
API_Call --> Response: Process Response
|
||||
Response --> Listening: Job Complete
|
||||
|
||||
Listening --> Shutdown: Graceful Shutdown
|
||||
Processing --> Shutdown: Complete Current Job
|
||||
Shutdown --> [*]: Pool Destroyed
|
||||
```
|
||||
|
||||
**Worker Pool Architecture:**
|
||||
|
||||
The worker pool system balances resource efficiency against performance isolation between providers:
|
||||
|
||||
**Key Components:**
|
||||
|
||||
- **Worker Pool Management** - Pre-spawned workers reduce startup latency
|
||||
- **Job Queue System** - Buffered channels provide smooth load balancing
|
||||
- **Resource Pools** - HTTP clients and API keys are pooled for efficiency
|
||||
- **Health Monitoring** - Circuit breakers detect and isolate failing providers
|
||||
- **Graceful Shutdown** - Workers complete current jobs before terminating
|
||||
|
||||
**Startup Process:**
|
||||
|
||||
1. **Worker Pre-spawning** - Workers are created during pool initialization
|
||||
2. **Channel Setup** - Job queues and worker channels are established
|
||||
3. **Resource Allocation** - HTTP clients and API keys are distributed
|
||||
4. **Health Checks** - Initial connectivity tests verify provider availability
|
||||
5. **Ready State** - Pool becomes available for request processing
|
||||
|
||||
**Job Dispatch Logic:**
|
||||
|
||||
- **Round-Robin Assignment** - Jobs are distributed evenly across available workers
|
||||
- **Load Balancing** - Worker availability determines job assignment
|
||||
- **Overflow Handling** - Excess jobs are queued or dropped based on configuration
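
The components above can be sketched as a fixed set of pre-spawned workers draining one buffered queue. This is a minimal illustration, not Bifrost's implementation: `Job`, `WorkerPool`, `Submit`, and `Shutdown` are hypothetical names. Note that with a shared channel, whichever worker is idle takes the next job, so distribution follows availability rather than a strict round-robin.

```go
package main

import (
	"fmt"
	"sync"
)

// Job is a hypothetical unit of work; Bifrost's real job type is not shown here.
type Job struct {
	ID     int
	Result chan string
}

// WorkerPool pre-spawns a fixed number of workers that share one buffered queue.
type WorkerPool struct {
	jobs chan *Job
	wg   sync.WaitGroup
}

// NewWorkerPool starts all workers up front so no job pays goroutine startup cost.
func NewWorkerPool(workers, buffer int) *WorkerPool {
	p := &WorkerPool{jobs: make(chan *Job, buffer)}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go p.run(i)
	}
	return p
}

func (p *WorkerPool) run(id int) {
	defer p.wg.Done()
	// The loop exits once the queue is closed and fully drained.
	for job := range p.jobs {
		job.Result <- fmt.Sprintf("job %d handled by worker %d", job.ID, id)
	}
}

// Submit enqueues a job and reports false instead of blocking when the queue is full.
func (p *WorkerPool) Submit(job *Job) bool {
	select {
	case p.jobs <- job:
		return true
	default:
		return false
	}
}

// Shutdown stops intake and waits for workers to finish the queued jobs.
func (p *WorkerPool) Shutdown() {
	close(p.jobs)
	p.wg.Wait()
}

func main() {
	pool := NewWorkerPool(3, 10)
	results := make([]chan string, 5)
	for i := range results {
		results[i] = make(chan string, 1) // buffered so workers never block on delivery
		pool.Submit(&Job{ID: i, Result: results[i]})
	}
	pool.Shutdown()
	for _, r := range results {
		fmt.Println(<-r)
	}
}
```

Calling `Submit` after `Shutdown` would panic on the closed channel, so a production pool would need an additional guard.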
|
||||
|
||||
### **Worker Lifecycle Management**
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant Pool
|
||||
participant Worker
|
||||
participant HTTPClient
|
||||
participant Provider
|
||||
participant Metrics
|
||||
|
||||
Pool->>Worker: Start()
|
||||
Worker->>Worker: Initialize HTTP Client
|
||||
Worker->>Pool: Ready Signal
|
||||
|
||||
loop Job Processing
|
||||
Pool->>Worker: Job Assignment
|
||||
Worker->>HTTPClient: Prepare Request
|
||||
HTTPClient->>Provider: API Call
|
||||
Provider-->>HTTPClient: Response
|
||||
HTTPClient-->>Worker: Parsed Response
|
||||
Worker->>Metrics: Record Performance
|
||||
Worker->>Pool: Job Complete
|
||||
end
|
||||
|
||||
Pool->>Worker: Shutdown Signal
|
||||
Worker->>Worker: Complete Current Job
|
||||
Worker-->>Pool: Shutdown Confirmed
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Channel-Based Communication
|
||||
|
||||
### **Channel Architecture**
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph "Channel Types"
|
||||
JobQueue[Job Queue<br/>Buffered Channel]
|
||||
WorkerPool[Worker Pool<br/>Buffered Channel]
|
||||
ResultChan[Result Channel<br/>Buffered Channel]
|
||||
QuitChan[Quit Channel<br/>Unbuffered]
|
||||
end
|
||||
|
||||
subgraph "Flow Control"
|
||||
BackPressure[Backpressure<br/>Buffer Limits]
|
||||
Timeout[Timeout<br/>Context Cancellation]
|
||||
Graceful[Graceful Shutdown<br/>Channel Closing]
|
||||
end
|
||||
|
||||
JobQueue --> BackPressure
|
||||
WorkerPool --> Timeout
|
||||
ResultChan --> Graceful
|
||||
```
|
||||
|
||||
**Channel Configuration Principles:**
|
||||
|
||||
Bifrost's channel system balances throughput and memory usage through careful buffer sizing:
|
||||
|
||||
**Job Queuing Configuration:**
|
||||
|
||||
- **Job Queue Buffer** - Sized based on expected burst traffic (100-1000 jobs)
|
||||
- **Worker Pool Size** - Matches provider concurrency limits (10-100 workers)
|
||||
- **Result Buffer** - Accommodates response processing delays (50-500 responses)
|
||||
|
||||
**Flow Control Parameters:**
|
||||
|
||||
- **Queue Wait Limits** - Maximum time jobs wait before timeout (1-10 seconds)
|
||||
- **Processing Timeouts** - Per-job execution limits (30-300 seconds)
|
||||
- **Shutdown Timeouts** - Graceful termination periods (5-30 seconds)
|
||||
|
||||
**Backpressure Policies:**
|
||||
|
||||
- **Drop Policy** - Discard excess jobs when queues are full
|
||||
- **Block Policy** - Wait for queue space with timeout
|
||||
- **Error Policy** - Immediately return error for full queues
|
||||
|
||||
**Channel Type Selection:**
|
||||
|
||||
- **Buffered Channels** - Used for async job processing and result handling
|
||||
- **Unbuffered Channels** - Used for synchronization signals (quit, done)
|
||||
- **Context Cancellation** - Used for timeout and cancellation propagation
|
||||
|
||||
### **Backpressure and Flow Control**
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
Request[Incoming Request] --> QueueCheck{Queue Full?}
|
||||
QueueCheck -->|No| Queue[Add to Queue]
|
||||
QueueCheck -->|Yes| Policy{Queue Policy?}
|
||||
|
||||
Policy -->|Drop| Drop[Drop Request<br/>Return Error]
|
||||
Policy -->|Block| Block[Block Until Space<br/>With Timeout]
|
||||
Policy -->|Error| Error[Return Queue Full Error]
|
||||
|
||||
Queue --> Worker[Assign to Worker]
|
||||
Block --> TimeoutCheck{Timeout?}
|
||||
TimeoutCheck -->|Yes| Error
|
||||
TimeoutCheck -->|No| Queue
|
||||
|
||||
Worker --> Processing[Process Request]
|
||||
Processing --> Complete[Complete]
|
||||
|
||||
Drop --> Client[Client Response]
|
||||
Error --> Client
|
||||
Complete --> Client
|
||||
```
|
||||
|
||||
**Backpressure Implementation Strategy:**
|
||||
|
||||
The backpressure system protects Bifrost from being overwhelmed while maintaining service availability:
|
||||
|
||||
**Non-Blocking Job Submission:**
|
||||
|
||||
- **Immediate Queue Check** - Jobs are submitted without blocking on queue space
|
||||
- **Success Path** - Available queue space allows immediate job acceptance
|
||||
- **Overflow Detection** - Full queues trigger backpressure policies
|
||||
- **Metrics Collection** - All queue operations are tracked for monitoring
|
||||
|
||||
**Backpressure Policy Execution:**
|
||||
|
||||
- **Drop Policy** - Immediately rejects excess jobs with meaningful error messages
|
||||
- **Block Policy** - Waits for queue space with configurable timeout limits
|
||||
- **Error Policy** - Returns queue full errors for immediate client feedback
|
||||
- **Metrics Tracking** - Dropped, blocked, and successful submissions are measured
|
||||
|
||||
**Timeout Management:**
|
||||
|
||||
- **Context-Based Timeouts** - All blocking operations respect timeout boundaries
|
||||
- **Graceful Degradation** - Timeouts result in controlled error responses
|
||||
- **Resource Protection** - Prevents goroutine leaks from infinite waits
|
||||
|
||||
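```go
// SubmitJob enqueues a job and applies the configured policy when the queue is full.
// The original excerpt starts mid-function: the signature, the non-blocking first
// attempt, and the "drop" case are reconstructed, and the names QueuePolicy,
// QueueTimeout, and IncDroppedJobs are assumptions.
func (pool *ProviderWorkerPool) SubmitJob(job *Job) error {
	select {
	case pool.jobQueue <- job:
		pool.metrics.IncQueuedJobs()
		return nil
	default:
		// Queue is full: apply the backpressure policy
		switch pool.config.QueuePolicy {
		case "drop":
			pool.metrics.IncDroppedJobs()
			return errors.New("queue full, job dropped")

		case "block":
			ctx, cancel := context.WithTimeout(context.Background(), pool.config.QueueTimeout)
			defer cancel()
			select {
			case pool.jobQueue <- job:
				pool.metrics.IncQueuedJobs()
				return nil
			case <-ctx.Done():
				pool.metrics.IncTimeoutJobs()
				return errors.New("queue full, timeout waiting")
			}

		case "error":
			pool.metrics.IncRejectedJobs()
			return errors.New("queue full, job rejected")

		default:
			return errors.New("unknown queue policy")
		}
	}
}
```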
|
||||
|
||||
---
|
||||
|
||||
## Memory Pool Concurrency
|
||||
|
||||
### **Thread-Safe Object Pools**
|
||||
|
||||
```mermaid
|
||||
graph TD
|
||||
subgraph "sync.Pool Lifecycle"
|
||||
direction LR
|
||||
GetObject[Get Object<br/>sync.Pool.Get]
|
||||
PoolCheck{Is Pool Empty?}
|
||||
NewObject[New Object<br/>Factory Function]
|
||||
UseObject[Use Object<br/>Application Logic]
|
||||
ResetObject[Reset Object<br/>Clear State]
|
||||
ReturnObject[Return Object<br/>sync.Pool.Put]
|
||||
|
||||
GetObject --> PoolCheck
|
||||
PoolCheck -- Yes --> NewObject
|
||||
PoolCheck -- No --> UseObject
|
||||
NewObject --> UseObject
|
||||
UseObject --> ResetObject
|
||||
ResetObject --> ReturnObject
|
||||
ReturnObject --> GetObject
|
||||
end
|
||||
|
||||
subgraph "GC Interaction"
|
||||
direction TB
|
||||
GCRun[GC Runs]
|
||||
PoolCleanup[Pool Cleanup<br/>Removes idle objects]
|
||||
|
||||
GCRun --> PoolCleanup
|
||||
end
|
||||
```
|
||||
|
||||
**Thread-Safe Pool Architecture:**
|
||||
|
||||
Bifrost's memory pool system ensures thread-safe object reuse across multiple goroutines:
|
||||
|
||||
**Pool Structure Design:**
|
||||
|
||||
- **Multiple Pool Types** - Separate pools for channels, messages, responses, and buffers
|
||||
- **Factory Functions** - Dynamic object creation when pools are empty
|
||||
- **Statistics Tracking** - Comprehensive metrics for pool performance monitoring
|
||||
- **Thread Safety** - Synchronized access using Go's sync.Pool and read-write mutexes
|
||||
|
||||
**Object Lifecycle Management:**
|
||||
|
||||
- **Pool Initialization** - Factory functions define object creation patterns
|
||||
- **Unique Identification** - Each pooled object gets a unique ID for tracking
|
||||
- **Timestamp Tracking** - Creation, acquisition, and return times are recorded
|
||||
- **Reusability Flags** - Objects can be marked as non-reusable for single-use scenarios
|
||||
|
||||
**Acquisition Strategy:**
|
||||
|
||||
- **Request Tracking** - All pool requests are counted for monitoring
|
||||
- **Hit/Miss Tracking** - Pool effectiveness is measured through hit ratios
|
||||
- **Fallback Creation** - New objects are created when pools are empty
|
||||
- **Performance Metrics** - Acquisition times and patterns are monitored
|
||||
|
||||
**Return and Reset Process:**
|
||||
|
||||
- **State Validation** - Only reusable objects are returned to pools
|
||||
- **Object Reset** - All object state is cleared before returning to pool
|
||||
- **Return Tracking** - Return operations are counted and timed
|
||||
- **Pool Replenishment** - Returned objects become available for reuse
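
The acquire, reset, and return cycle described above can be sketched with `sync.Pool` and two atomic counters. `BufferPool` and its methods are hypothetical names, and pooling `bytes.Buffer` values is an assumption chosen to keep the example small; it requires Go 1.19 or later for `atomic.Int64`.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
	"sync/atomic"
)

// BufferPool wraps sync.Pool and counts how often the factory had to run.
type BufferPool struct {
	pool     sync.Pool
	requests atomic.Int64 // every Get call
	created  atomic.Int64 // Get calls that missed and allocated
}

func NewBufferPool() *BufferPool {
	bp := &BufferPool{}
	// The factory runs only when the pool has nothing to hand out.
	bp.pool.New = func() any {
		bp.created.Add(1)
		return new(bytes.Buffer)
	}
	return bp
}

// Get returns a buffer that is safe to write into.
func (bp *BufferPool) Get() *bytes.Buffer {
	bp.requests.Add(1)
	return bp.pool.Get().(*bytes.Buffer)
}

// Put clears the buffer before returning it, so no state leaks to the next user.
func (bp *BufferPool) Put(b *bytes.Buffer) {
	b.Reset()
	bp.pool.Put(b)
}

// HitRatio is the share of requests served without a new allocation.
func (bp *BufferPool) HitRatio() float64 {
	req := bp.requests.Load()
	if req == 0 {
		return 0
	}
	return float64(req-bp.created.Load()) / float64(req)
}

func main() {
	bp := NewBufferPool()
	for i := 0; i < 100; i++ {
		buf := bp.Get()
		buf.WriteString("payload")
		bp.Put(buf)
	}
	fmt.Printf("requests=%d created=%d hit ratio=%.2f\n",
		bp.requests.Load(), bp.created.Load(), bp.HitRatio())
}
```

Because the garbage collector may empty a `sync.Pool` at any time, the measured hit ratio varies between runs and should be read as a trend rather than a guarantee.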
|
||||
|
||||
### **Pool Performance Monitoring**
|
||||
|
||||
Comprehensive metrics provide insights into pool efficiency and system health:
|
||||
|
||||
**Usage Statistics Collection:**
|
||||
- **Request Counting** - Track total pool requests by object type
|
||||
- **Creation Tracking** - Monitor new object allocations when pools are empty
|
||||
- **Hit/Miss Ratios** - Measure pool effectiveness through reuse rates
|
||||
- **Return Monitoring** - Track successful object returns to pools
|
||||
|
||||
**Performance Metrics Analysis:**
|
||||
- **Acquisition Times** - Measure how long it takes to get objects from pools
|
||||
- **Reset Performance** - Track time spent cleaning objects for reuse
|
||||
- **Hit Ratio Calculation** - Determine percentage of requests served from pools
|
||||
- **Memory Efficiency** - Calculate memory savings from object reuse
|
||||
|
||||
**Key Performance Indicators:**
|
||||
- **Channel Pool Hit Ratio** - Typically 85-95% in steady state
|
||||
- **Message Pool Efficiency** - Usually 80-90% reuse rate
|
||||
- **Response Pool Utilization** - Often 70-85% hit ratio
|
||||
- **Total Memory Savings** - Measured reduction in garbage collection pressure
|
||||
|
||||
**Monitoring Integration:**
|
||||
- **Thread-Safe Access** - All metrics collection is synchronized
|
||||
- **Real-Time Updates** - Statistics are updated with each pool operation
|
||||
- **Export Capability** - Metrics are available in JSON format for monitoring systems
|
||||
- **Alerting Support** - Low hit ratios can trigger performance alerts
|
||||
|
||||
---
|
||||
|
||||
## Goroutine Management
|
||||
|
||||
### **Goroutine Lifecycle Patterns**
|
||||
|
||||
```mermaid
|
||||
stateDiagram-v2
|
||||
[*] --> Created: go routine()
|
||||
Created --> Running: Execute Function
|
||||
Running --> Waiting: Channel/Mutex Block
|
||||
Waiting --> Running: Unblocked
|
||||
Running --> Syscall: Network I/O
|
||||
Syscall --> Running: I/O Complete
|
||||
Running --> GCAssist: GC Triggered
|
||||
GCAssist --> Running: GC Complete
|
||||
Running --> Terminated: Function Exit
|
||||
Terminated --> [*]: Cleanup
|
||||
```
|
||||
|
||||
**Goroutine Pool Management Strategy:**
|
||||
|
||||
Bifrost's goroutine management bounds resource usage and guards against goroutine leaks:
|
||||
|
||||
**Pool Configuration Management:**
|
||||
|
||||
- **Goroutine Limits** - Maximum concurrent goroutines prevent resource exhaustion
|
||||
- **Active Counting** - Atomic counters track currently running goroutines
|
||||
- **Idle Timeouts** - Unused goroutines are cleaned up after configured periods
|
||||
- **Resource Boundaries** - Hard limits prevent runaway goroutine creation
|
||||
|
||||
**Lifecycle Orchestration:**
|
||||
|
||||
- **Spawn Channels** - New goroutine creation is tracked through channels
|
||||
- **Completion Monitoring** - Finished goroutines signal completion for cleanup
|
||||
- **Shutdown Coordination** - Graceful shutdown ensures all goroutines complete properly
|
||||
- **Health Monitoring** - Continuous monitoring tracks goroutine health and performance
|
||||
|
||||
**Worker Creation Process:**
|
||||
|
||||
- **Limit Enforcement** - Creation fails when maximum goroutine count is reached
|
||||
- **Unique Identification** - Each goroutine gets a unique ID for tracking and debugging
|
||||
- **Lifecycle Tracking** - Start times and names enable performance analysis
|
||||
- **Atomic Operations** - Thread-safe counters prevent race conditions
|
||||
|
||||
**Panic Recovery and Error Handling:**
|
||||
|
||||
- **Panic Isolation** - Goroutine panics don't crash the entire system
|
||||
- **Error Logging** - Panic details are logged with goroutine context
|
||||
- **Metrics Updates** - Panic counts are tracked for monitoring and alerting
|
||||
- **Resource Cleanup** - Failed goroutines are properly cleaned up and counted
|
||||
|
||||
**Health Monitoring System:**
|
||||
|
||||
- **Periodic Health Checks** - Regular intervals check goroutine pool health
|
||||
- **Completion Tracking** - Finished goroutines are recorded for performance analysis
|
||||
- **Shutdown Handling** - Clean shutdown process ensures no goroutine leaks
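
Limit enforcement, atomic counting, and panic isolation from the lists above can be combined in one small helper. This is a sketch under assumptions: `Spawner`, `Go`, and `ErrLimitReached` are hypothetical names, not Bifrost APIs.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// ErrLimitReached is returned when the configured goroutine limit is hit.
var ErrLimitReached = errors.New("goroutine limit reached")

// Spawner starts goroutines up to a hard limit and recovers their panics.
type Spawner struct {
	max    int64
	active atomic.Int64
	panics atomic.Int64
	wg     sync.WaitGroup
}

// Go runs fn in a new goroutine, or fails fast if the limit is already reached.
func (s *Spawner) Go(fn func()) error {
	// Reserve a slot first; undo the reservation if it exceeds the limit.
	if s.active.Add(1) > s.max {
		s.active.Add(-1)
		return ErrLimitReached
	}
	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		defer s.active.Add(-1)
		defer func() {
			// Deferred calls run last-in first-out, so recovery happens
			// before the slot is released and before Wait can return.
			if r := recover(); r != nil {
				s.panics.Add(1)
			}
		}()
		fn()
	}()
	return nil
}

// Wait blocks until every started goroutine has finished.
func (s *Spawner) Wait() { s.wg.Wait() }

func main() {
	s := &Spawner{max: 2}
	block := make(chan struct{})
	_ = s.Go(func() { <-block })
	_ = s.Go(func() { <-block })
	fmt.Println("third spawn:", s.Go(func() {}))

	close(block)
	s.Wait()

	_ = s.Go(func() { panic("boom") })
	s.Wait()
	fmt.Println("recovered panics:", s.panics.Load())
}
```

Reserving the slot with `Add(1)` before checking the limit avoids a race in which two callers both observe a free slot and both start a goroutine.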
|
||||
|
||||
### **Resource Leak Prevention**
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
GoroutineStart[Goroutine Start] --> ResourceCheck[Resource Allocation Check]
|
||||
ResourceCheck --> Timeout[Set Timeout Context]
|
||||
Timeout --> Work[Execute Work]
|
||||
|
||||
Work --> Complete{Work Complete?}
|
||||
Complete -->|Yes| Cleanup[Cleanup Resources]
|
||||
Complete -->|No| TimeoutCheck{Timeout?}
|
||||
|
||||
TimeoutCheck -->|Yes| ForceCleanup[Force Cleanup]
|
||||
TimeoutCheck -->|No| Work
|
||||
|
||||
Cleanup --> Return[Return Resources to Pool]
|
||||
ForceCleanup --> Return
|
||||
Return --> End[Goroutine End]
|
||||
```
|
||||
|
||||
**Resource Leak Prevention:**
|
||||
|
||||
```go
func (worker *Worker) ExecuteWithCleanup(job *Job) {
	// Set timeout context
	ctx, cancel := context.WithTimeout(
		context.Background(),
		worker.config.ProcessTimeout,
	)
	defer cancel()

	// Acquire resources with timeout
	resources, err := worker.acquireResources(ctx)
	if err != nil {
		job.resultChan <- &Result{Error: err}
		return
	}

	// Ensure cleanup happens
	defer func() {
		// Always return resources, even after a panic
		worker.returnResources(resources)

		// Convert a panic into an error result instead of crashing the process
		if r := recover(); r != nil {
			worker.metrics.IncPanics()
			select {
			case job.resultChan <- &Result{Error: fmt.Errorf("worker panic: %v", r)}:
			case <-ctx.Done():
				// Nobody received before the deadline; give up rather than leak this goroutine
				worker.metrics.IncTimeouts()
			}
		}
	}()

	// Execute job with context
	result := worker.processJob(ctx, job, resources)

	// Return result
	select {
	case job.resultChan <- result:
		// Success
	case <-ctx.Done():
		// Timeout: no receiver took the result in time. This assumes the receiver
		// never closes resultChan, because sending on a closed channel panics.
		worker.metrics.IncTimeouts()
	}
}
```
|
||||
|
||||
---
|
||||
|
||||
## Concurrency Optimization Strategies
|
||||
|
||||
### **Load-Based Worker Scaling** (Planned)
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph "Load Monitoring"
|
||||
QueueDepth[Queue Depth<br/>Monitoring]
|
||||
ResponseTime[Response Time<br/>Tracking]
|
||||
WorkerUtil[Worker Utilization<br/>Metrics]
|
||||
end
|
||||
|
||||
subgraph "Scaling Decisions"
|
||||
ScaleUp{Scale Up?<br/>Load > 80%}
|
||||
ScaleDown{Scale Down?<br/>Load < 30%}
|
||||
Maintain[Maintain<br/>Current Size]
|
||||
end
|
||||
|
||||
subgraph "Actions"
|
||||
AddWorkers[Spawn Additional<br/>Workers]
|
||||
RemoveWorkers[Graceful Worker<br/>Shutdown]
|
||||
NoAction[No Action<br/>Monitor Continue]
|
||||
end
|
||||
|
||||
QueueDepth --> ScaleUp
|
||||
ResponseTime --> ScaleUp
|
||||
WorkerUtil --> ScaleDown
|
||||
|
||||
ScaleUp -->|Yes| AddWorkers
|
||||
ScaleUp -->|No| ScaleDown
|
||||
ScaleDown -->|Yes| RemoveWorkers
|
||||
ScaleDown -->|No| Maintain
|
||||
|
||||
Maintain --> NoAction
|
||||
```
|
||||
|
||||
**Adaptive Scaling Implementation:**
|
||||
|
||||
```go
type AdaptiveScaler struct {
	pool          *ProviderWorkerPool
	config        ScalingConfig
	metrics       *ScalingMetrics
	lastScaleTime time.Time
	scalingMutex  sync.Mutex
}

// EvaluateScaling compares current load with the configured thresholds and
// resizes the pool at most once per MinScaleInterval.
func (scaler *AdaptiveScaler) EvaluateScaling() {
	scaler.scalingMutex.Lock()
	defer scaler.scalingMutex.Unlock()

	// Prevent frequent scaling
	if time.Since(scaler.lastScaleTime) < scaler.config.MinScaleInterval {
		return
	}

	current := scaler.getCurrentMetrics()

	// Scale up conditions
	if current.QueueUtilization > scaler.config.ScaleUpThreshold ||
		current.AvgResponseTime > scaler.config.MaxResponseTime {
		scaler.scaleUp(current)
		return
	}

	// Scale down conditions
	if current.QueueUtilization < scaler.config.ScaleDownThreshold &&
		current.AvgResponseTime < scaler.config.TargetResponseTime {
		scaler.scaleDown(current)
		return
	}
}

// scaleUp grows the pool by ScaleUpFactor, capped at MaxWorkers.
// The caller must hold scalingMutex.
func (scaler *AdaptiveScaler) scaleUp(metrics *CurrentMetrics) {
	currentWorkers := scaler.pool.GetWorkerCount()
	// Round up so that small pools still grow (1 worker with a factor of 1.5
	// would otherwise stay at 1); requires the "math" package
	targetWorkers := int(math.Ceil(float64(currentWorkers) * scaler.config.ScaleUpFactor))

	// Respect maximum limits
	if targetWorkers > scaler.config.MaxWorkers {
		targetWorkers = scaler.config.MaxWorkers
	}

	additionalWorkers := targetWorkers - currentWorkers
	if additionalWorkers > 0 {
		scaler.pool.AddWorkers(additionalWorkers)
		scaler.lastScaleTime = time.Now()
		scaler.metrics.RecordScaleUp(additionalWorkers)
	}
}
```
|
||||
|
||||
### **Provider-Specific Optimization**
|
||||
|
||||
```go
type ProviderOptimization struct {
	// Provider characteristics
	ProviderName string        `json:"provider_name"`
	RateLimit    int           `json:"rate_limit"`  // Requests per second
	AvgLatency   time.Duration `json:"avg_latency"` // Average response time
	ErrorRate    float64       `json:"error_rate"`  // Historical error rate

	// Optimal configuration
	OptimalWorkers int           `json:"optimal_workers"`
	OptimalBuffer  int           `json:"optimal_buffer"`
	TimeoutConfig  time.Duration `json:"timeout_config"`
	RetryStrategy  RetryConfig   `json:"retry_strategy"`
}

func CalculateOptimalConcurrency(provider ProviderOptimization) ConcurrencyConfig {
	// Little's Law: in-flight requests = rate x latency. Multiply in floating
	// point so sub-second latencies are not truncated to zero.
	inFlight := float64(provider.RateLimit) * provider.AvgLatency.Seconds()

	// Adjust for error rate (more workers for higher error rate)
	errorAdjustment := 1.0 + provider.ErrorRate
	optimalWorkers := int(math.Ceil(inFlight * errorAdjustment)) // requires "math"
	if optimalWorkers < 1 {
		optimalWorkers = 1 // always keep at least one worker
	}

	// Buffer should be 2-3x worker count for smooth operation
	optimalBuffer := optimalWorkers * 3

	return ConcurrencyConfig{
		Concurrency: optimalWorkers,
		BufferSize:  optimalBuffer,
		Timeout:     provider.AvgLatency * 2, // 2x avg latency for timeout
	}
}
```
|
||||
|
||||
---
|
||||
|
||||
## Concurrency Monitoring & Metrics
|
||||
|
||||
### **Key Concurrency Metrics**
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph "Worker Metrics"
|
||||
ActiveWorkers[Active Workers<br/>Current Count]
|
||||
IdleWorkers[Idle Workers<br/>Available Count]
|
||||
BusyWorkers[Busy Workers<br/>Processing Count]
|
||||
end
|
||||
|
||||
subgraph "Queue Metrics"
|
||||
QueueDepth[Queue Depth<br/>Pending Jobs]
|
||||
QueueThroughput[Queue Throughput<br/>Jobs/Second]
|
||||
QueueWaitTime[Queue Wait Time<br/>Average Delay]
|
||||
end
|
||||
|
||||
subgraph "Performance Metrics"
|
||||
GoroutineCount[Goroutine Count<br/>Total Active]
|
||||
MemoryUsage[Memory Usage<br/>Pool Utilization]
|
||||
GCPressure[GC Pressure<br/>Collection Frequency]
|
||||
end
|
||||
|
||||
subgraph "Health Metrics"
|
||||
ErrorRate[Error Rate<br/>Failed Jobs %]
|
||||
PanicCount[Panic Count<br/>Crashed Goroutines]
|
||||
DeadlockDetection[Deadlock Detection<br/>Blocked Operations]
|
||||
end
|
||||
```
|
||||
|
||||
**Metrics Collection Strategy:**
|
||||
|
||||
Comprehensive concurrency monitoring provides operational insights and performance optimization data:
|
||||
|
||||
**Worker Pool Monitoring:**
|
||||
|
||||
- **Total Worker Tracking** - Monitor configured vs actual worker counts
|
||||
- **Active Worker Monitoring** - Track workers currently processing requests
|
||||
- **Idle Worker Analysis** - Identify unused capacity and optimization opportunities
|
||||
- **Queue Depth Monitoring** - Track pending job backlog and processing delays
|
||||
|
||||
**Performance Data Collection:**
|
||||
|
||||
- **Throughput Metrics** - Measure jobs processed per second across all pools
|
||||
- **Wait Time Analysis** - Track how long jobs wait in queues before processing
|
||||
- **Memory Pool Performance** - Monitor hit/miss ratios for memory pool effectiveness
|
||||
- **Goroutine Count Tracking** - Ensure goroutine counts remain within healthy limits
|
||||
|
||||
**Health and Reliability Metrics:**
|
||||
|
||||
- **Panic Recovery Tracking** - Count and analyze worker panic occurrences
|
||||
- **Timeout Monitoring** - Track jobs that exceed processing time limits
|
||||
- **Circuit Breaker Events** - Monitor provider isolation events and recoveries
|
||||
- **Error Rate Analysis** - Track failure patterns for capacity planning
|
||||
|
||||
**Real-Time Updates:**
|
||||
|
||||
- **Live Metric Updates** - Worker metrics are updated continuously during operation
|
||||
- **Processing Event Recording** - Each job completion updates relevant metrics
|
||||
- **Performance Correlation** - Queue times and processing times are correlated for analysis
|
||||
- **Success/Failure Tracking** - All job outcomes are recorded for reliability analysis
|
||||
|
||||
---
|
||||
|
||||
## Deadlock Prevention & Detection
|
||||
|
||||
### **Deadlock Prevention Strategies**
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
Strategy1[Lock Ordering<br/>Consistent Acquisition]
|
||||
Strategy2[Timeout-Based Locks<br/>Context Cancellation]
|
||||
Strategy3[Channel Select<br/>Non-blocking Operations]
|
||||
Strategy4[Resource Hierarchy<br/>Layered Locking]
|
||||
|
||||
Prevention[Deadlock Prevention<br/>Design Patterns]
|
||||
|
||||
Prevention --> Strategy1
|
||||
Prevention --> Strategy2
|
||||
Prevention --> Strategy3
|
||||
Prevention --> Strategy4
|
||||
|
||||
Strategy1 --> Success[No Deadlocks<br/>Guaranteed Order]
|
||||
Strategy2 --> Success
|
||||
Strategy3 --> Success
|
||||
Strategy4 --> Success
|
||||
```
|
||||
|
||||
**Deadlock Prevention Implementation Strategy:**
|
||||
|
||||
Bifrost employs multiple complementary strategies to prevent deadlocks in concurrent operations:
|
||||
|
||||
**Lock Ordering Management:**
|
||||
|
||||
- **Consistent Acquisition Order** - All locks are acquired in a predetermined order
|
||||
- **Global Lock Registry** - Centralized registry maintains lock ordering relationships
|
||||
- **Order Enforcement** - Lock acquisition automatically sorts by predetermined order
|
||||
- **Dependency Tracking** - Lock dependencies are mapped to prevent circular waits
|
||||
|
||||
**Timeout-Based Protection:**
|
||||
|
||||
- **Default Timeouts** - All lock acquisitions have reasonable timeout limits
|
||||
- **Context Cancellation** - Operations respect context cancellation for cleanup
|
||||
- **Maximum Timeout Limits** - Upper bounds prevent indefinite blocking
|
||||
- **Graceful Timeout Handling** - Timeout errors provide meaningful context
|
||||
|
||||
**Multi-Lock Acquisition Process:**
|
||||
|
||||
- **Ordered Sorting** - Multiple locks are sorted before acquisition attempts
|
||||
- **Progressive Acquisition** - Locks are acquired one by one in sorted order
|
||||
- **Failure Recovery** - Failed acquisitions trigger automatic cleanup of held locks
|
||||
- **Resource Tracking** - All acquired locks are tracked for proper release
|
||||
|
||||
**Lock Acquisition Safety:**
|
||||
|
||||
- **Non-Blocking Detection** - Channel-based lock attempts prevent indefinite blocking
|
||||
- **Timeout Enforcement** - All lock attempts respect configured timeout limits
|
||||
- **Error Propagation** - Lock failures are properly propagated with context
|
||||
- **Cleanup Guarantees** - Failed operations always clean up partially acquired resources
|
||||
|
||||
**Deadlock Detection and Recovery:**
|
||||
|
||||
- **Active Monitoring** - Continuous monitoring for potential deadlock conditions
|
||||
- **Automatic Recovery** - Detected deadlocks trigger automatic resolution procedures
|
||||
- **Resource Release** - Deadlock resolution involves strategic resource release
|
||||
- **Prevention Learning** - Deadlock patterns inform prevention strategy improvements
|
||||
|
||||
---
|
||||
|
||||
## Related Architecture Documentation
|
||||
|
||||
- **[Request Flow](./request-flow)** - How concurrency fits in request processing
|
||||
- **[Benchmarks](../../benchmarking/getting-started)** - Concurrency performance characteristics
|
||||
- **[Plugin System](./plugins)** - Plugin concurrency considerations
|
||||
- **[MCP System](./mcp)** - MCP concurrency and worker integration
|
||||
|
||||
## Usage Documentation
|
||||
|
||||
- **[Provider Configuration](../../quickstart/gateway/provider-configuration)** - Configure concurrency settings per provider
|
||||
- **[Performance Analysis](../../benchmarking/getting-started)** - Memory pool configuration and optimization
|
||||
- **[Performance Monitoring](../../features/telemetry)** - Monitor concurrency metrics and health
|
||||
- **[Go SDK Usage](../../quickstart/go-sdk/setting-up)** - Use Bifrost concurrency in Go applications
|
||||
- **[Gateway Setup](../../quickstart/gateway/setting-up)** - Deploy Bifrost with optimal concurrency settings
|
||||
|
||||
---
|
||||
|
||||
**🎯 Next Step:** Understand how plugins integrate with the concurrency model in **[Plugin System](./plugins)**.
|
||||
|
||||
985
docs/architecture/core/mcp.mdx
Normal file
@@ -0,0 +1,985 @@
|
||||
---
|
||||
title: "Model Context Protocol (MCP)"
|
||||
description: "Deep dive into Bifrost's Model Context Protocol (MCP) integration - how external tool discovery, execution, and integration work internally."
|
||||
icon: "toolbox"
|
||||
---
|
||||
|
||||
## MCP Architecture Overview
|
||||
|
||||
### **What is MCP in Bifrost?**
|
||||
|
||||
The Model Context Protocol (MCP) system in Bifrost enables AI models to seamlessly discover and execute external tools, transforming static chat models into dynamic, action-capable agents. This architecture bridges the gap between AI reasoning and real-world tool execution.
|
||||
|
||||
**Core MCP Principles:**
|
||||
|
||||
- **Dynamic Discovery** - Tools are discovered at runtime, not hardcoded
|
||||
- **Client-Side Execution** - Bifrost controls all tool execution for security
|
||||
- **Multi-Protocol Support** - InProcess, STDIO, HTTP, and SSE connection types
|
||||
- **Request-Level Filtering** - Granular control over tool availability
|
||||
- **Async Execution** - Non-blocking tool invocation and response handling
|
||||
|
||||
### **MCP System Components**
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph "MCP Management Layer"
|
||||
MCPMgr[MCP Manager<br/>Central Controller]
|
||||
ClientRegistry[Client Registry<br/>Connection Management]
|
||||
ToolDiscovery[Tool Discovery<br/>Runtime Registration]
|
||||
end
|
||||
|
||||
subgraph "MCP Execution Layer"
|
||||
ToolFilter[Tool Filter<br/>Access Control]
|
||||
ToolExecutor[Tool Executor<br/>Invocation Engine]
|
||||
ResultProcessor[Result Processor<br/>Response Handling]
|
||||
end
|
||||
|
||||
subgraph "Connection Types"
|
||||
STDIOConn[STDIO Connections<br/>Command-line Tools]
|
||||
HTTPConn[HTTP Connections<br/>Web Services]
|
||||
SSEConn[SSE Connections<br/>Real-time Streams]
|
||||
end
|
||||
|
||||
subgraph "External MCP Servers"
|
||||
FileSystem[Filesystem Tools<br/>File Operations]
|
||||
WebSearch[Web Search<br/>Information Retrieval]
|
||||
Database[Database Tools<br/>Data Access]
|
||||
Custom[Custom Tools<br/>Business Logic]
|
||||
end
|
||||
|
||||
MCPMgr --> ClientRegistry
|
||||
ClientRegistry --> ToolDiscovery
|
||||
ToolDiscovery --> ToolFilter
|
||||
ToolFilter --> ToolExecutor
|
||||
ToolExecutor --> ResultProcessor
|
||||
|
||||
ClientRegistry --> STDIOConn
|
||||
ClientRegistry --> HTTPConn
|
||||
ClientRegistry --> SSEConn
|
||||
|
||||
STDIOConn --> FileSystem
|
||||
HTTPConn --> WebSearch
|
||||
HTTPConn --> Database
|
||||
STDIOConn --> Custom
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## MCP Connection Architecture
|
||||
|
||||
### **Multi-Protocol Connection System**
|
||||
|
||||
Bifrost supports four MCP connection types, each optimized for different tool deployment patterns:
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph "InProcess Connections"
|
||||
InProcess[In-Memory Tools<br/>Same Process]
|
||||
InProcessEx[Examples:<br/>• Embedded tools<br/>• High-perf operations<br/>• Testing tools]
|
||||
end
|
||||
|
||||
subgraph "STDIO Connections"
|
||||
STDIO[Command Line Tools<br/>Local Execution]
|
||||
STDIOEx[Examples:<br/>• Filesystem tools<br/>• Local scripts<br/>• CLI utilities]
|
||||
end
|
||||
|
||||
subgraph "HTTP Connections"
|
||||
HTTP[Web Service Tools<br/>Remote APIs]
|
||||
HTTPEx[Examples:<br/>• Web search APIs<br/>• Database services<br/>• External integrations]
|
||||
end
|
||||
|
||||
subgraph "SSE Connections"
|
||||
SSE[Real-time Tools<br/>Streaming Data]
|
||||
SSEEx[Examples:<br/>• Live data feeds<br/>• Real-time monitoring<br/>• Event streams]
|
||||
end
|
||||
|
||||
subgraph "Connection Characteristics"
|
||||
Latency[Latency:<br/>InProcess < STDIO < HTTP < SSE]
|
||||
Security[Security:<br/>InProcess/Local > HTTP > SSE]
|
||||
Scalability[Scalability:<br/>HTTP > SSE > STDIO > InProcess]
|
||||
Complexity[Complexity:<br/>InProcess < STDIO < HTTP < SSE]
|
||||
end
|
||||
|
||||
InProcess --> Latency
|
||||
STDIO --> Latency
|
||||
HTTP --> Security
|
||||
SSE --> Scalability
|
||||
HTTP --> Complexity
|
||||
```
|
||||
|
||||
### **Connection Type Details**
|
||||
|
||||
**InProcess Connections (In-Memory Tools):**
|
||||
|
||||
- **Use Case:** Embedded tools, high-performance operations, testing
|
||||
- **Performance:** Lowest possible latency (~0.1ms) with no IPC overhead
|
||||
- **Security:** Highest security as tools run in the same process
|
||||
- **Limitations:** Go package only, cannot be configured via JSON
|
||||
|
||||
**STDIO Connections (Local Tools):**
|
||||
|
||||
- **Use Case:** Command-line tools, local scripts, filesystem operations
|
||||
- **Performance:** Low latency (~1-10ms) due to local execution
|
||||
- **Security:** High security with full local control
|
||||
- **Limitations:** Tools run only on the Bifrost host and share that machine's resources
|
||||
|
||||
**HTTP Connections (Remote Services):**
|
||||
|
||||
- **Use Case:** Web APIs, microservices, cloud functions
|
||||
- **Performance:** Network-dependent latency (~10-500ms)
|
||||
- **Security:** Configurable with authentication and encryption
|
||||
- **Advantages:** Scalable, multi-server deployment, service isolation
|
||||
|
||||
**SSE Connections (Streaming Tools):**
|
||||
|
||||
- **Use Case:** Real-time data feeds, live monitoring, event streams
|
||||
- **Performance:** Variable latency depending on stream frequency
|
||||
- **Security:** Same authentication and encryption options as HTTP, applied to a long-lived streaming connection
|
||||
- **Benefits:** Real-time updates, persistent connections, event-driven
|
||||
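The per-type constraints above can be made concrete with a small validation sketch. This is an illustration only: `ConnectionType`, `ClientConfig`, and `Validate` are hypothetical names invented for this example, not Bifrost's actual schema, and the rules are limited to what the list above states (for example, that InProcess clients cannot be configured via JSON).

```go
package main

import (
	"errors"
	"fmt"
)

// ConnectionType mirrors the four connection types described above.
// The names and fields here are illustrative, not Bifrost's actual schema.
type ConnectionType string

const (
	InProcess ConnectionType = "inprocess"
	STDIO     ConnectionType = "stdio"
	HTTP      ConnectionType = "http"
	SSE       ConnectionType = "sse"
)

// ClientConfig is a hypothetical, simplified client configuration.
type ClientConfig struct {
	Name     string
	Type     ConnectionType
	Command  string // used by STDIO
	URL      string // used by HTTP and SSE
	FromJSON bool   // true when the config was loaded from a JSON file
}

// Validate checks that the fields required by the connection type are set.
func (c ClientConfig) Validate() error {
	if c.Name == "" {
		return errors.New("client name is required")
	}
	switch c.Type {
	case InProcess:
		if c.FromJSON {
			return errors.New("inprocess clients can only be registered from Go code")
		}
	case STDIO:
		if c.Command == "" {
			return errors.New("stdio clients need a command to launch")
		}
	case HTTP, SSE:
		if c.URL == "" {
			return fmt.Errorf("%s clients need a URL", c.Type)
		}
	default:
		return fmt.Errorf("unknown connection type %q", c.Type)
	}
	return nil
}

func main() {
	configs := []ClientConfig{
		{Name: "filesystem", Type: STDIO, Command: "npx"},
		{Name: "websearch", Type: HTTP},
		{Name: "embedded", Type: InProcess, FromJSON: true},
	}
	for _, c := range configs {
		if err := c.Validate(); err != nil {
			fmt.Printf("%s: invalid: %v\n", c.Name, err)
			continue
		}
		fmt.Printf("%s: ok\n", c.Name)
	}
}
```

Validating at load time keeps misconfigured clients from reaching the connection step, where failures are harder to attribute.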
|
||||
> **📖 MCP Configuration:** [MCP Setup Guide →](../../mcp/overview)
|
||||
|
||||
---
|
||||
|
||||
## Tool Discovery & Registration
|
||||
|
||||
### **Dynamic Tool Discovery Process**
|
||||
|
||||
The MCP system discovers tools at runtime rather than requiring static configuration, enabling flexible and adaptive tool availability:
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant Bifrost
|
||||
participant MCPManager
|
||||
participant MCPServer
|
||||
participant ToolRegistry
|
||||
participant AIModel
|
||||
|
||||
Note over Bifrost: System Startup
|
||||
Bifrost->>MCPManager: Initialize MCP System
|
||||
MCPManager->>MCPServer: Establish Connection
|
||||
MCPServer-->>MCPManager: Connection Ready
|
||||
|
||||
MCPManager->>MCPServer: List Available Tools
|
||||
MCPServer-->>MCPManager: Tool Definitions
|
||||
MCPManager->>ToolRegistry: Register Tools
|
||||
|
||||
Note over Bifrost: Runtime Request Processing
|
||||
AIModel->>MCPManager: Request Available Tools
|
||||
MCPManager->>ToolRegistry: Query Tools
|
||||
ToolRegistry-->>MCPManager: Filtered Tool List
|
||||
MCPManager-->>AIModel: Available Tools
|
||||
|
||||
AIModel->>MCPManager: Execute Tool Call
|
||||
MCPManager->>MCPServer: Tool Invocation
|
||||
MCPServer->>MCPServer: Execute Tool Logic
|
||||
MCPServer-->>MCPManager: Tool Result
|
||||
MCPManager-->>AIModel: Enhanced Response
|
||||
```
|
||||
|
||||
### **Tool Registry Management**
|
||||
|
||||
**Registration Process:**
|
||||
|
||||
1. **Connection Establishment** - MCP client connects to configured servers
|
||||
2. **Capability Exchange** - Server announces available tools and schemas
|
||||
3. **Tool Validation** - Bifrost validates tool definitions and security
|
||||
4. **Registry Update** - Tools are registered in the internal tool registry
|
||||
5. **Availability Notification** - Tools become available for AI model use
|
||||
|
||||
**Registry Features:**
|
||||
|
||||
- **Dynamic Updates** - Tools can be added/removed during runtime
|
||||
- **Version Management** - Support for tool versioning and compatibility
|
||||
- **Access Control** - Request-level tool filtering and permissions
|
||||
- **Health Monitoring** - Continuous tool availability checking
|
||||
|
||||
**Tool Metadata Structure:**
|
||||
|
||||
- **Name & Description** - Human-readable tool identification
|
||||
- **Parameters Schema** - JSON schema for tool input validation
|
||||
- **Return Schema** - Expected response format definition
|
||||
- **Capabilities** - Tool feature flags and limitations
|
||||
- **Authentication** - Required credentials and permissions
|
||||
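The registration steps and dynamic updates described above can be sketched as a small in-memory registry. This is a minimal sketch under stated assumptions: the `Tool` and `Registry` types, the `client/tool` key format, and the two validation rules are invented for illustration and do not reflect Bifrost's internal registry.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
	"sync"
)

// Tool is a hypothetical, trimmed-down version of the metadata listed above.
type Tool struct {
	Name         string
	Description  string
	ParamsSchema map[string]any // JSON schema for the tool's input
}

// Registry is an illustrative in-memory tool registry keyed by "client/tool".
type Registry struct {
	mu    sync.RWMutex
	tools map[string]Tool
}

func NewRegistry() *Registry {
	return &Registry{tools: make(map[string]Tool)}
}

// Register validates each announced tool and stores the valid ones.
// Invalid definitions are skipped and reported, so one bad tool does not block the rest.
func (r *Registry) Register(client string, announced []Tool) []error {
	var errs []error
	r.mu.Lock()
	defer r.mu.Unlock()
	for _, t := range announced {
		if t.Name == "" {
			errs = append(errs, errors.New("tool with empty name from "+client))
			continue
		}
		if t.ParamsSchema == nil {
			errs = append(errs, fmt.Errorf("tool %q has no parameter schema", t.Name))
			continue
		}
		r.tools[client+"/"+t.Name] = t
	}
	return errs
}

// Unregister removes every tool that belongs to a client, for example after a disconnect.
func (r *Registry) Unregister(client string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for key := range r.tools {
		if strings.HasPrefix(key, client+"/") {
			delete(r.tools, key)
		}
	}
}

// List returns the registered tool keys in sorted order.
func (r *Registry) List() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	keys := make([]string, 0, len(r.tools))
	for key := range r.tools {
		keys = append(keys, key)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	reg := NewRegistry()
	errs := reg.Register("filesystem", []Tool{
		{Name: "read_file", Description: "Read a file", ParamsSchema: map[string]any{"type": "object"}},
		{Name: "", Description: "missing name"},
	})
	fmt.Println("registered:", reg.List(), "rejected:", len(errs))
	reg.Unregister("filesystem")
	fmt.Println("after disconnect:", reg.List())
}
```

A read-write mutex fits this access pattern: lookups happen on every request, while registration and removal are comparatively rare.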
|
||||
---
|
||||
|
||||
## Tool Filtering & Access Control
|
||||
|
||||
### **Multi-Level Filtering System**
|
||||
|
||||
Bifrost controls tool availability through a multi-level filtering system: client-level include and exclude rules run first, followed by tool-level rules:
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
Request[Incoming Request] --> GlobalFilter{Global MCP Filter}
|
||||
GlobalFilter -->|Enabled| ClientFilter[MCP Client Filtering]
|
||||
GlobalFilter -->|Disabled| NoMCP[No MCP Tools]
|
||||
|
||||
ClientFilter --> IncludeClients{Include Clients?}
|
||||
IncludeClients -->|Yes| IncludeList[Include Specified<br/>MCP Clients]
|
||||
IncludeClients -->|No| AllClients[All MCP Clients]
|
||||
|
||||
IncludeList --> ExcludeClients{Exclude Clients?}
|
||||
AllClients --> ExcludeClients
|
||||
ExcludeClients -->|Yes| RemoveClients[Remove Excluded<br/>MCP Clients]
|
||||
ExcludeClients -->|No| ClientsFiltered[Filtered Clients]
|
||||
|
||||
RemoveClients --> ToolFilter[Tool-Level Filtering]
|
||||
ClientsFiltered --> ToolFilter
|
||||
|
||||
ToolFilter --> IncludeTools{Include Tools?}
|
||||
IncludeTools -->|Yes| IncludeSpecific[Include Specified<br/>Tools Only]
|
||||
IncludeTools -->|No| AllTools[All Available Tools]
|
||||
|
||||
IncludeSpecific --> ExcludeTools{Exclude Tools?}
|
||||
AllTools --> ExcludeTools
|
||||
ExcludeTools -->|Yes| RemoveTools[Remove Excluded<br/>Tools]
|
||||
ExcludeTools -->|No| FinalTools[Final Tool Set]
|
||||
|
||||
RemoveTools --> FinalTools
|
||||
FinalTools --> AIModel[Available to AI Model]
|
||||
NoMCP --> AIModel
|
||||
```
|
||||
|
||||
### **Filtering Configuration Levels**
|
||||
|
||||
**Request-Level Filtering:**
|
||||
|
||||
```bash
|
||||
# Include only specific MCP clients
|
||||
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
|
||||
-H "x-bf-mcp-include-clients: filesystem,websearch" \
|
||||
-d '{"model": "gpt-4o-mini", "messages": [...]}'
|
||||
|
||||
# Include only specific tools
|
||||
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
|
||||
-H "x-bf-mcp-include-tools: filesystem-read_file,websearch-search" \
|
||||
-d '{"model": "gpt-4o-mini", "messages": [...]}'
|
||||
```
|
||||
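The filtering flowchart above, fed by comma-separated values like those in the request headers, can be sketched in Go as follows. The `Filter` and `Tool` types and the `parseList` helper are hypothetical; the `client-tool` identifier format follows the header example above, and how Bifrost implements filtering internally is not specified here.

```go
package main

import (
	"fmt"
	"strings"
)

// Tool identifies a tool by the MCP client that provides it. Illustrative type.
type Tool struct {
	Client string
	Name   string
}

// Filter holds the include/exclude lists from the flowchart above.
// An empty include list means "include everything".
type Filter struct {
	IncludeClients, ExcludeClients []string
	IncludeTools, ExcludeTools     []string
}

// parseList splits a comma-separated value such as "filesystem,websearch".
func parseList(value string) []string {
	var out []string
	for _, part := range strings.Split(value, ",") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}

func contains(list []string, s string) bool {
	for _, item := range list {
		if item == s {
			return true
		}
	}
	return false
}

// Apply runs client-level filtering first, then tool-level filtering.
// When MCP is disabled globally, no tools are returned.
func (f Filter) Apply(enabled bool, tools []Tool) []Tool {
	if !enabled {
		return nil
	}
	var out []Tool
	for _, t := range tools {
		if len(f.IncludeClients) > 0 && !contains(f.IncludeClients, t.Client) {
			continue
		}
		if contains(f.ExcludeClients, t.Client) {
			continue
		}
		id := t.Client + "-" + t.Name // same "client-tool" form as the header example
		if len(f.IncludeTools) > 0 && !contains(f.IncludeTools, id) {
			continue
		}
		if contains(f.ExcludeTools, id) {
			continue
		}
		out = append(out, t)
	}
	return out
}

func main() {
	tools := []Tool{
		{"filesystem", "read_file"},
		{"filesystem", "write_file"},
		{"websearch", "search"},
		{"database", "query"},
	}
	f := Filter{
		IncludeClients: parseList("filesystem, websearch"),
		ExcludeTools:   parseList("filesystem-write_file"),
	}
	for _, t := range f.Apply(true, tools) {
		fmt.Printf("%s-%s\n", t.Client, t.Name)
	}
}
```

Exclusion is checked after inclusion at each level, so an entry that appears in both lists ends up excluded, matching the order of the decision nodes in the diagram.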
|
||||
**Configuration-Level Filtering:**
|
||||
|
||||
- **Client Selection** - Choose which MCP servers to connect to
|
||||
- **Tool Blacklisting** - Permanently disable dangerous or unwanted tools
|
||||
- **Permission Mapping** - Map user roles to available tool sets
|
||||
- **Environment-Based** - Different tool sets for development vs production
|
||||
|
||||
**Security Benefits:**
|
||||
|
||||
- **Principle of Least Privilege** - Only necessary tools are exposed
|
||||
- **Dynamic Access Control** - Per-request tool availability
|
||||
- **Audit Trail** - Track which tools are used by which requests
|
||||
- **Risk Mitigation** - Prevent access to dangerous operations
|
||||
|
||||
> **📖 Tool Filtering:** [MCP Tool Control →](../../mcp/filtering)
|
||||
|
||||
---
|
||||
|
||||
## Tool Execution Engine
|
||||
|
||||
### **Async Tool Execution Architecture**
|
||||
|
||||
The MCP execution engine handles tool invocation asynchronously to maintain system responsiveness and enable complex multi-tool workflows:
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant AIModel
|
||||
participant ExecutionEngine
|
||||
participant ToolInvoker
|
||||
participant MCPServer
|
||||
participant ResultProcessor
|
||||
|
||||
AIModel->>ExecutionEngine: Tool Call Request
|
||||
ExecutionEngine->>ExecutionEngine: Validate Tool Call
|
||||
ExecutionEngine->>ToolInvoker: Queue Tool Execution
|
||||
|
||||
Note over ToolInvoker: Async Tool Execution
|
||||
ToolInvoker->>MCPServer: Invoke Tool
|
||||
MCPServer->>MCPServer: Execute Tool Logic
|
||||
MCPServer-->>ToolInvoker: Raw Tool Result
|
||||
|
||||
ToolInvoker->>ResultProcessor: Process Result
|
||||
ResultProcessor->>ResultProcessor: Format & Validate
|
||||
ResultProcessor-->>ExecutionEngine: Processed Result
|
||||
|
||||
ExecutionEngine-->>AIModel: Tool Execution Complete
|
||||
|
||||
Note over AIModel: Multi-turn Conversation
|
||||
AIModel->>ExecutionEngine: Continue with Tool Results
|
||||
ExecutionEngine->>ExecutionEngine: Merge Results into Context
|
||||
ExecutionEngine-->>AIModel: Enhanced Response
|
||||
```
|
||||
|
||||
### **Execution Flow Characteristics**
|
||||
|
||||
**Validation Phase:**
|
||||
|
||||
- **Parameter Validation** - Ensure tool arguments match expected schema
|
||||
- **Permission Checking** - Verify tool access permissions for the request
|
||||
- **Rate Limiting** - Apply per-tool and per-user rate limits
|
||||
- **Security Scanning** - Check for potentially dangerous operations
|
||||
|
||||
**Execution Phase:**
|
||||
|
||||
- **Timeout Management** - Bounded execution time to prevent hanging
|
||||
- **Error Handling** - Graceful handling of tool failures and timeouts
|
||||
- **Result Streaming** - Support for tools that return streaming responses
|
||||
- **Resource Monitoring** - Track tool resource usage and performance
|
||||
|
||||
**Response Phase:**
|
||||
|
||||
- **Result Formatting** - Convert tool outputs to consistent format
|
||||
- **Error Enrichment** - Add context and suggestions for tool failures
|
||||
- **Multi-Result Aggregation** - Combine multiple tool outputs coherently
|
||||
- **Context Integration** - Merge tool results into conversation context
|
||||
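Timeout management and error handling from the phases above can be illustrated with Go's `context` package. This is a sketch, not Bifrost's executor: `ToolFunc`, `ToolResult`, `runWithTimeout`, and the two stand-in tools are hypothetical names introduced for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ToolFunc is a hypothetical signature for anything that runs a tool.
type ToolFunc func(ctx context.Context, args map[string]any) (string, error)

// ToolResult is the consistent shape returned to the caller, success or failure.
type ToolResult struct {
	Tool    string
	Output  string
	IsError bool
}

// runWithTimeout bounds a tool call and converts failures into a ToolResult.
func runWithTimeout(parent context.Context, name string, fn ToolFunc, args map[string]any, limit time.Duration) ToolResult {
	ctx, cancel := context.WithTimeout(parent, limit)
	defer cancel()

	type outcome struct {
		out string
		err error
	}
	done := make(chan outcome, 1) // buffered so the goroutine never blocks after a timeout
	go func() {
		out, err := fn(ctx, args)
		done <- outcome{out, err}
	}()

	select {
	case o := <-done:
		if o.err != nil {
			return ToolResult{Tool: name, Output: "tool failed: " + o.err.Error(), IsError: true}
		}
		return ToolResult{Tool: name, Output: o.out}
	case <-ctx.Done():
		msg := "tool cancelled"
		if errors.Is(ctx.Err(), context.DeadlineExceeded) {
			msg = fmt.Sprintf("tool timed out after %s", limit)
		}
		return ToolResult{Tool: name, Output: msg, IsError: true}
	}
}

// fastTool and slowTool are stand-ins for real MCP tool invocations.
func fastTool(ctx context.Context, args map[string]any) (string, error) {
	return "ok", nil
}

func slowTool(ctx context.Context, args map[string]any) (string, error) {
	select {
	case <-time.After(2 * time.Second):
		return "finished", nil
	case <-ctx.Done():
		return "", ctx.Err() // respect cancellation so the goroutine exits promptly
	}
}

// runDemo executes one tool that finishes in time and one that exceeds its limit.
func runDemo() []ToolResult {
	ctx := context.Background()
	return []ToolResult{
		runWithTimeout(ctx, "fast", fastTool, nil, time.Second),
		runWithTimeout(ctx, "slow", slowTool, nil, 50*time.Millisecond),
	}
}

func main() {
	for _, r := range runDemo() {
		fmt.Printf("%s: error=%v output=%q\n", r.Tool, r.IsError, r.Output)
	}
}
```

Returning a `ToolResult` for failures, rather than a bare error, lets the timeout be reported back to the model as a normal tool message so the conversation can continue.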
|
||||
### **Multi-Turn Conversation Support**
|
||||
|
||||
The MCP system supports multi-turn conversations that proceed through these stages:
|
||||
|
||||
1. **Initial Tool Discovery** - Request available tools for a given context
|
||||
2. **Tool Execution** - Execute one or more tools based on user request
|
||||
3. **Result Analysis** - Analyze tool outputs and determine next steps
|
||||
4. **Follow-up Actions** - Execute additional tools based on previous results
|
||||
5. **Response Synthesis** - Combine tool results into coherent user response
|
||||
|
||||
**Example Multi-Turn Flow:**
|
||||
|
||||
```
|
||||
User: "Find recent news about AI and save interesting articles"
|
||||
AI: → Execute web_search("AI news recent")
|
||||
AI: → Analyze search results
|
||||
AI: → Execute save_article() for each interesting result
|
||||
AI: → Respond with summary of saved articles
|
||||
```
|
||||
|
||||
### **Complete User-Controlled Tool Execution Flow**
|
||||
|
||||
The following diagram shows the end-to-end user experience with MCP tool execution, highlighting the critical user control points and decision-making process:
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
A["👤 User Message<br/>\"List files in current directory\""] --> B["🤖 Bifrost Core"]
|
||||
|
||||
B --> C["🔧 MCP Manager<br/>Auto-discovers and adds<br/>available tools to request"]
|
||||
|
||||
C --> D["🌐 LLM Provider<br/>(OpenAI, Anthropic, etc.)"]
|
||||
|
||||
D --> E{"🔍 Response contains<br/>tool_calls?"}
|
||||
|
||||
E -->|No| F["✅ Final Response<br/>Display to user"]
|
||||
|
||||
E -->|Yes| G["📝 Add assistant message<br/>with tool_calls to history"]
|
||||
|
||||
G --> H["🛡️ YOUR EXECUTION LOGIC<br/>(Security, Approval, Logging)"]
|
||||
|
||||
H --> I{"🤔 User Decision Point<br/>Execute this tool?"}
|
||||
|
||||
I -->|Deny| J["❌ Create denial result<br/>Add to conversation history"]
|
||||
|
||||
I -->|Approve| K["⚙️ client.ExecuteMCPTool()<br/>Bifrost executes via MCP"]
|
||||
|
||||
K --> L["📊 Tool Result<br/>Add to conversation history"]
|
||||
|
||||
J --> M["🔄 Continue conversation loop<br/>Send updated history back to LLM"]
|
||||
L --> M
|
||||
|
||||
M --> D
|
||||
|
||||
style A fill:#e1f5fe
|
||||
style F fill:#e8f5e8
|
||||
style H fill:#fff3e0
|
||||
style I fill:#fce4ec
|
||||
style K fill:#f3e5f5
|
||||
```
|
||||
|
||||
**Key Flow Characteristics:**
|
||||
|
||||
**User Control Points:**
|
||||
|
||||
- **Security Layer** - Your application controls all tool execution decisions
|
||||
- **Approval Gate** - Users can approve or deny each tool execution
|
||||
- **Transparency** - Full visibility into what tools will be executed and why
|
||||
- **Conversation Continuity** - Tool results seamlessly integrate into conversation flow
|
||||
|
||||
**Security Benefits:**
|
||||
|
||||
- **No Automatic Execution** - In this flow, tools never execute without explicit approval (Agent Mode, described below, is the opt-in exception)
|
||||
- **Audit Trail** - Complete logging of all tool execution decisions
|
||||
- **Contextual Security** - Approval decisions can consider full conversation context
|
||||
- **Graceful Denials** - Denied tools result in informative responses, not errors
|
||||
|
||||
**Implementation Patterns:**
|
||||
|
||||
```go
|
||||
// Example tool execution control in your application.
// Assumes `client` is an initialized Bifrost client available in scope.
|
||||
func handleToolExecution(ctx context.Context, toolCall schemas.ChatToolCall, userContext UserContext) error {
|
||||
// YOUR SECURITY AND APPROVAL LOGIC HERE
|
||||
if !userContext.HasPermission(toolCall.Function.Name) {
|
||||
return createDenialResponse("Tool not permitted for user role")
|
||||
}
|
||||
|
||||
if requiresApproval(toolCall) {
|
||||
approved := promptUserForApproval(toolCall)
|
||||
if !approved {
|
||||
return createDenialResponse("User denied tool execution")
|
||||
}
|
||||
}
|
||||
|
||||
// Execute the tool via Bifrost
|
||||
result, err := client.ExecuteMCPTool(ctx, toolCall)
|
||||
if err != nil {
|
||||
return handleToolError(err)
|
||||
}
|
||||
|
||||
return addToolResultToHistory(result)
|
||||
}
|
||||
```
|
||||
|
||||
This flow lets AI models discover and request tools while every actual execution stays under your application's control.
|
||||
|
||||
---
|
||||
|
||||
## Agent Mode Architecture
|
||||
|
||||
Agent Mode transforms Bifrost into an autonomous agent runtime by automatically executing pre-approved tools. This section details the internal architecture of the agent execution loop.
|
||||
|
||||
### **Agent Execution Loop**
|
||||
|
||||
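Agent Mode runs as an iterative loop that continues until one of three termination conditions is met: the LLM returns no tool calls, the LLM requests a tool that is not auto-executable, or the maximum depth is reached: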
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
subgraph "Agent Mode Entry"
|
||||
A["📥 Incoming Chat Request"] --> B{"🔍 Check MCP Config<br/>Any tools_to_auto_execute?"}
|
||||
B -->|No| C["📤 Standard Flow<br/>Return tool_calls for manual execution"]
|
||||
B -->|Yes| D["🤖 Enter Agent Loop"]
|
||||
end
|
||||
|
||||
subgraph "Agent Execution Loop"
|
||||
D --> E["🌐 Send to LLM Provider<br/>With available tools"]
|
||||
E --> F{"🔧 Response has<br/>tool_calls?"}
|
||||
F -->|No| G["✅ Return Final Response<br/>No more tools needed"]
|
||||
F -->|Yes| H["📋 Classify Tool Calls"]
|
||||
|
||||
H --> I{"🔐 Separate by<br/>auto-execute status"}
|
||||
I --> J["⚡ Auto-Executable Tools"]
|
||||
I --> K["🛡️ Non-Auto-Executable Tools"]
|
||||
|
||||
J --> L["🔄 Execute in Parallel<br/>Via ToolsManager"]
|
||||
L --> M["📊 Collect Results"]
|
||||
|
||||
K --> N{"Any non-auto<br/>tools found?"}
|
||||
N -->|Yes| O["🛑 Exit Loop Early<br/>Return mixed response"]
|
||||
N -->|No| P{"⏱️ Max depth<br/>reached?"}
|
||||
|
||||
M --> P
|
||||
P -->|Yes| Q["⚠️ Return Current State<br/>May have pending tools"]
|
||||
P -->|No| R["📝 Add results to history"]
|
||||
R --> E
|
||||
end
|
||||
|
||||
subgraph "Response Handling"
|
||||
O --> S["📦 Create Mixed Response<br/>• Content: executed results JSON<br/>• tool_calls: pending tools<br/>• finish_reason: stop"]
|
||||
G --> T["📦 Standard Response<br/>Final answer from LLM"]
|
||||
Q --> U["📦 Depth Limit Response<br/>Current state with any pending"]
|
||||
end
|
||||
|
||||
style D fill:#e3f2fd
|
||||
style L fill:#e8f5e9
|
||||
style O fill:#fff3e0
|
||||
style S fill:#fce4ec
|
||||
```
|
||||
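The loop in the diagram can be sketched in Go with the LLM and the tool executor replaced by plain functions. Everything here is an assumption made for illustration: the `LLM`, `Executor`, `Response`, and `runAgent` names are hypothetical, the auto-execute set is a simple map, and the real agent loop may order or record its steps differently.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

type ToolCall struct{ ID, Name string }
type Message struct{ Role, Content string }
type Response struct {
	Content   string
	ToolCalls []ToolCall
}

// LLM and Executor are stand-ins for the provider call and MCP tool execution.
type LLM func(history []Message) Response
type Executor func(call ToolCall) string

// runAgent loops until the LLM stops requesting tools, a non-auto-executable
// tool is requested, or maxDepth iterations have run.
func runAgent(llm LLM, exec Executor, auto map[string]bool, history []Message, maxDepth int) Response {
	var resp Response
	for depth := 0; depth < maxDepth; depth++ {
		resp = llm(history)
		if len(resp.ToolCalls) == 0 {
			return resp // final answer, no more tools needed
		}

		// Separate tool calls by auto-execute status.
		var autoCalls, manual []ToolCall
		for _, c := range resp.ToolCalls {
			if auto[c.Name] {
				autoCalls = append(autoCalls, c)
			} else {
				manual = append(manual, c)
			}
		}

		// Execute auto-executable tools in parallel; each goroutine writes its own slot.
		results := make([]string, len(autoCalls))
		var wg sync.WaitGroup
		for i, c := range autoCalls {
			wg.Add(1)
			go func(i int, c ToolCall) {
				defer wg.Done()
				results[i] = exec(c)
			}(i, c)
		}
		wg.Wait()

		if len(manual) > 0 {
			// Exit early with a mixed response: executed results plus pending calls.
			return Response{Content: strings.Join(results, "\n"), ToolCalls: manual}
		}

		history = append(history, Message{Role: "assistant", Content: resp.Content})
		for i, c := range autoCalls {
			history = append(history, Message{Role: "tool", Content: c.Name + ": " + results[i]})
		}
	}
	return resp // depth limit reached; the last response may still carry tool calls
}

func main() {
	turn := 0
	llm := func(history []Message) Response {
		turn++
		if turn == 1 {
			return Response{ToolCalls: []ToolCall{{ID: "call_1", Name: "read_file"}}}
		}
		return Response{Content: fmt.Sprintf("done, history has %d messages", len(history))}
	}
	exec := func(c ToolCall) string { return "contents of config.json" }
	final := runAgent(llm, exec, map[string]bool{"read_file": true},
		[]Message{{Role: "user", Content: "read config.json"}}, 10)
	fmt.Println(final.Content)
}
```

Writing results into a pre-sized slice by index avoids a mutex and keeps the results in the same order as the tool calls, regardless of which goroutine finishes first.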
|
||||
### **Tool Classification System**
|
||||
|
||||
When the LLM returns tool calls, Bifrost classifies each tool based on the client configuration:
|
||||
|
||||
```mermaid
|
||||
flowchart LR
|
||||
subgraph "Tool Call Classification"
|
||||
TC["🔧 Tool Call<br/>from LLM Response"] --> CHECK{"Tool in<br/>tools_to_execute?"}
|
||||
CHECK -->|No| SKIP["❌ Skip<br/>Not allowed"]
|
||||
CHECK -->|Yes| AUTO{"Tool in<br/>tools_to_auto_execute?"}
|
||||
AUTO -->|Yes| EXEC["⚡ Auto-Execute<br/>Run immediately"]
|
||||
AUTO -->|No| MANUAL["🛡️ Manual<br/>Return to caller"]
|
||||
end
|
||||
|
||||
subgraph "Configuration Example"
|
||||
CONFIG["MCPClientConfig"]
|
||||
CONFIG --> TE["tools_to_execute: [*]<br/>All tools available"]
|
||||
CONFIG --> TAE["tools_to_auto_execute:<br/>[read_file, list_dir]"]
|
||||
end
|
||||
|
||||
style EXEC fill:#c8e6c9
|
||||
style MANUAL fill:#fff9c4
|
||||
style SKIP fill:#ffcdd2
|
||||
```
|
||||
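The two-step check in the classification diagram reduces to a short function. In this sketch, `Decision`, `listed`, and `classify` are hypothetical names; treating `*` as a wildcard in both lists is an assumption, since the text above only shows it for `tools_to_execute`.

```go
package main

import "fmt"

// Decision is the outcome of classifying one tool call. Illustrative type.
type Decision string

const (
	Skip        Decision = "skip"
	AutoExecute Decision = "auto-execute"
	Manual      Decision = "manual"
)

// listed reports whether a tool appears in a list, treating "*" as "all tools".
// Wildcard support in both lists is an assumption made for this sketch.
func listed(list []string, tool string) bool {
	for _, entry := range list {
		if entry == "*" || entry == tool {
			return true
		}
	}
	return false
}

// classify applies the checks in the same order as the diagram above.
func classify(tool string, toExecute, toAutoExecute []string) Decision {
	if !listed(toExecute, tool) {
		return Skip // not allowed at all
	}
	if listed(toAutoExecute, tool) {
		return AutoExecute // run immediately inside the agent loop
	}
	return Manual // return to the caller for approval
}

func main() {
	toExecute := []string{"*"}
	toAuto := []string{"read_file", "list_dir"}
	for _, tool := range []string{"read_file", "write_file"} {
		fmt.Printf("%s -> %s\n", tool, classify(tool, toExecute, toAuto))
	}
	fmt.Printf("%s -> %s\n", "delete_file", classify("delete_file", []string{"read_file"}, toAuto))
}
```

Checking the allow list first means a tool named only in the auto-execute list is still skipped, so auto-execution can never widen what is permitted.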
|
||||
### **Mixed Tool Response Format**
|
||||
|
||||
When a response contains both auto-executable and non-auto-executable tools, the agent creates a special response format:
|
||||
|
||||
<AccordionGroup>
|
||||
<Accordion title="Chat API Response Format" icon="message" defaultOpen>
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "chatcmpl-abc123",
|
||||
"choices": [{
|
||||
"index": 0,
|
||||
"finish_reason": "stop",
|
||||
"message": {
|
||||
"role": "assistant",
|
||||
"content": "The Output from allowed tools calls is - {\"filesystem_read_file\":\"file contents here\",\"filesystem_list_directory\":\"[\\\"file1.txt\\\",\\\"file2.txt\\\"]\"}\n\nNow I shall call these tools next...",
|
||||
"tool_calls": [
|
||||
{
|
||||
"id": "call_write_123",
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "filesystem_write_file",
|
||||
"arguments": "{\"path\":\"output.txt\",\"content\":\"...\"}"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}]
|
||||
}
|
||||
```
|
||||
|
||||
<Note>
|
||||
The `content` field contains JSON-formatted results from auto-executed tools. The `tool_calls` array contains only non-auto-executable tools awaiting approval. Setting `finish_reason` to `"stop"` ensures the agent loop exits.
|
||||
</Note>
|
||||
|
||||
</Accordion>
|
||||
|
||||
<Accordion title="Responses API Format" icon="code">
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "resp-abc123",
|
||||
"output": [
|
||||
{
|
||||
"type": "message",
|
||||
"role": "assistant",
|
||||
"content": [{
|
||||
"type": "text",
|
||||
"text": "The Output from allowed tools calls is - {...}\n\nNow I shall call these tools next..."
|
||||
}]
|
||||
},
|
||||
{
|
||||
"type": "function_call",
|
||||
"role": "assistant",
|
||||
"call_id": "call_write_123",
|
||||
"name": "filesystem_write_file",
|
||||
"arguments": "{\"path\":\"output.txt\",\"content\":\"...\"}"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
</Accordion>
|
||||
</AccordionGroup>
|
||||
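On the caller's side, handling a mixed response mostly means extracting the pending tool calls from the Chat API format shown above. The following sketch uses only `encoding/json`; the struct covers just the fields that appear in the example, and `pendingToolCalls` and `PendingCall` are hypothetical names rather than part of any SDK.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatResponse models only the fields of the Chat API example that this sketch reads.
type chatResponse struct {
	Choices []struct {
		FinishReason string `json:"finish_reason"`
		Message      struct {
			Content   string `json:"content"`
			ToolCalls []struct {
				ID       string `json:"id"`
				Function struct {
					Name      string `json:"name"`
					Arguments string `json:"arguments"`
				} `json:"function"`
			} `json:"tool_calls"`
		} `json:"message"`
	} `json:"choices"`
}

// PendingCall is a tool call that was returned for manual approval.
type PendingCall struct{ ID, Name, Arguments string }

// pendingToolCalls returns the tool calls that still await approval.
func pendingToolCalls(raw []byte) ([]PendingCall, error) {
	var resp chatResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, fmt.Errorf("decode chat response: %w", err)
	}
	var pending []PendingCall
	for _, choice := range resp.Choices {
		for _, tc := range choice.Message.ToolCalls {
			pending = append(pending, PendingCall{
				ID:        tc.ID,
				Name:      tc.Function.Name,
				Arguments: tc.Function.Arguments,
			})
		}
	}
	return pending, nil
}

// sample is a shortened version of the mixed response shown above.
const sample = `{
  "id": "chatcmpl-abc123",
  "choices": [{
    "index": 0,
    "finish_reason": "stop",
    "message": {
      "role": "assistant",
      "content": "The Output from allowed tools calls is - {...}",
      "tool_calls": [{
        "id": "call_write_123",
        "type": "function",
        "function": {"name": "filesystem_write_file", "arguments": "{\"path\":\"output.txt\"}"}
      }]
    }
  }]
}`

func main() {
	pending, err := pendingToolCalls([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range pending {
		fmt.Printf("awaiting approval: %s (%s) args=%s\n", p.Name, p.ID, p.Arguments)
	}
}
```

Because `finish_reason` is `"stop"` even when tool calls are pending, the caller should inspect `tool_calls` directly rather than rely on the finish reason.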
|
||||
### **Agent Depth Control**
|
||||
|
||||
The `max_agent_depth` setting prevents infinite loops and controls resource usage:
|
||||
|
||||
```mermaid
|
||||
graph LR
|
||||
subgraph "Depth Tracking"
|
||||
D0["Depth 0<br/>Initial Request"] --> D1["Depth 1<br/>First tool execution"]
|
||||
D1 --> D2["Depth 2<br/>Second iteration"]
|
||||
D2 --> D3["Depth 3<br/>..."]
|
||||
D3 --> DN["Depth N<br/>Max reached"]
|
||||
end
|
||||
|
||||
DN --> EXIT["🛑 Force Exit<br/>Return current state"]
|
||||
|
||||
subgraph "Configuration"
|
||||
CFG["MCPToolManagerConfig"]
|
||||
CFG --> MAX["max_agent_depth: 10<br/>(default)"]
|
||||
CFG --> TIMEOUT["tool_execution_timeout:<br/>30s per tool"]
|
||||
end
|
||||
```
|
||||
|
||||
<Warning>
|
||||
When max depth is reached, the response may contain pending tool calls that weren't executed. Your application should check for remaining `tool_calls` and either execute them manually or report the incomplete state to the user.
|
||||
</Warning>
|
||||
|
||||
---
|
||||
|
||||
## Code Mode Architecture
|
||||
|
||||
Code Mode enables AI models to write and execute Starlark code (a Python-like language) that orchestrates multiple MCP tools in a single request. This adds a meta-layer for complex multi-tool workflows.
|
||||
|
||||
### **Code Mode System Overview**
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph "Code Mode Components"
|
||||
VM["🖥️ Starlark Interpreter<br/>Python-like Runtime"]
|
||||
VFS["📁 Virtual File System<br/>Tool Definitions as .pyi"]
|
||||
EXEC["⚙️ Code Executor<br/>Sandboxed Execution"]
|
||||
end
|
||||
|
||||
subgraph "Meta Tools"
|
||||
LIST["listToolFiles()<br/>Discover available servers"]
|
||||
READ["readToolFile(fileName)<br/>Get tool signatures"]
|
||||
DOCS["getToolDocs(server, tool)<br/>Get detailed docs"]
|
||||
CODE["executeToolCode(code)<br/>Run Starlark code"]
|
||||
end
|
||||
|
||||
subgraph "MCP Integration"
|
||||
TOOLS["🔧 Connected MCP Tools"]
|
||||
RESULTS["📊 Tool Results"]
|
||||
end
|
||||
|
||||
LLM["🤖 LLM"] --> LIST
|
||||
LIST --> VFS
|
||||
VFS --> LLM
|
||||
LLM --> READ
|
||||
READ --> VFS
|
||||
VFS --> LLM
|
||||
LLM --> DOCS
|
||||
DOCS --> VFS
|
||||
VFS --> LLM
|
||||
LLM --> CODE
|
||||
CODE --> VM
|
||||
VM --> EXEC
|
||||
EXEC --> TOOLS
|
||||
TOOLS --> RESULTS
|
||||
RESULTS --> LLM
|
||||
|
||||
style VM fill:#e8eaf6
|
||||
style VFS fill:#e3f2fd
|
||||
style CODE fill:#e8f5e9
|
||||
```
|
||||
|
||||
### **Virtual File System (VFS)**
|
||||
|
||||
Code Mode generates Python stub files (`.pyi`) for all connected MCP tools, providing compact function signatures:
|
||||
|
||||
<Tabs>
|
||||
<Tab title="Server-Level Binding">
|
||||
|
||||
When `code_mode_binding_level: "server"` (default), tools are grouped by MCP client:
|
||||
|
||||
```
|
||||
servers/
|
||||
├── filesystem.pyi → All filesystem tools
|
||||
├── web_search.pyi → All web search tools
|
||||
└── database.pyi → All database tools
|
||||
```
|
||||
|
||||
**Generated Stub Example:**
|
||||
```python
|
||||
# servers/filesystem.pyi
|
||||
# Usage: filesystem.tool_name(param=value)
|
||||
# For detailed docs: use getToolDocs(server="filesystem", tool="tool_name")
|
||||
|
||||
def read_file(path: str) -> dict: # Read contents of a file
|
||||
def write_file(path: str, content: str) -> dict: # Write content to a file
|
||||
def list_directory(path: str) -> dict: # List directory contents
|
||||
```
|
||||
|
||||
**Usage in Code:**
|
||||
```python
|
||||
files = filesystem.list_directory(path=".")
|
||||
content = filesystem.read_file(path=files["entries"][0])
|
||||
result = content
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="Tool-Level Binding">
|
||||
|
||||
When `code_mode_binding_level: "tool"`, each tool gets its own file:
|
||||
|
||||
```
|
||||
servers/
|
||||
├── filesystem/
|
||||
│ ├── read_file.pyi
|
||||
│ ├── write_file.pyi
|
||||
│ └── list_directory.pyi
|
||||
├── web_search/
|
||||
│ └── search.pyi
|
||||
└── database/
|
||||
└── query.pyi
|
||||
```
|
||||
|
||||
**Generated Stub Example:**
|
||||
```python
|
||||
# servers/filesystem/read_file.pyi
|
||||
# Usage: filesystem.read_file(param=value)
|
||||
|
||||
def read_file(path: str) -> dict: # Read contents of a file
|
||||
```
|
||||
|
||||
**Usage in Code:**
|
||||
```python
|
||||
content = filesystem.read_file(path="config.json")
|
||||
result = content
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
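Generating a server-level stub file like the one shown above is essentially string formatting over the tool definitions. The sketch below reproduces the layout of the example stub; `ToolDef`, `Param`, and `stubFile` are hypothetical names, and the real generator may handle types, defaults, and escaping differently.

```go
package main

import (
	"fmt"
	"strings"
)

// Param and ToolDef are illustrative, simplified tool definitions.
type Param struct{ Name, Type string }

type ToolDef struct {
	Name        string
	Description string
	Params      []Param
}

// stubFile renders one server-level .pyi stub in the layout shown above:
// a header comment, then one compact signature line per tool.
func stubFile(server string, tools []ToolDef) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# servers/%s.pyi\n", server)
	fmt.Fprintf(&b, "# Usage: %s.tool_name(param=value)\n", server)
	fmt.Fprintf(&b, "# For detailed docs: use getToolDocs(server=%q, tool=\"tool_name\")\n\n", server)
	for _, t := range tools {
		params := make([]string, len(t.Params))
		for i, p := range t.Params {
			params[i] = p.Name + ": " + p.Type
		}
		fmt.Fprintf(&b, "def %s(%s) -> dict:  # %s\n", t.Name, strings.Join(params, ", "), t.Description)
	}
	return b.String()
}

func main() {
	tools := []ToolDef{
		{Name: "read_file", Description: "Read contents of a file", Params: []Param{{"path", "str"}}},
		{Name: "write_file", Description: "Write content to a file", Params: []Param{{"path", "str"}, {"content", "str"}}},
	}
	fmt.Print(stubFile("filesystem", tools))
}
```

Keeping each tool to a single signature line is what makes the stubs compact enough to hand to the model, with `getToolDocs` available when more detail is needed.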
|
||||
### **Code Execution Flow**
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant LLM as 🤖 LLM
|
||||
participant CM as 📝 Code Mode Handler
|
||||
participant VM as 🖥️ Starlark Interpreter
|
||||
participant TM as 🔧 Tools Manager
|
||||
participant MCP as 🌐 MCP Servers
|
||||
|
||||
LLM->>CM: executeToolCode({ code: "..." })
|
||||
CM->>VM: Initialize sandbox
|
||||
CM->>VM: Inject tool bindings
|
||||
CM->>VM: Execute Starlark code
|
||||
|
||||
loop For each tool call in code
|
||||
VM->>TM: server.tool(param=value)
|
||||
TM->>MCP: Execute tool
|
||||
MCP-->>TM: Tool result
|
||||
TM-->>VM: Return result
|
||||
end
|
||||
|
||||
VM-->>CM: Execution result
|
||||
CM-->>LLM: { result, logs }
|
||||
```
|
||||
|
||||
### **Starlark Sandbox**
|
||||
|
||||
The code execution environment is carefully sandboxed using Starlark, a Python-like language designed for configuration and embedded scripting:
|
||||
|
||||
<AccordionGroup>
|
||||
<Accordion title="Available Features" icon="check" defaultOpen>
|
||||
|
||||
- ✅ **Python-like syntax** - Familiar Python-style syntax (Starlark is a restricted dialect, not full Python)
|
||||
- ✅ **Synchronous calls** - No async/await needed, direct function calls
|
||||
- ✅ **List comprehensions** - `[x for x in items if condition]`
|
||||
- ✅ **print()** - Output captured and returned in logs
|
||||
- ✅ **Dict/List operations** - Standard Python data structures
|
||||
- ✅ **Tool bindings** - All connected MCP tools as globals
|
||||
</Accordion>
|
||||
|
||||
<Accordion title="Restricted Features" icon="ban">
|
||||
|
||||
- ❌ **Imports** - No `import` statements (tools are pre-bound)
|
||||
- ❌ **Classes** - Use dicts and functions instead
|
||||
- ❌ **File I/O** - No direct filesystem access (use MCP tools)
|
||||
- ❌ **Network** - No direct network access (use MCP tools)
|
||||
- ❌ **Randomness/Time** - Deterministic execution only
|
||||
|
||||
</Accordion>
|
||||
</AccordionGroup>
|
||||
|
||||
### **Code Mode Security Model**
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph "Security Layers"
|
||||
L1["🔒 Code Validation<br/>Syntax checking before execution"]
|
||||
L2["🛡️ Sandboxed Runtime<br/>No external module access"]
|
||||
L3["⏱️ Execution Timeout<br/>Bounded runtime"]
|
||||
L4["🔐 Tool ACL<br/>Only allowed tools accessible"]
|
||||
end
|
||||
|
||||
subgraph "Execution Boundaries"
|
||||
B1["No filesystem access<br/>(except via MCP tools)"]
|
||||
B2["No network access<br/>(except via MCP tools)"]
|
||||
B3["No process spawning"]
|
||||
B4["Memory isolation enforced"]
|
||||
end
|
||||
|
||||
L1 --> L2 --> L3 --> L4
|
||||
L4 --> B1
|
||||
L4 --> B2
|
||||
L4 --> B3
|
||||
L4 --> B4
|
||||
```
|
||||
|
||||
### **Code Mode Configuration**
|
||||
|
||||
<Tabs>
|
||||
<Tab title="Gateway (config.json)">
|
||||
|
||||
```json
|
||||
{
|
||||
"mcp": {
|
||||
"client_configs": [
|
||||
{
|
||||
"name": "filesystem",
|
||||
"is_code_mode_client": true,
|
||||
"connection_type": "stdio",
|
||||
"stdio_config": {
|
||||
"command": "npx",
|
||||
"args": ["-y", "@anthropic/mcp-filesystem"]
|
||||
},
|
||||
"tools_to_execute": ["*"]
|
||||
}
|
||||
],
|
||||
"tool_manager_config": {
|
||||
"code_mode_binding_level": "server",
|
||||
"tool_execution_timeout": "30s"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
<Tab title="Go SDK">
|
||||
|
||||
```go
|
||||
mcpConfig := &schemas.MCPConfig{
|
||||
ClientConfigs: []schemas.MCPClientConfig{
|
||||
{
|
||||
Name: "filesystem",
|
||||
IsCodeModeClient: true,
|
||||
ConnectionType: schemas.MCPConnectionTypeSTDIO,
|
||||
StdioConfig: &schemas.MCPStdioConfig{
|
||||
Command: "npx",
|
||||
Args: []string{"-y", "@anthropic/mcp-filesystem"},
|
||||
},
|
||||
ToolsToExecute: []string{"*"},
|
||||
},
|
||||
},
|
||||
ToolManagerConfig: &schemas.MCPToolManagerConfig{
|
||||
CodeModeBindingLevel: schemas.CodeModeBindingLevelServer,
|
||||
ToolExecutionTimeout: 30 * time.Second,
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
</Tab>
|
||||
</Tabs>
|
||||
|
||||
### **Code Mode vs Agent Mode**
|
||||
|
||||
| Aspect | Agent Mode | Code Mode |
|
||||
|--------|------------|-----------|
|
||||
| **Execution Model** | LLM chooses tool calls turn by turn | LLM writes code orchestrating multiple tools |
|
||||
| **Iterations** | Multiple LLM round-trips | Single LLM call, code handles orchestration |
|
||||
| **Complexity** | Simple tool chains | Complex workflows with conditionals/loops |
|
||||
| **Latency** | Higher (multiple LLM calls) | Lower (single LLM call + code execution) |
|
||||
| **Control** | Per-tool approval possible | Code runs atomically |
|
||||
| **Best For** | Interactive agents | Batch operations, complex data processing |
|
||||
|
||||
---
|
||||
|
||||
## MCP Integration Patterns
|
||||
|
||||
### **Common Integration Scenarios**
|
||||
|
||||
**1. Filesystem Operations**
|
||||
|
||||
- **Tools:** `list_files`, `read_file`, `write_file`, `create_directory`
|
||||
- **Use Cases:** Code analysis, document processing, file management
|
||||
- **Security:** Sandboxed file access, path validation, permission checks
|
||||
- **Performance:** Local execution for fast file operations
|
||||
|
||||
**2. Web Search & Information Retrieval**
|
||||
|
||||
- **Tools:** `web_search`, `fetch_url`, `extract_content`, `summarize`
|
||||
- **Use Cases:** Research assistance, fact-checking, content gathering
|
||||
- **Integration:** External search APIs, content parsing services
|
||||
- **Caching:** Response caching for repeated queries
|
||||
|
||||
**3. Database Operations**
|
||||
|
||||
- **Tools:** `query_database`, `insert_record`, `update_record`, `schema_info`
|
||||
- **Use Cases:** Data analysis, report generation, database administration
|
||||
- **Security:** Read-only access by default, query validation, injection prevention
|
||||
- **Performance:** Connection pooling, query optimization
|
||||
|
||||
**4. API Integrations**
|
||||
|
||||
- **Tools:** Custom business logic tools, third-party service integration
|
||||
- **Use Cases:** CRM operations, payment processing, notification sending
|
||||
- **Authentication:** API key management, OAuth token handling
|
||||
- **Error Handling:** Retry logic, fallback mechanisms
|
||||
|
||||
### **MCP Server Development Patterns**
|
||||
|
||||
**Simple STDIO Server:**
|
||||
|
||||
- **Language:** Any language that can read/write JSON to stdin/stdout
|
||||
- **Deployment:** Single executable, minimal dependencies
|
||||
- **Use Case:** Local tools, development utilities, simple scripts
|
||||
|
||||
**HTTP Service Server:**
|
||||
|
||||
- **Architecture:** RESTful API with MCP protocol endpoints
|
||||
- **Scalability:** Horizontal scaling, load balancing
|
||||
- **Use Case:** Shared tools, enterprise integrations, cloud services
|
||||
|
||||
**Hybrid Approach:**
|
||||
|
||||
- **Local + Remote:** Combine STDIO tools for local operations with HTTP for remote services
|
||||
- **Failover:** Use local fallbacks when remote services are unavailable
|
||||
- **Optimization:** Route tool calls to most appropriate execution environment
|
||||
|
||||
> **📖 MCP Development:** [Tool Development Guide →](../../mcp/overview)
|
||||
|
||||
---
|
||||
|
||||
## Security & Safety Considerations
|
||||
|
||||
### **MCP Security Architecture**
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph "Security Layers"
|
||||
L1[Connection Security<br/>Authentication & Encryption]
|
||||
L2[Tool Validation<br/>Schema & Permission Checks]
|
||||
L3[Execution Security<br/>Sandboxing & Limits]
|
||||
L4[Result Security<br/>Output Validation & Filtering]
|
||||
end
|
||||
|
||||
subgraph "Threat Mitigation"
|
||||
T1[Malicious Tools<br/>Code Injection Prevention]
|
||||
T2[Resource Abuse<br/>Rate Limiting & Quotas]
|
||||
T3[Data Exposure<br/>Output Sanitization]
|
||||
T4[System Access<br/>Privilege Isolation]
|
||||
end
|
||||
|
||||
L1 --> T1
|
||||
L2 --> T2
|
||||
L3 --> T4
|
||||
L4 --> T3
|
||||
```
|
||||
|
||||
**Security Measures:**
|
||||
|
||||
**Connection Security:**
|
||||
|
||||
- **Authentication** - API keys, certificates, or token-based auth for HTTP/SSE
|
||||
- **Encryption** - TLS for HTTP connections, secure pipes for STDIO
|
||||
- **Network Isolation** - Firewall rules and network segmentation
|
||||
|
||||
**Execution Security:**
|
||||
|
||||
- **Sandboxing** - Isolated execution environments for tools
|
||||
- **Resource Limits** - CPU, memory, and time constraints
|
||||
- **Permission Model** - Principle of least privilege for tool access
|
||||
|
||||
**Operational Security:**
|
||||
|
||||
- **Regular Updates** - Keep MCP servers and tools updated
|
||||
- **Monitoring** - Continuous security monitoring and alerting
|
||||
- **Incident Response** - Procedures for security incidents involving tools
|
||||
|
||||
---
|
||||
|
||||
## Related Architecture Documentation
|
||||
|
||||
- **[Request Flow](./request-flow)** - MCP integration in request processing
|
||||
- **[Concurrency Model](./concurrency)** - MCP concurrency and worker integration
|
||||
- **[Plugin System](./plugins)** - Integration between MCP and plugin systems
|
||||
- **[Benchmarks](../../benchmarking/getting-started)** - MCP performance impact and optimization
|
||||
|
||||
|
||||
|
||||
552
docs/architecture/core/plugins.mdx
Normal file
@@ -0,0 +1,552 @@
|
||||
---
|
||||
title: "Plugins"
|
||||
description: "Deep dive into Bifrost's extensible plugin architecture - how plugins work internally, lifecycle management, execution model, and integration patterns."
|
||||
icon: "puzzle-piece"
|
||||
---
|
||||
|
||||
## Plugin Architecture Philosophy
|
||||
|
||||
### **Core Design Principles**
|
||||
|
||||
Bifrost's plugin system is built around five key principles that ensure extensibility without compromising performance or reliability:
|
||||
|
||||
| Principle                  | Implementation                                   | Benefit                                          |
| -------------------------- | ------------------------------------------------ | ------------------------------------------------ |
| **Plugin-First Design**    | Core logic designed around plugin hook points    | Maximum extensibility without core modifications |
| **Zero-Copy Integration**  | Direct memory access to request/response objects | Minimal performance overhead                     |
| **Lifecycle Management**   | Complete plugin lifecycle with automatic cleanup | Resource safety and leak prevention              |
| **Interface-Based Safety** | Well-defined interfaces for type safety          | Compile-time validation and consistency          |
| **Failure Isolation**      | Plugin errors don't crash the core system        | Fault tolerance and system stability             |
|
||||
|
||||
### **Plugin System Overview**
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph "Plugin Management Layer"
|
||||
PluginMgr[Plugin Manager<br/>Central Controller]
|
||||
Registry[Plugin Registry<br/>Discovery & Loading]
|
||||
Lifecycle[Lifecycle Manager<br/>State Management]
|
||||
end
|
||||
|
||||
subgraph "Plugin Execution Layer"
|
||||
Pipeline[Plugin Pipeline<br/>Execution Orchestrator]
|
||||
PreHooks[Pre-Processing Hooks<br/>Request Modification]
|
||||
PostHooks[Post-Processing Hooks<br/>Response Enhancement]
|
||||
end
|
||||
|
||||
subgraph "Plugin Categories"
|
||||
Auth[Authentication<br/>& Authorization]
|
||||
RateLimit[Rate Limiting<br/>& Throttling]
|
||||
Transform[Data Transformation<br/>& Validation]
|
||||
Monitor[Monitoring<br/>& Analytics]
|
||||
Custom[Custom Business<br/>Logic]
|
||||
end
|
||||
|
||||
PluginMgr --> Registry
|
||||
Registry --> Lifecycle
|
||||
Lifecycle --> Pipeline
|
||||
|
||||
Pipeline --> PreHooks
|
||||
Pipeline --> PostHooks
|
||||
|
||||
PreHooks --> Auth
|
||||
PreHooks --> RateLimit
|
||||
PostHooks --> Transform
|
||||
PostHooks --> Monitor
|
||||
PostHooks --> Custom
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Plugin Lifecycle Management
|
||||
|
||||
### **Complete Lifecycle States**
|
||||
|
||||
Every plugin goes through a well-defined lifecycle that ensures proper resource management and error handling:
|
||||
|
||||
```mermaid
|
||||
stateDiagram-v2
|
||||
[*] --> PluginInit: Plugin Creation
|
||||
PluginInit --> Registered: Add to BifrostConfig
|
||||
Registered --> PreHookCall: Request Received
|
||||
|
||||
PreHookCall --> ModifyRequest: Normal Flow
|
||||
PreHookCall --> ShortCircuitResponse: Return Response
|
||||
PreHookCall --> ShortCircuitError: Return Error
|
||||
|
||||
ModifyRequest --> ProviderCall: Send to Provider
|
||||
ProviderCall --> PostHookCall: Receive Response
|
||||
|
||||
ShortCircuitResponse --> PostHookCall: Skip Provider
|
||||
ShortCircuitError --> PostHookCall: Pipeline Symmetry
|
||||
|
||||
PostHookCall --> ModifyResponse: Process Result
|
||||
PostHookCall --> RecoverError: Error Recovery
|
||||
PostHookCall --> FallbackCheck: Check AllowFallbacks
|
||||
PostHookCall --> ResponseReady: Pass Through
|
||||
|
||||
FallbackCheck --> TryFallback: AllowFallbacks=true/nil
|
||||
FallbackCheck --> ResponseReady: AllowFallbacks=false
|
||||
TryFallback --> PreHookCall: Next Provider
|
||||
|
||||
ModifyResponse --> ResponseReady: Modified
|
||||
RecoverError --> ResponseReady: Recovered
|
||||
ResponseReady --> [*]: Return to Client
|
||||
|
||||
Registered --> CleanupCall: Bifrost Shutdown
|
||||
CleanupCall --> [*]: Plugin Destroyed
|
||||
```
|
||||
|
||||
### **Lifecycle Phase Details**
|
||||
|
||||
**Discovery Phase:**
|
||||
|
||||
- **Purpose:** Find and catalog available plugins
|
||||
- **Sources:** Command line, environment variables, JSON configuration, directory scanning
|
||||
- **Validation:** Basic existence and format checks
|
||||
- **Output:** Plugin descriptors with metadata
|
||||
|
||||
**Loading Phase:**
|
||||
|
||||
- **Purpose:** Load plugin binaries into memory
|
||||
- **Security:** Digital signature verification and checksum validation
|
||||
- **Compatibility:** Interface implementation validation
|
||||
- **Resource:** Memory and capability assessment
|
||||
|
||||
**Initialization Phase:**
|
||||
|
||||
- **Purpose:** Configure plugin with runtime settings
|
||||
- **Timeout:** Bounded initialization time to prevent hanging
|
||||
- **Dependencies:** External service connectivity verification
|
||||
- **State:** Internal state setup and resource allocation
|
||||
|
||||
**Runtime Phase:**
|
||||
|
||||
- **Purpose:** Active request processing
|
||||
- **Monitoring:** Continuous health checking and performance tracking
|
||||
- **Recovery:** Automatic error recovery and degraded mode handling
|
||||
- **Metrics:** Real-time performance and health metrics collection
|
||||
|
||||
> **Plugin Lifecycle:** [Plugin Management →](../../enterprise/custom-plugins)
|
||||
|
||||
---
|
||||
|
||||
## Plugin Execution Pipeline
|
||||
|
||||
### **Request Processing Flow**
|
||||
|
||||
The plugin pipeline ensures consistent, predictable execution while maintaining high performance:
|
||||
|
||||
#### **Normal Execution Flow (No Short-Circuit)**
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant Client
|
||||
participant Bifrost
|
||||
participant Plugin1
|
||||
participant Plugin2
|
||||
participant Provider
|
||||
|
||||
Client->>Bifrost: Request
|
||||
Bifrost->>Plugin1: PreLLMHook(request)
|
||||
Plugin1-->>Bifrost: modified request
|
||||
Bifrost->>Plugin2: PreLLMHook(request)
|
||||
Plugin2-->>Bifrost: modified request
|
||||
Bifrost->>Provider: API Call
|
||||
Provider-->>Bifrost: response
|
||||
Bifrost->>Plugin2: PostLLMHook(response)
|
||||
Plugin2-->>Bifrost: modified response
|
||||
Bifrost->>Plugin1: PostLLMHook(response)
|
||||
Plugin1-->>Bifrost: modified response
|
||||
Bifrost-->>Client: Final Response
|
||||
```
|
||||
|
||||
**Execution Order:**
|
||||
|
||||
1. **PreHooks:** Execute in registration order (1 → 2 → N)
|
||||
2. **Provider Call:** If no short-circuit occurred
|
||||
3. **PostHooks:** Execute in reverse order (N → 2 → 1)
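
The ordering above can be sketched with a small trace function. The `Plugin` type here is a hypothetical stand-in that only carries a name; it is not Bifrost's plugin interface.

```go
package main

import "fmt"

// Plugin is a hypothetical, simplified stand-in for a registered plugin.
type Plugin struct{ Name string }

// runPipeline records the order in which hooks would fire: pre-hooks in
// registration order, the provider call, then post-hooks in reverse order.
func runPipeline(plugins []Plugin) []string {
	var trace []string
	for _, p := range plugins {
		trace = append(trace, "pre:"+p.Name)
	}
	trace = append(trace, "provider")
	for i := len(plugins) - 1; i >= 0; i-- {
		trace = append(trace, "post:"+plugins[i].Name)
	}
	return trace
}

func main() {
	fmt.Println(runPipeline([]Plugin{{Name: "auth"}, {Name: "cache"}}))
}
```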
|
||||
|
||||
#### **Short-Circuit Response Flow (Cache Hit)**
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant Client
|
||||
participant Bifrost
|
||||
participant Cache
|
||||
participant Auth
|
||||
participant Provider
|
||||
|
||||
Client->>Bifrost: Request
|
||||
Bifrost->>Auth: PreLLMHook(request)
|
||||
Auth-->>Bifrost: modified request
|
||||
Bifrost->>Cache: PreLLMHook(request)
|
||||
Cache-->>Bifrost: LLMPluginShortCircuit{Response}
|
||||
Note over Provider: Provider call skipped
|
||||
Bifrost->>Cache: PostLLMHook(response)
|
||||
Cache-->>Bifrost: modified response
|
||||
Bifrost->>Auth: PostLLMHook(response)
|
||||
Auth-->>Bifrost: modified response
|
||||
Bifrost-->>Client: Cached Response
|
||||
```
|
||||
|
||||
#### **Streaming Response Flow**
|
||||
|
||||
For streaming responses, the plugin pipeline executes post-hooks for every delta/chunk received from the provider:
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant Client
|
||||
participant Bifrost
|
||||
participant Plugin1
|
||||
participant Plugin2
|
||||
participant Provider
|
||||
|
||||
Client->>Bifrost: Stream Request
|
||||
Bifrost->>Plugin1: PreLLMHook(request)
|
||||
Plugin1-->>Bifrost: modified request
|
||||
Bifrost->>Plugin2: PreLLMHook(request)
|
||||
Plugin2-->>Bifrost: modified request
|
||||
Bifrost->>Provider: Stream API Call
|
||||
|
||||
loop For Each Delta
|
||||
Provider-->>Bifrost: stream delta
|
||||
Bifrost->>Plugin2: PostLLMHook(delta)
|
||||
Plugin2-->>Bifrost: modified delta
|
||||
Bifrost->>Plugin1: PostLLMHook(delta)
|
||||
Plugin1-->>Bifrost: modified delta
|
||||
Bifrost-->>Client: Send Delta
|
||||
end
|
||||
|
||||
Provider-->>Bifrost: final chunk (finish reason)
|
||||
Bifrost->>Plugin2: PostLLMHook(final)
|
||||
Plugin2-->>Bifrost: modified final
|
||||
Bifrost->>Plugin1: PostLLMHook(final)
|
||||
Plugin1-->>Bifrost: modified final
|
||||
Bifrost-->>Client: Final Chunk
|
||||
```
|
||||
|
||||
**Streaming Execution Characteristics:**
|
||||
|
||||
1. **Delta Processing:**
|
||||
- Each stream delta (chunk) goes through all post-hooks
|
||||
- Plugins can modify/transform each delta before it reaches the client
|
||||
- Deltas can contain: text content, tool calls, role changes, or usage info
|
||||
|
||||
2. **Special Delta Types:**
|
||||
- **Start Event:** Initial delta with role information
|
||||
- **Content Delta:** Regular text or tool call content
|
||||
- **Usage Update:** Token usage statistics (if enabled)
|
||||
- **Final Chunk:** Contains finish reason and any final metadata
|
||||
|
||||
3. **Plugin Considerations:**
|
||||
- Plugins must handle streaming responses efficiently
|
||||
- Each delta should be processed quickly to maintain stream responsiveness
|
||||
- Plugins can track state across deltas using context
|
||||
- Heavy processing should be done asynchronously
|
||||
|
||||
4. **Error Handling:**
|
||||
- If a post-hook returns an error, it's sent as an error stream chunk
|
||||
- Stream is terminated after error chunks
|
||||
- Plugins can recover from errors by providing valid responses
|
||||
|
||||
5. **Performance Optimization:**
|
||||
- Lightweight delta processing to minimize latency
|
||||
- Object pooling for common data structures
|
||||
- Non-blocking operations for logging and metrics
|
||||
- Efficient memory management for stream processing
|
||||
|
||||
> **Streaming Details:** [Streaming Guide →](../../quickstart/gateway/streaming)
|
||||
|
||||
**Short-Circuit Rules:**
|
||||
|
||||
- **Provider Skipped:** When plugin returns short-circuit response/error
|
||||
- **PostLLMHook Guarantee:** All executed PreHooks get corresponding PostLLMHook calls
|
||||
- **Reverse Order:** PostHooks execute in reverse order of PreHooks
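
The short-circuit rules above can be sketched as follows. The `hookPlugin` type and its `ShortCircuit` flag are hypothetical simplifications: a real pre-hook would return a short-circuit value rather than carry a static flag.

```go
package main

import "fmt"

// hookPlugin is a hypothetical plugin whose pre-hook either passes the
// request on or short-circuits the pipeline.
type hookPlugin struct {
	Name         string
	ShortCircuit bool
}

// runWithShortCircuit stops running pre-hooks at the first short-circuit,
// skips the provider, and runs post-hooks only for the plugins whose
// pre-hook actually executed, in reverse order.
func runWithShortCircuit(plugins []hookPlugin) []string {
	var trace []string
	executed := 0
	skipProvider := false

	for _, p := range plugins {
		trace = append(trace, "pre:"+p.Name)
		executed++
		if p.ShortCircuit {
			skipProvider = true
			break
		}
	}
	if !skipProvider {
		trace = append(trace, "provider")
	}
	for i := executed - 1; i >= 0; i-- {
		trace = append(trace, "post:"+plugins[i].Name)
	}
	return trace
}

func main() {
	plugins := []hookPlugin{
		{Name: "auth"},
		{Name: "cache", ShortCircuit: true},
		{Name: "logger"},
	}
	fmt.Println(runWithShortCircuit(plugins))
}
```

In this sketch the `logger` plugin never runs because it is registered after the plugin that short-circuits, which mirrors the cache-hit diagram earlier in this section.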
|
||||
|
||||
#### **Short-Circuit Error Flow (Allow Fallbacks)**
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant Client
|
||||
participant Bifrost
|
||||
participant Plugin1
|
||||
participant Provider1
|
||||
participant Provider2
|
||||
|
||||
Client->>Bifrost: Request (Provider1 + Fallback Provider2)
|
||||
Bifrost->>Plugin1: PreLLMHook(request)
|
||||
Plugin1-->>Bifrost: LLMPluginShortCircuit{Error, AllowFallbacks=true}
|
||||
Note over Provider1: Provider1 call skipped
|
||||
Bifrost->>Plugin1: PostLLMHook(error)
|
||||
Plugin1-->>Bifrost: error unchanged
|
||||
|
||||
Note over Bifrost: Try fallback provider
|
||||
Bifrost->>Plugin1: PreLLMHook(request for Provider2)
|
||||
Plugin1-->>Bifrost: modified request
|
||||
Bifrost->>Provider2: API Call
|
||||
Provider2-->>Bifrost: response
|
||||
Bifrost->>Plugin1: PostLLMHook(response)
|
||||
Plugin1-->>Bifrost: modified response
|
||||
Bifrost-->>Client: Final Response
|
||||
```
|
||||
|
||||
#### **Error Recovery Flow**
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant Client
|
||||
participant Bifrost
|
||||
participant Plugin1
|
||||
participant Plugin2
|
||||
participant Provider
|
||||
participant RecoveryPlugin
|
||||
|
||||
Client->>Bifrost: Request
|
||||
Bifrost->>Plugin1: PreLLMHook(request)
|
||||
Plugin1-->>Bifrost: modified request
|
||||
Bifrost->>Plugin2: PreLLMHook(request)
|
||||
Plugin2-->>Bifrost: modified request
|
||||
Bifrost->>RecoveryPlugin: PreLLMHook(request)
|
||||
RecoveryPlugin-->>Bifrost: modified request
|
||||
Bifrost->>Provider: API Call
|
||||
Provider-->>Bifrost: error
|
||||
Bifrost->>RecoveryPlugin: PostLLMHook(error)
|
||||
RecoveryPlugin-->>Bifrost: recovered response
|
||||
Bifrost->>Plugin2: PostLLMHook(response)
|
||||
Plugin2-->>Bifrost: modified response
|
||||
Bifrost->>Plugin1: PostLLMHook(response)
|
||||
Plugin1-->>Bifrost: modified response
|
||||
Bifrost-->>Client: Recovered Response
|
||||
```
|
||||
|
||||
**Error Recovery Features:**
|
||||
|
||||
- **Error Transformation:** Plugins can convert errors to successful responses
|
||||
- **Graceful Degradation:** Provide fallback responses for service failures
|
||||
- **Context Preservation:** Error context is maintained through the recovery process
|
||||
|
||||
### **Complex Plugin Decision Flow**
|
||||
|
||||
Real-world plugin interactions involving authentication, rate limiting, and caching with different decision paths:
|
||||
|
||||
```mermaid
|
||||
graph TD
|
||||
A["Client Request"] --> B["Bifrost"]
|
||||
B --> C["Auth Plugin PreLLMHook"]
|
||||
C --> D{"Authenticated?"}
|
||||
D -->|No| E["Return Auth Error<br/>AllowFallbacks=false"]
|
||||
D -->|Yes| F["RateLimit Plugin PreLLMHook"]
|
||||
F --> G{"Rate Limited?"}
|
||||
G -->|Yes| H["Return Rate Error<br/>AllowFallbacks=nil"]
|
||||
G -->|No| I["Cache Plugin PreLLMHook"]
|
||||
I --> J{"Cache Hit?"}
|
||||
J -->|Yes| K["Return Cached Response"]
|
||||
J -->|No| L["Provider API Call"]
|
||||
L --> M["Cache Plugin PostLLMHook"]
|
||||
M --> N["Store in Cache"]
|
||||
N --> O["RateLimit Plugin PostLLMHook"]
|
||||
O --> P["Auth Plugin PostLLMHook"]
|
||||
P --> Q["Final Response"]
|
||||
|
||||
E --> R["Skip Fallbacks"]
|
||||
H --> S["Try Fallback Provider"]
|
||||
K --> T["Skip Provider Call"]
|
||||
```
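
The fallback decision in the diagrams above (`AllowFallbacks=true/nil` tries the next provider, `AllowFallbacks=false` does not) can be sketched with a pointer-to-bool field. The `shortCircuitError` type is a hypothetical simplification, not the actual Bifrost error type.

```go
package main

import "fmt"

// shortCircuitError is a hypothetical error carrying the fallback flag.
type shortCircuitError struct {
	Message        string
	AllowFallbacks *bool // nil means "not set" and is treated as true
}

// shouldTryFallback reports whether the next fallback provider should be
// tried: yes when the flag is unset or true, no when it is false.
func shouldTryFallback(err *shortCircuitError) bool {
	if err == nil {
		return false // no error, so no fallback is needed
	}
	return err.AllowFallbacks == nil || *err.AllowFallbacks
}

func main() {
	deny := false
	fmt.Println(shouldTryFallback(&shortCircuitError{Message: "rate limited"}))
	fmt.Println(shouldTryFallback(&shortCircuitError{Message: "unauthorized", AllowFallbacks: &deny}))
}
```

Using `*bool` rather than `bool` is what lets an unset flag default to allowing fallbacks.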
|
||||
|
||||
### **Execution Characteristics**
|
||||
|
||||
**Symmetric Execution Pattern:**
|
||||
|
||||
- **Pre-processing:** Plugins execute in registration order (first registered runs first)
- **Post-processing:** Plugins execute in reverse registration order (last registered runs first)
|
||||
- **Rationale:** Ensures proper cleanup and state management (last in, first out)
|
||||
|
||||
**Performance Optimizations:**
|
||||
|
||||
- **Timeout Boundaries:** Each plugin has configurable execution timeouts
|
||||
- **Panic Recovery:** Plugin panics are caught and logged without crashing the system
|
||||
- **Resource Limits:** Memory and CPU limits prevent runaway plugins
|
||||
- **Circuit Breaking:** Repeated failures trigger plugin isolation
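
Panic recovery of the kind listed above is commonly built on Go's `defer` and `recover`. The sketch below shows the general technique; `safeHook` is a hypothetical helper name, and Bifrost's internal wrapper may look different.

```go
package main

import (
	"errors"
	"fmt"
)

// safeHook runs a plugin hook and converts a panic into an ordinary
// error so that a misbehaving plugin cannot crash the caller.
func safeHook(name string, hook func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("plugin %s panicked: %v", name, r)
		}
	}()
	return hook()
}

func main() {
	fmt.Println(safeHook("broken", func() error { panic("boom") }))
	fmt.Println(safeHook("failing", func() error { return errors.New("bad input") }))
	fmt.Println(safeHook("healthy", func() error { return nil }))
}
```

The named return value `err` is what allows the deferred function to replace the result after the panic has unwound the hook.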
|
||||
|
||||
**Error Handling Strategies:**
|
||||
|
||||
- **Continue:** Use original request/response if plugin fails
|
||||
- **Fail Fast:** Return error immediately if critical plugin fails
|
||||
- **Retry:** Attempt plugin execution with exponential backoff
|
||||
- **Fallback:** Use alternative plugin or default behavior
|
||||
|
||||
> **Plugin Execution:** [Request Flow →](./request-flow#stage-3-plugin-pipeline-processing)
|
||||
|
||||
---
|
||||
|
||||
## Security & Validation
|
||||
|
||||
### **Multi-Layer Security Model**
|
||||
|
||||
Plugin security operates at multiple layers to ensure system integrity:
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph "Security Validation Layers"
|
||||
L1[Layer 1: Binary Validation<br/>Signature & Checksum]
|
||||
L2[Layer 2: Interface Validation<br/>Type Safety & Compatibility]
|
||||
L3[Layer 3: Runtime Validation<br/>Resource Limits & Timeouts]
|
||||
L4[Layer 4: Execution Isolation<br/>Panic Recovery & Error Handling]
|
||||
end
|
||||
|
||||
subgraph "Security Benefits"
|
||||
Integrity[Code Integrity<br/>Verified Authenticity]
|
||||
Safety[Type Safety<br/>Compile-time Checks]
|
||||
Stability[System Stability<br/>Isolated Failures]
|
||||
Performance[Performance Protection<br/>Resource Limits]
|
||||
end
|
||||
|
||||
L1 --> Integrity
|
||||
L2 --> Safety
|
||||
L3 --> Performance
|
||||
L4 --> Stability
|
||||
```
|
||||
|
||||
### **Validation Process**
|
||||
|
||||
**Binary Security:**
|
||||
|
||||
- **Digital Signatures:** Cryptographic verification of plugin authenticity
|
||||
- **Checksum Validation:** File integrity verification
|
||||
- **Source Verification:** Trusted source requirements
|
||||
|
||||
**Interface Security:**
|
||||
|
||||
- **Type Safety:** Interface implementation verification
|
||||
- **Version Compatibility:** Plugin API version checking
|
||||
- **Memory Safety:** Safe memory access patterns
|
||||
|
||||
**Runtime Security:**
|
||||
|
||||
- **Resource Quotas:** Memory and CPU usage limits
|
||||
- **Execution Timeouts:** Bounded execution time
|
||||
- **Sandbox Execution:** Isolated execution environment
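
Bounded execution time can be sketched with `context.WithTimeout` and a buffered result channel. The names `runWithTimeout` and `errHookTimeout` are hypothetical, and this sketch covers only the timeout, not memory or CPU quotas or sandboxing.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

var errHookTimeout = errors.New("plugin hook timed out")

// runWithTimeout runs hook in its own goroutine and gives up after limit.
// Cancellation of the parent context is reported the same way.
func runWithTimeout(parent context.Context, limit time.Duration, hook func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, limit)
	defer cancel()

	// Buffered so the goroutine can finish even after we stop waiting.
	done := make(chan error, 1)
	go func() { done <- hook(ctx) }()

	select {
	case err := <-done:
		return err
	case <-ctx.Done():
		return errHookTimeout
	}
}

// demoTimeout runs one fast hook and one hook that overruns its budget.
func demoTimeout() (fastErr, slowErr error) {
	fast := func(context.Context) error { return nil }
	slow := func(context.Context) error {
		time.Sleep(500 * time.Millisecond) // ignores ctx on purpose
		return nil
	}
	fastErr = runWithTimeout(context.Background(), 50*time.Millisecond, fast)
	slowErr = runWithTimeout(context.Background(), 50*time.Millisecond, slow)
	return fastErr, slowErr
}

func main() {
	fmt.Println(demoTimeout())
}
```

A timeout of this kind stops the caller from waiting; it does not stop the hook itself unless the hook observes `ctx`.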
|
||||
|
||||
**Operational Security:**
|
||||
|
||||
- **Health Monitoring:** Continuous plugin health assessment
|
||||
- **Error Tracking:** Plugin error rate monitoring
|
||||
- **Automatic Recovery:** Failed plugin restart and recovery
|
||||
|
||||
---
|
||||
|
||||
## Plugin Performance & Monitoring
|
||||
|
||||
### **Comprehensive Metrics System**
|
||||
|
||||
Bifrost provides detailed metrics for plugin performance and health monitoring:
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph "Execution Metrics"
|
||||
ExecTime[Execution Time<br/>Latency per Plugin]
|
||||
ExecCount[Execution Count<br/>Request Volume]
|
||||
SuccessRate[Success Rate<br/>Error Percentage]
|
||||
Throughput[Throughput<br/>Requests/Second]
|
||||
end
|
||||
|
||||
subgraph "Resource Metrics"
|
||||
MemoryUsage[Memory Usage<br/>Per Plugin Instance]
|
||||
CPUUsage[CPU Utilization<br/>Processing Time]
|
||||
IOMetrics[I/O Operations<br/>Network/Disk Activity]
|
||||
PoolUtilization[Pool Utilization<br/>Resource Efficiency]
|
||||
end
|
||||
|
||||
subgraph "Health Metrics"
|
||||
ErrorRate[Error Rate<br/>Failed Executions]
|
||||
PanicCount[Panic Recovery<br/>Crash Events]
|
||||
TimeoutCount[Timeout Events<br/>Slow Executions]
|
||||
RecoveryRate[Recovery Success<br/>Failure Handling]
|
||||
end
|
||||
|
||||
subgraph "Business Metrics"
|
||||
AddedLatency[Added Latency<br/>Plugin Overhead]
|
||||
SystemImpact[System Impact<br/>Overall Performance]
|
||||
FeatureUsage[Feature Usage<br/>Plugin Utilization]
|
||||
CostImpact[Cost Impact<br/>Resource Consumption]
|
||||
end
|
||||
```
|
||||
|
||||
### **Performance Characteristics**
|
||||
|
||||
**Plugin Execution Performance:**
|
||||
|
||||
- **Typical Overhead:** 1-10μs per plugin for simple operations
|
||||
- **Authentication Plugins:** 1-5μs for key validation
|
||||
- **Rate Limiting Plugins:** 500ns for quota checks
|
||||
- **Monitoring Plugins:** 200ns for metric collection
|
||||
- **Transformation Plugins:** 2-10μs depending on complexity
|
||||
|
||||
**Resource Usage Patterns:**
|
||||
|
||||
- **Memory Efficiency:** Object pooling reduces allocations
|
||||
- **CPU Optimization:** Minimal processing overhead
|
||||
- **Network Impact:** Configurable external service calls
|
||||
- **Storage Overhead:** Minimal for stateless plugins
|
||||
|
||||
---
|
||||
|
||||
## Plugin Integration Patterns
|
||||
|
||||
### **Common Integration Scenarios**
|
||||
|
||||
**1. Authentication & Authorization**
|
||||
|
||||
- **Pre-processing Hook:** Validate API keys or JWT tokens
|
||||
- **Configuration:** External identity provider integration
|
||||
- **Error Handling:** Return 401/403 responses for invalid credentials
|
||||
- **Performance:** Sub-5μs validation with caching
|
||||
|
||||
**2. Rate Limiting & Quotas**
|
||||
|
||||
- **Pre-processing Hook:** Check request quotas and limits
|
||||
- **Storage:** Redis or in-memory rate limit tracking
|
||||
- **Algorithms:** Token bucket, sliding window, fixed window
|
||||
- **Responses:** 429 Too Many Requests with retry headers
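
Of the algorithms listed above, the token bucket is the simplest to sketch. The `tokenBucket` type below is a hypothetical, single-goroutine illustration; a real rate-limiting plugin would need a mutex or a shared store such as Redis.

```go
package main

import (
	"fmt"
	"time"
)

// tokenBucket allows bursts up to capacity and refills at refillRate
// tokens per second. It is not safe for concurrent use.
type tokenBucket struct {
	capacity   float64
	tokens     float64
	refillRate float64
	last       time.Time
}

func newTokenBucket(capacity, refillRate float64, now time.Time) *tokenBucket {
	return &tokenBucket{capacity: capacity, tokens: capacity, refillRate: refillRate, last: now}
}

// allow refills the bucket for the time elapsed since the last call and
// consumes one token if one is available.
func (b *tokenBucket) allow(now time.Time) bool {
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens += elapsed * b.refillRate
		if b.tokens > b.capacity {
			b.tokens = b.capacity
		}
		b.last = now
	}
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

// demoBucket uses explicit timestamps so the result is deterministic.
func demoBucket() []bool {
	t0 := time.Unix(0, 0)
	b := newTokenBucket(2, 1, t0)
	later := t0.Add(time.Second)
	return []bool{b.allow(t0), b.allow(t0), b.allow(t0), b.allow(later), b.allow(later)}
}

func main() {
	fmt.Println(demoBucket())
}
```

A rejected call here corresponds to the point where a plugin would return a 429 response.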
|
||||
|
||||
**3. Request/Response Transformation**
|
||||
|
||||
- **Dual Hooks:** Pre-processing for requests, post-processing for responses
|
||||
- **Use Cases:** Data format conversion, field mapping, content filtering
|
||||
- **Performance:** Streaming transformations for large payloads
|
||||
- **Compatibility:** Provider-specific format adaptations
|
||||
|
||||
**4. Monitoring & Analytics**
|
||||
|
||||
- **Post-processing Hook:** Collect metrics and logs after request completion
|
||||
- **Destinations:** Prometheus, DataDog, custom analytics systems
|
||||
- **Data:** Request/response metadata, performance metrics, error tracking
|
||||
- **Privacy:** Configurable data sanitization and filtering
|
||||
|
||||
### **Plugin Communication Patterns**
|
||||
|
||||
**Plugin-to-Plugin Communication:**
|
||||
|
||||
- **Shared Context:** Plugins can store data in request context for downstream plugins
|
||||
- **Event System:** Plugins can emit events for other plugins to consume
|
||||
- **Data Passing:** Structured data exchange between related plugins
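
The shared-context pattern can be sketched with Go's standard `context` package. This assumes a plain `context.Context`; Bifrost's own context type may expose a different API, and the hook names and `userIDKey` below are hypothetical.

```go
package main

import (
	"context"
	"fmt"
)

// ctxKey is an unexported key type, which avoids collisions with keys
// set by other packages.
type ctxKey string

const userIDKey ctxKey = "user-id"

// authPreHook stands in for an upstream plugin that resolves the caller
// and stores the result for downstream plugins.
func authPreHook(ctx context.Context, apiKey string) context.Context {
	return context.WithValue(ctx, userIDKey, "user-for-"+apiKey)
}

// rateLimitPreHook stands in for a downstream plugin that reads the value.
func rateLimitPreHook(ctx context.Context) (string, bool) {
	id, ok := ctx.Value(userIDKey).(string)
	return id, ok
}

func demoSharedContext() (string, bool) {
	ctx := authPreHook(context.Background(), "k1")
	return rateLimitPreHook(ctx)
}

func main() {
	fmt.Println(demoSharedContext())
}
```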
|
||||
|
||||
**Plugin-to-External Service Communication:**
|
||||
|
||||
- **HTTP Clients:** Built-in HTTP client pools for external API calls
|
||||
- **Database Connections:** Connection pooling for database access
|
||||
- **Message Queues:** Integration with message queue systems
|
||||
- **Caching Systems:** Redis, Memcached integration for state storage
|
||||
|
||||
> **📖 Integration Examples:** [Plugin Development Guide →](../../enterprise/custom-plugins)
|
||||
|
||||
---
|
||||
|
||||
## Related Architecture Documentation
|
||||
|
||||
- **[Request Flow](./request-flow)** - Plugin execution in request processing pipeline
|
||||
- **[Concurrency Model](./concurrency)** - Plugin concurrency and threading considerations
|
||||
- **[Benchmarks](../../benchmarking/getting-started)** - Plugin performance characteristics and optimization
|
||||
- **[MCP System](./mcp)** - Integration between plugins and MCP system
|
||||
|
||||
0
docs/architecture/core/providers.mdx
Normal file
527
docs/architecture/core/request-flow.mdx
Normal file
@@ -0,0 +1,527 @@
|
||||
---
|
||||
title: "Request Flow"
|
||||
description: "Deep dive into Bifrost's request processing pipeline - from transport layer ingestion through provider execution to response delivery."
|
||||
icon: "route"
|
||||
---
|
||||
|
||||
## Stage 1: Transport Layer Processing
|
||||
|
||||
### **HTTP Transport Flow**
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant Client
|
||||
participant HTTPTransport
|
||||
participant Router
|
||||
participant Validation
|
||||
|
||||
Client->>HTTPTransport: POST /v1/chat/completions
|
||||
HTTPTransport->>HTTPTransport: Parse Headers
|
||||
HTTPTransport->>HTTPTransport: Extract Body
|
||||
HTTPTransport->>Validation: Validate JSON Schema
|
||||
Validation->>Router: BifrostRequest
|
||||
Router-->>HTTPTransport: Processing Started
|
||||
HTTPTransport-->>Client: HTTP 200 (async processing)
|
||||
```
|
||||
|
||||
**Key Processing Steps:**
|
||||
|
||||
1. **Request Reception** - FastHTTP server receives request
|
||||
2. **Header Processing** - Extract authentication, content-type, custom headers
|
||||
3. **Body Parsing** - JSON unmarshaling with schema validation
|
||||
4. **Request Transformation** - Convert to internal `BifrostRequest` schema
|
||||
5. **Context Creation** - Build request context with metadata
|
||||
|
||||
**Performance Characteristics:**
|
||||
|
||||
- **Parsing Time:** ~2.1μs for typical requests
|
||||
- **Validation Overhead:** ~400ns for schema checks
|
||||
- **Memory Allocation:** Zero-copy where possible
|
||||
|
||||
### **Go SDK Flow**
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant Application
|
||||
participant SDK
|
||||
participant Core
|
||||
participant Validation
|
||||
|
||||
Application->>SDK: bifrost.ChatCompletion(req)
|
||||
SDK->>SDK: Type Validation
|
||||
SDK->>Core: Direct Function Call
|
||||
Core->>Validation: Schema Validation
|
||||
Validation-->>Core: Validated Request
|
||||
Core-->>SDK: Processing Result
|
||||
SDK-->>Application: Typed Response
|
||||
```
|
||||
|
||||
**Advantages:**
|
||||
|
||||
- **Zero Serialization** - Direct Go struct passing
|
||||
- **Type Safety** - Compile-time validation
|
||||
- **Lower Latency** - No HTTP/JSON overhead
|
||||
- **Memory Efficiency** - No intermediate allocations
|
||||
|
||||
---
|
||||
|
||||
## Stage 2: Request Routing & Load Balancing
|
||||
|
||||
### **Provider Selection Logic**
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
Request[Incoming Request] --> ModelCheck{Model Available?}
|
||||
ModelCheck -->|Yes| ProviderDirect[Use Specified Provider]
|
||||
ModelCheck -->|No| ModelMapping[Model → Provider Mapping]
|
||||
|
||||
ProviderDirect --> KeyPool[API Key Pool]
|
||||
ModelMapping --> KeyPool
|
||||
|
||||
KeyPool --> WeightedSelect[Weighted Random Selection]
|
||||
WeightedSelect --> HealthCheck{Provider Healthy?}
|
||||
|
||||
HealthCheck -->|Yes| AssignWorker[Assign Worker]
|
||||
HealthCheck -->|No| CircuitBreaker[Circuit Breaker]
|
||||
|
||||
CircuitBreaker --> FallbackCheck{Fallback Available?}
|
||||
FallbackCheck -->|Yes| FallbackProvider[Try Fallback]
|
||||
FallbackCheck -->|No| ErrorResponse[Return Error]
|
||||
|
||||
FallbackProvider --> KeyPool
|
||||
```
|
||||
|
||||
**Key Selection Algorithm:**
|
||||
|
||||
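```go
import "math/rand"

// KeySelector performs weighted random selection over a provider's API keys.
type KeySelector struct {
	keys    []APIKey
	weights []float64 // weights[i] belongs to keys[i]
	total   float64   // sum of all weights
}

// SelectKey returns a key with probability proportional to its weight,
// or nil when no keys are configured.
func (ks *KeySelector) SelectKey() *APIKey {
	if len(ks.keys) == 0 {
		return nil
	}

	r := rand.Float64() * ks.total // r is in [0, total)
	cumulative := 0.0

	for i, weight := range ks.weights {
		cumulative += weight
		if r < cumulative { // strict comparison skips zero-weight keys
			return &ks.keys[i]
		}
	}
	// Guard against floating-point rounding in the running sum.
	return &ks.keys[len(ks.keys)-1]
}
```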
|
||||
|
||||
**Performance Metrics:**
|
||||
|
||||
- **Key Selection Time:** ~10ns (linear scan over a typically small key list)
|
||||
- **Health Check Overhead:** ~50ns (cached results)
|
||||
- **Fallback Decision:** ~25ns (configuration lookup)
|
||||
|
||||
---
|
||||
|
||||
## Stage 3: Plugin Pipeline Processing
|
||||
|
||||
### **Pre-Processing Hooks**
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant Request
|
||||
participant AuthPlugin
|
||||
participant RateLimitPlugin
|
||||
participant TransformPlugin
|
||||
participant Core
|
||||
|
||||
Request->>AuthPlugin: ProcessRequest()
|
||||
AuthPlugin->>AuthPlugin: Validate API Key
|
||||
AuthPlugin->>RateLimitPlugin: Authorized Request
|
||||
|
||||
RateLimitPlugin->>RateLimitPlugin: Check Rate Limits
|
||||
RateLimitPlugin->>TransformPlugin: Allowed Request
|
||||
|
||||
TransformPlugin->>TransformPlugin: Modify Request
|
||||
TransformPlugin->>Core: Final Request
|
||||
```
|
||||
|
||||
**Plugin Execution Model:**
|
||||
|
||||
```go
// PluginManager runs registered plugins in registration order.
type PluginManager struct {
	plugins []Plugin
}

// ExecutePreHooks passes the request through each plugin in turn and
// stops at the first plugin that returns an error.
func (pm *PluginManager) ExecutePreHooks(
	ctx BifrostContext,
	req *BifrostRequest,
) (*BifrostRequest, *BifrostError) {
	for _, plugin := range pm.plugins {
		modifiedReq, err := plugin.ProcessRequest(ctx, req)
		if err != nil {
			return nil, err
		}
		req = modifiedReq // the next plugin sees the modified request
	}
	return req, nil
}
```
|
||||
|
||||
**Plugin Types & Performance:**
|
||||
|
||||
| Plugin Type           | Processing Time | Memory Impact | Failure Mode           |
| --------------------- | --------------- | ------------- | ---------------------- |
| **Authentication**    | ~1-5μs          | Minimal       | Reject request         |
| **Rate Limiting**     | ~500ns          | Cache-based   | Throttle/reject        |
| **Request Transform** | ~2-10μs         | Copy-on-write | Continue with original |
| **Monitoring**        | ~200ns          | Append-only   | Continue silently      |
|
||||
|
||||
---
|
||||
|
||||
## Stage 4: MCP Tool Discovery & Integration
|
||||
|
||||
### **Tool Discovery Process**
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
Request[Request with Model] --> MCPCheck{MCP Enabled?}
|
||||
MCPCheck -->|No| SkipMCP[Skip MCP Processing]
|
||||
MCPCheck -->|Yes| ClientLookup[MCP Client Lookup]
|
||||
|
||||
ClientLookup --> ToolFilter[Tool Filtering]
|
||||
ToolFilter --> ToolInject[Inject Tools into Request]
|
||||
|
||||
ToolFilter --> IncludeCheck{Include Filter?}
|
||||
ToolFilter --> ExcludeCheck{Exclude Filter?}
|
||||
|
||||
IncludeCheck -->|Yes| IncludeTools[Include Specified Tools]
|
||||
IncludeCheck -->|No| AllTools[Include All Tools]
|
||||
|
||||
ExcludeCheck -->|Yes| RemoveTools[Remove Excluded Tools]
|
||||
ExcludeCheck -->|No| KeepFiltered[Keep Filtered Tools]
|
||||
|
||||
IncludeTools --> ToolInject
|
||||
AllTools --> ToolInject
|
||||
RemoveTools --> ToolInject
|
||||
KeepFiltered --> ToolInject
|
||||
|
||||
ToolInject --> EnhancedRequest[Request with Tools]
|
||||
SkipMCP --> EnhancedRequest
|
||||
```
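
The include/exclude branch of the flowchart above can be sketched as a plain filter over tool names. This is an illustration under assumptions: `filterTools` here works on strings and takes an explicit exclude list, which may not match the signature of the real helper.

```go
package main

import "fmt"

// toSet converts a slice of names into a lookup set.
func toSet(names []string) map[string]bool {
	set := make(map[string]bool, len(names))
	for _, n := range names {
		set[n] = true
	}
	return set
}

// filterTools keeps a tool when it passes the include filter (an empty
// include list means "include all") and is not in the exclude list.
func filterTools(available, include, exclude []string) []string {
	includeSet := toSet(include)
	excludeSet := toSet(exclude)

	var out []string
	for _, name := range available {
		if len(includeSet) > 0 && !includeSet[name] {
			continue
		}
		if excludeSet[name] {
			continue
		}
		out = append(out, name)
	}
	return out
}

func main() {
	available := []string{"web_search", "fetch_url", "summarize"}
	fmt.Println(filterTools(available, []string{"web_search", "fetch_url"}, []string{"fetch_url"}))
	fmt.Println(filterTools(available, nil, nil))
}
```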
|
||||
|
||||
**Tool Integration Algorithm:**
|
||||
|
||||
```go
// EnhanceRequest injects the MCP tools selected for this request.
func (mcpm *MCPManager) EnhanceRequest(
	ctx BifrostContext,
	req *BifrostChatRequest,
) (*BifrostChatRequest, error) {
	// Extract tool filtering from context
	includeClients := ctx.GetStringSlice("mcp-include-clients")
	includeTools := ctx.GetStringSlice("mcp-include-tools")

	// Get available tools
	availableTools := mcpm.getAvailableTools(includeClients)

	// Filter tools
	filteredTools := mcpm.filterTools(availableTools, includeTools)

	// Inject into request
	if req.Params == nil {
		req.Params = &ChatParameters{}
	}
	req.Params.Tools = append(req.Params.Tools, filteredTools...)

	return req, nil
}
```

**MCP Performance Impact:**

- **Tool Discovery:** ~100-500μs (cached after first request)
- **Tool Filtering:** ~50-200ns per tool
- **Request Enhancement:** ~1-5μs depending on tool count

---

## Stage 5: Memory Pool Management

### **Object Pool Lifecycle**

```mermaid
stateDiagram-v2
    [*] --> PoolInit: System Startup
    PoolInit --> Available: Objects Pre-allocated

    Available --> Acquired: Request Processing
    Acquired --> InUse: Object Populated
    InUse --> Processing: Worker Processing
    Processing --> Completed: Processing Done
    Completed --> Reset: Object Cleanup
    Reset --> Available: Return to Pool

    Available --> Expansion: Pool Exhaustion
    Expansion --> Available: New Objects Created

    Reset --> GC: Pool Full
    GC --> [*]: Garbage Collection
```

**Memory Pool Implementation:**

```go
type MemoryPools struct {
    channelPool  sync.Pool
    messagePool  sync.Pool
    responsePool sync.Pool
    bufferPool   sync.Pool
}

func (mp *MemoryPools) GetChannel() *ProcessingChannel {
    // Get returns nil when the pool is empty and no New function is set
    if ch := mp.channelPool.Get(); ch != nil {
        return ch.(*ProcessingChannel)
    }
    return NewProcessingChannel()
}

func (mp *MemoryPools) ReturnChannel(ch *ProcessingChannel) {
    ch.Reset() // Clear previous data
    mp.channelPool.Put(ch)
}
```
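
The acquire, reset, and return cycle from the state diagram can be sketched with a `sync.Pool` that has a `New` function, which removes the nil check on `Get`. The `ProcessingChannel` fields and the `acquire`/`release` helpers below are hypothetical stand-ins, not Bifrost's actual types.

```go
package main

import (
    "fmt"
    "sync"
)

// ProcessingChannel is a hypothetical pooled object; the real type may differ.
type ProcessingChannel struct {
    RequestID string
    Results   chan string
}

// Reset clears per-request state and drains any leftover results so the
// object is safe to hand to the next request.
func (pc *ProcessingChannel) Reset() {
    pc.RequestID = ""
    for {
        select {
        case <-pc.Results:
        default:
            return
        }
    }
}

// channelPool allocates a fresh object only when no pooled one is available.
var channelPool = sync.Pool{
    New: func() any {
        return &ProcessingChannel{Results: make(chan string, 1)}
    },
}

// acquire takes an object from the pool and binds it to a request.
func acquire(id string) *ProcessingChannel {
    pc := channelPool.Get().(*ProcessingChannel)
    pc.RequestID = id
    return pc
}

// release resets the object before returning it to the pool.
func release(pc *ProcessingChannel) {
    pc.Reset()
    channelPool.Put(pc)
}

func main() {
    pc := acquire("req-1")
    pc.Results <- "done"
    fmt.Println(pc.RequestID, <-pc.Results)
    release(pc)
}
```

Resetting on release rather than on acquire means a pooled object never holds references to a finished request while it sits idle.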

---

## Stage 6: Worker Pool Processing

### **Worker Assignment & Execution**

```mermaid
sequenceDiagram
    participant Queue
    participant WorkerPool
    participant Worker
    participant Provider
    participant Circuit

    Queue->>WorkerPool: Enqueue Request
    WorkerPool->>Worker: Assign Available Worker
    Worker->>Circuit: Check Circuit Breaker
    Circuit->>Provider: Forward Request

    Provider-->>Circuit: Response/Error
    Circuit->>Circuit: Update Health Metrics
    Circuit-->>Worker: Provider Response
    Worker-->>WorkerPool: Release Worker
    WorkerPool-->>Queue: Request Completed
```

**Worker Pool Architecture:**

```go
type ProviderWorkerPool struct {
    workers chan *Worker
    queue   chan *ProcessingJob
    config  WorkerPoolConfig
    metrics *PoolMetrics
}

func (pwp *ProviderWorkerPool) ProcessRequest(job *ProcessingJob) {
    // Get worker from pool (blocks until a worker is free)
    worker := <-pwp.workers

    go func() {
        defer func() {
            // Return worker to pool
            pwp.workers <- worker
        }()

        // Process request
        result := worker.Execute(job)
        job.ResultChan <- result
    }()
}
```
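
A common alternative to checking workers in and out of a channel is a fixed set of goroutines that read from a bounded job queue. The sketch below illustrates that variant under stated assumptions: `Job`, `Pool`, `NewPool`, `Submit`, and `Shutdown` are hypothetical names, and the "work" is a placeholder string.

```go
package main

import (
    "fmt"
    "sync"
)

// Job is a hypothetical unit of work with its own result channel.
type Job struct {
    ID     int
    Result chan string
}

// Pool runs a fixed number of workers over a bounded queue.
type Pool struct {
    queue chan *Job
    wg    sync.WaitGroup
}

// NewPool starts the workers; each one loops until the queue is closed.
func NewPool(workers, queueSize int) *Pool {
    p := &Pool{queue: make(chan *Job, queueSize)}
    for i := 0; i < workers; i++ {
        p.wg.Add(1)
        go func() {
            defer p.wg.Done()
            for job := range p.queue {
                job.Result <- fmt.Sprintf("job %d done", job.ID)
            }
        }()
    }
    return p
}

// Submit enqueues a job. It blocks when the queue is full, which is the
// backpressure signal to the caller.
func (p *Pool) Submit(id int) *Job {
    // Buffer of 1 so a worker never blocks on a slow reader.
    job := &Job{ID: id, Result: make(chan string, 1)}
    p.queue <- job
    return job
}

// Shutdown stops accepting work and waits for in-flight jobs to finish.
func (p *Pool) Shutdown() {
    close(p.queue)
    p.wg.Wait()
}

func main() {
    pool := NewPool(2, 4)
    jobs := []*Job{pool.Submit(1), pool.Submit(2), pool.Submit(3)}
    for _, j := range jobs {
        fmt.Println(<-j.Result)
    }
    pool.Shutdown()
}
```

Because each job carries its own buffered result channel, callers read results independently of the order in which workers finish.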

---

## Stage 7: Provider API Communication

### **HTTP Request Execution**

```mermaid
sequenceDiagram
    participant Worker
    participant HTTPClient
    participant Provider
    participant CircuitBreaker
    participant Metrics

    Worker->>HTTPClient: PrepareRequest()
    HTTPClient->>HTTPClient: Add Headers & Auth
    HTTPClient->>CircuitBreaker: CheckHealth()
    CircuitBreaker->>Provider: HTTP Request

    Provider-->>CircuitBreaker: HTTP Response
    CircuitBreaker->>Metrics: Record Metrics
    CircuitBreaker-->>HTTPClient: Response/Error
    HTTPClient-->>Worker: Parsed Response
```

**Request Preparation Pipeline:**

```go
func (w *ProviderWorker) ExecuteRequest(job *ProcessingJob) *ProviderResponse {
    // Prepare HTTP request
    httpReq := w.prepareHTTPRequest(job.Request)

    // Add authentication
    w.addAuthentication(httpReq, job.APIKey)

    // Execute with timeout
    ctx, cancel := context.WithTimeout(context.Background(), job.Timeout)
    defer cancel()

    httpResp, err := w.httpClient.Do(httpReq.WithContext(ctx))
    if err != nil {
        return w.handleError(err, job)
    }

    // Parse response
    return w.parseResponse(httpResp, job)
}
```

---

## Stage 8: Tool Execution & Response Processing

### **MCP Tool Execution Flow**

```mermaid
sequenceDiagram
    participant Provider
    participant MCPProcessor
    participant MCPServer
    participant ToolExecutor
    participant ResponseBuilder

    Provider->>MCPProcessor: Response with Tool Calls
    MCPProcessor->>MCPProcessor: Extract Tool Calls

    loop For each tool call
        MCPProcessor->>MCPServer: Execute Tool
        MCPServer->>ToolExecutor: Tool Invocation
        ToolExecutor-->>MCPServer: Tool Result
        MCPServer-->>MCPProcessor: Tool Response
    end

    MCPProcessor->>ResponseBuilder: Combine Results
    ResponseBuilder-->>Provider: Enhanced Response
```

**Tool Execution Pipeline:**

```go
func (mcp *MCPProcessor) ProcessToolCalls(
    response *ProviderResponse,
) (*ProviderResponse, error) {
    toolCalls := mcp.extractToolCalls(response)
    if len(toolCalls) == 0 {
        return response, nil
    }

    // Execute tools concurrently; the buffer lets every goroutine send
    // without blocking
    results := make(chan ToolResult, len(toolCalls))
    for _, toolCall := range toolCalls {
        go func(tc ToolCall) {
            result := mcp.executeTool(tc)
            results <- result
        }(toolCall)
    }

    // Collect results (in completion order, not call order)
    toolResults := make([]ToolResult, 0, len(toolCalls))
    for i := 0; i < len(toolCalls); i++ {
        toolResults = append(toolResults, <-results)
    }

    // Enhance response
    return mcp.enhanceResponse(response, toolResults), nil
}
```
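
When results must line up with the original tool calls, one option is to write each result into a pre-sized slice by index instead of collecting from a channel. The sketch below shows that variant; `ToolCall`, `ToolResult`, `executeTool`, and `executeAll` are simplified, hypothetical stand-ins for the real types.

```go
package main

import (
    "fmt"
    "strings"
    "sync"
)

// ToolCall and ToolResult are hypothetical, simplified shapes.
type ToolCall struct{ Name, Input string }
type ToolResult struct{ Name, Output string }

// executeTool is a placeholder for a real MCP tool invocation.
func executeTool(tc ToolCall) ToolResult {
    return ToolResult{Name: tc.Name, Output: strings.ToUpper(tc.Input)}
}

// executeAll runs every call concurrently and returns results in the same
// order as calls. Each goroutine writes only to its own slice index, so no
// mutex is needed.
func executeAll(calls []ToolCall) []ToolResult {
    results := make([]ToolResult, len(calls))
    var wg sync.WaitGroup
    for i, tc := range calls {
        wg.Add(1)
        go func(i int, tc ToolCall) {
            defer wg.Done()
            results[i] = executeTool(tc)
        }(i, tc)
    }
    wg.Wait()
    return results
}

func main() {
    calls := []ToolCall{{Name: "echo", Input: "a"}, {Name: "echo", Input: "b"}}
    for _, r := range executeAll(calls) {
        fmt.Println(r.Name, r.Output)
    }
}
```

Index-based collection trades the channel for a `sync.WaitGroup` and keeps the result order deterministic regardless of which tool finishes first.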

---

## Stage 9: Post-Processing & Response Formation

### **Plugin Post-Processing**

```mermaid
sequenceDiagram
    participant CoreResponse
    participant LoggingPlugin
    participant CachePlugin
    participant MetricsPlugin
    participant Transport

    CoreResponse->>LoggingPlugin: ProcessResponse()
    LoggingPlugin->>LoggingPlugin: Log Request/Response
    LoggingPlugin->>CachePlugin: Response + Logs

    CachePlugin->>CachePlugin: Cache Response
    CachePlugin->>MetricsPlugin: Cached Response

    MetricsPlugin->>MetricsPlugin: Record Metrics
    MetricsPlugin->>Transport: Final Response
```

**Response Enhancement Pipeline:**

```go
func (pm *PluginManager) ExecutePostHooks(
    ctx BifrostContext,
    req *BifrostRequest,
    resp *BifrostResponse,
) (*BifrostResponse, error) {
    for _, plugin := range pm.plugins {
        enhancedResp, err := plugin.ProcessResponse(ctx, req, resp)
        if err != nil {
            // Log error but continue processing
            pm.logger.Warn("Plugin post-processing error", "plugin", plugin.Name(), "error", err)
            continue
        }
        resp = enhancedResp
    }
    return resp, nil
}
```

### **Response Serialization**

```mermaid
flowchart TD
    Response[BifrostResponse] --> Format{Response Format}
    Format -->|HTTP| JSONSerialize[JSON Serialization]
    Format -->|SDK| DirectReturn[Direct Go Struct]

    JSONSerialize --> Compress[Compression]
    DirectReturn --> TypeCheck[Type Validation]

    Compress --> Headers[Set Headers]
    TypeCheck --> Return[Return Response]

    Headers --> HTTPResponse[HTTP Response]
    HTTPResponse --> Client[Client Response]
    Return --> Client
```

---

## Related Architecture Documentation

- **[Concurrency Model](./concurrency)** - Worker pools and threading details
- **[Plugin System](./plugins)** - Plugin execution and lifecycle
- **[MCP System](./mcp)** - Tool discovery and execution internals
- **[Benchmarks](../../benchmarking/getting-started)** - Detailed performance analysis