first commit
This commit is contained in:
527
docs/architecture/core/request-flow.mdx
Normal file
527
docs/architecture/core/request-flow.mdx
Normal file
@@ -0,0 +1,527 @@
|
||||
---
|
||||
title: "Request Flow"
|
||||
description: "Deep dive into Bifrost's request processing pipeline - from transport layer ingestion through provider execution to response delivery."
|
||||
icon: "route"
|
||||
---
|
||||
|
||||
## Stage 1: Transport Layer Processing
|
||||
|
||||
### **HTTP Transport Flow**
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant Client
|
||||
participant HTTPTransport
|
||||
participant Router
|
||||
participant Validation
|
||||
|
||||
Client->>HTTPTransport: POST /v1/chat/completions
|
||||
HTTPTransport->>HTTPTransport: Parse Headers
|
||||
HTTPTransport->>HTTPTransport: Extract Body
|
||||
HTTPTransport->>Validation: Validate JSON Schema
|
||||
Validation->>Router: BifrostRequest
|
||||
Router-->>HTTPTransport: Processing Started
|
||||
HTTPTransport-->>Client: HTTP 200 (async processing)
|
||||
```
|
||||
|
||||
**Key Processing Steps:**
|
||||
|
||||
1. **Request Reception** - FastHTTP server receives request
|
||||
2. **Header Processing** - Extract authentication, content-type, custom headers
|
||||
3. **Body Parsing** - JSON unmarshaling with schema validation
|
||||
4. **Request Transformation** - Convert to internal `BifrostRequest` schema
|
||||
5. **Context Creation** - Build request context with metadata
|
||||
|
||||
**Performance Characteristics:**
|
||||
|
||||
- **Parsing Time:** ~2.1μs for typical requests
|
||||
- **Validation Overhead:** ~400ns for schema checks
|
||||
- **Memory Allocation:** Zero-copy where possible
|
||||
|
||||
### **Go SDK Flow**
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant Application
|
||||
participant SDK
|
||||
participant Core
|
||||
participant Validation
|
||||
|
||||
Application->>SDK: bifrost.ChatCompletion(req)
|
||||
SDK->>SDK: Type Validation
|
||||
SDK->>Core: Direct Function Call
|
||||
Core->>Validation: Schema Validation
|
||||
Validation-->>Core: Validated Request
|
||||
Core-->>SDK: Processing Result
|
||||
SDK-->>Application: Typed Response
|
||||
```
|
||||
|
||||
**Advantages:**
|
||||
|
||||
- **Zero Serialization** - Direct Go struct passing
|
||||
- **Type Safety** - Compile-time validation
|
||||
- **Lower Latency** - No HTTP/JSON overhead
|
||||
- **Memory Efficiency** - No intermediate allocations
|
||||
|
||||
---
|
||||
|
||||
## Stage 2: Request Routing & Load Balancing
|
||||
|
||||
### **Provider Selection Logic**
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
Request[Incoming Request] --> ModelCheck{Model Available?}
|
||||
ModelCheck -->|Yes| ProviderDirect[Use Specified Provider]
|
||||
ModelCheck -->|No| ModelMapping[Model → Provider Mapping]
|
||||
|
||||
ProviderDirect --> KeyPool[API Key Pool]
|
||||
ModelMapping --> KeyPool
|
||||
|
||||
KeyPool --> WeightedSelect[Weighted Random Selection]
|
||||
WeightedSelect --> HealthCheck{Provider Healthy?}
|
||||
|
||||
HealthCheck -->|Yes| AssignWorker[Assign Worker]
|
||||
HealthCheck -->|No| CircuitBreaker[Circuit Breaker]
|
||||
|
||||
CircuitBreaker --> FallbackCheck{Fallback Available?}
|
||||
FallbackCheck -->|Yes| FallbackProvider[Try Fallback]
|
||||
FallbackCheck -->|No| ErrorResponse[Return Error]
|
||||
|
||||
FallbackProvider --> KeyPool
|
||||
```
|
||||
|
||||
**Key Selection Algorithm:**
|
||||
|
||||
```go
|
||||
// Weighted random key selection
|
||||
type KeySelector struct {
|
||||
keys []APIKey
|
||||
weights []float64
|
||||
total float64
|
||||
}
|
||||
|
||||
func (ks *KeySelector) SelectKey() *APIKey {
|
||||
r := rand.Float64() * ks.total
|
||||
cumulative := 0.0
|
||||
|
||||
for i, weight := range ks.weights {
|
||||
cumulative += weight
|
||||
if r <= cumulative {
|
||||
return &ks.keys[i]
|
||||
}
|
||||
}
|
||||
return &ks.keys[len(ks.keys)-1]
|
||||
}
|
||||
```
|
||||
|
||||
**Performance Metrics:**
|
||||
|
||||
- **Key Selection Time:** ~10ns (constant time)
|
||||
- **Health Check Overhead:** ~50ns (cached results)
|
||||
- **Fallback Decision:** ~25ns (configuration lookup)
|
||||
|
||||
---
|
||||
|
||||
## Stage 3: Plugin Pipeline Processing
|
||||
|
||||
### **Pre-Processing Hooks**
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant Request
|
||||
participant AuthPlugin
|
||||
participant RateLimitPlugin
|
||||
participant TransformPlugin
|
||||
participant Core
|
||||
|
||||
Request->>AuthPlugin: ProcessRequest()
|
||||
AuthPlugin->>AuthPlugin: Validate API Key
|
||||
AuthPlugin->>RateLimitPlugin: Authorized Request
|
||||
|
||||
RateLimitPlugin->>RateLimitPlugin: Check Rate Limits
|
||||
RateLimitPlugin->>TransformPlugin: Allowed Request
|
||||
|
||||
TransformPlugin->>TransformPlugin: Modify Request
|
||||
TransformPlugin->>Core: Final Request
|
||||
```
|
||||
|
||||
**Plugin Execution Model:**
|
||||
|
||||
```go
|
||||
type PluginManager struct {
|
||||
plugins []Plugin
|
||||
}
|
||||
|
||||
func (pm *PluginManager) ExecutePreHooks(
|
||||
ctx BifrostContext,
|
||||
req *BifrostRequest,
|
||||
) (*BifrostRequest, *BifrostError) {
|
||||
for _, plugin := range pm.plugins {
|
||||
modifiedReq, err := plugin.ProcessRequest(ctx, req)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
req = modifiedReq
|
||||
}
|
||||
return req, nil
|
||||
}
|
||||
```
|
||||
|
||||
**Plugin Types & Performance:**
|
||||
|
||||
| Plugin Type | Processing Time | Memory Impact | Failure Mode |
|
||||
| --------------------- | --------------- | ------------- | ---------------------- |
|
||||
| **Authentication** | ~1-5μs | Minimal | Reject request |
|
||||
| **Rate Limiting** | ~500ns | Cache-based | Throttle/reject |
|
||||
| **Request Transform** | ~2-10μs | Copy-on-write | Continue with original |
|
||||
| **Monitoring** | ~200ns | Append-only | Continue silently |
|
||||
|
||||
---
|
||||
|
||||
## Stage 4: MCP Tool Discovery & Integration
|
||||
|
||||
### **Tool Discovery Process**
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
Request[Request with Model] --> MCPCheck{MCP Enabled?}
|
||||
MCPCheck -->|No| SkipMCP[Skip MCP Processing]
|
||||
MCPCheck -->|Yes| ClientLookup[MCP Client Lookup]
|
||||
|
||||
ClientLookup --> ToolFilter[Tool Filtering]
|
||||
ToolFilter --> ToolInject[Inject Tools into Request]
|
||||
|
||||
ToolFilter --> IncludeCheck{Include Filter?}
|
||||
ToolFilter --> ExcludeCheck{Exclude Filter?}
|
||||
|
||||
IncludeCheck -->|Yes| IncludeTools[Include Specified Tools]
|
||||
IncludeCheck -->|No| AllTools[Include All Tools]
|
||||
|
||||
ExcludeCheck -->|Yes| RemoveTools[Remove Excluded Tools]
|
||||
ExcludeCheck -->|No| KeepFiltered[Keep Filtered Tools]
|
||||
|
||||
IncludeTools --> ToolInject
|
||||
AllTools --> ToolInject
|
||||
RemoveTools --> ToolInject
|
||||
KeepFiltered --> ToolInject
|
||||
|
||||
ToolInject --> EnhancedRequest[Request with Tools]
|
||||
SkipMCP --> EnhancedRequest
|
||||
```
|
||||
|
||||
**Tool Integration Algorithm:**
|
||||
|
||||
```go
|
||||
func (mcpm *MCPManager) EnhanceRequest(
|
||||
ctx BifrostContext,
|
||||
req *BifrostChatRequest,
|
||||
) (*BifrostRequest, error) {
|
||||
// Extract tool filtering from context
|
||||
includeClients := ctx.GetStringSlice("mcp-include-clients")
|
||||
includeTools := ctx.GetStringSlice("mcp-include-tools")
|
||||
|
||||
// Get available tools
|
||||
availableTools := mcpm.getAvailableTools(includeClients)
|
||||
|
||||
// Filter tools
|
||||
filteredTools := mcpm.filterTools(availableTools, includeTools)
|
||||
|
||||
// Inject into request
|
||||
if req.Params == nil {
|
||||
req.Params = &ChatParameters{}
|
||||
}
|
||||
req.Params.Tools = append(req.Params.Tools, filteredTools...)
|
||||
|
||||
return req, nil
|
||||
}
|
||||
```
|
||||
|
||||
**MCP Performance Impact:**
|
||||
|
||||
- **Tool Discovery:** ~100-500μs (cached after first request)
|
||||
- **Tool Filtering:** ~50-200ns per tool
|
||||
- **Request Enhancement:** ~1-5μs depending on tool count
|
||||
|
||||
---
|
||||
|
||||
## Stage 5: Memory Pool Management
|
||||
|
||||
### **Object Pool Lifecycle**
|
||||
|
||||
```mermaid
|
||||
stateDiagram-v2
|
||||
[*] --> PoolInit: System Startup
|
||||
PoolInit --> Available: Objects Pre-allocated
|
||||
|
||||
Available --> Acquired: Request Processing
|
||||
Acquired --> InUse: Object Populated
|
||||
InUse --> Processing: Worker Processing
|
||||
Processing --> Completed: Processing Done
|
||||
Completed --> Reset: Object Cleanup
|
||||
Reset --> Available: Return to Pool
|
||||
|
||||
Available --> Expansion: Pool Exhaustion
|
||||
Expansion --> Available: New Objects Created
|
||||
|
||||
Reset --> GC: Pool Full
|
||||
GC --> [*]: Garbage Collection
|
||||
```
|
||||
|
||||
**Memory Pool Implementation:**
|
||||
|
||||
```go
|
||||
type MemoryPools struct {
|
||||
channelPool sync.Pool
|
||||
messagePool sync.Pool
|
||||
responsePool sync.Pool
|
||||
bufferPool sync.Pool
|
||||
}
|
||||
|
||||
func (mp *MemoryPools) GetChannel() *ProcessingChannel {
|
||||
if ch := mp.channelPool.Get(); ch != nil {
|
||||
return ch.(*ProcessingChannel)
|
||||
}
|
||||
return NewProcessingChannel()
|
||||
}
|
||||
|
||||
func (mp *MemoryPools) ReturnChannel(ch *ProcessingChannel) {
|
||||
ch.Reset() // Clear previous data
|
||||
mp.channelPool.Put(ch)
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Stage 6: Worker Pool Processing
|
||||
|
||||
### **Worker Assignment & Execution**
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant Queue
|
||||
participant WorkerPool
|
||||
participant Worker
|
||||
participant Provider
|
||||
participant Circuit
|
||||
|
||||
Queue->>WorkerPool: Enqueue Request
|
||||
WorkerPool->>Worker: Assign Available Worker
|
||||
Worker->>Circuit: Check Circuit Breaker
|
||||
Circuit->>Provider: Forward Request
|
||||
|
||||
Provider-->>Circuit: Response/Error
|
||||
Circuit->>Circuit: Update Health Metrics
|
||||
Circuit-->>Worker: Provider Response
|
||||
Worker-->>WorkerPool: Release Worker
|
||||
WorkerPool-->>Queue: Request Completed
|
||||
```
|
||||
|
||||
**Worker Pool Architecture:**
|
||||
|
||||
```go
|
||||
type ProviderWorkerPool struct {
|
||||
workers chan *Worker
|
||||
queue chan *ProcessingJob
|
||||
config WorkerPoolConfig
|
||||
metrics *PoolMetrics
|
||||
}
|
||||
|
||||
func (pwp *ProviderWorkerPool) ProcessRequest(job *ProcessingJob) {
|
||||
// Get worker from pool
|
||||
worker := <-pwp.workers
|
||||
|
||||
go func() {
|
||||
defer func() {
|
||||
// Return worker to pool
|
||||
pwp.workers <- worker
|
||||
}()
|
||||
|
||||
// Process request
|
||||
result := worker.Execute(job)
|
||||
job.ResultChan <- result
|
||||
}()
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Stage 7: Provider API Communication
|
||||
|
||||
### **HTTP Request Execution**
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant Worker
|
||||
participant HTTPClient
|
||||
participant Provider
|
||||
participant CircuitBreaker
|
||||
participant Metrics
|
||||
|
||||
Worker->>HTTPClient: PrepareRequest()
|
||||
HTTPClient->>HTTPClient: Add Headers & Auth
|
||||
HTTPClient->>CircuitBreaker: CheckHealth()
|
||||
CircuitBreaker->>Provider: HTTP Request
|
||||
|
||||
Provider-->>CircuitBreaker: HTTP Response
|
||||
CircuitBreaker->>Metrics: Record Metrics
|
||||
CircuitBreaker-->>HTTPClient: Response/Error
|
||||
HTTPClient-->>Worker: Parsed Response
|
||||
```
|
||||
|
||||
**Request Preparation Pipeline:**
|
||||
|
||||
```go
|
||||
func (w *ProviderWorker) ExecuteRequest(job *ProcessingJob) *ProviderResponse {
|
||||
// Prepare HTTP request
|
||||
httpReq := w.prepareHTTPRequest(job.Request)
|
||||
|
||||
// Add authentication
|
||||
w.addAuthentication(httpReq, job.APIKey)
|
||||
|
||||
// Execute with timeout
|
||||
ctx, cancel := context.WithTimeout(context.Background(), job.Timeout)
|
||||
defer cancel()
|
||||
|
||||
httpResp, err := w.httpClient.Do(httpReq.WithContext(ctx))
|
||||
if err != nil {
|
||||
return w.handleError(err, job)
|
||||
}
|
||||
|
||||
// Parse response
|
||||
return w.parseResponse(httpResp, job)
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Stage 8: Tool Execution & Response Processing
|
||||
|
||||
### **MCP Tool Execution Flow**
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant Provider
|
||||
participant MCPProcessor
|
||||
participant MCPServer
|
||||
participant ToolExecutor
|
||||
participant ResponseBuilder
|
||||
|
||||
Provider->>MCPProcessor: Response with Tool Calls
|
||||
MCPProcessor->>MCPProcessor: Extract Tool Calls
|
||||
|
||||
loop For each tool call
|
||||
MCPProcessor->>MCPServer: Execute Tool
|
||||
MCPServer->>ToolExecutor: Tool Invocation
|
||||
ToolExecutor-->>MCPServer: Tool Result
|
||||
MCPServer-->>MCPProcessor: Tool Response
|
||||
end
|
||||
|
||||
MCPProcessor->>ResponseBuilder: Combine Results
|
||||
ResponseBuilder-->>Provider: Enhanced Response
|
||||
```
|
||||
|
||||
**Tool Execution Pipeline:**
|
||||
|
||||
```go
|
||||
func (mcp *MCPProcessor) ProcessToolCalls(
|
||||
response *ProviderResponse,
|
||||
) (*ProviderResponse, error) {
|
||||
toolCalls := mcp.extractToolCalls(response)
|
||||
if len(toolCalls) == 0 {
|
||||
return response, nil
|
||||
}
|
||||
|
||||
// Execute tools concurrently
|
||||
results := make(chan ToolResult, len(toolCalls))
|
||||
for _, toolCall := range toolCalls {
|
||||
go func(tc ToolCall) {
|
||||
result := mcp.executeTool(tc)
|
||||
results <- result
|
||||
}(toolCall)
|
||||
}
|
||||
|
||||
// Collect results
|
||||
toolResults := make([]ToolResult, 0, len(toolCalls))
|
||||
for i := 0; i < len(toolCalls); i++ {
|
||||
toolResults = append(toolResults, <-results)
|
||||
}
|
||||
|
||||
// Enhance response
|
||||
return mcp.enhanceResponse(response, toolResults), nil
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Stage 9: Post-Processing & Response Formation
|
||||
|
||||
### **Plugin Post-Processing**
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant CoreResponse
|
||||
participant LoggingPlugin
|
||||
participant CachePlugin
|
||||
participant MetricsPlugin
|
||||
participant Transport
|
||||
|
||||
CoreResponse->>LoggingPlugin: ProcessResponse()
|
||||
LoggingPlugin->>LoggingPlugin: Log Request/Response
|
||||
LoggingPlugin->>CachePlugin: Response + Logs
|
||||
|
||||
CachePlugin->>CachePlugin: Cache Response
|
||||
CachePlugin->>MetricsPlugin: Cached Response
|
||||
|
||||
MetricsPlugin->>MetricsPlugin: Record Metrics
|
||||
MetricsPlugin->>Transport: Final Response
|
||||
```
|
||||
|
||||
**Response Enhancement Pipeline:**
|
||||
|
||||
```go
|
||||
func (pm *PluginManager) ExecutePostHooks(
|
||||
ctx BifrostContext,
|
||||
req *BifrostRequest,
|
||||
resp *BifrostResponse,
|
||||
) (*BifrostResponse, error) {
|
||||
for _, plugin := range pm.plugins {
|
||||
enhancedResp, err := plugin.ProcessResponse(ctx, req, resp)
|
||||
if err != nil {
|
||||
// Log error but continue processing
|
||||
pm.logger.Warn("Plugin post-processing error", "plugin", plugin.Name(), "error", err)
|
||||
continue
|
||||
}
|
||||
resp = enhancedResp
|
||||
}
|
||||
return resp, nil
|
||||
}
|
||||
```
|
||||
|
||||
### **Response Serialization**
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
Response[BifrostResponse] --> Format{Response Format}
|
||||
Format -->|HTTP| JSONSerialize[JSON Serialization]
|
||||
Format -->|SDK| DirectReturn[Direct Go Struct]
|
||||
|
||||
JSONSerialize --> Compress[Compression]
|
||||
DirectReturn --> TypeCheck[Type Validation]
|
||||
|
||||
Compress --> Headers[Set Headers]
|
||||
TypeCheck --> Return[Return Response]
|
||||
|
||||
Headers --> HTTPResponse[HTTP Response]
|
||||
HTTPResponse --> Client[Client Response]
|
||||
Return --> Client
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Related Architecture Documentation
|
||||
|
||||
- **[Concurrency Model](./concurrency)** - Worker pools and threading details
|
||||
- **[Plugin System](./plugins)** - Plugin execution and lifecycle
|
||||
- **[MCP System](./mcp)** - Tool discovery and execution internals
|
||||
- **[Benchmarks](../../benchmarking/getting-started)** - Detailed performance analysis
|
||||
Reference in New Issue
Block a user