first commit
81
docs/benchmarking/getting-started.mdx
Normal file
@@ -0,0 +1,81 @@
---
title: "Getting Started"
description: "Introduction to Bifrost's performance capabilities and how to choose the right instance size for your workload."
icon: "rocket"
---

## Overview

Bifrost has been load-tested to validate it for production deployments. The benchmarks below were run at **5,000 requests per second (RPS)** on two AWS EC2 instance types, t3.medium and t3.xlarge.

**Key Performance Highlights:**

- **Perfect Success Rate**: 100% request success rate at 5,000 RPS
- **Minimal Overhead**: 11 µs of added latency per request on t3.xlarge (59 µs on t3.medium)
- **Efficient Queue Management**: Queue wait times under 2 µs on t3.xlarge (1.67 µs)
- **Fast Key Selection**: Near-instantaneous weighted API key selection (~10 ns)

---
## Test Environment Summary

Bifrost was benchmarked on two primary AWS EC2 instance configurations:

### **t3.medium (2 vCPUs, 4GB RAM)**
- **Buffer Size**: 15,000
- **Initial Pool Size**: 10,000
- **Use Case**: Cost-effective option for moderate workloads

### **t3.xlarge (4 vCPUs, 16GB RAM)**
- **Buffer Size**: 20,000
- **Initial Pool Size**: 15,000
- **Use Case**: High-performance option for demanding workloads

---
## Performance Comparison at a Glance

| Metric | t3.medium | t3.xlarge | Improvement |
|--------|-----------|-----------|-------------|
| **Success Rate @ 5k RPS** | 100% | 100% | No failed requests |
| **Bifrost Overhead** | 59 µs | 11 µs | **-81%** |
| **Average Latency** | 2.12s | 1.61s | **-24%** |
| **Queue Wait Time** | 47.13 µs | 1.67 µs | **-96%** |
| **JSON Marshaling** | 63.47 µs | 26.80 µs | **-58%** |
| **Response Parsing** | 11.30 ms | 2.11 ms | **-81%** |
| **Peak Memory Usage** | 1,312.79 MB | 3,340.44 MB | +154% |
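
The percentages in the last column are plain relative changes from the t3.medium value to the t3.xlarge value. A minimal sketch for reproducing them, assuming `awk` is available; `pct_change` is a hypothetical helper name and not part of Bifrost or the benchmarking tool:

```bash
#!/usr/bin/env bash
# Percent change from a baseline value to a new value, rounded to a whole percent.
# Both arguments must be in the same unit.
pct_change() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%+.0f%%\n", (new - old) / old * 100 }'
}

pct_change 59 11        # Bifrost overhead in µs, prints -81%
pct_change 47.13 1.67   # Queue wait time in µs, prints -96%
pct_change 2.12 1.61    # Average latency in s, prints -24%
```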

> **Note**: t3.xlarge tests used significantly larger response payloads (~10 KB vs ~1 KB), yet still achieved better performance metrics.

<Note>
All benchmarks run against mocked OpenAI calls; the mock latency and payload size are listed on the respective analysis pages.
</Note>

---
## Configuration Flexibility

One of Bifrost's key strengths is its **configuration flexibility**. You can fine-tune the speed ↔ memory trade-off based on your specific requirements:

| Configuration Parameter | Effect |
|------------------------|--------|
| `initial_pool_size` | Higher values = faster performance, more memory usage |
| `buffer_size` & `concurrency` | Controls queue depth and max parallel workers (per provider) |
| `retry` & `timeout` | Tune aggressiveness for each provider to meet your SLOs |

**Configuration Philosophy:**
- **Higher settings** (like t3.xlarge profile) prioritize raw speed
- **Lower settings** (like t3.medium profile) optimize for memory efficiency
- **Custom tuning** lets you find the sweet spot for your specific workload
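
As a rough illustration of the two profiles, the sketch below prints a client config for each benchmarked instance type. The pool and buffer values are the ones listed in the Test Environment Summary, and the JSON shape follows the `client` block shown on the instance analysis pages. `bifrost_profile` is a hypothetical helper name, not a Bifrost command.

```bash
#!/usr/bin/env bash
# Print a client config for one of the two benchmarked profiles.
# Values come from the Test Environment Summary; the helper itself is illustrative.
bifrost_profile() {
  local pool buffer
  case "$1" in
    t3.medium) pool=10000; buffer=15000 ;;   # lower memory footprint
    t3.xlarge) pool=15000; buffer=20000 ;;   # higher raw speed
    *) echo "unknown profile: $1" >&2; return 1 ;;
  esac
  printf '{\n  "client": {\n    "initial_pool_size": %d,\n    "buffer_size": %d\n  }\n}\n' "$pool" "$buffer"
}

bifrost_profile t3.medium
```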

---

## Next Steps

### **Detailed Performance Analysis**
- **[t3.medium Performance](./t3.medium)** - Deep dive into cost-effective performance
- **[t3.xlarge Performance](./t3.xl)** - High-performance configuration analysis

### **Run Your Own Tests**
- **[Run Your Own Benchmarks](./run-your-own-benchmarks)** - Step-by-step guide to benchmark Bifrost in your environment

Ready to dive deeper? Choose your instance type above or learn how to run your own performance tests.
355
docs/benchmarking/run-your-own-benchmarks.mdx
Normal file
@@ -0,0 +1,355 @@
---
title: "Run Your Own Benchmarks"
description: "Step-by-step guide to benchmark Bifrost in your own environment using the official benchmarking tool."
icon: "stopwatch"
---

## Overview

Want to see Bifrost's performance in your specific environment? The [**Bifrost Benchmarking Repository**](https://github.com/maximhq/bifrost-benchmarking) provides everything you need to conduct comprehensive performance tests tailored to your infrastructure and workload requirements.

**What You Can Test:**
- **Custom Instance Sizes** - Test on your preferred AWS/GCP/Azure instances
- **Your Workload Patterns** - Use your actual request/response sizes
- **Different Configurations** - Compare various Bifrost settings
- **Provider Comparisons** - Benchmark against other AI gateways
- **Load Scenarios** - Test burst loads, sustained traffic, and endurance

> **💡 Open Source**: The benchmarking tool is completely open source! Feel free to submit pull requests if you think anything is missing or could be improved.

---
## Prerequisites

Before running benchmarks, ensure you have:

- **Go 1.26.1+** installed on your testing machine
- **Bifrost instance** running and accessible
- **Target API providers** configured (OpenAI, Anthropic, etc.)
- **Network access** between benchmark tool and Bifrost
- **Sufficient resources** on the testing machine to generate load

---
## Quick Start

### **1. Clone the Repository**

```bash
git clone https://github.com/maximhq/bifrost-benchmarking.git
cd bifrost-benchmarking
```

### **2. Build the Benchmark Tool**

```bash
go build benchmark.go
```

This creates a `benchmark` executable (or `benchmark.exe` on Windows).

### **3. Run Your First Benchmark**

```bash
# Basic benchmark with the defaults: 500 RPS for 10 seconds
./benchmark -provider bifrost -port 8080

# Custom benchmark: 1000 RPS for 30 seconds
./benchmark -provider bifrost -port 8080 -rate 1000 -duration 30 -output my_results.json
```

---
## Configuration Options

The benchmark tool offers extensive configuration through command-line flags:

### **Basic Configuration**

| Flag | Required | Description | Default |
|------|----------|-------------|---------|
| `-provider <name>` | ✅ | Provider name (e.g., `bifrost`, `litellm`) | None |
| `-port <number>` | ✅ | Port number of your Bifrost instance | None |
| `-endpoint <path>` | ❌ | API endpoint path | `v1/chat/completions` |
| `-rate <number>` | ❌ | Requests per second | `500` |
| `-duration <seconds>` | ❌ | Test duration in seconds | `10` |
| `-output <filename>` | ❌ | Results output file | `results.json` |

### **Advanced Configuration**

| Flag | Description | Default |
|------|-------------|---------|
| `-include-provider-in-request` | Include provider name in request payload | `false` |
| `-big-payload` | Use larger, more complex request payloads | `false` |

---
## Benchmark Scenarios

### **1. Basic Performance Test**

Test standard performance with typical request sizes:

```bash
./benchmark -provider bifrost -port 8080 -rate 1000 -duration 60 -output basic_test.json
```

**Use Case**: General performance validation

### **2. High-Load Stress Test**

Push your instance to its limits:

```bash
./benchmark -provider bifrost -port 8080 -rate 5000 -duration 120 -output stress_test.json
```

**Use Case**: Capacity planning and SLA validation

### **3. Large Payload Test**

Test with bigger request/response sizes:

```bash
./benchmark -provider bifrost -port 8080 -rate 500 -duration 60 -big-payload=true -output large_payload.json
```

**Use Case**: Document processing, code generation workloads

### **4. Endurance Test**

Long-running stability test:

```bash
./benchmark -provider bifrost -port 8080 -rate 1000 -duration 1800 -output endurance_test.json
```

**Use Case**: Production readiness validation (30-minute test)

### **5. Comparative Benchmarking**

Compare Bifrost against other providers:

```bash
# Test Bifrost
./benchmark -provider bifrost -port 8080 -rate 1000 -duration 60 -output bifrost_results.json

# Test LiteLLM
./benchmark -provider litellm -port 8000 -rate 1000 -duration 60 -output litellm_results.json

# Test direct OpenAI (if available)
./benchmark -provider openai -port 443 -endpoint chat/completions -rate 1000 -duration 60 -output openai_results.json
```
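
For a fair comparison, every gateway should receive the same rate and duration. One way to keep the runs in sync is a small loop like the sketch below. `run_comparison` and the `BENCH_BIN` variable are hypothetical names introduced here; the flags and ports are the ones used in the commands above. Setting `BENCH_BIN=echo` gives a dry run that only prints the arguments.

```bash
#!/usr/bin/env bash
# Run the same load against several gateways, one after another.
# Usage: run_comparison [rate] [duration_seconds]
# BENCH_BIN defaults to ./benchmark; BENCH_BIN=echo prints the arguments instead.
run_comparison() {
  local rate="${1:-1000}" duration="${2:-60}" target provider port
  for target in bifrost:8080 litellm:8000; do
    provider="${target%%:*}"   # text before the colon
    port="${target##*:}"       # text after the colon
    "${BENCH_BIN:-./benchmark}" -provider "$provider" -port "$port" \
      -rate "$rate" -duration "$duration" -output "${provider}_results.json"
  done
}

# Example (dry run): BENCH_BIN=echo run_comparison 1000 60
```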

---

## Understanding Results

The benchmark tool generates detailed JSON results with comprehensive metrics:

### **Key Metrics Explained**

```json
{
  "bifrost": {
    "request_counts": {
      "total_sent": 30000,
      "successful": 30000,
      "failed": 0
    },
    "success_rate": 100.0,
    "latency_metrics": {
      "mean_ms": 245.5,
      "p50_ms": 230.2,
      "p99_ms": 520.8,
      "max_ms": 845.3
    },
    "throughput_rps": 5000.0,
    "memory_usage": {
      "before_mb": 512.5,
      "after_mb": 1312.8,
      "peak_mb": 1405.2,
      "average_mb": 1156.7
    },
    "timestamp": "2025-01-14T10:30:00Z",
    "status_codes": {
      "200": 30000
    }
  }
}
```

### **Critical Performance Indicators**

**Success Rate:**
- **Target**: >99.9% for production readiness
- **Excellent**: 100% (perfect reliability)

**Latency Metrics:**
- **P50 (Median)**: Typical user experience
- **P99**: Tail latency; 99% of requests complete at or below this value
- **Mean**: Overall average performance
- **Max**: Slowest single request observed

**Memory Usage:**
- **Peak**: Maximum memory consumption
- **Average**: Sustained memory usage
- **After - Before**: Memory growth during test
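
The success-rate target can be checked automatically after each run. The sketch below assumes the results file has the shape of the sample JSON above, with a single `success_rate` key. It uses only `grep`, `sed`, and `awk`; a JSON-aware tool would be more robust. `check_success_rate` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Exit 0 when the success_rate in a results file meets the target (default 99.9).
# Assumes the key appears once per file, as in the sample results.
check_success_rate() {
  local file="$1" target="${2:-99.9}" rate
  rate="$(grep -o '"success_rate": *[0-9.]*' "$file" | head -n 1 | sed 's/.*: *//')"
  if [ -z "$rate" ]; then
    echo "success_rate not found in $file" >&2
    return 2
  fi
  # awk does the floating-point comparison that plain shell arithmetic cannot.
  awk -v r="$rate" -v t="$target" 'BEGIN { exit !(r >= t) }'
}

# Example: check_success_rate stress_test.json || echo "below target"
```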

---

## Instance Sizing Recommendations

Based on your benchmark results, use these guidelines for production sizing:

### **Resource Planning Matrix**

| Target RPS | Memory Usage | Recommended Instance | Notes |
|------------|--------------|---------------------|--------|
| **< 1,000** | < 1GB | t3.small | Cost-effective for light loads |
| **1,000 - 3,000** | 1-2GB | t3.medium | Balanced performance/cost |
| **3,000 - 5,000** | 2-4GB | t3.large | High-performance production |
| **5,000+** | 3-6GB | t3.xlarge+ | Enterprise/mission-critical |
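
The matrix can be turned into a quick lookup for planning scripts. In the sketch below, `recommend_instance` is a hypothetical helper, and because the RPS ranges in the table share their boundary values, it assumes a boundary value (3,000 or 5,000) belongs to the larger tier. Treat the output as a starting point to confirm with your own benchmark results.

```bash
#!/usr/bin/env bash
# Map a target RPS (whole number) to the instance suggested by the planning matrix.
# Assumption: boundary values (1000, 3000, 5000) fall into the larger tier.
recommend_instance() {
  local rps="$1"
  if   [ "$rps" -lt 1000 ]; then echo "t3.small"
  elif [ "$rps" -lt 3000 ]; then echo "t3.medium"
  elif [ "$rps" -lt 5000 ]; then echo "t3.large"
  else                           echo "t3.xlarge+"
  fi
}

recommend_instance 2500   # prints t3.medium
```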
### **Configuration Tuning Based on Results**

**If seeing high latency:**
- Increase `initial_pool_size`
- Increase `buffer_size`
- Consider larger instance

**If memory usage is high:**
- Decrease `initial_pool_size`
- Optimize `buffer_size`
- Monitor for memory leaks

**If success rate < 100%:**
- Reduce request rate
- Increase timeout settings
- Check provider limits

---
## Advanced Testing Scenarios

### **Burst Load Testing**

Simulate traffic spikes:

```bash
# Normal load
./benchmark -provider bifrost -port 8080 -rate 1000 -duration 300 -output normal_load.json

# Burst load (simulate 5x spike)
./benchmark -provider bifrost -port 8080 -rate 5000 -duration 60 -output burst_load.json
```

### **Multi-Instance Testing**

Test horizontal scaling:

```bash
# Instance 1
./benchmark -provider bifrost-1 -port 8080 -rate 2500 -duration 120 -output instance_1.json &

# Instance 2
./benchmark -provider bifrost-2 -port 8081 -rate 2500 -duration 120 -output instance_2.json &

# Wait for both to complete
wait
```
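
A bare `wait` returns once all background jobs finish but does not report whether any of them failed. If the script should fail when one run fails, a sketch along these lines waits on each process ID separately. `run_parallel` is a hypothetical helper, and it assumes the benchmark tool exits with a non-zero status on failure, which is not confirmed here.

```bash
#!/usr/bin/env bash
# Run each argument as its own command line in parallel.
# Returns 1 if any of the commands exits with a non-zero status.
run_parallel() {
  local pids=() cmd pid failed=0
  for cmd in "$@"; do
    bash -c "$cmd" &
    pids+=("$!")              # remember the PID of the job just started
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || failed=1   # wait <pid> returns that job's exit status
  done
  return "$failed"
}

# Example:
# run_parallel \
#   "./benchmark -provider bifrost-1 -port 8080 -rate 2500 -duration 120 -output instance_1.json" \
#   "./benchmark -provider bifrost-2 -port 8081 -rate 2500 -duration 120 -output instance_2.json"
```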
### **Different Payload Sizes**

Compare performance across payload sizes:

```bash
# Small payloads (default)
./benchmark -provider bifrost -port 8080 -rate 1000 -duration 60 -output small_payload.json

# Large payloads
./benchmark -provider bifrost -port 8080 -rate 1000 -duration 60 -big-payload=true -output large_payload.json
```

---
## Continuous Benchmarking

### **Automated Testing Pipeline**

Set up regular performance regression testing:

```bash
#!/bin/bash
# daily_benchmark.sh

# Timestamped output directory, e.g. benchmarks/20250114_103000
DATE=$(date +%Y%m%d_%H%M%S)
OUTPUT_DIR="benchmarks/$DATE"
mkdir -p "$OUTPUT_DIR"

# Run standard benchmarks
./benchmark -provider bifrost -port 8080 -rate 1000 -duration 300 -output "$OUTPUT_DIR/standard.json"
./benchmark -provider bifrost -port 8080 -rate 3000 -duration 180 -output "$OUTPUT_DIR/high_load.json"
./benchmark -provider bifrost -port 8080 -rate 500 -duration 600 -big-payload=true -output "$OUTPUT_DIR/large_payload.json"

echo "Benchmarks completed: $OUTPUT_DIR"
```
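
Regression testing needs a comparison step as well as the runs themselves. The sketch below flags a run whose P99 latency exceeds a baseline by more than a chosen percentage. It assumes both files follow the sample results format with a single `p99_ms` key; `metric` and `p99_regressed` are hypothetical helper names, and the 10% default threshold is an arbitrary example.

```bash
#!/usr/bin/env bash
# Read one numeric field (for example p99_ms) from a results file.
# Assumes the key appears once per file.
metric() {
  grep -o "\"$2\": *[0-9.]*" "$1" | head -n 1 | sed 's/.*: *//'
}

# Exit 0 when the current P99 exceeds the baseline P99 by more than max_pct percent.
# Usage: p99_regressed baseline.json current.json [max_pct]
p99_regressed() {
  local base cur max_pct="${3:-10}"
  base="$(metric "$1" p99_ms)"
  cur="$(metric "$2" p99_ms)"
  awk -v b="$base" -v c="$cur" -v m="$max_pct" 'BEGIN { exit !(c > b * (1 + m / 100)) }'
}

# Example: p99_regressed benchmarks/baseline/standard.json "$OUTPUT_DIR/standard.json" && echo "P99 regression"
```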
### **Performance Monitoring Integration**

Monitor key metrics over time:
- **Success rate trends**
- **Latency percentile changes**
- **Memory usage patterns**
- **Throughput capacity**

---
## Troubleshooting

### **Common Issues**

**Connection Refused:**
```bash
# Check if Bifrost is running
curl http://localhost:8080/health

# Verify port configuration
netstat -an | grep 8080
```
- Check that `PORT` is defined in the `.env` file at the project root.
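
When benchmarks start right after Bifrost is launched, a short polling loop avoids racing the server. This is a minimal sketch that reuses the `/health` URL from the check above and assumes `curl` is installed; `wait_for_health` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Poll a health URL until it returns a successful response or the attempts run out.
# Usage: wait_for_health [url] [max_attempts] [delay_seconds]
wait_for_health() {
  local url="${1:-http://localhost:8080/health}" attempts="${2:-10}" delay="${3:-1}" i
  for ((i = 1; i <= attempts; i++)); do
    # --fail makes curl exit non-zero on HTTP errors; --max-time bounds each try.
    if curl --silent --fail --max-time 2 --output /dev/null "$url"; then
      return 0
    fi
    sleep "$delay"
  done
  echo "no healthy response from $url after $attempts attempts" >&2
  return 1
}

# Example: wait_for_health && ./benchmark -provider bifrost -port 8080
```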

**High Error Rates:**
- Check provider API key limits
- Verify Bifrost configuration
- Monitor upstream provider status
- Reduce request rate for baseline test

**Memory Issues:**
- Monitor system resources during testing
- Check for memory leaks in long tests
- Adjust Bifrost pool sizes

**Inconsistent Results:**
- Run multiple test iterations
- Account for network variability
- Use longer test durations (60+ seconds)
- Isolate testing environment
- Route gateway requests to a mock provider to remove upstream variability

---
## Next Steps

### **After Running Benchmarks**

1. **Analyze Results**: Compare against [official benchmarks](./getting-started)
2. **Optimize Configuration**: Tune based on your specific results
3. **Plan Capacity**: Size instances based on measured performance
4. **Set Up Monitoring**: Track key metrics in production

### **Compare Results**

- **[t3.medium Performance](./t3.medium)** - Compare against medium instance results
- **[t3.xlarge Performance](./t3.xl)** - Compare against high-performance configuration

**Ready to benchmark? Clone the [repository](https://github.com/maximhq/bifrost-benchmarking) and start testing!**
127
docs/benchmarking/t3.medium.mdx
Normal file
@@ -0,0 +1,127 @@
---
title: "t3.medium"
description: "Detailed performance metrics and analysis for Bifrost running on AWS t3.medium instances (2 vCPUs, 4GB RAM)."
icon: "server"
---

## Instance Configuration

**AWS t3.medium Specifications:**
- **vCPUs**: 2
- **Memory**: 4GB RAM
- **Network Performance**: Up to 5 Gigabit

**Bifrost Configuration:**
- **Buffer Size**: 15,000
- **Initial Pool Size**: 10,000
- **Test Load**: 5,000 requests per second (RPS)

---
## Performance Results

### **Overall Performance Metrics**

| Metric | Value | Notes |
|--------|-------|--------|
| **Success Rate** | 100.00% | Perfect reliability under high load |
| **Average Request Size** | 0.13 KB | Lightweight request payload |
| **Average Response Size** | 1.37 KB | Standard response size for testing |
| **Average Latency** | 2.12s | Total end-to-end response time |
| **Peak Memory Usage** | 1,312.79 MB | ~33% of available 4GB RAM |

### **Detailed Performance Breakdown**

| Operation | Latency | Performance Notes |
|-----------|---------|-------------------|
| **Queue Wait Time** | 47.13 µs | Time waiting in Bifrost's internal queue |
| **Key Selection Time** | 16 ns | Weighted API key selection |
| **Message Formatting** | 2.19 µs | Request message preparation |
| **Params Preparation** | 436 ns | Parameter processing |
| **Request Body Preparation** | 2.65 µs | HTTP request body assembly |
| **JSON Marshaling** | 63.47 µs | JSON serialization time |
| **Request Setup** | 6.59 µs | HTTP client configuration |
| **HTTP Request** | 1.56s | Actual provider API call time |
| **Error Handling** | 189 ns | Error processing overhead |
| **Response Parsing** | 11.30 ms | JSON response deserialization |

**Bifrost's Total Overhead: 59 µs**\*

*\*Excludes JSON marshalling and HTTP calls, which are required in any implementation*
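
The 59 µs figure can be reproduced from the breakdown table by adding up the remaining stages. The sketch below assumes response parsing is left out of the total along with JSON marshaling and the HTTP request, since that is the only combination of rows that sums to roughly 59 µs; the footnote itself names only the latter two. `sum_us` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sum per-request stage timings given in microseconds.
sum_us() {
  printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.1f\n", total }'
}

# Queue wait, key selection (16 ns), message formatting, params preparation (436 ns),
# request body preparation, request setup, error handling (189 ns).
sum_us 47.13 0.016 2.19 0.436 2.65 6.59 0.189   # prints 59.2
```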

---

## Performance Analysis

### **Strengths on t3.medium**

1. **Perfect Reliability**: 100% success rate even at 5,000 RPS
2. **Memory Efficiency**: Uses only 33% of available RAM (1,312.79 MB / 4GB)
3. **Minimal Overhead**: Just 59 µs of added latency per request
4. **Fast Operations**: Most internal operations complete in single-digit microseconds or less

### **Resource Utilization**

- **Memory Usage**: Very efficient at 1,312.79 MB peak usage
- **CPU Performance**: Handles 5,000 RPS workload effectively
- **Queue Management**: 47.13 µs average wait time indicates good throughput

---
## Configuration Recommendations

### **Optimal Settings for t3.medium**

Based on test results, these configurations work well:

```json
{
  "client": {
    "initial_pool_size": 10000,
    "buffer_size": 15000
  }
}
```

### **Tuning Opportunities**

**For Lower Memory Usage:**
- Reduce `initial_pool_size` to 7,500-8,000
- Decrease `buffer_size` to 12,000-13,000
- Trade-off: Slightly higher latency

**For Better Performance:**
- Increase `initial_pool_size` to 12,000-13,000
- Increase `buffer_size` to 17,000-18,000
- Trade-off: Higher memory usage (monitor RAM limits)

---
## Comparison Context

### **vs. t3.xlarge Performance**

| Metric | t3.medium | t3.xlarge | Difference (t3.medium vs. t3.xlarge) |
|--------|-----------|-----------|------------|
| **Bifrost Overhead** | 59 µs | 11 µs | 5.4x higher |
| **Queue Wait Time** | 47.13 µs | 1.67 µs | 28x higher |
| **JSON Marshaling** | 63.47 µs | 26.80 µs | 2.4x higher |
| **Response Parsing** | 11.30 ms | 2.11 ms | 5.4x higher |
| **Memory Usage** | 1,312.79 MB | 3,340.44 MB | 61% lower |

**Key Insights:**
- t3.medium uses **61% less memory** than t3.xlarge
- Performance trade-offs are reasonable for cost savings
- Most operations still complete in microseconds

---
## Next Steps

**When to upgrade to t3.xlarge:**
- Sustained load approaches 4,000+ RPS
- Queue wait times consistently exceed 75 µs
- Memory usage approaches 75% of available RAM
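
These three thresholds are easy to encode as an alert condition. The sketch below treats them as an any-of check, which is an interpretation rather than something the list states; `should_upgrade` is a hypothetical helper name, and the inputs are sustained RPS, average queue wait in µs, and memory use as a percentage of RAM.

```bash
#!/usr/bin/env bash
# Exit 0 when any upgrade threshold from the list above is reached.
# Usage: should_upgrade <sustained_rps> <queue_wait_us> <memory_pct>
should_upgrade() {
  awk -v rps="$1" -v q="$2" -v mem="$3" \
    'BEGIN { exit !(rps >= 4000 || q > 75 || mem >= 75) }'
}

if should_upgrade 2000 47.13 33; then
  echo "consider t3.xlarge"
else
  echo "t3.medium is sufficient"   # this branch runs for the values above
fi
```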

- **[Run Your Own Benchmarks](./run-your-own-benchmarks)** to test with your specific workload
- **[Compare with t3.xlarge](./t3.xl)** for performance scaling analysis
151
docs/benchmarking/t3.xl.mdx
Normal file
@@ -0,0 +1,151 @@
---
title: "t3.xlarge"
description: "Detailed performance metrics and analysis for Bifrost running on AWS t3.xlarge instances (4 vCPUs, 16GB RAM)."
icon: "server"
---

## Instance Configuration

**AWS t3.xlarge Specifications:**
- **vCPUs**: 4
- **Memory**: 16GB RAM
- **Network Performance**: Up to 5 Gigabit

**Bifrost Configuration:**
- **Buffer Size**: 20,000
- **Initial Pool Size**: 15,000
- **Test Load**: 5,000 requests per second (RPS)

---
## Performance Results

### **Overall Performance Metrics**

| Metric | Value | Notes |
|--------|-------|--------|
| **Success Rate** | 100.00% | Perfect reliability under high load |
| **Average Request Size** | 0.13 KB | Lightweight request payload |
| **Average Response Size** | 10.32 KB | **Large response payload testing** |
| **Average Latency** | 1.61s | Total end-to-end response time |
| **Peak Memory Usage** | 3,340.44 MB | ~21% of available 16GB RAM |

> **Note**: t3.xlarge tests used significantly larger response payloads (~10 KB vs ~1 KB on t3.medium) to stress-test performance with realistic production data sizes.

### **Detailed Performance Breakdown**

| Operation | Latency | Performance Notes |
|-----------|---------|-------------------|
| **Queue Wait Time** | 1.67 µs | **96% faster** than t3.medium |
| **Key Selection Time** | 10 ns | **37% faster** weighted API key selection |
| **Message Formatting** | 2.11 µs | Consistent with t3.medium performance |
| **Params Preparation** | 417 ns | Slight improvement over t3.medium |
| **Request Body Preparation** | 2.36 µs | **11% faster** request assembly |
| **JSON Marshaling** | 26.80 µs | **58% faster** serialization |
| **Request Setup** | 7.17 µs | Slightly slower than t3.medium (6.59 µs) |
| **HTTP Request** | 1.50s | **4% faster** provider API calls |
| **Error Handling** | 162 ns | **14% faster** error processing |
| **Response Parsing** | 2.11 ms | **81% faster** despite 7.5x larger payloads |

**Bifrost's Total Overhead: 11 µs**\*

*\*Excludes JSON marshalling and HTTP calls, which are required in any implementation. 81% reduction compared to t3.medium (59 µs → 11 µs)*

---
## Performance Analysis

### **Exceptional Performance Improvements**

1. **Dramatic Overhead Reduction**: 81% lower Bifrost overhead (59 µs → 11 µs)
2. **Superior Queue Management**: 96% faster queue wait times (47.13 µs → 1.67 µs)
3. **Faster JSON Processing**: 58% faster request marshaling (63.47 µs → 26.80 µs)
4. **Efficient Response Parsing**: 81% faster parsing even with 7.5x larger responses
5. **Perfect Reliability**: 100% success rate maintained under high load

### **Resource Utilization**

- **Memory Efficiency**: Uses only 21% of available RAM (3,340.44 MB / 16GB)
- **CPU Performance**: Excellent multi-core utilization for 5,000 RPS
- **Headroom**: Substantial capacity for traffic spikes and growth
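
The utilization percentage follows directly from peak memory and instance RAM. The sketch below reproduces the 21% figure under the assumption that 1 GB is counted as 1,000 MB, which is what the stated percentages imply; `mem_pct` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Memory utilization as a whole percentage.
# Usage: mem_pct <used_mb> <total_gb>   (assumes 1 GB = 1000 MB)
mem_pct() {
  awk -v used="$1" -v total_gb="$2" \
    'BEGIN { printf "%.0f%%\n", used / (total_gb * 1000) * 100 }'
}

mem_pct 3340.44 16   # t3.xlarge, prints 21%
mem_pct 1312.79 4    # t3.medium, prints 33%
```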

---

## Scalability and Headroom

### **Exceptional Scaling Characteristics**

The t3.xlarge configuration demonstrates **excellent scaling potential**:

**Current Utilization:**
- **Memory**: 21% used (about 12.7 GB of headroom)
- **Queue Performance**: 1.67 µs wait time (near-optimal)
- **Processing Speed**: Single-digit microseconds or less for most internal operations

**Scaling Potential:**
- **Traffic Spikes**: Headroom suggests capacity for bursts well above the tested 5,000 RPS; 15,000+ RPS is an estimate, not a measured result
- **Response Size Growth**: Efficiently handles 10 KB responses
- **Concurrent Users**: Supports thousands of simultaneous users

---
## Advanced Configuration

### **Optimal Settings for t3.xlarge**

Based on test results, these configurations provide excellent performance:

```json
{
  "client": {
    "initial_pool_size": 15000,
    "buffer_size": 20000
  }
}
```

### **Performance Tuning Opportunities**

**For Maximum Performance:**
- Increase `initial_pool_size` to 18,000-20,000
- Increase `buffer_size` to 25,000-30,000
- Trade-off: Higher memory usage (still well within limits)

**For Memory Optimization:**
- Current config already very efficient at 21% RAM usage
- Could reduce settings if needed, but performance gains would be lost

**For Extreme Workloads:**
- Consider `initial_pool_size` up to 25,000
- Increase `buffer_size` to 35,000+
- Monitor memory usage approaching 50% of available RAM

---
## Performance Comparison

### **vs. t3.medium Performance**

| Metric | t3.medium | t3.xlarge | Improvement |
|--------|-----------|-----------|-------------|
| **Bifrost Overhead** | 59 µs | 11 µs | **-81%** |
| **Average Latency** | 2.12s | 1.61s | **-24%** |
| **Queue Wait Time** | 47.13 µs | 1.67 µs | **-96%** |
| **JSON Marshaling** | 63.47 µs | 26.80 µs | **-58%** |
| **Response Parsing** | 11.30 ms | 2.11 ms | **-81%** |
| **Response Size Handled** | 1.37 KB | 10.32 KB | **+7.5x** |
| **Peak Memory Usage** | 1,312.79 MB | 3,340.44 MB | +154% |
| **Memory Utilization** | 33% | 21% | **-36%** |

**Key Insights:**
- **81% overhead reduction** while handling 7.5x larger responses
- **Exceptional efficiency** with only 21% memory utilization
- **Dramatic queue performance** improvements
- **Substantial headroom** for growth and traffic spikes

---
## Next Steps

- **[Run Your Own Benchmarks](./run-your-own-benchmarks)** with your specific payload sizes
- **[Compare with t3.medium](./t3.medium)** for cost-optimization analysis