---
title: "Docker Performance Tuning"
description: "Optimize Bifrost container performance with Go runtime tuning, resource limits, and system configuration"
icon: "docker"
---

This guide covers performance tuning for Bifrost when running in Docker containers. Proper tuning ensures Bifrost can fully utilize container resources and achieve optimal throughput.

<Note>
These optimizations apply to Docker, Docker Compose, Kubernetes, and any container runtime using cgroups for resource management.
</Note>

## Quick Start

For most production deployments, add these settings to your container:

```yaml
services:
  bifrost:
    image: maximhq/bifrost:latest
    environment:
      - GOGC=200
      - GOMEMLIMIT=3600MiB  # 90% of 4GB memory limit
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4G
```

---

## Go Runtime Tuning

### GOMAXPROCS (Automatic)

Bifrost automatically detects container CPU limits using [automaxprocs](https://github.com/uber-go/automaxprocs). This sets `GOMAXPROCS` to match your container's CPU quota from cgroups (v1 and v2).

**No configuration needed.** This works automatically, and you'll see a log line like this at startup:

```
maxprocs: Updating GOMAXPROCS=4: determined from CPU quota
```

<Warning>
Without automaxprocs, Go would detect all host CPUs (e.g., 64 on an EC2 instance) even when the container is limited to 4 CPUs, causing excessive context switching and degraded performance.
</Warning>
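
To get a feel for the value to expect in that log line, the quota arithmetic can be sketched in Python. This is an illustrative approximation, not automaxprocs' actual code: the function name `gomaxprocs_from_cpu_max` is hypothetical, it only handles the cgroup v2 `cpu.max` format (`<quota> <period>` or `max <period>`), and it assumes fractional quotas are rounded down with a minimum of 1.

```python
import math
import os


def gomaxprocs_from_cpu_max(cpu_max: str, host_cpus: int) -> int:
    """Approximate GOMAXPROCS from the contents of a cgroup v2 cpu.max file."""
    quota, period = cpu_max.split()
    if quota == "max":
        # No CPU limit configured: fall back to the host CPU count.
        return host_cpus
    # Quota and period are in microseconds; their ratio is the CPU allowance.
    cpus = int(quota) / int(period)
    # Assumption: round fractional quotas down, but never below 1.
    return max(1, math.floor(cpus))


if __name__ == "__main__":
    # Typical cgroup v2 location inside a container (may differ on your system).
    path = "/sys/fs/cgroup/cpu.max"
    if os.path.exists(path):
        with open(path) as f:
            print(gomaxprocs_from_cpu_max(f.read(), os.cpu_count() or 1))
```

A container started with `cpus: '4'` typically has `400000 100000` in `cpu.max`, which this sketch maps to 4.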

### GOGC (Garbage Collection)

`GOGC` controls garbage collection frequency. The default is `100`, meaning a collection is triggered when the heap has grown 100% since the previous collection.

| Scenario | Recommended GOGC | Trade-off |
|----------|------------------|-----------|
| Memory constrained | 50-100 | More frequent GC, lower memory |
| High throughput, memory available | 200-400 | Less GC overhead, higher memory |
| Latency sensitive | 50-100 | More predictable latency |

```yaml
environment:
  - GOGC=200
```

<Tip>
For high-throughput API gateways, `GOGC=200` or `GOGC=400` typically provides the best balance of throughput and memory usage.
</Tip>
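
As a rough illustration of the memory side of this trade-off, the sketch below estimates the heap size at which the next collection starts. It uses the simplified relation target = live heap × (1 + GOGC/100) and ignores GC roots and `GOMEMLIMIT`; the function name and the 500 MiB live heap are made-up example values.

```python
def heap_target_mib(live_heap_mib: float, gogc: int) -> float:
    """Estimate the heap size (MiB) at which the next GC cycle is triggered."""
    # Simplified model: the heap may grow by GOGC percent over the live heap.
    return live_heap_mib * (1 + gogc / 100)


if __name__ == "__main__":
    for gogc in (50, 100, 200, 400):
        print(f"GOGC={gogc}: ~{heap_target_mib(500, gogc):.0f} MiB")
```

Under this model, a 500 MiB live heap triggers a collection at about 1000 MiB with `GOGC=100` and at about 1500 MiB with `GOGC=200`, which is why higher values need more memory headroom.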

### GOMEMLIMIT (Memory Limit)

`GOMEMLIMIT` sets a soft memory limit for the Go runtime. As memory usage approaches this limit, the garbage collector runs more aggressively.

**Best practice:** Set it to ~90% of your container's memory limit to leave headroom for memory the Go runtime does not manage (cgo allocations, memory-mapped files, and similar).

| Container Memory | Recommended GOMEMLIMIT |
|------------------|------------------------|
| 512 MB | 450MiB |
| 1 GB | 900MiB |
| 2 GB | 1800MiB |
| 4 GB | 3600MiB |
| 8 GB | 7200MiB |

```yaml
environment:
  - GOMEMLIMIT=3600MiB
```

<Note>
When using both `GOGC` and `GOMEMLIMIT`, a collection starts on whichever trigger fires first. For high-throughput workloads, set `GOGC=200` or higher and let `GOMEMLIMIT` be the primary constraint.
</Note>

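
The 90% rule can also be computed rather than looked up. The sketch below is a hypothetical helper (the name `gomemlimit_for` and the `fraction_percent` parameter are not part of Bifrost or Go) that converts a container memory limit in bytes into a `GOMEMLIMIT` value in MiB. It assumes a 4G limit means 4 GiB, so it yields `3686MiB` where the table rounds down to `3600MiB`; either value leaves comparable headroom.

```python
MIB = 1024 * 1024


def gomemlimit_for(container_limit_bytes: int, fraction_percent: int = 90) -> str:
    """Return a GOMEMLIMIT string that is fraction_percent of the container limit."""
    if not 0 < fraction_percent <= 100:
        raise ValueError("fraction_percent must be in (0, 100]")
    # Integer arithmetic avoids floating-point rounding surprises.
    limit_mib = container_limit_bytes * fraction_percent // 100 // MIB
    return f"{limit_mib}MiB"


if __name__ == "__main__":
    for gib in (1, 2, 4, 8):
        print(f"{gib} GiB container -> GOMEMLIMIT={gomemlimit_for(gib * 1024 * MIB)}")
```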

---

## System Limits

### File Descriptor Limits (ulimits)

Each HTTP connection requires a file descriptor. The default container limit (often 1024) is too low for high-concurrency workloads.

```yaml
ulimits:
  nofile:
    soft: 65536
    hard: 65536
```

| Expected Concurrent Connections | Recommended nofile |
|---------------------------------|--------------------|
| < 1000 | 4096 |
| 1000-5000 | 16384 |
| 5000-10000 | 32768 |
| > 10000 | 65536+ |

<Warning>
If you see errors like `too many open files` or connections being refused under load, increase your `nofile` limit.
</Warning>
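
A quick way to compare the limit a process actually received with the table above is sketched below, using Python's standard `resource` module (Unix only). The helper names are hypothetical, and the way the boundary values 1000, 5000, and 10000 are assigned to rows is an assumption, since the table ranges overlap at their edges.

```python
import resource


def recommended_nofile(concurrent_connections: int) -> int:
    """Map expected concurrent connections to the nofile values in the table."""
    # Boundary handling (which row 1000, 5000 and 10000 fall into) is an assumption.
    if concurrent_connections < 1000:
        return 4096
    if concurrent_connections <= 5000:
        return 16384
    if concurrent_connections <= 10000:
        return 32768
    return 65536


def nofile_is_sufficient(concurrent_connections: int) -> bool:
    """Compare the current process's soft nofile limit with the recommendation."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return True
    return soft >= recommended_nofile(concurrent_connections)


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"soft={soft} hard={hard} sufficient_for_8000={nofile_is_sufficient(8000)}")
```

The script reports the limits of the process it runs in, so run it in an environment started with the same `ulimits` settings as Bifrost.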

### Resource Limits

Set CPU and memory limits to match your expected workload:

```yaml
deploy:
  resources:
    limits:
      cpus: '4'
      memory: 4G
    reservations:
      cpus: '2'
      memory: 2G
```

**Sizing guidance:**

| Expected RPS | Recommended CPUs | Recommended Memory |
|--------------|------------------|--------------------|
| 100-500 | 1-2 | 512MB-1GB |
| 500-2000 | 2-4 | 1-2GB |
| 2000-5000 | 4-8 | 2-4GB |
| 5000+ | 8+ | 4GB+ |

---

## Docker Compose Examples

### Development

```yaml
services:
  bifrost:
    image: maximhq/bifrost:latest
    ports:
      - "8080:8080"
    volumes:
      - ./data:/app/data
    environment:
      - LOG_LEVEL=debug
```

### Production (Single Node)

```yaml
services:
  bifrost:
    image: maximhq/bifrost:latest
    ports:
      - "8080:8080"
    volumes:
      - bifrost-data:/app/data
    environment:
      - LOG_LEVEL=info
      - LOG_STYLE=json
      - GOGC=200
      - GOMEMLIMIT=3600MiB
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4G
        reservations:
          cpus: '2'
          memory: 2G
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "-O", "/dev/null", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped

volumes:
  bifrost-data:
```

### Production (Multi-Node with PostgreSQL)

<Note>
If you use PostgreSQL for Bifrost storage, ensure the database is UTF8 encoded. See [PostgreSQL UTF8 Requirement](../quickstart/gateway/setting-up#postgresql-utf8-requirement).
</Note>

```yaml
services:
  bifrost-1:
    image: maximhq/bifrost:latest
    ports:
      - "8081:8080"
    environment:
      - LOG_LEVEL=info
      - GOGC=200
      - GOMEMLIMIT=1800MiB
      - BIFROST_DB_TYPE=postgres
      - BIFROST_DB_DSN=postgres://user:pass@postgres:5432/bifrost?sslmode=disable
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
    depends_on:
      - postgres

  bifrost-2:
    image: maximhq/bifrost:latest
    ports:
      - "8082:8080"
    environment:
      - LOG_LEVEL=info
      - GOGC=200
      - GOMEMLIMIT=1800MiB
      - BIFROST_DB_TYPE=postgres
      - BIFROST_DB_DSN=postgres://user:pass@postgres:5432/bifrost?sslmode=disable
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
    depends_on:
      - postgres

  postgres:
    image: postgres:16-alpine
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=bifrost
    volumes:
      - postgres-data:/var/lib/postgresql/data

volumes:
  postgres-data:
```

---

## Kubernetes Configuration

### Basic Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bifrost
spec:
  replicas: 3
  selector:
    matchLabels:
      app: bifrost
  template:
    metadata:
      labels:
        app: bifrost
    spec:
      containers:
        - name: bifrost
          image: maximhq/bifrost:latest
          ports:
            - containerPort: 8080
          env:
            - name: GOGC
              value: "200"
            - name: GOMEMLIMIT
              value: "3600MiB"
          resources:
            limits:
              cpu: "4"
              memory: "4Gi"
            requests:
              cpu: "2"
              memory: "2Gi"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
```

### File Descriptor Limits in Kubernetes

File descriptor limits in Kubernetes are typically set at the node level. Options include:

1. **Node-level configuration** (recommended): Set `fs.file-max` and ulimits in your node configuration
2. **Init container**: Use an init container with elevated privileges to set limits
3. **Security context**: Some clusters allow adding the `SYS_RESOURCE` capability, which lets the process raise its own limits

```yaml
securityContext:
  capabilities:
    add: ["SYS_RESOURCE"]
```

<Note>
Check your current limits inside a container with `cat /proc/sys/fs/file-max` (system-wide maximum) and `ulimit -n` (per-process limit).
</Note>

---

## Bifrost Application Settings

Align Bifrost's internal settings with your container resources:

### Concurrency and Buffer Size

Configure per provider in `config.json`:

```json
{
  "providers": {
    "openai": {
      "concurrency_and_buffer_size": {
        "concurrency": 1000,
        "buffer_size": 1500
      }
    }
  }
}
```

**Formula:**
- `concurrency` = expected RPS per provider
- `buffer_size` = 1.5 × concurrency

### Initial Pool Size

Configure globally in `config.json`:

```json
{
  "client": {
    "initial_pool_size": 3000
  }
}
```

**Formula:** `initial_pool_size` = 1.5 × total expected RPS across all providers

<Tip>
See the [Performance Tuning](/providers/performance) guide for detailed sizing recommendations.
</Tip>

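
The two formulas above can be applied together with a few lines of Python. This is a sketch under stated assumptions: the helper names are hypothetical, the provider names and RPS figures are illustrative, fractional results are rounded up, and it assumes the `providers` and `client` keys live side by side in the same `config.json`. With two providers at 1000 RPS each, it produces `concurrency: 1000`, `buffer_size: 1500`, and `initial_pool_size: 3000`, matching the values shown in the examples.

```python
import json
import math


def provider_settings(expected_rps: int) -> dict:
    """Per-provider formula: concurrency = RPS, buffer_size = 1.5 x concurrency."""
    concurrency = expected_rps
    return {
        "concurrency_and_buffer_size": {
            "concurrency": concurrency,
            "buffer_size": math.ceil(1.5 * concurrency),
        }
    }


def build_config(rps_by_provider: dict) -> dict:
    """Build a config.json fragment from expected RPS per provider."""
    total_rps = sum(rps_by_provider.values())
    return {
        "providers": {
            name: provider_settings(rps) for name, rps in rps_by_provider.items()
        },
        # Global formula: initial_pool_size = 1.5 x total expected RPS.
        "client": {"initial_pool_size": math.ceil(1.5 * total_rps)},
    }


if __name__ == "__main__":
    # Example traffic split; provider names and RPS figures are illustrative.
    print(json.dumps(build_config({"openai": 1000, "anthropic": 1000}), indent=2))
```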

---

## Tuning Checklist

<Steps>
  <Step title="Set container resource limits">
    Define CPU and memory limits based on expected workload. Start with 2 CPUs / 2GB for moderate loads.
  </Step>
  <Step title="Configure GOMEMLIMIT">
    Set it to about 90% of the container memory limit (e.g., `1800MiB` for a 2GB container).
  </Step>
  <Step title="Tune GOGC">
    Start with `GOGC=200` for throughput; reduce to 100 if memory pressure is high.
  </Step>
  <Step title="Set file descriptor limits">
    Set the `nofile` ulimit to at least 2× your expected concurrent connections.
  </Step>
  <Step title="Align Bifrost settings">
    Match `concurrency` and `buffer_size` to your container's CPU count and expected RPS.
  </Step>
  <Step title="Monitor and adjust">
    Watch memory usage, GC pause times, and request latencies. Adjust settings based on observed behavior.
  </Step>
</Steps>

---

## Troubleshooting

### High Memory Usage

- Reduce `GOGC` (e.g., from 200 to 100)
- Ensure `GOMEMLIMIT` is set
- Reduce `buffer_size` and `initial_pool_size`

### High Latency Spikes

- May indicate GC pauses; try reducing `GOGC`
- Check if container is hitting CPU limits
- Verify `GOMAXPROCS` matches container CPU quota (check startup logs)

### Connection Errors Under Load

- Increase `nofile` ulimit
- Ensure `buffer_size` is large enough for traffic spikes
- Check provider rate limits

### Container OOM Killed

- Reduce `GOMEMLIMIT` to 85% of container memory
- Reduce `GOGC` to trigger more frequent GC
- Reduce `buffer_size` and `initial_pool_size`

---

## Related Documentation

- **[Performance Tuning](/providers/performance)** - Bifrost-specific performance configuration
- **[Helm Deployment](/deployment-guides/helm)** - Kubernetes deployment with Helm
- **[Multi-Node Setup](/deployment-guides/how-to/multinode)** - Scaling across multiple instances