---
title: "Docker Performance Tuning"
description: "Optimize Bifrost container performance with Go runtime tuning, resource limits, and system configuration"
icon: "docker"
---

This guide covers performance tuning for Bifrost running in Docker containers. Proper tuning lets Bifrost use the CPU and memory allotted to its container and sustain high throughput without being throttled or OOM-killed.

<Note>
These optimizations apply to Docker, Docker Compose, Kubernetes, and any container runtime using cgroups for resource management.
</Note>

## Quick Start

For most production deployments, add these settings to your container:

```yaml
services:
  bifrost:
    image: maximhq/bifrost:latest
    environment:
      - GOGC=200
      - GOMEMLIMIT=3600MiB  # ~90% of the 4G memory limit
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4G
```

---

## Go Runtime Tuning

### GOMAXPROCS (Automatic)

Bifrost automatically detects container CPU limits using [automaxprocs](https://github.com/uber-go/automaxprocs). This sets `GOMAXPROCS` to match your container's CPU quota from cgroups (v1 and v2).

**No configuration needed.** You'll see a log line like this at startup:

```
maxprocs: Updating GOMAXPROCS=4: determined from CPU quota
```

<Warning>
Without automaxprocs, Go releases before 1.25 size `GOMAXPROCS` from the host CPU count (e.g., 64 on a large EC2 instance) even when the container is limited to 4 CPUs, causing excessive context switching and degraded performance.
</Warning>
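
To make the quota-to-`GOMAXPROCS` mapping concrete, the sketch below parses a cgroup v2 `cpu.max` value the way a quota-aware runtime might. It is an illustration in Python, not automaxprocs' actual code: the function name `procs_from_cpu_max` is hypothetical, and rounding fractional CPUs down with a minimum of 1 is an assumed rounding policy.

```python
import math
from typing import Optional


def procs_from_cpu_max(cpu_max: str) -> Optional[int]:
    """Derive a GOMAXPROCS-style value from the contents of cgroup v2 `cpu.max`.

    The file holds "<quota> <period>" in microseconds, or "max <period>"
    when the container has no CPU limit.
    """
    quota, period = cpu_max.split()
    if quota == "max":
        return None  # no quota set: the caller falls back to the host CPU count
    # Assumption: round fractional CPUs down, but never below 1.
    return max(1, math.floor(int(quota) / int(period)))


print(procs_from_cpu_max("400000 100000"))  # quota equivalent to cpus: '4'
print(procs_from_cpu_max("max 100000"))     # container without a CPU limit
```

A fractional limit such as `cpus: '2.5'` would map to 2 under this rounding assumption, which is worth keeping in mind when choosing CPU limits.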

### GOGC (Garbage Collection)

`GOGC` controls garbage collection frequency. The default is `100`: a collection starts once the heap has grown 100% over the live heap left by the previous collection.

| Scenario | Recommended GOGC | Trade-off |
|----------|------------------|-----------|
| Memory constrained | 50-100 | More frequent GC, lower memory |
| High throughput, memory available | 200-400 | Less GC overhead, higher memory |
| Latency sensitive | 50-100 | More predictable latency |

```yaml
environment:
  - GOGC=200
```

<Tip>
For high-throughput API gateways, `GOGC=200` or `GOGC=400` typically provides the best balance of throughput and memory usage.
</Tip>
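
The memory cost of a higher `GOGC` can be estimated with simple arithmetic. The sketch below is a simplified model: it ignores GC roots such as stacks and globals, which newer Go releases also factor in, and it ignores `GOMEMLIMIT`. The function name and the 500 MiB live heap are illustrative assumptions.

```python
def next_gc_heap_mib(live_heap_mib: float, gogc: int) -> float:
    """Approximate heap size (MiB) at which the next GC cycle starts.

    Simplified model: target = live heap * (1 + GOGC / 100).
    """
    return live_heap_mib * (1 + gogc / 100)


# Compare GC targets for a hypothetical 500 MiB live heap.
for gogc in (50, 100, 200, 400):
    target = next_gc_heap_mib(500, gogc)
    print(f"GOGC={gogc}: next GC near {target:.0f} MiB")
```

Under this model, moving from `GOGC=100` to `GOGC=200` lets the heap grow to three times the live heap instead of two before a collection runs, which is the memory-for-throughput trade described in the table.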

### GOMEMLIMIT (Memory Limit)

`GOMEMLIMIT` sets a soft memory limit for the Go runtime. When approaching this limit, Go becomes more aggressive about garbage collection.

**Best practice:** Set to ~90% of your container's memory limit to leave headroom for memory the Go runtime does not manage (CGO allocations, memory-mapped files, kernel buffers, etc.).

| Container Memory | Recommended GOMEMLIMIT |
|------------------|------------------------|
| 512 MB | 450MiB |
| 1 GB | 900MiB |
| 2 GB | 1800MiB |
| 4 GB | 3600MiB |
| 8 GB | 7200MiB |

```yaml
environment:
  - GOMEMLIMIT=3600MiB
```
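
The values in the table can be derived from the container limit. The sketch below is one way to compute them, assuming the container limit is expressed in binary units (Compose treats `4G` as 4 GiB); the helper name `gomemlimit_for` is hypothetical. The exact 90% figure comes out slightly higher than the table's rounded-down values, which are a little more conservative.

```python
MIB = 1024 * 1024


def gomemlimit_for(container_limit_bytes: int, percent: int = 90) -> str:
    """Return a GOMEMLIMIT string set to `percent` of the container memory limit."""
    # Integer math keeps the result exact and rounds down to a whole MiB.
    limit_mib = container_limit_bytes * percent // 100 // MIB
    return f"{limit_mib}MiB"


print(gomemlimit_for(4 * 1024**3))              # 4 GiB container at 90%
print(gomemlimit_for(2 * 1024**3, percent=85))  # 2 GiB container at 85%
```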

<Note>
When both `GOGC` and `GOMEMLIMIT` are set, the Go runtime starts a collection at whichever threshold is reached first. For high-throughput workloads, set `GOGC=200` or higher and let `GOMEMLIMIT` be the primary constraint.
</Note>
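
A rough way to see which setting governs at a given heap size is to compare the two thresholds. The sketch below is a simplified model that treats `GOMEMLIMIT` as if it applied to the heap alone (the real limit covers all runtime-managed memory); the function name and heap sizes are illustrative assumptions.

```python
from typing import Tuple


def gc_trigger_mib(live_heap_mib: float, gogc: int, gomemlimit_mib: float) -> Tuple[float, str]:
    """Return the approximate GC threshold in MiB and the setting that sets it."""
    gogc_target = live_heap_mib * (1 + gogc / 100)
    if gogc_target <= gomemlimit_mib:
        return gogc_target, "GOGC"
    # The proportional target would exceed the soft limit, so the limit wins.
    return gomemlimit_mib, "GOMEMLIMIT"


# With GOGC=200 and GOMEMLIMIT=3600MiB:
print(gc_trigger_mib(1000, 200, 3600))  # small live heap: GOGC decides
print(gc_trigger_mib(1500, 200, 3600))  # larger live heap: GOMEMLIMIT decides
```

In this model, `GOGC` governs while the live heap is small, and `GOMEMLIMIT` takes over as the live heap grows, which matches the intent of letting the memory limit act as the primary constraint under load.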

---

## System Limits

### File Descriptor Limits (ulimits)

Each open HTTP connection, whether inbound from a client or outbound to a provider, consumes a file descriptor. The default container limit (often 1024) is too low for high-concurrency workloads.

```yaml
ulimits:
  nofile:
    soft: 65536
    hard: 65536
```

| Expected Concurrent Connections | Recommended nofile |
|--------------------------------|-------------------|
| < 1000 | 4096 |
| 1000-5000 | 16384 |
| 5000-10000 | 32768 |
| > 10000 | 65536+ |

<Warning>
If you see errors like `too many open files` or connections being refused under load, increase your `nofile` limit.
</Warning>
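
To check whether the limit seen by a process leaves enough room, a small script can compare it against expected concurrency. The sketch below uses Python's standard `resource` module (Unix only) and applies the 2× rule of thumb from this guide's tuning checklist; the helper name `has_fd_headroom` is hypothetical.

```python
import resource


def has_fd_headroom(expected_connections: int, soft_limit: int) -> bool:
    """Check the nofile soft limit against 2x the expected concurrent connections."""
    if soft_limit == resource.RLIM_INFINITY:
        return True  # unlimited
    return soft_limit >= 2 * expected_connections


# Read the limits that apply to the current process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"nofile soft={soft} hard={hard}")
print("enough for 5000 connections:", has_fd_headroom(5000, soft))
```

Running this inside the container (for example through `docker exec`, if Python is available in the image) reports the limits the container actually received, which can differ from the host's.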

### Resource Limits

Set CPU and memory limits to match your expected workload:

```yaml
deploy:
  resources:
    limits:
      cpus: '4'
      memory: 4G
    reservations:
      cpus: '2'
      memory: 2G
```

**Sizing guidance:**

| Expected RPS | Recommended CPUs | Recommended Memory |
|--------------|------------------|-------------------|
| 100-500 | 1-2 | 512MB-1GB |
| 500-2000 | 2-4 | 1-2GB |
| 2000-5000 | 4-8 | 2-4GB |
| 5000+ | 8+ | 4GB+ |

---

## Docker Compose Examples

### Development

```yaml
services:
  bifrost:
    image: maximhq/bifrost:latest
    ports:
      - "8080:8080"
    volumes:
      - ./data:/app/data
    environment:
      - LOG_LEVEL=debug
```

### Production (Single Node)

```yaml
services:
  bifrost:
    image: maximhq/bifrost:latest
    ports:
      - "8080:8080"
    volumes:
      - bifrost-data:/app/data
    environment:
      - LOG_LEVEL=info
      - LOG_STYLE=json
      - GOGC=200
      - GOMEMLIMIT=3600MiB
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4G
        reservations:
          cpus: '2'
          memory: 2G
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "-O", "/dev/null", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped

volumes:
  bifrost-data:
```

### Production (Multi-Node with PostgreSQL)

<Note>
If you use PostgreSQL for Bifrost storage, ensure the database is UTF8 encoded. See [PostgreSQL UTF8 Requirement](../quickstart/gateway/setting-up#postgresql-utf8-requirement).
</Note>

```yaml
services:
  bifrost-1:
    image: maximhq/bifrost:latest
    ports:
      - "8081:8080"
    environment:
      - LOG_LEVEL=info
      - GOGC=200
      - GOMEMLIMIT=1800MiB
      - BIFROST_DB_TYPE=postgres
      - BIFROST_DB_DSN=postgres://user:pass@postgres:5432/bifrost?sslmode=disable
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
    depends_on:
      - postgres

  bifrost-2:
    image: maximhq/bifrost:latest
    ports:
      - "8082:8080"
    environment:
      - LOG_LEVEL=info
      - GOGC=200
      - GOMEMLIMIT=1800MiB
      - BIFROST_DB_TYPE=postgres
      - BIFROST_DB_DSN=postgres://user:pass@postgres:5432/bifrost?sslmode=disable
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
    depends_on:
      - postgres

  postgres:
    image: postgres:16-alpine
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=bifrost
    volumes:
      - postgres-data:/var/lib/postgresql/data

volumes:
  postgres-data:
```

---

## Kubernetes Configuration

### Basic Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bifrost
spec:
  replicas: 3
  selector:
    matchLabels:
      app: bifrost
  template:
    metadata:
      labels:
        app: bifrost
    spec:
      containers:
        - name: bifrost
          image: maximhq/bifrost:latest
          ports:
            - containerPort: 8080
          env:
            - name: GOGC
              value: "200"
            - name: GOMEMLIMIT
              value: "3600MiB"
          resources:
            limits:
              cpu: "4"
              memory: "4Gi"
            requests:
              cpu: "2"
              memory: "2Gi"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
```

### File Descriptor Limits in Kubernetes

File descriptor limits in Kubernetes are typically set at the node level. Options include:

1. **Node-level configuration** (recommended): Set `fs.file-max` and ulimits in your node configuration
2. **Init container**: Use an init container with elevated privileges to set limits
3. **Security context**: Some clusters allow adding the `SYS_RESOURCE` capability, which lets a process raise its own resource limits

```yaml
securityContext:
  capabilities:
    add: ["SYS_RESOURCE"]
```

<Note>
Check the current limits from inside a container: `cat /proc/sys/fs/file-max` shows the system-wide maximum, and `ulimit -n` shows the per-process soft limit.
</Note>

---

## Bifrost Application Settings

Align Bifrost's internal settings with your container resources:

### Concurrency and Buffer Size

Configure per provider in `config.json`:

```json
{
  "providers": {
    "openai": {
      "concurrency_and_buffer_size": {
        "concurrency": 1000,
        "buffer_size": 1500
      }
    }
  }
}
```

**Formula:**
- `concurrency` = expected RPS per provider
- `buffer_size` = 1.5 × concurrency
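
The formula can be applied mechanically when generating `config.json`. The sketch below assumes a hypothetical helper `provider_sizing` and rounds `buffer_size` up to a whole number; the 1000 RPS figure mirrors the example above.

```python
import json
import math


def provider_sizing(expected_rps: int) -> dict:
    """Apply the formula: concurrency = RPS, buffer_size = 1.5 x concurrency."""
    return {
        "concurrency": expected_rps,
        "buffer_size": math.ceil(1.5 * expected_rps),  # round up to a whole slot
    }


config = {
    "providers": {
        "openai": {"concurrency_and_buffer_size": provider_sizing(1000)},
    }
}
print(json.dumps(config, indent=2))
```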

### Initial Pool Size

Configure globally in `config.json`:

```json
{
  "client": {
    "initial_pool_size": 3000
  }
}
```

**Formula:** `initial_pool_size` = 1.5 × total expected RPS across all providers
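
As a worked example of this formula, the sketch below sums per-provider traffic and applies the 1.5 multiplier. The provider names and RPS figures are hypothetical, as is the helper name.

```python
import math


def initial_pool_size(rps_by_provider: dict) -> int:
    """Apply the formula: 1.5 x total expected RPS across all providers."""
    return math.ceil(1.5 * sum(rps_by_provider.values()))


# Hypothetical traffic split: 1500 + 500 = 2000 RPS in total.
print(initial_pool_size({"openai": 1500, "anthropic": 500}))  # 3000
```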

<Tip>
See the [Performance Tuning](/providers/performance) guide for detailed sizing recommendations.
</Tip>

---

## Tuning Checklist

<Steps>
  <Step title="Set container resource limits">
    Define CPU and memory limits based on expected workload. Start with 2 CPUs / 2GB for moderate loads.
  </Step>
  <Step title="Configure GOMEMLIMIT">
    Set to 90% of the container memory limit (e.g., `1800MiB` for a 2GB container).
  </Step>
  <Step title="Tune GOGC">
    Start with `GOGC=200` for throughput; reduce to 100 if memory pressure is high.
  </Step>
  <Step title="Set file descriptor limits">
    Set `nofile` ulimit to at least 2× your expected concurrent connections.
  </Step>
  <Step title="Align Bifrost settings">
    Match `concurrency` and `buffer_size` to your container's CPU count and expected RPS.
  </Step>
  <Step title="Monitor and adjust">
    Watch memory usage, GC pause times, and request latencies. Adjust settings based on observed behavior.
  </Step>
</Steps>
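
Parts of this checklist can be verified automatically before a deploy. The sketch below is a minimal pre-flight check under stated assumptions: the function names are hypothetical, the thresholds are the 90% and 2× rules from the steps above, and the parser accepts only whole numbers with the `B`, `KiB`, `MiB`, `GiB`, and `TiB` suffixes.

```python
import re

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_gomemlimit(value: str) -> int:
    """Convert a GOMEMLIMIT string such as '1800MiB' to bytes."""
    match = re.fullmatch(r"(\d+)(B|KiB|MiB|GiB|TiB)?", value)
    if match is None:
        raise ValueError(f"unrecognized GOMEMLIMIT value: {value!r}")
    return int(match.group(1)) * UNITS[match.group(2) or "B"]


def check_tuning(container_memory_bytes: int, gomemlimit: str,
                 nofile: int, expected_connections: int) -> list:
    """Return a list of warnings; an empty list means both checks passed."""
    warnings = []
    if parse_gomemlimit(gomemlimit) > container_memory_bytes * 90 // 100:
        warnings.append("GOMEMLIMIT exceeds 90% of the container memory limit")
    if nofile < 2 * expected_connections:
        warnings.append("nofile is below 2x expected concurrent connections")
    return warnings


# Settings from the single-node production example, assuming 10000 connections.
print(check_tuning(4 * 1024**3, "3600MiB", 65536, 10000))
# A misconfigured container: limit equal to container memory, default nofile.
print(check_tuning(2 * 1024**3, "2GiB", 1024, 1000))
```

A check like this could run in CI against the Compose or Kubernetes manifests, so that a change to the memory limit that leaves `GOMEMLIMIT` behind is caught before it reaches production.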

---

## Troubleshooting

### High Memory Usage

- Reduce `GOGC` (e.g., from 200 to 100)
- Ensure `GOMEMLIMIT` is set
- Reduce `buffer_size` and `initial_pool_size`

### High Latency Spikes

- May indicate GC pauses; try reducing `GOGC`
- Check if container is hitting CPU limits
- Verify `GOMAXPROCS` matches container CPU quota (check startup logs)

### Connection Errors Under Load

- Increase `nofile` ulimit
- Ensure `buffer_size` is large enough for traffic spikes
- Check provider rate limits

### Container OOM Killed

- Reduce `GOMEMLIMIT` to 85% of container memory
- Reduce `GOGC` to trigger more frequent GC
- Reduce `buffer_size` and `initial_pool_size`

---

## Related Documentation

- **[Performance Tuning](/providers/performance)** - Bifrost-specific performance configuration
- **[Helm Deployment](/deployment-guides/helm)** - Kubernetes deployment with Helm
- **[Multi-Node Setup](/deployment-guides/how-to/multinode)** - Scaling across multiple instances