first commit

This commit is contained in:
Beyhan Oğur
2026-04-26 21:52:23 +03:00
commit 880f412e2c
2662 changed files with 866266 additions and 0 deletions

@@ -0,0 +1,401 @@
---
title: "Troubleshooting"
description: "Diagnose and fix common issues with Bifrost Helm deployments — pods, database, ingress, secrets, PVCs, and performance"
icon: "wrench"
---
This page covers the most common problems encountered when deploying Bifrost with Helm, along with diagnostic commands and fixes.

---
## Pod Not Starting
### Quick diagnostics
```bash
# Show pod status
kubectl get pods -l app.kubernetes.io/name=bifrost
# Show pod events (most useful first step)
kubectl describe pod -l app.kubernetes.io/name=bifrost
# Show pod logs (use --previous if the pod has already crashed)
kubectl logs -l app.kubernetes.io/name=bifrost
kubectl logs -l app.kubernetes.io/name=bifrost --previous
```
### Image pull errors (`ErrImagePull` / `ImagePullBackOff`)
```bash
# Check which image is being pulled
kubectl describe pod -l app.kubernetes.io/name=bifrost | grep "Image:"
# Verify imagePullSecrets are attached
kubectl get pod -l app.kubernetes.io/name=bifrost -o jsonpath='{.items[0].spec.imagePullSecrets}'
# Test secret manually
kubectl get secret <pull-secret-name> -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | jq .
```
Common causes:
- `image.tag` not set — the chart requires it; the pod will not start without it
- Pull secret missing or expired (ECR tokens expire after 12 hours)
- Incorrect `image.repository` for enterprise registry
```bash
# Fix: set the correct tag
helm upgrade bifrost bifrost/bifrost --reuse-values --set image.tag=v1.4.11
```
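Because ECR tokens expire after 12 hours, a pull secret that was valid at install time can silently go stale. The following is a minimal sketch of an age check; `ecr_secret_expired` is a hypothetical helper, and it assumes the secret is recreated each time the token is refreshed, so that its creation time reflects the token's age.

```bash
# Hypothetical helper: exit 0 when a pull secret is at least as old as the ECR token lifetime.
# $1 = secret creation time, $2 = current time, both in epoch seconds.
ecr_secret_expired() {
  max_age=$(( 12 * 60 * 60 ))   # 12 hours, per the ECR note above
  [ $(( $2 - $1 )) -ge "$max_age" ]
}

# Example with fixed timestamps: a secret created 13 hours (46800 s) before "now" is stale.
if ecr_secret_expired 0 46800; then
  echo "pull secret is stale: recreate it"
fi
```

In practice the creation time could come from `kubectl get secret <pull-secret-name> -o jsonpath='{.metadata.creationTimestamp}'`, converted to epoch seconds with a tool such as GNU `date -d "<timestamp>" +%s`.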
### PVC not binding (`Pending`)
```bash
# Check PVC status
kubectl get pvc -l app.kubernetes.io/instance=bifrost
# Show binding events
kubectl describe pvc -l app.kubernetes.io/instance=bifrost
```
Common causes:
- No Persistent Volume provisioner in the cluster
- `storageClass` set to a class that doesn't exist
- `ReadWriteOnce` access mode with multiple replicas (SQLite PVCs are single-node)
```bash
# List available storage classes
kubectl get storageclass
# Fix: pin to a valid storage class
helm upgrade bifrost bifrost/bifrost \
  --reuse-values \
  --set storage.persistence.storageClass=standard
```
### ConfigMap / Secret errors
```bash
# View the generated ConfigMap (contains rendered config.json)
kubectl get configmap bifrost-config -o yaml
# View secrets the pod depends on
kubectl get secret -l app.kubernetes.io/instance=bifrost
# Decode a specific secret value
kubectl get secret bifrost-encryption -o jsonpath='{.data.key}' | base64 -d
```
### CrashLoopBackOff
```bash
# Get last log lines before the crash
kubectl logs -l app.kubernetes.io/name=bifrost --previous --tail=50
# Common causes shown in logs:
# "encryption key is not initialized" → no key provided; optional, but data will be stored in plaintext
# "failed to connect to database" → see Database section below
# "image.tag is required" → set image.tag in values
```
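The three log messages listed above can be matched mechanically when triaging several pods. This is only a sketch: `classify_crash_log` is a hypothetical helper, and it recognizes nothing beyond the three messages quoted above.

```bash
# Hypothetical helper: read log text on stdin and print a hint for each known crash message.
classify_crash_log() {
  logs=$(cat)
  matched=0
  case "$logs" in *"encryption key is not initialized"*)
    echo "encryption: no key provided (optional, but data is stored in plaintext)"; matched=1 ;;
  esac
  case "$logs" in *"failed to connect to database"*)
    echo "database: see the Database Connection Issues section"; matched=1 ;;
  esac
  case "$logs" in *"image.tag is required"*)
    echo "image: set image.tag in values"; matched=1 ;;
  esac
  # Fall through when none of the known messages appear
  [ "$matched" -eq 1 ] || echo "unknown: read the full log output"
}

# Usage (needs cluster access):
#   kubectl logs -l app.kubernetes.io/name=bifrost --previous --tail=50 | classify_crash_log
```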
---
## Database Connection Issues
### Embedded PostgreSQL
```bash
# Check if the PostgreSQL pod is running
kubectl get pods -l app.kubernetes.io/name=bifrost-postgresql
# Connect directly to inspect the database
kubectl exec -it deployment/bifrost-postgresql -- psql -U bifrost -d bifrost
# Test connectivity from the Bifrost pod
kubectl exec -it deployment/bifrost -- nc -zv bifrost-postgresql 5432
# Check PostgreSQL logs
kubectl logs deployment/bifrost-postgresql --tail=50
```
### External PostgreSQL
```bash
# Test connectivity from within the cluster
kubectl run pg-test --image=postgres:16-alpine --rm -it --restart=Never -- \
  psql "host=your-db-host dbname=bifrost user=bifrost sslmode=require"
# Verify the secret value is correct
kubectl get secret postgres-credentials -o jsonpath='{.data.password}' | base64 -d
# Check that the external host/port is reachable
kubectl exec -it deployment/bifrost -- nc -zv your-db-host 5432
```
Common causes:
- `sslMode: disable` when the database requires SSL — set `sslMode: require`
- Password in secret doesn't match the database user
- Network policy blocking pod → database traffic
- Database not UTF8 encoded (see [PostgreSQL UTF8 Requirement](/quickstart/gateway/setting-up#postgresql-utf8-requirement))
```bash
# Fix: update the secret and restart
kubectl create secret generic postgres-credentials \
  --from-literal=password='correct-password' \
  --dry-run=client -o yaml | kubectl apply -f -
kubectl rollout restart deployment/bifrost
```
---
## Ingress Not Working
```bash
# Check ingress resource status
kubectl describe ingress bifrost
# Check if the ingress controller is running
kubectl get pods -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx
# View ingress controller logs for routing errors
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=50
# Verify DNS resolves to the correct load balancer IP
nslookup bifrost.yourdomain.com
kubectl get ingress bifrost -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
# Some load balancers (for example AWS ELB) report .hostname instead of .ip
# Test without TLS first
curl -v http://bifrost.yourdomain.com/health
```
Common causes:
- `ingress.className` not set or set to a class not installed in the cluster
- TLS certificate not issued yet (cert-manager issuance can take a minute or more)
- Service port mismatch — Bifrost listens on `8080` by default
```bash
# Check cert-manager certificate status
kubectl get certificate -l app.kubernetes.io/instance=bifrost
kubectl describe certificate bifrost-tls
```
---
## Secret and Credential Issues
### Provider API key not resolving
If Bifrost logs show `env.OPENAI_API_KEY: not set` or similar:
```bash
# Check the env var is present in the running pod
kubectl exec -it deployment/bifrost -- env | grep OPENAI
# Verify the providerSecrets secret exists with the right key
kubectl get secret provider-api-keys -o yaml
# Check the providerSecrets configuration rendered correctly
kubectl get configmap bifrost-config -o yaml | grep -A5 providers
```
### Encryption key issues
```bash
# Verify the secret exists and contains the right key name
kubectl get secret bifrost-encryption -o yaml
# Check the exact key name matches encryptionKeySecret.key in values
# Default key name is "encryption-key" — if you used "key", set:
# bifrost.encryptionKeySecret.key: "key"
```
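A mismatch between the key name in the secret and `encryptionKeySecret.key` is easy to miss by eye. The sketch below checks for a key name in a Secret manifest; `secret_has_key` is a hypothetical helper that does a rough text match on the JSON output, so it assumes the name does not also appear as a key elsewhere in the manifest.

```bash
# Hypothetical helper: exit 0 if the Secret manifest (JSON on stdin) contains the given key name.
# $1 = key name to look for, for example "encryption-key".
secret_has_key() {
  grep -q "\"$1\"[[:space:]]*:"
}

# Usage (needs cluster access):
#   kubectl get secret bifrost-encryption -o json | secret_has_key encryption-key \
#     && echo "key found" || echo "key missing: check encryptionKeySecret.key"
```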
---
## High Memory Usage
```bash
# Check current resource usage
kubectl top pods -l app.kubernetes.io/name=bifrost
# Check if OOM kills are happening
kubectl describe pod -l app.kubernetes.io/name=bifrost | grep -A3 "OOMKilled\|Limits"
# View resource requests/limits on running pods
kubectl get pod -l app.kubernetes.io/name=bifrost \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].resources}{"\n"}{end}'
```
**Increase resource limits:**
```bash
helm upgrade bifrost bifrost/bifrost \
  --reuse-values \
  --set resources.limits.memory=4Gi \
  --set resources.requests.memory=1Gi
```
**Tune Go runtime** (see [Docker Tuning](/deployment-guides/docker-tuning)):
```yaml
env:
  - name: GOGC
    value: "200"        # run GC less often
  - name: GOMEMLIMIT
    value: "3500MiB"    # soft memory limit, set slightly below the container limit
```
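`GOMEMLIMIT` only accounts for memory managed by the Go runtime, so it needs headroom below the container limit. One way to derive a value is to take a fixed percentage of the limit. In this sketch, `gomemlimit_for` is a hypothetical helper and the 85% default is an assumption rather than a Bifrost recommendation.

```bash
# Hypothetical helper: derive a GOMEMLIMIT value from a container memory limit.
# $1 = container limit in MiB, $2 = percentage given to the Go runtime (default 85, an assumption).
gomemlimit_for() {
  limit_mib=$1
  percent=${2:-85}
  echo "$(( limit_mib * percent / 100 ))MiB"
}

gomemlimit_for 4096   # 4Gi container limit: prints 3481MiB
```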
---
## High CPU Usage / Latency
```bash
# Check CPU usage
kubectl top pods -l app.kubernetes.io/name=bifrost
# Check if HPA is scaling correctly
kubectl get hpa bifrost
kubectl describe hpa bifrost
```
Common causes:
- `initialPoolSize` too small: goroutines queue up; increase it to between `500` and `1000`
- `dropExcessRequests: false` with a small pool — queue depth growing unboundedly
```bash
helm upgrade bifrost bifrost/bifrost \
  --reuse-values \
  --set bifrost.client.initialPoolSize=1000 \
  --set bifrost.client.dropExcessRequests=true
```
---
## Autoscaling Issues
### HPA not scaling
```bash
# Check HPA status and current metrics
kubectl describe hpa bifrost
# Verify metrics server is installed
kubectl top nodes
kubectl top pods
# Common fix: metrics server not installed
# Install with:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```
### Pods scaling down too aggressively (drops active SSE streams)
The default `scaleDown.stabilizationWindowSeconds: 300` and `preStop` sleep of 15 seconds should prevent this. If streams are still being cut:
```yaml
terminationGracePeriodSeconds: 120   # includes the preStop sleep; increase if streams run longer than ~90s
autoscaling:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600   # wait 10 min before scaling down
      policies:
        - type: Pods
          value: 1
          periodSeconds: 300            # remove at most 1 pod per 5 min
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 30"]   # give load balancer more time to drain
```
```bash
helm upgrade bifrost bifrost/bifrost --reuse-values -f graceful-shutdown-values.yaml
```
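Kubernetes counts the `preStop` hook against `terminationGracePeriodSeconds`, so the time left for in-flight streams is the grace period minus the sleep. The arithmetic can be sketched as follows; `stream_budget` is a hypothetical helper for illustration only.

```bash
# Hypothetical helper: seconds left for in-flight streams once the preStop sleep has finished.
# $1 = terminationGracePeriodSeconds, $2 = preStop sleep in seconds.
stream_budget() {
  echo $(( $1 - $2 ))
}

stream_budget 120 15   # 120s grace with the default 15s sleep: prints 105
stream_budget 120 30   # 120s grace with the 30s sleep shown above: prints 90
```

If the longest expected stream exceeds the printed budget, raise `terminationGracePeriodSeconds` accordingly.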
---
## SQLite / PVC Issues
### StatefulSet migration (upgrading from chart < v2.0.0)
Older chart versions used a Deployment with a manually created PVC. v2.0.0 moved SQLite to a StatefulSet. To upgrade from an older chart while keeping the existing data:
```bash
# 1. Scale down the old deployment
kubectl scale deployment bifrost --replicas=0
# 2. Note the existing PVC name
kubectl get pvc
# 3. Upgrade, pointing at the existing claim
helm upgrade bifrost bifrost/bifrost \
  --reuse-values \
  --set storage.persistence.existingClaim=<your-old-pvc-name> \
  --set image.tag=v1.4.11
```
### Data lost after upgrade
```bash
# Check if PVCs still exist (they persist after helm uninstall)
kubectl get pvc -l app.kubernetes.io/instance=bifrost
# Re-attach by setting existingClaim
helm upgrade bifrost bifrost/bifrost \
  --reuse-values \
  --set storage.persistence.existingClaim=<pvc-name>
```
---
## Cluster Mode Issues
### Peers not discovering each other
```bash
# Check gossip port is reachable between pods
kubectl exec -it bifrost-0 -- nc -zv bifrost-1.bifrost-headless 7946
# View gossip-related log lines
kubectl logs -l app.kubernetes.io/name=bifrost --tail=100 | grep -i gossip
# Check the headless service exists
kubectl get svc bifrost-headless
```
For Kubernetes-based discovery, verify the service account has pod list permissions:
```bash
# Replace "default" with the namespace the release is installed in
kubectl auth can-i list pods --as=system:serviceaccount:default:bifrost
```
---
## Useful Diagnostic Commands
```bash
# Full state dump for a support ticket
kubectl get all -l app.kubernetes.io/instance=bifrost
kubectl describe pod -l app.kubernetes.io/name=bifrost > pod-describe.txt
kubectl logs -l app.kubernetes.io/name=bifrost --tail=200 > pod-logs.txt
# View the full rendered config.json
kubectl get configmap bifrost-config -o jsonpath='{.data.config\.json}' | jq .
# Check current Helm values (shows all overrides)
helm get values bifrost
# Check Helm release status
helm status bifrost
# View Helm release history
helm history bifrost
```
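For a support ticket it can help to capture all of the above in one directory. The following is a minimal sketch, not an official tool: `collect_bifrost_diagnostics` is a hypothetical wrapper, and it assumes the release is named `bifrost` and that `kubectl` and `helm` already point at the right cluster and namespace.

```bash
# Hypothetical wrapper around the commands above; writes one file per command.
# $1 = output directory (default: bifrost-diagnostics).
collect_bifrost_diagnostics() {
  out=${1:-bifrost-diagnostics}
  mkdir -p "$out" || return 1
  # Capture stderr too, so failures are visible in the bundle
  kubectl get all -l app.kubernetes.io/instance=bifrost > "$out/resources.txt" 2>&1
  kubectl describe pod -l app.kubernetes.io/name=bifrost > "$out/pod-describe.txt" 2>&1
  kubectl logs -l app.kubernetes.io/name=bifrost --tail=200 > "$out/pod-logs.txt" 2>&1
  helm get values bifrost > "$out/helm-values.txt" 2>&1
  helm status bifrost > "$out/helm-status.txt" 2>&1
  helm history bifrost > "$out/helm-history.txt" 2>&1
  echo "wrote diagnostics to $out"
}

# Usage (needs cluster access):
#   collect_bifrost_diagnostics ./bifrost-ticket
```

Review the output before sharing it: `helm get values` can include credentials passed as overrides.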
---
## Still Stuck?
- [GitHub Issues](https://github.com/maximhq/bifrost/issues) — search existing issues or open a new one
- [Enterprise Support](mailto:support@getmaxim.ai) — for enterprise customers with SLA