first commit

2026-04-26 21:52:23 +03:00
commit 880f412e2c
2662 changed files with 866266 additions and 0 deletions
--- a/.claude/skills/changelog-writer/SKILL.md
+++ b/.claude/skills/changelog-writer/SKILL.md
@@ -0,0 +1,361 @@
+---
+name: changelog-writer
+description: Write changelogs for Bifrost releases. Reads git history, bumps module versions following the core→framework→plugins→transport hierarchy, writes transports/changelog.md (enterprise-style) and per-module changelog.md files, and updates version files. Invoked with /changelog-writer or /changelog-writer <transport-version>.
+allowed-tools: Read, Grep, Glob, Bash, Edit, Write, Task, AskUserQuestion
+---
+
+# Changelog Writer
+
+Generate changelogs for a new Bifrost release. Reads git history to identify changes per module, asks the user for version bump type, bumps all module versions respecting the dependency hierarchy, writes `transports/changelog.md` and per-module `changelog.md` files, and updates version files.
+
+**IMPORTANT: This skill NEVER creates or modifies files under `docs/`.** No MDX files, no docs.json updates. Only `changelog.md` and `version` files within module directories.
+
+## Module Hierarchy
+
+Changes cascade down this dependency chain:
+
+```
+core → framework → plugins → transports
+```
+
+- **core** depends on nothing internal
+- **framework** depends on core
+- **plugins/*** each depend on core + framework
+- **transports** depends on core + framework + all plugins
+
+If core changes, every module below it must bump its version (at minimum a patch bump).
+
+If framework changes (but not core), plugins and transports must bump.
+
+If only a plugin changes, transports must bump.
+
+If only transports changes, only transports bumps.
+
+## Usage
+
+```
+/changelog-writer                    # Interactive — prompts for everything
+/changelog-writer <transport-ver>    # Pre-set transport version (e.g., v1.5.0)
+```
+
+## Workflow
+
+### Step 1: Gather Current State
+
+Read the current version of every module:
+
+```bash
+echo "core: $(cat core/version)"
+echo "framework: $(cat framework/version)"
+echo "transports: $(cat transports/version)"
+for d in plugins/*/; do echo "$(basename "$d"): $(cat "${d}version")"; done
+```
+
+Read the latest changelog file to understand the previous release state:
+
+```bash
+# Find the latest docs changelog by version order (file mtimes are unreliable after a fresh checkout)
+ls -1 docs/changelogs/*.mdx | sort -V | tail -1
+```
+
+Then read that file to know the previous versions of all modules.
+
+### Step 2: Identify Changes Since Last Release
+
+Use git log to find commits since the last release tag or since the last changelog was written:
+
+```bash
+# Get the transport version from the latest changelog (it matches the release version)
+LAST_VERSION=$(ls -1 docs/changelogs/*.mdx | sort -V | tail -1 | sed 's/.*\/v/v/' | sed 's/\.mdx$//')
+echo "Last release: $LAST_VERSION"
+
+# Check if a git tag exists for that version (the "v*" pattern also lists every other v-prefixed tag)
+git tag -l "$LAST_VERSION" "v*"
+
+# Get commits since last release
+# If tag exists:
+git log ${LAST_VERSION}..HEAD --oneline --no-merges
+
+# If no tag, use date-based or commit-based approach:
+# Find the commit that added the last changelog
+git log --oneline --all -- "docs/changelogs/$(ls -1 docs/changelogs/*.mdx | sort -V | tail -1 | xargs basename)" | head -1
+```
+
+For each module, identify which files changed. Set `BASE` to the release tag or the changelog commit found above before running these commands:
+
+```bash
+# Changes in core
+git diff --name-only ${BASE}..HEAD -- core/
+
+# Changes in framework
+git diff --name-only ${BASE}..HEAD -- framework/
+
+# Changes in each plugin
+for d in plugins/*/; do
+  CHANGES=$(git diff --name-only ${BASE}..HEAD -- "$d" | wc -l)
+  if [ "$CHANGES" -gt 0 ]; then
+    echo "$(basename "$d"): $CHANGES files changed"
+  fi
+done
+
+# Changes in transports
+git diff --name-only ${BASE}..HEAD -- transports/
+```
+
+### Step 3: Classify Changes and Determine Bump Types
+
+Present the identified changes to the user and ask what type of version bump each changed module needs.
+
+**Always ask the user with AskUserQuestion what bump type to use for each module.**
+
+Ask for **every** module that will be bumped — both modules with code changes and modules with only cascade bumps. Use AskUserQuestion with up to 4 questions at a time (the tool's limit), batching in hierarchy order:
+
+1. First batch: core, framework, and up to 2 plugins
+2. Continue with remaining plugins and transports
+
+For each module ask: "What type of version bump for **{module}**?"
+
+Options:
+- **patch** — Bug fixes, small improvements (0.0.X)
+- **minor** — New features, non-breaking changes (0.X.0)
+- **major** — Breaking changes (X.0.0)
+
+**Note:** Minor bumps reset the patch version to 0 (e.g., `1.4.24` → `1.5.0`). Major bumps reset both minor and patch to 0 (e.g., `1.4.24` → `2.0.0`). Patch bumps only increment the last number (e.g., `1.4.24` → `1.4.25`).
+
+### Step 4: Calculate New Versions
+
+Apply version bumps. Semver rules:
+
+- **patch**: `1.4.4` → `1.4.5`
+- **minor**: `1.4.4` → `1.5.0`
+- **major**: `1.4.4` → `2.0.0`
+
+Calculate new versions for ALL modules following the cascade rules:
+
+```
+new_core_version = bump(current_core, user_chosen_bump) if core changed, else current_core
+new_framework_version = bump(current_framework, user_chosen_bump) if framework changed, else patch_bump(current_framework) if core changed, else current_framework
+new_plugin_X_version = bump(current_plugin_X, user_chosen_bump) if plugin_X changed, else patch_bump(current_plugin_X) if core or framework changed, else current_plugin_X
+new_transport_version = bump(current_transport, user_chosen_bump) if transport changed, else patch_bump(current_transport) if any upstream changed, else current_transport
+```
+
+**Present the version plan to the user for confirmation before proceeding.**
+
+Show a table like:
+
+```
+Module             Current    New      Bump Type    Reason
+core               1.4.4      1.5.0    minor        code changes
+framework          1.2.23     1.3.0    minor        cascade from core
+governance         1.4.24     1.4.25   patch        cascade from core+framework
+...
+transports         1.4.9      1.5.0    minor        cascade from all
+```
+
+Wait for user confirmation. If they want to adjust any version, update accordingly.
+
+### Step 5: Collect and Write Changelog Entries
+
+For each module, compose changelog entries from the git log.
+
+**Read the actual git commits and changed code** to write meaningful entries:
+
+```bash
+# For each changed module, list the commit subjects (drop --oneline to see full commit messages)
+git log ${BASE}..HEAD --oneline --no-merges -- core/
+git log ${BASE}..HEAD --oneline --no-merges -- framework/
+# etc.
+```
+
+#### Credit Outside Contributors
+
+For each commit that references a PR number (e.g., `#1234`), check if the author is an outside contributor:
+
+```bash
+# Get the repo name
+REPO=$(gh repo view --json nameWithOwner --jq '.nameWithOwner')
+
+# For each PR number found in commits:
+gh api "repos/$REPO/pulls/<PR_NUMBER>" --jq '"\(.number) \(.user.login) \(.author_association)"'
+```
+
+**`author_association` values:**
+- `MEMBER`, `OWNER`, `COLLABORATOR` → internal team, no credit needed
+- `CONTRIBUTOR`, `FIRST_TIMER`, `FIRST_TIME_CONTRIBUTOR`, `NONE` → outside contributor, credit them
+
+**How to credit:**
+
+Use a markdown link to the contributor's GitHub profile: `[@username](https://github.com/username)`
+
+- In **transports/changelog.md** (enterprise-style): append `(thanks [@username](https://github.com/username)!)` to the description
+  - Example: `- **Logprobs JSON Tag** — Fixed logprobs JSON tag in BifrostResponseChoice (thanks [@contributor](https://github.com/contributor)!)`
+- In **per-module changelog.md** (flat-list): append `(thanks [@username](https://github.com/username)!)` to the entry
+  - Example: `- fix: fixed logprobs JSON tag in BifrostResponseChoice (thanks [@contributor](https://github.com/contributor)!)`
+
+If multiple PRs from the same outside contributor are grouped into one entry, credit them once.
+
+**Present the draft entries to the user for review before writing files.**
+
+#### Per-Module changelog.md (core, framework, plugins)
+
+Write simple flat-list entries to each module's `changelog.md`:
+
+```markdown
+- fix: description of what was fixed
+- feat: description of new feature
+- hotfix: description of urgent fix
+```
+
+For modules with only cascading bumps (no code changes), add a `chore:` entry describing which upstream dependencies were bumped. **No changelog should ever be left empty.** Example:
+
+```markdown
+- chore: upgraded core to v1.5.0 and framework to v1.3.0
+```
+
+If only one upstream changed, mention just that one (e.g., `- chore: upgraded core to v1.5.0`). Always use the actual new versions of the upstream modules.
+
+**Formatting rules for per-module changelogs:**
+- Each entry starts with `- ` followed by the type prefix and colon
+- Use `fix:`, `feat:`, `hotfix:`, or `chore:` prefixes
+- Breaking changes get a `<Note>` or `<Warning>` block indented under the entry
+- Keep entries concise — 1 line per change unless a breaking change note is needed
+
+#### transports/changelog.md (Enterprise-Style Format)
+
+The transports changelog uses a categorized format with bold names. Write it using this template:
+
+```markdown
+## ✨ Features
+
+- **Feature Name** — Description of the feature
+- **Feature Name** — Description of the feature
+
+## 🐞 Fixed
+
+- **Bug Name** — Description of what was fixed
+- **Bug Name** — Description of what was fixed
+```
+
+**Formatting rules for transports/changelog.md:**
+- Use `## ✨ Features` and `## 🐞 Fixed` section headers
+- Each entry uses **bold name** followed by em dash (—) and description
+- Keep descriptions concise — 1-2 lines max per bullet
+- Group related commits into a single bullet point
+- Include changes from ALL modules (transports is the top-level summary)
+- Breaking changes get a `<Warning>` or `<Note>` block indented under the entry
+- Omit sections that have no entries (e.g., if there are no features, skip the Features section)
+- If the release has only cascading bumps and no meaningful features or fixes, add a `## 🔧 Maintenance` section with an entry like: `- **Dependency Upgrades** — Bumped core to v1.5.0 and framework to v1.3.0 across all modules`
+
+### Step 6: Update Version Files
+
+Update the `version` file in each module that was bumped:
+
+```bash
+echo "{new_version}" > core/version
+echo "{new_version}" > framework/version
+echo "{new_version}" > transports/version
+echo "{new_version}" > plugins/{plugin}/version
+```
+
+**Do NOT update go.mod files** — that is handled separately by the developer as part of the release process.
+
+### Step 7: Present Summary
+
+After all files are written, present a summary:
+
+```
+## Changelog Written: v{new_transport_version}
+
+### Files Modified:
+- transports/changelog.md
+- core/changelog.md
+- framework/changelog.md
+- plugins/{changed_plugins}/changelog.md
+- {list of version files updated}
+
+### Version Bumps:
+{table of old → new versions}
+
+### Next Steps:
+1. Review the changelogs
+2. Update go.mod files with new dependency versions
+3. Run `go mod tidy` in each module
+4. Create the docs/changelogs MDX file and update docs.json manually
+5. Tag the release: git tag v{new_transport_version}
+```
+
+## Error Handling
+
+### No Changes Detected
+If git diff shows no changes since the last release:
+```
+No changes detected since the last release (v{last_version}).
+Are you sure you want to create a new changelog?
+```
+Ask the user to confirm or provide a different base commit/tag.
+
+### Version Conflict
+If the calculated new version already has a changelog file in docs:
+```
+A changelog for v{version} already exists at docs/changelogs/v{version}.mdx.
+Would you like to:
+1. Continue anyway (version files and changelog.md will be overwritten)
+2. Choose a different version number
+```
+
+### Missing Module Version File
+If a version file is missing:
+```bash
+# Fallback: inspect go.mod (the module line holds only the module path and any major-version suffix, not the full version)
+grep "^module" {module}/go.mod
+```
+Ask the user what version to use.
+
+## Project Directory Reference
+
+```
+bifrost/
+├── core/
+│   ├── version              # Plain text: "1.5.0"
+│   ├── changelog.md         # Simple flat-list format
+│   └── go.mod
+├── framework/
+│   ├── version              # Plain text: "1.3.0"
+│   ├── changelog.md         # Simple flat-list format
+│   └── go.mod
+├── plugins/
+│   ├── governance/
+│   │   ├── version
+│   │   └── changelog.md     # Simple flat-list format
+│   ├── jsonparser/version
+│   ├── litellmcompat/version
+│   ├── logging/
+│   │   ├── version
+│   │   └── changelog.md     # Simple flat-list format
+│   ├── maxim/version
+│   ├── mocker/version
+│   ├── otel/version
+│   ├── semanticcache/version
+│   └── telemetry/version
+├── transports/
+│   ├── version              # Plain text: "1.5.0"
+│   ├── changelog.md         # Enterprise-style format (✨ Features / 🐞 Fixed)
+│   └── go.mod
+└── docs/
+    ├── changelogs/          # ⚠️ DO NOT TOUCH — MDX files managed separately
+    └── docs.json            # ⚠️ DO NOT TOUCH — navigation managed separately
+```
+
+## Plugin List (Alphabetical Order)
+
+This is the canonical order for plugins:
+
+1. governance
+2. jsonparser
+3. litellmcompat
+4. logging
+5. maxim
+6. mocker
+7. otel
+8. semanticcache
+9. telemetry
--- a/.claude/skills/docs-writer/SKILL.md
+++ b/.claude/skills/docs-writer/SKILL.md
@@ -0,0 +1,859 @@
+---
+name: docs-writer
+description: Write, update, and review Mintlify MDX documentation for Bifrost features. Explores the full codebase (UI, Go backend, config schema), validates config.json examples, places screenshot placeholders, and presents outlines for approval before writing. Invoked with /docs-writer <feature-name>, /docs-writer update <doc-path>, or /docs-writer review <doc-path>.
+allowed-tools: Read, Grep, Glob, Bash, Edit, Write, WebSearch, WebFetch, mcp__context7__resolve-library-id, mcp__context7__query-docs, Task, AskUserQuestion, TodoWrite
+---
+
+# Bifrost Documentation Writer
+
+Write, update, and review Mintlify MDX documentation for Bifrost features. Performs thorough codebase research across both the React UI and Go backend, validates config.json examples against the schema, and follows established documentation conventions.
+
+## Usage
+
+```
+/docs-writer <feature-name>                # Write new docs for a feature
+/docs-writer update <doc-path>             # Update an existing doc page
+/docs-writer review <doc-path>             # Review a doc for accuracy and completeness
+```
+
+## Workflow Overview
+
+1. **Understand the request** -- Determine which feature needs documentation
+2. **Research the codebase** -- Explore UI pages, Go handlers, config schema, and existing docs
+3. **Research external context** -- Use Context7 and WebSearch for additional information
+4. **Ask clarifying questions** -- Confirm scope, audience, and edge cases with the user
+5. **Present doc outline** -- Show a structured outline and wait for approval
+6. **Write the documentation** -- Create the MDX file following Mintlify conventions
+7. **Update navigation** -- Add the new page to docs.json
+8. **Present for review** -- Show the complete doc and incorporate feedback
+
+---
+
+## Step 1: Understand the Request
+
+### For new docs (`/docs-writer <feature-name>`)
+
+Parse the feature name and map it to codebase areas. Common feature-to-directory mappings:
+
+| Feature Name | UI Directory | Handler File | Config Schema Section | Existing Docs |
+|---|---|---|---|---|
+| virtual-keys | `ui/app/workspace/virtual-keys/` | `handlers/governance.go` | `governance.virtual_keys` | `docs/features/governance/virtual-keys.mdx` |
+| routing | `ui/app/workspace/routing-rules/` | `handlers/governance.go` | `governance.routing_rules` | `docs/features/governance/routing.mdx` |
+| providers | `ui/app/workspace/providers/` | `handlers/providers.go` | `providers` | `docs/providers/` |
+| mcp | `ui/app/workspace/mcp-registry/` | `handlers/mcp.go` | `mcp` | `docs/mcp/` |
+| plugins | `ui/app/workspace/plugins/` | `handlers/plugins.go` | `plugins` | `docs/features/plugins/` |
+| logs / observability | `ui/app/workspace/logs/` | `handlers/logging.go` | `client.enable_logging` | `docs/features/observability/` |
+| semantic-caching | `ui/app/workspace/config/caching/` | `handlers/cache.go` | `plugins.semantic_cache` + `vector_store` | `docs/features/semantic-caching.mdx` |
+| guardrails | `ui/app/workspace/guardrails/` | Enterprise | `guardrails_config` | `docs/enterprise/guardrails.mdx` |
+| clustering | `ui/app/workspace/cluster/` | Enterprise | `cluster_config` | `docs/enterprise/clustering.mdx` |
+| load-balancing | `ui/app/workspace/adaptive-routing/` | Enterprise | `load_balancer_config` | `docs/enterprise/adaptive-load-balancing.mdx` |
+| audit-logs | `ui/app/workspace/audit-logs/` | Enterprise | `audit_logs` | `docs/enterprise/audit-logs.mdx` |
+| rbac | `ui/app/workspace/rbac/` | Enterprise | `auth_config` | `docs/enterprise/rbac.mdx` |
+| config | `ui/app/workspace/config/` | `handlers/config.go` | `client` | `docs/quickstart/` |
+| budget-and-limits | `ui/app/workspace/virtual-keys/` | `handlers/governance.go` | `governance.budgets`, `governance.rate_limits` | `docs/features/governance/budget-and-limits.mdx` |
+| mcp-tools | `ui/app/workspace/mcp-tool-groups/` | `handlers/governance.go` | `governance.virtual_keys[].mcp_configs` | `docs/features/governance/mcp-tools.mdx` |
+| fallbacks | N/A (request-level) | `handlers/inference.go` | N/A | `docs/features/fallbacks.mdx` |
+| telemetry | `ui/app/workspace/observability/` | `handlers/plugins.go` | `plugins.prometheus`, `plugins.otel` | `docs/features/telemetry.mdx` |
+
+If the feature name does not match any known mapping, search the codebase:
+```bash
+# Search UI for the feature
+ls ui/app/workspace/ | grep -i "<feature>"
+
+# Search handlers for related endpoints
+grep -rn "<feature>" transports/bifrost-http/handlers/ --include='*.go' -l
+
+# Search config schema
+grep -i "<feature>" transports/config.schema.json | head -20
+
+# Search existing docs
+grep -rn "<feature>" docs/ --include='*.mdx' -l
+```
+
+### For updates (`/docs-writer update <doc-path>`)
+
+Read the existing doc first:
+```bash
+cat docs/<doc-path>
+```
+
+Then identify what has changed by checking recent git history:
+```bash
+# Find recent code changes related to the feature
+git log --oneline -20 -- 'ui/app/workspace/<feature>/' 'transports/bifrost-http/handlers/<handler>.go'
+
+# See actual diffs
+git diff HEAD~10 -- 'ui/app/workspace/<feature>/'
+```
+
+### For reviews (`/docs-writer review <doc-path>`)
+
+Read the doc and cross-reference against the current codebase to identify:
+- Outdated information (API endpoints changed, UI flow changed)
+- Missing config.json fields (new schema properties not documented)
+- Missing screenshots for new UI elements
+- Broken internal links
+- config.json examples that do not match the schema
+
+---
+
+## Step 2: Research the Codebase
+
+**ALWAYS perform thorough research before writing.** This is the most critical step.
+
+### 2a. Explore the UI Code
+
+The UI is a React + Vite + TanStack Router application. Feature pages live under `ui/app/workspace/<feature>/`.
+
+```bash
+# List the feature directory structure
+find ui/app/workspace/<feature>/ -type f \( -name '*.tsx' -o -name '*.ts' \) | sort
+
+# Read the main page
+cat ui/app/workspace/<feature>/page.tsx
+
+# Find all views/components
+ls ui/app/workspace/<feature>/views/ 2>/dev/null
+ls ui/app/workspace/<feature>/dialogs/ 2>/dev/null
+ls ui/app/workspace/<feature>/fragments/ 2>/dev/null
+
+# Look for form fields, buttons, and interactive elements
+grep -rn 'label\|placeholder\|data-testid\|FormField\|Input\|Select\|Button' ui/app/workspace/<feature>/ --include='*.tsx' | head -40
+
+# Find API calls from the UI
+grep -rn 'fetch\|api/\|useSWR\|mutate' ui/app/workspace/<feature>/ --include='*.tsx' --include='*.ts' | head -20
+
+# Check shared components used
+grep -rn 'import.*from.*components' ui/app/workspace/<feature>/ --include='*.tsx' | head -20
+```
+
+**What to extract from UI research:**
+- Feature name as shown in the UI (page title, sidebar label)
+- All form fields for create/edit operations (field names, types, validation)
+- Table columns and displayed data
+- Available actions (create, edit, delete, import, export)
+- Navigation flow (how users reach this feature)
+- Any special UI patterns (sheets, dialogs, tabs, accordions)
+
+### 2b. Explore the Go Backend
+
+API handlers live in `transports/bifrost-http/handlers/`.
+
+```bash
+# Find the relevant handler file
+grep -rn '<feature>\|<Feature>' transports/bifrost-http/handlers/ --include='*.go' -l
+
+# Read route registrations
+grep -n '/api/' transports/bifrost-http/handlers/<handler>.go
+
+# Read request/response types
+grep -n 'type.*Request\|type.*Response' transports/bifrost-http/handlers/<handler>.go
+
+# Read the handler functions for create/update operations
+grep -in 'func.*create\|func.*update\|func.*delete\|func.*get' transports/bifrost-http/handlers/<handler>.go
+```
+
+**Complete API route reference by handler:**
+
+| Handler File | Route Prefix | Operations |
+|---|---|---|
+| `governance.go` | `/api/governance/virtual-keys` | CRUD virtual keys |
+| `governance.go` | `/api/governance/teams` | CRUD teams |
+| `governance.go` | `/api/governance/customers` | CRUD customers |
+| `governance.go` | `/api/governance/routing-rules` | CRUD routing rules |
+| `governance.go` | `/api/governance/model-configs` | CRUD model configs |
+| `governance.go` | `/api/governance/providers` | GET/PUT/DELETE provider governance |
+| `governance.go` | `/api/governance/budgets` | GET budgets |
+| `governance.go` | `/api/governance/rate-limits` | GET rate limits |
+| `providers.go` | `/api/providers` | CRUD providers |
+| `providers.go` | `/api/keys` | GET keys |
+| `providers.go` | `/api/models` | GET models |
+| `mcp.go` | `/api/mcp/clients` | GET MCP clients |
+| `mcp.go` | `/api/mcp/client/{id}` | POST/PUT/DELETE MCP client |
+| `logging.go` | `/api/logs` | GET/DELETE logs |
+| `logging.go` | `/api/logs/stats` | GET log stats |
+| `logging.go` | `/api/logs/histogram` | GET log histograms |
+| `plugins.go` | `/api/plugins` | CRUD plugins |
+| `config.go` | `/api/config` | GET/PUT config |
+| `config.go` | `/api/proxy-config` | GET/PUT proxy config |
+| `cache.go` | `/api/cache/clear/{requestId}` | DELETE cache |
+| `session.go` | `/api/session/*` | Login/logout/auth check |
+| `oauth2.go` | `/api/oauth/*` | OAuth callback/status |
+
+**What to extract from backend research:**
+- All API endpoints (method, path, description)
+- Request body fields with types and validation rules
+- Response body structure
+- Error responses and status codes
+- Business logic (what happens when an entity is created/updated)
+
+### 2c. Read the Config Schema
+
+The config schema at `transports/config.schema.json` (~2709 lines) is the source of truth for all `config.json` examples.
+
+```bash
+# Extract a specific section from the schema
+cat transports/config.schema.json | python3 -c "
+import json, sys
+schema = json.load(sys.stdin)
+# For top-level properties:
+section = schema['properties'].get('<section_name>', {})
+print(json.dumps(section, indent=2))
+"
+
+# For $defs references:
+cat transports/config.schema.json | python3 -c "
+import json, sys
+schema = json.load(sys.stdin)
+defn = schema.get('\$defs', {}).get('<def_name>', {})
+print(json.dumps(defn, indent=2))
+"
+```
+
+**Top-level config schema properties:**
+- `encryption_key` - Encryption key configuration
+- `auth_config` - Authentication configuration
+- `client` - Client settings (logging, governance, CORS, etc.)
+- `framework` - Framework configuration (pricing)
+- `providers` - Provider configurations (per-provider with keys)
+- `governance` - Governance (budgets, rate_limits, customers, teams, virtual_keys, routing_rules)
+- `mcp` - MCP client configs and tool manager
+- `vector_store` - Vector store backends (weaviate, redis, qdrant, pinecone)
+- `config_store` - Config store backend (file, postgres)
+- `logs_store` - Log store backend (file, postgres)
+- `cluster_config` - Cluster/multinode configuration
+- `scim_config` - SCIM/SSO configuration
+- `load_balancer_config` - Adaptive load balancer
+- `guardrails_config` - Guardrails configuration
+- `plugins` - Plugin configurations
+- `audit_logs` - Audit log configuration
+
+**Key $defs (reusable types):**
+- `routing_rule` - Routing rule definition
+- `virtual_key_provider_config` - Provider config within a virtual key
+- `virtual_key_mcp_config` - MCP config within a virtual key
+- `provider` / `provider_with_bedrock_config` / `provider_with_azure_config` / `provider_with_vertex_config` - Provider definitions
+- `base_key` / `bedrock_key` / `azure_key` / `vertex_key` - Key definitions
+- `mcp_client_config` / `mcp_tool_manager_config` - MCP configs
+- `weaviate_config` / `redis_config` / `qdrant_config` / `pinecone_config` - Vector store configs
+- `proxy_config` - Proxy configuration
+- `cluster_config` / `scim_config` / `load_balancer_config` / `guardrails_config` - Enterprise configs
+- `pricing_config` / `network_config` / `concurrency_config` - Client sub-configs
+- `audit_logs_config` - Audit logs config
+
+**CRITICAL RULE:** Every `config.json` example in documentation MUST be validated against this schema. Extract the relevant section, verify field names, types, required fields, and allowed values before including in the doc.
+
+### 2d. Check Existing Related Docs
+
+```bash
+# Find all docs that mention this feature
+grep -rn '<feature>' docs/ --include='*.mdx' -l
+
+# Read the most relevant existing doc
+cat docs/features/<category>/<feature>.mdx
+
+# Check cross-references
+grep -rn '<feature>' docs/ --include='*.mdx' | grep -i 'link\|href\|](/\|see\|read more'
+```
+
+### 2e. Check OpenAPI Spec
+
+If the feature has management API endpoints, check if they are in the OpenAPI spec:
+```bash
+ls docs/openapi/paths/management/
+cat docs/openapi/paths/management/<relevant>.yaml 2>/dev/null
+```
+
+---
+
+## Step 3: Research External Context
+
+### 3a. Use Context7 for Library Documentation
+
+If the feature involves external libraries or protocols:
+
+1. Resolve the library ID:
+   ```
+   mcp__context7__resolve-library-id with the library name
+   ```
+
+2. Query relevant documentation:
+   ```
+   mcp__context7__query-docs with the resolved library ID
+   ```
+
+**Common libraries to research:**
+- `mintlify` -- For MDX component syntax (Tabs, Info, Note, etc.)
+- `mark3labs/mcp-go` -- For MCP-related features
+- `react` -- For UI architecture context
+- Provider SDKs -- For provider-specific features
+
+### 3b. Use WebSearch for Additional Context
+
+Search for:
+- Official documentation of providers or protocols being documented
+- Best practices for the feature pattern (e.g., "API gateway rate limiting best practices")
+- Related open-source project documentation for comparison
+
+---
+
+## Step 4: Ask Clarifying Questions
+
+**ALWAYS ask clarifying questions before writing.** Present what you have learned and ask:
+
+```
+## Documentation Plan for: <Feature Name>
+
+### What I Found
+- **UI pages:** <list of discovered UI pages and their purpose>
+- **API endpoints:** <list of relevant endpoints>
+- **Config schema fields:** <list of relevant config fields>
+- **Existing docs:** <list of related docs that already exist>
+
+### Questions
+1. **Audience:** Is this doc for self-hosted OSS users, enterprise users, or both?
+2. **Scope:** Should I cover <list specific sub-features>? Anything to exclude?
+3. **Placement:** I plan to place this at `docs/<path>`. Does that look right?
+4. **Tab coverage:** Should I include all three config methods (Web UI / API / config.json)?
+5. **Related features:** Should I cross-reference <related features>?
+6. <Any feature-specific questions based on ambiguities found during research>
+```
+
+Wait for user responses before proceeding.
+
+---
+
+## Step 5: Present Doc Outline
+
+After clarifying questions are answered, present a structured outline:
+
+```
+## Proposed Outline: <Doc Title>
+
+**File:** `docs/<path>/<filename>.mdx`
+**Navigation:** Will be added to `docs.json` under <group> > <subgroup>
+
+### Frontmatter
+- title: "<Title>"
+- description: "<Description>"
+- icon: "<icon-name>"
+
+### Sections
+1. **Overview** - What the feature does, key benefits, core concepts
+2. **How It Works** - Technical explanation with flow diagram
+3. **Configuration** (Tabs: Web UI / API / config.json)
+   - Web UI steps with screenshot placeholders
+   - API examples with curl commands
+   - config.json examples (validated against schema)
+4. **<Feature-specific sections>** - E.g., "Budget Hierarchy", "Rate Limiting", etc.
+5. **Examples** - Real-world use cases with complete configs
+6. **Troubleshooting** - Common issues and solutions
+7. **Next Steps** - Links to related docs
+
+### Screenshots Needed
+- `![<Description>](../../media/ui-<feature>-<element>.png)` -- <what to capture>
+- ...
+
+### Cross-References
+- [<Related Doc 1>](<path>)
+- [<Related Doc 2>](<path>)
+
+**Proceed with writing?** (yes / no / modify outline)
+```
+
+Wait for approval before writing.
+
+---
+
+## Step 6: Write the Documentation
+
+### 6a. MDX File Structure
+
+Every doc file follows this structure:
+
+````mdx
+---
+title: "<Page Title>"
+description: "<Short description for SEO and navigation>"
+icon: "<fontawesome-icon-name>"
+---
+
+## Overview
+
+<1-3 paragraphs explaining what the feature does and why it matters>
+
+**Key Benefits/Features:**
+- **Benefit 1** - Description
+- **Benefit 2** - Description
+
+---
+
+## <Core Concept Section>
+
+<Explain the core concepts, architecture, or flow>
+
+```mermaid
+graph LR
+    A[Request] --> B{Check}
+    B --> C[Result]
+```
+
+---
+
+## Configuration
+
+<Tabs group="config-method">
+<Tab title="Web UI">
+
+1. Step-by-step instructions
+2. With numbered steps
+
+![Description of UI Element](../../media/ui-<feature>-<element>.png)
+
+3. Continue steps after screenshot
+
+</Tab>
+<Tab title="API">
+
+```bash
+curl -X POST http://localhost:8080/api/<endpoint> \
+  -H "Content-Type: application/json" \
+  -d '{
+    "field": "value"
+  }'
+```
+
+**Response:**
+```json
+{
+  "message": "Success",
+  "data": {}
+}
+```
+
+</Tab>
+<Tab title="config.json">
+
+```json
+{
+  "<section>": {
+    "<field>": "<value>"
+  }
+}
+```
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `field` | string | Yes | Description |
+
+</Tab>
+</Tabs>
+
+---
+
+## <Additional Sections>
+
+<Info>
+Important information the user should know.
+</Info>
+
+<Note>
+Caveats or edge cases to be aware of.
+</Note>
+
+<Warning>
+Dangerous or irreversible operations.
+</Warning>
+
+---
+
+## Troubleshooting
+
+### Common Issue 1
+**Symptom:** ...
+**Cause:** ...
+**Fix:** ...
+
+---
+
+## Next Steps
+
+- **[Related Feature 1](../path)** - Brief description
+- **[Related Feature 2](../path)** - Brief description
+````
+
+### 6b. Screenshot Placeholder Rules
+
+**Naming convention:** `ui-<feature>-<element>.png`
+
+Examples from existing docs:
+- `ui-virtual-key.png` - Virtual key creation form
+- `ui-virtual-key-routing.png` - Virtual key routing configuration
+- `ui-virtual-key-provider-config.png` - Provider config within VK
+- `ui-create-teams.png` - Team creation form
+- `ui-create-customer.png` - Customer creation form
+- `ui-config.png` - General config page
+- `ui-mcp-servers-table.png` - MCP servers listing
+- `ui-mcp-new-server.png` - New MCP server form
+- `ui-routing-rules-dashboard.png` - Routing rules overview
+- `ui-semantic-cache-config.png` - Semantic cache settings
+- `ui-tracing-config.png` - Tracing configuration
+
+**Path in docs:** Always use relative path from the doc file location:
+- From `docs/features/governance/*.mdx`: `../../media/ui-<name>.png`
+- From `docs/features/*.mdx`: `../media/ui-<name>.png`
+- From `docs/enterprise/*.mdx`: `../media/ui-<name>.png`
+- From `docs/mcp/*.mdx`: `../media/ui-<name>.png`
+- From `docs/providers/*.mdx`: `../media/ui-<name>.png`
+- From `docs/quickstart/gateway/*.mdx`: `../../media/ui-<name>.png`
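+
+The depth rule behind this list can be sketched in TypeScript. This is a minimal sketch, not part of the skill: `screenshotPath` is a hypothetical helper, and it assumes every doc lives in a subdirectory of `docs/` and every screenshot lives in `docs/media/`.
+
+```typescript
+// Hypothetical helper: builds the relative screenshot path for a doc file.
+// Assumption: docs live under docs/<dir>/... and screenshots under docs/media/.
+function screenshotPath(docFile: string, screenshotName: string): string {
+  const segments = docFile.split("/");
+  if (segments[0] !== "docs" || segments.length < 3) {
+    throw new Error(`expected a file inside a docs/ subdirectory, got: ${docFile}`);
+  }
+  // Number of directories between docs/ and the file, e.g. features/governance = 2.
+  const depth = segments.length - 2;
+  let prefix = "";
+  for (let i = 0; i < depth; i++) {
+    prefix += "../"; // climb one directory per level of nesting
+  }
+  return prefix + "media/" + screenshotName;
+}
+```
+
+For example, `screenshotPath("docs/features/governance/virtual-keys.mdx", "ui-virtual-key.png")` returns `../../media/ui-virtual-key.png`.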
+
+**Format:**
+```markdown
+![Descriptive Alt Text](../../media/ui-<feature>-<element>.png)
+```
+
+**Rules:**
+- Alt text should describe what the screenshot shows (e.g., "Virtual Key Provider Configuration Interface")
+- File name should be lowercase, hyphen-separated
+- One screenshot per major UI interaction (creation form, table view, config panel, etc.)
+- Place screenshot AFTER the step that leads to that UI state, not before
+- Check if the screenshot already exists in `docs/media/` before creating a new placeholder
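+
+The naming rules can be checked mechanically. The sketch below is illustrative only: `isValidScreenshotName` is a hypothetical helper, and allowing digits in a name segment is an assumption that the rules above do not state.
+
+```typescript
+// Assumed pattern: "ui-" prefix, lowercase hyphen-separated segments, ".png" extension.
+const SCREENSHOT_NAME = /^ui-[a-z0-9]+(-[a-z0-9]+)*\.png$/;
+
+// Hypothetical helper: true when the file name follows ui-<feature>-<element>.png.
+function isValidScreenshotName(fileName: string): boolean {
+  return SCREENSHOT_NAME.test(fileName);
+}
+```
+
+Under these assumptions `ui-virtual-key-routing.png` passes, while `UI-Config.png` and `ui_virtual_key.png` are rejected.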
+
+### 6c. Config.json Example Validation
+
+**MANDATORY:** Before writing any config.json example:
+
+1. Read the relevant schema section:
+```bash
+cat transports/config.schema.json | python3 -c "
+import json, sys
+schema = json.load(sys.stdin)
+section = schema['properties']['<top_level_key>']
+print(json.dumps(section, indent=2))
+"
+```
+
+2. Resolve any `$ref` references:
+```bash
+cat transports/config.schema.json | python3 -c "
+import json, sys
+schema = json.load(sys.stdin)
+ref = schema['\$defs']['<def_name>']
+print(json.dumps(ref, indent=2))
+"
+```
+
+3. Verify your example includes:
+   - All `required` fields
+   - Correct field types (string, integer, number, boolean, array, object)
+   - Valid enum values where applicable
+   - Proper nesting structure
+   - Realistic example values (not just "string" or 0)
+
+4. Cross-check field names against the Go handler request types:
+```bash
+grep -A 30 'type Create.*Request\|type Update.*Request' transports/bifrost-http/handlers/<handler>.go
+```
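+
+The checks in item 3 can be sketched as a small function. This is an illustration under stated assumptions, not a replacement for a real JSON Schema validator: `SchemaNode` models only `type`, `required`, `enum`, and `properties`, `$ref` entries must be resolved beforehand, and `checkExample` is a hypothetical name.
+
+```typescript
+// Assumed subset of JSON Schema; the real config.schema.json uses more keywords.
+interface SchemaNode {
+  type?: string;
+  required?: string[];
+  enum?: unknown[];
+  properties?: { [key: string]: SchemaNode };
+}
+
+// Maps a JavaScript value to its JSON Schema type name.
+function jsonTypeOf(value: unknown): string {
+  if (value === null) return "null";
+  if (Array.isArray(value)) return "array";
+  if (typeof value === "number") {
+    return isFinite(value) && Math.floor(value) === value ? "integer" : "number";
+  }
+  return typeof value; // "string", "boolean", or "object"
+}
+
+// Returns a list of problems; an empty list means the example passed these checks.
+function checkExample(example: unknown, schema: SchemaNode, path: string = "$"): string[] {
+  const problems: string[] = [];
+  const actual = jsonTypeOf(example);
+  // JSON Schema accepts any integer where "number" is expected.
+  const typeOk =
+    schema.type === undefined ||
+    schema.type === actual ||
+    (schema.type === "number" && actual === "integer");
+  if (!typeOk) {
+    problems.push(`${path}: expected ${schema.type}, got ${actual}`);
+    return problems;
+  }
+  if (schema.enum !== undefined && schema.enum.indexOf(example) === -1) {
+    problems.push(`${path}: value is not one of the allowed enum values`);
+  }
+  if (actual === "object") {
+    const obj = example as { [key: string]: unknown };
+    for (const name of schema.required || []) {
+      if (!(name in obj)) problems.push(`${path}.${name}: missing required field`);
+    }
+    const props = schema.properties || {};
+    for (const name of Object.keys(props)) {
+      if (name in obj) {
+        problems.push(...checkExample(obj[name], props[name], `${path}.${name}`));
+      }
+    }
+  }
+  return problems;
+}
+```
+
+Pass the schema section printed in item 1 (parsed with `JSON.parse`) together with the example object. Any returned message points at a field to fix before the example goes into the doc.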
+
+### 6d. Mintlify Component Reference
+
+**Callout boxes:**
+```mdx
+<Info>Informational content - tips, best practices, important context</Info>
+<Note>Caveats, edge cases, things to be aware of</Note>
+<Warning>Dangerous operations, irreversible actions, breaking changes</Warning>
+<Tip>Helpful shortcuts or pro tips</Tip>
+```
+
+**Tabs (for multi-method configuration):**
+```mdx
+<Tabs group="config-method">
+<Tab title="Web UI">
+Content for Web UI method
+</Tab>
+<Tab title="API">
+Content for API method
+</Tab>
+<Tab title="config.json">
+Content for config.json method
+</Tab>
+</Tabs>
+```
+
+The `group` attribute ensures tab selection persists across the page. Use consistent group names:
+- `config-method` - For Web UI / API / config.json tabs
+- `sdk` or `gateway` - For Gateway / Go SDK tabs
+- Feature-specific group names for feature-specific tabs
+
+**Code blocks:**
+````mdx
+```bash
+# Shell commands
+```
+
+```json
+{
+  "config": "example"
+}
+```
+
+```go
+// Go code
+```
+
+```python
+# Python code
+```
+
+```typescript
+// TypeScript code
+```
+
+```mermaid
+graph LR
+    A --> B
+```
+````
+
+**Tables:**
+```mdx
+| Column 1 | Column 2 | Column 3 |
+|----------|----------|----------|
+| Value 1  | Value 2  | Value 3  |
+```
+
+### 6e. Cross-Reference Conventions
+
+Use relative or root-absolute links for internal cross-references:
+```mdx
+[Link Text](./sibling-page)
+[Link Text](../parent-dir/page)
+[Link Text](/absolute/from/docs/root)
+[Link Text](./page#section-anchor)
+```
+
+Common cross-reference patterns in Bifrost docs:
+- Governance docs link to each other (virtual-keys, routing, budget-and-limits, mcp-tools)
+- Feature docs link to the quickstart guides
+- Enterprise docs link to the OSS equivalents
+- All config docs link to the architecture pages
+- Provider docs link to the supported providers overview
+
+---
+
+## Step 7: Update Navigation
+
+After writing the doc, add it to `docs/docs.json` in the appropriate location.
+
+### Determining Placement
+
+The navigation structure in `docs.json` follows this hierarchy:
+
+```
+tabs (Documentation, Developer Guides, Deployment Guides, API Reference, Architecture, Benchmarks, Changelogs)
+  └── groups
+      └── pages (strings or nested groups)
+```
+
+**Placement rules:**
+- OSS features go under: `Documentation > Open Source Features`
+- Enterprise features go under: `Documentation > Enterprise Features`
+- Provider-specific guides go under: `Documentation > Providers & Guides`
+- MCP features go under: `Documentation > MCP Gateway`
+- Architecture docs go under: `Architecture` tab
+- Deployment guides go under: `Deployment Guides` tab
+
+**Example: Adding a new governance feature doc:**
+Find the governance group in docs.json:
+```json
+{
+  "group": "Governance",
+  "icon": "user-lock",
+  "pages": [
+    "features/governance/virtual-keys",
+    "features/governance/routing",
+    "features/governance/budget-and-limits",
+    "features/governance/mcp-tools",
+    "features/governance/NEW-PAGE-HERE"
+  ]
+}
+```
+
+**Page path format:** The path in docs.json is relative to `docs/` and omits the `.mdx` extension.
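+
+As a rough sketch of this rule, the conversion from file path to page path, plus appending the page to a group, could look like the following. `toPagePath`, `addPage`, and `NavEntry` are hypothetical names, and the `NavEntry` shape is an assumption based on the hierarchy described above.
+
+```typescript
+// Assumed shape of a docs.json "pages" entry: a page path or a nested group.
+type NavEntry = string | { group: string; pages: NavEntry[] };
+
+// Strips the leading "docs/" and the trailing ".mdx" from a doc file path.
+function toPagePath(filePath: string): string {
+  const match = /^docs\/(.+)\.mdx$/.exec(filePath);
+  if (match === null) {
+    throw new Error(`not an .mdx file under docs/: ${filePath}`);
+  }
+  return match[1];
+}
+
+// Returns a new pages array with the page appended, unless it is already listed.
+function addPage(pages: NavEntry[], pagePath: string): NavEntry[] {
+  return pages.indexOf(pagePath) === -1 ? pages.concat([pagePath]) : pages;
+}
+```
+
+For example, `toPagePath("docs/features/governance/virtual-keys.mdx")` returns `features/governance/virtual-keys`.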
+
+---
+
+## Step 8: Present for Review
+
+After writing the doc and updating navigation, present a summary:
+
+```
+## Documentation Complete: <Feature Name>
+
+### Files Created/Modified
+- **Created:** `docs/<path>/<filename>.mdx` (<line count> lines)
+- **Modified:** `docs/docs.json` (added to <group> navigation)
+
+### Screenshots Needed
+The following screenshot placeholders were added. You will need to capture these:
+1. `docs/media/ui-<feature>-<element>.png` -- <description of what to capture>
+2. ...
+
+### Config.json Validation
+- All config.json examples validated against `transports/config.schema.json`
+- Fields verified: <list of key fields>
+- Schema sections referenced: <list>
+
+### Cross-References Added
+- Links TO this doc from: <none yet / list>
+- Links FROM this doc to: <list of referenced pages>
+
+### Review Checklist
+- [ ] Frontmatter title and description are accurate
+- [ ] All three config methods covered (Web UI / API / config.json)
+- [ ] config.json examples match schema
+- [ ] Screenshot placeholders have correct paths and descriptive alt text
+- [ ] Internal links are valid
+- [ ] Navigation placement in docs.json is correct
+- [ ] No duplicate content with existing docs
+
+**Would you like any changes?**
+```
+
+---
+
+## Mandatory Rules
+
+### Content Rules
+- **ALWAYS** research the codebase before writing -- never write from assumptions
+- **ALWAYS** validate config.json examples against `transports/config.schema.json`
+- **ALWAYS** include all three config methods (Web UI, API, config.json) for configurable features
+- **ALWAYS** use the established screenshot naming convention (`ui-<feature>-<element>.png`)
+- **ALWAYS** present an outline and wait for approval before writing
+- **NEVER** invent API endpoints -- verify them in the handler files
+- **NEVER** guess config field names -- verify them in the schema
+- **NEVER** write documentation for features that do not exist in the codebase
+- **NEVER** copy content from existing docs without adapting it to the specific feature
+
+### Style Rules
+- Use sentence case for headings (e.g., "Budget management" not "Budget Management") -- but follow existing doc convention if different
+- Use `**bold**` for UI element names (e.g., **Virtual Keys**, **Save** button)
+- Use backticks for code references (field names, values, commands)
+- Use numbered steps for sequential instructions (1, 2, 3...)
+- Use bullet points for non-sequential lists
+- Keep paragraphs short (2-4 sentences max)
+- Use horizontal rules (`---`) to separate major sections
+- Use Info/Note/Warning boxes sparingly -- only when the information is genuinely important
+
+### Technical Rules
+- API examples should use `http://localhost:8080` as the base URL
+- API examples should use `curl` with proper flags (`-X`, `-H`, `-d`)
+- config.json examples should be minimal but complete (include all required fields)
+- Field description tables should include: Field, Type, Required, Description
+- Error responses should include the HTTP status code and error body
+- Reset duration format: `1m`, `5m`, `1h`, `1d`, `1w`, `1M`, `1Y`
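+
+A reset duration in a doc example can be sanity-checked with a pattern like the one below. This is a sketch built only from the sample values listed above: it assumes a whole number followed by exactly one case-sensitive unit letter, and `isResetDuration` is a hypothetical helper. Confirm the accepted units against the codebase before relying on it.
+
+```typescript
+// Assumption: digits followed by one of the unit letters seen in the samples above.
+const RESET_DURATION = /^[0-9]+[mhdwMY]$/;
+
+// Hypothetical helper: true when the value looks like "5m", "1h", "1M", and so on.
+function isResetDuration(value: string): boolean {
+  return RESET_DURATION.test(value);
+}
+```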
+
+### Navigation Rules
+- Page paths in docs.json omit the `.mdx` extension
+- Page paths are relative to the `docs/` directory
+- New pages should be added at the end of their group unless order matters logically
+- Group icons use Font Awesome icon names
+
+---
+
+## Project Directory Reference
+
+```
+bifrost/
+├── docs/                              # Mintlify documentation
+│   ├── docs.json                      # Navigation configuration
+│   ├── openapi/                       # OpenAPI spec (auto-generated API reference)
+│   ├── media/                         # Screenshots and images
+│   │   └── ui-*.png                   # UI screenshots (naming convention)
+│   ├── features/                      # Feature documentation
+│   │   ├── governance/                # Virtual keys, routing, budgets, MCP tools
+│   │   ├── observability/             # Logging, tracing, Prometheus, OTel
+│   │   └── plugins/                   # Mocker, JSON parser
+│   ├── enterprise/                    # Enterprise feature docs
+│   ├── mcp/                           # MCP Gateway documentation
+│   ├── providers/                     # Provider guides and supported providers
+│   ├── quickstart/                    # Getting started guides
+│   ├── integrations/                  # SDK integration guides
+│   ├── plugins/                       # Custom plugin development
+│   ├── architecture/                  # Architecture documentation
+│   ├── deployment-guides/             # Deployment guides
+│   ├── contributing/                  # Developer contribution guides
+│   ├── benchmarking/                  # Performance benchmarks
+│   └── changelogs/                    # Version changelogs
+├── ui/                                # React + Vite UI application
+│   └── app/workspace/                 # Feature pages
+│       ├── providers/                 # Provider management
+│       ├── virtual-keys/              # Virtual key management
+│       ├── routing-rules/             # Routing rules
+│       ├── logs/                      # LLM request logs
+│       ├── mcp-registry/              # MCP server registry
+│       ├── mcp-logs/                  # MCP tool execution logs
+│       ├── mcp-tool-groups/           # MCP tool groups
+│       ├── mcp-auth-config/           # MCP OAuth configuration
+│       ├── plugins/                   # Plugin configuration
+│       ├── observability/             # Observability settings
+│       ├── dashboard/                 # Analytics dashboard
+│       ├── config/                    # System configuration
+│       ├── guardrails/                # Guardrails (enterprise)
+│       ├── adaptive-routing/          # Adaptive load balancing (enterprise)
+│       ├── cluster/                   # Clustering (enterprise)
+│       ├── rbac/                      # RBAC (enterprise)
+│       ├── scim/                      # SCIM provisioning (enterprise)
+│       ├── user-groups/               # User groups (enterprise)
+│       ├── audit-logs/                # Audit logs (enterprise)
+│       ├── model-limits/              # Model limits
+│       ├── alert-channels/            # Alert channels
+│       └── prompt-repo/               # Prompt repository
+├── transports/
+│   ├── config.schema.json             # Configuration schema (source of truth)
+│   └── bifrost-http/
+│       └── handlers/                  # HTTP handlers
+│           ├── governance.go          # Governance CRUD
+│           ├── providers.go           # Provider CRUD + keys + models
+│           ├── mcp.go                 # MCP client management
+│           ├── logging.go             # Log queries
+│           ├── plugins.go             # Plugin CRUD
+│           ├── config.go              # Config get/update
+│           ├── cache.go               # Cache management
+│           ├── session.go             # Session/auth management
+│           ├── oauth2.go              # OAuth callback handling
+│           ├── inference.go           # LLM inference routing
+│           ├── mcpinference.go        # MCP inference
+│           ├── mcpserver.go           # MCP server (SSE/streamable)
+│           ├── integrations.go        # SDK integrations
+│           ├── health.go              # Health check
+│           └── websocket.go           # WebSocket handler
+├── core/                              # Go core library
+│   ├── schemas/                       # All Go types/schemas
+│   ├── providers/                     # Provider implementations
+│   └── mcp/                           # MCP protocol implementation
+├── framework/                         # Framework layer
+│   ├── configstore/                   # Config storage backends
+│   └── logstore/                      # Log storage backends
+└── plugins/                           # Go plugins
+```
+
+## Error Handling
+
+### Feature Not Found in Codebase
+If the feature name does not map to any UI page or handler:
+1. Search more broadly: `grep -ri "<feature>" ui/ transports/ core/ --include='*.go' --include='*.tsx' -l`
+2. Ask the user: "I could not find a UI page or API handler for '<feature>'. Can you point me to the relevant code?"
+3. The feature might be code-only (Go SDK) with no UI -- adjust the doc accordingly (no Web UI tab)
+
+### Config Schema Section Not Found
+If the feature does not have a config.json section:
+1. Some features are API/UI-only (no file-based config)
+2. Skip the config.json tab in that case
+3. Note in the doc: "This feature is configured via the Web UI or API only."
+
+### Existing Doc Conflicts
+If a doc already exists for the feature:
+1. Read the existing doc thoroughly
+2. Ask the user: "A doc already exists at `<path>`. Should I update it or create a new companion page?"
+3. If updating, make targeted edits rather than rewriting the entire doc
--- a/.claude/skills/e2e-test/SKILL.md
+++ b/.claude/skills/e2e-test/SKILL.md
@@ -0,0 +1,950 @@
+
+---
+name: e2e-test
+description: Write, run, debug, audit, and auto-update Playwright E2E tests for the Bifrost UI. Use when asked to create new E2E tests, add test coverage, fix flaky tests, debug failing tests, audit test correctness, update tests after UI changes, or sync tests with modified components. Invoked with /e2e-test <FEATURE_NAME>, /e2e-test fix <SPEC_FILE>, /e2e-test sync, or /e2e-test audit.
+allowed-tools: Read, Grep, Glob, Bash, Edit, Write, Task, AskUserQuestion, TodoWrite
+---
+
+# Playwright E2E Testing
+
+Write, run, debug, and auto-update Playwright E2E tests following Bifrost's established patterns and conventions. Automatically detects UI changes and updates affected tests.
+
+## Usage
+
+```
+/e2e-test <FEATURE_NAME>              # Create or update tests for a feature
+/e2e-test fix <SPEC_FILE>             # Debug and fix a failing test
+/e2e-test run <FEATURE_NAME>          # Run tests for a specific feature
+/e2e-test run                         # Run all E2E tests
+/e2e-test sync                        # Detect UI changes and update affected tests
+/e2e-test sync <FEATURE_NAME>         # Sync tests for a specific feature with UI changes
+/e2e-test audit                       # Audit all specs for incorrect/weak assertions
+/e2e-test audit <FEATURE_NAME>        # Audit a specific feature's specs
+```
+
+## Workflow Overview
+
+1. **Understand the feature** - Read the UI code to understand what needs testing
+2. **Check existing tests** - Review existing test patterns for the feature or similar features
+3. **Identify data-testid attributes** - Find selectors in the UI code
+4. **Write/update tests** - Follow the established patterns (page objects, data factories, fixtures)
+5. **Run the tests** - Execute and verify they pass
+6. **Fix failures** - Debug and fix any issues
+
+## Auto-Update Workflow (sync mode)
+
+When invoked with `sync`, or when UI changes are detected, automatically update E2E tests:
+
+### Step 0: Detect What Changed
+
+Detect UI changes by checking git diff against the base branch:
+
+```bash
+# Get all changed UI files
+git diff main --name-only -- 'ui/'
+
+# Get changed files with diff content for analysis
+git diff main -- 'ui/app/workspace/' 'ui/components/'
+```
+
+Categorize changes into:
+- **data-testid changes** - Renamed, removed, or added test IDs
+- **Route/URL changes** - Modified page routes or navigation paths
+- **Component structure changes** - New/removed form fields, buttons, tables, dialogs
+- **API endpoint changes** - Modified API calls that tests rely on
+- **New features** - Entirely new pages or components that need test coverage
+
+### Step 1: Map UI Changes to Affected Tests
+
+For each changed UI file, find the corresponding test files:
+
+```
+UI File Path                              → Test Feature Folder
+ui/app/workspace/providers/**             → tests/e2e/features/providers/
+ui/app/workspace/virtual-keys/**          → tests/e2e/features/virtual-keys/
+ui/app/workspace/dashboard/**             → tests/e2e/features/dashboard/
+ui/app/workspace/logs/**                  → tests/e2e/features/logs/
+ui/app/workspace/mcp-logs/**              → tests/e2e/features/mcp-logs/
+ui/app/workspace/mcp-registry/**          → tests/e2e/features/mcp-registry/
+ui/app/workspace/routing-rules/**         → tests/e2e/features/routing-rules/
+ui/app/workspace/observability/**         → tests/e2e/features/observability/
+ui/app/workspace/config/**                → tests/e2e/features/config/
+ui/app/workspace/plugins/**               → tests/e2e/features/plugins/
+ui/components/sidebar.tsx                 → tests/e2e/core/pages/sidebar.page.ts
+ui/components/**                          → May affect multiple test features
+```
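+
+The mapping above could be applied to the output of `git diff --name-only` with a lookup along these lines. This is a sketch, not the skill's actual logic: `testTargetFor` is a hypothetical helper, and sending shared components to the whole `features/` tree is a conservative assumption standing in for "may affect multiple test features".
+
+```typescript
+// Feature folders taken from the mapping table above.
+const MAPPED_FEATURES = [
+  "providers", "virtual-keys", "dashboard", "logs", "mcp-logs",
+  "mcp-registry", "routing-rules", "observability", "config", "plugins",
+];
+
+// Hypothetical helper: returns the test path to inspect for a changed UI file,
+// or null when the table has no entry for it.
+function testTargetFor(uiFile: string): string | null {
+  if (uiFile === "ui/components/sidebar.tsx") {
+    return "tests/e2e/core/pages/sidebar.page.ts";
+  }
+  const match = /^ui\/app\/workspace\/([^/]+)\//.exec(uiFile);
+  if (match !== null && MAPPED_FEATURES.indexOf(match[1]) !== -1) {
+    return `tests/e2e/features/${match[1]}/`;
+  }
+  if (uiFile.indexOf("ui/components/") === 0) {
+    // Assumption: a shared component may touch any feature, so review them all.
+    return "tests/e2e/features/";
+  }
+  return null;
+}
+```
+
+A `null` result is a prompt to check whether the changed page needs a new test folder, as described in case F below.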
+
+### Step 2: Analyze Each Change Type and Update
+
+**A. data-testid renamed or removed:**
+1. Search the old testid across all test files: `grep -r 'old-testid' tests/e2e/`
+2. Update every reference in page objects, selectors, and spec files
+3. If a testid was removed without replacement, check if the element still exists with a different selector and add a new testid to the UI
+
+**B. Form fields added/removed:**
+1. If a new form field was added to a create/edit form:
+   - Add the field to the page object's interface (e.g., `FeatureConfig`)
+   - Add a locator for the field in the page object constructor
+   - Update the `createFeature()` / `editFeature()` methods to fill the field
+   - Update the data factory to include the new field with a default value
+   - Add a test case that exercises the new field
+2. If a form field was removed:
+   - Remove it from the interface, locators, and methods
+   - Remove or update test cases that depended on it
+
+**C. New buttons/actions added:**
+1. Add locators to the page object
+2. Add methods to interact with the new action
+3. Add test cases covering the new action
+
+**D. Route changes:**
+1. Update `goto()` methods in page objects
+2. Update navigation helpers in `core/actions/navigation.ts`
+3. Update any hardcoded URLs in test specs
+
+**E. API endpoint changes:**
+1. Update `core/actions/api.ts` helpers
+2. Update any `waitForResponse()` URL patterns in page objects
+
+**F. New page/feature added:**
+1. Create the full test structure (page object, data factory, spec, fixture registration)
+2. Follow Step 3 from the main workflow
+
+### Step 3: Validate Updates
+
+After making changes, run the affected tests to verify:
+
+```bash
+# Run only the affected feature tests
+npx playwright test features/<affected-feature> --reporter=list
+
+# If multiple features affected, run them all
+npx playwright test features/providers features/virtual-keys --reporter=list
+```
+
+### Step 4: Report Changes
+
+Present a summary to the user:
+```
+## E2E Test Sync Summary
+
+### UI Changes Detected
+- <list of changed UI files>
+
+### Tests Updated
+- **<feature>.page.ts**: Updated locators for renamed data-testid, added new field method
+- **<feature>.spec.ts**: Added test for new "export" button, updated form fill sequence
+- **<feature>.data.ts**: Added new field to factory defaults
+
+### Tests Created
+- **<new-feature>/**: Full test suite for new feature (page object, data, spec)
+
+### Tests Requiring Manual Review
+- <any changes that couldn't be auto-resolved, e.g., complex interaction flow changes>
+
+### Verification
+- Ran `npx playwright test features/<feature>` → X passed, Y failed
+```
+
+### Important: Proactive Sync Triggers
+
+**ALWAYS** check for and sync E2E tests when ANY of these happen:
+
+1. **You modify a UI component** that has `data-testid` attributes — check if tests reference those IDs
+2. **You add a new `data-testid`** to a component — consider if existing tests should use it
+3. **You rename or remove a component/prop** — search for test references and update them
+4. **You change a form's fields** (add/remove inputs, change validation) — update page object methods and test data
+5. **You modify an API route** that tests call — update API helpers and response wait patterns
+6. **You change page navigation/routing** — update `goto()` methods and navigation helpers
+
+To check quickly if tests are affected by your UI change:
+```bash
+# Find test files that reference any testid from the changed component
+grep -rl 'data-testid-from-component' tests/e2e/
+```
+
+---
+
+## Audit Workflow (audit mode) — Fix Incorrect Specs
+
+Tests that pass but validate the wrong things are worse than no tests — they give false confidence. When invoked with `audit`, systematically scan specs for correctness issues and fix them.
+
+### Step 0: Scope the Audit
+
+```
+# Audit all specs
+/e2e-test audit
+
+# Audit a specific feature
+/e2e-test audit virtual-keys
+```
+
+If a specific feature is given, read only that feature's spec, page object, and data files. Otherwise, scan all `features/**/*.spec.ts` files.
+
+### Step 1: Read the UI Code First
+
+For every spec file being audited, **read the actual UI component code** to understand what the UI really does. This is the source of truth — tests must match real behavior, not assumed behavior.
+
+```bash
+# For each feature, read the UI code
+# e.g. for virtual-keys:
+grep -r 'data-testid' ui/app/workspace/virtual-keys/ --include='*.tsx'
+```
+
+Understand:
+- What fields are in the create/edit form (from the UI code, not the test)
+- What the save button actually does (API call, toast, sheet close)
+- What the table actually renders (columns, row content, empty state)
+- What validation rules exist (required fields, format checks)
+- What error states exist (API failures, permission errors)
+
+### Step 2: Scan for Incorrect Assertion Patterns
+
+Check each spec file for these **anti-patterns** (ordered by severity):
+
+#### P0 — Tests That Can Never Fail (always-true assertions)
+
+These are the most dangerous — they provide zero coverage while appearing green.
+
+```typescript
+// WRONG: Always true — count is always >= 0
+const count = await page.getCount()
+expect(count >= 0).toBe(true)
+
+// WRONG: Always true — empty string is a string
+const text = await element.textContent()
+expect(typeof text).toBe('string')
+
+// WRONG: Always true — isVisible returns a boolean, not asserted correctly
+const visible = await element.isVisible()
+// (no assertion at all, or expect(visible).toBeDefined())
+
+// WRONG: Catching error means it never fails
+try { await doSomething(); expect(true).toBe(true) } catch { expect(true).toBe(true) }
+```
+
+**Fix:** Replace with deterministic assertions that verify actual expected state.
+
+#### P1 — Tests That Assert Existence But Not Correctness
+
+The test creates an item and checks it exists, but never verifies the item has the right data.
+
+```typescript
+// WEAK: Only checks existence, not content
+await page.createVirtualKey({ name: 'Test VK', description: 'My desc', budget: { maxLimit: 100 } })
+const exists = await page.virtualKeyExists('Test VK')
+expect(exists).toBe(true)
+// Never checks that description, budget, or other fields were actually saved correctly
+```
+
+**Fix:** After create, open/view the item and verify its fields match what was submitted. Compare against the actual UI state:
+```typescript
+// BETTER: Verify the data was saved correctly
+await page.viewVirtualKey('Test VK')
+await expect(page.descriptionInput).toHaveValue('My desc')
+await expect(page.page.locator('#budgetMaxLimit')).toHaveValue('100')
+```
+
+#### P2 — Tests That Assert the Wrong Thing
+
+The test name says one thing but asserts something unrelated.
+
+```typescript
+// WRONG: Test says "should validate email" but only checks button state
+test('should validate email format', async ({ page }) => {
+  await page.fillEmail('invalid')
+  await expect(page.saveBtn).toBeDisabled() // Tests button, not validation message
+})
+
+// WRONG: Test says "should delete" but only checks toast, not actual deletion
+test('should delete item', async ({ page }) => {
+  await page.deleteItem('foo')
+  await page.waitForSuccessToast()
+  // Never checks the item is actually gone from the table
+})
+```
+
+**Fix:** Align assertions with test intent. A delete test must verify the item is gone. A validation test must check the validation message.
+
+#### P3 — Tests With Swallowed Errors / Catch-All Handlers
+
+Tests that catch errors and silently continue, hiding real failures.
+
+```typescript
+// WRONG: Error is caught and ignored — test always passes
+const isVisible = await element.isVisible().catch(() => false)
+if (isVisible) {
+  expect(isVisible).toBe(true) // Only asserts when visible, silently passes when not
+}
+
+// WRONG: Optional assertion that can be skipped entirely
+const providerSection = page.getByText(/Providers/i).first()
+const isProviderVisible = await providerSection.isVisible().catch(() => false)
+if (isProviderVisible) {
+  expect(isProviderVisible).toBe(true) // Tautology when reached, skipped when not
+}
+```
+
+**Fix:** Remove the catch/conditional and make the assertion unconditional. If the element should be visible, assert it directly:
+```typescript
+await expect(element).toBeVisible()
+```
+
+If the state genuinely depends on external factors, use count-based branching with **both** branches making meaningful assertions:
+```typescript
+const count = await page.getCount()
+if (count === 0) {
+  await expect(page.emptyState).toBeVisible()
+} else {
+  expect(count).toBeGreaterThan(0)
+  await expect(page.emptyState).not.toBeVisible()
+}
+```
+
+#### P4 — Tests That Don't Assert After Actions
+
+Tests that perform actions (clicks, form fills, navigation) but have no assertion afterward.
+
+```typescript
+// WRONG: Action with no assertion — test only checks nothing throws
+test('should toggle visibility', async ({ page }) => {
+  await page.toggleKeyVisibility('my-key')
+  await page.toggleKeyVisibility('my-key')
+  // No assertion that the visibility state actually changed
+})
+```
+
+**Fix:** Add assertions verifying the action's observable effect:
+```typescript
+test('should toggle visibility', async ({ page }) => {
+  await page.toggleKeyVisibility('my-key')
+  // Verify key value is now visible
+  await expect(page.getKeyValueText('my-key')).toBeVisible()
+
+  await page.toggleKeyVisibility('my-key')
+  // Verify key value is now hidden again
+  await expect(page.getKeyValueText('my-key')).not.toBeVisible()
+})
+```
+
+#### P5 — Tests Asserting Against Stale State
+
+Tests that read state before an action completes, or compare to a stale snapshot.
+
+```typescript
+// WRONG: Reads count before table finishes refreshing
+const previousCount = await page.getCount()
+await page.deleteItem('foo')
+const count = await page.getCount() // Table hasn't refreshed yet!
+expect(count).toBe(previousCount - 1) // May pass by coincidence
+```
+
+**Fix:** Wait for the state change to complete before asserting:
+```typescript
+const previousCount = await page.getCount()
+await page.deleteItem('foo')
+await page.waitForItemGone('foo') // Wait for table refresh
+const count = await page.getCount()
+expect(count).toBe(previousCount - 1)
+```
+
+#### P6 — Tests That Duplicate Other Tests Without Additional Value
+
+Multiple tests covering the exact same code path with only cosmetic differences (different name strings, same logic).
+
+```typescript
+// REDUNDANT: These three tests do the same thing with different budget values
+test('should create VK with small budget', ...)   // creates + checks exists
+test('should create VK with medium budget', ...)  // creates + checks exists
+test('should create VK with daily budget', ...)   // creates + checks exists
+// None of them verify the budget value was saved correctly
+```
+
+**Fix:** Either consolidate into a parameterized test, or make each test verify something unique (e.g., verify the actual budget value appears in the UI).
+
+### Step 3: Cross-Check Against UI Code
+
+For each test, verify:
+
+1. **Form fields match** - Does the test fill all required fields from the UI? Does it test fields that actually exist?
+2. **Selectors are correct** - Does `data-testid="vk-name-input"` actually exist in the current UI code? Has it been renamed?
+3. **Behavior matches** - If the UI shows a confirmation dialog on delete, does the test handle it? If the form has validation, do tests exercise it?
+4. **Error paths exist** - Does the UI show error toasts? Are there tests for error scenarios?
+5. **New UI capabilities untested** - Has the UI added new features (export, filter, sort, pagination) that have no test coverage?
+
+```bash
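+# Compare what testids exist in UI vs what tests reference.
+# Both lists are reduced to bare IDs so the diff compares like with like.
+grep -roh 'data-testid="[^"]*"' ui/app/workspace/<feature>/ | sed -E 's/.*data-testid="([^"]*)".*/\1/' | sort -u > /tmp/ui-testids.txt
+grep -roh "getByTestId('[^']*')\|getByTestId(\"[^\"]*\")" tests/e2e/features/<feature>/ | sed -E "s/.*getByTestId\(['\"]([^'\"]*)['\"]\).*/\1/" | sort -u > /tmp/test-testids.txt
+# Diff to find gaps: "<" lines exist only in the UI, ">" lines exist only in tests
+diff /tmp/ui-testids.txt /tmp/test-testids.txt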
+```
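+
+The same comparison can be expressed as two set differences. The sketch below is a simplified illustration: `uiTestIds`, `referencedTestIds`, and `testIdGaps` are hypothetical helpers, and only static string literals are matched, so test IDs built dynamically (for example from a template expression) are not detected.
+
+```typescript
+// Collects unique first-capture-group matches; the pattern must use the "g" flag.
+function collectIds(source: string, pattern: RegExp): string[] {
+  const ids: string[] = [];
+  let match = pattern.exec(source);
+  while (match !== null) {
+    if (ids.indexOf(match[1]) === -1) ids.push(match[1]);
+    match = pattern.exec(source);
+  }
+  return ids.sort();
+}
+
+// Test IDs declared in UI source, e.g. data-testid="vk-name-input".
+function uiTestIds(tsx: string): string[] {
+  return collectIds(tsx, /data-testid="([^"]*)"/g);
+}
+
+// Test IDs referenced by tests, e.g. getByTestId('vk-name-input').
+function referencedTestIds(spec: string): string[] {
+  return collectIds(spec, /getByTestId\(['"]([^'"]*)['"]\)/g);
+}
+
+// untested: declared in the UI but never referenced by a test.
+// stale: referenced by a test but no longer present in the UI.
+function testIdGaps(tsx: string, spec: string): { untested: string[]; stale: string[] } {
+  const ui = uiTestIds(tsx);
+  const used = referencedTestIds(spec);
+  return {
+    untested: ui.filter((id) => used.indexOf(id) === -1),
+    stale: used.filter((id) => ui.indexOf(id) === -1),
+  };
+}
+```
+
+Entries under `stale` usually point at renamed or removed test IDs. Entries under `untested` are candidates for new coverage.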
+
+### Step 4: Fix and Strengthen
+
+For each issue found:
+
+1. **Read the UI code** for that specific component/interaction
+2. **Understand the expected behavior** from the UI implementation
+3. **Rewrite the assertion** to verify actual, observable, correct state
+4. **Run the fixed test** to ensure it passes with correct behavior and **would fail** if the behavior broke
+
+### Step 5: Report Findings
+
+Present results to the user:
+
+```
+## E2E Audit Report — <feature>
+
+### Issues Found: X
+
+#### P0 — Always-True Assertions (X found)
+| Test | File:Line | Issue | Fix |
+|------|-----------|-------|-----|
+| "should show empty state" | vk.spec.ts:374 | `count >= 0` always true | Use count-based branching |
+
+#### P1 — Missing Data Verification (X found)
+| Test | File:Line | Issue | Fix |
+|------|-----------|-------|-----|
+| "should create with budget" | vk.spec.ts:108 | Only checks `exists`, not budget value | Add field verification after create |
+
+#### P2 — Wrong Assertion Target (X found)
+...
+
+#### P3 — Swallowed Errors (X found)
+...
+
+#### P4 — Missing Assertions (X found)
+...
+
+#### P5 — Stale State (X found)
+...
+
+#### P6 — Redundant Tests (X found)
+...
+
+### Coverage Gaps
+- No test for: <UI feature that exists but has no test>
+- Missing error path test for: <error scenario>
+
+### Summary
+- X tests fixed
+- X tests need manual review (complex interaction flows)
+- X new tests added for coverage gaps
+```
+
+---
+
+## Project Structure
+
+All E2E tests live in `tests/e2e/`:
+
+```
+tests/e2e/
+├── playwright.config.ts           # Playwright configuration
+├── global-setup.ts                # Global setup (plugin build, MCP servers, Bifrost connectivity)
+├── core/                          # Shared utilities & fixtures
+│   ├── fixtures/
+│   │   ├── base.fixture.ts        # Main fixture - exports `test` and `expect` with all page objects
+│   │   └── test-data.fixture.ts   # TestDataFactory for generating unique test data
+│   ├── pages/
+│   │   ├── base.page.ts           # BasePage class with common methods (toasts, forms, waits)
+│   │   └── sidebar.page.ts        # Sidebar navigation
+│   ├── actions/
+│   │   ├── navigation.ts          # Navigation helpers (goToProviders, goToVirtualKeys, etc.)
+│   │   └── api.ts                 # API helpers for setup/cleanup (providersApi, virtualKeysApi, etc.)
+│   └── utils/
+│       ├── selectors.ts           # Centralized selector definitions
+│       └── test-helpers.ts        # Utilities: waitForNetworkIdle, retry, fillSelect, assertToast, etc.
+└── features/                      # One folder per feature
+    └── <feature>/
+        ├── <feature>.spec.ts      # Test cases
+        ├── <feature>.data.ts      # Test data factories & sample constants
+        └── pages/
+            └── <feature>.page.ts  # Page object extending BasePage
+```
+
+## Step 1: Understand the Feature
+
+Before writing tests, read the relevant UI code to understand:
+
+- What pages/routes exist for the feature
+- What `data-testid` attributes are already in the UI components
+- What CRUD operations are available
+- What form fields, buttons, and interactive elements exist
+- What API endpoints the UI calls
+
+**Search for data-testid in UI code:**
+```bash
+# Find all data-testid attributes for a feature
+grep -r 'data-testid' ui/app/workspace/<feature>/ --include='*.tsx' --include='*.ts'
+```
+
+**Check what routes exist:**
+```bash
+ls ui/app/workspace/
+```
+
+## Step 2: Check Existing Tests
+
+Always review existing patterns before writing new tests:
+
+```bash
+# List all existing feature test folders
+ls tests/e2e/features/
+
+# Read an existing spec for patterns
+cat tests/e2e/features/virtual-keys/virtual-keys.spec.ts
+
+# Read existing page objects for patterns
+cat tests/e2e/features/virtual-keys/pages/virtual-keys.page.ts
+```
+
+## Step 3: Create the Feature Test Structure
+
+For a new feature, create these files:
+
+### 3a. Page Object (`features/<feature>/pages/<feature>.page.ts`)
+
+**CRITICAL RULES:**
+- Always extend `BasePage`
+- Define all locators in the constructor using `page.getByTestId()`
+- Use `data-testid` attributes as the primary selector strategy
+- Use `page.getByRole()` as a secondary strategy
+- NEVER use brittle CSS selectors or chained parent locators (`.locator('..')`)
+- Methods should be async and use semantic waits
+- Include `goto()`, CRUD methods, and a `cleanup` method
+
+**Template:**
+```typescript
+import { Locator, Page, expect } from '@playwright/test'
+import { BasePage } from '../../../core/pages/base.page'
+import { waitForNetworkIdle } from '../../../core/utils/test-helpers'
+
+// Define interfaces for the feature's data
+export interface FeatureConfig {
+  name: string
+  description?: string
+  // ... other fields
+}
+
+export class FeaturePage extends BasePage {
+  // Main page elements
+  readonly createBtn: Locator
+  readonly table: Locator
+
+  // Sheet/form elements
+  readonly sheet: Locator
+  readonly nameInput: Locator
+  readonly saveBtn: Locator
+  readonly cancelBtn: Locator
+
+  constructor(page: Page) {
+    super(page)
+
+    // Use getByTestId for all locators
+    this.createBtn = page.getByTestId('create-feature-btn')
+    this.table = page.getByTestId('feature-table')
+    this.sheet = page.getByTestId('feature-sheet')
+    this.nameInput = page.getByTestId('feature-name-input')
+    this.saveBtn = page.getByTestId('feature-save-btn')
+    this.cancelBtn = page.getByTestId('feature-cancel-btn')
+  }
+
+  async goto(): Promise<void> {
+    await this.page.goto('/workspace/<feature>')
+    await waitForNetworkIdle(this.page)
+  }
+
+  async createFeature(config: FeatureConfig): Promise<void> {
+    await this.createBtn.click()
+    await expect(this.sheet).toBeVisible()
+    await this.waitForSheetAnimation()
+
+    // Fill form fields
+    await this.nameInput.fill(config.name)
+
+    // Save
+    await this.saveBtn.click()
+    await this.waitForSuccessToast()
+    await this.dismissToasts()
+    await expect(this.sheet).not.toBeVisible({ timeout: 5000 })
+  }
+
+  async featureExists(name: string): Promise<boolean> {
+    const row = this.page.getByTestId(`feature-row-${name}`)
+    return (await row.count()) > 0
+  }
+
+  async deleteFeature(name: string): Promise<void> {
+    const deleteBtn = this.page.getByTestId(`feature-delete-btn-${name}`)
+    await deleteBtn.click()
+
+    // Handle confirmation dialog
+    const confirmDialog = this.page.getByRole('alertdialog')
+    await confirmDialog.waitFor({ state: 'visible', timeout: 5000 })
+    const confirmBtn = confirmDialog.getByRole('button', { name: /Delete/i })
+    await confirmBtn.click()
+
+    await this.waitForSuccessToast()
+    await this.dismissToasts()
+  }
+
+  async closeSheet(): Promise<void> {
+    const isSheetVisible = await this.sheet.isVisible().catch(() => false)
+    if (isSheetVisible) {
+      const closeBtn = this.sheet.locator('button[aria-label*="close"], button:has(svg.lucide-x)').first()
+      if (await closeBtn.isVisible()) {
+        await closeBtn.click()
+      }
+      await expect(this.sheet).not.toBeVisible({ timeout: 5000 }).catch(() => {})
+    }
+  }
+
+  async cleanupFeatures(names: string[]): Promise<void> {
+    if (names.length === 0) return
+    await this.goto()
+    await this.closeSheet()
+    await this.dismissToasts()
+    for (const name of names) {
+      try {
+        const exists = await this.featureExists(name)
+        if (!exists) continue
+        await this.closeSheet()
+        await this.deleteFeature(name)
+      } catch (error) {
+        // Log and continue so one failed delete does not abort the rest of cleanup
+        console.error(`[CLEANUP] Failed to delete: ${name}`, error)
+      }
+    }
+  }
+}
+```
+
+### 3b. Test Data Factory (`features/<feature>/<feature>.data.ts`)
+
+**CRITICAL RULES:**
+- Use `Date.now()` for unique test names
+- Provide factory functions with sensible defaults
+- Allow partial overrides via the spread pattern
+- Create sample constant objects for reusable configurations
+- **Never marshal payloads to a `Record`/`Map` and re-serialize** — field ordering matters for backend validation and snapshot comparisons. Always construct payloads as object literals with fields in the intended order. Do NOT use `Object.fromEntries()`, `JSON.parse(JSON.stringify(...))` round-trips, or destructure into an intermediate `Record<string, unknown>` — these can reorder fields.
+
+**Template:**
+```typescript
+import { FeatureConfig } from './pages/feature.page'
+
+export function createFeatureData(overrides: Partial<FeatureConfig> = {}): FeatureConfig {
+  const timestamp = Date.now()
+  return {
+    name: `Test Feature ${timestamp}`,
+    description: 'E2E test feature',
+    // ... sensible defaults
+    ...overrides,
+  }
+}
+
+// Sample configurations for different scenarios
+export const SAMPLE_CONFIGS = {
+  basic: { /* ... */ },
+  advanced: { /* ... */ },
+} as const
+```
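+
+The override pattern in the template relies on object spread order: properties spread last replace earlier ones. A minimal, dependency-free sketch of that behavior follows. The `FeatureConfig` shape is repeated only so the snippet stands alone, and `makeFeatureData` with its injectable `now` clock is a hypothetical variant for illustration, not a function in this repository.
+
+```typescript
+interface FeatureConfig {
+  name: string
+  description?: string
+}
+
+// Hypothetical variant of createFeatureData with an injectable clock,
+// so the generated name can be made deterministic in a demonstration.
+function makeFeatureData(
+  overrides: Partial<FeatureConfig> = {},
+  now: () => number = Date.now,
+): FeatureConfig {
+  return {
+    name: `Test Feature ${now()}`,
+    description: 'E2E test feature',
+    ...overrides, // spread last so caller-supplied fields replace the defaults
+  }
+}
+
+const fixedClock = () => 1700000000000
+const defaults = makeFeatureData({}, fixedClock)
+const custom = makeFeatureData({ description: 'custom' }, fixedClock)
+
+console.log(defaults.name) // "Test Feature 1700000000000"
+console.log(custom.description) // "custom"
+```
+
+Note that an override explicitly set to `undefined` also replaces the default, so callers should omit a field rather than pass `undefined` when they want the default kept.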
+
+### 3c. Test Spec (`features/<feature>/<feature>.spec.ts`)
+
+**CRITICAL RULES:**
+- Import `test` and `expect` from `../../core/fixtures/base.fixture`
+- Track created resources in arrays for cleanup in `afterEach`
+- Use `test.describe()` blocks for logical grouping
+- Use `test.beforeEach()` to navigate to the page
+- Use `test.afterEach()` to clean up resources
+- Use unique names with `Date.now()` for test data
+- Write deterministic assertions (never `expect(count >= 0).toBe(true)`)
+- Use `test.describe.configure({ mode: 'serial' })` when tests have write ordering dependencies
+- **Never marshal API payloads to a `Record`/`Map`** — pass object literals directly to Playwright's `request.post({ data })`. Marshaling through an intermediate map can reorder fields, which breaks backend validation and snapshot comparisons.
+
+**Template:**
+```typescript
+import { expect, test } from '../../core/fixtures/base.fixture'
+import { createFeatureData } from './feature.data'
+
+const createdItems: string[] = []
+
+test.describe('Feature Name', () => {
+  test.beforeEach(async ({ featurePage }) => {
+    await featurePage.goto()
+  })
+
+  test.afterEach(async ({ featurePage }) => {
+    await featurePage.closeSheet()
+    if (createdItems.length > 0) {
+      await featurePage.cleanupFeatures([...createdItems])
+      createdItems.length = 0
+    }
+  })
+
+  test.describe('Creation', () => {
+    test('should display create button', async ({ featurePage }) => {
+      await expect(featurePage.createBtn).toBeVisible()
+    })
+
+    test('should create a basic item', async ({ featurePage }) => {
+      const data = createFeatureData({ name: `Basic Test ${Date.now()}` })
+      createdItems.push(data.name)
+
+      await featurePage.createFeature(data)
+
+      const exists = await featurePage.featureExists(data.name)
+      expect(exists).toBe(true)
+    })
+  })
+
+  test.describe('Deletion', () => {
+    test('should delete item', async ({ featurePage }) => {
+      const data = createFeatureData({ name: `Delete Test ${Date.now()}` })
+      // Track before creating: if the delete below fails, afterEach still removes the item
+      // (cleanupFeatures skips names that no longer exist)
+      createdItems.push(data.name)
+      await featurePage.createFeature(data)
+
+      let exists = await featurePage.featureExists(data.name)
+      expect(exists).toBe(true)
+
+      await featurePage.deleteFeature(data.name)
+
+      exists = await featurePage.featureExists(data.name)
+      expect(exists).toBe(false)
+    })
+  })
+})
+```
+
+### 3d. Register the Page Object in Fixtures
+
+If creating a brand new feature, add it to `core/fixtures/base.fixture.ts`:
+
+1. Import the page class
+2. Add to `BifrostFixtures` type
+3. Add fixture definition
+
+```typescript
+// In base.fixture.ts
+import { FeaturePage } from '../../features/<feature>/pages/<feature>.page'
+
+type BifrostFixtures = {
+  // ... existing
+  featurePage: FeaturePage
+}
+
+export const test = base.extend<BifrostFixtures>({
+  // ... existing
+  featurePage: async ({ page }, use) => {
+    await use(new FeaturePage(page))
+  },
+})
+```
+
+### 3e. Add npm Script (optional)
+
+In `tests/e2e/package.json`, add a feature-specific test script:
+```json
+{
+  "scripts": {
+    "test:<feature>": "playwright test features/<feature>"
+  }
+}
+```
+
+## Step 4: Run Tests
+
+```bash
+# From tests/e2e/ directory
+cd tests/e2e
+
+# Run all tests
+npx playwright test
+
+# Run specific feature
+npx playwright test features/<feature>
+
+# Run in headed mode (see the browser)
+npx playwright test features/<feature> --headed
+
+# Run with Playwright UI (interactive)
+npx playwright test --ui
+
+# Run with debug inspector
+npx playwright test features/<feature> --debug
+
+# Run a single test by title
+npx playwright test -g "should create a basic item"
+
+# From project root via Makefile
+make run-e2e FLOW=<feature>
+make run-e2e-headed FLOW=<feature>
+```
+
+**Environment variables:**
+- `BASE_URL` - Override app URL (default: http://localhost:3000)
+- `BIFROST_BASE_URL` - Override Bifrost API URL (default: http://localhost:8080)
+- `SKIP_WEB_SERVER=1` - Skip auto-starting Vite dev server
+- `CI=1` - Enable CI mode (retries, serial execution)
+
+## Step 5: Debug Failing Tests
+
+### Common Issues and Fixes
+
+**1. Element not found / timeout:**
+- Check if the `data-testid` attribute exists in the UI component
+- Verify the page has loaded: add `await waitForNetworkIdle(page)` after navigation
+- Check for loading spinners: `await page.getByTestId('loading-spinner').waitFor({ state: 'hidden' })`
+
+**2. Toast interfering with clicks:**
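+- Dismiss toasts before interactions using the page object's `BasePage` helpers (these are not methods on Playwright's `page`): `await featurePage.dismissToasts()` or `await featurePage.forceCloseToasts()`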
+
+**3. Sheet/dialog animation not complete:**
+- Wait for animation via the page object: `await featurePage.waitForSheetAnimation()`
+- Wait for sheet visibility: `await expect(featurePage.sheet).toBeVisible()`
+
+**4. Stale element after table refresh:**
+- Re-query the locator after data changes (don't reuse old locator references)
+- Use polling patterns like `waitForVirtualKeyGone()` for eventual consistency
+
+**5. Tests pass individually but fail together:**
+- Check cleanup: ensure `afterEach` deletes all created resources
+- Use unique names with `Date.now()` timestamps
+- Check for state pollution between test files
+
+**6. Flaky Radix/Shadcn Select:**
+- Use the `fillSelect()` helper from `core/utils/test-helpers.ts`
+- Wait for `[role="listbox"]` to appear before clicking options
+- Wait for `[role="listbox"]` to disappear after selection
+
+### Debugging Tools
+
+```bash
+# View the HTML report of the last run
+npx playwright show-report
+
+# Generate test code by recording browser actions
+npx playwright codegen http://localhost:3000
+
+# Run with trace viewer (records every action)
+npx playwright test --trace on
+```
+
+### Trace and Screenshots
+
+- **Screenshots:** Taken automatically on failure, saved to `test-results/`
+- **Traces:** Captured on first retry, viewable via `npx playwright show-trace <trace.zip>`
+- **Videos:** Retained on failure, saved to `test-results/`
+
+## Mandatory Rules
+
+These rules MUST be followed at all times:
+
+### Selectors
+- **ALWAYS** use `data-testid` attributes as the primary selector strategy
+- **ALWAYS** use `page.getByTestId()` or `page.getByRole()` - never raw CSS selectors for interactive elements
+- **NEVER** use chained parent locators (`.locator('..')`)
+- **NEVER** use `{ force: true }` on clicks - fix the underlying visibility issue instead
+- If a needed `data-testid` doesn't exist in the UI, add it to the UI component first
+
+### Waits
+- **ALWAYS** use semantic waits (`waitFor()`, `expect().toBeVisible()`, `waitForLoadState()`)
+- **NEVER** use `page.waitForTimeout()` except as a last resort in cleanup/polling (and document why)
+- **ALWAYS** wait for page load after navigation: `await waitForNetworkIdle(page)`
+- **ALWAYS** wait for sheet animations before interacting with sheet contents
+
+### Test Data
+- **ALWAYS** use `Date.now()` or `TestDataFactory.uniqueId()` for unique test names
+- **NEVER** use static/hardcoded test data names (causes collisions in parallel runs)
+- **ALWAYS** create test data factory functions in `<feature>.data.ts`
+
+### Cleanup
+- **ALWAYS** track created resources in arrays and delete them in `afterEach`
+- **ALWAYS** close open sheets before cleanup using the page object: `await featurePage.closeSheet()`
+- **ALWAYS** dismiss toasts before interactive operations
+- Cleanup failures should `console.error` and continue, never throw
+
+### Assertions
+- **ALWAYS** write deterministic assertions that can actually fail
+- **NEVER** write `expect(count >= 0).toBe(true)` - this always passes
+- Use count-based branching for state-dependent assertions:
+  ```typescript
+  if (count === 0) {
+    await expect(emptyState).toBeVisible()
+  } else {
+    // `expect(count).toBeGreaterThan(0)` would be always true in this branch
+    await expect(emptyState).toBeHidden()
+  }
+  ```
+
+### Imports
+- **ALWAYS** import `test` and `expect` from `../../core/fixtures/base.fixture`
+- **NEVER** import directly from `@playwright/test` in spec files (use the custom fixture)
+
+### Adding data-testid to UI
+
+When a UI component is missing a required `data-testid`, add it directly. Convention:
+```
+data-testid="<entity>-<element>-<qualifier>"
+
+Examples:
+  data-testid="vk-row-{name}"           # Virtual key table row
+  data-testid="vk-edit-btn-{name}"      # Edit button for specific VK
+  data-testid="vk-delete-btn-{name}"    # Delete button for specific VK
+  data-testid="create-vk-btn"           # Create button
+  data-testid="vk-sheet"                # Virtual key form sheet
+  data-testid="vk-name-input"           # Name input in VK form
+  data-testid="vk-save-btn"             # Save button in VK form
+```
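+
+To keep ids consistent between UI components and page objects, the entity-first form of the convention could be generated by a small shared helper. The sketch below is an assumption for illustration only: `testId` is a hypothetical function, not something that exists in the codebase, and it does not cover action-first ids such as `create-vk-btn`.
+
+```typescript
+// Hypothetical helper: builds ids in the <entity>-<element>-<qualifier> shape.
+// The qualifier is optional, matching ids such as "vk-sheet".
+function testId(entity: string, element: string, qualifier?: string): string {
+  const parts = qualifier === undefined ? [entity, element] : [entity, element, qualifier]
+  return parts.join('-')
+}
+
+console.log(testId('vk', 'sheet')) // "vk-sheet"
+console.log(testId('vk', 'row', 'my-key')) // "vk-row-my-key"
+console.log(testId('vk', 'delete-btn', 'my-key')) // "vk-delete-btn-my-key"
+```
+
+A page object could then call `page.getByTestId(testId('vk', 'row', name))` so that the id format is defined in one place.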
+
+## Available BasePage Methods
+
+Every page object inherits these from `BasePage` (`core/pages/base.page.ts`):
+
+| Method | Description |
+|--------|-------------|
+| `waitForPageLoad()` | Wait for `networkidle` load state |
+| `waitForChartsToLoad()` | Wait for charts/data and skeletons to disappear |
+| `getToast(type?)` | Get toast locator (success/error/loading/default) |
+| `waitForSuccessToast(message?)` | Wait for success toast, optionally match message |
+| `waitForErrorToast(message?)` | Wait for error toast, optionally match message |
+| `waitForToastsToDisappear(timeout?)` | Wait for all toasts to be gone |
+| `dismissToasts()` | Wait for toasts to auto-dismiss |
+| `forceCloseToasts()` | Click away + wait to force-dismiss toasts |
+| `waitForSheetAnimation()` | Wait for sheet/dialog open animation to complete |
+| `waitForStateChange(locator, attr, val)` | Wait for element attribute to match |
+| `fillByLabel(label, value)` | Fill input by its label |
+| `fillByPlaceholder(placeholder, value)` | Fill input by its placeholder |
+| `fillByTestId(testId, value)` | Fill input by data-testid |
+| `clickButton(text)` | Click button by visible text |
+| `clickByTestId(testId)` | Click element by data-testid |
+| `closeDevProfiler()` | Dismiss the Dev Profiler overlay if visible |
+
+## Available Test Helpers (`core/utils/test-helpers.ts`)
+
+| Helper | Description |
+|--------|-------------|
+| `waitForNetworkIdle(page, timeout?)` | Wait for network idle state |
+| `wait(ms)` | Sleep for ms (use sparingly) |
+| `retry(fn, { retries, delay })` | Retry async function with backoff |
+| `randomString(length?)` | Generate random alphanumeric string |
+| `uniqueTestName(prefix)` | Generate unique name with timestamp + random suffix |
+| `assertToast(page, text, type)` | Assert toast appears with text |
+| `assertUrl(page, pattern)` | Assert page URL matches pattern |
+| `fillSelect(page, triggerSelector, optionText)` | Fill Radix/Shadcn Select component |
+| `fillMultiSelect(page, inputSelector, values)` | Fill multi-select with array of values |
+| `clearAndFill(page, selector, value)` | Atomically clear and fill input |
+| `getTableRowCount(page, tableSelector)` | Get count of table body rows |
+| `tableContainsRow(page, tableSelector, text)` | Check if table has row with text |
+| `waitForTableLoad(page, tableSelector)` | Wait for table visible + spinner gone |
+| `screenshotOnError(page, testName, fn)` | Auto-screenshot wrapper for debugging |
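+
+For orientation, helpers with the `retry` and `uniqueTestName` signatures listed above might look roughly like the sketch below. This is an assumption, not the contents of `core/utils/test-helpers.ts`: the exponential backoff policy, the meaning of `retries` (attempts after the first), and the suffix format are all illustrative choices.
+
+```typescript
+interface RetryOptions {
+  retries: number // additional attempts after the first one (assumed meaning)
+  delay: number // base delay in ms, doubled after each failure (assumed policy)
+}
+
+// Runs fn until it resolves, or rethrows the last error once attempts run out.
+async function retry<T>(fn: () => Promise<T>, { retries, delay }: RetryOptions): Promise<T> {
+  let lastError: unknown
+  for (let attempt = 0; attempt <= retries; attempt++) {
+    try {
+      return await fn()
+    } catch (error) {
+      lastError = error
+      if (attempt < retries) {
+        // Exponential backoff: delay, 2*delay, 4*delay, ...
+        await new Promise<void>((resolve) => setTimeout(resolve, delay * 2 ** attempt))
+      }
+    }
+  }
+  throw lastError
+}
+
+// Timestamp plus a short random suffix, so names created in the same
+// millisecond by parallel workers are still unlikely to collide.
+function uniqueTestName(prefix: string): string {
+  const suffix = Math.random().toString(36).slice(2, 8)
+  return `${prefix}-${Date.now()}-${suffix}`
+}
+```
+
+The random suffix is the reason to prefer a helper like this over a bare `Date.now()` when tests run in parallel.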
+
+## Available API Helpers (`core/actions/api.ts`)
+
+For programmatic setup/cleanup via API (bypassing UI):
+
+| API | Methods |
+|-----|---------|
+| `providersApi` | `getAll`, `get`, `create`, `update`, `delete` |
+| `virtualKeysApi` | `getAll`, `get`, `create`, `update`, `delete` |
+| `teamsApi` | `getAll`, `create`, `delete` |
+| `customersApi` | `getAll`, `create`, `delete` |
+| `cleanupTestData(request, { virtualKeyIds, teamIds, customerIds, providerNames })` | Bulk cleanup |
--- a/.claude/skills/expect
+++ b/.claude/skills/expect
@@ -0,0 +1 @@
+../../.agents/skills/expect
--- a/.claude/skills/investigate-issue/SKILL.md
+++ b/.claude/skills/investigate-issue/SKILL.md
@@ -0,0 +1,648 @@
+---
+name: investigate-issue
+description: Investigate a GitHub issue by fetching details, analyzing the codebase, researching documentation, and presenting an actionable implementation plan with test guidance. Use when asked to investigate, analyze, triage, or plan work for a GitHub issue. Invoked with /investigate-issue <ISSUE_ID> or /investigate-issue (prompts for ID).
+allowed-tools: Read, Grep, Glob, Bash, WebSearch, WebFetch, mcp__context7__resolve-library-id, mcp__context7__query-docs, Task, AskUserQuestion, TodoWrite, Edit, Write
+---
+
+# Investigate GitHub Issue
+
+Fetch a GitHub issue, analyze the report, search the codebase for relevant code, research external documentation, and present a comprehensive implementation plan with side-effect analysis and test guidance. 
+
+**Your final report MUST contain all of these sections:**
+1. Issue Details (from Step 1)
+2. Classification (from Step 2)
+3. Codebase Analysis + Documentation Research (from Step 3, including sub-step 3e)
+4. Impact Analysis (from Step 4)
+5. Test Plan (from Step 5)
+6. Full Presentation (Step 6 template)
+
+If any section is missing, go back and complete it before presenting the report.
+
+## Usage
+
+```
+/investigate-issue <ISSUE_ID>           # Investigate issue by number
+/investigate-issue                      # Prompts for issue ID
+```
+
+## Workflow Overview
+
+1. **Get the issue** -- Fetch full issue details from GitHub
+2. **Classify the issue** -- Determine type (bug, feature, docs) and affected areas
+3. **Search the codebase and research docs** -- Find relevant code, then research the libraries it depends on via Context7 and WebSearch
+4. **Analyze impact** -- Cross-reference codebase findings with documentation to identify side effects, dependencies, and breaking changes
+5. **Suggest tests** -- If changes touch `core/`, recommend specific LLM and MCP test additions
+6. **Present the plan** -- Show findings and recommended changes to the user
+7. **Implement with approval** -- After plan approval, make changes one at a time with user confirmation
+
+## Step 1: Fetch the Issue
+
+If no issue ID is provided, ask the user:
+```
+What is the GitHub issue number you want to investigate?
+```
+
+The repository is always `maximhq/bifrost`.
+
+Fetch full issue details:
+```bash
+# Get issue with all metadata
+gh issue view <ISSUE_ID> --repo maximhq/bifrost --json number,title,body,labels,assignees,state,comments,author,createdAt,updatedAt
+
+# Get issue comments for additional context
+gh issue view <ISSUE_ID> --repo maximhq/bifrost --json comments --jq '.comments[].body'
+```
+
+If the issue does not exist or `gh` fails:
+- Check authentication: `gh auth status`
+- Verify the issue number is valid: `gh issue list --repo maximhq/bifrost --limit 5 --json number,title`
+- Report the error and ask the user for a corrected issue ID
+
+## Step 2: Classify the Issue
+
+### 2a. Determine Issue Type
+
+Parse the issue title prefix and labels to classify:
+
+| Title Prefix | Label | Type | Investigation Focus |
+|---|---|---|---|
+| `[Bug]:` | `bug` | Bug | Reproduce, find root cause, identify fix |
+| `[Feature]:` | `enhancement` | Feature | Design approach, find insertion points |
+| `[Docs]:` | `documentation` | Docs | Find affected doc pages, verify accuracy |
+| (none) | (any) | General | Read body carefully, infer type from content |
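+
+The table above can be read as a small lookup. The sketch below shows one way to express it; `classifyIssue` is a hypothetical function, and the precedence (title prefix first, then labels) is an assumption, since the table does not say which wins when they disagree.
+
+```typescript
+type IssueType = 'Bug' | 'Feature' | 'Docs' | 'General'
+
+// Title prefixes and labels taken from the classification table.
+const RULES: Array<{ prefix: string; label: string; type: IssueType }> = [
+  { prefix: '[Bug]:', label: 'bug', type: 'Bug' },
+  { prefix: '[Feature]:', label: 'enhancement', type: 'Feature' },
+  { prefix: '[Docs]:', label: 'documentation', type: 'Docs' },
+]
+
+function classifyIssue(title: string, labels: string[]): IssueType {
+  const trimmed = title.trim()
+  // Assumption: the title prefix wins over labels when both are present.
+  const byPrefix = RULES.find((rule) => trimmed.startsWith(rule.prefix))
+  if (byPrefix) return byPrefix.type
+  const byLabel = RULES.find((rule) => labels.includes(rule.label))
+  // No prefix and no known label: fall back to reading the body.
+  return byLabel ? byLabel.type : 'General'
+}
+
+console.log(classifyIssue('[Bug]: streaming hangs', [])) // "Bug"
+console.log(classifyIssue('Add retries', ['enhancement'])) // "Feature"
+console.log(classifyIssue('Question about config', [])) // "General"
+```
+
+A `General` result only means the type could not be determined mechanically; the issue body still has to be read to infer it.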
+
+### 2b. Determine Affected Areas
+
+Use the issue's labels and body content to map to codebase areas. The issue templates include an "Affected area(s)" field with these values:
+
+| Area Label | Codebase Directories | Key Files |
+|---|---|---|
+| Core (Go) | `core/`, `core/schemas/`, `core/providers/`, `core/mcp/` | `core/bifrost.go`, `core/utils.go` |
+| Framework | `framework/`, `framework/configstore/`, `framework/logstore/` | `framework/config.go`, `framework/list.go` |
+| Transports (HTTP) | `transports/bifrost-http/` | `transports/bifrost-http/` |
+| Plugins | `plugins/` (governance, jsonparser, litellmcompat, etc.) | Plugin-specific `go.mod` files |
+| UI (React) | `ui/`, `ui/app/workspace/`, `ui/components/` | Feature-specific workspace pages |
+| Docs | `docs/` | `docs/docs.json`, feature-specific `.mdx` files |
+
+If the issue body mentions specific providers (e.g., "openai", "anthropic", "gemini"), also search:
+```bash
+ls core/providers/
+# Maps to: core/providers/<provider_name>/
+```
+
+If the issue mentions MCP, agents, or tools, also search:
+```bash
+ls core/mcp/
+# Key: core/mcp/agent.go, core/mcp/codemode.go, core/mcp/toolmanager.go
+```
+
+### 2c. Extract Key Information
+
+From the issue body, extract and summarize:
+- **What is reported** -- The specific problem or request
+- **Reproduction steps** -- If bug, how to trigger it
+- **Expected vs actual behavior** -- What should happen vs what happens
+- **Version info** -- Which version is affected
+- **Environment details** -- OS, Go version, Node version, etc.
+- **Code snippets** -- Any code the reporter included
+- **Error messages** -- Stack traces, logs, error output
+
+## Step 3: Search the Codebase
+
+Systematically search for all code relevant to the issue. Use multiple search strategies:
+
+### 3a. Keyword Search
+
+Extract key terms from the issue and search:
+```bash
+# Search for error messages mentioned in the issue
+grep -rn "exact error message" core/ framework/ transports/ ui/
+
+# Search for function/type names mentioned
+grep -rn "FunctionName\|TypeName" --include='*.go' core/
+grep -rn "componentName\|functionName" --include='*.ts' --include='*.tsx' ui/
+
+# Search for API endpoints mentioned
+grep -rn "/api/endpoint" transports/ ui/
+```
+
+### 3b. Structural Search by Area
+
+Based on the affected area, do targeted exploration:
+
+**For Core (Go) issues:**
+```bash
+# Find the specific provider if mentioned
+ls core/providers/<provider>/
+grep -rn "relevantFunction" core/providers/<provider>/
+
+# Check schemas for relevant types
+grep -rn "TypeName" core/schemas/ --include='*.go'
+
+# Check the main bifrost.go for relevant handlers
+grep -n "functionName\|handlerName" core/bifrost.go
+```
+
+**For MCP/Agent issues:**
+```bash
+# Search agent code
+grep -rn "keyword" core/mcp/ --include='*.go'
+
+# Check codemode if relevant
+ls core/mcp/codemode/
+grep -rn "keyword" core/mcp/codemode/ --include='*.go'
+```
+
+**For UI issues:**
+```bash
+# Find the workspace page
+ls ui/app/workspace/<feature>/
+
+# Search for components
+grep -rn "keyword" ui/app/workspace/<feature>/ --include='*.tsx' --include='*.ts'
+grep -rn "keyword" ui/components/ --include='*.tsx' --include='*.ts'
+
+# Check for data-testid attributes (relevant for E2E impact)
+grep -rn 'data-testid' ui/app/workspace/<feature>/ --include='*.tsx'
+```
+
+**For Framework issues:**
+```bash
+grep -rn "keyword" framework/ --include='*.go'
+ls framework/configstore/ framework/logstore/ framework/plugins/
+```
+
+**For Plugin issues:**
+```bash
+# Identify which plugin
+ls plugins/
+grep -rn "keyword" plugins/<plugin_name>/ --include='*.go'
+```
+
+**For Docs issues:**
+```bash
+# Find the affected documentation page
+find docs/ -name "*.mdx" | head -30
+grep -rn "keyword" docs/ --include='*.mdx'
+```
+
+### 3c. Dependency Tracing
+
+For any function or type identified as needing changes, trace its callers and dependents:
+```bash
+# Find all callers of a function
+grep -rn "FunctionName(" --include='*.go' core/ framework/ transports/ plugins/
+
+# Find references to an interface (Go interfaces are satisfied implicitly, so also grep for its method names to find implementations)
+grep -rn "InterfaceName" --include='*.go' core/ framework/
+
+# Find all imports of a package
+grep -rn '"github.com/maximhq/bifrost/core/schemas"' --include='*.go' .
+```
+
+### 3d. Find Related Tests
+
+```bash
+# Find existing tests for the affected code
+grep -rn "TestFunctionName\|Test.*Relevant" --include='*_test.go' core/ framework/ transports/
+
+# Check LLM tests
+grep -rn "keyword" core/internal/llmtests/ --include='*.go'
+
+# Check MCP tests
+grep -rn "keyword" core/internal/mcptests/ --include='*_test.go'
+
+# Check E2E tests if UI is affected
+grep -rn "keyword" tests/e2e/ --include='*.ts'
+```
+
+### 3e. Research External Documentation
+
+Now that you've found the relevant code, research the external libraries it depends on. This informs the impact analysis in Step 4.
+
+**Identify libraries from the code you found:**
+```bash
+# Check go.mod for dependencies relevant to the issue
+# (skips blank lines and any line containing a // comment, including "// indirect" entries)
+grep -v -e '^$' -e '//' core/go.mod
+
+# For UI issues, check package.json
+jq '.dependencies, .devDependencies' ui/package.json
+```
+
+**Query Context7 for each relevant library:**
+```
+# Step 1: Resolve the library ID
+mcp__context7__resolve-library-id(
+  libraryName: "<library name from go.mod or package.json>",
+  query: "<issue title + key terms>"
+)
+
+# Step 2: Query the docs with the resolved ID
+mcp__context7__query-docs(
+  libraryId: "<result from step 1>",
+  query: "<specific question about behavior relevant to the issue>"
+)
+```
+
+Common libraries: `mark3labs/mcp-go` (MCP protocol), `stretchr/testify` (test assertions), `react` (UI framework), `playwright` (E2E testing), provider SDKs (OpenAI, Anthropic, etc.)
+
+**Search the web for additional context:**
+```
+WebSearch: "<error message or behavior from the issue> <library name>"
+WebSearch: "<library name> best practices <relevant pattern>"
+```
+
+**Record your findings in this table** (you will copy it into the final report in Step 6):
+
+| Source | Query Used | Key Finding | Link |
+|--------|-----------|-------------|------|
+| Context7: `<library>` | `<query>` | `<what was learned, or "No relevant docs found">` | `<link if available>` |
+| WebSearch | `<query>` | `<what was learned, or "No relevant results">` | `<URL>` |
+
+If no useful results are found for a library, still include a row with "No relevant documentation found" and note what was searched. This transparency helps the user understand research coverage.
+
+## Step 4: Analyze Impact
+
+Using both the codebase search results (Steps 3a-3d) and the documentation research (Step 3e), analyze the impact of the proposed changes. For each file, note whether library documentation revealed any constraints or best practices that affect the implementation.
+
+### 4a. Direct Changes Required
+
+For each file that needs modification, document:
+- **File path** (absolute)
+- **What to change** (function, type, handler, component)
+- **Why** (ties back to the issue)
+- **How** (specific approach -- add parameter, modify logic, new function, etc.)
+- **Library constraints** (from Step 3e research -- any API contracts, deprecations, or version requirements)
+
+### 4b. Side Effects Analysis
+
+For each proposed change, trace the blast radius:
+
+**Code side effects:**
+```bash
+# Who calls this function?
+grep -rn "FunctionToChange(" --include='*.go' core/ framework/ transports/ plugins/
+
+# Who uses this type?
+grep -rn "TypeToChange" --include='*.go' core/ framework/ transports/ plugins/
+
+# What tests exercise this code?
+grep -rn "FunctionToChange\|TestRelated" --include='*_test.go' core/ framework/ transports/
+```
+
+**Schema side effects (if changing types in core/schemas/):**
+- Check all providers that implement the schema
+- Check framework code that serializes/deserializes the type
+- Check UI code that consumes the API response
+- Check plugin code that uses the schema
+
+**API side effects (if changing endpoints):**
+- Check all UI pages that call the endpoint
+- Check E2E tests that hit the endpoint
+- Check documentation that references the endpoint
+
+**UI side effects (if changing components):**
+- Check all pages that use the component
+- Check E2E tests with selectors targeting the component
+- Check if data-testid attributes change
+
+### 4c. Breaking Change Assessment
+
+Classify the change:
+- **Non-breaking** -- Internal refactor, bug fix with same API surface
+- **Minor breaking** -- New optional parameter or field with a default value, deprecation of an existing API
+- **Major breaking** -- Changed API signature, removed field, behavioral change
+
+## Step 5: Test Recommendations
+
+### 5a. General Test Guidance
+
+For ANY code change, identify:
+- Existing tests that need updating
+- New test cases that should be added
+- Edge cases to cover
+
+### 5b. LLM Tests (when changes touch `core/` or `core/providers/`)
+
+The LLM test infrastructure lives in `core/internal/llmtests/`. Key patterns:
+
+**Test structure:**
+- Each scenario is a self-contained Go file with a `Run{Scenario}Test()` function
+- Signature: `func Run{Scenario}Test(t *testing.T, client *bifrost.Bifrost, ctx context.Context, testConfig ComprehensiveTestConfig)`
+- Tests run against both Chat Completions and Responses APIs (dual-API testing)
+
+**How to add a new LLM test:**
+1. Create a new file in `core/internal/llmtests/` named after the scenario (e.g., `new_scenario.go`)
+2. Implement `RunNewScenarioTest()` following the signature pattern
+3. Register it in `core/internal/llmtests/tests.go` by adding to the `testScenarios` slice
+4. Add a `Scenarios` flag in `ComprehensiveTestConfig` to enable/disable it
+5. Use `validation_presets.go` expectations (e.g., `BasicChatExpectations()`, `ToolCallExpectations()`)
+6. Use the retry framework from `test_retry_framework.go` for validation failures
+
+**Example test recommendation format:**
+```
+New LLM test: Run{X}Test in core/internal/llmtests/{x}.go
+- Scenario: <what it tests>
+- Expectations: <which validation preset to use>
+- Dual-API: Yes, test both Chat Completions and Responses API paths
+- Register in: tests.go testScenarios slice
+- Config flag: Scenarios.{X} in ComprehensiveTestConfig
+```
+
+**Running LLM tests:**
+```bash
+make test-core PROVIDER=<provider> TESTCASE=<TestName>
+make test-core PROVIDER=<provider> PATTERN=<substring>
+```
+
+### 5c. MCP Tests (when changes touch `core/mcp/`)
+
+The MCP test infrastructure lives in `core/internal/mcptests/`. Key patterns:
+
+**Test structure:**
+- Standard Go test functions: `func TestScenario(t *testing.T)`
+- Setup via `SetupAgentTest(t, AgentTestConfig{...})` which returns `(manager, mocker, ctx)`
+- Mock LLM responses via `DynamicLLMMocker` with a response queue
+- Agent tests use `MockLLMCaller` with pre-defined response sequences
+
+**How to add a new MCP test:**
+1. Identify which test file category it belongs to:
+   - `agent_*_test.go` -- Agent loop behavior tests
+   - `tool_*_test.go` -- Tool execution tests
+   - `connection_*_test.go` -- Client connection tests
+   - `codemode_*_test.go` -- CodeMode-specific tests
+2. Create the test function following existing patterns
+3. Use `AgentTestConfig` for declarative setup:
+   ```go
+   manager, mocker, ctx := SetupAgentTest(t, AgentTestConfig{
+       InProcessTools:   []string{"echo", "calculator"},
+       AutoExecuteTools: []string{"*"},
+       MaxDepth:         5,
+   })
+   ```
+4. Use assertion helpers: `AssertAgentCompletedInTurns()`, `AssertToolExecutedInTurn()`
+5. Use fixture helpers from `fixtures.go` for mock responses
+
+**Example test recommendation format:**
+```
+New MCP test: Test{X} in core/internal/mcptests/{category}_test.go
+- Category: agent | tool | connection | codemode
+- Setup: AgentTestConfig with {tools} and {config}
+- Mock responses: {describe the LLM response sequence}
+- Assertions: {what to verify}
+```
+
+**Running MCP tests:**
+```bash
+make test-mcp TESTCASE=<TestName>
+make test-mcp TYPE=<category> PATTERN=<substring>
+```
+
+### 5d. E2E Tests (when changes touch `ui/`)
+
+If UI changes are involved, recommend E2E test updates following the patterns in `tests/e2e/`:
+- Page objects in `tests/e2e/features/<feature>/pages/`
+- Specs in `tests/e2e/features/<feature>/`
+- Reference the `/e2e-test` skill for full E2E test creation workflow
+- **Never marshal API payloads to a `Record`/`Map`** — construct payloads as object literals with fields in the intended order and pass directly to Playwright's `request.post({ data })`. Marshaling through intermediate maps can reorder fields, breaking backend validation and snapshot comparisons.
+
+```bash
+make run-e2e FLOW=<feature>
+```
+
+## Step 6: Present Findings
+
+Present everything to the user in this structured format:
+
+```
+## Issue Investigation: #<ID> -- <Title>
+
+### Issue Classification
+- **Type:** Bug / Feature / Docs
+- **Severity:** <from issue if present>
+- **Affected areas:** <list>
+- **Labels:** <list>
+
+### Summary
+<2-3 sentence summary of what the issue is about and what needs to happen>
+
+### Codebase Analysis
+
+#### Relevant Files Found
+| File | Relevance | What Needs to Change |
+|------|-----------|---------------------|
+| `<absolute path>` | <why this file matters> | <specific change needed> |
+| ... | ... | ... |
+
+#### Current Behavior
+<Describe what the code currently does, with code snippets>
+
+#### Root Cause (for bugs) / Design Gap (for features)
+<Explain why the issue exists>
+
+### Documentation Research
+
+Copy the research table from Step 3e here. If you skipped Step 3e, go back and do it now before presenting.
+
+#### Libraries & References Consulted
+| Source | Query Used | Key Finding | Link |
+|--------|-----------|-------------|------|
+| <from Step 3e table> | <query> | <finding> | <link> |
+
+#### How Documentation Informs the Plan
+<For each change above, note any library constraints, best practices, or API contracts discovered in Step 3e that shaped the approach. Reference specific Change numbers.>
+
+### Implementation Plan
+
+#### Changes Required (in order)
+
+**Change 1: <File path>**
+- **What:** <specific modification>
+- **Why:** <ties to issue>
+- **Function/Component:** `<name>`
+- **Approach:** <how to implement>
+
+**Change 2: <File path>**
+- ...
+
+#### Side Effects
+| Change | Affected Code | Risk | Mitigation |
+|--------|--------------|------|------------|
+| Change 1 | `<caller/dependent>` | <risk level> | <how to mitigate> |
+
+#### Breaking Changes
+- <list any breaking changes, or "None expected">
+
+### Test Plan
+
+#### Existing Tests to Update
+| Test | File | What to Change |
+|------|------|---------------|
+| `TestName` | `<path>` | <modification needed> |
+
+#### New Tests to Add
+| Test | File | What It Covers |
+|------|------|---------------|
+| `TestNewScenario` | `<path>` | <scenario description> |
+
+<If changes touch core/>
+#### LLM Test Additions
+<Specific LLM test recommendations per Section 5b format>
+
+#### MCP Test Additions
+<Specific MCP test recommendations per Section 5c format>
+</If>
+
+<If changes touch ui/>
+#### E2E Test Additions
+<Recommend using /e2e-test skill for full test creation>
+</If>
+
+### Estimated Complexity
+- **Scope:** Small (1-2 files) / Medium (3-5 files) / Large (6+ files)
+- **Risk:** Low / Medium / High
+
+---
+
+**Proceed with implementation?** (yes / no / modify plan)
+```
+
+## Step 7: Implement with Per-Change Approval
+
+Once the user approves the plan:
+
+### 7a. Create a Todo List
+
+Create a todo item for each change in the plan:
+```
+1. Change 1: <description> -- pending
+2. Change 2: <description> -- pending
+3. Update test: <description> -- pending
+4. Add new test: <description> -- pending
+5. Verify all tests pass -- pending
+```
+
+### 7b. For Each Change
+
+Before making any edit, present the change to the user:
+
+```
+## Change <N>/<Total>: <File path>
+
+**What:** <description of the change>
+
+**Current code:**
+<existing code that will be modified>
+
+**Proposed change:**
+<new code after modification>
+
+**Apply this change?** (yes / no / modify)
+```
+
+Wait for user approval before applying. If user says "no", skip and move to the next change. If user says "modify", discuss and adjust.
+
+### 7c. After All Changes
+
+Once all approved changes are applied:
+
+1. Run relevant tests:
+   ```bash
+   # For core changes
+   make test-core PROVIDER=<relevant_provider> PATTERN=<relevant_test>
+
+   # For MCP changes
+   make test-mcp PATTERN=<relevant_test>
+
+   # For framework changes
+   cd framework && go test ./...
+
+   # For UI changes
+   make run-e2e FLOW=<feature>
+   ```
+
+2. Report results to the user
+3. If tests fail, investigate and propose fixes (with approval)
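
As a rough sketch, the choice of test command can be derived from the list of changed paths. The helper name `suggest_test_commands` and the path-to-command mapping are assumptions based on the commands listed above, not something the repository provides; placeholders such as `<relevant_provider>` are left for the user to fill in.

```bash
# suggest_test_commands: read changed file paths on stdin and print the
# matching test commands, de-duplicated. Hypothetical helper; the mapping is
# an assumption drawn from the commands listed in this step.
suggest_test_commands() {
  while IFS= read -r path; do
    case $path in
      # core/mcp/ must be matched before the broader core/ pattern.
      core/mcp/*)  echo 'make test-mcp PATTERN=<relevant_test>' ;;
      core/*)      echo 'make test-core PROVIDER=<relevant_provider> PATTERN=<relevant_test>' ;;
      framework/*) echo 'cd framework && go test ./...' ;;
      ui/*)        echo 'make run-e2e FLOW=<feature>' ;;
    esac
  done | sort -u
}
```

One possible use is `git diff --name-only | suggest_test_commands`, which prints one command per touched area.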
+
+## Error Handling
+
+### Issue Not Found
+```
+Issue #<ID> was not found in maximhq/bifrost.
+- Verify the issue number is correct
+- Run: gh issue list --repo maximhq/bifrost --limit 10 --json number,title
+```
+
+### gh CLI Not Authenticated
+```
+GitHub CLI is not authenticated. Run:
+  gh auth login
+Then retry /investigate-issue <ID>
+```
+
+### No Relevant Code Found
+If codebase search yields no results:
+1. Broaden the search terms
+2. Search for related concepts instead of exact matches
+3. Ask the user for more context about where the code might live
+4. Check if this is a net-new feature with no existing code
+
+### External Documentation Not Found
+If Context7 or WebSearch yield no useful results in Step 3e:
+1. Still include a row in the research table with "No relevant documentation found" and note what was searched
+2. Proceed with codebase analysis alone
+3. Flag areas where documentation review might be needed before implementation
+
+## Project Directory Reference
+
+Quick reference for navigating the Bifrost codebase:
+
+```
+bifrost/
+├── core/                          # Go core library
+│   ├── bifrost.go                 # Main Bifrost implementation (~195K)
+│   ├── schemas/                   # All Go types/schemas
+│   ├── providers/                 # Provider implementations (openai, anthropic, gemini, etc.)
+│   ├── mcp/                       # MCP protocol implementation
+│   │   ├── agent.go               # Agent mode
+│   │   ├── codemode/              # CodeMode (Starlark-based)
+│   │   └── toolmanager.go         # Tool management
+│   └── internal/
+│       ├── llmtests/              # LLM integration tests (~48 files)
+│       │   ├── setup.go           # Test initialization
+│       │   ├── tests.go           # Test orchestrator (scenario registry)
+│       │   ├── validation_presets.go  # Reusable expectations
+│       │   └── test_retry_framework.go  # Retry logic
+│       └── mcptests/              # MCP/Agent tests (~40 files)
+│           ├── setup_test.go      # Test infrastructure
+│           ├── agent_test_helpers.go  # AgentTestConfig + SetupAgentTest
+│           └── fixtures.go        # Mock servers & fixtures
+├── framework/                     # Framework layer
+│   ├── configstore/               # Configuration storage
+│   ├── logstore/                  # Log storage
+│   ├── plugins/                   # Plugin system
+│   └── streaming/                 # Streaming utilities
+├── transports/
+│   └── bifrost-http/              # HTTP transport + Docker
+├── ui/                            # React + Vite UI
+│   ├── app/workspace/             # Feature pages
+│   └── components/                # Shared components
+├── plugins/                       # Go plugins (governance, otel, etc.)
+├── docs/                          # Mintlify documentation
+├── tests/e2e/                     # Playwright E2E tests
+└── Makefile                       # Build & test commands
+```
+
+## Makefile Test Commands Reference
+
+**FOLLOW THIS EXACTLY TO RUN TESTS**
+
+```bash
+make test-core PROVIDER=<name>              # Run core tests for a provider
+make test-core PROVIDER=<name> TESTCASE=<X> # Run specific test
+make test-core PROVIDER=<name> PATTERN=<X>  # Run tests matching pattern
+make test-mcp                               # Run all MCP tests
+make test-mcp TYPE=<category>               # Run MCP tests by category
+make test-mcp TESTCASE=<TestName>           # Run specific MCP test
+make run-e2e FLOW=<feature>                 # Run E2E tests for feature
+make run-e2e                                # Run all E2E tests
+```
--- a/.claude/skills/resolve-pr-comments/SKILL.md
+++ b/.claude/skills/resolve-pr-comments/SKILL.md
@@ -0,0 +1,306 @@
+
+---
+name: resolve-pr-comments
+description: Resolve all unresolved PR comments interactively. Makes local edits only—NEVER commits or pushes. Use when asked to resolve PR comments, address review feedback, handle CodeRabbit comments, or fix PR review issues. Invoked with /resolve-pr-comments <PR_NUMBER> or /resolve-pr-comments <owner/repo> <PR_NUMBER>.
+allowed-tools: Read, Grep, Glob, Bash, Edit, Write, WebFetch, Task, AskUserQuestion, TodoWrite
+---
+
+# Resolve PR Comments
+
+An interactive workflow to systematically address all unresolved PR review comments.
+
+## Usage
+
+```
+/resolve-pr-comments <PR_NUMBER>
+/resolve-pr-comments <owner/repo> <PR_NUMBER>
+```
+
+If no repo is specified, uses the current git repository's remote origin.
+
+**Before starting the workflow:** if the session is in Plan Mode, ask whether the user wants to switch to default mode so the comments can be resolved one by one. Mention that each comment resolution includes its own planning step.
+
+## Workflow Overview
+
+1. **Detect repository** - Get owner/repo from git remote or user input
+2. **Fetch unresolved comments** - Use GitHub GraphQL API (REST doesn't expose resolved status). Paginate through `reviewThreads` (cursor-based) so all pages are checked when a PR has more than 100 threads.
+3. **Create tracking file** - Maintain state across the session
+4. **For each comment**:
+   - Get full details and any existing replies
+   - Show the existing code as a proper diff view
+   - Before suggesting a fix, research the relevant documentation, then present the findings and relevant links to the user together with the fix. Use Context7. **MAKE SURE YOU DO THIS ALWAYS**
+   - Present to user with options (FIX, REPLY, SKIP)
+   - Wait for user decision
+   - Execute the action
+   - Update tracking
+5. **Verify resolution** - Check remaining unresolved count
+6. **Repeat until done** - Continue until all comments resolved
+
+## Step 1: Detect Repository
+
+If repository not provided, detect from git remote:
+
+```bash
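+# Strip everything up to "github.com:" or "github.com/", then a trailing ".git"
+# (also works for repo names that contain dots, e.g. owner/foo.js)
+git remote get-url origin | sed -E 's|^.*github\.com[:/]||; s|\.git$||'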
+```
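The GraphQL queries in Step 2 take the owner and the repository name as separate values, so the detected slug still has to be split. A minimal sketch, assuming a hypothetical helper named `parse_repo_slug` and a sample SSH remote URL:

```bash
# parse_repo_slug: print "owner/repo" for a GitHub remote URL (SSH or HTTPS).
# Hypothetical helper for illustration; not part of the skill.
parse_repo_slug() {
  printf '%s\n' "$1" | sed -E 's|^.*github\.com[:/]||; s|\.git$||'
}

slug=$(parse_repo_slug "git@github.com:maximhq/bifrost.git")
OWNER=${slug%%/*}   # text before the first "/"
REPO=${slug#*/}     # text after the first "/"
echo "owner=$OWNER repo=$REPO"   # prints: owner=maximhq repo=bifrost
```

In practice the argument would be the output of `git remote get-url origin` rather than a literal URL.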
+
+## Step 2: Fetch Unresolved Comments (GraphQL)
+
+The REST API does NOT expose resolved/unresolved status. Use GraphQL.
+
+**Important:** `reviewThreads` returns at most 100 threads per request. PRs with many review threads (e.g. large CodeRabbit reviews) need **pagination** or you will only see the first 100 threads and miss unresolved ones on later pages. Always paginate until `pageInfo.hasNextPage` is false so the count and list are complete.
+
+### Single-page query (first 100 threads only)
+
+```bash
+gh api graphql -f query='
+{
+  repository(owner: "OWNER", name: "REPO") {
+    pullRequest(number: PR_NUMBER) {
+      reviewThreads(first: 100) {
+        pageInfo { hasNextPage endCursor }
+        nodes {
+          isResolved
+          comments(first: 1) {
+            nodes {
+              databaseId
+              path
+              body
+              author { login }
+            }
+          }
+        }
+      }
+    }
+  }
+}'
+```
+
+Extract unresolved (single page):
+```bash
+... | jq -r '.data.repository.pullRequest.reviewThreads.nodes[] | select(.isResolved == false) | "\(.comments.nodes[0].databaseId)|\(.comments.nodes[0].path)|\(.comments.nodes[0].author.login)"'
+```
+
+Avoid parsing `body` in the same jq pass if you paginate—comment bodies can contain control characters and break jq. Fetch full body per comment via REST when presenting (see Step 4).
+
+### Paginate to collect all unresolved threads
+
+Use cursor-based pagination so every thread is considered:
+
+1. First request: `reviewThreads(first: 100)` (no `after`).
+2. From the response: read `pageInfo.hasNextPage` and `pageInfo.endCursor`.
+3. Next request: `reviewThreads(first: 100, after: $endCursor)`.
+4. Append unresolved from this page to your list (id, path, author only).
+5. Repeat from step 2 until `hasNextPage` is false.
+
+Example loop (collects id|path|author for all unresolved; write to a file or variable):
+
+```bash
+CURSOR=""
+while true; do
+  if [ -z "$CURSOR" ]; then
+    RESP=$(gh api graphql -f query='
+      query {
+        repository(owner: "OWNER", name: "REPO") {
+          pullRequest(number: PR_NUMBER) {
+            reviewThreads(first: 100) {
+              pageInfo { hasNextPage endCursor }
+              nodes {
+                isResolved
+                comments(first: 1) {
+                  nodes { databaseId path author { login } }
+                }
+              }
+            }
+          }
+        }
+      }')
+  else
+    RESP=$(gh api graphql -f query='
+      query($after: String) {
+        repository(owner: "OWNER", name: "REPO") {
+          pullRequest(number: PR_NUMBER) {
+            reviewThreads(first: 100, after: $after) {
+              pageInfo { hasNextPage endCursor }
+              nodes {
+                isResolved
+                comments(first: 1) {
+                  nodes { databaseId path author { login } }
+                }
+              }
+            }
+          }
+        }
+      }' -f after="$CURSOR")
+  fi
+  # Append unresolved from this page (id|path|author only; no body to avoid control chars)
+  echo "$RESP" | jq -r '.data.repository.pullRequest.reviewThreads.nodes[] | select(.isResolved == false) | .comments.nodes[0] | "\(.databaseId)|\(.path)|\(.author.login)"'
+  HAS_NEXT=$(echo "$RESP" | jq -r '.data.repository.pullRequest.reviewThreads.pageInfo.hasNextPage')
+  CURSOR=$(echo "$RESP" | jq -r '.data.repository.pullRequest.reviewThreads.pageInfo.endCursor // ""')
+  # Stop when there are no more pages (or no cursor to continue from)
+  if [ "$HAS_NEXT" != "true" ] || [ -z "$CURSOR" ]; then break; fi
+done
+```
+
+Total unresolved count = number of lines output. Use the comment `databaseId` values to fetch full body via REST when presenting each comment (Step 4).
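
Each output line is one `id|path|author` record, so counting and per-comment iteration can be sketched roughly as below. The sample records are made-up placeholders standing in for the loop's real output.

```bash
# Sample input in the id|path|author format printed by the pagination loop.
# IDs, paths, and logins here are placeholders, not real data.
UNRESOLVED='101|core/bifrost.go|coderabbitai
102|src/foo.ts|alice
103|framework/configstore/store.go|bob'

# One unresolved thread per non-empty line.
TOTAL=$(printf '%s\n' "$UNRESOLVED" | grep -c .)
echo "Total unresolved: $TOTAL"

# Split each record on "|" and print a numbered worklist.
N=0
printf '%s\n' "$UNRESOLVED" | while IFS='|' read -r id path author; do
  N=$((N + 1))
  echo "[$N/$TOTAL] comment $id on $path (by $author)"
done
```

The `id` field is the `databaseId` that the REST calls in Step 4 expect as `COMMENT_ID`.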
+
+## Step 3: Create Tracking File
+
+Create at `/tmp/pr-review/pr-<NUMBER>-comments.md`:
+
+```markdown
+# PR #<NUMBER> Comment Review (<owner>/<repo>)
+
+## Summary
+- Total unresolved: <count>
+- Fixed: 0
+- Replied: 0
+- Skipped: 0
+
+## Comments to Address
+| # | ID | File | Issue | Status |
+|---|-----|------|-------|--------|
+| 1 | 12345 | src/foo.ts | Missing validation | pending |
+
+## Actions Taken
+| ID | Action | Details |
+|----|--------|---------|
+```
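The skeleton above could be generated from the `id|path|author` records collected in Step 2. This is only a sketch: `write_tracking_file` is a hypothetical helper, and the Issue column is filled with a placeholder because the issue summary is written after reading each comment.

```bash
# write_tracking_file DIR PR SLUG: read id|path|author records from stdin and
# write the Step 3 skeleton to DIR/pr-<PR>-comments.md, printing the path.
# Hypothetical helper; variables are global for brevity.
write_tracking_file() {
  dir=$1 pr=$2 slug=$3
  out="$dir/pr-$pr-comments.md"
  records=$(cat)
  total=$(printf '%s\n' "$records" | grep -c . || true)
  mkdir -p "$dir"
  {
    printf '# PR #%s Comment Review (%s)\n\n' "$pr" "$slug"
    printf '## Summary\n- Total unresolved: %s\n- Fixed: 0\n- Replied: 0\n- Skipped: 0\n\n' "$total"
    printf '## Comments to Address\n| # | ID | File | Issue | Status |\n|---|-----|------|-------|--------|\n'
    # One table row per non-empty record: running number, comment ID, file path.
    printf '%s\n' "$records" |
      awk -F'|' 'NF { printf "| %d | %s | %s | (to be summarized) | pending |\n", ++n, $1, $2 }'
    printf '\n## Actions Taken\n| ID | Action | Details |\n|----|--------|---------|\n'
  } > "$out"
  echo "$out"
}
```

For example, piping the Step 2 output into `write_tracking_file /tmp/pr-review <NUMBER> <owner>/<repo>` would create the file at the location named above.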
+
+## Step 4: Present Each Comment
+
+For each unresolved comment, present in this format:
+
+```
+**Comment #<N>/<TOTAL>: ID <ID> - <File>**
+
+**What it says:**
+<Summary of the comment's concern>
+
+**Current code state:**
+<Show relevant code snippet if applicable - READ THE FILE>
+
+**Documentation referenced:**
+For anything related to LLM calls (in the /core module), make sure you consult the documentation. You have access to web_search and Context7; show the findings and links here too.
+
+**Options:**
+1. **FIX** - <Describe what the fix would be>
+2. **REPLY** - <Describe the reply explaining why no fix needed>
+3. **SKIP** - Move on without action
+
+**My recommendation:** <OPTION> - <Brief reasoning>
+
+Go ahead?
+```
+
+### Getting Full Comment Details
+
+```bash
+gh api repos/OWNER/REPO/pulls/PR_NUMBER/comments --paginate | jq -r '.[] | select(.id == COMMENT_ID) | .body'
+```
+
+### Checking for Existing Replies
+
+```bash
+gh api repos/OWNER/REPO/pulls/PR_NUMBER/comments --paginate | jq '.[] | select(.id == COMMENT_ID or .in_reply_to_id == COMMENT_ID) | {id, user: .user.login, body: (.body | gsub("\n"; " ") | .[0:150])}'
+```
+
+## Step 5: Execute Actions
+
+**CRITICAL: Do NOT reply to FIX comments until the changes are pushed to the remote.** The reviewer cannot verify fixes until the code is pushed, so collect all fixes locally. This skill NEVER commits or pushes; the user handles that manually.
+
+### For FIX:
+1. Present the proposed change and get the user's approval first. DO NOT MAKE THE CHANGE before the user says yes. Also offer the option to suggest modifications to the proposed code.
+2. Once approved, make the code change using the Edit tool
+3. Track the fix locally in the tracking file (do NOT reply yet)
+4. Continue to next comment
+
+### For REPLY (non-code responses like "out of scope", "intentional design"):
+These can be posted immediately since they don't require code verification. Use the **replies** endpoint only (see below).
+
+## Reply endpoint (use this only)
+
+To reply to a review comment, use the dedicated replies endpoint. **Do not** use `POST .../pulls/PR_NUMBER/comments` with `in_reply_to` — that returns 422 (in_reply_to is not a permitted key for create review comment).
+
+```bash
+gh api repos/OWNER/REPO/pulls/PR_NUMBER/comments/COMMENT_ID/replies -X POST -f body="<your reply>"
+```
+
+- `COMMENT_ID` is the numeric comment id (same as GraphQL `databaseId` from the thread's first comment).
+- Request body: only `body` (string). No `in_reply_to`, `commit_id`, or path params.
+
+## Step 5b: Push and Reply to FIX comments
+
+After ALL comments have been addressed locally:
+
+1. Ask the user whether they have pushed these changes to the remote (yes / no)
+2. **Only after the user confirms the push succeeded**, reply to each FIX comment using the replies endpoint:
+   ```bash
+   gh api repos/OWNER/REPO/pulls/PR_NUMBER/comments/COMMENT_ID/replies -X POST -f body="Fixed - <description of change>. See updated code."
+   ```
+
+### Batch workflow (fix all → push → post all)
+
+If the user says e.g. "resolve all comments then push then post", you may:
+1. Apply all FIX and REPLY decisions locally (with user approval per comment or bulk approval).
+2. Ask user to push.
+3. After push, post all replies in sequence: FIX replies first, then any REPLY-only replies, using the same `.../comments/COMMENT_ID/replies` endpoint for each.
+
+### Common Reply Templates
+
+**Out of scope:**
+```
+This is a valid improvement but out of scope for this PR. Tracked for future work.
+```
+
+**Already addressed:**
+```
+Already addressed - <variable/file> now has <fix>. See line <N>.
+```
+
+**Intentional design:**
+```
+This is intentional. <Explanation of why the current approach is correct>.
+```
+
+**Different module:**
+```
+This comment refers to <module> which is a different module not modified in this PR. It's working as-is.
+```
+
+**Asking bot to verify:**
+```
+This is solved, can you check and resolve if done properly?
+```
+
+## Step 6: Verify Resolution
+
+After addressing comments, check remaining unresolved count. If the PR has more than 100 review threads, use the same pagination loop as in Step 2 and count unresolved across all pages; a single-page query only sees the first 100 threads.
+
+Single-page check (first 100 threads only):
+```bash
+gh api graphql -f query='...' | jq '[.data.repository.pullRequest.reviewThreads.nodes[] | select(.isResolved == false)] | length'
+```
+
+If count is 0 (across all pages), report success. If comments remain:
+- Some bots (like CodeRabbit) take time to auto-resolve
+- User may need to push code changes first
+- Re-run the workflow to address remaining comments
+
+## Important Notes
+
+1. **NEVER commit or push changes** - This skill only makes local edits. The user handles `git add`, `git commit`, and `git push` themselves. Do not run any git commit or git push commands.
+2. **NEVER reply "Fixed" until code is pushed** - The reviewer cannot verify fixes until they're on the remote. Make all fixes locally. Only reply to FIX comments after the user confirms they have pushed (the user pushes manually).
+3. **Always read the file** before suggesting fixes - understand context
+4. **Check for existing replies** in the thread before responding
+5. **Wait for user approval** on each action - never auto-fix without confirmation
+6. **Update tracking file** after each action
+7. **Some bots are slow** - CodeRabbit may take minutes to auto-resolve after push
+8. **User pushes manually** - This skill never commits or pushes; the user must push code changes before expecting auto-resolution of FIX actions
+
+## Error Handling
+
+- If `gh` not authenticated: `gh auth login`
+- If repo not found: verify owner/repo spelling
+- If PR not found: verify PR number exists
+- If comment ID invalid: re-fetch unresolved comments (may have been resolved)
+- If reply returns 422 "in_reply_to is not a permitted key": you are using the wrong endpoint. Use `POST .../pulls/PR_NUMBER/comments/COMMENT_ID/replies` with only `-f body="..."`, not the create-comment endpoint.