
southseact-3d 58bab1c5d8 Add PluginCompass Provider System documentation

This document describes the architecture and functionality of the PluginCompass AI provider management system, including:
- Admin panel structure and authentication
- Provider management with supported providers (OpenRouter, Mistral, Google, Groq, NVIDIA, Chutes, Ollama)
- Rate limiting system with per-provider and per-model limits
- Fallback system architecture with multi-level fallback chains
- Usage tracking and monitoring capabilities

The documentation covers both technical implementation details and operational guidance for managing the provider infrastructure.

2026-02-09 18:23:55 +00:00

11 KiB


PluginCompass Provider Management & Fallback System

Overview

This document describes the architecture and functionality of the PluginCompass AI provider management system, including the admin panel configuration, provider limits, usage tracking, and fallback mechanisms.

1. Admin Panel Structure

1.1 Admin Panel Sections

The PluginCompass admin panel is accessible at /admin and provides the following management areas:

Main Admin Areas:

Build Models (/admin/build) - Configure AI models available to users
Plan Models (/admin/plan) - Configure planning provider chain
Plans (/admin/plans) - Manage subscription plans and pricing
Accounts (/admin/accounts) - User account management
Affiliates (/admin/affiliates) - Affiliate program management
Withdrawals (/admin/withdrawals) - Affiliate payout management
Tracking (/admin/tracking) - Analytics and usage statistics
Resources (/admin/resources) - System resource monitoring
External Testing (/admin/external-testing) - WordPress testing configuration
Contact Messages (/admin/contact-messages) - Customer inquiries

1.2 Admin Authentication

The admin panel uses session-based authentication with the following security measures:

Credentials: Configured via environment variables ADMIN_USER and ADMIN_PASSWORD
Session Duration: 24 hours (configurable via ADMIN_SESSION_TTL_MS)
Rate Limiting: Maximum 5 login attempts per minute per IP
Account Lockout: 15-minute lockout once the login attempt limit is exceeded

API Authentication: Admin API endpoints are protected by the session cookie issued at login. The session endpoints are:

Login: POST /api/admin/login
Logout: POST /api/admin/logout
Session check: GET /api/admin/me
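The login throttle described above (5 attempts per minute, then a 15-minute lockout) can be sketched as a sliding-window counter. This is a minimal illustration, not the PluginCompass implementation: the class name LoginLimiter is hypothetical, it is keyed by client IP, and it counts every attempt. Whether the real lockout is tracked per IP or per account, and whether only failed attempts count toward it, is not stated in this document.

```python
import time
from collections import defaultdict, deque


class LoginLimiter:
    """Hypothetical per-IP throttle: at most `max_attempts` per `window_s`, then a lockout."""

    def __init__(self, max_attempts=5, window_s=60.0, lockout_s=15 * 60.0):
        self.max_attempts = max_attempts      # mirrors ADMIN_LOGIN_RATE_LIMIT=5
        self.window_s = window_s
        self.lockout_s = lockout_s
        self.attempts = defaultdict(deque)    # ip -> timestamps of recent attempts
        self.locked_until = {}                # ip -> time at which the lockout ends

    def allow(self, ip, now=None):
        """Return True if this IP may attempt a login now, and record the attempt."""
        now = time.monotonic() if now is None else now
        if self.locked_until.get(ip, 0.0) > now:
            return False
        window = self.attempts[ip]
        # Drop attempts that have aged out of the one-minute sliding window.
        while window and now - window[0] >= self.window_s:
            window.popleft()
        if len(window) >= self.max_attempts:
            # Limit exceeded: start the lockout and reset the window.
            self.locked_until[ip] = now + self.lockout_s
            window.clear()
            return False
        window.append(now)
        return True
```

Passing `now` explicitly makes the limiter deterministic in tests; in a server it would be left as None so the monotonic clock is used.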

2. Provider Management

2.1 Supported Providers

PluginCompass supports multiple AI providers for both planning and building:

Build Providers:

OpenRouter (primary aggregator)
Mistral
Google (Gemini)
Groq
NVIDIA (NIM)
Chutes AI
OpenCode (internal/self-hosted)
Ollama (self-hosted planning)

Planning Providers:

OpenRouter
Mistral
Google (Gemini)
Groq
NVIDIA (NIM)
Ollama (self-hosted)

2.2 Provider Configuration

Each provider requires API credentials configured via environment variables:

Provider	Environment Variable	Default API URL
OpenRouter	`OPENROUTER_API_KEY`	`https://openrouter.ai/api/v1`
Mistral	`MISTRAL_API_KEY`	`https://api.mistral.ai/v1`
Google	`GOOGLE_API_KEY`	`https://generativelanguage.googleapis.com/v1beta2`
Groq	`GROQ_API_KEY`	`https://api.groq.com/openai/v1/chat/completions`
NVIDIA	`NVIDIA_API_KEY`	`https://api.nvidia.com/v1`
Chutes AI	`CHUTES_API_KEY` or `PLUGIN_COMPASS_CHUTES_API_KEY`	`https://api.chutes.ai/v1`
Ollama	`OLLAMA_API_URL`	Configurable self-hosted URL
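A small resolver shows how the table translates into "which providers are usable right now". This is a sketch under stated assumptions: the helper names are hypothetical, and checking CHUTES_API_KEY before PLUGIN_COMPASS_CHUTES_API_KEY is an assumed precedence (the document only says either may be used). For Ollama, the "credential" is the configured URL.

```python
import os
from typing import List, Mapping, Optional

# Env var(s) per provider, in assumed lookup order, taken from the table above.
PROVIDER_ENV = {
    "openrouter": ("OPENROUTER_API_KEY",),
    "mistral": ("MISTRAL_API_KEY",),
    "google": ("GOOGLE_API_KEY",),
    "groq": ("GROQ_API_KEY",),
    "nvidia": ("NVIDIA_API_KEY",),
    "chutes": ("CHUTES_API_KEY", "PLUGIN_COMPASS_CHUTES_API_KEY"),
    "ollama": ("OLLAMA_API_URL",),
}


def resolve_credential(provider: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-blank value among the provider's env vars, or None."""
    for name in PROVIDER_ENV.get(provider, ()):
        value = env.get(name, "").strip()
        if value:
            return value
    return None


def configured_providers(env: Mapping[str, str] = os.environ) -> List[str]:
    """Providers that have a usable key (or, for Ollama, a URL)."""
    return [p for p in PROVIDER_ENV if resolve_credential(p, env) is not None]
```

Treating whitespace-only values as unset avoids enabling a provider because of an empty `KEY=` line in an env file.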

2.3 Model Discovery

The system discovers available models automatically and also accepts manual entries:

CLI-based Discovery: Queries OpenCode CLI for available models
Provider API Discovery: Fetches model lists directly from provider APIs
Manual Configuration: Admins can manually add model configurations

Model Configuration per Provider: Each configured model includes:

Model identifier (provider/model format)
Display label for users
Tier classification (free, plus, pro)
Icon association
Provider priority order
Media support flag (image uploads)
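The per-model fields listed above map naturally onto a small record type. The field names below are hypothetical, and the validation rules (tier must be one of the three listed, the identifier must contain a "/") are assumptions drawn from the list rather than confirmed behavior.

```python
from dataclasses import dataclass, field
from typing import List

TIERS = ("free", "plus", "pro")


@dataclass
class ModelConfig:
    model_id: str                   # "provider/model" identifier
    label: str                      # display label shown to users
    tier: str = "free"              # one of TIERS
    icon: str = ""                  # icon reference (format is an assumption)
    providers: List[str] = field(default_factory=list)  # priority order, first = preferred
    supports_media: bool = False    # whether image uploads are allowed

    def __post_init__(self):
        if self.tier not in TIERS:
            raise ValueError(f"unknown tier {self.tier!r}")
        if "/" not in self.model_id:
            raise ValueError("model id must use provider/model format")
```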

3. Provider Limits & Usage Tracking

3.1 Rate Limiting System

PluginCompass implements a flexible rate limiting system with the following components:

Limit Types:

Tokens per Minute (TPM): Token consumption rate limit
Tokens per Day (TPD): Daily token consumption limit
Requests per Minute (RPM): API call rate limit
Requests per Day (RPD): Daily API call limit

Scope Levels:

Per Provider: Limits apply to all models from that provider
Per Model: Limits apply to specific models only

Default Behavior:

All limits default to 0 (unlimited)
Limits are configurable per provider or per model
Usage is tracked independently for each provider
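One way to combine the two scope levels is to let a model-scoped entry override the provider-wide one, with missing configuration meaning "unlimited". That precedence rule is an assumption (the document does not say whether both scopes apply at once), and the nested config layout shown in the docstring is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limits:
    tpm: int = 0  # tokens per minute; 0 = unlimited
    tpd: int = 0  # tokens per day
    rpm: int = 0  # requests per minute
    rpd: int = 0  # requests per day


def effective_limits(config: dict, provider: str, model: Optional[str] = None) -> Limits:
    """Model-scoped limits if configured, else provider-wide limits, else unlimited.

    Assumed layout:
      {"groq": {"provider": Limits(...), "models": {"<model>": Limits(...)}}}
    """
    entry = config.get(provider, {})
    if model is not None and model in entry.get("models", {}):
        return entry["models"][model]
    return entry.get("provider", Limits())
```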

3.2 Usage Tracking

The system tracks usage in real-time with the following characteristics:

Tracked Metrics:

Tokens consumed per request
Number of API requests
Timestamps for rate window calculation
Per-model breakdown when scoped

Data Retention:

Usage data retained for 48 hours for rate limiting
Aggregated statistics persisted for reporting
Separate tracking for planning vs building
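The rate windows only need timestamped events, pruned after the 48-hour retention period. A minimal in-memory sketch follows; the class name is hypothetical. Planning and building could be tracked separately by using distinct keys (for example "plan:groq" and "build:groq"), which is also an assumption.

```python
import time
from collections import deque

RETENTION_S = 48 * 3600  # raw events kept for 48 hours


class UsageLog:
    """Per-provider log of (timestamp, tokens) events for rate-window calculations."""

    def __init__(self):
        self.events = {}  # provider -> deque of (ts, tokens), oldest first

    def record(self, provider, tokens, now=None):
        now = time.time() if now is None else now
        q = self.events.setdefault(provider, deque())
        q.append((now, tokens))
        # Prune events older than the retention period.
        while q and now - q[0][0] > RETENTION_S:
            q.popleft()

    def window(self, provider, seconds, now=None):
        """Return (tokens, requests) recorded in the last `seconds` seconds."""
        now = time.time() if now is None else now
        recent = [tok for ts, tok in self.events.get(provider, ()) if now - ts < seconds]
        return sum(recent), len(recent)
```

A one-minute window feeds TPM/RPM and a 24-hour window feeds TPD/RPD. Keeping 48 hours of data means a full day-window is always available.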

State Files:

provider-limits.json: Saved limit configurations
provider-usage.json: Recent usage data
token-usage.json: User token consumption

3.3 Limit Enforcement

When a request is made, the system:

Identifies the provider and model
Checks applicable limits (provider-level or model-level)
Compares current usage against limits
Returns rate limit error if exceeded
Records usage after successful requests
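The enforcement steps above can be sketched as a check-call-record wrapper. This is an illustration, not the actual code path: `run_request`, `RateLimitError`, and the callback shapes are hypothetical. The check compares *current* usage against each non-zero limit, as described, rather than predicting the cost of the incoming request.

```python
from typing import Callable, Tuple

MINUTE, DAY = 60, 24 * 3600


class RateLimitError(Exception):
    pass


def enforce(limits: dict, usage_window: Callable[[str, int], Tuple[int, int]], provider: str) -> None:
    """Raise RateLimitError if current usage already meets any non-zero limit."""
    # (limit name, window length, index into the (tokens, requests) tuple)
    checks = (("tpm", MINUTE, 0), ("tpd", DAY, 0), ("rpm", MINUTE, 1), ("rpd", DAY, 1))
    for name, seconds, index in checks:
        cap = limits.get(name, 0)
        if cap > 0 and usage_window(provider, seconds)[index] >= cap:
            raise RateLimitError(f"{provider}: {name} limit of {cap} reached")


def run_request(limits, usage_window, record, provider, call):
    """Check limits, perform the call, and record usage only if the call succeeded."""
    enforce(limits, usage_window, provider)
    tokens = call()            # hypothetical: the provider call returns tokens consumed
    record(provider, tokens)   # skipped if call() raised
    return tokens
```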

4. Fallback System

4.1 Fallback Architecture

PluginCompass implements a multi-level fallback system for reliability:

Fallback Levels:

Model-level Fallback: Alternative models within same provider
Provider-level Fallback: Switch to alternative providers
Ultimate Backup: Final fallback model when all providers fail

4.2 Model Fallback Chain

Per-Provider Chains: Each provider has its own configured fallback chain:

OpenRouter:

Primary Model → Backup 1 → Backup 2 → Backup 3 → Environment Fallbacks → Static Fallbacks

Mistral:

Primary Model → Backup 1 → Backup 2 → Backup 3 → Default (mistral-large-latest)

Groq:

llama-3.3-70b-versatile → mixtral-8x7b-32768 → llama-3.1-70b-versatile

Google (Gemini):

gemini-1.5-flash → gemini-1.5-pro → gemini-pro

NVIDIA (NIM):

meta/llama-3.1-70b-instruct → meta/llama-3.1-8b-instruct
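The OpenRouter and Mistral chains are assembled from several groups (primary, backups, then environment, static, or default fallbacks). A sketch of that assembly follows. It assumes that empty backup slots are skipped and that a model already in the chain is not tried twice. `build_chain` is a hypothetical helper, and the static chains are copied from the lists above.

```python
from typing import Iterable, List, Optional

# Static chains taken from the lists above.
STATIC_CHAINS = {
    "groq": ["llama-3.3-70b-versatile", "mixtral-8x7b-32768", "llama-3.1-70b-versatile"],
    "google": ["gemini-1.5-flash", "gemini-1.5-pro", "gemini-pro"],
    "nvidia": ["meta/llama-3.1-70b-instruct", "meta/llama-3.1-8b-instruct"],
}


def build_chain(*groups: Iterable[Optional[str]]) -> List[str]:
    """Flatten groups (primary, backups, env/static/default fallbacks) in order,
    skipping empty slots and models that are already in the chain."""
    chain: List[str] = []
    for group in groups:
        for model in group:
            if model and model not in chain:
                chain.append(model)
    return chain
```

For Mistral, this would be called as `build_chain([primary], [backup1, backup2, backup3], ["mistral-large-latest"])`. The default is appended only if an admin has not already placed it earlier in the chain.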

4.3 Planning Chain Fallback

The planning system follows a configured priority chain:

Attempts first provider in chain
If rate limited or error occurs, moves to next provider
Continues through all configured providers
Returns error if all providers fail
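The walk through the planning chain is a first-success loop. In this sketch, any exception advances to the next entry, matching the description above. Section 4.5's error classification would narrow that in practice, since user errors are not retried. The names `run_planning_chain` and `AllProvidersFailed` are hypothetical.

```python
from typing import Callable, Iterable, Tuple


class AllProvidersFailed(Exception):
    pass


def run_planning_chain(chain: Iterable[Tuple[str, str]],
                       call: Callable[[str, str], str]) -> str:
    """Try (provider, model) entries in priority order and return the first success."""
    errors = []
    for provider, model in chain:
        try:
            return call(provider, model)
        except Exception as exc:  # rate limit or other error: move to the next entry
            errors.append(f"{provider}/{model}: {exc}")
    raise AllProvidersFailed("; ".join(errors) or "planning chain is empty")
```

Collecting every error message means the final failure explains why each provider was skipped, which helps when reviewing outages from the admin side.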

Planning Chain Configuration:

Configurable via admin panel
Each entry: { provider, model }
Priority determines fallback order
Supports provider prefix in model names (e.g., "groq/compound-mini")

4.4 Build Fallback Chain

For building operations, the system follows this sequence:

Primary Model: User-selected or auto-assigned model
Provider Fallback: Alternative providers with same model
OpenCode Fallback: Internal OpenCode processing
Ultimate Backup: Configured backup model (last resort)
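The four build stages can be expressed as an ordered attempt plan. This sketch rests on several assumptions: the OpenCode stage reuses the selected model, the ultimate backup is appended only when one is configured, and the route labels "opencode" and "ultimate-backup" and the function name are hypothetical.

```python
from typing import List, Optional, Tuple

Attempt = Tuple[str, str]  # (route, model)


def build_attempt_plan(model: str, provider_order: List[str],
                       ultimate_backup: Optional[str] = None) -> List[Attempt]:
    """Ordered attempts for one build request, following the four stages above."""
    # Stages 1-2: the selected model on its first provider, then on alternatives.
    plan: List[Attempt] = [(provider, model) for provider in provider_order]
    # Stage 3: internal OpenCode processing (assumed to reuse the same model).
    plan.append(("opencode", model))
    # Stage 4: the admin-configured last-resort model, if any.
    if ultimate_backup:
        plan.append(("ultimate-backup", ultimate_backup))
    return plan
```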

4.5 Error Classification & Fallback Decision

Errors are classified to determine fallback behavior:

Error Type	Example	Fallback Action
Rate Limit (429)	"Too many requests"	Wait 30s, switch provider
Server Error (5xx)	"Internal error"	Wait 30s, switch provider
Auth Error (401)	"Invalid API key"	Switch immediately
Billing Error (402)	"Insufficient credits"	Switch immediately
Model Not Found (404)	"Unknown model"	Wait 30s, switch model
User Error (400)	"Invalid request"	Return error, no fallback
Token Limit	"Context length exceeded"	Return error, no fallback
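The table can be encoded as a classifier that returns a fallback decision. Three points are assumptions. "Switch immediately" for 401/402 is read as switching provider. Context-length errors are detected by message text because the table gives them no status code. Any unrecognized error is treated like a user error (no fallback). The names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FallbackAction:
    fallback: bool      # False = return the error to the user
    wait_s: int = 0     # delay before the next attempt
    switch: str = ""    # "provider" or "model"


def classify(status: Optional[int], message: str = "") -> FallbackAction:
    """Map an upstream failure to the fallback action in the table above."""
    if "context length" in message.lower():       # token limit: retrying elsewhere won't help
        return FallbackAction(fallback=False)
    if status == 429 or (status is not None and 500 <= status <= 599):
        return FallbackAction(True, 30, "provider")
    if status in (401, 402):                      # bad key or no credits: move on at once
        return FallbackAction(True, 0, "provider")
    if status == 404:                             # unknown model: try another model
        return FallbackAction(True, 30, "model")
    return FallbackAction(fallback=False)         # 400 and anything unrecognized
```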

Continue Mechanism:

For early terminations, the system sends a "continue" message
Retries up to 3 times with the same model
After 3 failed retries, switches to the fallback model
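The continue mechanism is a bounded retry loop nested inside the model fallback loop. In this sketch, `send(model, message)` is a hypothetical callback that returns the text and a flag saying whether the response finished normally. The 3 retries are counted as follow-up "continue" messages after the initial request. Partial output is discarded when switching models. Both the counting rule and the discard rule are assumptions.

```python
from typing import Callable, List, Optional, Tuple

MAX_CONTINUES = 3


def generate_with_continue(models: List[str],
                           send: Callable[[str, str], Tuple[str, bool]],
                           prompt: str) -> Optional[str]:
    """Send the prompt, nudge early stops with "continue", then fall back to the next model."""
    for model in models:
        output, finished = send(model, prompt)
        retries = 0
        while not finished and retries < MAX_CONTINUES:
            retries += 1
            text, finished = send(model, "continue")
            output += text
        if finished:
            return output
        # Still unfinished after MAX_CONTINUES: try the next (fallback) model.
    return None
```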

5. Admin Configuration Interface

5.1 Model Management

The admin panel allows configuration of:

Add/Update Models: Select from discovered models or add custom
Provider Priority: Drag-and-drop reordering of provider fallback order
Tier Assignment: free (1x), plus (2x), pro (3x) multipliers
Icon Selection: Associate icons with models
Media Support: Enable/disable image upload capability

5.2 Provider Limits Configuration

The limits interface provides:

Provider Selection: Dropdown for each configured provider
Scope Selection: Per-provider or per-model limits
Limit Inputs: Numeric fields for TPM, TPD, RPM, RPD
Live Usage Display: Current usage statistics per provider
Save/Reset: Persist or revert limit changes

5.3 Ultimate Backup Configuration

The admin can configure a final fallback model that will be used when all other providers fail. This ensures system availability even during widespread outages.

6. Configuration Files

6.1 Environment Variables

Key configuration is done via environment variables:

# Provider API Keys
OPENROUTER_API_KEY=
MISTRAL_API_KEY=
GOOGLE_API_KEY=
GROQ_API_KEY=
NVIDIA_API_KEY=
CHUTES_API_KEY=
# Chutes also accepts PLUGIN_COMPASS_CHUTES_API_KEY
OLLAMA_API_URL=

# Admin Authentication
ADMIN_USER=
ADMIN_PASSWORD=
# Admin session lifetime in milliseconds (default 24 hours)
ADMIN_SESSION_TTL_MS=86400000

# Rate Limiting
ADMIN_LOGIN_RATE_LIMIT=5
USER_LOGIN_RATE_LIMIT=10
API_RATE_LIMIT=100

6.2 Runtime State

The system maintains runtime state in:

.data/.opencode-chat/provider-limits.json - Persisted limits
.data/.opencode-chat/provider-usage.json - Recent usage
In-memory state for active sessions and rate tracking
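Loading and saving these JSON state files can be sketched as follows. Two points are assumptions. First, the directory is treated as relative to the server's working directory. Second, the sketch writes atomically (temp file plus rename) so that a crash never leaves a half-written file; whether PluginCompass does this is not stated. A missing or corrupt file falls back to a caller-supplied default.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any

# Directory named above; assumed relative to the server's working directory.
STATE_DIR = Path(".data/.opencode-chat")


def load_state(name: str, default: Any, state_dir: Path = STATE_DIR) -> Any:
    """Read a JSON state file, returning `default` if it is missing or unparsable."""
    try:
        return json.loads((state_dir / name).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return default


def save_state(name: str, data: Any, state_dir: Path = STATE_DIR) -> None:
    """Write to a temp file in the same directory, then rename it over the target."""
    state_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=state_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)
        os.replace(tmp, state_dir / name)  # atomic rename on the same filesystem
    except BaseException:
        os.unlink(tmp)  # remove the partial temp file, then re-raise
        raise
```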

7. Security Considerations

7.1 Rate Limiting

Login Protection: 5 attempts/minute, 15-minute lockout
API Protection: 100 requests/minute per user
Provider Protection: Configurable limits prevent abuse

7.2 Authentication

Session-based auth with secure cookies
OAuth support for Google and GitHub
Rate-limited login attempts
Session timeout enforcement

7.3 Data Protection

Provider API keys stored securely
Usage data retained only as needed
No sensitive data in logs
Encrypted session storage

8. Monitoring & Analytics

8.1 Tracking Metrics

The system tracks:

User Analytics: Session duration, feature usage, model preferences
Business Metrics: MRR, LTV, churn rate, CAC
Technical Metrics: AI response times, error rates, queue wait times
Provider Metrics: Per-provider usage and error rates

8.2 Admin Dashboard

The tracking page provides:

Daily/weekly/monthly active users
Revenue analytics
Conversion funnels
Error rate monitoring
Resource utilization

9. Summary

PluginCompass provides a robust, multi-provider AI infrastructure with:

Flexible Provider Management: Support for 6+ AI providers with automatic model discovery
Granular Rate Limiting: Per-provider and per-model limits with configurable thresholds
Intelligent Fallback: Multi-level fallback chains ensure high availability
Comprehensive Admin Control: Full configuration through web-based admin panel
Usage Tracking: Real-time monitoring of token consumption and API usage
Security Measures: Rate limiting, authentication, and session management

This architecture ensures reliable AI-powered development while maintaining control over costs and system availability.
