Add PluginCompass Provider System documentation

This document describes the architecture and functionality of the PluginCompass AI provider management system, including:

- Admin panel structure and authentication
- Provider management with supported providers (OpenRouter, Mistral, Google, Groq, NVIDIA, Chutes, Ollama)
- Rate limiting system with per-provider and per-model limits
- Fallback system architecture with multi-level fallback chains
- Usage tracking and monitoring capabilities

The documentation covers both technical implementation details and operational guidance for managing the provider infrastructure.
2026-02-09 18:23:55 +00:00
parent 91308fd061
commit 58bab1c5d8
1 changed files with 361 additions and 0 deletions
--- a/PLUGIN_COMPASS_PROVIDER_SYSTEM.md
+++ b/PLUGIN_COMPASS_PROVIDER_SYSTEM.md
@@ -0,0 +1,361 @@
 # PluginCompass Provider Management & Fallback System
 ## Overview
 This document describes the architecture and functionality of the PluginCompass AI provider management system, including the admin panel configuration, provider limits, usage tracking, and fallback mechanisms.
 ---
 ## 1. Admin Panel Structure
 ### 1.1 Admin Panel Sections
 The PluginCompass admin panel is accessible at `/admin` and provides the following management areas:
 **Main Admin Areas:**
 - **Build Models** (`/admin/build`) - Configure AI models available to users
 - **Plan Models** (`/admin/plan`) - Configure planning provider chain
 - **Plans** (`/admin/plans`) - Manage subscription plans and pricing
 - **Accounts** (`/admin/accounts`) - User account management
 - **Affiliates** (`/admin/affiliates`) - Affiliate program management
 - **Withdrawals** (`/admin/withdrawals`) - Affiliate payout management
 - **Tracking** (`/admin/tracking`) - Analytics and usage statistics
 - **Resources** (`/admin/resources`) - System resource monitoring
 - **External Testing** (`/admin/external-testing`) - WordPress testing configuration
 - **Contact Messages** (`/admin/contact-messages`) - Customer inquiries
 ### 1.2 Admin Authentication
 The admin panel uses session-based authentication with the following security measures:
 - **Credentials**: Configured via environment variables `ADMIN_USER` and `ADMIN_PASSWORD`
 - **Session Duration**: 24 hours (configurable via `ADMIN_SESSION_TTL_MS`)
 - **Rate Limiting**: Maximum 5 login attempts per minute per IP
 - **Account Lockout**: 15 minutes once the login attempt limit is exceeded
 **API Authentication:**
 Admin API endpoints are authenticated with session cookies. The session management endpoints are:
 - Login: `POST /api/admin/login`
 - Logout: `POST /api/admin/logout`
 - Session check: `GET /api/admin/me`
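The login protections listed above can be sketched as a per-IP sliding window. This is a minimal illustration, not the actual implementation: the class name `LoginLimiter`, its method, and the rule that exceeding the limit is what triggers the lockout are all assumptions.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60          # "per minute" window from the list above
MAX_ATTEMPTS = 5             # maximum login attempts per window per IP
LOCKOUT_SECONDS = 15 * 60    # lockout duration from the list above


class LoginLimiter:
    """Hypothetical per-IP limiter; time is passed in to keep it testable."""

    def __init__(self):
        self._attempts = defaultdict(deque)   # ip -> timestamps of attempts
        self._locked_until = {}               # ip -> time the lockout ends

    def allow(self, ip, now):
        """Record an attempt from `ip` and report whether it may proceed."""
        if now < self._locked_until.get(ip, 0.0):
            return False
        window = self._attempts[ip]
        # Drop attempts that have aged out of the one-minute window.
        while window and now - window[0] >= WINDOW_SECONDS:
            window.popleft()
        if len(window) >= MAX_ATTEMPTS:
            # Assumption: exceeding the limit is what starts the lockout.
            self._locked_until[ip] = now + LOCKOUT_SECONDS
            window.clear()
            return False
        window.append(now)
        return True
```

A login handler would call `allow(ip, time.time())` before checking credentials and reject the request when it returns `False`.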
 ---
 ## 2. Provider Management
 ### 2.1 Supported Providers
 PluginCompass supports multiple AI providers for both planning and building:
 **Build Providers:**
 - OpenRouter (primary aggregator)
 - Mistral
 - Google (Gemini)
 - Groq
 - NVIDIA (NIM)
 - Chutes AI
 - OpenCode (internal/self-hosted)
 - Ollama (self-hosted planning)
 **Planning Providers:**
 - OpenRouter
 - Mistral
 - Google (Gemini)
 - Groq
 - NVIDIA (NIM)
 - Ollama (self-hosted)
 ### 2.2 Provider Configuration
 Each provider requires API credentials configured via environment variables:
 | Provider | Environment Variable | Default API URL |
 |---------|---------------------|----------------|
 | OpenRouter | `OPENROUTER_API_KEY` | `https://openrouter.ai/api/v1` |
 | Mistral | `MISTRAL_API_KEY` | `https://api.mistral.ai/v1` |
 | Google | `GOOGLE_API_KEY` | `https://generativelanguage.googleapis.com/v1beta2` |
 | Groq | `GROQ_API_KEY` | `https://api.groq.com/openai/v1/chat/completions` |
 | NVIDIA | `NVIDIA_API_KEY` | `https://api.nvidia.com/v1` |
 | Chutes AI | `CHUTES_API_KEY` or `PLUGIN_COMPASS_CHUTES_API_KEY` | `https://api.chutes.ai/v1` |
 | Ollama | `OLLAMA_API_URL` | Configurable self-hosted URL |
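As an illustration of how the variables in the table might be read, the sketch below resolves an API key per provider and lists the providers that have credentials. The function names are hypothetical, and the order in which the two Chutes variables are checked is an assumption, since the table does not state it.

```python
import os
from typing import List, Mapping, Optional

# Environment variable names copied from the table above, in lookup order.
API_KEY_VARS = {
    "openrouter": ("OPENROUTER_API_KEY",),
    "mistral": ("MISTRAL_API_KEY",),
    "google": ("GOOGLE_API_KEY",),
    "groq": ("GROQ_API_KEY",),
    "nvidia": ("NVIDIA_API_KEY",),
    # Assumption: CHUTES_API_KEY is checked before the prefixed variant.
    "chutes": ("CHUTES_API_KEY", "PLUGIN_COMPASS_CHUTES_API_KEY"),
}


def resolve_api_key(provider: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty key configured for `provider`, or None."""
    for name in API_KEY_VARS[provider]:
        value = env.get(name, "").strip()
        if value:
            return value
    return None


def configured_providers(env: Mapping[str, str] = os.environ) -> List[str]:
    """List providers with credentials; Ollama is configured by URL, not key."""
    names = [p for p in API_KEY_VARS if resolve_api_key(p, env)]
    if env.get("OLLAMA_API_URL", "").strip():
        names.append("ollama")
    return names
```

Passing the environment as a parameter keeps the lookup easy to exercise with a plain dictionary.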
 ### 2.3 Model Discovery
 Available models are gathered from three sources, two automatic and one manual:
 1. **CLI-based Discovery**: Queries OpenCode CLI for available models
 2. **Provider API Discovery**: Fetches model lists directly from provider APIs
 3. **Manual Configuration**: Admin can manually add model configurations
 **Model Configuration per Provider:**
 Each configured model includes:
 - Model identifier (provider/model format)
 - Display label for users
 - Tier classification (free, plus, pro)
 - Icon association
 - Provider priority order
 - Media support flag (image uploads)
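The fields listed above could be modelled as a small record type. This is only a sketch: the class and attribute names are hypothetical, and the validation rules are assumptions drawn from the "provider/model" format and the three tiers named in the list.

```python
from dataclasses import dataclass
from typing import Tuple

TIERS = ("free", "plus", "pro")   # tier names from the list above


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical shape of one configured model."""

    model_id: str                             # "provider/model" format
    label: str                                # display label shown to users
    tier: str = "free"
    icon: str = ""
    provider_priority: Tuple[str, ...] = ()   # providers in fallback order
    supports_media: bool = False              # image uploads allowed

    def __post_init__(self):
        if "/" not in self.model_id:
            raise ValueError("model_id must look like 'provider/model'")
        if self.tier not in TIERS:
            raise ValueError(f"unknown tier: {self.tier!r}")

    @property
    def provider(self) -> str:
        # Only the first segment names the provider; the model part may
        # itself contain slashes.
        return self.model_id.split("/", 1)[0]
```

For example, `ModelConfig(model_id="groq/llama-3.3-70b-versatile", label="Llama 3.3 70B")` yields a free-tier entry whose `provider` is `"groq"`.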
 ---
 ## 3. Provider Limits & Usage Tracking
 ### 3.1 Rate Limiting System
 PluginCompass implements a flexible rate limiting system with the following components:
 **Limit Types:**
 - **Tokens per Minute (TPM)**: Token consumption rate limit
 - **Tokens per Day (TPD)**: Daily token consumption limit
 - **Requests per Minute (RPM)**: API call rate limit
 - **Requests per Day (RPD)**: Daily API call limit
 **Scope Levels:**
 - **Per Provider**: Limits apply to all models from that provider
 - **Per Model**: Limits apply to specific models only
 **Default Behavior:**
 - All limits default to 0 (unlimited)
 - Limits are configurable per provider or per model
 - Usage is tracked independently for each provider
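The limit types and scopes above can be illustrated with a small lookup. The sketch assumes that a model-scoped entry takes precedence over a provider-scoped one, which the document does not state; the names `Limits`, `effective_limits`, and `exceeds` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Limits:
    """The four limit types; 0 means unlimited, matching the defaults above."""

    tpm: int = 0   # tokens per minute
    tpd: int = 0   # tokens per day
    rpm: int = 0   # requests per minute
    rpd: int = 0   # requests per day


def effective_limits(
    provider_limits: Dict[str, Limits],
    model_limits: Dict[Tuple[str, str], Limits],
    provider: str,
    model: str,
) -> Limits:
    """Pick the limits that apply to one provider/model pair."""
    # Assumption: a per-model entry overrides the per-provider entry.
    if (provider, model) in model_limits:
        return model_limits[(provider, model)]
    if provider in provider_limits:
        return provider_limits[provider]
    return Limits()   # nothing configured: everything unlimited


def exceeds(limit: int, used: int, requested: int = 0) -> bool:
    """A limit of 0 is unlimited, so it can never be exceeded."""
    return limit > 0 and used + requested > limit
```

Treating 0 as "unlimited" in a single helper avoids repeating that special case at every comparison.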
 ### 3.2 Usage Tracking
 The system tracks usage in real-time with the following characteristics:
 **Tracked Metrics:**
 - Tokens consumed per request
 - Number of API requests
 - Timestamps for rate window calculation
 - Per-model breakdown when scoped
 **Data Retention:**
 - Usage data retained for 48 hours for rate limiting
 - Aggregated statistics persisted for reporting
 - Separate tracking for planning vs building
 **State Files:**
 - `provider-limits.json`: Saved limit configurations
 - `provider-usage.json`: Recent usage data
 - `token-usage.json`: User token consumption
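One way to keep the tracked metrics is a time-ordered log that is pruned to the 48-hour retention period. The sketch below is an assumption about the approach rather than a description of the real code: it uses rolling windows (the document does not say whether windows are rolling or calendar-based), and the `UsageLog` class is hypothetical.

```python
from collections import deque
from typing import Tuple

RETENTION_SECONDS = 48 * 60 * 60   # 48-hour retention from the list above
MINUTE = 60
DAY = 24 * 60 * 60


class UsageLog:
    """Hypothetical in-memory log of (timestamp, tokens) for one provider."""

    def __init__(self):
        self._events = deque()   # oldest event first

    def record(self, now: float, tokens: int) -> None:
        """Store one request and discard events past the retention period."""
        self._events.append((now, tokens))
        while self._events and now - self._events[0][0] > RETENTION_SECONDS:
            self._events.popleft()

    def totals(self, now: float, window: float) -> Tuple[int, int]:
        """Return (requests, tokens) seen in the trailing `window` seconds."""
        recent = [tok for ts, tok in self._events if now - ts < window]
        return len(recent), sum(recent)
```

Calling `totals(now, MINUTE)` gives the figures to compare against RPM and TPM, and `totals(now, DAY)` gives those for RPD and TPD.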
 ### 3.3 Limit Enforcement
 When a request is made, the system:
 1. Identifies the provider and model
 2. Checks applicable limits (provider-level or model-level)
 3. Compares current usage against limits
 4. Returns a rate limit error if any limit is exceeded
 5. Records usage after successful requests
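Steps 3 to 5 above can be sketched as a guard around the outgoing call. The sketch assumes the caller has already identified the provider and model and selected the applicable limits (steps 1 and 2); it also omits window resets. `guarded_call` and `RateLimitExceeded` are hypothetical names.

```python
from typing import Callable, Dict


class RateLimitExceeded(Exception):
    """Raised when a configured limit has already been reached."""


def guarded_call(
    limits: Dict[str, int],
    usage: Dict[str, int],
    send: Callable[[], int],
) -> int:
    """Run `send` only if no limit is reached; record usage on success.

    `limits` and `usage` share keys such as "rpm" or "tpm". `send` performs
    the request and returns the number of tokens it consumed.
    """
    for key, limit in limits.items():
        # A limit of 0 means unlimited.
        if limit > 0 and usage.get(key, 0) >= limit:
            raise RateLimitExceeded(key)
    tokens = send()   # an exception here skips the recording below
    for key in limits:
        # Token limits ("t...") count tokens; request limits count calls.
        increment = tokens if key.startswith("t") else 1
        usage[key] = usage.get(key, 0) + increment
    return tokens
```

Because recording happens only after `send` returns, a failed upstream call does not consume any of the budget, matching step 5.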
 ---
 ## 4. Fallback System
 ### 4.1 Fallback Architecture
 PluginCompass implements a multi-level fallback system for reliability:
 **Fallback Levels:**
 1. **Model-level Fallback**: Alternative models within the same provider
 2. **Provider-level Fallback**: Switch to alternative providers
 3. **Ultimate Backup**: Final fallback model when all providers fail
 ### 4.2 Model Fallback Chain
 Each provider has its own configured fallback chain:
 **OpenRouter:**
 ```
 Primary Model → Backup 1 → Backup 2 → Backup 3 → Environment Fallbacks → Static Fallbacks
 ```
 **Mistral:**
 ```
 Primary Model → Backup 1 → Backup 2 → Backup 3 → Default (mistral-large-latest)
 ```
 **Groq:**
 ```
 llama-3.3-70b-versatile → mixtral-8x7b-32768 → llama-3.1-70b-versatile
 ```
 **Google (Gemini):**
 ```
 gemini-1.5-flash → gemini-1.5-pro → gemini-pro
 ```
 **NVIDIA (NIM):**
 ```
 meta/llama-3.1-70b-instruct → meta/llama-3.1-8b-instruct
 ```
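The chains above share one pattern: try each model in order and stop at the first success. A minimal sketch of that pattern follows; `build_chain` and `run_with_fallback` are hypothetical helpers, and treating every exception as a reason to move on is a simplification.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def build_chain(primary: str, backups: Iterable[str], default: Optional[str] = None) -> List[str]:
    """Primary first, then backups, then the default, without duplicates."""
    candidates = [primary, *backups]
    if default:
        candidates.append(default)
    chain: List[str] = []
    for model in candidates:
        if model and model not in chain:
            chain.append(model)
    return chain


def run_with_fallback(chain: List[str], call: Callable[[str], str]) -> Tuple[str, str]:
    """Return (model, result) from the first model whose call succeeds."""
    errors = []
    for model in chain:
        try:
            return model, call(model)
        except Exception as exc:   # simplification: any error triggers fallback
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

With the Mistral chain, `build_chain(primary, backups, default="mistral-large-latest")` places the default last, and removing duplicates avoids retrying a model that has already failed.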
 ### 4.3 Planning Chain Fallback
 The planning system follows a configured priority chain:
 1. Attempts first provider in chain
 2. If rate limited or error occurs, moves to next provider
 3. Continues through all configured providers
 4. Returns error if all providers fail
 **Planning Chain Configuration:**
 - Configurable via admin panel
 - Each entry: `{ provider, model }`
 - Priority determines fallback order
 - Supports provider prefix in model names (e.g., "groq/compound-mini")
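The provider-prefix support mentioned above might be handled by normalising each chain entry before use. This sketch assumes a prefix is stripped only when it names a known planning provider and does not conflict with the entry's own `provider` field; the real rule is not documented, and `normalize_entry` is a hypothetical helper.

```python
from typing import Dict, Tuple

# Planning providers from section 2.1, lowercased for comparison.
KNOWN_PROVIDERS = {"openrouter", "mistral", "google", "groq", "nvidia", "ollama"}


def normalize_entry(entry: Dict[str, str]) -> Tuple[str, str]:
    """Return (provider, model) for one `{ provider, model }` chain entry."""
    provider = entry.get("provider", "").strip().lower()
    model = entry["model"].strip()
    prefix, sep, rest = model.partition("/")
    prefix = prefix.lower()
    # Assumption: strip the prefix only if it matches the entry's provider
    # or the provider field is empty. Names such as "meta/..." on NVIDIA, or
    # "google/..." routed through OpenRouter, are left untouched.
    if sep and prefix in KNOWN_PROVIDERS and provider in ("", prefix):
        return prefix, rest
    return provider, model
```

For example, `{"provider": "groq", "model": "groq/compound-mini"}` normalises to `("groq", "compound-mini")`.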
 ### 4.4 Build Fallback Chain
 For building operations, the system follows this sequence:
 1. **Primary Model**: User-selected or auto-assigned model
 2. **Provider Fallback**: Alternative providers with the same model
 3. **OpenCode Fallback**: Internal OpenCode processing
 4. **Ultimate Backup**: Configured backup model (last resort)
 ### 4.5 Error Classification & Fallback Decision
 Errors are classified to determine fallback behavior:
 | Error Type | Example | Fallback Action |
 |-----------|---------|-----------------|
 | Rate Limit (429) | "Too many requests" | Wait 30s, switch provider |
 | Server Error (5xx) | "Internal error" | Wait 30s, switch provider |
 | Auth Error (401) | "Invalid API key" | Switch immediately |
 | Billing Error (402) | "Insufficient credits" | Switch immediately |
 | Model Not Found (404) | "Unknown model" | Wait 30s, switch model |
 | User Error (400) | "Invalid request" | Return error, no fallback |
 | Token Limit | "Context length exceeded" | Return error, no fallback |
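The table can be read as a mapping from an error to a fallback action. The sketch below encodes it; the `Action` names, the way token-limit errors are detected from the message text, and the handling of statuses not listed in the table are assumptions.

```python
from enum import Enum

WAIT_SECONDS = 30   # delay used by the "Wait 30s" rows of the table


class Action(Enum):
    WAIT_THEN_SWITCH_PROVIDER = "wait_then_switch_provider"
    SWITCH_IMMEDIATELY = "switch_immediately"
    WAIT_THEN_SWITCH_MODEL = "wait_then_switch_model"
    RETURN_ERROR = "return_error"


def classify(status: int, message: str = "") -> Action:
    """Map an HTTP status and error message to a fallback action."""
    # Assumption: token-limit errors are recognised by their message text.
    if "context length" in message.lower():
        return Action.RETURN_ERROR
    if status == 429 or 500 <= status <= 599:
        return Action.WAIT_THEN_SWITCH_PROVIDER
    if status in (401, 402):
        return Action.SWITCH_IMMEDIATELY
    if status == 404:
        return Action.WAIT_THEN_SWITCH_MODEL
    # 400 per the table; other unlisted statuses are treated the same way
    # here, which is an assumption.
    return Action.RETURN_ERROR
```

Checking the message before the status keeps a context-length error from triggering a fallback even if a provider reports it with an unexpected status code.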
 **Continue Mechanism:**
 - For early terminations, the system sends a "continue" message
 - It retries up to 3 times with the same model
 - After 3 failures, it switches to the fallback model
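The continue mechanism can be sketched as a bounded retry loop per model. This is an illustration under assumptions: `send` is a hypothetical callable returning the text and a finished flag, and partial output from a model that never finishes is discarded when switching.

```python
from typing import Callable, List, Tuple

MAX_CONTINUES = 3   # retries with the same model before falling back


def complete_with_continue(
    models: List[str],
    send: Callable[[str, str], Tuple[str, bool]],
    prompt: str,
) -> Tuple[str, str]:
    """Return (model, text); `send(model, message)` returns (text, finished)."""
    for model in models:
        text, finished = send(model, prompt)
        parts = [text]
        retries = 0
        while not finished and retries < MAX_CONTINUES:
            # The response ended early: ask the same model to carry on.
            text, finished = send(model, "continue")
            parts.append(text)
            retries += 1
        if finished:
            return model, "".join(parts)
        # Assumption: partial output is dropped before trying the next model.
    raise RuntimeError("no model produced a complete response")
```

Each model therefore receives at most four messages: the original prompt and three "continue" retries.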
 ---
 ## 5. Admin Configuration Interface
 ### 5.1 Model Management
 The admin panel allows configuration of:
 - **Add/Update Models**: Select from discovered models or add custom
 - **Provider Priority**: Drag-and-drop reordering of provider fallback order
 - **Tier Assignment**: free (1x), plus (2x), pro (3x) multipliers
 - **Icon Selection**: Associate icons with models
 - **Media Support**: Enable/disable image upload capability
 ### 5.2 Provider Limits Configuration
 The limits interface provides:
 - **Provider Selection**: Dropdown for each configured provider
 - **Scope Selection**: Per-provider or per-model limits
 - **Limit Inputs**: Numeric fields for TPM, TPD, RPM, RPD
 - **Live Usage Display**: Current usage statistics per provider
 - **Save/Reset**: Persist or revert limit changes
 ### 5.3 Ultimate Backup Configuration
 The admin can configure a final fallback model that will be used when all other providers fail. This ensures system availability even during widespread outages.
 ---
 ## 6. Configuration Files
 ### 6.1 Environment Variables
 Key configuration is done via environment variables:
 ```bash
 # Provider API Keys
 OPENROUTER_API_KEY=
 MISTRAL_API_KEY=
 GOOGLE_API_KEY=
 GROQ_API_KEY=
 NVIDIA_API_KEY=
 CHUTES_API_KEY=
 # Admin Authentication
 ADMIN_USER=
 ADMIN_PASSWORD=
 # Rate Limiting
 ADMIN_LOGIN_RATE_LIMIT=5
 USER_LOGIN_RATE_LIMIT=10
 API_RATE_LIMIT=100
 ```
 ### 6.2 Runtime State
 The system maintains runtime state in:
 - `.data/.opencode-chat/provider-limits.json` - Persisted limits
 - `.data/.opencode-chat/provider-usage.json` - Recent usage
 - In-memory state for active sessions and rate tracking
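Persisting the JSON state files could look like the sketch below, which writes through a temporary file so that an interrupted save does not leave a truncated file behind. The helper names and the dictionary-shaped contents are assumptions; the document does not describe the file schema or how writes are performed.

```python
import json
import os
import tempfile
from pathlib import Path

STATE_DIR = Path(".data/.opencode-chat")   # directory from the list above


def load_state(path: Path, default: dict) -> dict:
    """Read a JSON state file, falling back to `default` if absent or invalid."""
    try:
        with path.open("r", encoding="utf-8") as fh:
            data = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        return dict(default)
    return data if isinstance(data, dict) else dict(default)


def save_state(path: Path, data: dict) -> None:
    """Write JSON to a temporary file, then swap it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)
    os.replace(tmp, path)   # replaces the target in a single step
```

A caller might use `load_state(STATE_DIR / "provider-limits.json", {})` at startup and `save_state` whenever an admin saves new limits.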
 ---
 ## 7. Security Considerations
 ### 7.1 Rate Limiting
 - **Login Protection**: 5 attempts/minute, 15-minute lockout
 - **API Protection**: 100 requests/minute per user
 - **Provider Protection**: Configurable limits prevent abuse
 ### 7.2 Authentication
 - Session-based auth with secure cookies
 - OAuth support for Google and GitHub
 - Rate-limited login attempts
 - Session timeout enforcement
 ### 7.3 Data Protection
 - Provider API keys stored securely
 - Usage data retained only as needed
 - No sensitive data in logs
 - Encrypted session storage
 ---
 ## 8. Monitoring & Analytics
 ### 8.1 Tracking Metrics
 The system tracks:
 - **User Analytics**: Session duration, feature usage, model preferences
 - **Business Metrics**: MRR, LTV, churn rate, CAC
 - **Technical Metrics**: AI response times, error rates, queue wait times
 - **Provider Metrics**: Per-provider usage and error rates
 ### 8.2 Admin Dashboard
 The tracking page provides:
 - Daily/weekly/monthly active users
 - Revenue analytics
 - Conversion funnels
 - Error rate monitoring
 - Resource utilization
 ---
 ## 9. Summary
 PluginCompass provides a robust, multi-provider AI infrastructure with:
 1. **Flexible Provider Management**: Support for seven AI providers, plus internal OpenCode processing, with automatic model discovery
 2. **Granular Rate Limiting**: Per-provider and per-model limits with configurable thresholds
 3. **Intelligent Fallback**: Multi-level fallback chains ensure high availability
 4. **Comprehensive Admin Control**: Full configuration through web-based admin panel
 5. **Usage Tracking**: Real-time monitoring of token consumption and API usage
 6. **Security Measures**: Rate limiting, authentication, and session management
 This architecture ensures reliable AI-powered development while maintaining control over costs and system availability.