Add comprehensive model fallback improvement plan

- Detailed error classification system for tool errors, early termination, and provider errors
- Provider-specific error mappings for 10 LLM providers (OpenAI, Anthropic, OpenRouter, Chutes, NVIDIA, Together, Fireworks, Mistral, Groq, Google)
- Continue message system with 3-attempt limit before model switch
- 30-second wait for transient/rate limit errors before switching
- Distinguishes tool errors (return to user) from provider errors (switch model)
- Implementation plan with code examples for server.js and processor.ts
2026-02-08 14:16:32 +00:00
parent 9ef54cf6ee
commit 2dc94310a6
1 changed files with 744 additions and 0 deletions
--- a/MODEL_FALLBACK_IMPROVEMENT_PLAN.md
+++ b/MODEL_FALLBACK_IMPROVEMENT_PLAN.md
@@ -0,0 +1,744 @@
+# Model Fallback & Continue Functionality Improvement Plan
+
+## Executive Summary
+
+This plan outlines improvements to the model fallback system to handle three distinct error categories:
+1. **Bad tool calls** - Send error data back to user/model for retry
+2. **Request stops** - Send continue message, retry 3x, then switch model
+3. **Provider errors** - Wait 30 seconds, then switch to next model in fallback chain
+
+---
+
+## Current Implementation Analysis
+
+### Existing Components
+
+| File | Function | Current Behavior |
+|------|-----------|-----------------|
+| `chat/server.js:9347` | `shouldFallbackCliError()` | Decides if error warrants fallback based on patterns |
+| `chat/server.js:9502` | `sendToOpencodeWithFallback()` | Main fallback orchestration with model chain |
+| `chat/server.js:9296` | `buildOpencodeAttemptChain()` | Builds ordered provider/model chain |
+| `opencode/session/retry.ts` | `SessionRetry.delay()` | Calculates retry delays with exponential backoff |
+| `opencode/session/message-v2.ts:714` | `isOpenAiErrorRetryable()` | Determines if OpenAI errors are retryable |
+
+### Current Gaps
+
+1. **No explicit continue mechanism** - Early terminations are counted, but no "continue" message is sent to resume the response
+2. **Tool errors not distinguished** - Tool errors are treated the same as provider errors
+3. **No 30-second wait for provider errors** - Provider issues trigger an immediate fallback
+4. **Missing provider-specific error mappings** - Only generic patterns are matched
+
+---
+
+## Proposed Architecture
+
+### Error Classification Flow
+
+```
+Error Occurs
+    │
+    ├── Tool Call Error?
+    │   ├── Yes → Check tool error type
+    │   │   ├── Validation/Schema error → Send error back, continue with same model
+    │   │   ├── Permission denied → Send error back, continue with same model
+    │   │   ├── Execution failure → Send error back, continue with same model
+    │   │   └── Tool timeout → Send error back, continue with same model
+    │
+    ├── Early Termination?
+    │   ├── Yes → Increment termination counter
+    │   │   ├── Count < 3? → Send "continue" message, retry same model
+    │   │   └── Count >= 3? → Switch to next model
+    │
+    └── Provider Error?
+        ├── Yes → Classify error type
+        │   ├── Transient (500, 502, 503, timeout) → Wait 30s, switch model
+        │   ├── Rate limit (429) → Wait 30s, switch model
+        │   ├── Auth/Billing (401, 402) → Switch model immediately
+        │   ├── Permission (403) → Send error back, don't switch
+        │   ├── User error (400, 413) → Send error back, don't switch
+        │   └── Other (404, etc.) → Wait 30s, switch model
+```
+
+---
+
+## Provider-Specific Error Mappings
+
+### Error Categories & Actions
+
+| Category | Action | Wait Time | Switch Model? |
+|----------|--------|-----------|---------------|
+| **Tool Error (validation/schema)** | Return to user | 0s | No |
+| **Tool Error (execution/permission)** | Return to user | 0s | No |
+| **Early Termination** | Send continue | 0s | After 3 attempts |
+| **Transient Server Error (5xx)** | Wait | 30s | Yes |
+| **Rate Limit (429)** | Wait | 30s | Yes |
+| **Auth/Billing (401, 402)** | Switch immediately | 0s | Yes |
+| **Permission (403)** | Return to user | 0s | No |
+| **Not Found (404)** | Wait | 30s | Yes |
+| **User Error (400, 413)** | Return to user | 0s | No |
+| **Timeout (408)** | Wait | 30s | Yes |
+| **Overloaded (529)** | Wait | 30s | Yes |
+
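The table above can be read as a lookup from error category to action. A minimal sketch of that lookup follows; `ERROR_POLICY` and `policyFor` are hypothetical names used for illustration and do not exist in `chat/server.js`. Overloaded (529) is folded into `transient` here, as the Phase 1 patterns do.

```javascript
// Hypothetical policy table mirroring "Error Categories & Actions".
// Category names follow the ones returned by classifyProviderError() in Phase 1.
const WAIT_MS = 30000;

const ERROR_POLICY = new Map([
  ['toolError',        { action: 'return',   waitMs: 0,       switchModel: false }],
  // Early termination switches only after 3 continue attempts (tracked elsewhere).
  ['earlyTermination', { action: 'continue', waitMs: 0,       switchModel: false }],
  ['transient',        { action: 'wait',     waitMs: WAIT_MS, switchModel: true }],
  ['rateLimit',        { action: 'wait',     waitMs: WAIT_MS, switchModel: true }],
  ['auth',             { action: 'switch',   waitMs: 0,       switchModel: true }],
  ['permission',       { action: 'return',   waitMs: 0,       switchModel: false }],
  ['notFound',         { action: 'wait',     waitMs: WAIT_MS, switchModel: true }],
  ['userError',        { action: 'return',   waitMs: 0,       switchModel: false }],
  ['timeout',          { action: 'wait',     waitMs: WAIT_MS, switchModel: true }],
]);

// Categories not in the table switch immediately, as the Phase 1 code does for unknown errors.
function policyFor(category) {
  return ERROR_POLICY.get(category) ?? { action: 'switch', waitMs: 0, switchModel: true };
}
```

Keeping the policy in one table means a change to a wait time or action touches a single entry rather than several branches.
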
+### Detailed Provider Error Codes
+
+#### OpenAI
+```
+400 (invalid_request_error) → User error - return to user
+401 (authentication_error) → Auth error - immediate switch
+402 (payment_required) → Billing error - immediate switch  
+403 (permission_error) → Permission error - return to user
+404 (not_found_error) → OpenAI treats as retryable - wait 30s, switch
+408 (timeout) → Timeout - wait 30s, switch
+429 (rate_limit_error) → Rate limit - wait 30s, switch
+500 (api_error) → Server error - wait 30s, switch
+529 → Overloaded - wait 30s, switch
+```
+
+#### Anthropic (Claude)
+```
+400 (invalid_request_error) → User error - return to user
+401 (authentication_error) → Auth error - immediate switch
+403 (permission_error) → Permission error - return to user
+404 (not_found_error) → Not found - wait 30s, switch
+413 (request_too_large) → User error - return to user
+429 (rate_limit_error) → Rate limit - wait 30s, switch
+500 (api_error) → Server error - wait 30s, switch
+529 (overloaded_error) → Overloaded - wait 30s, switch
+```
+
+#### OpenRouter
+```
+400 (bad_request) → User error - return to user
+401 (invalid_credentials) → Auth error - immediate switch
+402 (insufficient_credits) → Billing error - immediate switch
+403 (moderation_flagged) → Permission - return to user
+408 (timeout) → Timeout - wait 30s, switch
+429 (rate_limited) → Rate limit - wait 30s, switch
+502 (model_down) → Model down - wait 30s, switch
+503 (no_providers) → No providers - immediate switch
+```
+
+#### Chutes AI
+```
+MODEL_LOADING_FAILED → Transient - wait 30s, switch
+INFERENCE_TIMEOUT → Timeout - wait 30s, switch
+OUT_OF_MEMORY → Transient - wait 30s, switch
+INVALID_INPUT → User error - return to user
+MODEL_OVERLOADED → Overloaded - wait 30s, switch
+GENERATION_FAILED → Transient - wait 30s, switch
+CONTEXT_LENGTH_EXCEEDED → User error - return to user
+400 → Bad request - return to user
+429 → Rate limit - wait 30s, switch
+500+ → Server error - wait 30s, switch
+```
+
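Chutes AI is the only provider listed with string error codes in addition to HTTP statuses. One way to handle them is a lookup that runs before the status-code checks. The sketch below assumes the code arrives as a plain string; where it lives on the error object is not specified in this plan, and `classifyChutesCode` is a hypothetical helper name.

```javascript
// Hypothetical lookup for the Chutes AI string codes listed above.
// The returned shape matches classifyProviderError(): { category, action, waitTime }.
const WAIT = { action: 'wait', waitTime: 30000 };
const RETURN_TO_USER = { action: 'return', waitTime: 0 };

const CHUTES_CODE_ACTIONS = new Map([
  ['MODEL_LOADING_FAILED',    { category: 'transient',  ...WAIT }],
  ['INFERENCE_TIMEOUT',       { category: 'timeout',    ...WAIT }],
  ['OUT_OF_MEMORY',           { category: 'transient',  ...WAIT }],
  ['INVALID_INPUT',           { category: 'userError',  ...RETURN_TO_USER }],
  ['MODEL_OVERLOADED',        { category: 'overloaded', ...WAIT }],
  ['GENERATION_FAILED',       { category: 'transient',  ...WAIT }],
  ['CONTEXT_LENGTH_EXCEEDED', { category: 'userError',  ...RETURN_TO_USER }],
]);

// Returns a classification for a known code, or null so the caller can
// fall through to the HTTP status checks.
function classifyChutesCode(code) {
  if (typeof code !== 'string') return null;
  return CHUTES_CODE_ACTIONS.get(code.trim().toUpperCase()) ?? null;
}
```
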
+#### NVIDIA NIM
+```
+401 → Auth error - immediate switch
+403 → Permission - return to user
+404 → Not found - wait 30s, switch
+429 (too_many_requests) → Rate limit - wait 30s, switch
+500+ → Server error - wait 30s, switch
+```
+
+#### Together AI
+```
+400 (invalid_request) → User error - return to user
+401 (authentication_error) → Auth error - immediate switch
+402 (payment_required) → Billing error - immediate switch
+403 (bad_request) → User error - return to user
+429 (rate_limit_exceeded) → Rate limit - wait 30s, switch
+500+ → Server error - wait 30s, switch
+```
+
+#### Fireworks AI
+```
+400 → User error - return to user
+401 → Auth error - immediate switch
+429 → Rate limit - wait 30s, switch
+500+ → Server error - wait 30s, switch
+```
+
+#### Mistral
+```
+400 → Bad request - return to user
+401 → Unauthorized - immediate switch
+403 → Forbidden - return to user
+404 → Not found - wait 30s, switch
+429 → Too many requests - wait 30s, switch
+500+ → Server error - wait 30s, switch
+```
+
+#### Groq
+```
+400 → Bad request - return to user
+401 → Unauthorized - immediate switch
+402 → Payment required - immediate switch
+403 → Forbidden - return to user
+404 → Not found - wait 30s, switch
+413 → Payload too large - return to user
+429 → Rate limit - wait 30s, switch
+500+ → Server error - wait 30s, switch
+```
+
+#### Google (Gemini)
+```
+400 → Invalid request - return to user
+401 → Unauthorized - immediate switch
+403 → Permission denied - return to user
+404 → Not found - wait 30s, switch
+413 → Request too large - return to user
+429 (resource_exhausted) → Rate limit - wait 30s, switch
+500+ → Server error - wait 30s, switch
+```
+
+---
+
+## Implementation Plan
+
+### Phase 1: Error Classification System
+
+**File: `chat/server.js`**
+
+Create new function `classifyProviderError()`:
+
+```javascript
+function classifyProviderError(error, provider) {
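+  // Extract a numeric HTTP status; missing or non-numeric codes (e.g. 'ECONNRESET') become 0
+  // so they cannot match an undefined pattern entry such as a provider with no `timeout`
+  const statusCode = Number(error.statusCode || error.code) || 0;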
+  const errorMessage = (error.message || '').toLowerCase();
+  
+  const providerPatterns = {
+    openai: {
+      transient: [500, 502, 503, 504, 529],
+      rateLimit: 429,
+      auth: [401, 402],
+      permission: 403,
+      userError: [400],
+      notFound: 404,
+      timeout: 408
+    },
+    anthropic: {
+      transient: [500, 529],
+      rateLimit: 429,
+      auth: 401,
+      permission: 403,
+      userError: [400, 413],
+      notFound: 404
+    },
+    openrouter: {
+      transient: [502, 503],
+      rateLimit: 429,
+      auth: [401, 402],
+      permission: 403,
+      userError: [400],
+      timeout: 408,
+      notFound: 404
+    },
+    chutes: {
+      transient: [500, 502, 503],
+      rateLimit: 429,
+      auth: 401,
+      permission: 403,
+      userError: [400, 413],
+      notFound: 404
+    },
+    nvidia: {
+      transient: [500, 502, 503],
+      rateLimit: 429,
+      auth: 401,
+      permission: 403,
+      userError: [400],
+      notFound: 404
+    },
+    together: {
+      transient: [500, 502, 503],
+      rateLimit: 429,
+      auth: [401, 402],
+      permission: 403,
+      userError: [400],
+      notFound: 404
+    },
+    fireworks: {
+      transient: [500, 502, 503],
+      rateLimit: 429,
+      auth: 401,
+      userError: [400],
+      notFound: 404
+    },
+    mistral: {
+      transient: [500, 502, 503],
+      rateLimit: 429,
+      auth: 401,
+      permission: 403,
+      userError: [400],
+      notFound: 404
+    },
+    groq: {
+      transient: [500, 502, 503],
+      rateLimit: 429,
+      auth: [401, 402],
+      permission: 403,
+      userError: [400, 413],
+      notFound: 404
+    },
+    google: {
+      transient: [500, 502, 503],
+      rateLimit: 429,
+      auth: 401,
+      permission: 403,
+      userError: [400, 413],
+      notFound: 404
+    },
+    default: {
+      transient: [500, 502, 503, 529],
+      rateLimit: 429,
+      auth: [401, 402],
+      permission: 403,
+      userError: [400, 413],
+      notFound: 404
+    }
+  };
+  
+  const patterns = providerPatterns[provider] || providerPatterns.default;
+  
+  // Check for tool errors first (shouldn't happen here but just in case)
+  if (error.isToolError) {
+    return { category: 'toolError', action: 'return', waitTime: 0 };
+  }
+  
+  // Determine category based on status code
+  if (patterns.transient?.includes(statusCode)) {
+    return { category: 'transient', action: 'wait', waitTime: 30000 };
+  }
+  if (statusCode === patterns.rateLimit) {
+    return { category: 'rateLimit', action: 'wait', waitTime: 30000 };
+  }
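+  // `auth` is a single code for some providers and an array for others; normalize before matching
+  if ([].concat(patterns.auth ?? []).includes(statusCode)) {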
+    return { category: 'auth', action: 'switch', waitTime: 0 };
+  }
+  if (statusCode === patterns.permission) {
+    return { category: 'permission', action: 'return', waitTime: 0 };
+  }
+  if (patterns.userError?.includes(statusCode)) {
+    return { category: 'userError', action: 'return', waitTime: 0 };
+  }
+  if (statusCode === patterns.timeout) {
+    return { category: 'timeout', action: 'wait', waitTime: 30000 };
+  }
+  if (statusCode === patterns.notFound) {
+    // 404 is handled as retryable for every provider (following OpenAI's behavior): wait, then switch
+    return { category: 'notFound', action: 'wait', waitTime: 30000 };
+  }
+  
+  // Default to transient for 5xx
+  if (statusCode >= 500) {
+    return { category: 'serverError', action: 'wait', waitTime: 30000 };
+  }
+  
+  // Check error message for additional patterns
+  if (errorMessage.includes('model not found') || errorMessage.includes('unknown model')) {
+    return { category: 'modelNotFound', action: 'wait', waitTime: 30000 };
+  }
+  if (errorMessage.includes('insufficient credit') || errorMessage.includes('insufficient quota')) {
+    return { category: 'billing', action: 'switch', waitTime: 0 };
+  }
+  if (errorMessage.includes('context length exceeded') || errorMessage.includes('token limit exceeded')) {
+    return { category: 'userError', action: 'return', waitTime: 0 };
+  }
+  
+  // Unknown error - switch immediately
+  return { category: 'unknown', action: 'switch', waitTime: 0 };
+}
+```
+
+### Phase 2: Tool Error Handling
+
+**File: `opencode/packages/opencode/src/session/processor.ts`**
+
+Enhance tool-error handling to distinguish between tool error types:
+
+```typescript
+// Add tool error type classification
+enum ToolErrorType {
+  validation = 'validation',
+  permission = 'permission',
+  timeout = 'timeout',
+  notFound = 'notFound',
+  execution = 'execution'
+}
+
+function classifyToolError(error: unknown): ToolErrorType {
+  const message = String(error).toLowerCase();
+  
+  if (message.includes('validation') || message.includes('schema') || message.includes('invalid arguments')) {
+    return ToolErrorType.validation;
+  }
+  if (message.includes('permission') || message.includes('forbidden') || message.includes('denied')) {
+    return ToolErrorType.permission;
+  }
+  if (message.includes('timeout') || message.includes('timed out')) {
+    return ToolErrorType.timeout;
+  }
+  if (message.includes('not found') || message.includes('does not exist')) {
+    return ToolErrorType.notFound;
+  }
+  return ToolErrorType.execution;
+}
+
+// In the switch case for "tool-error"
+case "tool-error": {
+  const match = toolcalls[value.toolCallId];
+  if (match && match.state.status === "running") {
+    await Session.updatePart({
+      ...match,
+      state: {
+        status: "error",
+        input: value.input ?? match.state.input,
+        error: (value.error as any).toString(),
+        errorType: classifyToolError(value.error),
+        time: {
+          start: match.state.time.start,
+          end: Date.now(),
+        },
+      },
+    })
+
+    // Don't trigger fallback for tool errors - let model retry
+    // Only trigger fallback for permission rejections
+    if (
+      value.error instanceof PermissionNext.RejectedError ||
+      value.error instanceof Question.RejectedError
+    ) {
+      blocked = shouldBreak
+    }
+    
+    // Mark that this was a tool error (not provider error)
+    (value.error as any).isToolError = true;
+    
+    delete toolcalls[value.toolCallId]
+  }
+  break;
+}
+```
+
+**File: `chat/server.js`**
+
+Modify `shouldFallbackCliError()` to check for tool errors:
+
+```javascript
+function shouldFallbackCliError(err, message) {
+  if (!err) return false;
+  
+  // Don't fallback on tool errors - let model retry
+  if (err.isToolError) {
+    log('Tool error detected - no fallback needed', {
+      error: err.message,
+      toolError: true
+    });
+    return false;
+  }
+  
+  // ... rest of existing checks
+}
+```
+
+### Phase 3: Continue Message System
+
+Enhance `sendToOpencodeWithFallback()` with continue message tracking and provider error handling:
+
+```javascript
+async function sendToOpencodeWithFallback({ session, model, content, message, cli, streamCallback, opencodeSessionId, plan }) {
+  const cliName = normalizeCli(cli || session?.cli);
+  const preferredModel = model || session?.model;
+  const chain = buildOpencodeAttemptChain(cliName, preferredModel);
+  const tried = new Set();
+  const attempts = [];
+  let lastError = null;
+  let switchedToBackup = false;
+  
+  // Track continue attempts per model
+  const continueAttempts = new Map();
+  const MAX_CONTINUE_ATTEMPTS = 3;
+  const CONTINUE_MESSAGE = '[CONTINUE] Please continue from where you left off.';
+  
+  // Track last error type to prevent infinite loops
+  const lastErrorTypes = new Map();
+  
+  log('Fallback sequence initiated', {
+    sessionId: session?.id,
+    messageId: message?.id,
+    primaryModel: preferredModel,
+    cliName,
+    chainLength: chain.length,
+    timestamp: new Date().toISOString()
+  });
+
+  const tryOption = async (option, isBackup = false) => {
+    const key = `${option.provider}:${option.model}`;
+    if (tried.has(key)) return null;
+    tried.add(key);
+    
+    const limit = isProviderLimited(option.provider, option.model);
+    if (limit.limited) {
+      attempts.push({ 
+        model: option.model, 
+        provider: option.provider, 
+        error: `limit: ${limit.reason}`,
+        classification: 'rateLimit'
+      });
+      return null;
+    }
+    
+    try {
+      resetMessageStreamingFields(message);
+      
+      // Handle continue messages
+      let messageContent = content;
+      const modelKey = `${option.provider}:${option.model}`;
+      const continueCount = continueAttempts.get(modelKey) || 0;
+      
+      if (continueCount > 0 && continueCount <= MAX_CONTINUE_ATTEMPTS) {
+        messageContent = `${CONTINUE_MESSAGE}\n\n${content}`;
+        log('Sending continue message', {
+          model: option.model,
+          provider: option.provider,
+          attempt: continueCount,
+          modelKey
+        });
+      }
+      
+      const result = await sendToOpencode({ 
+        session, 
+        model: option.model, 
+        content: messageContent, 
+        message, 
+        cli: cliName, 
+        streamCallback, 
+        opencodeSessionId 
+      });
+      
+      const normalizedResult = (result && typeof result === 'object') ? result : { reply: result };
+      
+      // Token usage tracking (existing code)
+      let tokensUsed = 0;
+      let tokenSource = 'none';
+      let tokenExtractionLog = [];
+      
+      if (result && typeof result === 'object' && result.tokensUsed > 0) {
+        tokensUsed = result.tokensUsed;
+        tokenSource = result.tokenSource || 'result';
+        tokenExtractionLog = result.tokenExtractionLog || [];
+      } else {
+        tokensUsed = extractTokenUsageFromResult(normalizedResult, [messageContent], { allowEstimate: false });
+        if (tokensUsed > 0) {
+          tokenSource = 'response-extracted';
+          tokenExtractionLog.push({ method: 'extractTokenUsageFromResult', success: true, value: tokensUsed });
+        }
+      }
+      
+      // Success: reset counters
+      continueAttempts.delete(modelKey);
+      lastErrorTypes.delete(modelKey);
+      
+      recordProviderUsage(option.provider, option.model, tokensUsed, 1);
+      
+      if (attempts.length) {
+        log('opencode succeeded after fallback', { attempts, model: option.model, provider: option.provider });
+      }
+      
+      return { 
+        reply: normalizedResult.reply, 
+        model: option.model, 
+        attempts, 
+        provider: option.provider, 
+        raw: normalizedResult.raw, 
+        tokensUsed,
+        tokenSource,
+        tokenExtractionLog
+      };
+      
+    } catch (err) {
+      lastError = err;
+      
+      const errorData = {
+        model: option.model,
+        provider: option.provider,
+        error: err.message || String(err),
+        code: err.code || null,
+        timestamp: new Date().toISOString()
+      };
+      
+      // Check for early termination
+      if (err.earlyTermination) {
+        const partialOutputLength = (message?.partialOutput || '').length;
+        const hasSubstantialOutput = partialOutputLength > 500;
+        
+        if (hasSubstantialOutput) {
+          log('Blocking fallback - model has substantial output despite early termination', {
+            model: option.model,
+            provider: option.provider,
+            error: err.message,
+            partialOutputLength
+          });
+          return err;
+        }
+        
+        // Increment continue counter
+        const modelKey = `${option.provider}:${option.model}`;
+        const currentCount = continueAttempts.get(modelKey) || 0;
+        continueAttempts.set(modelKey, currentCount + 1);
+        
+        log('Early termination detected', {
+          model: option.model,
+          provider: option.provider,
+          continueAttempt: currentCount + 1,
+          maxAttempts: MAX_CONTINUE_ATTEMPTS
+        });
+        
+        // Retry with same model if under limit
+        if (currentCount + 1 < MAX_CONTINUE_ATTEMPTS) {
+          errorData.earlyTermination = true;
+          errorData.continueAttempt = currentCount + 1;
+          errorData.willContinue = true;
+          attempts.push(errorData);
+          
+          // Remove from tried set to allow retry with same option
+          tried.delete(key);
+          
+          return null;
+        }
+        
+        // Switch to next model after MAX_CONTINUE_ATTEMPTS
+        log('Max continue attempts reached, switching model', {
+          model: option.model,
+          provider: option.provider,
+          totalAttempts: MAX_CONTINUE_ATTEMPTS
+        });
+        
+        attempts.push(errorData);
+        return null;
+      }
+      
+      // Classify provider error
+      const classification = classifyProviderError(err, option.provider);
+      errorData.classification = classification.category;
+      
+      // Track error types to prevent infinite loops
+      const modelKey = `${option.provider}:${option.model}`;
+      const lastErrorType = lastErrorTypes.get(modelKey);
+      
+      if (lastErrorType === classification.category && 
+          classification.category !== 'unknown') {
+        // Same error type twice in a row - might be persistent error
+        log('Repeated error type detected, may need immediate switch', {
+          model: option.model,
+          provider: option.provider,
+          errorType: classification.category
+        });
+      }
+      // Record the category on every failure so the next comparison has a value to check
+      lastErrorTypes.set(modelKey, classification.category);
+      
+      if (classification.action === 'return') {
+        // User/permission errors - return to user
+        log('User/permission error - returning to user', {
+          category: classification.category,
+          model: option.model,
+          provider: option.provider
+        });
+        err.willNotFallback = true;
+        return err;
+      }
+      
+      if (classification.action === 'wait') {
+        // Transient/rate limit errors - wait before switch
+        log(`Provider error (${classification.category}) - waiting ${classification.waitTime}ms`, {
+          model: option.model,
+          provider: option.provider,
+          category: classification.category,
+          waitTime: classification.waitTime
+        });
+        
+        errorData.willWait = true;
+        errorData.waitTime = classification.waitTime;
+        attempts.push(errorData);
+        
+        // Wait before allowing next attempt
+        await new Promise(resolve => setTimeout(resolve, classification.waitTime));
+        
+        return null;
+      }
+      
+      // Switch immediately for auth/unknown errors
+      errorData.immediateSwitch = true;
+      attempts.push(errorData);
+      
+      return null;
+    }
+  };
+  
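+  // Run one option, repeating it while tryOption() re-queues it for a continue retry
+  // (an early termination under the limit removes the key from `tried` and returns null)
+  const runOption = async (option, isBackup = false) => {
+    const key = `${option.provider}:${option.model}`;
+    let result;
+    do {
+      result = await tryOption(option, isBackup);
+    } while (result === null && !tried.has(key));
+    return result;
+  };
+  
+  // Try each option in chain
+  let returnedToUser = false;
+  for (const option of chain) {
+    const result = await runOption(option);
+    if (result instanceof Error) { returnedToUser = true; break; }
+    if (result) return result;
+  }
+  
+  // Try backup model if configured, unless the error must go back to the user
+  const backupModel = (providerLimits.opencodeBackupModel || '').trim();
+  if (backupModel && !returnedToUser) {
+    const backupChain = buildOpencodeAttemptChain(cliName, backupModel);
+    for (const option of backupChain) {
+      const result = await runOption(option, true);
+      if (result instanceof Error) break;
+      if (result) return result;
+    }
+  }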
+  
+  const err = new Error(`All ${cliName.toUpperCase()} models failed`);
+  err.attempts = attempts;
+  err.cause = lastError;
+  throw err;
+}
+```
+
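The continue-then-switch behavior is easier to test in isolation. The sketch below strips out token tracking, provider limits, and the 30-second wait. `runChainWithContinue` and the injected `send` function are hypothetical stand-ins (`send` plays the role of `sendToOpencode()`), so this illustrates the control flow rather than the planned implementation.

```javascript
// Sketch only: `send` is a hypothetical stand-in for sendToOpencode().
// It receives (option, content) and resolves with a reply or throws an Error;
// an error with `earlyTermination: true` marks a response that stopped early.
const MAX_CONTINUE_ATTEMPTS = 3;
const CONTINUE_MESSAGE = '[CONTINUE] Please continue from where you left off.';

async function runChainWithContinue(chain, content, send) {
  const attempts = [];
  for (const option of chain) {
    let terminations = 0;
    // Stay on the same model until it has stopped early MAX_CONTINUE_ATTEMPTS times.
    while (terminations < MAX_CONTINUE_ATTEMPTS) {
      const body = terminations > 0 ? `${CONTINUE_MESSAGE}\n\n${content}` : content;
      try {
        const reply = await send(option, body);
        return { reply, model: option.model, provider: option.provider, attempts };
      } catch (err) {
        const earlyTermination = Boolean(err && err.earlyTermination);
        attempts.push({
          model: option.model,
          provider: option.provider,
          error: String(err && err.message),
          earlyTermination,
        });
        // In this sketch any other error moves straight to the next model.
        if (!earlyTermination) break;
        terminations += 1;
      }
    }
  }
  const failure = new Error('All models failed');
  failure.attempts = attempts;
  throw failure;
}
```

With a limit of 3, a model receives the original request plus two continue requests before the chain advances, which matches the `currentCount + 1 < MAX_CONTINUE_ATTEMPTS` check in the plan's code.
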
+---
+
+## Summary of Files Modified
+
+1. **`MODEL_FALLBACK_IMPROVEMENT_PLAN.md`** (NEW) - This comprehensive plan document
+2. **`chat/server.js`** - Major modifications:
+   - Add `classifyProviderError()` function (~120 lines)
+   - Modify `sendToOpencodeWithFallback()` with continue message logic
+   - Update `shouldFallbackCliError()` to handle tool errors
+
+3. **`opencode/packages/opencode/src/session/processor.ts`** - Minor modifications:
+   - Add `classifyToolError()` function
+   - Enhance tool-error case with error type classification
+
+---
+
+## Testing Checklist
+
+- [ ] Tool errors don't trigger fallback
+- [ ] Early termination sends continue messages (max 3 attempts)
+- [ ] Provider errors with >=500 status wait 30s before switch
+- [ ] Rate limit errors (429) wait 30s before switch
+- [ ] Auth errors (401, 402) switch immediately
+- [ ] Permission errors (403) return to user without switch
+- [ ] User errors (400, 413) return to user without switch
+- [ ] Continue attempts reset on successful response
+- [ ] Fallback chain respects continue attempts per model
+- [ ] Logging captures all error classifications and actions
+
+---
+
+## Monitoring & Analytics
+
+Add tracking for:
+1. Error type distribution
+2. Continue message frequency
+3. Provider error wait times
+4. Model switch patterns
+5. Tool error vs provider error ratios
+
+Export to monitoring system for analysis and optimization.
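
A minimal in-memory sketch of the five counters listed above is shown here. `createFallbackMetrics` and its method names are hypothetical; the plan does not specify a metrics API or which monitoring system receives the export.

```javascript
// Hypothetical in-memory counters for the tracking items listed above.
function createFallbackMetrics() {
  const counts = new Map();
  const bump = (key, by = 1) => counts.set(key, (counts.get(key) || 0) + by);

  return {
    // Items 1, 3 and 5: error type distribution, wait times, tool vs provider ratio.
    recordError(category, { isToolError = false, waitMs = 0 } = {}) {
      bump(`error.${category}`);
      bump(isToolError ? 'errors.tool' : 'errors.provider');
      if (waitMs > 0) bump('wait.totalMs', waitMs);
    },
    // Item 2: continue message frequency, per provider:model key.
    recordContinue(modelKey) {
      bump(`continue.${modelKey}`);
    },
    // Item 4: model switch patterns.
    recordSwitch(fromKey, toKey) {
      bump(`switch.${fromKey}->${toKey}`);
    },
    // Plain object suitable for logging or export.
    snapshot() {
      return Object.fromEntries(counts);
    },
  };
}
```
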