# Model Fallback & Continue Functionality Improvement Plan

## Executive Summary

This plan outlines improvements to the model fallback system to handle three distinct error categories:

1. **Bad tool calls** - Send error data back to the user/model for retry
2. **Request stops** - Send a continue message, retry up to 3 times, then switch model
3. **Provider errors** - Wait 30 seconds, then switch to the next model in the fallback chain

---

## Current Implementation Analysis

### Existing Components

| File | Function | Current Behavior |
|------|----------|------------------|
| `chat/server.js:9347` | `shouldFallbackCliError()` | Decides if an error warrants fallback based on patterns |
| `chat/server.js:9502` | `sendToOpencodeWithFallback()` | Main fallback orchestration with model chain |
| `chat/server.js:9296` | `buildOpencodeAttemptChain()` | Builds ordered provider/model chain |
| `opencode/session/retry.ts` | `SessionRetry.delay()` | Calculates retry delays with exponential backoff |
| `opencode/session/message-v2.ts:714` | `isOpenAiErrorRetryable()` | Determines if OpenAI errors are retryable |

### Current Gaps

1. **No explicit continue mechanism** - Early terminations are counted, but there is no "continue" message system
2. **Tool errors not distinguished** - Tool errors are treated the same as provider errors
3. **No 30-second wait for provider errors** - Fallback happens immediately on provider issues
4. **Missing provider-specific error mappings** - Generic patterns only

---

## Proposed Architecture

### Error Classification Flow

```
Error Occurs
│
├── Tool Call Error?
│   └── Yes → Check tool error type
│       ├── Validation/Schema error → Send error back, continue with same model
│       ├── Permission denied → Send error back, continue with same model
│       ├── Execution failure → Send error back, continue with same model
│       └── Tool timeout → Send error back, continue with same model
│
├── Early Termination?
│   └── Yes → Increment termination counter
│       ├── Count < 3?  → Send "continue" message, retry same model
│       └── Count >= 3? → Switch to next model
│
└── Provider Error?
    └── Yes → Classify error type
        ├── Transient (500, 502, 503, timeout) → Wait 30s, switch model
        ├── Rate limit (429) → Wait 30s, switch model
        ├── Auth/Billing (401, 402) → Switch model immediately
        ├── Permission (403) → Send error back, don't switch
        ├── User error (400, 413) → Send error back, don't switch
        └── Other (404, etc.) → Wait 30s, switch model
```

---

## Provider-Specific Error Mappings

### Error Categories & Actions

| Category | Action | Wait Time | Switch Model? |
|----------|--------|-----------|---------------|
| **Tool Error (validation/schema)** | Return to user | 0s | No |
| **Tool Error (execution/permission)** | Return to user | 0s | No |
| **Early Termination** | Send continue | 0s | After 3 attempts |
| **Transient Server Error (5xx)** | Wait | 30s | Yes |
| **Rate Limit (429)** | Wait | 30s | Yes |
| **Auth/Billing (401, 402)** | Switch immediately | 0s | Yes |
| **Permission (403)** | Return to user | 0s | No |
| **Not Found (404)** | Wait | 30s | Yes |
| **User Error (400, 413)** | Return to user | 0s | No |
| **Timeout (408)** | Wait | 30s | Yes |
| **Overloaded (529)** | Wait | 30s | Yes |

### Detailed Provider Error Codes

#### OpenAI

```
400 (invalid_request_error) → User error - return to user
401 (authentication_error)  → Auth error - immediate switch
402 (payment_required)      → Billing error - immediate switch
403 (permission_error)      → Permission error - return to user
404 (not_found_error)       → OpenAI treats as retryable - wait 30s, switch
408 (timeout)               → Timeout - wait 30s, switch
429 (rate_limit_error)      → Rate limit - wait 30s, switch
500 (api_error)             → Server error - wait 30s, switch
529                         → Overloaded - wait 30s, switch
```

#### Anthropic (Claude)

```
400 (invalid_request_error) → User error - return to user
401 (authentication_error)  → Auth error - immediate switch
403 (permission_error)      → Permission error - return to user
404 (not_found_error)       → Not found - wait 30s, switch
413 (request_too_large)     → User error - return to user
429 (rate_limit_error)      → Rate limit - wait 30s, switch
500 (api_error)             → Server error - wait 30s, switch
529 (overloaded_error)      → Overloaded - wait 30s, switch
```

#### OpenRouter

```
400 (bad_request)           → User error - return to user
401 (invalid_credentials)   → Auth error - immediate switch
402 (insufficient_credits)  → Billing error - immediate switch
403 (moderation_flagged)    → Permission - return to user
408 (timeout)               → Timeout - wait 30s, switch
429 (rate_limited)          → Rate limit - wait 30s, switch
502 (model_down)            → Model down - wait 30s, switch
503 (no_providers)          → No providers - immediate switch
```

#### Chutes AI

```
MODEL_LOADING_FAILED        → Transient - wait 30s, switch
INFERENCE_TIMEOUT           → Timeout - wait 30s, switch
OUT_OF_MEMORY               → Transient - wait 30s, switch
INVALID_INPUT               → User error - return to user
MODEL_OVERLOADED            → Overloaded - wait 30s, switch
GENERATION_FAILED           → Transient - wait 30s, switch
CONTEXT_LENGTH_EXCEEDED     → User error - return to user
400                         → Bad request - return to user
429                         → Rate limit - wait 30s, switch
500+                        → Server error - wait 30s, switch
```

#### NVIDIA NIM

```
401                         → Auth error - immediate switch
403                         → Permission - return to user
404                         → Not found - wait 30s, switch
429 (too_many_requests)     → Rate limit - wait 30s, switch
500+                        → Server error - wait 30s, switch
```

#### Together AI

```
400 (invalid_request)       → User error - return to user
401 (authentication_error)  → Auth error - immediate switch
402 (payment_required)      → Billing error - immediate switch
403 (bad_request)           → User error - return to user
429 (rate_limit_exceeded)   → Rate limit - wait 30s, switch
500+                        → Server error - wait 30s, switch
```

#### Fireworks AI

```
400  → User error - return to user
401  → Auth error - immediate switch
429  → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch
```

#### Mistral

```
400  → Bad request - return to user
401  → Unauthorized - immediate switch
403  → Forbidden - return to user
404  → Not found - wait 30s, switch
429  → Too many requests - wait 30s, switch
500+ → Server error - wait 30s, switch
```

#### Groq

```
400  → Bad request - return to user
401  → Unauthorized - immediate switch
402  → Payment required - immediate switch
403  → Forbidden - return to user
404  → Not found - wait 30s, switch
413  → Payload too large - return to user
429  → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch
```

#### Google (Gemini)

```
400                         → Invalid request - return to user
401                         → Unauthorized - immediate switch
403                         → Permission denied - return to user
404                         → Not found - wait 30s, switch
413                         → Request too large - return to user
429 (resource_exhausted)    → Rate limit - wait 30s, switch
500+                        → Server error - wait 30s, switch
```

---

## Implementation Plan

### Phase 1: Error Classification System

**File: `chat/server.js`**

Create a new function `classifyProviderError()`:

```javascript
function classifyProviderError(error, provider) {
  // Extract the HTTP status code
  const statusCode = error.statusCode || error.code;
  const errorMessage = (error.message || '').toLowerCase();

  const providerPatterns = {
    openai: { transient: [500, 502, 503, 504, 529], rateLimit: 429, auth: [401, 402], permission: 403, userError: [400], notFound: 404, timeout: 408 },
    anthropic: { transient: [500, 529], rateLimit: 429, auth: 401, permission: 403, userError: [400, 413], notFound: 404 },
    openrouter: { transient: [502, 503], rateLimit: 429, auth: [401, 402], permission: 403, userError: [400], timeout: 408, notFound: 404 },
    chutes: { transient: [500, 502, 503], rateLimit: 429, auth: 401, permission: 403, userError: [400, 413], notFound: 404 },
    nvidia: { transient: [500, 502, 503], rateLimit: 429, auth: 401, permission: 403, userError: [400], notFound: 404 },
    together: { transient: [500, 502, 503], rateLimit: 429, auth: [401, 402], permission: 403, userError: [400], notFound: 404 },
    fireworks: { transient: [500, 502, 503], rateLimit: 429, auth: 401, userError: [400], notFound: 404 },
    mistral: { transient: [500, 502, 503], rateLimit: 429, auth: 401, permission: 403, userError: [400], notFound: 404 },
    groq: { transient: [500, 502, 503], rateLimit: 429, auth: [401, 402], permission: 403, userError: [400, 413], notFound: 404 },
    google: { transient: [500, 502, 503], rateLimit: 429, auth: 401, permission: 403, userError: [400, 413], notFound: 404 },
    default: { transient: [500, 502, 503, 529], rateLimit: 429, auth: [401, 402], permission: 403, userError: [400, 413], notFound: 404 }
  };

  const patterns = providerPatterns[provider] || providerPatterns.default;

  // `auth` is a single code for some providers and an array for others;
  // normalize to an array so `.includes()` is always safe.
  const toArray = (value) => (Array.isArray(value) ? value : value == null ? [] : [value]);

  // Check for tool errors first (they shouldn't reach this function, but just in case)
  if (error.isToolError) {
    return { category: 'toolError', action: 'return', waitTime: 0 };
  }

  // Determine the category based on status code
  if (patterns.transient?.includes(statusCode)) {
    return { category: 'transient', action: 'wait', waitTime: 30000 };
  }
  if (statusCode === patterns.rateLimit) {
    return { category: 'rateLimit', action: 'wait', waitTime: 30000 };
  }
  if (toArray(patterns.auth).includes(statusCode)) {
    return { category: 'auth', action: 'switch', waitTime: 0 };
  }
  // The `!= null` guards prevent a missing status code from matching a
  // provider that has no mapping for the scalar fields.
  if (patterns.permission != null && statusCode === patterns.permission) {
    return { category: 'permission', action: 'return', waitTime: 0 };
  }
  if (patterns.userError?.includes(statusCode)) {
    return { category: 'userError', action: 'return', waitTime: 0 };
  }
  if (patterns.timeout != null && statusCode === patterns.timeout) {
    return { category: 'timeout', action: 'wait', waitTime: 30000 };
  }
  if (patterns.notFound != null && statusCode === patterns.notFound) {
    // Special case: OpenAI treats 404 as retryable
    return { category: 'notFound', action: 'wait', waitTime: 30000 };
  }

  // Default to transient for unmapped 5xx codes
  if (statusCode >= 500) {
    return { category: 'serverError', action: 'wait', waitTime: 30000 };
  }

  // Check the error message for additional patterns
  if (errorMessage.includes('model not found') || errorMessage.includes('unknown model')) {
    return { category: 'modelNotFound', action: 'wait', waitTime: 30000 };
  }
  if (errorMessage.includes('insufficient credit') ||
      errorMessage.includes('insufficient quota')) {
    return { category: 'billing', action: 'switch', waitTime: 0 };
  }
  if (errorMessage.includes('context length exceeded') ||
      errorMessage.includes('token limit exceeded')) {
    return { category: 'userError', action: 'return', waitTime: 0 };
  }

  // Unknown error - switch immediately
  return { category: 'unknown', action: 'switch', waitTime: 0 };
}
```

### Phase 2: Tool Error Handling

**File: `opencode/packages/opencode/src/session/processor.ts`**

Enhance tool-error handling to distinguish between tool error types:

```typescript
// Add tool error type classification
enum ToolErrorType {
  validation = 'validation',
  permission = 'permission',
  timeout = 'timeout',
  notFound = 'notFound',
  execution = 'execution',
}

function classifyToolError(error: unknown): ToolErrorType {
  const message = String(error).toLowerCase();
  if (message.includes('validation') || message.includes('schema') || message.includes('invalid arguments')) {
    return ToolErrorType.validation;
  }
  if (message.includes('permission') || message.includes('forbidden') || message.includes('denied')) {
    return ToolErrorType.permission;
  }
  if (message.includes('timeout') || message.includes('timed out')) {
    return ToolErrorType.timeout;
  }
  if (message.includes('not found') || message.includes('does not exist')) {
    return ToolErrorType.notFound;
  }
  return ToolErrorType.execution;
}

// In the switch case for "tool-error"
case "tool-error": {
  const match = toolcalls[value.toolCallId];
  if (match && match.state.status === "running") {
    await Session.updatePart({
      ...match,
      state: {
        status: "error",
        input: value.input ??
          match.state.input,
        error: (value.error as any).toString(),
        errorType: classifyToolError(value.error),
        time: {
          start: match.state.time.start,
          end: Date.now(),
        },
      },
    })

    // Don't trigger fallback for tool errors - let the model retry.
    // Only trigger fallback for permission rejections.
    if (
      value.error instanceof PermissionNext.RejectedError ||
      value.error instanceof Question.RejectedError
    ) {
      blocked = shouldBreak
    }

    // Mark that this was a tool error (not a provider error)
    (value.error as any).isToolError = true;
    delete toolcalls[value.toolCallId]
  }
  break;
}
```

**File: `chat/server.js`**

Modify `shouldFallbackCliError()` to check for tool errors:

```javascript
function shouldFallbackCliError(err, message) {
  if (!err) return false;

  // Don't fallback on tool errors - let the model retry
  if (err.isToolError) {
    log('Tool error detected - no fallback needed', {
      error: err.message,
      toolError: true
    });
    return false;
  }

  // ... rest of existing checks
}
```

### Phase 3: Continue Message System

Enhance `sendToOpencodeWithFallback()` with continue-message tracking and provider error handling:

```javascript
async function sendToOpencodeWithFallback({ session, model, content, message, cli, streamCallback, opencodeSessionId, plan }) {
  const cliName = normalizeCli(cli || session?.cli);
  const preferredModel = model || session?.model;
  const chain = buildOpencodeAttemptChain(cliName, preferredModel);
  const tried = new Set();
  const attempts = [];
  let lastError = null;
  let switchedToBackup = false;

  // Track continue attempts per model
  const continueAttempts = new Map();
  const MAX_CONTINUE_ATTEMPTS = 3;
  const CONTINUE_MESSAGE = '[CONTINUE] Please continue from where you left off.';

  // Track the last error type per model to prevent infinite loops
  const lastErrorTypes = new Map();

  log('Fallback sequence initiated', {
    sessionId: session?.id,
    messageId: message?.id,
    primaryModel: preferredModel,
    cliName,
    chainLength: chain.length,
    timestamp: new Date().toISOString()
  });

  const tryOption = async (option, isBackup = false) => {
    const key = `${option.provider}:${option.model}`;
    if (tried.has(key)) return null;
    tried.add(key);

    const limit = isProviderLimited(option.provider, option.model);
    if (limit.limited) {
      attempts.push({
        model: option.model,
        provider: option.provider,
        error: `limit: ${limit.reason}`,
        classification: 'rateLimit'
      });
      return null;
    }

    try {
      resetMessageStreamingFields(message);

      // Handle continue messages
      let messageContent = content;
      const modelKey = `${option.provider}:${option.model}`;
      const continueCount = continueAttempts.get(modelKey) || 0;
      if (continueCount > 0 && continueCount <= MAX_CONTINUE_ATTEMPTS) {
        messageContent = `${CONTINUE_MESSAGE}\n\n${content}`;
        log('Sending continue message', {
          model: option.model,
          provider: option.provider,
          attempt: continueCount,
          modelKey
        });
      }

      const result = await sendToOpencode({
        session,
        model: option.model,
        content: messageContent,
        message,
        cli: cliName,
        streamCallback,
        opencodeSessionId
      });

      const normalizedResult = (result && typeof result === 'object')
        ?
 result : { reply: result };

      // Token usage tracking (existing code)
      let tokensUsed = 0;
      let tokenSource = 'none';
      let tokenExtractionLog = [];
      if (result && typeof result === 'object' && result.tokensUsed > 0) {
        tokensUsed = result.tokensUsed;
        tokenSource = result.tokenSource || 'result';
        tokenExtractionLog = result.tokenExtractionLog || [];
      } else {
        tokensUsed = extractTokenUsageFromResult(normalizedResult, [messageContent], { allowEstimate: false });
        if (tokensUsed > 0) {
          tokenSource = 'response-extracted';
          tokenExtractionLog.push({ method: 'extractTokenUsageFromResult', success: true, value: tokensUsed });
        }
      }

      // Success: reset counters
      continueAttempts.delete(modelKey);
      lastErrorTypes.delete(modelKey);

      recordProviderUsage(option.provider, option.model, tokensUsed, 1);

      if (attempts.length) {
        log('opencode succeeded after fallback', { attempts, model: option.model, provider: option.provider });
      }

      return {
        reply: normalizedResult.reply,
        model: option.model,
        attempts,
        provider: option.provider,
        raw: normalizedResult.raw,
        tokensUsed,
        tokenSource,
        tokenExtractionLog
      };
    } catch (err) {
      lastError = err;
      const errorData = {
        model: option.model,
        provider: option.provider,
        error: err.message || String(err),
        code: err.code || null,
        timestamp: new Date().toISOString()
      };

      // Check for early termination
      if (err.earlyTermination) {
        const partialOutputLength = (message?.partialOutput || '').length;
        const hasSubstantialOutput = partialOutputLength > 500;
        if (hasSubstantialOutput) {
          log('Blocking fallback - model has substantial output despite early termination', {
            model: option.model,
            provider: option.provider,
            error: err.message,
            partialOutputLength
          });
          return err;
        }

        // Increment the continue counter
        const modelKey = `${option.provider}:${option.model}`;
        const currentCount = continueAttempts.get(modelKey) || 0;
        continueAttempts.set(modelKey, currentCount + 1);
        log('Early termination detected', {
          model: option.model,
          provider: option.provider,
          continueAttempt: currentCount + 1,
          maxAttempts: MAX_CONTINUE_ATTEMPTS
        });

        // Retry with the same model if under the limit
        // (<= so that exactly MAX_CONTINUE_ATTEMPTS continue messages are sent)
        if (currentCount + 1 <= MAX_CONTINUE_ATTEMPTS) {
          errorData.earlyTermination = true;
          errorData.continueAttempt = currentCount + 1;
          errorData.willContinue = true;
          attempts.push(errorData);
          // Remove from the tried set to allow a retry with the same option
          tried.delete(key);
          return null;
        }

        // Switch to the next model after MAX_CONTINUE_ATTEMPTS
        log('Max continue attempts reached, switching model', {
          model: option.model,
          provider: option.provider,
          totalAttempts: MAX_CONTINUE_ATTEMPTS
        });
        attempts.push(errorData);
        return null;
      }

      // Classify the provider error
      const classification = classifyProviderError(err, option.provider);
      errorData.classification = classification.category;

      // Track error types to prevent infinite loops
      const modelKey = `${option.provider}:${option.model}`;
      const lastErrorType = lastErrorTypes.get(modelKey);
      if (lastErrorType === classification.category && classification.category !== 'unknown') {
        // Same error type twice in a row - likely a persistent error
        log('Repeated error type detected, may need immediate switch', {
          model: option.model,
          provider: option.provider,
          errorType: classification.category
        });
      }
      // Record unconditionally so the first occurrence is remembered
      lastErrorTypes.set(modelKey, classification.category);

      if (classification.action === 'return') {
        // User/permission errors - return to the user
        log('User/permission error - returning to user', {
          category: classification.category,
          model: option.model,
          provider: option.provider
        });
        err.willNotFallback = true;
        return err;
      }

      if (classification.action === 'wait') {
        // Transient/rate-limit errors - wait before switching
        log(`Provider error (${classification.category}) - waiting ${classification.waitTime}ms`, {
          model: option.model,
          provider: option.provider,
          category: classification.category,
          waitTime: classification.waitTime
        });
        errorData.willWait = true;
        errorData.waitTime = classification.waitTime;
        attempts.push(errorData);
        // Wait before allowing the next attempt
        await new Promise(resolve =>
          setTimeout(resolve, classification.waitTime));
        return null;
      }

      // Switch immediately for auth/unknown errors
      errorData.immediateSwitch = true;
      attempts.push(errorData);
      return null;
    }
  };

  // Try each option in the chain
  for (const option of chain) {
    let result = await tryOption(option);
    // tryOption releases the key (tried.delete) when an early termination
    // should be retried with a continue message; re-enter until it sticks.
    while (result === null && !tried.has(`${option.provider}:${option.model}`)) {
      result = await tryOption(option);
    }
    if (result instanceof Error) break;
    if (result) return result;
  }

  // Try the backup model if configured
  const backupModel = (providerLimits.opencodeBackupModel || '').trim();
  if (backupModel) {
    const backupChain = buildOpencodeAttemptChain(cliName, backupModel);
    for (const option of backupChain) {
      let result = await tryOption(option, true);
      while (result === null && !tried.has(`${option.provider}:${option.model}`)) {
        result = await tryOption(option, true);
      }
      if (result instanceof Error) break;
      if (result) return result;
    }
  }

  const err = new Error(`All ${cliName.toUpperCase()} models failed`);
  err.attempts = attempts;
  err.cause = lastError;
  throw err;
}
```

---

## Summary of Files Modified

1. **`MODEL_FALLBACK_IMPROVEMENT_PLAN.md`** (NEW) - This comprehensive plan document
2. **`chat/server.js`** - Major modifications:
   - Add `classifyProviderError()` function (~120 lines)
   - Modify `sendToOpencodeWithFallback()` with continue-message logic
   - Update `shouldFallbackCliError()` to handle tool errors
3. **`opencode/packages/opencode/src/session/processor.ts`** - Minor modifications:
   - Add `classifyToolError()` function
   - Enhance the tool-error case with error type classification

---

## Testing Checklist

- [ ] Tool errors don't trigger fallback
- [ ] Early termination sends continue messages (max 3 attempts)
- [ ] Provider errors with >= 500 status wait 30s before switching
- [ ] Rate-limit errors (429) wait 30s before switching
- [ ] Auth errors (401, 402) switch immediately
- [ ] Permission errors (403) return to the user without switching
- [ ] User errors (400, 413) return to the user without switching
- [ ] Continue attempts reset on a successful response
- [ ] Fallback chain respects continue attempts per model
- [ ] Logging captures all error classifications and actions

---

## Monitoring & Analytics

Add tracking for:

1. Error type distribution
2. Continue message frequency
3. Provider error wait times
4. Model switch patterns
5. Tool error vs provider error ratios

Export these metrics to the monitoring system for analysis and optimization.
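As a starting point for both the testing checklist and the monitoring work, the classification rules can be spot-checked in isolation before being wired into `chat/server.js`. The sketch below is a minimal, standalone reduction of the planned `classifyProviderError()` using only the shared `default` status-code table; the function name `classifyByStatus` and its shape are illustrative assumptions, not the production implementation:

```javascript
// Minimal, self-contained reduction of the Phase 1 classifier, limited to the
// shared "default" provider table so the action matrix can be spot-checked.
const DEFAULT_PATTERNS = {
  transient: [500, 502, 503, 529],
  rateLimit: 429,
  auth: [401, 402],
  permission: 403,
  userError: [400, 413],
  notFound: 404,
};

function classifyByStatus(statusCode, patterns = DEFAULT_PATTERNS) {
  // Order matters: specific mappings first, generic 5xx fallback last.
  if (patterns.transient.includes(statusCode)) return { category: 'transient', action: 'wait', waitTime: 30000 };
  if (statusCode === patterns.rateLimit) return { category: 'rateLimit', action: 'wait', waitTime: 30000 };
  if (patterns.auth.includes(statusCode)) return { category: 'auth', action: 'switch', waitTime: 0 };
  if (statusCode === patterns.permission) return { category: 'permission', action: 'return', waitTime: 0 };
  if (patterns.userError.includes(statusCode)) return { category: 'userError', action: 'return', waitTime: 0 };
  if (statusCode === patterns.notFound) return { category: 'notFound', action: 'wait', waitTime: 30000 };
  if (statusCode >= 500) return { category: 'serverError', action: 'wait', waitTime: 30000 };
  return { category: 'unknown', action: 'switch', waitTime: 0 };
}

// Spot-check the action matrix from the Error Categories & Actions table.
console.log(classifyByStatus(503).action); // wait (30s, then switch)
console.log(classifyByStatus(429).action); // wait
console.log(classifyByStatus(401).action); // switch (immediate)
console.log(classifyByStatus(403).action); // return (to user)
```

Running the same assertions against each provider-specific table would turn the classification items in the Testing Checklist into an automated regression suite.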