diff --git a/MODEL_FALLBACK_IMPROVEMENT_PLAN.md b/MODEL_FALLBACK_IMPROVEMENT_PLAN.md new file mode 100644 index 0000000..044ce24 --- /dev/null +++ b/MODEL_FALLBACK_IMPROVEMENT_PLAN.md @@ -0,0 +1,744 @@ +# Model Fallback & Continue Functionality Improvement Plan + +## Executive Summary + +This plan outlines improvements to the model fallback system to handle three distinct error categories: +1. **Bad tool calls** - Send error data back to user/model for retry +2. **Request stops** - Send continue message, retry 3x, then switch model +3. **Provider errors** - Wait 30 seconds, then switch to next model in fallback chain + +--- + +## Current Implementation Analysis + +### Existing Components + +| File | Function | Current Behavior | +|------|-----------|-----------------| +| `chat/server.js:9347` | `shouldFallbackCliError()` | Decides if error warrants fallback based on patterns | +| `chat/server.js:9502` | `sendToOpencodeWithFallback()` | Main fallback orchestration with model chain | +| `chat/server.js:9296` | `buildOpencodeAttemptChain()` | Builds ordered provider/model chain | +| `opencode/session/retry.ts` | `SessionRetry.delay()` | Calculates retry delays with exponential backoff | +| `opencode/session/message-v2.ts:714` | `isOpenAiErrorRetryable()` | Determines if OpenAI errors are retryable | + +### Current Gaps + +1. **No explicit continue mechanism** - Early terminations counted but no "continue" message system +2. **Tool errors not distinguished** - Tool errors treated same as provider errors +3. **No 30-second wait for provider errors** - Immediate fallback on provider issues +4. **Missing provider-specific error mappings** - Generic patterns only + +--- + +## Proposed Architecture + +### Error Classification Flow + +``` +Error Occurs + │ + ├── Tool Call Error? + │ ├── Yes → Check tool error type + │ │ ├── Validation/Schema error → Send error back, continue with same model + │ │ ├── Permission denied → Send error back, continue with same model + │ │ ├── Execution failure → Send error back, continue with same model + │ │ └── Tool timeout → Send error back, continue with same model + │ + ├── Early Termination? + │ ├── Yes → Increment termination counter + │ │ ├── Count < 3? → Send "continue" message, retry same model + │ │ └── Count >= 3? → Switch to next model + │ + └── Provider Error? + ├── Yes → Classify error type + │ ├── Transient (500, 502, 503, timeout) → Wait 30s, switch model + │ ├── Rate limit (429) → Wait 30s, switch model + │ ├── Auth/Billing (401, 402, 403) → Switch model immediately + │ ├── User error (400, 413) → Send error back, don't switch + │ └── Other (404, etc.) → Wait 30s, switch model +``` + +--- + +## Provider-Specific Error Mappings + +### Error Categories & Actions + +| Category | Action | Wait Time | Switch Model? 
| +|----------|--------|-----------|---------------| +| **Tool Error (validation/schema)** | Return to user | 0s | No | +| **Tool Error (execution/permission)** | Return to user | 0s | No | +| **Early Termination** | Send continue | 0s | After 3 attempts | +| **Transient Server Error (5xx)** | Wait | 30s | Yes | +| **Rate Limit (429)** | Wait | 30s | Yes | +| **Auth/Billing (401, 402)** | Switch immediately | 0s | Yes | +| **Permission (403)** | Return to user | 0s | No | +| **Not Found (404)** | Wait | 30s | Yes | +| **User Error (400, 413)** | Return to user | 0s | No | +| **Timeout (408)** | Wait | 30s | Yes | +| **Overloaded (529)** | Wait | 30s | Yes | + +### Detailed Provider Error Codes + +#### OpenAI +``` +400 (invalid_request_error) → User error - return to user +401 (authentication_error) → Auth error - immediate switch +402 (payment_required) → Billing error - immediate switch +403 (permission_error) → Permission error - return to user +404 (not_found_error) → OpenAI treats as retryable - wait 30s, switch +408 (timeout) → Timeout - wait 30s, switch +429 (rate_limit_error) → Rate limit - wait 30s, switch +500 (api_error) → Server error - wait 30s, switch +529 → Overloaded - wait 30s, switch +``` + +#### Anthropic (Claude) +``` +400 (invalid_request_error) → User error - return to user +401 (authentication_error) → Auth error - immediate switch +403 (permission_error) → Permission error - return to user +404 (not_found_error) → Not found - wait 30s, switch +413 (request_too_large) → User error - return to user +429 (rate_limit_error) → Rate limit - wait 30s, switch +500 (api_error) → Server error - wait 30s, switch +529 (overloaded_error) → Overloaded - wait 30s, switch +``` + +#### OpenRouter +``` +400 (bad_request) → User error - return to user +401 (invalid_credentials) → Auth error - immediate switch +402 (insufficient_credits) → Billing error - immediate switch +403 (moderation_flagged) → Permission - return to user +408 (timeout) → Timeout - wait 30s, switch +429 (rate_limited) → Rate limit - wait 30s, switch +502 (model_down) → Model down - wait 30s, switch +503 (no_providers) → No providers - immediate switch +``` + +#### Chutes AI +``` +MODEL_LOADING_FAILED → Transient - wait 30s, switch +INFERENCE_TIMEOUT → Timeout - wait 30s, switch +OUT_OF_MEMORY → Transient - wait 30s, switch +INVALID_INPUT → User error - return to user +MODEL_OVERLOADED → Overloaded - wait 30s, switch +GENERATION_FAILED → Transient - wait 30s, switch +CONTEXT_LENGTH_EXCEEDED → User error - return to user +400 → Bad request - return to user +429 → Rate limit - wait 30s, switch +500+ → Server error - wait 30s, switch +``` + +#### NVIDIA NIM +``` +401 → Auth error - immediate switch +403 → Permission - return to user +404 → Not found - wait 30s, switch +429 (too_many_requests) → Rate limit - wait 30s, switch +500+ → Server error - wait 30s, switch +``` + +#### Together AI +``` +400 (invalid_request) → User error - return to user +401 (authentication_error) → Auth error - immediate switch +402 (payment_required) → Billing error - immediate switch +403 (bad_request) → User error - return to user +429 (rate_limit_exceeded) → Rate limit - wait 30s, switch +500+ → Server error - wait 30s, switch +``` + +#### Fireworks AI +``` +400 → User error - return to user +401 → Auth error - immediate switch +429 → Rate limit - wait 30s, switch +500+ → Server error - wait 30s, switch +``` + +#### Mistral +``` +400 → Bad request - return to user +401 → Unauthorized - immediate switch +403 → Forbidden - return to user 
+404 → Not found - wait 30s, switch +429 → Too many requests - wait 30s, switch +500+ → Server error - wait 30s, switch +``` + +#### Groq +``` +400 → Bad request - return to user +401 → Unauthorized - immediate switch +402 → Payment required - immediate switch +403 → Forbidden - return to user +404 → Not found - wait 30s, switch +413 → Payload too large - return to user +429 → Rate limit - wait 30s, switch +500+ → Server error - wait 30s, switch +``` + +#### Google (Gemini) +``` +400 → Invalid request - return to user +401 → Unauthorized - immediate switch +403 → Permission denied - return to user +404 → Not found - wait 30s, switch +413 → Request too large - return to user +429 (resource_exhausted) → Rate limit - wait 30s, switch +500+ → Server error - wait 30s, switch +``` + +--- + +## Implementation Plan + +### Phase 1: Error Classification System + +**File: `chat/server.js`** + +Create new function `classifyProviderError()`: + +```javascript +function classifyProviderError(error, provider) { + // Extract HTTP status code + const statusCode = error.statusCode || error.code; + const errorMessage = (error.message || '').toLowerCase(); + + const providerPatterns = { + openai: { + transient: [500, 502, 503, 504, 529], + rateLimit: 429, + auth: [401, 402], + permission: 403, + userError: [400], + notFound: 404, + timeout: 408 + }, + anthropic: { + transient: [500, 529], + rateLimit: 429, + auth: 401, + permission: 403, + userError: [400, 413], + notFound: 404 + }, + openrouter: { + transient: [502, 503], + rateLimit: 429, + auth: [401, 402], + permission: 403, + userError: [400], + timeout: 408, + notFound: 404 + }, + chutes: { + transient: [500, 502, 503], + rateLimit: 429, + auth: 401, + permission: 403, + userError: [400, 413], + notFound: 404 + }, + nvidia: { + transient: [500, 502, 503], + rateLimit: 429, + auth: 401, + permission: 403, + userError: [400], + notFound: 404 + }, + together: { + transient: [500, 502, 503], + rateLimit: 429, + auth: [401, 402], + permission: 403, + userError: [400], + notFound: 404 + }, + fireworks: { + transient: [500, 502, 503], + rateLimit: 429, + auth: 401, + userError: [400], + notFound: 404 + }, + mistral: { + transient: [500, 502, 503], + rateLimit: 429, + auth: 401, + permission: 403, + userError: [400], + notFound: 404 + }, + groq: { + transient: [500, 502, 503], + rateLimit: 429, + auth: [401, 402], + permission: 403, + userError: [400, 413], + notFound: 404 + }, + google: { + transient: [500, 502, 503], + rateLimit: 429, + auth: 401, + permission: 403, + userError: [400, 413], + notFound: 404 + }, + default: { + transient: [500, 502, 503, 529], + rateLimit: 429, + auth: [401, 402], + permission: 403, + userError: [400, 413], + notFound: 404 + } + }; + + const patterns = providerPatterns[provider] || providerPatterns.default; + + // Check for tool errors first (shouldn't happen here but just in case) + if (error.isToolError) { + return { category: 'toolError', action: 'return', waitTime: 0 }; + } + + // Determine category based on status code + if (patterns.transient?.includes(statusCode)) { + return { category: 'transient', action: 'wait', waitTime: 30000 }; + } + if (statusCode === patterns.rateLimit) { + return { category: 'rateLimit', action: 'wait', waitTime: 30000 }; + } + if (patterns.auth?.includes(statusCode)) { + return { category: 'auth', action: 'switch', waitTime: 0 }; + } + if (statusCode === patterns.permission) { + return { category: 'permission', action: 'return', waitTime: 0 }; + } + if (patterns.userError?.includes(statusCode)) 
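  // 400/413-style input errors: every provider in the chain would reject the
  // same request, so surfacing the error to the user beats rotating models.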
{ + return { category: 'userError', action: 'return', waitTime: 0 }; + } + if (statusCode === patterns.timeout) { + return { category: 'timeout', action: 'wait', waitTime: 30000 }; + } + if (statusCode === patterns.notFound) { + // Special case: OpenAI treats 404 as retryable + return { category: 'notFound', action: 'wait', waitTime: 30000 }; + } + + // Default to transient for 5xx + if (statusCode >= 500) { + return { category: 'serverError', action: 'wait', waitTime: 30000 }; + } + + // Check error message for additional patterns + if (errorMessage.includes('model not found') || errorMessage.includes('unknown model')) { + return { category: 'modelNotFound', action: 'wait', waitTime: 30000 }; + } + if (errorMessage.includes('insufficient credit') || errorMessage.includes('insufficient quota')) { + return { category: 'billing', action: 'switch', waitTime: 0 }; + } + if (errorMessage.includes('context length exceeded') || errorMessage.includes('token limit exceeded')) { + return { category: 'userError', action: 'return', waitTime: 0 }; + } + + // Unknown error - switch immediately + return { category: 'unknown', action: 'switch', waitTime: 0 }; +} +``` + +### Phase 2: Tool Error Handling + +**File: `opencode/packages/opencode/src/session/processor.ts`** + +Enhance tool-error handling to distinguish between tool error types: + +```typescript +// Add tool error type classification +enum ToolErrorType { + validation = 'validation', + permission = 'permission', + timeout = 'timeout', + notFound = 'notFound', + execution = 'execution' +} + +function classifyToolError(error: unknown): ToolErrorType { + const message = String(error).toLowerCase(); + + if (message.includes('validation') || message.includes('schema') || message.includes('invalid arguments')) { + return ToolErrorType.validation; + } + if (message.includes('permission') || message.includes('forbidden') || message.includes('denied')) { + return ToolErrorType.permission; + } + if (message.includes('timeout') || message.includes('timed out')) { + return ToolErrorType.timeout; + } + if (message.includes('not found') || message.includes('does not exist')) { + return ToolErrorType.notFound; + } + return ToolErrorType.execution; +} + +// In the switch case for "tool-error" +case "tool-error": { + const match = toolcalls[value.toolCallId]; + if (match && match.state.status === "running") { + await Session.updatePart({ + ...match, + state: { + status: "error", + input: value.input ?? match.state.input, + error: (value.error as any).toString(), + errorType: classifyToolError(value.error), + time: { + start: match.state.time.start, + end: Date.now(), + }, + }, + }) + + // Don't trigger fallback for tool errors - let model retry + // Only trigger fallback for permission rejections + if ( + value.error instanceof PermissionNext.RejectedError || + value.error instanceof Question.RejectedError + ) { + blocked = shouldBreak + } + + // Mark that this was a tool error (not provider error) + (value.error as any).isToolError = true; + + delete toolcalls[value.toolCallId] + } + break; +} +``` + +**File: `chat/server.js`** + +Modify `shouldFallbackCliError()` to check for tool errors: + +```javascript +function shouldFallbackCliError(err, message) { + if (!err) return false; + + // Don't fallback on tool errors - let model retry + if (err.isToolError) { + log('Tool error detected - no fallback needed', { + error: err.message, + toolError: true + }); + return false; + } + + // ... 
rest of existing checks +} +``` + +### Phase 3: Continue Message System + +Enhance `sendToOpencodeWithFallback()` with continue message tracking and provider error handling: + +```javascript +async function sendToOpencodeWithFallback({ session, model, content, message, cli, streamCallback, opencodeSessionId, plan }) { + const cliName = normalizeCli(cli || session?.cli); + const preferredModel = model || session?.model; + const chain = buildOpencodeAttemptChain(cliName, preferredModel); + const tried = new Set(); + const attempts = []; + let lastError = null; + let switchedToBackup = false; + + // Track continue attempts per model + const continueAttempts = new Map(); + const MAX_CONTINUE_ATTEMPTS = 3; + const CONTINUE_MESSAGE = '[CONTINUE] Please continue from where you left off.'; + + // Track last error type to prevent infinite loops + const lastErrorTypes = new Map(); + + log('Fallback sequence initiated', { + sessionId: session?.id, + messageId: message?.id, + primaryModel: preferredModel, + cliName, + chainLength: chain.length, + timestamp: new Date().toISOString() + }); + + const tryOption = async (option, isBackup = false) => { + const key = `${option.provider}:${option.model}`; + if (tried.has(key)) return null; + tried.add(key); + + const limit = isProviderLimited(option.provider, option.model); + if (limit.limited) { + attempts.push({ + model: option.model, + provider: option.provider, + error: `limit: ${limit.reason}`, + classification: 'rateLimit' + }); + return null; + } + + try { + resetMessageStreamingFields(message); + + // Handle continue messages + let messageContent = content; + const modelKey = `${option.provider}:${option.model}`; + const continueCount = continueAttempts.get(modelKey) || 0; + + if (continueCount > 0 && continueCount <= MAX_CONTINUE_ATTEMPTS) { + messageContent = `${CONTINUE_MESSAGE}\n\n${content}`; + log('Sending continue message', { + model: option.model, + provider: option.provider, + attempt: continueCount, + modelKey + }); + } + + const result = await sendToOpencode({ + session, + model: option.model, + content: messageContent, + message, + cli: cliName, + streamCallback, + opencodeSessionId + }); + + const normalizedResult = (result && typeof result === 'object') ? 
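      // sendToOpencode may resolve to a rich result object or a bare reply
      // string; wrap strings so the token/usage extraction below always sees
      // the same shape.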
result : { reply: result };

      // Token usage tracking (existing code)
      let tokensUsed = 0;
      let tokenSource = 'none';
      let tokenExtractionLog = [];

      if (result && typeof result === 'object' && result.tokensUsed > 0) {
        tokensUsed = result.tokensUsed;
        tokenSource = result.tokenSource || 'result';
        tokenExtractionLog = result.tokenExtractionLog || [];
      } else {
        tokensUsed = extractTokenUsageFromResult(normalizedResult, [messageContent], { allowEstimate: false });
        if (tokensUsed > 0) {
          tokenSource = 'response-extracted';
          tokenExtractionLog.push({ method: 'extractTokenUsageFromResult', success: true, value: tokensUsed });
        }
      }

      // Success: reset counters
      continueAttempts.delete(modelKey);
      lastErrorTypes.delete(modelKey);

      recordProviderUsage(option.provider, option.model, tokensUsed, 1);

      if (attempts.length) {
        log('opencode succeeded after fallback', { attempts, model: option.model, provider: option.provider });
      }

      return {
        reply: normalizedResult.reply,
        model: option.model,
        attempts,
        provider: option.provider,
        raw: normalizedResult.raw,
        tokensUsed,
        tokenSource,
        tokenExtractionLog
      };

    } catch (err) {
      lastError = err;

      const errorData = {
        model: option.model,
        provider: option.provider,
        error: err.message || String(err),
        code: err.code || null,
        timestamp: new Date().toISOString()
      };

      // Check for early termination
      if (err.earlyTermination) {
        const partialOutputLength = (message?.partialOutput || '').length;
        const hasSubstantialOutput = partialOutputLength > 500;

        if (hasSubstantialOutput) {
          log('Blocking fallback - model has substantial output despite early termination', {
            model: option.model,
            provider: option.provider,
            error: err.message,
            partialOutputLength
          });
          return err;
        }

        // Increment continue counter
        const modelKey = `${option.provider}:${option.model}`;
        const currentCount = continueAttempts.get(modelKey) || 0;
        continueAttempts.set(modelKey, currentCount + 1);

        log('Early termination detected', {
          model: option.model,
          provider: option.provider,
          continueAttempt: currentCount + 1,
          maxAttempts: MAX_CONTINUE_ATTEMPTS
        });

        // Retry with same model if under limit
        if (currentCount + 1 < MAX_CONTINUE_ATTEMPTS) {
          errorData.earlyTermination = true;
          errorData.continueAttempt = currentCount + 1;
          errorData.willContinue = true;
          attempts.push(errorData);

          // Remove from tried set and re-attempt this option immediately;
          // returning null here would advance the caller's chain loop to the
          // next model instead of retrying the same one.
          tried.delete(key);

          return tryOption(option, isBackup);
        }

        // Switch to next model after MAX_CONTINUE_ATTEMPTS
        log('Max continue attempts reached, switching model', {
          model: option.model,
          provider: option.provider,
          totalAttempts: MAX_CONTINUE_ATTEMPTS
        });

        attempts.push(errorData);
        return null;
      }

      // Classify provider error
      const classification = classifyProviderError(err, option.provider);
      errorData.classification = classification.category;

      // Track error types to prevent infinite loops
      const modelKey = `${option.provider}:${option.model}`;
      const lastErrorType = lastErrorTypes.get(modelKey);

      if (lastErrorType === classification.category &&
          classification.category !== 'unknown') {
        // Same error type twice in a row - might be persistent error
        log('Repeated error type detected, may need immediate switch', {
          model: option.model,
          provider: option.provider,
          errorType: classification.category
        });
      }

      // Always record the latest error type so the repeat check above can
      // fire on the next failure of this model.
      lastErrorTypes.set(modelKey, classification.category);

      if (classification.action === 'return') { //
User/permission errors - return to user + log('User/permission error - returning to user', { + category: classification.category, + model: option.model, + provider: option.provider + }); + err.willNotFallback = true; + return err; + } + + if (classification.action === 'wait') { + // Transient/rate limit errors - wait before switch + log(`Provider error (${classification.category}) - waiting ${classification.waitTime}ms`, { + model: option.model, + provider: option.provider, + category: classification.category, + waitTime: classification.waitTime + }); + + errorData.willWait = true; + errorData.waitTime = classification.waitTime; + attempts.push(errorData); + + // Wait before allowing next attempt + await new Promise(resolve => setTimeout(resolve, classification.waitTime)); + + return null; + } + + // Switch immediately for auth/unknown errors + errorData.immediateSwitch = true; + attempts.push(errorData); + + return null; + } + }; + + // Try each option in chain + for (const option of chain) { + const result = await tryOption(option); + if (result instanceof Error) break; + if (result) return result; + } + + // Try backup model if configured + const backupModel = (providerLimits.opencodeBackupModel || '').trim(); + if (backupModel) { + const backupChain = buildOpencodeAttemptChain(cliName, backupModel); + for (const option of backupChain) { + const result = await tryOption(option, true); + if (result instanceof Error) break; + if (result) return result; + } + } + + const err = new Error(`All ${cliName.toUpperCase()} models failed`); + err.attempts = attempts; + err.cause = lastError; + throw err; +} +``` + +--- + +## Summary of Files Modified + +1. **`MODEL_FALLBACK_IMPROVEMENT_PLAN.md`** (NEW) - This comprehensive plan document +2. **`chat/server.js`** - Major modifications: + - Add `classifyProviderError()` function (~120 lines) + - Modify `sendToOpencodeWithFallback()` with continue message logic + - Update `shouldFallbackCliError()` to handle tool errors + +3. **`opencode/packages/opencode/src/session/processor.ts`** - Minor modifications: + - Add `classifyToolError()` function + - Enhance tool-error case with error type classification + +--- + +## Testing Checklist + +- [ ] Tool errors don't trigger fallback +- [ ] Early termination sends continue messages (max 3 attempts) +- [ ] Provider errors with >=500 status wait 30s before switch +- [ ] Rate limit errors (429) wait 30s before switch +- [ ] Auth errors (401, 402) switch immediately +- [ ] Permission errors (403) return to user without switch +- [ ] User errors (400, 413) return to user without switch +- [ ] Continue attempts reset on successful response +- [ ] Fallback chain respects continue attempts per model +- [ ] Logging captures all error classifications and actions + +--- + +## Monitoring & Analytics + +Add tracking for: +1. Error type distribution +2. Continue message frequency +3. Provider error wait times +4. Model switch patterns +5. Tool error vs provider error ratios + +Export to monitoring system for analysis and optimization.
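A minimal sketch of what this tracking could look like in `chat/server.js`, assuming a hypothetical in-memory `fallbackMetrics` accumulator and `recordFallbackEvent()` / `snapshotFallbackMetrics()` helpers (none of these exist yet); the export target and transport are deliberately left abstract:

```javascript
// Hypothetical accumulator for fallback analytics (all names are placeholders).
const fallbackMetrics = {
  errorsByCategory: new Map(), // classification.category -> count
  continueMessagesSent: 0,
  totalWaitMs: 0,
  modelSwitches: 0,
  toolErrors: 0,
  providerErrors: 0
};

// Record one classified failure. `event` carries the fields produced by
// classifyProviderError() plus optional flags set by the caller.
function recordFallbackEvent(event) {
  const { category, action, waitTime = 0, isToolError = false, sentContinue = false } = event;

  fallbackMetrics.errorsByCategory.set(
    category,
    (fallbackMetrics.errorsByCategory.get(category) || 0) + 1
  );

  if (isToolError) {
    fallbackMetrics.toolErrors += 1;
  } else {
    fallbackMetrics.providerErrors += 1;
  }

  if (action === 'wait') {
    fallbackMetrics.totalWaitMs += waitTime;
  }
  if (action === 'wait' || action === 'switch') {
    fallbackMetrics.modelSwitches += 1;
  }
  if (sentContinue) {
    fallbackMetrics.continueMessagesSent += 1;
  }
}

// Produce a plain-object snapshot suitable for logging or export.
function snapshotFallbackMetrics() {
  return {
    ...fallbackMetrics,
    errorsByCategory: Object.fromEntries(fallbackMetrics.errorsByCategory),
    timestamp: new Date().toISOString()
  };
}

// Example usage in the fallback catch path:
// recordFallbackEvent({ ...classifyProviderError(err, option.provider), isToolError: !!err.isToolError });
```

Keying the counters by classification category keeps the analytics vocabulary identical to the one used by the fallback logic itself, so dashboards and the error-mapping tables above stay directly comparable.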