shopify-ai-backup/MODEL_FALLBACK_IMPROVEMENT_PLAN.md
southseact-3d 2dc94310a6 Add comprehensive model fallback improvement plan
- Detailed error classification system for tool errors, early termination, and provider errors
- Provider-specific error mappings for 10 LLM providers (OpenAI, Anthropic, OpenRouter, Chutes, NVIDIA, Together, Fireworks, Mistral, Groq, Google)
- Continue message system with 3-attempt limit before model switch
- 30-second wait for transient/rate limit errors before switching
- Distinguishes tool errors (return to user) from provider errors (switch model)
- Implementation plan with code examples for server.js and processor.ts
2026-02-08 14:16:32 +00:00

Model Fallback & Continue Functionality Improvement Plan

Executive Summary

This plan outlines improvements to the model fallback system to handle three distinct error categories:

  1. Bad tool calls - Send error data back to user/model for retry
  2. Request stops - Send continue message, retry 3x, then switch model
  3. Provider errors - Wait 30 seconds, then switch to next model in fallback chain

Current Implementation Analysis

Existing Components

| File | Function | Current Behavior |
|------|----------|------------------|
| chat/server.js:9347 | shouldFallbackCliError() | Decides if error warrants fallback based on patterns |
| chat/server.js:9502 | sendToOpencodeWithFallback() | Main fallback orchestration with model chain |
| chat/server.js:9296 | buildOpencodeAttemptChain() | Builds ordered provider/model chain |
| opencode/session/retry.ts | SessionRetry.delay() | Calculates retry delays with exponential backoff |
| opencode/session/message-v2.ts:714 | isOpenAiErrorRetryable() | Determines if OpenAI errors are retryable |

Current Gaps

  1. No explicit continue mechanism - Early terminations counted but no "continue" message system
  2. Tool errors not distinguished - Tool errors treated same as provider errors
  3. No 30-second wait for provider errors - Immediate fallback on provider issues
  4. Missing provider-specific error mappings - Generic patterns only

Proposed Architecture

Error Classification Flow

Error Occurs
    │
    ├── Tool Call Error?
    │   ├── Yes → Check tool error type
    │   │   ├── Validation/Schema error → Send error back, continue with same model
    │   │   ├── Permission denied → Send error back, continue with same model
    │   │   ├── Execution failure → Send error back, continue with same model
    │   │   └── Tool timeout → Send error back, continue with same model
    │
    ├── Early Termination?
    │   ├── Yes → Increment termination counter
    │   │   ├── Count < 3? → Send "continue" message, retry same model
    │   │   └── Count >= 3? → Switch to next model
    │
    └── Provider Error?
        ├── Yes → Classify error type
        │   ├── Transient (500, 502, 503, timeout) → Wait 30s, switch model
        │   ├── Rate limit (429) → Wait 30s, switch model
        │   ├── Auth/Billing (401, 402, 403) → Switch model immediately
        │   ├── User error (400, 413) → Send error back, don't switch
        │   └── Other (404, etc.) → Wait 30s, switch model
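As a sketch, the top-level dispatch for this flow might look like the following. The isToolError and earlyTermination flags and the injected classifier are assumptions defined later in this plan; the function names here are hypothetical:

```javascript
// Hypothetical top-level dispatcher for the flow above. Assumes upstream
// code tags errors with isToolError / earlyTermination, and that a
// classifier (Phase 1's classifyProviderError) is passed in.
const MAX_CONTINUE_ATTEMPTS = 3;

function decideNextStep(error, provider, terminationCount, classifyProviderError) {
  if (error.isToolError) {
    // Every tool error type: send the error back, keep the same model
    return { action: 'return', reason: 'toolError' };
  }
  if (error.earlyTermination) {
    // terminationCount is the already-incremented counter from the diagram
    if (terminationCount < MAX_CONTINUE_ATTEMPTS) {
      return { action: 'continue', reason: 'earlyTermination' };
    }
    return { action: 'switch', reason: 'maxContinueAttempts' };
  }
  // Provider errors delegate to the classifier
  const { action, waitTime, category } = classifyProviderError(error, provider);
  return { action, waitTime, reason: category };
}
```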

Provider-Specific Error Mappings

Error Categories & Actions

| Category | Action | Wait Time | Switch Model? |
|----------|--------|-----------|---------------|
| Tool Error (validation/schema) | Return to user | 0s | No |
| Tool Error (execution/permission) | Return to user | 0s | No |
| Early Termination | Send continue | 0s | After 3 attempts |
| Transient Server Error (5xx) | Wait | 30s | Yes |
| Rate Limit (429) | Wait | 30s | Yes |
| Auth/Billing (401, 402) | Switch immediately | 0s | Yes |
| Permission (403) | Return to user | 0s | No |
| Not Found (404) | Wait | 30s | Yes |
| User Error (400, 413) | Return to user | 0s | No |
| Timeout (408) | Wait | 30s | Yes |
| Overloaded (529) | Wait | 30s | Yes |

Detailed Provider Error Codes

OpenAI

400 (invalid_request_error) → User error - return to user
401 (authentication_error) → Auth error - immediate switch
402 (payment_required) → Billing error - immediate switch  
403 (permission_error) → Permission error - return to user
404 (not_found_error) → OpenAI treats as retryable - wait 30s, switch
408 (timeout) → Timeout - wait 30s, switch
429 (rate_limit_error) → Rate limit - wait 30s, switch
500 (api_error) → Server error - wait 30s, switch
529 → Overloaded - wait 30s, switch

Anthropic (Claude)

400 (invalid_request_error) → User error - return to user
401 (authentication_error) → Auth error - immediate switch
403 (permission_error) → Permission error - return to user
404 (not_found_error) → Not found - wait 30s, switch
413 (request_too_large) → User error - return to user
429 (rate_limit_error) → Rate limit - wait 30s, switch
500 (api_error) → Server error - wait 30s, switch
529 (overloaded_error) → Overloaded - wait 30s, switch

OpenRouter

400 (bad_request) → User error - return to user
401 (invalid_credentials) → Auth error - immediate switch
402 (insufficient_credits) → Billing error - immediate switch
403 (moderation_flagged) → Permission - return to user
408 (timeout) → Timeout - wait 30s, switch
429 (rate_limited) → Rate limit - wait 30s, switch
502 (model_down) → Model down - wait 30s, switch
503 (no_providers) → No providers - immediate switch

Chutes AI

MODEL_LOADING_FAILED → Transient - wait 30s, switch
INFERENCE_TIMEOUT → Timeout - wait 30s, switch
OUT_OF_MEMORY → Transient - wait 30s, switch
INVALID_INPUT → User error - return to user
MODEL_OVERLOADED → Overloaded - wait 30s, switch
GENERATION_FAILED → Transient - wait 30s, switch
CONTEXT_LENGTH_EXCEEDED → User error - return to user
400 → Bad request - return to user
429 → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch
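Note that Chutes reports string error codes alongside HTTP statuses, while the Phase 1 classifier keys on numeric codes. A string-code lookup (hypothetical, mirroring the list above and returning the same classification shape as classifyProviderError) could run before the numeric checks:

```javascript
// Hypothetical lookup for Chutes' string error codes; entries mirror the
// mapping above. Returns null so the caller can fall through to the
// numeric status-code checks in classifyProviderError().
const CHUTES_CODE_MAP = {
  MODEL_LOADING_FAILED: { category: 'transient', action: 'wait', waitTime: 30000 },
  INFERENCE_TIMEOUT: { category: 'timeout', action: 'wait', waitTime: 30000 },
  OUT_OF_MEMORY: { category: 'transient', action: 'wait', waitTime: 30000 },
  INVALID_INPUT: { category: 'userError', action: 'return', waitTime: 0 },
  MODEL_OVERLOADED: { category: 'overloaded', action: 'wait', waitTime: 30000 },
  GENERATION_FAILED: { category: 'transient', action: 'wait', waitTime: 30000 },
  CONTEXT_LENGTH_EXCEEDED: { category: 'userError', action: 'return', waitTime: 0 },
};

function classifyChutesError(error) {
  return CHUTES_CODE_MAP[error.code] || null; // null → use numeric checks
}
```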

NVIDIA NIM

401 → Auth error - immediate switch
403 → Permission - return to user
404 → Not found - wait 30s, switch
429 (too_many_requests) → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch

Together AI

400 (invalid_request) → User error - return to user
401 (authentication_error) → Auth error - immediate switch
402 (payment_required) → Billing error - immediate switch
403 (bad_request) → User error - return to user
429 (rate_limit_exceeded) → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch

Fireworks AI

400 → User error - return to user
401 → Auth error - immediate switch
429 → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch

Mistral

400 → Bad request - return to user
401 → Unauthorized - immediate switch
403 → Forbidden - return to user
404 → Not found - wait 30s, switch
429 → Too many requests - wait 30s, switch
500+ → Server error - wait 30s, switch

Groq

400 → Bad request - return to user
401 → Unauthorized - immediate switch
402 → Payment required - immediate switch
403 → Forbidden - return to user
404 → Not found - wait 30s, switch
413 → Payload too large - return to user
429 → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch

Google (Gemini)

400 → Invalid request - return to user
401 → Unauthorized - immediate switch
403 → Permission denied - return to user
404 → Not found - wait 30s, switch
413 → Request too large - return to user
429 (resource_exhausted) → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch

Implementation Plan

Phase 1: Error Classification System

File: chat/server.js

Create new function classifyProviderError():

function classifyProviderError(error, provider) {
  // Extract HTTP status code; error.code may be a non-numeric Node error
  // code, in which case the numeric checks fall through to message matching
  const statusCode = Number(error.statusCode ?? error.status ?? error.code);
  const errorMessage = (error.message || '').toLowerCase();
  
  const providerPatterns = {
    openai: {
      transient: [500, 502, 503, 504, 529],
      rateLimit: 429,
      auth: [401, 402],
      permission: 403,
      userError: [400],
      notFound: 404,
      timeout: 408
    },
    anthropic: {
      transient: [500, 529],
      rateLimit: 429,
      auth: 401,
      permission: 403,
      userError: [400, 413],
      notFound: 404
    },
    openrouter: {
      // Note: 503 (no_providers) arguably warrants an immediate switch,
      // but is grouped with the transient errors here for simplicity
      transient: [502, 503],
      rateLimit: 429,
      auth: [401, 402],
      permission: 403,
      userError: [400],
      timeout: 408,
      notFound: 404
    },
    chutes: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: 401,
      permission: 403,
      userError: [400, 413],
      notFound: 404
    },
    nvidia: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: 401,
      permission: 403,
      userError: [400],
      notFound: 404
    },
    together: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: [401, 402],
      permission: 403,
      userError: [400],
      notFound: 404
    },
    fireworks: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: 401,
      userError: [400],
      notFound: 404
    },
    mistral: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: 401,
      permission: 403,
      userError: [400],
      notFound: 404
    },
    groq: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: [401, 402],
      permission: 403,
      userError: [400, 413],
      notFound: 404
    },
    google: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: 401,
      permission: 403,
      userError: [400, 413],
      notFound: 404
    },
    default: {
      transient: [500, 502, 503, 529],
      rateLimit: 429,
      auth: [401, 402],
      permission: 403,
      userError: [400, 413],
      notFound: 404
    }
  };
  
  const patterns = providerPatterns[provider] || providerPatterns.default;
  
  // Some pattern entries are a single status code, others an array
  const matches = (pattern, code) =>
    pattern != null && (Array.isArray(pattern) ? pattern.includes(code) : pattern === code);
  
  // Check for tool errors first (shouldn't happen here but just in case)
  if (error.isToolError) {
    return { category: 'toolError', action: 'return', waitTime: 0 };
  }
  
  // Determine category based on status code
  if (matches(patterns.transient, statusCode)) {
    return { category: 'transient', action: 'wait', waitTime: 30000 };
  }
  if (matches(patterns.rateLimit, statusCode)) {
    return { category: 'rateLimit', action: 'wait', waitTime: 30000 };
  }
  if (matches(patterns.auth, statusCode)) {
    return { category: 'auth', action: 'switch', waitTime: 0 };
  }
  if (matches(patterns.permission, statusCode)) {
    return { category: 'permission', action: 'return', waitTime: 0 };
  }
  if (matches(patterns.userError, statusCode)) {
    return { category: 'userError', action: 'return', waitTime: 0 };
  }
  if (matches(patterns.timeout, statusCode)) {
    return { category: 'timeout', action: 'wait', waitTime: 30000 };
  }
  if (matches(patterns.notFound, statusCode)) {
    // Special case: OpenAI treats 404 as retryable
    return { category: 'notFound', action: 'wait', waitTime: 30000 };
  }
  
  // Default to transient for 5xx
  if (statusCode >= 500) {
    return { category: 'serverError', action: 'wait', waitTime: 30000 };
  }
  
  // Check error message for additional patterns
  if (errorMessage.includes('model not found') || errorMessage.includes('unknown model')) {
    return { category: 'modelNotFound', action: 'wait', waitTime: 30000 };
  }
  if (errorMessage.includes('insufficient credit') || errorMessage.includes('insufficient quota')) {
    return { category: 'billing', action: 'switch', waitTime: 0 };
  }
  if (errorMessage.includes('context length exceeded') || errorMessage.includes('token limit exceeded')) {
    return { category: 'userError', action: 'return', waitTime: 0 };
  }
  
  // Unknown error - switch immediately
  return { category: 'unknown', action: 'switch', waitTime: 0 };
}
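As a quick sanity check, the default-provider branch of the table above can be exercised standalone. This is a trimmed re-statement of the logic for testing purposes, not the full function:

```javascript
// Trimmed, self-contained restatement of the default-provider mapping
// above, useful for unit-testing the status-code classification.
function classifyDefault(statusCode) {
  if ([500, 502, 503, 529].includes(statusCode)) return { category: 'transient', action: 'wait', waitTime: 30000 };
  if (statusCode === 429) return { category: 'rateLimit', action: 'wait', waitTime: 30000 };
  if ([401, 402].includes(statusCode)) return { category: 'auth', action: 'switch', waitTime: 0 };
  if (statusCode === 403) return { category: 'permission', action: 'return', waitTime: 0 };
  if ([400, 413].includes(statusCode)) return { category: 'userError', action: 'return', waitTime: 0 };
  if (statusCode === 404) return { category: 'notFound', action: 'wait', waitTime: 30000 };
  if (statusCode >= 500) return { category: 'serverError', action: 'wait', waitTime: 30000 };
  return { category: 'unknown', action: 'switch', waitTime: 0 };
}
```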

Phase 2: Tool Error Handling

File: opencode/packages/opencode/src/session/processor.ts

Enhance tool-error handling to distinguish between tool error types:

// Add tool error type classification
enum ToolErrorType {
  validation = 'validation',
  permission = 'permission',
  timeout = 'timeout',
  notFound = 'notFound',
  execution = 'execution'
}

function classifyToolError(error: unknown): ToolErrorType {
  const message = String(error).toLowerCase();
  
  if (message.includes('validation') || message.includes('schema') || message.includes('invalid arguments')) {
    return ToolErrorType.validation;
  }
  if (message.includes('permission') || message.includes('forbidden') || message.includes('denied')) {
    return ToolErrorType.permission;
  }
  if (message.includes('timeout') || message.includes('timed out')) {
    return ToolErrorType.timeout;
  }
  if (message.includes('not found') || message.includes('does not exist')) {
    return ToolErrorType.notFound;
  }
  return ToolErrorType.execution;
}

// In the switch case for "tool-error"
case "tool-error": {
  const match = toolcalls[value.toolCallId];
  if (match && match.state.status === "running") {
    await Session.updatePart({
      ...match,
      state: {
        status: "error",
        input: value.input ?? match.state.input,
        error: (value.error as any).toString(),
        errorType: classifyToolError(value.error),
        time: {
          start: match.state.time.start,
          end: Date.now(),
        },
      },
    })

    // Don't trigger fallback for tool errors - let model retry
    // Only trigger fallback for permission rejections
    if (
      value.error instanceof PermissionNext.RejectedError ||
      value.error instanceof Question.RejectedError
    ) {
      blocked = shouldBreak
    }
    
    // Mark that this was a tool error (not provider error)
    (value.error as any).isToolError = true;
    
    delete toolcalls[value.toolCallId]
  }
  break;
}
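Since classifyToolError() is plain string matching, it can be exercised in isolation. A JavaScript restatement of the TypeScript above:

```javascript
// JavaScript restatement of the TypeScript classifyToolError() above,
// handy for testing the string matching without the surrounding session code.
function classifyToolError(error) {
  const message = String(error).toLowerCase();
  if (message.includes('validation') || message.includes('schema') || message.includes('invalid arguments')) {
    return 'validation';
  }
  if (message.includes('permission') || message.includes('forbidden') || message.includes('denied')) {
    return 'permission';
  }
  if (message.includes('timeout') || message.includes('timed out')) {
    return 'timeout';
  }
  if (message.includes('not found') || message.includes('does not exist')) {
    return 'notFound';
  }
  return 'execution';
}
```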

File: chat/server.js

Modify shouldFallbackCliError() to check for tool errors:

function shouldFallbackCliError(err, message) {
  if (!err) return false;
  
  // Don't fallback on tool errors - let model retry
  if (err.isToolError) {
    log('Tool error detected - no fallback needed', {
      error: err.message,
      toolError: true
    });
    return false;
  }
  
  // ... rest of existing checks
}
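The short-circuit can be verified with a minimal stand-in; the real shouldFallbackCliError() has many more pattern checks, elided here:

```javascript
// Minimal stand-in demonstrating the tool-error short-circuit; the real
// shouldFallbackCliError() in chat/server.js has many additional checks,
// represented here by an assumed default of "fallback".
function shouldFallbackCliError(err) {
  if (!err) return false;
  // Don't fallback on tool errors - let the model retry
  if (err.isToolError) return false;
  // ...existing pattern checks would follow
  return true;
}
```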

Phase 3: Continue Message System

Enhance sendToOpencodeWithFallback() with continue message tracking and provider error handling:

async function sendToOpencodeWithFallback({ session, model, content, message, cli, streamCallback, opencodeSessionId, plan }) {
  const cliName = normalizeCli(cli || session?.cli);
  const preferredModel = model || session?.model;
  const chain = buildOpencodeAttemptChain(cliName, preferredModel);
  const tried = new Set();
  const attempts = [];
  let lastError = null;
  let switchedToBackup = false;
  
  // Track continue attempts per model
  const continueAttempts = new Map();
  const MAX_CONTINUE_ATTEMPTS = 3;
  const CONTINUE_MESSAGE = '[CONTINUE] Please continue from where you left off.';
  
  // Track last error type to prevent infinite loops
  const lastErrorTypes = new Map();
  
  log('Fallback sequence initiated', {
    sessionId: session?.id,
    messageId: message?.id,
    primaryModel: preferredModel,
    cliName,
    chainLength: chain.length,
    timestamp: new Date().toISOString()
  });

  const tryOption = async (option, isBackup = false) => {
    const key = `${option.provider}:${option.model}`;
    if (tried.has(key)) return null;
    tried.add(key);
    
    const limit = isProviderLimited(option.provider, option.model);
    if (limit.limited) {
      attempts.push({ 
        model: option.model, 
        provider: option.provider, 
        error: `limit: ${limit.reason}`,
        classification: 'rateLimit'
      });
      return null;
    }
    
    try {
      resetMessageStreamingFields(message);
      
      // Handle continue messages
      let messageContent = content;
      const modelKey = `${option.provider}:${option.model}`;
      const continueCount = continueAttempts.get(modelKey) || 0;
      
      if (continueCount > 0 && continueCount <= MAX_CONTINUE_ATTEMPTS) {
        messageContent = `${CONTINUE_MESSAGE}\n\n${content}`;
        log('Sending continue message', {
          model: option.model,
          provider: option.provider,
          attempt: continueCount,
          modelKey
        });
      }
      
      const result = await sendToOpencode({ 
        session, 
        model: option.model, 
        content: messageContent, 
        message, 
        cli: cliName, 
        streamCallback, 
        opencodeSessionId 
      });
      
      const normalizedResult = (result && typeof result === 'object') ? result : { reply: result };
      
      // Token usage tracking (existing code)
      let tokensUsed = 0;
      let tokenSource = 'none';
      let tokenExtractionLog = [];
      
      if (result && typeof result === 'object' && result.tokensUsed > 0) {
        tokensUsed = result.tokensUsed;
        tokenSource = result.tokenSource || 'result';
        tokenExtractionLog = result.tokenExtractionLog || [];
      } else {
        tokensUsed = extractTokenUsageFromResult(normalizedResult, [messageContent], { allowEstimate: false });
        if (tokensUsed > 0) {
          tokenSource = 'response-extracted';
          tokenExtractionLog.push({ method: 'extractTokenUsageFromResult', success: true, value: tokensUsed });
        }
      }
      
      // Success: reset counters
      continueAttempts.delete(modelKey);
      lastErrorTypes.delete(modelKey);
      
      recordProviderUsage(option.provider, option.model, tokensUsed, 1);
      
      if (attempts.length) {
        log('opencode succeeded after fallback', { attempts, model: option.model, provider: option.provider });
      }
      
      return { 
        reply: normalizedResult.reply, 
        model: option.model, 
        attempts, 
        provider: option.provider, 
        raw: normalizedResult.raw, 
        tokensUsed,
        tokenSource,
        tokenExtractionLog
      };
      
    } catch (err) {
      lastError = err;
      
      const errorData = {
        model: option.model,
        provider: option.provider,
        error: err.message || String(err),
        code: err.code || null,
        timestamp: new Date().toISOString()
      };
      
      // Check for early termination
      if (err.earlyTermination) {
        const partialOutputLength = (message?.partialOutput || '').length;
        const hasSubstantialOutput = partialOutputLength > 500;
        
        if (hasSubstantialOutput) {
          log('Blocking fallback - model has substantial output despite early termination', {
            model: option.model,
            provider: option.provider,
            error: err.message,
            partialOutputLength
          });
          return err;
        }
        
        // Increment continue counter
        const modelKey = `${option.provider}:${option.model}`;
        const currentCount = continueAttempts.get(modelKey) || 0;
        continueAttempts.set(modelKey, currentCount + 1);
        
        log('Early termination detected', {
          model: option.model,
          provider: option.provider,
          continueAttempt: currentCount + 1,
          maxAttempts: MAX_CONTINUE_ATTEMPTS
        });
        
        // Retry with same model if under limit
        if (currentCount + 1 < MAX_CONTINUE_ATTEMPTS) {
          errorData.earlyTermination = true;
          errorData.continueAttempt = currentCount + 1;
          errorData.willContinue = true;
          attempts.push(errorData);
          
          // Remove from tried set and immediately retry the same option;
          // the incremented counter makes the retry prepend CONTINUE_MESSAGE
          tried.delete(key);
          
          return tryOption(option, isBackup);
        }
        
        // Switch to next model after MAX_CONTINUE_ATTEMPTS
        log('Max continue attempts reached, switching model', {
          model: option.model,
          provider: option.provider,
          totalAttempts: MAX_CONTINUE_ATTEMPTS
        });
        
        attempts.push(errorData);
        return null;
      }
      
      // Classify provider error
      const classification = classifyProviderError(err, option.provider);
      errorData.classification = classification.category;
      
      // Track error types to prevent infinite loops
      const modelKey = `${option.provider}:${option.model}`;
      const lastErrorType = lastErrorTypes.get(modelKey);
      
      if (lastErrorType === classification.category && 
          classification.category !== 'unknown') {
        // Same error type twice in a row - might be persistent error
        log('Repeated error type detected, may need immediate switch', {
          model: option.model,
          provider: option.provider,
          errorType: classification.category
        });
      }
      // Record the latest error type so repeats can be detected next time
      lastErrorTypes.set(modelKey, classification.category);
      
      if (classification.action === 'return') {
        // User/permission errors - return to user
        log('User/permission error - returning to user', {
          category: classification.category,
          model: option.model,
          provider: option.provider
        });
        err.willNotFallback = true;
        return err;
      }
      
      if (classification.action === 'wait') {
        // Transient/rate limit errors - wait before switch
        log(`Provider error (${classification.category}) - waiting ${classification.waitTime}ms`, {
          model: option.model,
          provider: option.provider,
          category: classification.category,
          waitTime: classification.waitTime
        });
        
        errorData.willWait = true;
        errorData.waitTime = classification.waitTime;
        attempts.push(errorData);
        
        // Wait before allowing next attempt
        await new Promise(resolve => setTimeout(resolve, classification.waitTime));
        
        return null;
      }
      
      // Switch immediately for auth/unknown errors
      errorData.immediateSwitch = true;
      attempts.push(errorData);
      
      return null;
    }
  };
  
  // Try each option in chain
  for (const option of chain) {
    const result = await tryOption(option);
    if (result instanceof Error) break;
    if (result) return result;
  }
  
  // Try backup model if configured
  const backupModel = (providerLimits.opencodeBackupModel || '').trim();
  if (backupModel) {
    const backupChain = buildOpencodeAttemptChain(cliName, backupModel);
    for (const option of backupChain) {
      const result = await tryOption(option, true);
      if (result instanceof Error) break;
      if (result) return result;
    }
  }
  
  const err = new Error(`All ${cliName.toUpperCase()} models failed`);
  err.attempts = attempts;
  err.cause = lastError;
  throw err;
}
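The continue-attempt bookkeeping can be unit-tested in isolation. A hypothetical extraction of that logic from the sketch above (with the same "< MAX" semantics: two continue retries, then switch on the third termination):

```javascript
// Hypothetical extraction of the continue-attempt bookkeeping from
// sendToOpencodeWithFallback(), so the attempt limit is testable on its own.
const MAX_CONTINUE_ATTEMPTS = 3;

function recordEarlyTermination(continueAttempts, modelKey) {
  const count = (continueAttempts.get(modelKey) || 0) + 1;
  continueAttempts.set(modelKey, count);
  // true → retry the same model with a continue message; false → switch model
  return count < MAX_CONTINUE_ATTEMPTS;
}
```

On a successful response the caller deletes the model's entry from the map, which resets the counter as the plan requires.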

Summary of Files Modified

  1. MODEL_FALLBACK_IMPROVEMENT_PLAN.md (NEW) - This comprehensive plan document

  2. chat/server.js - Major modifications:

    • Add classifyProviderError() function (~120 lines)
    • Modify sendToOpencodeWithFallback() with continue message logic
    • Update shouldFallbackCliError() to handle tool errors
  3. opencode/packages/opencode/src/session/processor.ts - Minor modifications:

    • Add classifyToolError() function
    • Enhance tool-error case with error type classification

Testing Checklist

  • Tool errors don't trigger fallback
  • Early termination sends continue messages (max 3 attempts)
  • Provider errors with >=500 status wait 30s before switch
  • Rate limit errors (429) wait 30s before switch
  • Auth errors (401, 402) switch immediately
  • Permission errors (403) return to user without switch
  • User errors (400, 413) return to user without switch
  • Continue attempts reset on successful response
  • Fallback chain respects continue attempts per model
  • Logging captures all error classifications and actions
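One of these scenarios can be sketched as an automated test: a stubbed provider rejects twice with 429, then the third model succeeds. Names here are hypothetical and wait times are shortened so the test runs fast; the real server.js API differs:

```javascript
// Simplified fallback loop for testing: try each model in the chain,
// classify failures, and honor the classifier's wait before switching.
async function runWithFallback(send, chain, classify) {
  const attempts = [];
  for (const option of chain) {
    try {
      return { reply: await send(option), attempts };
    } catch (err) {
      const c = classify(err, option.provider);
      attempts.push({ model: option.model, category: c.category });
      if (c.action === 'wait') {
        await new Promise((resolve) => setTimeout(resolve, c.waitTime));
      }
    }
  }
  throw new Error('all models failed');
}
```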

Monitoring & Analytics

Add tracking for:

  1. Error type distribution
  2. Continue message frequency
  3. Provider error wait times
  4. Model switch patterns
  5. Tool error vs provider error ratios

Export to monitoring system for analysis and optimization.
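A minimal in-memory tracker for these metrics might look like the following; the class name and event labels are hypothetical, and a real deployment would export the snapshot to the monitoring system:

```javascript
// Minimal in-memory counter for fallback metrics (hypothetical sketch).
// Events might include 'error:rateLimit', 'continueSent', 'modelSwitch'.
class FallbackMetrics {
  constructor() {
    this.counts = new Map();
  }
  record(event) {
    this.counts.set(event, (this.counts.get(event) || 0) + 1);
  }
  snapshot() {
    // Plain object suitable for export to the monitoring system
    return Object.fromEntries(this.counts);
  }
}
```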