shopify-ai-backup/MODEL_FALLBACK_IMPROVEMENT_PLAN.md
southseact-3d 2dc94310a6 Add comprehensive model fallback improvement plan
- Detailed error classification system for tool errors, early termination, and provider errors
- Provider-specific error mappings for 10 LLM providers (OpenAI, Anthropic, OpenRouter, Chutes, NVIDIA, Together, Fireworks, Mistral, Groq, Google)
- Continue message system with 3-attempt limit before model switch
- 30-second wait for transient/rate limit errors before switching
- Distinguishes tool errors (return to user) from provider errors (switch model)
- Implementation plan with code examples for server.js and processor.ts
2026-02-08 14:16:32 +00:00

Model Fallback & Continue Functionality Improvement Plan

Executive Summary

This plan outlines improvements to the model fallback system to handle three distinct error categories:

  1. Bad tool calls - Send error data back to user/model for retry
  2. Request stops - Send continue message, retry 3x, then switch model
  3. Provider errors - Wait 30 seconds, then switch to next model in fallback chain

Current Implementation Analysis

Existing Components

| File | Function | Current Behavior |
|------|----------|------------------|
| chat/server.js:9347 | shouldFallbackCliError() | Decides if error warrants fallback based on patterns |
| chat/server.js:9502 | sendToOpencodeWithFallback() | Main fallback orchestration with model chain |
| chat/server.js:9296 | buildOpencodeAttemptChain() | Builds ordered provider/model chain |
| opencode/session/retry.ts | SessionRetry.delay() | Calculates retry delays with exponential backoff |
| opencode/session/message-v2.ts:714 | isOpenAiErrorRetryable() | Determines if OpenAI errors are retryable |

Current Gaps

  1. No explicit continue mechanism - Early terminations counted but no "continue" message system
  2. Tool errors not distinguished - Tool errors treated same as provider errors
  3. No 30-second wait for provider errors - Immediate fallback on provider issues
  4. Missing provider-specific error mappings - Generic patterns only

Proposed Architecture

Error Classification Flow

Error Occurs
    │
    ├── Tool Call Error?
    │   ├── Yes → Check tool error type
    │   │   ├── Validation/Schema error → Send error back, continue with same model
    │   │   ├── Permission denied → Send error back, continue with same model
    │   │   ├── Execution failure → Send error back, continue with same model
    │   │   └── Tool timeout → Send error back, continue with same model
    │
    ├── Early Termination?
    │   ├── Yes → Increment termination counter
    │   │   ├── Count < 3? → Send "continue" message, retry same model
    │   │   └── Count >= 3? → Switch to next model
    │
    └── Provider Error?
        ├── Yes → Classify error type
        │   ├── Transient (500, 502, 503, timeout) → Wait 30s, switch model
        │   ├── Rate limit (429) → Wait 30s, switch model
        │   ├── Auth/Billing (401, 402, 403) → Switch model immediately
        │   ├── User error (400, 413) → Send error back, don't switch
        │   └── Other (404, etc.) → Wait 30s, switch model
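As a sketch, the top-level dispatch for this flow might look like the following. The isToolError and earlyTermination flags and the injected classifier are assumptions defined later in this plan; the function names here are hypothetical:

```javascript
// Hypothetical top-level dispatcher for the flow above. Assumes upstream
// code tags errors with isToolError / earlyTermination, and that a
// classifier (Phase 1's classifyProviderError) is passed in.
const MAX_CONTINUE_ATTEMPTS = 3;

function decideNextStep(error, provider, terminationCount, classifyProviderError) {
  if (error.isToolError) {
    // Every tool error type: send the error back, keep the same model
    return { action: 'return', reason: 'toolError' };
  }
  if (error.earlyTermination) {
    // terminationCount is the already-incremented counter from the diagram
    if (terminationCount < MAX_CONTINUE_ATTEMPTS) {
      return { action: 'continue', reason: 'earlyTermination' };
    }
    return { action: 'switch', reason: 'maxContinueAttempts' };
  }
  // Provider errors delegate to the classifier
  const { action, waitTime, category } = classifyProviderError(error, provider);
  return { action, waitTime, reason: category };
}
```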

Provider-Specific Error Mappings

Error Categories & Actions

| Category | Action | Wait Time | Switch Model? |
|----------|--------|-----------|---------------|
| Tool Error (validation/schema) | Return to user | 0s | No |
| Tool Error (execution/permission) | Return to user | 0s | No |
| Early Termination | Send continue | 0s | After 3 attempts |
| Transient Server Error (5xx) | Wait | 30s | Yes |
| Rate Limit (429) | Wait | 30s | Yes |
| Auth/Billing (401, 402) | Switch immediately | 0s | Yes |
| Permission (403) | Return to user | 0s | No |
| Not Found (404) | Wait | 30s | Yes |
| User Error (400, 413) | Return to user | 0s | No |
| Timeout (408) | Wait | 30s | Yes |
| Overloaded (529) | Wait | 30s | Yes |

Detailed Provider Error Codes

OpenAI

400 (invalid_request_error) → User error - return to user
401 (authentication_error) → Auth error - immediate switch
402 (payment_required) → Billing error - immediate switch  
403 (permission_error) → Permission error - return to user
404 (not_found_error) → OpenAI treats as retryable - wait 30s, switch
408 (timeout) → Timeout - wait 30s, switch
429 (rate_limit_error) → Rate limit - wait 30s, switch
500 (api_error) → Server error - wait 30s, switch
529 → Overloaded - wait 30s, switch

Anthropic (Claude)

400 (invalid_request_error) → User error - return to user
401 (authentication_error) → Auth error - immediate switch
403 (permission_error) → Permission error - return to user
404 (not_found_error) → Not found - wait 30s, switch
413 (request_too_large) → User error - return to user
429 (rate_limit_error) → Rate limit - wait 30s, switch
500 (api_error) → Server error - wait 30s, switch
529 (overloaded_error) → Overloaded - wait 30s, switch

OpenRouter

400 (bad_request) → User error - return to user
401 (invalid_credentials) → Auth error - immediate switch
402 (insufficient_credits) → Billing error - immediate switch
403 (moderation_flagged) → Permission - return to user
408 (timeout) → Timeout - wait 30s, switch
429 (rate_limited) → Rate limit - wait 30s, switch
502 (model_down) → Model down - wait 30s, switch
503 (no_providers) → No providers - immediate switch

Chutes AI

MODEL_LOADING_FAILED → Transient - wait 30s, switch
INFERENCE_TIMEOUT → Timeout - wait 30s, switch
OUT_OF_MEMORY → Transient - wait 30s, switch
INVALID_INPUT → User error - return to user
MODEL_OVERLOADED → Overloaded - wait 30s, switch
GENERATION_FAILED → Transient - wait 30s, switch
CONTEXT_LENGTH_EXCEEDED → User error - return to user
400 → Bad request - return to user
429 → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch
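Note that Chutes reports string error codes alongside HTTP statuses, while the Phase 1 classifier keys on numeric codes. A string-code lookup (hypothetical, mirroring the list above and returning the same classification shape as classifyProviderError) could run before the numeric checks:

```javascript
// Hypothetical lookup for Chutes' string error codes; entries mirror the
// mapping above. Returns null so the caller can fall through to the
// numeric status-code checks in classifyProviderError().
const CHUTES_CODE_MAP = {
  MODEL_LOADING_FAILED: { category: 'transient', action: 'wait', waitTime: 30000 },
  INFERENCE_TIMEOUT: { category: 'timeout', action: 'wait', waitTime: 30000 },
  OUT_OF_MEMORY: { category: 'transient', action: 'wait', waitTime: 30000 },
  INVALID_INPUT: { category: 'userError', action: 'return', waitTime: 0 },
  MODEL_OVERLOADED: { category: 'overloaded', action: 'wait', waitTime: 30000 },
  GENERATION_FAILED: { category: 'transient', action: 'wait', waitTime: 30000 },
  CONTEXT_LENGTH_EXCEEDED: { category: 'userError', action: 'return', waitTime: 0 },
};

function classifyChutesError(error) {
  return CHUTES_CODE_MAP[error.code] || null; // null → use numeric checks
}
```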

NVIDIA NIM

401 → Auth error - immediate switch
403 → Permission - return to user
404 → Not found - wait 30s, switch
429 (too_many_requests) → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch

Together AI

400 (invalid_request) → User error - return to user
401 (authentication_error) → Auth error - immediate switch
402 (payment_required) → Billing error - immediate switch
403 (bad_request) → User error - return to user
429 (rate_limit_exceeded) → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch

Fireworks AI

400 → User error - return to user
401 → Auth error - immediate switch
429 → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch

Mistral

400 → Bad request - return to user
401 → Unauthorized - immediate switch
403 → Forbidden - return to user
404 → Not found - wait 30s, switch
429 → Too many requests - wait 30s, switch
500+ → Server error - wait 30s, switch

Groq

400 → Bad request - return to user
401 → Unauthorized - immediate switch
402 → Payment required - immediate switch
403 → Forbidden - return to user
404 → Not found - wait 30s, switch
413 → Payload too large - return to user
429 → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch

Google (Gemini)

400 → Invalid request - return to user
401 → Unauthorized - immediate switch
403 → Permission denied - return to user
404 → Not found - wait 30s, switch
413 → Request too large - return to user
429 (resource_exhausted) → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch

Implementation Plan

Phase 1: Error Classification System

File: chat/server.js

Create new function classifyProviderError():

function classifyProviderError(error, provider) {
  // Extract HTTP status code; error.code may be a non-numeric Node error
  // code, in which case the numeric checks fall through to message matching
  const statusCode = Number(error.statusCode ?? error.status ?? error.code);
  const errorMessage = (error.message || '').toLowerCase();
  
  const providerPatterns = {
    openai: {
      transient: [500, 502, 503, 504, 529],
      rateLimit: 429,
      auth: [401, 402],
      permission: 403,
      userError: [400],
      notFound: 404,
      timeout: 408
    },
    anthropic: {
      transient: [500, 529],
      rateLimit: 429,
      auth: 401,
      permission: 403,
      userError: [400, 413],
      notFound: 404
    },
    openrouter: {
      // Note: 503 (no_providers) arguably warrants an immediate switch,
      // but is grouped with the transient errors here for simplicity
      transient: [502, 503],
      rateLimit: 429,
      auth: [401, 402],
      permission: 403,
      userError: [400],
      timeout: 408,
      notFound: 404
    },
    chutes: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: 401,
      permission: 403,
      userError: [400, 413],
      notFound: 404
    },
    nvidia: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: 401,
      permission: 403,
      userError: [400],
      notFound: 404
    },
    together: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: [401, 402],
      permission: 403,
      userError: [400],
      notFound: 404
    },
    fireworks: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: 401,
      userError: [400],
      notFound: 404
    },
    mistral: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: 401,
      permission: 403,
      userError: [400],
      notFound: 404
    },
    groq: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: [401, 402],
      permission: 403,
      userError: [400, 413],
      notFound: 404
    },
    google: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: 401,
      permission: 403,
      userError: [400, 413],
      notFound: 404
    },
    default: {
      transient: [500, 502, 503, 529],
      rateLimit: 429,
      auth: [401, 402],
      permission: 403,
      userError: [400, 413],
      notFound: 404
    }
  };
  
  const patterns = providerPatterns[provider] || providerPatterns.default;
  
  // Some pattern entries are a single status code, others an array
  const matches = (pattern, code) =>
    pattern != null && (Array.isArray(pattern) ? pattern.includes(code) : pattern === code);
  
  // Check for tool errors first (shouldn't happen here but just in case)
  if (error.isToolError) {
    return { category: 'toolError', action: 'return', waitTime: 0 };
  }
  
  // Determine category based on status code
  if (matches(patterns.transient, statusCode)) {
    return { category: 'transient', action: 'wait', waitTime: 30000 };
  }
  if (matches(patterns.rateLimit, statusCode)) {
    return { category: 'rateLimit', action: 'wait', waitTime: 30000 };
  }
  if (matches(patterns.auth, statusCode)) {
    return { category: 'auth', action: 'switch', waitTime: 0 };
  }
  if (matches(patterns.permission, statusCode)) {
    return { category: 'permission', action: 'return', waitTime: 0 };
  }
  if (matches(patterns.userError, statusCode)) {
    return { category: 'userError', action: 'return', waitTime: 0 };
  }
  if (matches(patterns.timeout, statusCode)) {
    return { category: 'timeout', action: 'wait', waitTime: 30000 };
  }
  if (matches(patterns.notFound, statusCode)) {
    // Special case: OpenAI treats 404 as retryable
    return { category: 'notFound', action: 'wait', waitTime: 30000 };
  }
  
  // Default to transient for 5xx
  if (statusCode >= 500) {
    return { category: 'serverError', action: 'wait', waitTime: 30000 };
  }
  
  // Check error message for additional patterns
  if (errorMessage.includes('model not found') || errorMessage.includes('unknown model')) {
    return { category: 'modelNotFound', action: 'wait', waitTime: 30000 };
  }
  if (errorMessage.includes('insufficient credit') || errorMessage.includes('insufficient quota')) {
    return { category: 'billing', action: 'switch', waitTime: 0 };
  }
  if (errorMessage.includes('context length exceeded') || errorMessage.includes('token limit exceeded')) {
    return { category: 'userError', action: 'return', waitTime: 0 };
  }
  
  // Unknown error - switch immediately
  return { category: 'unknown', action: 'switch', waitTime: 0 };
}
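As a quick sanity check, the default-provider branch of the table above can be exercised standalone. This is a trimmed re-statement of the logic for testing purposes, not the full function:

```javascript
// Trimmed, self-contained restatement of the default-provider mapping
// above, useful for unit-testing the status-code classification.
function classifyDefault(statusCode) {
  if ([500, 502, 503, 529].includes(statusCode)) return { category: 'transient', action: 'wait', waitTime: 30000 };
  if (statusCode === 429) return { category: 'rateLimit', action: 'wait', waitTime: 30000 };
  if ([401, 402].includes(statusCode)) return { category: 'auth', action: 'switch', waitTime: 0 };
  if (statusCode === 403) return { category: 'permission', action: 'return', waitTime: 0 };
  if ([400, 413].includes(statusCode)) return { category: 'userError', action: 'return', waitTime: 0 };
  if (statusCode === 404) return { category: 'notFound', action: 'wait', waitTime: 30000 };
  if (statusCode >= 500) return { category: 'serverError', action: 'wait', waitTime: 30000 };
  return { category: 'unknown', action: 'switch', waitTime: 0 };
}
```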

Phase 2: Tool Error Handling

File: opencode/packages/opencode/src/session/processor.ts

Enhance tool-error handling to distinguish between tool error types:

// Add tool error type classification
enum ToolErrorType {
  validation = 'validation',
  permission = 'permission',
  timeout = 'timeout',
  notFound = 'notFound',
  execution = 'execution'
}

function classifyToolError(error: unknown): ToolErrorType {
  const message = String(error).toLowerCase();
  
  if (message.includes('validation') || message.includes('schema') || message.includes('invalid arguments')) {
    return ToolErrorType.validation;
  }
  if (message.includes('permission') || message.includes('forbidden') || message.includes('denied')) {
    return ToolErrorType.permission;
  }
  if (message.includes('timeout') || message.includes('timed out')) {
    return ToolErrorType.timeout;
  }
  if (message.includes('not found') || message.includes('does not exist')) {
    return ToolErrorType.notFound;
  }
  return ToolErrorType.execution;
}

// In the switch case for "tool-error"
case "tool-error": {
  const match = toolcalls[value.toolCallId];
  if (match && match.state.status === "running") {
    await Session.updatePart({
      ...match,
      state: {
        status: "error",
        input: value.input ?? match.state.input,
        error: (value.error as any).toString(),
        errorType: classifyToolError(value.error),
        time: {
          start: match.state.time.start,
          end: Date.now(),
        },
      },
    })

    // Don't trigger fallback for tool errors - let model retry
    // Only trigger fallback for permission rejections
    if (
      value.error instanceof PermissionNext.RejectedError ||
      value.error instanceof Question.RejectedError
    ) {
      blocked = shouldBreak
    }
    
    // Mark that this was a tool error (not provider error)
    (value.error as any).isToolError = true;
    
    delete toolcalls[value.toolCallId]
  }
  break;
}
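Since classifyToolError() is plain string matching, it can be exercised in isolation. A JavaScript restatement of the TypeScript above:

```javascript
// JavaScript restatement of the TypeScript classifyToolError() above,
// handy for testing the string matching without the surrounding session code.
function classifyToolError(error) {
  const message = String(error).toLowerCase();
  if (message.includes('validation') || message.includes('schema') || message.includes('invalid arguments')) {
    return 'validation';
  }
  if (message.includes('permission') || message.includes('forbidden') || message.includes('denied')) {
    return 'permission';
  }
  if (message.includes('timeout') || message.includes('timed out')) {
    return 'timeout';
  }
  if (message.includes('not found') || message.includes('does not exist')) {
    return 'notFound';
  }
  return 'execution';
}
```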

File: chat/server.js

Modify shouldFallbackCliError() to check for tool errors:

function shouldFallbackCliError(err, message) {
  if (!err) return false;
  
  // Don't fallback on tool errors - let model retry
  if (err.isToolError) {
    log('Tool error detected - no fallback needed', {
      error: err.message,
      toolError: true
    });
    return false;
  }
  
  // ... rest of existing checks
}
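The short-circuit can be verified with a minimal stand-in; the real shouldFallbackCliError() has many more pattern checks, elided here:

```javascript
// Minimal stand-in demonstrating the tool-error short-circuit; the real
// shouldFallbackCliError() in chat/server.js has many additional checks,
// represented here by an assumed default of "fallback".
function shouldFallbackCliError(err) {
  if (!err) return false;
  // Don't fallback on tool errors - let the model retry
  if (err.isToolError) return false;
  // ...existing pattern checks would follow
  return true;
}
```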

Phase 3: Continue Message System

Enhance sendToOpencodeWithFallback() with continue message tracking and provider error handling:

async function sendToOpencodeWithFallback({ session, model, content, message, cli, streamCallback, opencodeSessionId, plan }) {
  const cliName = normalizeCli(cli || session?.cli);
  const preferredModel = model || session?.model;
  const chain = buildOpencodeAttemptChain(cliName, preferredModel);
  const tried = new Set();
  const attempts = [];
  let lastError = null;
  let switchedToBackup = false;
  
  // Track continue attempts per model
  const continueAttempts = new Map();
  const MAX_CONTINUE_ATTEMPTS = 3;
  const CONTINUE_MESSAGE = '[CONTINUE] Please continue from where you left off.';
  
  // Track last error type to prevent infinite loops
  const lastErrorTypes = new Map();
  
  log('Fallback sequence initiated', {
    sessionId: session?.id,
    messageId: message?.id,
    primaryModel: preferredModel,
    cliName,
    chainLength: chain.length,
    timestamp: new Date().toISOString()
  });

  const tryOption = async (option, isBackup = false) => {
    const key = `${option.provider}:${option.model}`;
    if (tried.has(key)) return null;
    tried.add(key);
    
    const limit = isProviderLimited(option.provider, option.model);
    if (limit.limited) {
      attempts.push({ 
        model: option.model, 
        provider: option.provider, 
        error: `limit: ${limit.reason}`,
        classification: 'rateLimit'
      });
      return null;
    }
    
    try {
      resetMessageStreamingFields(message);
      
      // Handle continue messages
      let messageContent = content;
      const modelKey = `${option.provider}:${option.model}`;
      const continueCount = continueAttempts.get(modelKey) || 0;
      
      if (continueCount > 0 && continueCount <= MAX_CONTINUE_ATTEMPTS) {
        messageContent = `${CONTINUE_MESSAGE}\n\n${content}`;
        log('Sending continue message', {
          model: option.model,
          provider: option.provider,
          attempt: continueCount,
          modelKey
        });
      }
      
      const result = await sendToOpencode({ 
        session, 
        model: option.model, 
        content: messageContent, 
        message, 
        cli: cliName, 
        streamCallback, 
        opencodeSessionId 
      });
      
      const normalizedResult = (result && typeof result === 'object') ? result : { reply: result };
      
      // Token usage tracking (existing code)
      let tokensUsed = 0;
      let tokenSource = 'none';
      let tokenExtractionLog = [];
      
      if (result && typeof result === 'object' && result.tokensUsed > 0) {
        tokensUsed = result.tokensUsed;
        tokenSource = result.tokenSource || 'result';
        tokenExtractionLog = result.tokenExtractionLog || [];
      } else {
        tokensUsed = extractTokenUsageFromResult(normalizedResult, [messageContent], { allowEstimate: false });
        if (tokensUsed > 0) {
          tokenSource = 'response-extracted';
          tokenExtractionLog.push({ method: 'extractTokenUsageFromResult', success: true, value: tokensUsed });
        }
      }
      
      // Success: reset counters
      continueAttempts.delete(modelKey);
      lastErrorTypes.delete(modelKey);
      
      recordProviderUsage(option.provider, option.model, tokensUsed, 1);
      
      if (attempts.length) {
        log('opencode succeeded after fallback', { attempts, model: option.model, provider: option.provider });
      }
      
      return { 
        reply: normalizedResult.reply, 
        model: option.model, 
        attempts, 
        provider: option.provider, 
        raw: normalizedResult.raw, 
        tokensUsed,
        tokenSource,
        tokenExtractionLog
      };
      
    } catch (err) {
      lastError = err;
      
      const errorData = {
        model: option.model,
        provider: option.provider,
        error: err.message || String(err),
        code: err.code || null,
        timestamp: new Date().toISOString()
      };
      
      // Check for early termination
      if (err.earlyTermination) {
        const partialOutputLength = (message?.partialOutput || '').length;
        const hasSubstantialOutput = partialOutputLength > 500;
        
        if (hasSubstantialOutput) {
          log('Blocking fallback - model has substantial output despite early termination', {
            model: option.model,
            provider: option.provider,
            error: err.message,
            partialOutputLength
          });
          return err;
        }
        
        // Increment continue counter
        const modelKey = `${option.provider}:${option.model}`;
        const currentCount = continueAttempts.get(modelKey) || 0;
        continueAttempts.set(modelKey, currentCount + 1);
        
        log('Early termination detected', {
          model: option.model,
          provider: option.provider,
          continueAttempt: currentCount + 1,
          maxAttempts: MAX_CONTINUE_ATTEMPTS
        });
        
        // Retry with same model if under limit
        if (currentCount + 1 < MAX_CONTINUE_ATTEMPTS) {
          errorData.earlyTermination = true;
          errorData.continueAttempt = currentCount + 1;
          errorData.willContinue = true;
          attempts.push(errorData);
          
          // Remove from tried set and immediately retry the same option;
          // the incremented counter makes the retry prepend CONTINUE_MESSAGE
          tried.delete(key);
          
          return tryOption(option, isBackup);
        }
        
        // Switch to next model after MAX_CONTINUE_ATTEMPTS
        log('Max continue attempts reached, switching model', {
          model: option.model,
          provider: option.provider,
          totalAttempts: MAX_CONTINUE_ATTEMPTS
        });
        
        attempts.push(errorData);
        return null;
      }
      
      // Classify provider error
      const classification = classifyProviderError(err, option.provider);
      errorData.classification = classification.category;
      
      // Track error types to prevent infinite loops
      const modelKey = `${option.provider}:${option.model}`;
      const lastErrorType = lastErrorTypes.get(modelKey);
      
      if (lastErrorType === classification.category && 
          classification.category !== 'unknown') {
        // Same error type twice in a row - might be persistent error
        log('Repeated error type detected, may need immediate switch', {
          model: option.model,
          provider: option.provider,
          errorType: classification.category
        });
      }
      // Record the latest error type so repeats can be detected next time
      lastErrorTypes.set(modelKey, classification.category);
      
      if (classification.action === 'return') {
        // User/permission errors - return to user
        log('User/permission error - returning to user', {
          category: classification.category,
          model: option.model,
          provider: option.provider
        });
        err.willNotFallback = true;
        return err;
      }
      
      if (classification.action === 'wait') {
        // Transient/rate limit errors - wait before switch
        log(`Provider error (${classification.category}) - waiting ${classification.waitTime}ms`, {
          model: option.model,
          provider: option.provider,
          category: classification.category,
          waitTime: classification.waitTime
        });
        
        errorData.willWait = true;
        errorData.waitTime = classification.waitTime;
        attempts.push(errorData);
        
        // Wait before allowing next attempt
        await new Promise(resolve => setTimeout(resolve, classification.waitTime));
        
        return null;
      }
      
      // Switch immediately for auth/unknown errors
      errorData.immediateSwitch = true;
      attempts.push(errorData);
      
      return null;
    }
  };
  
  // Try each option in chain
  for (const option of chain) {
    const result = await tryOption(option);
    if (result instanceof Error) break;
    if (result) return result;
  }
  
  // Try backup model if configured
  const backupModel = (providerLimits.opencodeBackupModel || '').trim();
  if (backupModel) {
    const backupChain = buildOpencodeAttemptChain(cliName, backupModel);
    for (const option of backupChain) {
      const result = await tryOption(option, true);
      if (result instanceof Error) break;
      if (result) return result;
    }
  }
  
  const err = new Error(`All ${cliName.toUpperCase()} models failed`);
  err.attempts = attempts;
  err.cause = lastError;
  throw err;
}
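The continue-attempt bookkeeping can be unit-tested in isolation. A hypothetical extraction of that logic from the sketch above (with the same "< MAX" semantics: two continue retries, then switch on the third termination):

```javascript
// Hypothetical extraction of the continue-attempt bookkeeping from
// sendToOpencodeWithFallback(), so the attempt limit is testable on its own.
const MAX_CONTINUE_ATTEMPTS = 3;

function recordEarlyTermination(continueAttempts, modelKey) {
  const count = (continueAttempts.get(modelKey) || 0) + 1;
  continueAttempts.set(modelKey, count);
  // true → retry the same model with a continue message; false → switch model
  return count < MAX_CONTINUE_ATTEMPTS;
}
```

On a successful response the caller deletes the model's entry from the map, which resets the counter as the plan requires.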

Summary of Files Modified

  1. MODEL_FALLBACK_IMPROVEMENT_PLAN.md (NEW) - This comprehensive plan document

  2. chat/server.js - Major modifications:

    • Add classifyProviderError() function (~120 lines)
    • Modify sendToOpencodeWithFallback() with continue message logic
    • Update shouldFallbackCliError() to handle tool errors
  3. opencode/packages/opencode/src/session/processor.ts - Minor modifications:

    • Add classifyToolError() function
    • Enhance tool-error case with error type classification

Testing Checklist

  • Tool errors don't trigger fallback
  • Early termination sends continue messages (max 3 attempts)
  • Provider errors with >=500 status wait 30s before switch
  • Rate limit errors (429) wait 30s before switch
  • Auth errors (401, 402) switch immediately
  • Permission errors (403) return to user without switch
  • User errors (400, 413) return to user without switch
  • Continue attempts reset on successful response
  • Fallback chain respects continue attempts per model
  • Logging captures all error classifications and actions
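One of these scenarios can be sketched as an automated test: a stubbed provider rejects twice with 429, then the third model succeeds. Names here are hypothetical and wait times are shortened so the test runs fast; the real server.js API differs:

```javascript
// Simplified fallback loop for testing: try each model in the chain,
// classify failures, and honor the classifier's wait before switching.
async function runWithFallback(send, chain, classify) {
  const attempts = [];
  for (const option of chain) {
    try {
      return { reply: await send(option), attempts };
    } catch (err) {
      const c = classify(err, option.provider);
      attempts.push({ model: option.model, category: c.category });
      if (c.action === 'wait') {
        await new Promise((resolve) => setTimeout(resolve, c.waitTime));
      }
    }
  }
  throw new Error('all models failed');
}
```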

Monitoring & Analytics

Add tracking for:

  1. Error type distribution
  2. Continue message frequency
  3. Provider error wait times
  4. Model switch patterns
  5. Tool error vs provider error ratios

Export to monitoring system for analysis and optimization.
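A minimal in-memory tracker for these metrics might look like the following; the class name and event labels are hypothetical, and a real deployment would export the snapshot to the monitoring system:

```javascript
// Minimal in-memory counter for fallback metrics (hypothetical sketch).
// Events might include 'error:rateLimit', 'continueSent', 'modelSwitch'.
class FallbackMetrics {
  constructor() {
    this.counts = new Map();
  }
  record(event) {
    this.counts.set(event, (this.counts.get(event) || 0) + 1);
  }
  snapshot() {
    // Plain object suitable for export to the monitoring system
    return Object.fromEntries(this.counts);
  }
}
```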