# Model Fallback & Continue Functionality Improvement Plan

## Executive Summary

This plan outlines improvements to the model fallback system to handle three distinct error categories:

1. **Bad tool calls** - Send error data back to the user/model for retry
2. **Request stops** - Send a continue message, retry up to 3 times, then switch model
3. **Provider errors** - Wait 30 seconds, then switch to the next model in the fallback chain

---

## Current Implementation Analysis

### Existing Components

| File | Function | Current Behavior |
|------|----------|------------------|
| `chat/server.js:9347` | `shouldFallbackCliError()` | Decides if an error warrants fallback based on patterns |
| `chat/server.js:9502` | `sendToOpencodeWithFallback()` | Main fallback orchestration with model chain |
| `chat/server.js:9296` | `buildOpencodeAttemptChain()` | Builds ordered provider/model chain |
| `opencode/session/retry.ts` | `SessionRetry.delay()` | Calculates retry delays with exponential backoff |
| `opencode/session/message-v2.ts:714` | `isOpenAiErrorRetryable()` | Determines if OpenAI errors are retryable |

### Current Gaps

1. **No explicit continue mechanism** - Early terminations are counted, but there is no "continue" message system
2. **Tool errors not distinguished** - Tool errors are treated the same as provider errors
3. **No 30-second wait for provider errors** - Fallback happens immediately on provider issues
4. **Missing provider-specific error mappings** - Generic patterns only

---

## Proposed Architecture

### Error Classification Flow

```
Error Occurs
│
├── Tool Call Error?
│   └── Yes → Check tool error type
│       ├── Validation/Schema error → Send error back, continue with same model
│       ├── Permission denied → Send error back, continue with same model
│       ├── Execution failure → Send error back, continue with same model
│       └── Tool timeout → Send error back, continue with same model
│
├── Early Termination?
│   └── Yes → Increment termination counter
│       ├── Count < 3?  → Send "continue" message, retry same model
│       └── Count >= 3? → Switch to next model
│
└── Provider Error?
    └── Yes → Classify error type
        ├── Transient (500, 502, 503, timeout) → Wait 30s, switch model
        ├── Rate limit (429) → Wait 30s, switch model
        ├── Auth/Billing (401, 402) → Switch model immediately
        ├── Permission (403) → Send error back, don't switch
        ├── User error (400, 413) → Send error back, don't switch
        └── Other (404, etc.) → Wait 30s, switch model
```

---

## Provider-Specific Error Mappings

### Error Categories & Actions

| Category | Action | Wait Time | Switch Model? |
|----------|--------|-----------|---------------|
| **Tool Error (validation/schema)** | Return to user | 0s | No |
| **Tool Error (execution/permission)** | Return to user | 0s | No |
| **Early Termination** | Send continue | 0s | After 3 attempts |
| **Transient Server Error (5xx)** | Wait | 30s | Yes |
| **Rate Limit (429)** | Wait | 30s | Yes |
| **Auth/Billing (401, 402)** | Switch immediately | 0s | Yes |
| **Permission (403)** | Return to user | 0s | No |
| **Not Found (404)** | Wait | 30s | Yes |
| **User Error (400, 413)** | Return to user | 0s | No |
| **Timeout (408)** | Wait | 30s | Yes |
| **Overloaded (529)** | Wait | 30s | Yes |

### Detailed Provider Error Codes

#### OpenAI

```
400 (invalid_request_error) → User error - return to user
401 (authentication_error)  → Auth error - immediate switch
402 (payment_required)      → Billing error - immediate switch
403 (permission_error)      → Permission error - return to user
404 (not_found_error)       → OpenAI treats as retryable - wait 30s, switch
408 (timeout)               → Timeout - wait 30s, switch
429 (rate_limit_error)      → Rate limit - wait 30s, switch
500 (api_error)             → Server error - wait 30s, switch
529                         → Overloaded - wait 30s, switch
```

#### Anthropic (Claude)

```
400 (invalid_request_error) → User error - return to user
401 (authentication_error)  → Auth error - immediate switch
403 (permission_error)      → Permission error - return to user
404 (not_found_error)       → Not found - wait 30s, switch
413 (request_too_large)     → User error - return to user
429 (rate_limit_error)      → Rate limit - wait 30s, switch
500 (api_error)             → Server error - wait 30s, switch
529 (overloaded_error)      → Overloaded - wait 30s, switch
```

#### OpenRouter

```
400 (bad_request)           → User error - return to user
401 (invalid_credentials)   → Auth error - immediate switch
402 (insufficient_credits)  → Billing error - immediate switch
403 (moderation_flagged)    → Permission - return to user
408 (timeout)               → Timeout - wait 30s, switch
429 (rate_limited)          → Rate limit - wait 30s, switch
502 (model_down)            → Model down - wait 30s, switch
503 (no_providers)          → No providers - immediate switch
```

#### Chutes AI

```
MODEL_LOADING_FAILED        → Transient - wait 30s, switch
INFERENCE_TIMEOUT           → Timeout - wait 30s, switch
OUT_OF_MEMORY               → Transient - wait 30s, switch
INVALID_INPUT               → User error - return to user
MODEL_OVERLOADED            → Overloaded - wait 30s, switch
GENERATION_FAILED           → Transient - wait 30s, switch
CONTEXT_LENGTH_EXCEEDED     → User error - return to user
400                         → Bad request - return to user
429                         → Rate limit - wait 30s, switch
500+                        → Server error - wait 30s, switch
```

#### NVIDIA NIM

```
401                         → Auth error - immediate switch
403                         → Permission - return to user
404                         → Not found - wait 30s, switch
429 (too_many_requests)     → Rate limit - wait 30s, switch
500+                        → Server error - wait 30s, switch
```

#### Together AI

```
400 (invalid_request)       → User error - return to user
401 (authentication_error)  → Auth error - immediate switch
402 (payment_required)      → Billing error - immediate switch
403 (bad_request)           → User error - return to user
429 (rate_limit_exceeded)   → Rate limit - wait 30s, switch
500+                        → Server error - wait 30s, switch
```

#### Fireworks AI

```
400  → User error - return to user
401  → Auth error - immediate switch
429  → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch
```

#### Mistral

```
400  → Bad request - return to user
401  → Unauthorized - immediate switch
403  → Forbidden - return to user
404  → Not found - wait 30s, switch
429  → Too many requests - wait 30s, switch
500+ → Server error - wait 30s, switch
```

#### Groq

```
400  → Bad request - return to user
401  → Unauthorized - immediate switch
402  → Payment required - immediate switch
403  → Forbidden - return to user
404  → Not found - wait 30s, switch
413  → Payload too large - return to user
429  → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch
```

#### Google (Gemini)

```
400                         → Invalid request - return to user
401                         → Unauthorized - immediate switch
403                         → Permission denied - return to user
404                         → Not found - wait 30s, switch
413                         → Request too large - return to user
429 (resource_exhausted)    → Rate limit - wait 30s, switch
500+                        → Server error - wait 30s, switch
```

---

## Implementation Plan

### Phase 1: Error Classification System

**File: `chat/server.js`**

Create a new function `classifyProviderError()`:

```javascript
function classifyProviderError(error, provider) {
  // Extract the HTTP status code
  const statusCode = error.statusCode || error.code;
  const errorMessage = (error.message || '').toLowerCase();

  const providerPatterns = {
    openai: { transient: [500, 502, 503, 504, 529], rateLimit: 429, auth: [401, 402], permission: 403, userError: [400], notFound: 404, timeout: 408 },
    anthropic: { transient: [500, 529], rateLimit: 429, auth: 401, permission: 403, userError: [400, 413], notFound: 404 },
    openrouter: { transient: [502, 503], rateLimit: 429, auth: [401, 402], permission: 403, userError: [400], timeout: 408, notFound: 404 },
    chutes: { transient: [500, 502, 503], rateLimit: 429, auth: 401, permission: 403, userError: [400, 413], notFound: 404 },
    nvidia: { transient: [500, 502, 503], rateLimit: 429, auth: 401, permission: 403, userError: [400], notFound: 404 },
    together: { transient: [500, 502, 503], rateLimit: 429, auth: [401, 402], permission: 403, userError: [400], notFound: 404 },
    fireworks: { transient: [500, 502, 503], rateLimit: 429, auth: 401, userError: [400], notFound: 404 },
    mistral: { transient: [500, 502, 503], rateLimit: 429, auth: 401, permission: 403, userError: [400], notFound: 404 },
    groq: { transient: [500, 502, 503], rateLimit: 429, auth: [401, 402], permission: 403, userError: [400, 413], notFound: 404 },
    google: { transient: [500, 502, 503], rateLimit: 429, auth: 401, permission: 403, userError: [400, 413], notFound: 404 },
    default: { transient: [500, 502, 503, 529], rateLimit: 429, auth: [401, 402], permission: 403, userError: [400, 413], notFound: 404 }
  };

  const patterns = providerPatterns[provider] || providerPatterns.default;

  // `auth` is a single code for some providers and an array for others;
  // normalize to an array so `.includes()` is always safe.
  const toArray = (value) => (Array.isArray(value) ? value : value == null ? [] : [value]);

  // Check for tool errors first (they shouldn't reach this function, but just in case)
  if (error.isToolError) {
    return { category: 'toolError', action: 'return', waitTime: 0 };
  }

  // Determine the category based on status code
  if (patterns.transient?.includes(statusCode)) {
    return { category: 'transient', action: 'wait', waitTime: 30000 };
  }
  if (statusCode === patterns.rateLimit) {
    return { category: 'rateLimit', action: 'wait', waitTime: 30000 };
  }
  if (toArray(patterns.auth).includes(statusCode)) {
    return { category: 'auth', action: 'switch', waitTime: 0 };
  }
  // The `!= null` guards prevent a missing status code from matching a
  // provider that has no mapping for the scalar fields.
  if (patterns.permission != null && statusCode === patterns.permission) {
    return { category: 'permission', action: 'return', waitTime: 0 };
  }
  if (patterns.userError?.includes(statusCode)) {
    return { category: 'userError', action: 'return', waitTime: 0 };
  }
  if (patterns.timeout != null && statusCode === patterns.timeout) {
    return { category: 'timeout', action: 'wait', waitTime: 30000 };
  }
  if (patterns.notFound != null && statusCode === patterns.notFound) {
    // Special case: OpenAI treats 404 as retryable
    return { category: 'notFound', action: 'wait', waitTime: 30000 };
  }

  // Default to transient for unmapped 5xx codes
  if (statusCode >= 500) {
    return { category: 'serverError', action: 'wait', waitTime: 30000 };
  }

  // Check the error message for additional patterns
  if (errorMessage.includes('model not found') || errorMessage.includes('unknown model')) {
    return { category: 'modelNotFound', action: 'wait', waitTime: 30000 };
  }
  if (errorMessage.includes('insufficient credit') ||
      errorMessage.includes('insufficient quota')) {
    return { category: 'billing', action: 'switch', waitTime: 0 };
  }
  if (errorMessage.includes('context length exceeded') ||
      errorMessage.includes('token limit exceeded')) {
    return { category: 'userError', action: 'return', waitTime: 0 };
  }

  // Unknown error - switch immediately
  return { category: 'unknown', action: 'switch', waitTime: 0 };
}
```

### Phase 2: Tool Error Handling

**File: `opencode/packages/opencode/src/session/processor.ts`**

Enhance tool-error handling to distinguish between tool error types:

```typescript
// Add tool error type classification
enum ToolErrorType {
  validation = 'validation',
  permission = 'permission',
  timeout = 'timeout',
  notFound = 'notFound',
  execution = 'execution',
}

function classifyToolError(error: unknown): ToolErrorType {
  const message = String(error).toLowerCase();
  if (message.includes('validation') || message.includes('schema') || message.includes('invalid arguments')) {
    return ToolErrorType.validation;
  }
  if (message.includes('permission') || message.includes('forbidden') || message.includes('denied')) {
    return ToolErrorType.permission;
  }
  if (message.includes('timeout') || message.includes('timed out')) {
    return ToolErrorType.timeout;
  }
  if (message.includes('not found') || message.includes('does not exist')) {
    return ToolErrorType.notFound;
  }
  return ToolErrorType.execution;
}

// In the switch case for "tool-error"
case "tool-error": {
  const match = toolcalls[value.toolCallId];
  if (match && match.state.status === "running") {
    await Session.updatePart({
      ...match,
      state: {
        status: "error",
        input: value.input ??
          match.state.input,
        error: (value.error as any).toString(),
        errorType: classifyToolError(value.error),
        time: {
          start: match.state.time.start,
          end: Date.now(),
        },
      },
    })

    // Don't trigger fallback for tool errors - let the model retry.
    // Only trigger fallback for permission rejections.
    if (
      value.error instanceof PermissionNext.RejectedError ||
      value.error instanceof Question.RejectedError
    ) {
      blocked = shouldBreak
    }

    // Mark that this was a tool error (not a provider error)
    (value.error as any).isToolError = true;
    delete toolcalls[value.toolCallId]
  }
  break;
}
```

**File: `chat/server.js`**

Modify `shouldFallbackCliError()` to check for tool errors:

```javascript
function shouldFallbackCliError(err, message) {
  if (!err) return false;

  // Don't fallback on tool errors - let the model retry
  if (err.isToolError) {
    log('Tool error detected - no fallback needed', {
      error: err.message,
      toolError: true
    });
    return false;
  }

  // ... rest of existing checks
}
```

### Phase 3: Continue Message System

Enhance `sendToOpencodeWithFallback()` with continue-message tracking and provider error handling:

```javascript
async function sendToOpencodeWithFallback({ session, model, content, message, cli, streamCallback, opencodeSessionId, plan }) {
  const cliName = normalizeCli(cli || session?.cli);
  const preferredModel = model || session?.model;
  const chain = buildOpencodeAttemptChain(cliName, preferredModel);
  const tried = new Set();
  const attempts = [];
  let lastError = null;
  let switchedToBackup = false;

  // Track continue attempts per model
  const continueAttempts = new Map();
  const MAX_CONTINUE_ATTEMPTS = 3;
  const CONTINUE_MESSAGE = '[CONTINUE] Please continue from where you left off.';

  // Track the last error type per model to prevent infinite loops
  const lastErrorTypes = new Map();

  log('Fallback sequence initiated', {
    sessionId: session?.id,
    messageId: message?.id,
    primaryModel: preferredModel,
    cliName,
    chainLength: chain.length,
    timestamp: new Date().toISOString()
  });

  const tryOption = async (option, isBackup = false) => {
    const key = `${option.provider}:${option.model}`;
    if (tried.has(key)) return null;
    tried.add(key);

    const limit = isProviderLimited(option.provider, option.model);
    if (limit.limited) {
      attempts.push({
        model: option.model,
        provider: option.provider,
        error: `limit: ${limit.reason}`,
        classification: 'rateLimit'
      });
      return null;
    }

    try {
      resetMessageStreamingFields(message);

      // Handle continue messages
      let messageContent = content;
      const modelKey = `${option.provider}:${option.model}`;
      const continueCount = continueAttempts.get(modelKey) || 0;
      if (continueCount > 0 && continueCount <= MAX_CONTINUE_ATTEMPTS) {
        messageContent = `${CONTINUE_MESSAGE}\n\n${content}`;
        log('Sending continue message', {
          model: option.model,
          provider: option.provider,
          attempt: continueCount,
          modelKey
        });
      }

      const result = await sendToOpencode({
        session,
        model: option.model,
        content: messageContent,
        message,
        cli: cliName,
        streamCallback,
        opencodeSessionId
      });

      const normalizedResult = (result && typeof result === 'object')
        ?
 result : { reply: result };

      // Token usage tracking (existing code)
      let tokensUsed = 0;
      let tokenSource = 'none';
      let tokenExtractionLog = [];
      if (result && typeof result === 'object' && result.tokensUsed > 0) {
        tokensUsed = result.tokensUsed;
        tokenSource = result.tokenSource || 'result';
        tokenExtractionLog = result.tokenExtractionLog || [];
      } else {
        tokensUsed = extractTokenUsageFromResult(normalizedResult, [messageContent], { allowEstimate: false });
        if (tokensUsed > 0) {
          tokenSource = 'response-extracted';
          tokenExtractionLog.push({ method: 'extractTokenUsageFromResult', success: true, value: tokensUsed });
        }
      }

      // Success: reset counters
      continueAttempts.delete(modelKey);
      lastErrorTypes.delete(modelKey);

      recordProviderUsage(option.provider, option.model, tokensUsed, 1);

      if (attempts.length) {
        log('opencode succeeded after fallback', { attempts, model: option.model, provider: option.provider });
      }

      return {
        reply: normalizedResult.reply,
        model: option.model,
        attempts,
        provider: option.provider,
        raw: normalizedResult.raw,
        tokensUsed,
        tokenSource,
        tokenExtractionLog
      };
    } catch (err) {
      lastError = err;
      const errorData = {
        model: option.model,
        provider: option.provider,
        error: err.message || String(err),
        code: err.code || null,
        timestamp: new Date().toISOString()
      };

      // Check for early termination
      if (err.earlyTermination) {
        const partialOutputLength = (message?.partialOutput || '').length;
        const hasSubstantialOutput = partialOutputLength > 500;
        if (hasSubstantialOutput) {
          log('Blocking fallback - model has substantial output despite early termination', {
            model: option.model,
            provider: option.provider,
            error: err.message,
            partialOutputLength
          });
          return err;
        }

        // Increment the continue counter
        const modelKey = `${option.provider}:${option.model}`;
        const currentCount = continueAttempts.get(modelKey) || 0;
        continueAttempts.set(modelKey, currentCount + 1);
        log('Early termination detected', {
          model: option.model,
          provider: option.provider,
          continueAttempt: currentCount + 1,
          maxAttempts: MAX_CONTINUE_ATTEMPTS
        });

        // Retry with the same model if under the limit
        // (<= so that exactly MAX_CONTINUE_ATTEMPTS continue messages are sent)
        if (currentCount + 1 <= MAX_CONTINUE_ATTEMPTS) {
          errorData.earlyTermination = true;
          errorData.continueAttempt = currentCount + 1;
          errorData.willContinue = true;
          attempts.push(errorData);
          // Remove from the tried set to allow a retry with the same option
          tried.delete(key);
          return null;
        }

        // Switch to the next model after MAX_CONTINUE_ATTEMPTS
        log('Max continue attempts reached, switching model', {
          model: option.model,
          provider: option.provider,
          totalAttempts: MAX_CONTINUE_ATTEMPTS
        });
        attempts.push(errorData);
        return null;
      }

      // Classify the provider error
      const classification = classifyProviderError(err, option.provider);
      errorData.classification = classification.category;

      // Track error types to prevent infinite loops
      const modelKey = `${option.provider}:${option.model}`;
      const lastErrorType = lastErrorTypes.get(modelKey);
      if (lastErrorType === classification.category && classification.category !== 'unknown') {
        // Same error type twice in a row - likely a persistent error
        log('Repeated error type detected, may need immediate switch', {
          model: option.model,
          provider: option.provider,
          errorType: classification.category
        });
      }
      // Record unconditionally so the first occurrence is remembered
      lastErrorTypes.set(modelKey, classification.category);

      if (classification.action === 'return') {
        // User/permission errors - return to the user
        log('User/permission error - returning to user', {
          category: classification.category,
          model: option.model,
          provider: option.provider
        });
        err.willNotFallback = true;
        return err;
      }

      if (classification.action === 'wait') {
        // Transient/rate-limit errors - wait before switching
        log(`Provider error (${classification.category}) - waiting ${classification.waitTime}ms`, {
          model: option.model,
          provider: option.provider,
          category: classification.category,
          waitTime: classification.waitTime
        });
        errorData.willWait = true;
        errorData.waitTime = classification.waitTime;
        attempts.push(errorData);
        // Wait before allowing the next attempt
        await new Promise(resolve =>
          setTimeout(resolve, classification.waitTime));
        return null;
      }

      // Switch immediately for auth/unknown errors
      errorData.immediateSwitch = true;
      attempts.push(errorData);
      return null;
    }
  };

  // Try each option in the chain
  for (const option of chain) {
    let result = await tryOption(option);
    // tryOption releases the key (tried.delete) when an early termination
    // should be retried with a continue message; re-enter until it sticks.
    while (result === null && !tried.has(`${option.provider}:${option.model}`)) {
      result = await tryOption(option);
    }
    if (result instanceof Error) break;
    if (result) return result;
  }

  // Try the backup model if configured
  const backupModel = (providerLimits.opencodeBackupModel || '').trim();
  if (backupModel) {
    const backupChain = buildOpencodeAttemptChain(cliName, backupModel);
    for (const option of backupChain) {
      let result = await tryOption(option, true);
      while (result === null && !tried.has(`${option.provider}:${option.model}`)) {
        result = await tryOption(option, true);
      }
      if (result instanceof Error) break;
      if (result) return result;
    }
  }

  const err = new Error(`All ${cliName.toUpperCase()} models failed`);
  err.attempts = attempts;
  err.cause = lastError;
  throw err;
}
```

---

## Summary of Files Modified

1. **`MODEL_FALLBACK_IMPROVEMENT_PLAN.md`** (NEW) - This comprehensive plan document
2. **`chat/server.js`** - Major modifications:
   - Add `classifyProviderError()` function (~120 lines)
   - Modify `sendToOpencodeWithFallback()` with continue-message logic
   - Update `shouldFallbackCliError()` to handle tool errors
3. **`opencode/packages/opencode/src/session/processor.ts`** - Minor modifications:
   - Add `classifyToolError()` function
   - Enhance the tool-error case with error type classification

---

## Testing Checklist

- [ ] Tool errors don't trigger fallback
- [ ] Early termination sends continue messages (max 3 attempts)
- [ ] Provider errors with >= 500 status wait 30s before switching
- [ ] Rate-limit errors (429) wait 30s before switching
- [ ] Auth errors (401, 402) switch immediately
- [ ] Permission errors (403) return to the user without switching
- [ ] User errors (400, 413) return to the user without switching
- [ ] Continue attempts reset on a successful response
- [ ] Fallback chain respects continue attempts per model
- [ ] Logging captures all error classifications and actions

---

## Monitoring & Analytics

Add tracking for:

1. Error type distribution
2. Continue message frequency
3. Provider error wait times
4. Model switch patterns
5. Tool error vs provider error ratios

Export these metrics to the monitoring system for analysis and optimization.
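As a starting point for both the testing checklist and the monitoring work, the classification rules can be spot-checked in isolation before being wired into `chat/server.js`. The sketch below is a minimal, standalone reduction of the planned `classifyProviderError()` using only the shared `default` status-code table; the function name `classifyByStatus` and its shape are illustrative assumptions, not the production implementation:

```javascript
// Minimal, self-contained reduction of the Phase 1 classifier, limited to the
// shared "default" provider table so the action matrix can be spot-checked.
const DEFAULT_PATTERNS = {
  transient: [500, 502, 503, 529],
  rateLimit: 429,
  auth: [401, 402],
  permission: 403,
  userError: [400, 413],
  notFound: 404,
};

function classifyByStatus(statusCode, patterns = DEFAULT_PATTERNS) {
  // Order matters: specific mappings first, generic 5xx fallback last.
  if (patterns.transient.includes(statusCode)) return { category: 'transient', action: 'wait', waitTime: 30000 };
  if (statusCode === patterns.rateLimit) return { category: 'rateLimit', action: 'wait', waitTime: 30000 };
  if (patterns.auth.includes(statusCode)) return { category: 'auth', action: 'switch', waitTime: 0 };
  if (statusCode === patterns.permission) return { category: 'permission', action: 'return', waitTime: 0 };
  if (patterns.userError.includes(statusCode)) return { category: 'userError', action: 'return', waitTime: 0 };
  if (statusCode === patterns.notFound) return { category: 'notFound', action: 'wait', waitTime: 30000 };
  if (statusCode >= 500) return { category: 'serverError', action: 'wait', waitTime: 30000 };
  return { category: 'unknown', action: 'switch', waitTime: 0 };
}

// Spot-check the action matrix from the Error Categories & Actions table.
console.log(classifyByStatus(503).action); // wait (30s, then switch)
console.log(classifyByStatus(429).action); // wait
console.log(classifyByStatus(401).action); // switch (immediate)
console.log(classifyByStatus(403).action); // return (to user)
```

Running the same assertions against each provider-specific table would turn the classification items in the Testing Checklist into an automated regression suite.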