# Model Fallback & Continue Functionality Improvement Plan

- Detailed error classification system for tool errors, early termination, and provider errors
- Provider-specific error mappings for 10 LLM providers (OpenAI, Anthropic, OpenRouter, Chutes, NVIDIA, Together, Fireworks, Mistral, Groq, Google)
- Continue message system with a 3-attempt limit before model switch
- 30-second wait for transient/rate-limit errors before switching
- Distinguishes tool errors (returned to the user) from provider errors (switch model)
- Implementation plan with code examples for `server.js` and `processor.ts`
## Executive Summary
This plan outlines improvements to the model fallback system to handle three distinct error categories:
- Bad tool calls - Send error data back to user/model for retry
- Request stops - Send continue message, retry 3x, then switch model
- Provider errors - Wait 30 seconds, then switch to next model in fallback chain
## Current Implementation Analysis

### Existing Components
| File | Function | Current Behavior |
|---|---|---|
| `chat/server.js:9347` | `shouldFallbackCliError()` | Decides if an error warrants fallback based on patterns |
| `chat/server.js:9502` | `sendToOpencodeWithFallback()` | Main fallback orchestration with model chain |
| `chat/server.js:9296` | `buildOpencodeAttemptChain()` | Builds the ordered provider/model chain |
| `opencode/session/retry.ts` | `SessionRetry.delay()` | Calculates retry delays with exponential backoff |
| `opencode/session/message-v2.ts:714` | `isOpenAiErrorRetryable()` | Determines if OpenAI errors are retryable |
### Current Gaps
- No explicit continue mechanism - Early terminations counted but no "continue" message system
- Tool errors not distinguished - Tool errors treated same as provider errors
- No 30-second wait for provider errors - Immediate fallback on provider issues
- Missing provider-specific error mappings - Generic patterns only
## Proposed Architecture

### Error Classification Flow
```
Error Occurs
│
├── Tool Call Error?
│   └── Yes → Check tool error type
│       ├── Validation/Schema error → Send error back, continue with same model
│       ├── Permission denied → Send error back, continue with same model
│       ├── Execution failure → Send error back, continue with same model
│       └── Tool timeout → Send error back, continue with same model
│
├── Early Termination?
│   └── Yes → Increment termination counter
│       ├── Count < 3? → Send "continue" message, retry same model
│       └── Count >= 3? → Switch to next model
│
└── Provider Error?
    └── Yes → Classify error type
        ├── Transient (500, 502, 503, timeout) → Wait 30s, switch model
        ├── Rate limit (429) → Wait 30s, switch model
        ├── Auth/Billing (401, 402) → Switch model immediately
        ├── Permission (403) → Send error back, don't switch
        ├── User error (400, 413) → Send error back, don't switch
        └── Other (404, etc.) → Wait 30s, switch model
```
## Provider-Specific Error Mappings

### Error Categories & Actions
| Category | Action | Wait Time | Switch Model? |
|---|---|---|---|
| Tool Error (validation/schema) | Return to user | 0s | No |
| Tool Error (execution/permission) | Return to user | 0s | No |
| Early Termination | Send continue | 0s | After 3 attempts |
| Transient Server Error (5xx) | Wait | 30s | Yes |
| Rate Limit (429) | Wait | 30s | Yes |
| Auth/Billing (401, 402) | Switch immediately | 0s | Yes |
| Permission (403) | Return to user | 0s | No |
| Not Found (404) | Wait | 30s | Yes |
| User Error (400, 413) | Return to user | 0s | No |
| Timeout (408) | Wait | 30s | Yes |
| Overloaded (529) | Wait | 30s | Yes |
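The table above can also be expressed as a single policy map, which keeps the action/wait/switch decision in one place. This is an illustrative sketch, not existing code; `ERROR_POLICY` and `policyFor` are hypothetical names, and the category keys match those used by the proposed `classifyProviderError()` below:

```javascript
// Policy map mirroring the Error Categories & Actions table above.
const ERROR_POLICY = {
  toolError:        { action: 'return',   waitMs: 0,     switchModel: false },
  earlyTermination: { action: 'continue', waitMs: 0,     switchModel: 'after 3 attempts' },
  transient:        { action: 'wait',     waitMs: 30000, switchModel: true },
  rateLimit:        { action: 'wait',     waitMs: 30000, switchModel: true },
  auth:             { action: 'switch',   waitMs: 0,     switchModel: true },
  permission:       { action: 'return',   waitMs: 0,     switchModel: false },
  notFound:         { action: 'wait',     waitMs: 30000, switchModel: true },
  userError:        { action: 'return',   waitMs: 0,     switchModel: false },
  timeout:          { action: 'wait',     waitMs: 30000, switchModel: true },
  overloaded:       { action: 'wait',     waitMs: 30000, switchModel: true },
};

function policyFor(category) {
  // Unknown categories fall back to an immediate switch, matching the
  // "unknown → switch" default in the classification flow.
  return ERROR_POLICY[category] ?? { action: 'switch', waitMs: 0, switchModel: true };
}

console.log(policyFor('rateLimit').waitMs); // 30000
```

A table-driven policy like this keeps the 30-second constant and the switch decision in one object, so the per-provider classifiers only need to name a category.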
### Detailed Provider Error Codes

#### OpenAI

```
400 (invalid_request_error) → User error - return to user
401 (authentication_error)  → Auth error - immediate switch
402 (payment_required)      → Billing error - immediate switch
403 (permission_error)      → Permission error - return to user
404 (not_found_error)       → OpenAI treats as retryable - wait 30s, switch
408 (timeout)               → Timeout - wait 30s, switch
429 (rate_limit_error)      → Rate limit - wait 30s, switch
500 (api_error)             → Server error - wait 30s, switch
529                         → Overloaded - wait 30s, switch
```

#### Anthropic (Claude)

```
400 (invalid_request_error) → User error - return to user
401 (authentication_error)  → Auth error - immediate switch
403 (permission_error)      → Permission error - return to user
404 (not_found_error)       → Not found - wait 30s, switch
413 (request_too_large)     → User error - return to user
429 (rate_limit_error)      → Rate limit - wait 30s, switch
500 (api_error)             → Server error - wait 30s, switch
529 (overloaded_error)      → Overloaded - wait 30s, switch
```

#### OpenRouter

```
400 (bad_request)          → User error - return to user
401 (invalid_credentials)  → Auth error - immediate switch
402 (insufficient_credits) → Billing error - immediate switch
403 (moderation_flagged)   → Permission - return to user
408 (timeout)              → Timeout - wait 30s, switch
429 (rate_limited)         → Rate limit - wait 30s, switch
502 (model_down)           → Model down - wait 30s, switch
503 (no_providers)         → No providers - immediate switch
```

#### Chutes AI

```
MODEL_LOADING_FAILED    → Transient - wait 30s, switch
INFERENCE_TIMEOUT       → Timeout - wait 30s, switch
OUT_OF_MEMORY           → Transient - wait 30s, switch
INVALID_INPUT           → User error - return to user
MODEL_OVERLOADED        → Overloaded - wait 30s, switch
GENERATION_FAILED       → Transient - wait 30s, switch
CONTEXT_LENGTH_EXCEEDED → User error - return to user
400                     → Bad request - return to user
429                     → Rate limit - wait 30s, switch
500+                    → Server error - wait 30s, switch
```

#### NVIDIA NIM

```
401                     → Auth error - immediate switch
403                     → Permission - return to user
404                     → Not found - wait 30s, switch
429 (too_many_requests) → Rate limit - wait 30s, switch
500+                    → Server error - wait 30s, switch
```

#### Together AI

```
400 (invalid_request)      → User error - return to user
401 (authentication_error) → Auth error - immediate switch
402 (payment_required)     → Billing error - immediate switch
403 (bad_request)          → User error - return to user
429 (rate_limit_exceeded)  → Rate limit - wait 30s, switch
500+                       → Server error - wait 30s, switch
```

#### Fireworks AI

```
400  → User error - return to user
401  → Auth error - immediate switch
429  → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch
```

#### Mistral

```
400  → Bad request - return to user
401  → Unauthorized - immediate switch
403  → Forbidden - return to user
404  → Not found - wait 30s, switch
429  → Too many requests - wait 30s, switch
500+ → Server error - wait 30s, switch
```

#### Groq

```
400  → Bad request - return to user
401  → Unauthorized - immediate switch
402  → Payment required - immediate switch
403  → Forbidden - return to user
404  → Not found - wait 30s, switch
413  → Payload too large - return to user
429  → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch
```

#### Google (Gemini)

```
400                      → Invalid request - return to user
401                      → Unauthorized - immediate switch
403                      → Permission denied - return to user
404                      → Not found - wait 30s, switch
413                      → Request too large - return to user
429 (resource_exhausted) → Rate limit - wait 30s, switch
500+                     → Server error - wait 30s, switch
```
## Implementation Plan

### Phase 1: Error Classification System

**File:** `chat/server.js`

Create a new function, `classifyProviderError()`:
```javascript
function classifyProviderError(error, provider) {
  // Extract HTTP status code
  const statusCode = error.statusCode || error.code;
  const errorMessage = (error.message || '').toLowerCase();

  // Auth codes are always arrays so .includes() is safe for every provider
  const providerPatterns = {
    openai: {
      transient: [500, 502, 503, 504, 529],
      rateLimit: 429,
      auth: [401, 402],
      permission: 403,
      userError: [400],
      notFound: 404,
      timeout: 408
    },
    anthropic: {
      transient: [500, 529],
      rateLimit: 429,
      auth: [401],
      permission: 403,
      userError: [400, 413],
      notFound: 404
    },
    openrouter: {
      transient: [502, 503],
      rateLimit: 429,
      auth: [401, 402],
      permission: 403,
      userError: [400],
      timeout: 408,
      notFound: 404
    },
    chutes: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: [401],
      permission: 403,
      userError: [400, 413],
      notFound: 404
    },
    nvidia: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: [401],
      permission: 403,
      userError: [400],
      notFound: 404
    },
    together: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: [401, 402],
      permission: 403,
      userError: [400],
      notFound: 404
    },
    fireworks: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: [401],
      userError: [400],
      notFound: 404
    },
    mistral: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: [401],
      permission: 403,
      userError: [400],
      notFound: 404
    },
    groq: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: [401, 402],
      permission: 403,
      userError: [400, 413],
      notFound: 404
    },
    google: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: [401],
      permission: 403,
      userError: [400, 413],
      notFound: 404
    },
    default: {
      transient: [500, 502, 503, 529],
      rateLimit: 429,
      auth: [401, 402],
      permission: 403,
      userError: [400, 413],
      notFound: 404
    }
  };

  const patterns = providerPatterns[provider] || providerPatterns.default;

  // Check for tool errors first (shouldn't happen here, but just in case)
  if (error.isToolError) {
    return { category: 'toolError', action: 'return', waitTime: 0 };
  }

  // Determine category based on status code
  if (patterns.transient?.includes(statusCode)) {
    return { category: 'transient', action: 'wait', waitTime: 30000 };
  }
  if (statusCode === patterns.rateLimit) {
    return { category: 'rateLimit', action: 'wait', waitTime: 30000 };
  }
  if (patterns.auth?.includes(statusCode)) {
    return { category: 'auth', action: 'switch', waitTime: 0 };
  }
  if (statusCode === patterns.permission) {
    return { category: 'permission', action: 'return', waitTime: 0 };
  }
  if (patterns.userError?.includes(statusCode)) {
    return { category: 'userError', action: 'return', waitTime: 0 };
  }
  if (statusCode === patterns.timeout) {
    return { category: 'timeout', action: 'wait', waitTime: 30000 };
  }
  if (statusCode === patterns.notFound) {
    // Special case: OpenAI treats 404 as retryable
    return { category: 'notFound', action: 'wait', waitTime: 30000 };
  }

  // Default to transient for any other 5xx
  if (statusCode >= 500) {
    return { category: 'serverError', action: 'wait', waitTime: 30000 };
  }

  // Check the error message for additional patterns
  if (errorMessage.includes('model not found') || errorMessage.includes('unknown model')) {
    return { category: 'modelNotFound', action: 'wait', waitTime: 30000 };
  }
  if (errorMessage.includes('insufficient credit') || errorMessage.includes('insufficient quota')) {
    return { category: 'billing', action: 'switch', waitTime: 0 };
  }
  if (errorMessage.includes('context length exceeded') || errorMessage.includes('token limit exceeded')) {
    return { category: 'userError', action: 'return', waitTime: 0 };
  }

  // Unknown error - switch immediately
  return { category: 'unknown', action: 'switch', waitTime: 0 };
}
```
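Callers only need the `{ category, action, waitTime }` result, never the provider specifics. As a spot-check, here is a reduced, self-contained sketch of just the `default` branch of the classifier (the `classifyDefault` name is illustrative, not part of the plan), showing how a caller would branch on the action:

```javascript
// Reduced sketch of classifyProviderError()'s default pattern table,
// kept standalone so the branching behavior can be demonstrated.
function classifyDefault(statusCode) {
  if ([500, 502, 503, 529].includes(statusCode)) return { category: 'transient', action: 'wait', waitTime: 30000 };
  if (statusCode === 429) return { category: 'rateLimit', action: 'wait', waitTime: 30000 };
  if ([401, 402].includes(statusCode)) return { category: 'auth', action: 'switch', waitTime: 0 };
  if (statusCode === 403) return { category: 'permission', action: 'return', waitTime: 0 };
  if ([400, 413].includes(statusCode)) return { category: 'userError', action: 'return', waitTime: 0 };
  if (statusCode === 404) return { category: 'notFound', action: 'wait', waitTime: 30000 };
  return { category: 'unknown', action: 'switch', waitTime: 0 };
}

// A caller branches on action alone: wait → sleep then advance the chain,
// switch → advance immediately, return → surface the error to the user.
for (const code of [429, 401, 400, 503]) {
  const { category, action } = classifyDefault(code);
  console.log(`${code} → ${category} (${action})`);
}
```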
### Phase 2: Tool Error Handling

**File:** `opencode/packages/opencode/src/session/processor.ts`

Enhance tool-error handling to distinguish between tool error types:
```typescript
// Add tool error type classification
enum ToolErrorType {
  validation = 'validation',
  permission = 'permission',
  timeout = 'timeout',
  notFound = 'notFound',
  execution = 'execution'
}

function classifyToolError(error: unknown): ToolErrorType {
  const message = String(error).toLowerCase();
  if (message.includes('validation') || message.includes('schema') || message.includes('invalid arguments')) {
    return ToolErrorType.validation;
  }
  if (message.includes('permission') || message.includes('forbidden') || message.includes('denied')) {
    return ToolErrorType.permission;
  }
  if (message.includes('timeout') || message.includes('timed out')) {
    return ToolErrorType.timeout;
  }
  if (message.includes('not found') || message.includes('does not exist')) {
    return ToolErrorType.notFound;
  }
  return ToolErrorType.execution;
}

// In the switch case for "tool-error"
case "tool-error": {
  const match = toolcalls[value.toolCallId]
  if (match && match.state.status === "running") {
    await Session.updatePart({
      ...match,
      state: {
        status: "error",
        input: value.input ?? match.state.input,
        error: (value.error as any).toString(),
        errorType: classifyToolError(value.error),
        time: {
          start: match.state.time.start,
          end: Date.now(),
        },
      },
    })
    // Don't trigger fallback for tool errors - let the model retry.
    // Only permission/question rejections may block the loop.
    if (
      value.error instanceof PermissionNext.RejectedError ||
      value.error instanceof Question.RejectedError
    ) {
      blocked = shouldBreak
    }
    // Mark that this was a tool error (not a provider error)
    ;(value.error as any).isToolError = true
    delete toolcalls[value.toolCallId]
  }
  break
}
```
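Since `classifyToolError()` is pure string matching, its behavior is easy to spot-check outside the processor. A JavaScript port of the same matching logic (the TypeScript enum becomes plain strings here; match order matters, with `execution` as the fallthrough):

```javascript
// JavaScript equivalent of the proposed classifyToolError(), for quick
// spot-checks. Match order: validation → permission → timeout → notFound,
// with 'execution' as the catch-all.
function classifyToolError(error) {
  const message = String(error).toLowerCase();
  if (/validation|schema|invalid arguments/.test(message)) return 'validation';
  if (/permission|forbidden|denied/.test(message)) return 'permission';
  if (/timeout|timed out/.test(message)) return 'timeout';
  if (/not found|does not exist/.test(message)) return 'notFound';
  return 'execution';
}

// Error objects stringify as "Error: <message>", so matching still works:
console.log(classifyToolError(new Error('schema mismatch: expected string'))); // validation
```

One caveat worth a test: a message containing several keywords (e.g. "permission check timed out") resolves to whichever pattern matches first, so the ordering above is part of the contract.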
**File:** `chat/server.js`

Modify `shouldFallbackCliError()` to check for tool errors:
```javascript
function shouldFallbackCliError(err, message) {
  if (!err) return false;

  // Don't fall back on tool errors - let the model retry
  if (err.isToolError) {
    log('Tool error detected - no fallback needed', {
      error: err.message,
      toolError: true
    });
    return false;
  }

  // ... rest of existing checks
}
```
### Phase 3: Continue Message System

**File:** `chat/server.js`

Enhance `sendToOpencodeWithFallback()` with continue-message tracking and provider error handling:
```javascript
async function sendToOpencodeWithFallback({ session, model, content, message, cli, streamCallback, opencodeSessionId, plan }) {
  const cliName = normalizeCli(cli || session?.cli);
  const preferredModel = model || session?.model;
  const chain = buildOpencodeAttemptChain(cliName, preferredModel);
  const tried = new Set();
  const attempts = [];
  let lastError = null;

  // Track continue attempts per model
  const continueAttempts = new Map();
  const MAX_CONTINUE_ATTEMPTS = 3;
  const CONTINUE_MESSAGE = '[CONTINUE] Please continue from where you left off.';

  // Track the last error type per model to spot persistent errors
  const lastErrorTypes = new Map();

  log('Fallback sequence initiated', {
    sessionId: session?.id,
    messageId: message?.id,
    primaryModel: preferredModel,
    cliName,
    chainLength: chain.length,
    timestamp: new Date().toISOString()
  });

  const tryOption = async (option, isBackup = false) => {
    const key = `${option.provider}:${option.model}`;
    if (tried.has(key)) return null;
    tried.add(key);

    const limit = isProviderLimited(option.provider, option.model);
    if (limit.limited) {
      attempts.push({
        model: option.model,
        provider: option.provider,
        error: `limit: ${limit.reason}`,
        classification: 'rateLimit'
      });
      return null;
    }

    try {
      resetMessageStreamingFields(message);

      // Handle continue messages
      let messageContent = content;
      const continueCount = continueAttempts.get(key) || 0;
      if (continueCount > 0 && continueCount <= MAX_CONTINUE_ATTEMPTS) {
        messageContent = `${CONTINUE_MESSAGE}\n\n${content}`;
        log('Sending continue message', {
          model: option.model,
          provider: option.provider,
          attempt: continueCount,
          modelKey: key
        });
      }

      const result = await sendToOpencode({
        session,
        model: option.model,
        content: messageContent,
        message,
        cli: cliName,
        streamCallback,
        opencodeSessionId
      });
      const normalizedResult = (result && typeof result === 'object') ? result : { reply: result };

      // Token usage tracking (existing code)
      let tokensUsed = 0;
      let tokenSource = 'none';
      let tokenExtractionLog = [];
      if (result && typeof result === 'object' && result.tokensUsed > 0) {
        tokensUsed = result.tokensUsed;
        tokenSource = result.tokenSource || 'result';
        tokenExtractionLog = result.tokenExtractionLog || [];
      } else {
        tokensUsed = extractTokenUsageFromResult(normalizedResult, [messageContent], { allowEstimate: false });
        if (tokensUsed > 0) {
          tokenSource = 'response-extracted';
          tokenExtractionLog.push({ method: 'extractTokenUsageFromResult', success: true, value: tokensUsed });
        }
      }

      // Success: reset counters
      continueAttempts.delete(key);
      lastErrorTypes.delete(key);

      recordProviderUsage(option.provider, option.model, tokensUsed, 1);
      if (attempts.length) {
        log('opencode succeeded after fallback', { attempts, model: option.model, provider: option.provider });
      }
      return {
        reply: normalizedResult.reply,
        model: option.model,
        attempts,
        provider: option.provider,
        raw: normalizedResult.raw,
        tokensUsed,
        tokenSource,
        tokenExtractionLog
      };
    } catch (err) {
      lastError = err;
      const errorData = {
        model: option.model,
        provider: option.provider,
        error: err.message || String(err),
        code: err.code || null,
        timestamp: new Date().toISOString()
      };

      // Check for early termination
      if (err.earlyTermination) {
        const partialOutputLength = (message?.partialOutput || '').length;
        const hasSubstantialOutput = partialOutputLength > 500;
        if (hasSubstantialOutput) {
          log('Blocking fallback - model has substantial output despite early termination', {
            model: option.model,
            provider: option.provider,
            error: err.message,
            partialOutputLength
          });
          return err;
        }

        // Increment continue counter
        const currentCount = continueAttempts.get(key) || 0;
        continueAttempts.set(key, currentCount + 1);
        log('Early termination detected', {
          model: option.model,
          provider: option.provider,
          continueAttempt: currentCount + 1,
          maxAttempts: MAX_CONTINUE_ATTEMPTS
        });

        // Retry with the same model if under the limit
        if (currentCount + 1 < MAX_CONTINUE_ATTEMPTS) {
          errorData.earlyTermination = true;
          errorData.continueAttempt = currentCount + 1;
          errorData.willContinue = true;
          attempts.push(errorData);
          // Remove from the tried set so the caller can retry this option
          tried.delete(key);
          return null;
        }

        // Switch to next model after MAX_CONTINUE_ATTEMPTS
        log('Max continue attempts reached, switching model', {
          model: option.model,
          provider: option.provider,
          totalAttempts: MAX_CONTINUE_ATTEMPTS
        });
        attempts.push(errorData);
        return null;
      }

      // Classify provider error
      const classification = classifyProviderError(err, option.provider);
      errorData.classification = classification.category;

      // Track error types per model to spot persistent errors
      if (lastErrorTypes.get(key) === classification.category &&
          classification.category !== 'unknown') {
        // Same error type twice in a row - likely a persistent error
        log('Repeated error type detected, may need immediate switch', {
          model: option.model,
          provider: option.provider,
          errorType: classification.category
        });
      }
      lastErrorTypes.set(key, classification.category);

      if (classification.action === 'return') {
        // User/permission errors - return to user
        log('User/permission error - returning to user', {
          category: classification.category,
          model: option.model,
          provider: option.provider
        });
        err.willNotFallback = true;
        return err;
      }

      if (classification.action === 'wait') {
        // Transient/rate limit errors - wait before switching
        log(`Provider error (${classification.category}) - waiting ${classification.waitTime}ms`, {
          model: option.model,
          provider: option.provider,
          category: classification.category,
          waitTime: classification.waitTime
        });
        errorData.willWait = true;
        errorData.waitTime = classification.waitTime;
        attempts.push(errorData);
        // Wait before allowing the next attempt
        await new Promise(resolve => setTimeout(resolve, classification.waitTime));
        return null;
      }

      // Switch immediately for auth/unknown errors
      errorData.immediateSwitch = true;
      attempts.push(errorData);
      return null;
    }
  };

  // Run a chain. When tryOption removed an option from the tried set
  // (early termination under the continue limit), retry the same option
  // so the continue message actually goes to the same model.
  const runChain = async (options, isBackup = false) => {
    for (const option of options) {
      const key = `${option.provider}:${option.model}`;
      let result = await tryOption(option, isBackup);
      while (result === null && !tried.has(key)) {
        result = await tryOption(option, isBackup);
      }
      if (result) return result;
    }
    return null;
  };

  let result = await runChain(chain);
  if (result && !(result instanceof Error)) return result;

  // Try the backup model if configured (skip if a non-fallback error was returned)
  const backupModel = (providerLimits.opencodeBackupModel || '').trim();
  if (backupModel && !(result instanceof Error)) {
    result = await runChain(buildOpencodeAttemptChain(cliName, backupModel), true);
    if (result && !(result instanceof Error)) return result;
  }

  const err = new Error(`All ${cliName.toUpperCase()} models failed`);
  err.attempts = attempts;
  err.cause = lastError;
  throw err;
}
```
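The continue-attempt bookkeeping can be isolated and tested without any provider calls. The sketch below extracts just the counter/decision logic (`makeContinueTracker` is an illustrative name, not part of the plan); the threshold matches the plan's `currentCount + 1 < MAX_CONTINUE_ATTEMPTS` check:

```javascript
// Standalone sketch of the early-termination bookkeeping from
// sendToOpencodeWithFallback(): each model key gets continue retries
// until the counter reaches MAX_CONTINUE_ATTEMPTS, then the chain advances.
const MAX_CONTINUE_ATTEMPTS = 3;

function makeContinueTracker() {
  const continueAttempts = new Map();
  return {
    // Called on an early termination; returns 'continue' or 'switch'.
    onEarlyTermination(modelKey) {
      const count = (continueAttempts.get(modelKey) || 0) + 1;
      continueAttempts.set(modelKey, count);
      return count < MAX_CONTINUE_ATTEMPTS ? 'continue' : 'switch';
    },
    // Called on success; counters reset so later requests start fresh.
    onSuccess(modelKey) {
      continueAttempts.delete(modelKey);
    },
    attempts(modelKey) {
      return continueAttempts.get(modelKey) || 0;
    },
  };
}

const tracker = makeContinueTracker();
const key = 'openai:gpt-4o';
console.log(tracker.onEarlyTermination(key)); // 1st termination → continue
console.log(tracker.onEarlyTermination(key)); // 2nd termination → continue
console.log(tracker.onEarlyTermination(key)); // 3rd termination → switch
```

Keeping this state in a `Map` keyed by `provider:model` (rather than a single counter) is what lets the fallback chain respect continue attempts per model, one of the testing-checklist items below.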
## Summary of Files Modified

- `MODEL_FALLBACK_IMPROVEMENT_PLAN.md` (NEW) - this comprehensive plan document
- `chat/server.js` - major modifications:
  - Add `classifyProviderError()` function (~120 lines)
  - Modify `sendToOpencodeWithFallback()` with continue message logic
  - Update `shouldFallbackCliError()` to handle tool errors
- `opencode/packages/opencode/src/session/processor.ts` - minor modifications:
  - Add `classifyToolError()` function
  - Enhance the tool-error case with error type classification
## Testing Checklist

- [ ] Tool errors don't trigger fallback
- [ ] Early termination sends continue messages (max 3 attempts)
- [ ] Provider errors with >= 500 status wait 30s before switching
- [ ] Rate limit errors (429) wait 30s before switching
- [ ] Auth errors (401, 402) switch immediately
- [ ] Permission errors (403) return to user without switching
- [ ] User errors (400, 413) return to user without switching
- [ ] Continue attempts reset on a successful response
- [ ] Fallback chain respects continue attempts per model
- [ ] Logging captures all error classifications and actions
## Monitoring & Analytics
Add tracking for:
- Error type distribution
- Continue message frequency
- Provider error wait times
- Model switch patterns
- Tool error vs provider error ratios
Export to monitoring system for analysis and optimization.
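As a starting point before wiring up an external system, the metrics above can be aggregated in-process. A minimal sketch, assuming the `{ category, action }` classification objects from Phase 1 (`makeErrorStats` and the snapshot shape are illustrative, not existing code):

```javascript
// Minimal in-memory tracker for the metrics listed above. A real
// deployment would periodically flush snapshot() to a monitoring backend.
function makeErrorStats() {
  const counts = { byCategory: {}, continues: 0, switches: 0, toolErrors: 0 };
  return {
    // Record one classified error (the object classifyProviderError returns).
    record(classification) {
      const c = classification.category;
      counts.byCategory[c] = (counts.byCategory[c] || 0) + 1;
      if (c === 'toolError') counts.toolErrors += 1;
      if (classification.action === 'switch') counts.switches += 1;
    },
    recordContinue() { counts.continues += 1; },
    // Return a detached copy so callers can't mutate the live counters.
    snapshot() { return JSON.parse(JSON.stringify(counts)); },
  };
}

const stats = makeErrorStats();
stats.record({ category: 'rateLimit', action: 'wait' });
stats.record({ category: 'auth', action: 'switch' });
stats.recordContinue();
console.log(stats.snapshot());
```

The `byCategory` map gives the error-type distribution directly, and the continue/switch counters cover the continue-message frequency and model-switch patterns listed above.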