Add comprehensive model fallback improvement plan

- Detailed error classification system for tool errors, early termination, and provider errors
- Provider-specific error mappings for 10 LLM providers (OpenAI, Anthropic, OpenRouter, Chutes, NVIDIA, Together, Fireworks, Mistral, Groq, Google)
- Continue message system with 3-attempt limit before model switch
- 30-second wait for transient/rate limit errors before switching
- Distinguishes tool errors (return to user) from provider errors (switch model)
- Implementation plan with code examples for server.js and processor.ts
southseact-3d
2026-02-08 14:16:32 +00:00
parent 9ef54cf6ee
commit 2dc94310a6

@@ -0,0 +1,744 @@
# Model Fallback & Continue Functionality Improvement Plan
## Executive Summary
This plan outlines improvements to the model fallback system to handle three distinct error categories:
1. **Bad tool calls** - Send error data back to user/model for retry
2. **Request stops** - Send continue message, retry 3x, then switch model
3. **Provider errors** - Wait 30 seconds, then switch to next model in fallback chain
---
## Current Implementation Analysis
### Existing Components
| File | Function | Current Behavior |
|------|-----------|-----------------|
| `chat/server.js:9347` | `shouldFallbackCliError()` | Decides if error warrants fallback based on patterns |
| `chat/server.js:9502` | `sendToOpencodeWithFallback()` | Main fallback orchestration with model chain |
| `chat/server.js:9296` | `buildOpencodeAttemptChain()` | Builds ordered provider/model chain |
| `opencode/session/retry.ts` | `SessionRetry.delay()` | Calculates retry delays with exponential backoff |
| `opencode/session/message-v2.ts:714` | `isOpenAiErrorRetryable()` | Determines if OpenAI errors are retryable |
### Current Gaps
1. **No explicit continue mechanism** - Early terminations counted but no "continue" message system
2. **Tool errors not distinguished** - Tool errors treated same as provider errors
3. **No 30-second wait for provider errors** - Immediate fallback on provider issues
4. **Missing provider-specific error mappings** - Generic patterns only
---
## Proposed Architecture
### Error Classification Flow
```
Error Occurs
├── Tool Call Error?
│   └── Yes → Check tool error type
│       ├── Validation/Schema error → Send error back, continue with same model
│       ├── Permission denied → Send error back, continue with same model
│       ├── Execution failure → Send error back, continue with same model
│       └── Tool timeout → Send error back, continue with same model
├── Early Termination?
│   └── Yes → Increment termination counter
│       ├── Count < 3? → Send "continue" message, retry same model
│       └── Count >= 3? → Switch to next model
└── Provider Error?
    └── Yes → Classify error type
        ├── Transient (500, 502, 503, timeout) → Wait 30s, switch model
        ├── Rate limit (429) → Wait 30s, switch model
        ├── Auth/Billing (401, 402) → Switch model immediately
        ├── Permission (403) → Send error back, don't switch
        ├── User error (400, 413) → Send error back, don't switch
        └── Other (404, etc.) → Wait 30s, switch model
```
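The flow above amounts to a three-way triage. The following is a minimal sketch of that decision, not code from `server.js`: `triageError` and its return shape are hypothetical, and it assumes the error object carries `isToolError` and `earlyTermination` flags plus a numeric `statusCode`. Following the category table, 403 is treated as a return-to-user case.

```javascript
// Hypothetical helper: the name and return shape are illustrative only.
const MAX_CONTINUE_ATTEMPTS = 3;

function triageError(err, continueCount = 0) {
  // 1. Tool errors go back to the model; the model is never switched.
  if (err.isToolError) {
    return { action: 'return', waitMs: 0, switchModel: false };
  }
  // 2. Early termination: increment the counter, then compare against the limit.
  if (err.earlyTermination) {
    const next = continueCount + 1;
    return next < MAX_CONTINUE_ATTEMPTS
      ? { action: 'continue', waitMs: 0, switchModel: false }
      : { action: 'switch', waitMs: 0, switchModel: true };
  }
  // 3. Provider errors: decide by HTTP status.
  const status = Number(err.statusCode) || 0;
  if ([400, 403, 413].includes(status)) {
    return { action: 'return', waitMs: 0, switchModel: false };
  }
  if ([401, 402].includes(status)) {
    return { action: 'switch', waitMs: 0, switchModel: true };
  }
  // Transient, rate-limit, timeout, not-found and everything else: wait, then switch.
  return { action: 'wait', waitMs: 30000, switchModel: true };
}
```

Keeping the triage in one pure function makes each branch of the flow easy to unit test without a live provider.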
---
## Provider-Specific Error Mappings
### Error Categories & Actions
| Category | Action | Wait Time | Switch Model? |
|----------|--------|-----------|---------------|
| **Tool Error (validation/schema)** | Return to user | 0s | No |
| **Tool Error (execution/permission)** | Return to user | 0s | No |
| **Early Termination** | Send continue | 0s | After 3 attempts |
| **Transient Server Error (5xx)** | Wait | 30s | Yes |
| **Rate Limit (429)** | Wait | 30s | Yes |
| **Auth/Billing (401, 402)** | Switch immediately | 0s | Yes |
| **Permission (403)** | Return to user | 0s | No |
| **Not Found (404)** | Wait | 30s | Yes |
| **User Error (400, 413)** | Return to user | 0s | No |
| **Timeout (408)** | Wait | 30s | Yes |
| **Overloaded (529)** | Wait | 30s | Yes |
### Detailed Provider Error Codes
#### OpenAI
```
400 (invalid_request_error) → User error - return to user
401 (authentication_error) → Auth error - immediate switch
402 (payment_required) → Billing error - immediate switch
403 (permission_error) → Permission error - return to user
404 (not_found_error) → OpenAI treats as retryable - wait 30s, switch
408 (timeout) → Timeout - wait 30s, switch
429 (rate_limit_error) → Rate limit - wait 30s, switch
500 (api_error) → Server error - wait 30s, switch
529 → Overloaded - wait 30s, switch
```
#### Anthropic (Claude)
```
400 (invalid_request_error) → User error - return to user
401 (authentication_error) → Auth error - immediate switch
403 (permission_error) → Permission error - return to user
404 (not_found_error) → Not found - wait 30s, switch
413 (request_too_large) → User error - return to user
429 (rate_limit_error) → Rate limit - wait 30s, switch
500 (api_error) → Server error - wait 30s, switch
529 (overloaded_error) → Overloaded - wait 30s, switch
```
#### OpenRouter
```
400 (bad_request) → User error - return to user
401 (invalid_credentials) → Auth error - immediate switch
402 (insufficient_credits) → Billing error - immediate switch
403 (moderation_flagged) → Permission - return to user
408 (timeout) → Timeout - wait 30s, switch
429 (rate_limited) → Rate limit - wait 30s, switch
502 (model_down) → Model down - wait 30s, switch
503 (no_providers) → No providers - immediate switch
```
#### Chutes AI
```
MODEL_LOADING_FAILED → Transient - wait 30s, switch
INFERENCE_TIMEOUT → Timeout - wait 30s, switch
OUT_OF_MEMORY → Transient - wait 30s, switch
INVALID_INPUT → User error - return to user
MODEL_OVERLOADED → Overloaded - wait 30s, switch
GENERATION_FAILED → Transient - wait 30s, switch
CONTEXT_LENGTH_EXCEEDED → User error - return to user
400 → Bad request - return to user
429 → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch
```
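Several Chutes entries above are string codes rather than HTTP statuses, so a classifier that only inspects the status code would not see them. A minimal sketch of a lookup for those codes follows. It assumes the code arrives as a string on `err.code`, which is not confirmed by this plan, and `classifyChutesError` is a hypothetical helper name.

```javascript
// Assumption: Chutes exposes its error code as a string on err.code.
const WAIT_MS = 30000;

const CHUTES_CODE_ACTIONS = new Map([
  ['MODEL_LOADING_FAILED', { category: 'transient', action: 'wait', waitTime: WAIT_MS }],
  ['INFERENCE_TIMEOUT', { category: 'timeout', action: 'wait', waitTime: WAIT_MS }],
  ['OUT_OF_MEMORY', { category: 'transient', action: 'wait', waitTime: WAIT_MS }],
  ['INVALID_INPUT', { category: 'userError', action: 'return', waitTime: 0 }],
  ['MODEL_OVERLOADED', { category: 'overloaded', action: 'wait', waitTime: WAIT_MS }],
  ['GENERATION_FAILED', { category: 'transient', action: 'wait', waitTime: WAIT_MS }],
  ['CONTEXT_LENGTH_EXCEEDED', { category: 'userError', action: 'return', waitTime: 0 }],
]);

// Returns a classification for a known string code, or null so the caller
// can fall through to status-code classification.
function classifyChutesError(err) {
  const code = typeof err.code === 'string' ? err.code.toUpperCase() : '';
  return CHUTES_CODE_ACTIONS.get(code) ?? null;
}
```

Returning `null` for unknown codes keeps this lookup composable with a status-based classifier instead of replacing it.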
#### NVIDIA NIM
```
401 → Auth error - immediate switch
403 → Permission - return to user
404 → Not found - wait 30s, switch
429 (too_many_requests) → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch
```
#### Together AI
```
400 (invalid_request) → User error - return to user
401 (authentication_error) → Auth error - immediate switch
402 (payment_required) → Billing error - immediate switch
403 (bad_request) → User error - return to user
429 (rate_limit_exceeded) → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch
```
#### Fireworks AI
```
400 → User error - return to user
401 → Auth error - immediate switch
429 → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch
```
#### Mistral
```
400 → Bad request - return to user
401 → Unauthorized - immediate switch
403 → Forbidden - return to user
404 → Not found - wait 30s, switch
429 → Too many requests - wait 30s, switch
500+ → Server error - wait 30s, switch
```
#### Groq
```
400 → Bad request - return to user
401 → Unauthorized - immediate switch
402 → Payment required - immediate switch
403 → Forbidden - return to user
404 → Not found - wait 30s, switch
413 → Payload too large - return to user
429 → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch
```
#### Google (Gemini)
```
400 → Invalid request - return to user
401 → Unauthorized - immediate switch
403 → Permission denied - return to user
404 → Not found - wait 30s, switch
413 → Request too large - return to user
429 (resource_exhausted) → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch
```
---
## Implementation Plan
### Phase 1: Error Classification System
**File: `chat/server.js`**
Create new function `classifyProviderError()`:
```javascript
function classifyProviderError(error, provider) {
  // Extract the HTTP status code. Non-numeric or missing codes become null so
  // they can never match a pattern entry that a provider leaves undefined.
  const statusCode = Number(error.statusCode || error.code) || null;
  const errorMessage = (error.message || '').toLowerCase();

  // `auth` is always an array so that `.includes()` is safe for every provider.
  const providerPatterns = {
    openai: {
      transient: [500, 502, 503, 504, 529],
      rateLimit: 429,
      auth: [401, 402],
      permission: 403,
      userError: [400],
      notFound: 404,
      timeout: 408
    },
    anthropic: {
      transient: [500, 529],
      rateLimit: 429,
      auth: [401],
      permission: 403,
      userError: [400, 413],
      notFound: 404
    },
    openrouter: {
      transient: [502, 503],
      rateLimit: 429,
      auth: [401, 402],
      permission: 403,
      userError: [400],
      timeout: 408,
      notFound: 404
    },
    chutes: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: [401],
      permission: 403,
      userError: [400, 413],
      notFound: 404
    },
    nvidia: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: [401],
      permission: 403,
      userError: [400],
      notFound: 404
    },
    together: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: [401, 402],
      permission: 403,
      userError: [400],
      notFound: 404
    },
    fireworks: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: [401],
      userError: [400],
      notFound: 404
    },
    mistral: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: [401],
      permission: 403,
      userError: [400],
      notFound: 404
    },
    groq: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: [401, 402],
      permission: 403,
      userError: [400, 413],
      notFound: 404
    },
    google: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: [401],
      permission: 403,
      userError: [400, 413],
      notFound: 404
    },
    default: {
      transient: [500, 502, 503, 529],
      rateLimit: 429,
      auth: [401, 402],
      permission: 403,
      userError: [400, 413],
      notFound: 404
    }
  };
  const patterns = providerPatterns[provider] || providerPatterns.default;

  // Check for tool errors first (shouldn't happen here but just in case)
  if (error.isToolError) {
    return { category: 'toolError', action: 'return', waitTime: 0 };
  }

  // Determine category based on status code
  if (patterns.transient?.includes(statusCode)) {
    return { category: 'transient', action: 'wait', waitTime: 30000 };
  }
  if (statusCode === patterns.rateLimit) {
    return { category: 'rateLimit', action: 'wait', waitTime: 30000 };
  }
  if (patterns.auth?.includes(statusCode)) {
    return { category: 'auth', action: 'switch', waitTime: 0 };
  }
  if (statusCode === patterns.permission) {
    return { category: 'permission', action: 'return', waitTime: 0 };
  }
  if (patterns.userError?.includes(statusCode)) {
    return { category: 'userError', action: 'return', waitTime: 0 };
  }
  if (statusCode === patterns.timeout) {
    return { category: 'timeout', action: 'wait', waitTime: 30000 };
  }
  if (statusCode === patterns.notFound) {
    // 404 is handled as retryable for every provider (OpenAI treats it that way)
    return { category: 'notFound', action: 'wait', waitTime: 30000 };
  }

  // Default to transient for any other 5xx
  if (statusCode >= 500) {
    return { category: 'serverError', action: 'wait', waitTime: 30000 };
  }

  // Check error message for additional patterns
  if (errorMessage.includes('model not found') || errorMessage.includes('unknown model')) {
    return { category: 'modelNotFound', action: 'wait', waitTime: 30000 };
  }
  if (errorMessage.includes('insufficient credit') || errorMessage.includes('insufficient quota')) {
    return { category: 'billing', action: 'switch', waitTime: 0 };
  }
  if (errorMessage.includes('context length exceeded') || errorMessage.includes('token limit exceeded')) {
    return { category: 'userError', action: 'return', waitTime: 0 };
  }

  // Unknown error - switch immediately
  return { category: 'unknown', action: 'switch', waitTime: 0 };
}
```
### Phase 2: Tool Error Handling
**File: `opencode/packages/opencode/src/session/processor.ts`**
Enhance tool-error handling to distinguish between tool error types:
```typescript
// Add tool error type classification
enum ToolErrorType {
  validation = 'validation',
  permission = 'permission',
  timeout = 'timeout',
  notFound = 'notFound',
  execution = 'execution'
}

function classifyToolError(error: unknown): ToolErrorType {
  const message = String(error).toLowerCase();
  if (message.includes('validation') || message.includes('schema') || message.includes('invalid arguments')) {
    return ToolErrorType.validation;
  }
  if (message.includes('permission') || message.includes('forbidden') || message.includes('denied')) {
    return ToolErrorType.permission;
  }
  if (message.includes('timeout') || message.includes('timed out')) {
    return ToolErrorType.timeout;
  }
  if (message.includes('not found') || message.includes('does not exist')) {
    return ToolErrorType.notFound;
  }
  // Anything unrecognized is treated as a generic execution failure
  return ToolErrorType.execution;
}

// In the switch case for "tool-error"
case "tool-error": {
  const match = toolcalls[value.toolCallId];
  if (match && match.state.status === "running") {
    await Session.updatePart({
      ...match,
      state: {
        status: "error",
        input: value.input ?? match.state.input,
        error: (value.error as any).toString(),
        errorType: classifyToolError(value.error),
        time: {
          start: match.state.time.start,
          end: Date.now(),
        },
      },
    })
    // Don't trigger fallback for tool errors - let model retry
    // Only trigger fallback for permission rejections
    if (
      value.error instanceof PermissionNext.RejectedError ||
      value.error instanceof Question.RejectedError
    ) {
      blocked = shouldBreak
    }
    // Mark that this was a tool error (not provider error).
    // Guarded because a primitive error value cannot carry a property.
    if (typeof value.error === "object" && value.error !== null) {
      (value.error as any).isToolError = true;
    }
    delete toolcalls[value.toolCallId]
  }
  break;
}
```
**File: `chat/server.js`**
Modify `shouldFallbackCliError()` to check for tool errors:
```javascript
function shouldFallbackCliError(err, message) {
  if (!err) return false;
  // Don't fallback on tool errors - let model retry
  if (err.isToolError) {
    log('Tool error detected - no fallback needed', {
      error: err.message,
      toolError: true
    });
    return false;
  }
  // ... rest of existing checks
}
```
### Phase 3: Continue Message System
Enhance `sendToOpencodeWithFallback()` with continue message tracking and provider error handling:
```javascript
async function sendToOpencodeWithFallback({ session, model, content, message, cli, streamCallback, opencodeSessionId, plan }) {
  const cliName = normalizeCli(cli || session?.cli);
  const preferredModel = model || session?.model;
  const chain = buildOpencodeAttemptChain(cliName, preferredModel);
  const tried = new Set();
  const attempts = [];
  let lastError = null;
  let switchedToBackup = false;

  // Track continue attempts per model
  const continueAttempts = new Map();
  const MAX_CONTINUE_ATTEMPTS = 3;
  const CONTINUE_MESSAGE = '[CONTINUE] Please continue from where you left off.';

  // Track last error type to prevent infinite loops
  const lastErrorTypes = new Map();

  log('Fallback sequence initiated', {
    sessionId: session?.id,
    messageId: message?.id,
    primaryModel: preferredModel,
    cliName,
    chainLength: chain.length,
    timestamp: new Date().toISOString()
  });

  const tryOption = async (option, isBackup = false) => {
    const key = `${option.provider}:${option.model}`;
    if (tried.has(key)) return null;
    tried.add(key);

    const limit = isProviderLimited(option.provider, option.model);
    if (limit.limited) {
      attempts.push({
        model: option.model,
        provider: option.provider,
        error: `limit: ${limit.reason}`,
        classification: 'rateLimit'
      });
      return null;
    }

    try {
      resetMessageStreamingFields(message);

      // Handle continue messages
      let messageContent = content;
      const modelKey = `${option.provider}:${option.model}`;
      const continueCount = continueAttempts.get(modelKey) || 0;
      if (continueCount > 0 && continueCount <= MAX_CONTINUE_ATTEMPTS) {
        messageContent = `${CONTINUE_MESSAGE}\n\n${content}`;
        log('Sending continue message', {
          model: option.model,
          provider: option.provider,
          attempt: continueCount,
          modelKey
        });
      }

      const result = await sendToOpencode({
        session,
        model: option.model,
        content: messageContent,
        message,
        cli: cliName,
        streamCallback,
        opencodeSessionId
      });
      const normalizedResult = (result && typeof result === 'object') ? result : { reply: result };

      // Token usage tracking (existing code)
      let tokensUsed = 0;
      let tokenSource = 'none';
      let tokenExtractionLog = [];
      if (result && typeof result === 'object' && result.tokensUsed > 0) {
        tokensUsed = result.tokensUsed;
        tokenSource = result.tokenSource || 'result';
        tokenExtractionLog = result.tokenExtractionLog || [];
      } else {
        tokensUsed = extractTokenUsageFromResult(normalizedResult, [messageContent], { allowEstimate: false });
        if (tokensUsed > 0) {
          tokenSource = 'response-extracted';
          tokenExtractionLog.push({ method: 'extractTokenUsageFromResult', success: true, value: tokensUsed });
        }
      }

      // Success: reset counters
      continueAttempts.delete(modelKey);
      lastErrorTypes.delete(modelKey);
      recordProviderUsage(option.provider, option.model, tokensUsed, 1);
      if (attempts.length) {
        log('opencode succeeded after fallback', { attempts, model: option.model, provider: option.provider });
      }
      return {
        reply: normalizedResult.reply,
        model: option.model,
        attempts,
        provider: option.provider,
        raw: normalizedResult.raw,
        tokensUsed,
        tokenSource,
        tokenExtractionLog
      };
    } catch (err) {
      lastError = err;
      const errorData = {
        model: option.model,
        provider: option.provider,
        error: err.message || String(err),
        code: err.code || null,
        timestamp: new Date().toISOString()
      };

      // Check for early termination
      if (err.earlyTermination) {
        const partialOutputLength = (message?.partialOutput || '').length;
        const hasSubstantialOutput = partialOutputLength > 500;
        if (hasSubstantialOutput) {
          log('Blocking fallback - model has substantial output despite early termination', {
            model: option.model,
            provider: option.provider,
            error: err.message,
            partialOutputLength
          });
          return err;
        }

        // Increment continue counter
        const modelKey = `${option.provider}:${option.model}`;
        const currentCount = continueAttempts.get(modelKey) || 0;
        continueAttempts.set(modelKey, currentCount + 1);
        log('Early termination detected', {
          model: option.model,
          provider: option.provider,
          continueAttempt: currentCount + 1,
          maxAttempts: MAX_CONTINUE_ATTEMPTS
        });

        // Retry with same model if under limit
        if (currentCount + 1 < MAX_CONTINUE_ATTEMPTS) {
          errorData.earlyTermination = true;
          errorData.continueAttempt = currentCount + 1;
          errorData.willContinue = true;
          attempts.push(errorData);
          // Remove from tried set so runOption() below retries the same option
          tried.delete(key);
          return null;
        }

        // Switch to next model after MAX_CONTINUE_ATTEMPTS
        log('Max continue attempts reached, switching model', {
          model: option.model,
          provider: option.provider,
          totalAttempts: MAX_CONTINUE_ATTEMPTS
        });
        attempts.push(errorData);
        return null;
      }

      // Classify provider error
      const classification = classifyProviderError(err, option.provider);
      errorData.classification = classification.category;

      // Track error types to prevent infinite loops
      const modelKey = `${option.provider}:${option.model}`;
      const lastErrorType = lastErrorTypes.get(modelKey);
      if (lastErrorType === classification.category &&
          classification.category !== 'unknown') {
        // Same error type twice in a row - might be persistent error
        log('Repeated error type detected, may need immediate switch', {
          model: option.model,
          provider: option.provider,
          errorType: classification.category
        });
      }
      // Always record the latest category; recording it only inside the branch
      // above would leave the map empty and the check could never fire.
      lastErrorTypes.set(modelKey, classification.category);

      if (classification.action === 'return') {
        // User/permission errors - return to user
        log('User/permission error - returning to user', {
          category: classification.category,
          model: option.model,
          provider: option.provider
        });
        err.willNotFallback = true;
        return err;
      }

      if (classification.action === 'wait') {
        // Transient/rate limit errors - wait before switch
        log(`Provider error (${classification.category}) - waiting ${classification.waitTime}ms`, {
          model: option.model,
          provider: option.provider,
          category: classification.category,
          waitTime: classification.waitTime
        });
        errorData.willWait = true;
        errorData.waitTime = classification.waitTime;
        attempts.push(errorData);
        // Wait before allowing next attempt
        await new Promise(resolve => setTimeout(resolve, classification.waitTime));
        return null;
      }

      // Switch immediately for auth/unknown errors
      errorData.immediateSwitch = true;
      attempts.push(errorData);
      return null;
    }
  };

  // Re-run the same option while tryOption() has released it for a continue
  // attempt; a plain for-loop would move straight on to the next model.
  const runOption = async (option, isBackup = false) => {
    const key = `${option.provider}:${option.model}`;
    let result;
    do {
      result = await tryOption(option, isBackup);
    } while (result === null && !tried.has(key));
    if (result instanceof Error) {
      // "Return to user" outcome: surface the error itself and skip any
      // remaining models, including the backup chain.
      result.attempts = attempts;
      throw result;
    }
    return result;
  };

  // Try each option in chain
  for (const option of chain) {
    const result = await runOption(option);
    if (result) return result;
  }

  // Try backup model if configured
  const backupModel = (providerLimits.opencodeBackupModel || '').trim();
  if (backupModel) {
    const backupChain = buildOpencodeAttemptChain(cliName, backupModel);
    for (const option of backupChain) {
      const result = await runOption(option, true);
      if (result) return result;
    }
  }

  const err = new Error(`All ${cliName.toUpperCase()} models failed`);
  err.attempts = attempts;
  err.cause = lastError;
  throw err;
}
```
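The 30-second wait in the function above cannot be cancelled, so a user who aborts the request would still hold the handler open until the timer fires. A minimal sketch of an abortable delay built on the standard `AbortSignal` follows. The `sleep` helper name is hypothetical, and wiring a signal into `sendToOpencodeWithFallback()` is an assumption this plan does not cover.

```javascript
// Hypothetical helper: resolves after `ms`, or rejects early if `signal` aborts.
function sleep(ms, signal) {
  return new Promise((resolve, reject) => {
    if (signal?.aborted) {
      reject(signal.reason ?? new Error('aborted'));
      return;
    }
    const onAbort = () => {
      clearTimeout(timer); // Stop the pending timer so the process can exit
      reject(signal.reason ?? new Error('aborted'));
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener('abort', onAbort);
      resolve();
    }, ms);
    signal?.addEventListener('abort', onAbort, { once: true });
  });
}
```

With this helper the wait would read `await sleep(classification.waitTime, signal)`. Node's built-in `timers/promises` `setTimeout` accepts a `signal` option and could be used instead.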
---
## Summary of Files Modified
1. **`MODEL_FALLBACK_IMPROVEMENT_PLAN.md`** (NEW) - This comprehensive plan document
2. **`chat/server.js`** - Major modifications:
- Add `classifyProviderError()` function (~120 lines)
- Modify `sendToOpencodeWithFallback()` with continue message logic
- Update `shouldFallbackCliError()` to handle tool errors
3. **`opencode/packages/opencode/src/session/processor.ts`** - Minor modifications:
- Add `classifyToolError()` function
- Enhance tool-error case with error type classification
---
## Testing Checklist
- [ ] Tool errors don't trigger fallback
- [ ] Early termination sends continue messages (max 3 attempts)
- [ ] Provider errors with >=500 status wait 30s before switch
- [ ] Rate limit errors (429) wait 30s before switch
- [ ] Auth errors (401, 402) switch immediately
- [ ] Permission errors (403) return to user without switch
- [ ] User errors (400, 413) return to user without switch
- [ ] Continue attempts reset on successful response
- [ ] Fallback chain respects continue attempts per model
- [ ] Logging captures all error classifications and actions
---
## Monitoring & Analytics
Add tracking for:
1. Error type distribution
2. Continue message frequency
3. Provider error wait times
4. Model switch patterns
5. Tool error vs provider error ratios
Export to monitoring system for analysis and optimization.
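
As a starting point for the first and fifth items, classifications could be counted in memory before being exported. This is a minimal sketch under that assumption: `recordClassification`, `countsByCategory`, and `toolToProviderRatio` are hypothetical names, and the export step depends on the monitoring system in use.

```javascript
// Hypothetical in-memory counters keyed by "provider:category".
const classificationCounts = new Map();

function recordClassification(provider, category) {
  const key = `${provider}:${category}`;
  classificationCounts.set(key, (classificationCounts.get(key) || 0) + 1);
}

// Error type distribution across all providers.
function countsByCategory() {
  const totals = {};
  for (const [key, count] of classificationCounts) {
    const category = key.slice(key.indexOf(':') + 1);
    totals[category] = (totals[category] || 0) + count;
  }
  return totals;
}

// Ratio of tool errors to all other errors, or null when there are none.
function toolToProviderRatio() {
  const totals = countsByCategory();
  const tool = totals.toolError || 0;
  const provider = Object.entries(totals)
    .filter(([category]) => category !== 'toolError')
    .reduce((sum, [, count]) => sum + count, 0);
  return provider === 0 ? null : tool / provider;
}
```

Calling `recordClassification(option.provider, classification.category)` next to the existing `log()` calls would keep the counters in step with the logs.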