Add comprehensive model fallback improvement plan
- Detailed error classification system for tool errors, early termination, and provider errors
- Provider-specific error mappings for 10 LLM providers (OpenAI, Anthropic, OpenRouter, Chutes, NVIDIA, Together, Fireworks, Mistral, Groq, Google)
- Continue message system with 3-attempt limit before model switch
- 30-second wait for transient/rate limit errors before switching
- Distinguishes tool errors (return to user) from provider errors (switch model)
- Implementation plan with code examples for server.js and processor.ts
744
MODEL_FALLBACK_IMPROVEMENT_PLAN.md
Normal file
@@ -0,0 +1,744 @@
# Model Fallback & Continue Functionality Improvement Plan

## Executive Summary

This plan outlines improvements to the model fallback system to handle three distinct error categories:
1. **Bad tool calls** - Send error data back to user/model for retry
2. **Request stops** - Send continue message, retry 3x, then switch model
3. **Provider errors** - Wait 30 seconds, then switch to next model in fallback chain

---

## Current Implementation Analysis

### Existing Components

| File | Function | Current Behavior |
|------|----------|------------------|
| `chat/server.js:9347` | `shouldFallbackCliError()` | Decides if error warrants fallback based on patterns |
| `chat/server.js:9502` | `sendToOpencodeWithFallback()` | Main fallback orchestration with model chain |
| `chat/server.js:9296` | `buildOpencodeAttemptChain()` | Builds ordered provider/model chain |
| `opencode/session/retry.ts` | `SessionRetry.delay()` | Calculates retry delays with exponential backoff |
| `opencode/session/message-v2.ts:714` | `isOpenAiErrorRetryable()` | Determines if OpenAI errors are retryable |

### Current Gaps

1. **No explicit continue mechanism** - Early terminations are counted, but there is no "continue" message system
2. **Tool errors not distinguished** - Tool errors are treated the same as provider errors
3. **No 30-second wait for provider errors** - Provider issues trigger an immediate fallback
4. **Missing provider-specific error mappings** - Only generic patterns are matched

---

## Proposed Architecture

### Error Classification Flow

```
Error Occurs
    │
    ├── Tool Call Error?
    │   ├── Yes → Check tool error type
    │   │   ├── Validation/Schema error → Send error back, continue with same model
    │   │   ├── Permission denied → Send error back, continue with same model
    │   │   ├── Execution failure → Send error back, continue with same model
    │   │   └── Tool timeout → Send error back, continue with same model
    │
    ├── Early Termination?
    │   ├── Yes → Increment termination counter
    │   │   ├── Count < 3? → Send "continue" message, retry same model
    │   │   └── Count >= 3? → Switch to next model
    │
    └── Provider Error?
        ├── Yes → Classify error type
        │   ├── Transient (500, 502, 503, timeout) → Wait 30s, switch model
        │   ├── Rate limit (429) → Wait 30s, switch model
        │   ├── Auth/Billing (401, 402) → Switch model immediately
        │   ├── Permission (403) → Send error back, don't switch
        │   ├── User error (400, 413) → Send error back, don't switch
        │   └── Other (404, etc.) → Wait 30s, switch model
```
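
The top level of this flow can be sketched as a small router. This is a minimal illustration, not the planned implementation: `routeError` and `MAX_EARLY_TERMINATIONS` are hypothetical names, and the sketch assumes errors carry the `isToolError` and `earlyTermination` flags used elsewhere in this plan.

```javascript
// Hypothetical router for the three top-level branches of the flow.
// Assumes errors carry `isToolError` / `earlyTermination` flags.
const MAX_EARLY_TERMINATIONS = 3; // mirrors the "Count < 3" branch

function routeError(err, terminationCount) {
  if (err.isToolError) {
    // Every tool error type goes back to the same model
    return { branch: 'tool', step: 'returnError' };
  }
  if (err.earlyTermination) {
    // The counter is incremented first, then compared against the limit
    const count = terminationCount + 1;
    const step = count < MAX_EARLY_TERMINATIONS ? 'sendContinue' : 'switchModel';
    return { branch: 'earlyTermination', step, count };
  }
  // Everything else is classified per provider (status code, message)
  return { branch: 'provider', step: 'classify' };
}

console.log(routeError({ earlyTermination: true }, 0));
```

Keeping the counter outside the router makes the decision a pure function of the error and the current count, which is easy to unit test.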

---

## Provider-Specific Error Mappings

### Error Categories & Actions

| Category | Action | Wait Time | Switch Model? |
|----------|--------|-----------|---------------|
| **Tool Error (validation/schema)** | Return to user | 0s | No |
| **Tool Error (execution/permission)** | Return to user | 0s | No |
| **Early Termination** | Send continue | 0s | After 3 attempts |
| **Transient Server Error (5xx)** | Wait | 30s | Yes |
| **Rate Limit (429)** | Wait | 30s | Yes |
| **Auth/Billing (401, 402)** | Switch immediately | 0s | Yes |
| **Permission (403)** | Return to user | 0s | No |
| **Not Found (404)** | Wait | 30s | Yes |
| **User Error (400, 413)** | Return to user | 0s | No |
| **Timeout (408)** | Wait | 30s | Yes |
| **Overloaded (529)** | Wait | 30s | Yes |
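
One way to keep this table and the code in sync is to encode it as a single lookup. The sketch below is illustrative only: `ERROR_ACTIONS`, `actionFor`, and the category keys are hypothetical names, and the fallback for unlisted categories (switch immediately) follows the "unknown error" behavior described later in this plan.

```javascript
// Hypothetical lookup mirroring the category table (names are assumptions).
const WAIT_MS = 30_000;

const ERROR_ACTIONS = Object.freeze({
  toolError:        { action: 'return',   waitTime: 0,       switchModel: false },
  earlyTermination: { action: 'continue', waitTime: 0,       switchModel: false }, // switch after 3 attempts
  transient:        { action: 'wait',     waitTime: WAIT_MS, switchModel: true },
  rateLimit:        { action: 'wait',     waitTime: WAIT_MS, switchModel: true },
  auth:             { action: 'switch',   waitTime: 0,       switchModel: true },
  permission:       { action: 'return',   waitTime: 0,       switchModel: false },
  notFound:         { action: 'wait',     waitTime: WAIT_MS, switchModel: true },
  userError:        { action: 'return',   waitTime: 0,       switchModel: false },
  timeout:          { action: 'wait',     waitTime: WAIT_MS, switchModel: true },
  overloaded:       { action: 'wait',     waitTime: WAIT_MS, switchModel: true },
});

function actionFor(category) {
  // Own-property check so keys like "toString" never match by accident
  if (Object.prototype.hasOwnProperty.call(ERROR_ACTIONS, category)) {
    return ERROR_ACTIONS[category];
  }
  // Unlisted categories switch immediately
  return { action: 'switch', waitTime: 0, switchModel: true };
}

console.log(actionFor('rateLimit'));
```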

### Detailed Provider Error Codes

#### OpenAI
```
400 (invalid_request_error) → User error - return to user
401 (authentication_error) → Auth error - immediate switch
402 (payment_required) → Billing error - immediate switch
403 (permission_error) → Permission error - return to user
404 (not_found_error) → OpenAI treats as retryable - wait 30s, switch
408 (timeout) → Timeout - wait 30s, switch
429 (rate_limit_error) → Rate limit - wait 30s, switch
500 (api_error) → Server error - wait 30s, switch
529 → Overloaded - wait 30s, switch
```

#### Anthropic (Claude)
```
400 (invalid_request_error) → User error - return to user
401 (authentication_error) → Auth error - immediate switch
403 (permission_error) → Permission error - return to user
404 (not_found_error) → Not found - wait 30s, switch
413 (request_too_large) → User error - return to user
429 (rate_limit_error) → Rate limit - wait 30s, switch
500 (api_error) → Server error - wait 30s, switch
529 (overloaded_error) → Overloaded - wait 30s, switch
```

#### OpenRouter
```
400 (bad_request) → User error - return to user
401 (invalid_credentials) → Auth error - immediate switch
402 (insufficient_credits) → Billing error - immediate switch
403 (moderation_flagged) → Permission - return to user
408 (timeout) → Timeout - wait 30s, switch
429 (rate_limited) → Rate limit - wait 30s, switch
502 (model_down) → Model down - wait 30s, switch
503 (no_providers) → No providers - immediate switch
```

#### Chutes AI
```
MODEL_LOADING_FAILED → Transient - wait 30s, switch
INFERENCE_TIMEOUT → Timeout - wait 30s, switch
OUT_OF_MEMORY → Transient - wait 30s, switch
INVALID_INPUT → User error - return to user
MODEL_OVERLOADED → Overloaded - wait 30s, switch
GENERATION_FAILED → Transient - wait 30s, switch
CONTEXT_LENGTH_EXCEEDED → User error - return to user
400 → Bad request - return to user
429 → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch
```
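
Unlike the other providers, Chutes AI is listed with named error codes as well as HTTP statuses, so a status-only classifier would miss them. The sketch below shows one way to map the named codes. It assumes the code arrives as a string on `error.code`, which this plan does not confirm; `CHUTES_CODE_ACTIONS` and `classifyChutesCode` are hypothetical names.

```javascript
// Assumption: Chutes reports its named error code as a string on `error.code`.
const CHUTES_CODE_ACTIONS = {
  MODEL_LOADING_FAILED:    { category: 'transient',  action: 'wait',   waitTime: 30000 },
  INFERENCE_TIMEOUT:       { category: 'timeout',    action: 'wait',   waitTime: 30000 },
  OUT_OF_MEMORY:           { category: 'transient',  action: 'wait',   waitTime: 30000 },
  INVALID_INPUT:           { category: 'userError',  action: 'return', waitTime: 0 },
  MODEL_OVERLOADED:        { category: 'overloaded', action: 'wait',   waitTime: 30000 },
  GENERATION_FAILED:       { category: 'transient',  action: 'wait',   waitTime: 30000 },
  CONTEXT_LENGTH_EXCEEDED: { category: 'userError',  action: 'return', waitTime: 0 },
};

// Returns a classification for a named code, or null so the caller can
// fall back to HTTP status handling.
function classifyChutesCode(error) {
  const code = typeof error?.code === 'string' ? error.code.toUpperCase() : '';
  return Object.prototype.hasOwnProperty.call(CHUTES_CODE_ACTIONS, code)
    ? CHUTES_CODE_ACTIONS[code]
    : null;
}

console.log(classifyChutesCode({ code: 'OUT_OF_MEMORY' }));
```

Returning `null` for anything that is not a named code lets numeric statuses (400, 429, 500+) flow through the generic status-code path unchanged.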

#### NVIDIA NIM
```
401 → Auth error - immediate switch
403 → Permission - return to user
404 → Not found - wait 30s, switch
429 (too_many_requests) → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch
```

#### Together AI
```
400 (invalid_request) → User error - return to user
401 (authentication_error) → Auth error - immediate switch
402 (payment_required) → Billing error - immediate switch
403 (bad_request) → User error - return to user
429 (rate_limit_exceeded) → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch
```

#### Fireworks AI
```
400 → User error - return to user
401 → Auth error - immediate switch
429 → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch
```

#### Mistral
```
400 → Bad request - return to user
401 → Unauthorized - immediate switch
403 → Forbidden - return to user
404 → Not found - wait 30s, switch
429 → Too many requests - wait 30s, switch
500+ → Server error - wait 30s, switch
```

#### Groq
```
400 → Bad request - return to user
401 → Unauthorized - immediate switch
402 → Payment required - immediate switch
403 → Forbidden - return to user
404 → Not found - wait 30s, switch
413 → Payload too large - return to user
429 → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch
```

#### Google (Gemini)
```
400 → Invalid request - return to user
401 → Unauthorized - immediate switch
403 → Permission denied - return to user
404 → Not found - wait 30s, switch
413 → Request too large - return to user
429 (resource_exhausted) → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch
```

---

## Implementation Plan

### Phase 1: Error Classification System

**File: `chat/server.js`**

Create new function `classifyProviderError()`:

```javascript
function classifyProviderError(error, provider) {
  // Extract the HTTP status code; non-numeric codes (e.g. 'ECONNRESET') are ignored here
  const rawCode = error.statusCode || error.code;
  const statusCode = rawCode != null && Number.isInteger(Number(rawCode)) ? Number(rawCode) : undefined;
  const errorMessage = (error.message || '').toLowerCase();

  const providerPatterns = {
    openai: {
      transient: [500, 502, 503, 504, 529],
      rateLimit: 429,
      auth: [401, 402],
      permission: 403,
      userError: [400],
      notFound: 404,
      timeout: 408
    },
    anthropic: {
      transient: [500, 529],
      rateLimit: 429,
      auth: 401,
      permission: 403,
      userError: [400, 413],
      notFound: 404
    },
    openrouter: {
      transient: [502],
      unavailable: [503], // no_providers: switch immediately, per the OpenRouter table
      rateLimit: 429,
      auth: [401, 402],
      permission: 403,
      userError: [400],
      timeout: 408,
      notFound: 404
    },
    chutes: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: 401,
      permission: 403,
      userError: [400, 413],
      notFound: 404
    },
    nvidia: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: 401,
      permission: 403,
      userError: [400],
      notFound: 404
    },
    together: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: [401, 402],
      permission: 403,
      userError: [400],
      notFound: 404
    },
    fireworks: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: 401,
      userError: [400],
      notFound: 404
    },
    mistral: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: 401,
      permission: 403,
      userError: [400],
      notFound: 404
    },
    groq: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: [401, 402],
      permission: 403,
      userError: [400, 413],
      notFound: 404
    },
    google: {
      transient: [500, 502, 503],
      rateLimit: 429,
      auth: 401,
      permission: 403,
      userError: [400, 413],
      notFound: 404
    },
    default: {
      transient: [500, 502, 503, 529],
      rateLimit: 429,
      auth: [401, 402],
      permission: 403,
      userError: [400, 413],
      notFound: 404,
      timeout: 408
    }
  };

  const patterns = providerPatterns[provider] || providerPatterns.default;

  // A pattern may be a single code, a list of codes, or absent for a provider.
  // Normalizing to an array avoids calling .includes() on a number, and the
  // undefined check keeps a missing status code from matching a missing pattern.
  const matches = (pattern) =>
    statusCode !== undefined && [].concat(pattern ?? []).includes(statusCode);

  // Check for tool errors first (shouldn't happen here but just in case)
  if (error.isToolError) {
    return { category: 'toolError', action: 'return', waitTime: 0 };
  }

  // Determine category based on status code
  if (matches(patterns.unavailable)) {
    return { category: 'unavailable', action: 'switch', waitTime: 0 };
  }
  if (matches(patterns.transient)) {
    return { category: 'transient', action: 'wait', waitTime: 30000 };
  }
  if (matches(patterns.rateLimit)) {
    return { category: 'rateLimit', action: 'wait', waitTime: 30000 };
  }
  if (matches(patterns.auth)) {
    return { category: 'auth', action: 'switch', waitTime: 0 };
  }
  if (matches(patterns.permission)) {
    return { category: 'permission', action: 'return', waitTime: 0 };
  }
  if (matches(patterns.userError)) {
    return { category: 'userError', action: 'return', waitTime: 0 };
  }
  if (matches(patterns.timeout)) {
    return { category: 'timeout', action: 'wait', waitTime: 30000 };
  }
  if (matches(patterns.notFound)) {
    // 404 is treated as retryable on another model (as OpenAI does)
    return { category: 'notFound', action: 'wait', waitTime: 30000 };
  }

  // Default to transient for 5xx
  if (statusCode >= 500) {
    return { category: 'serverError', action: 'wait', waitTime: 30000 };
  }

  // Check error message for additional patterns
  if (errorMessage.includes('model not found') || errorMessage.includes('unknown model')) {
    return { category: 'modelNotFound', action: 'wait', waitTime: 30000 };
  }
  if (errorMessage.includes('insufficient credit') || errorMessage.includes('insufficient quota')) {
    return { category: 'billing', action: 'switch', waitTime: 0 };
  }
  if (errorMessage.includes('context length exceeded') || errorMessage.includes('token limit exceeded')) {
    return { category: 'userError', action: 'return', waitTime: 0 };
  }

  // Unknown error - switch immediately
  return { category: 'unknown', action: 'switch', waitTime: 0 };
}
```
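
Provider SDKs and HTTP clients do not agree on where the status code lives, so the extraction step at the top of the classifier is worth isolating. The helper below is a sketch under assumptions: `extractStatusCode` is a hypothetical name, and the field names it probes (`statusCode`, `status`, `response.status`, `code`) are common shapes rather than anything this plan confirms for the actual errors seen in `chat/server.js`.

```javascript
// Hypothetical helper: find an HTTP status on an error of unknown shape.
// The probed field names are assumptions about typical client libraries.
function extractStatusCode(error) {
  const candidates = [
    error?.statusCode,
    error?.status,
    error?.response?.status,
    error?.code,
  ];
  for (const value of candidates) {
    // Accept numeric strings such as "503", reject names such as "ECONNRESET"
    const n = typeof value === 'string' && value.trim() !== '' ? Number(value) : value;
    if (Number.isInteger(n) && n >= 100 && n <= 599) return n;
  }
  return undefined;
}

console.log(extractStatusCode({ code: '503' }));
console.log(extractStatusCode({ code: 'ECONNRESET' }));
```

Returning `undefined` rather than `0` or `NaN` keeps later comparisons such as `statusCode >= 500` safely false for network-level failures.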

### Phase 2: Tool Error Handling

**File: `opencode/packages/opencode/src/session/processor.ts`**

Enhance tool-error handling to distinguish between tool error types:

```typescript
// Add tool error type classification
enum ToolErrorType {
  validation = 'validation',
  permission = 'permission',
  timeout = 'timeout',
  notFound = 'notFound',
  execution = 'execution'
}

function classifyToolError(error: unknown): ToolErrorType {
  const message = String(error).toLowerCase();

  if (message.includes('validation') || message.includes('schema') || message.includes('invalid arguments')) {
    return ToolErrorType.validation;
  }
  if (message.includes('permission') || message.includes('forbidden') || message.includes('denied')) {
    return ToolErrorType.permission;
  }
  if (message.includes('timeout') || message.includes('timed out')) {
    return ToolErrorType.timeout;
  }
  if (message.includes('not found') || message.includes('does not exist')) {
    return ToolErrorType.notFound;
  }
  return ToolErrorType.execution;
}

// In the switch case for "tool-error"
case "tool-error": {
  const match = toolcalls[value.toolCallId];
  if (match && match.state.status === "running") {
    await Session.updatePart({
      ...match,
      state: {
        status: "error",
        input: value.input ?? match.state.input,
        // String() also handles null/undefined, where .toString() would throw
        error: String(value.error),
        errorType: classifyToolError(value.error),
        time: {
          start: match.state.time.start,
          end: Date.now(),
        },
      },
    })

    // Don't trigger fallback for tool errors - let model retry
    // Only trigger fallback for permission rejections
    if (
      value.error instanceof PermissionNext.RejectedError ||
      value.error instanceof Question.RejectedError
    ) {
      blocked = shouldBreak
    }

    // Mark that this was a tool error (not provider error).
    // Guarded because a primitive error value cannot carry extra properties.
    if (typeof value.error === "object" && value.error !== null) {
      (value.error as any).isToolError = true;
    }

    delete toolcalls[value.toolCallId]
  }
  break;
}
```

**File: `chat/server.js`**

Modify `shouldFallbackCliError()` to check for tool errors:

```javascript
function shouldFallbackCliError(err, message) {
  if (!err) return false;

  // Don't fallback on tool errors - let model retry
  if (err.isToolError) {
    log('Tool error detected - no fallback needed', {
      error: err.message,
      toolError: true
    });
    return false;
  }

  // ... rest of existing checks
}
```

### Phase 3: Continue Message System

Enhance `sendToOpencodeWithFallback()` with continue message tracking and provider error handling:

```javascript
async function sendToOpencodeWithFallback({ session, model, content, message, cli, streamCallback, opencodeSessionId, plan }) {
  const cliName = normalizeCli(cli || session?.cli);
  const preferredModel = model || session?.model;
  const chain = buildOpencodeAttemptChain(cliName, preferredModel);
  const tried = new Set();
  const attempts = [];
  let lastError = null;
  let switchedToBackup = false;

  // Track continue attempts per model
  const continueAttempts = new Map();
  const MAX_CONTINUE_ATTEMPTS = 3;
  const CONTINUE_MESSAGE = '[CONTINUE] Please continue from where you left off.';

  // Sentinel returned by tryOption when the same model should be retried
  const RETRY_SAME_MODEL = Symbol('retrySameModel');

  // Track last error type to prevent infinite loops
  const lastErrorTypes = new Map();

  log('Fallback sequence initiated', {
    sessionId: session?.id,
    messageId: message?.id,
    primaryModel: preferredModel,
    cliName,
    chainLength: chain.length,
    timestamp: new Date().toISOString()
  });

  const tryOption = async (option, isBackup = false) => {
    const key = `${option.provider}:${option.model}`;
    if (tried.has(key)) return null;
    tried.add(key);

    const limit = isProviderLimited(option.provider, option.model);
    if (limit.limited) {
      attempts.push({
        model: option.model,
        provider: option.provider,
        error: `limit: ${limit.reason}`,
        classification: 'rateLimit'
      });
      return null;
    }

    try {
      resetMessageStreamingFields(message);

      // Handle continue messages
      let messageContent = content;
      const continueCount = continueAttempts.get(key) || 0;

      if (continueCount > 0 && continueCount <= MAX_CONTINUE_ATTEMPTS) {
        messageContent = `${CONTINUE_MESSAGE}\n\n${content}`;
        log('Sending continue message', {
          model: option.model,
          provider: option.provider,
          attempt: continueCount,
          modelKey: key
        });
      }

      const result = await sendToOpencode({
        session,
        model: option.model,
        content: messageContent,
        message,
        cli: cliName,
        streamCallback,
        opencodeSessionId
      });

      const normalizedResult = (result && typeof result === 'object') ? result : { reply: result };

      // Token usage tracking (existing code)
      let tokensUsed = 0;
      let tokenSource = 'none';
      let tokenExtractionLog = [];

      if (result && typeof result === 'object' && result.tokensUsed > 0) {
        tokensUsed = result.tokensUsed;
        tokenSource = result.tokenSource || 'result';
        tokenExtractionLog = result.tokenExtractionLog || [];
      } else {
        tokensUsed = extractTokenUsageFromResult(normalizedResult, [messageContent], { allowEstimate: false });
        if (tokensUsed > 0) {
          tokenSource = 'response-extracted';
          tokenExtractionLog.push({ method: 'extractTokenUsageFromResult', success: true, value: tokensUsed });
        }
      }

      // Success: reset counters
      continueAttempts.delete(key);
      lastErrorTypes.delete(key);

      recordProviderUsage(option.provider, option.model, tokensUsed, 1);

      if (attempts.length) {
        log('opencode succeeded after fallback', { attempts, model: option.model, provider: option.provider });
      }

      return {
        reply: normalizedResult.reply,
        model: option.model,
        attempts,
        provider: option.provider,
        raw: normalizedResult.raw,
        tokensUsed,
        tokenSource,
        tokenExtractionLog
      };

    } catch (err) {
      lastError = err;

      const errorData = {
        model: option.model,
        provider: option.provider,
        error: err.message || String(err),
        code: err.code || null,
        timestamp: new Date().toISOString()
      };

      // Check for early termination
      if (err.earlyTermination) {
        const partialOutputLength = (message?.partialOutput || '').length;
        const hasSubstantialOutput = partialOutputLength > 500;

        if (hasSubstantialOutput) {
          log('Blocking fallback - model has substantial output despite early termination', {
            model: option.model,
            provider: option.provider,
            error: err.message,
            partialOutputLength
          });
          return err;
        }

        // Increment continue counter
        const currentCount = continueAttempts.get(key) || 0;
        continueAttempts.set(key, currentCount + 1);

        log('Early termination detected', {
          model: option.model,
          provider: option.provider,
          continueAttempt: currentCount + 1,
          maxAttempts: MAX_CONTINUE_ATTEMPTS
        });

        // Retry with same model if under limit
        if (currentCount + 1 < MAX_CONTINUE_ATTEMPTS) {
          errorData.earlyTermination = true;
          errorData.continueAttempt = currentCount + 1;
          errorData.willContinue = true;
          attempts.push(errorData);

          // Remove from tried set and signal the caller to retry this option.
          // Returning null here would advance the chain to the next model instead.
          tried.delete(key);

          return RETRY_SAME_MODEL;
        }

        // Switch to next model after MAX_CONTINUE_ATTEMPTS
        log('Max continue attempts reached, switching model', {
          model: option.model,
          provider: option.provider,
          totalAttempts: MAX_CONTINUE_ATTEMPTS
        });

        attempts.push(errorData);
        return null;
      }

      // Classify provider error
      const classification = classifyProviderError(err, option.provider);
      errorData.classification = classification.category;

      // Track error types to prevent infinite loops
      const lastErrorType = lastErrorTypes.get(key);

      if (lastErrorType === classification.category &&
          classification.category !== 'unknown') {
        // Same error type twice in a row - might be persistent error
        log('Repeated error type detected, may need immediate switch', {
          model: option.model,
          provider: option.provider,
          errorType: classification.category
        });
      }
      // Record on every error; recording only inside the branch above would
      // leave the map empty and the repeat check would never fire
      lastErrorTypes.set(key, classification.category);

      if (classification.action === 'return') {
        // User/permission errors - return to user
        log('User/permission error - returning to user', {
          category: classification.category,
          model: option.model,
          provider: option.provider
        });
        err.willNotFallback = true;
        return err;
      }

      if (classification.action === 'wait') {
        // Transient/rate limit errors - wait before switch
        log(`Provider error (${classification.category}) - waiting ${classification.waitTime}ms`, {
          model: option.model,
          provider: option.provider,
          category: classification.category,
          waitTime: classification.waitTime
        });

        errorData.willWait = true;
        errorData.waitTime = classification.waitTime;
        attempts.push(errorData);

        // Wait before allowing next attempt
        await new Promise(resolve => setTimeout(resolve, classification.waitTime));

        return null;
      }

      // Switch immediately for auth/unknown errors
      errorData.immediateSwitch = true;
      attempts.push(errorData);

      return null;
    }
  };

  // Try each option in a chain. Resolves to a successful result, to an Error that
  // must go back to the user without further fallback, or to null when exhausted.
  const runChain = async (options, isBackup = false) => {
    for (const option of options) {
      let result = await tryOption(option, isBackup);
      // Early termination under the limit: retry the same model with a continue message
      while (result === RETRY_SAME_MODEL) {
        result = await tryOption(option, isBackup);
      }
      if (result) return result;
    }
    return null;
  };

  let outcome = await runChain(chain);

  // Try backup model if configured (skipped when an error must be returned to the user)
  const backupModel = (providerLimits.opencodeBackupModel || '').trim();
  if (!outcome && backupModel) {
    outcome = await runChain(buildOpencodeAttemptChain(cliName, backupModel), true);
  }

  if (outcome instanceof Error) {
    // "Return to user" errors surface as-is instead of "all models failed"
    outcome.attempts = attempts;
    throw outcome;
  }
  if (outcome) return outcome;

  const err = new Error(`All ${cliName.toUpperCase()} models failed`);
  err.attempts = attempts;
  err.cause = lastError;
  throw err;
}
```
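
The continue logic can also be exercised in isolation, which makes the attempt limit easy to verify. The sketch below is a simplified stand-in, not the function above: `sendWithContinue` is a hypothetical name, `send` is an injected async function standing in for `sendToOpencode`, and errors are assumed to carry an `earlyTermination` flag.

```javascript
// Simplified sketch of the continue loop (hypothetical helper).
// `send` stands in for sendToOpencode and is injected so it can be faked.
async function sendWithContinue(send, content, maxAttempts = 3) {
  const continueMessage = '[CONTINUE] Please continue from where you left off.';
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // First attempt sends the content as-is; later attempts prepend the marker
    const payload = attempt === 1 ? content : `${continueMessage}\n\n${content}`;
    try {
      return { ok: true, reply: await send(payload), attempts: attempt };
    } catch (err) {
      // Only early terminations are retried here; provider errors propagate
      if (!err.earlyTermination) throw err;
    }
  }
  // Limit reached: the caller should switch to the next model
  return { ok: false, attempts: maxAttempts };
}

// Usage: a fake sender that stops early twice, then succeeds
let calls = 0;
const flakySend = async () => {
  calls += 1;
  if (calls < 3) throw Object.assign(new Error('stopped'), { earlyTermination: true });
  return 'done';
};
sendWithContinue(flakySend, 'hello').then((result) => console.log(result));
```

With a limit of 3, the same model gets the original request plus at most two continue messages before the caller moves on.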

---

## Summary of Files Modified

1. **`MODEL_FALLBACK_IMPROVEMENT_PLAN.md`** (NEW) - This comprehensive plan document
2. **`chat/server.js`** - Major modifications:
   - Add `classifyProviderError()` function (~120 lines)
   - Modify `sendToOpencodeWithFallback()` with continue message logic
   - Update `shouldFallbackCliError()` to handle tool errors

3. **`opencode/packages/opencode/src/session/processor.ts`** - Minor modifications:
   - Add `classifyToolError()` function
   - Enhance tool-error case with error type classification

---

## Testing Checklist

- [ ] Tool errors don't trigger fallback
- [ ] Early termination sends continue messages (max 3 attempts)
- [ ] Provider errors with >=500 status wait 30s before switch
- [ ] Rate limit errors (429) wait 30s before switch
- [ ] Auth errors (401, 402) switch immediately
- [ ] Permission errors (403) return to user without switch
- [ ] User errors (400, 413) return to user without switch
- [ ] Continue attempts reset on successful response
- [ ] Fallback chain respects continue attempts per model
- [ ] Logging captures all error classifications and actions
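
The wait-related items are slow to test if the code calls `setTimeout` directly. One option, sketched here under assumptions, is to inject the sleep function so tests can substitute a fake. `waitThenDecide` is a hypothetical helper; it takes the `{ action, waitTime }` shape that `classifyProviderError()` returns in this plan.

```javascript
// Hypothetical helper with an injectable sleep, so the 30s wait can be faked in tests.
const realSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitThenDecide(classification, sleep = realSleep) {
  if (classification.action === 'wait') {
    await sleep(classification.waitTime);
  }
  // 'return' goes back to the user; 'wait' and 'switch' both move to the next model
  return classification.action === 'return' ? 'returnToUser' : 'switchModel';
}

// Usage in a test: record the requested delay instead of sleeping
const requested = [];
const fakeSleep = async (ms) => { requested.push(ms); };
waitThenDecide({ action: 'wait', waitTime: 30000 }, fakeSleep)
  .then((decision) => console.log(decision, requested));
```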

---

## Monitoring & Analytics

Add tracking for:
1. Error type distribution
2. Continue message frequency
3. Provider error wait times
4. Model switch patterns
5. Tool error vs provider error ratios

Export to monitoring system for analysis and optimization.
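
As a starting point, the five items above could be collected with an in-memory counter before any monitoring backend is chosen. This is a minimal sketch with hypothetical names (`createFallbackMetrics` and its methods); how the snapshot is exported is left open, since the plan does not name a monitoring system.

```javascript
// Hypothetical in-memory collector for the tracked items (all names are assumptions).
function createFallbackMetrics() {
  const counts = new Map();
  let totalWaitMs = 0;
  const bump = (name) => counts.set(name, (counts.get(name) || 0) + 1);

  return {
    // Error type distribution, plus tool vs provider ratio via the category name
    recordError(category, provider) {
      bump(`error:${category}`);
      bump(`provider:${provider}`);
    },
    recordContinue() { bump('continue'); },
    recordWait(ms) { bump('wait'); totalWaitMs += ms; },
    recordSwitch(from, to) { bump(`switch:${from}->${to}`); },
    // Plain object suitable for logging or export
    snapshot() {
      return { counts: Object.fromEntries(counts), totalWaitMs };
    },
  };
}

const metrics = createFallbackMetrics();
metrics.recordError('rateLimit', 'openai');
metrics.recordWait(30000);
metrics.recordSwitch('openai:model-a', 'anthropic:model-b');
console.log(metrics.snapshot());
```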