Add comprehensive model fallback improvement plan
- Detailed error classification system for tool errors, early termination, and provider errors
- Provider-specific error mappings for 10 LLM providers (OpenAI, Anthropic, OpenRouter, Chutes, NVIDIA, Together, Fireworks, Mistral, Groq, Google)
- Continue message system with 3-attempt limit before model switch
- 30-second wait for transient/rate limit errors before switching
- Distinguishes tool errors (return to user) from provider errors (switch model)
- Implementation plan with code examples for server.js and processor.ts
744
MODEL_FALLBACK_IMPROVEMENT_PLAN.md
Normal file
@@ -0,0 +1,744 @@
# Model Fallback & Continue Functionality Improvement Plan

## Executive Summary

This plan outlines improvements to the model fallback system to handle three distinct error categories:
1. **Bad tool calls** - Send error data back to user/model for retry
2. **Request stops** - Send continue message, retry 3x, then switch model
3. **Provider errors** - Wait 30 seconds, then switch to next model in fallback chain

---

## Current Implementation Analysis

### Existing Components

| File | Function | Current Behavior |
|------|-----------|-----------------|
| `chat/server.js:9347` | `shouldFallbackCliError()` | Decides if error warrants fallback based on patterns |
| `chat/server.js:9502` | `sendToOpencodeWithFallback()` | Main fallback orchestration with model chain |
| `chat/server.js:9296` | `buildOpencodeAttemptChain()` | Builds ordered provider/model chain |
| `opencode/session/retry.ts` | `SessionRetry.delay()` | Calculates retry delays with exponential backoff |
| `opencode/session/message-v2.ts:714` | `isOpenAiErrorRetryable()` | Determines if OpenAI errors are retryable |

### Current Gaps

1. **No explicit continue mechanism** - Early terminations counted but no "continue" message system
2. **Tool errors not distinguished** - Tool errors treated same as provider errors
3. **No 30-second wait for provider errors** - Immediate fallback on provider issues
4. **Missing provider-specific error mappings** - Generic patterns only

---

## Proposed Architecture

### Error Classification Flow

```
Error Occurs
    │
    ├── Tool Call Error?
    │   ├── Yes → Check tool error type
    │   │   ├── Validation/Schema error → Send error back, continue with same model
    │   │   ├── Permission denied → Send error back, continue with same model
    │   │   ├── Execution failure → Send error back, continue with same model
    │   │   └── Tool timeout → Send error back, continue with same model
    │
    ├── Early Termination?
    │   ├── Yes → Increment termination counter
    │   │   ├── Count < 3? → Send "continue" message, retry same model
    │   │   └── Count >= 3? → Switch to next model
    │
    └── Provider Error?
        ├── Yes → Classify error type
        │   ├── Transient (500, 502, 503, timeout) → Wait 30s, switch model
        │   ├── Rate limit (429) → Wait 30s, switch model
        │   ├── Auth/Billing (401, 402) → Switch model immediately
        │   ├── Permission (403) → Send error back, don't switch
        │   ├── User error (400, 413) → Send error back, don't switch
        │   └── Other (404, etc.) → Wait 30s, switch model
```

---

## Provider-Specific Error Mappings

### Error Categories & Actions

| Category | Action | Wait Time | Switch Model? |
|----------|--------|-----------|---------------|
| **Tool Error (validation/schema)** | Return to user | 0s | No |
| **Tool Error (execution/permission)** | Return to user | 0s | No |
| **Early Termination** | Send continue | 0s | After 3 attempts |
| **Transient Server Error (5xx)** | Wait | 30s | Yes |
| **Rate Limit (429)** | Wait | 30s | Yes |
| **Auth/Billing (401, 402)** | Switch immediately | 0s | Yes |
| **Permission (403)** | Return to user | 0s | No |
| **Not Found (404)** | Wait | 30s | Yes |
| **User Error (400, 413)** | Return to user | 0s | No |
| **Timeout (408)** | Wait | 30s | Yes |
| **Overloaded (529)** | Wait | 30s | Yes |
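
The category table can also be expressed as data, so that the wait time and the switch decision live in one place. The following is only a sketch under assumptions: the names `ERROR_ACTIONS` and `actionFor` are hypothetical and do not exist in `chat/server.js`.

```javascript
// Hypothetical lookup table mirroring the "Error Categories & Actions" table.
// Early termination switches only after 3 attempts; that counter lives elsewhere.
const ERROR_ACTIONS = new Map([
  ['toolError',        { action: 'return',   waitMs: 0,     switchModel: false }],
  ['earlyTermination', { action: 'continue', waitMs: 0,     switchModel: false }],
  ['transient',        { action: 'wait',     waitMs: 30000, switchModel: true }],
  ['rateLimit',        { action: 'wait',     waitMs: 30000, switchModel: true }],
  ['auth',             { action: 'switch',   waitMs: 0,     switchModel: true }],
  ['permission',       { action: 'return',   waitMs: 0,     switchModel: false }],
  ['notFound',         { action: 'wait',     waitMs: 30000, switchModel: true }],
  ['userError',        { action: 'return',   waitMs: 0,     switchModel: false }],
  ['timeout',          { action: 'wait',     waitMs: 30000, switchModel: true }],
  ['overloaded',       { action: 'wait',     waitMs: 30000, switchModel: true }],
]);

// Assumed default: categories missing from the table switch immediately.
const DEFAULT_ACTION = { action: 'switch', waitMs: 0, switchModel: true };

function actionFor(category) {
  return ERROR_ACTIONS.get(category) ?? DEFAULT_ACTION;
}
```

Using a `Map` rather than a plain object avoids accidental hits on inherited keys such as `toString`.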

### Detailed Provider Error Codes

#### OpenAI
```
400 (invalid_request_error) → User error - return to user
401 (authentication_error)  → Auth error - immediate switch
402 (payment_required)      → Billing error - immediate switch
403 (permission_error)      → Permission error - return to user
404 (not_found_error)       → OpenAI treats as retryable - wait 30s, switch
408 (timeout)               → Timeout - wait 30s, switch
429 (rate_limit_error)      → Rate limit - wait 30s, switch
500 (api_error)             → Server error - wait 30s, switch
529                         → Overloaded - wait 30s, switch
```

#### Anthropic (Claude)
```
400 (invalid_request_error) → User error - return to user
401 (authentication_error)  → Auth error - immediate switch
403 (permission_error)      → Permission error - return to user
404 (not_found_error)       → Not found - wait 30s, switch
413 (request_too_large)     → User error - return to user
429 (rate_limit_error)      → Rate limit - wait 30s, switch
500 (api_error)             → Server error - wait 30s, switch
529 (overloaded_error)      → Overloaded - wait 30s, switch
```

#### OpenRouter
```
400 (bad_request)          → User error - return to user
401 (invalid_credentials)  → Auth error - immediate switch
402 (insufficient_credits) → Billing error - immediate switch
403 (moderation_flagged)   → Permission - return to user
408 (timeout)              → Timeout - wait 30s, switch
429 (rate_limited)         → Rate limit - wait 30s, switch
502 (model_down)           → Model down - wait 30s, switch
503 (no_providers)         → No providers - immediate switch
```

#### Chutes AI
```
MODEL_LOADING_FAILED    → Transient - wait 30s, switch
INFERENCE_TIMEOUT       → Timeout - wait 30s, switch
OUT_OF_MEMORY           → Transient - wait 30s, switch
INVALID_INPUT           → User error - return to user
MODEL_OVERLOADED        → Overloaded - wait 30s, switch
GENERATION_FAILED       → Transient - wait 30s, switch
CONTEXT_LENGTH_EXCEEDED → User error - return to user
400                     → Bad request - return to user
429                     → Rate limit - wait 30s, switch
500+                    → Server error - wait 30s, switch
```
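
The Chutes AI mapping includes string error codes, which a classifier keyed only on HTTP status will not see. One way to handle them is a separate lookup that runs before the status-based checks. This is a sketch only: `classifyChutesCode` is a hypothetical name, and the assumption that the code arrives in `error.code` or `error.errorCode` has not been checked against real Chutes responses.

```javascript
// String codes and actions taken from the Chutes AI mapping in this plan.
const CHUTES_CODE_ACTIONS = new Map([
  ['MODEL_LOADING_FAILED', 'wait'],
  ['INFERENCE_TIMEOUT', 'wait'],
  ['OUT_OF_MEMORY', 'wait'],
  ['INVALID_INPUT', 'return'],
  ['MODEL_OVERLOADED', 'wait'],
  ['GENERATION_FAILED', 'wait'],
  ['CONTEXT_LENGTH_EXCEEDED', 'return'],
]);

// Returns a classification for a known string code, or null so the caller
// can fall through to HTTP-status handling.
function classifyChutesCode(error) {
  // Assumed field names; adjust to wherever the provider actually puts the code.
  const code = String(error?.code ?? error?.errorCode ?? '').toUpperCase();
  const action = CHUTES_CODE_ACTIONS.get(code);
  if (!action) return null;
  return { category: code, action, waitTime: action === 'wait' ? 30000 : 0 };
}
```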

#### NVIDIA NIM
```
401                     → Auth error - immediate switch
403                     → Permission - return to user
404                     → Not found - wait 30s, switch
429 (too_many_requests) → Rate limit - wait 30s, switch
500+                    → Server error - wait 30s, switch
```

#### Together AI
```
400 (invalid_request)      → User error - return to user
401 (authentication_error) → Auth error - immediate switch
402 (payment_required)     → Billing error - immediate switch
403 (bad_request)          → User error - return to user
429 (rate_limit_exceeded)  → Rate limit - wait 30s, switch
500+                       → Server error - wait 30s, switch
```

#### Fireworks AI
```
400  → User error - return to user
401  → Auth error - immediate switch
429  → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch
```

#### Mistral
```
400  → Bad request - return to user
401  → Unauthorized - immediate switch
403  → Forbidden - return to user
404  → Not found - wait 30s, switch
429  → Too many requests - wait 30s, switch
500+ → Server error - wait 30s, switch
```

#### Groq
```
400  → Bad request - return to user
401  → Unauthorized - immediate switch
402  → Payment required - immediate switch
403  → Forbidden - return to user
404  → Not found - wait 30s, switch
413  → Payload too large - return to user
429  → Rate limit - wait 30s, switch
500+ → Server error - wait 30s, switch
```

#### Google (Gemini)
```
400                      → Invalid request - return to user
401                      → Unauthorized - immediate switch
403                      → Permission denied - return to user
404                      → Not found - wait 30s, switch
413                      → Request too large - return to user
429 (resource_exhausted) → Rate limit - wait 30s, switch
500+                     → Server error - wait 30s, switch
```

---

## Implementation Plan

### Phase 1: Error Classification System

**File: `chat/server.js`**

Create new function `classifyProviderError()`:

```javascript
function classifyProviderError(error, provider) {
  // Extract HTTP status code. Non-numeric codes (e.g. 'ECONNRESET') become NaN
  // and fall through to the message-based checks below.
  const statusCode = Number(error.statusCode || error.code);
  const errorMessage = (error.message || '').toLowerCase();

  // Every category is an array so the lookups below can use includes() uniformly.
  const providerPatterns = {
    openai: {
      transient: [500, 502, 503, 504, 529],
      rateLimit: [429],
      auth: [401, 402],
      permission: [403],
      userError: [400],
      notFound: [404],
      timeout: [408]
    },
    anthropic: {
      transient: [500, 529],
      rateLimit: [429],
      auth: [401],
      permission: [403],
      userError: [400, 413],
      notFound: [404]
    },
    openrouter: {
      transient: [502, 503],
      rateLimit: [429],
      auth: [401, 402],
      permission: [403],
      userError: [400],
      timeout: [408],
      notFound: [404]
    },
    chutes: {
      transient: [500, 502, 503],
      rateLimit: [429],
      auth: [401],
      permission: [403],
      userError: [400, 413],
      notFound: [404]
    },
    nvidia: {
      transient: [500, 502, 503],
      rateLimit: [429],
      auth: [401],
      permission: [403],
      userError: [400],
      notFound: [404]
    },
    together: {
      transient: [500, 502, 503],
      rateLimit: [429],
      auth: [401, 402],
      permission: [403],
      userError: [400],
      notFound: [404]
    },
    fireworks: {
      transient: [500, 502, 503],
      rateLimit: [429],
      auth: [401],
      userError: [400],
      notFound: [404]
    },
    mistral: {
      transient: [500, 502, 503],
      rateLimit: [429],
      auth: [401],
      permission: [403],
      userError: [400],
      notFound: [404]
    },
    groq: {
      transient: [500, 502, 503],
      rateLimit: [429],
      auth: [401, 402],
      permission: [403],
      userError: [400, 413],
      notFound: [404]
    },
    google: {
      transient: [500, 502, 503],
      rateLimit: [429],
      auth: [401],
      permission: [403],
      userError: [400, 413],
      notFound: [404]
    },
    default: {
      transient: [500, 502, 503, 529],
      rateLimit: [429],
      auth: [401, 402],
      permission: [403],
      userError: [400, 413],
      notFound: [404]
    }
  };

  const patterns = providerPatterns[provider] || providerPatterns.default;

  // True when the status code is listed under the given category for this provider
  const matches = (category) => patterns[category]?.includes(statusCode) ?? false;

  // Check for tool errors first (shouldn't happen here but just in case)
  if (error.isToolError) {
    return { category: 'toolError', action: 'return', waitTime: 0 };
  }

  // Determine category based on status code
  if (matches('transient')) {
    return { category: 'transient', action: 'wait', waitTime: 30000 };
  }
  if (matches('rateLimit')) {
    return { category: 'rateLimit', action: 'wait', waitTime: 30000 };
  }
  if (matches('auth')) {
    return { category: 'auth', action: 'switch', waitTime: 0 };
  }
  if (matches('permission')) {
    return { category: 'permission', action: 'return', waitTime: 0 };
  }
  if (matches('userError')) {
    return { category: 'userError', action: 'return', waitTime: 0 };
  }
  if (matches('timeout')) {
    return { category: 'timeout', action: 'wait', waitTime: 30000 };
  }
  if (matches('notFound')) {
    // Special case: OpenAI treats 404 as retryable
    return { category: 'notFound', action: 'wait', waitTime: 30000 };
  }

  // Default to transient for 5xx
  if (statusCode >= 500) {
    return { category: 'serverError', action: 'wait', waitTime: 30000 };
  }

  // Check error message for additional patterns
  if (errorMessage.includes('model not found') || errorMessage.includes('unknown model')) {
    return { category: 'modelNotFound', action: 'wait', waitTime: 30000 };
  }
  if (errorMessage.includes('insufficient credit') || errorMessage.includes('insufficient quota')) {
    return { category: 'billing', action: 'switch', waitTime: 0 };
  }
  if (errorMessage.includes('context length exceeded') || errorMessage.includes('token limit exceeded')) {
    return { category: 'userError', action: 'return', waitTime: 0 };
  }

  // Unknown error - switch immediately
  return { category: 'unknown', action: 'switch', waitTime: 0 };
}
```
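
`error.code` is not always an HTTP status: Node.js network failures carry string codes such as `ECONNRESET`, and some clients expose the status under a different field or as a numeric string. A more defensive normalizer could run before classification. In this sketch, `extractStatusCode` is a hypothetical helper and the probed field names (`status`, `response.status`) are assumptions about what upstream errors may look like, not confirmed shapes.

```javascript
// Returns a plausible HTTP status (100-599) from common error shapes, or null.
function extractStatusCode(error) {
  const candidates = [
    error?.statusCode,
    error?.status,           // assumed: some HTTP clients use `status`
    error?.response?.status, // assumed: nested response object
    error?.code,
  ];
  for (const value of candidates) {
    // Accept numbers and exact three-digit strings; reject 'ECONNRESET' and the like.
    const n = typeof value === 'string' && /^\d{3}$/.test(value) ? Number(value) : value;
    if (Number.isInteger(n) && n >= 100 && n <= 599) return n;
  }
  return null;
}
```

Returning `null` instead of a string code keeps numeric comparisons such as `statusCode >= 500` from silently coercing unrelated values.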

### Phase 2: Tool Error Handling

**File: `opencode/packages/opencode/src/session/processor.ts`**

Enhance tool-error handling to distinguish between tool error types:

```typescript
// Add tool error type classification
enum ToolErrorType {
  validation = 'validation',
  permission = 'permission',
  timeout = 'timeout',
  notFound = 'notFound',
  execution = 'execution'
}

// Classifies a tool error by keywords in its message; defaults to execution
function classifyToolError(error: unknown): ToolErrorType {
  const message = String(error).toLowerCase();

  if (message.includes('validation') || message.includes('schema') || message.includes('invalid arguments')) {
    return ToolErrorType.validation;
  }
  if (message.includes('permission') || message.includes('forbidden') || message.includes('denied')) {
    return ToolErrorType.permission;
  }
  if (message.includes('timeout') || message.includes('timed out')) {
    return ToolErrorType.timeout;
  }
  if (message.includes('not found') || message.includes('does not exist')) {
    return ToolErrorType.notFound;
  }
  return ToolErrorType.execution;
}

// In the switch case for "tool-error" (excerpt from the existing stream handler)
case "tool-error": {
  const match = toolcalls[value.toolCallId];
  if (match && match.state.status === "running") {
    await Session.updatePart({
      ...match,
      state: {
        status: "error",
        input: value.input ?? match.state.input,
        // String() also handles null/undefined errors, unlike .toString()
        error: String(value.error),
        errorType: classifyToolError(value.error),
        time: {
          start: match.state.time.start,
          end: Date.now(),
        },
      },
    });

    // Don't trigger fallback for tool errors - let model retry
    // Only trigger fallback for permission rejections
    if (
      value.error instanceof PermissionNext.RejectedError ||
      value.error instanceof Question.RejectedError
    ) {
      blocked = shouldBreak;
    }

    // Mark that this was a tool error (not provider error).
    // Guarded because a property cannot be set on a primitive error value.
    if (value.error && typeof value.error === "object") {
      (value.error as any).isToolError = true;
    }

    delete toolcalls[value.toolCallId];
  }
  break;
}
```

**File: `chat/server.js`**

Modify `shouldFallbackCliError()` to check for tool errors:

```javascript
function shouldFallbackCliError(err, message) {
  if (!err) return false;

  // Don't fallback on tool errors - let model retry
  if (err.isToolError) {
    log('Tool error detected - no fallback needed', {
      error: err.message,
      toolError: true
    });
    return false;
  }

  // ... rest of existing checks
}
```

### Phase 3: Continue Message System

Enhance `sendToOpencodeWithFallback()` with continue message tracking and provider error handling:

```javascript
async function sendToOpencodeWithFallback({ session, model, content, message, cli, streamCallback, opencodeSessionId, plan }) {
  const cliName = normalizeCli(cli || session?.cli);
  const preferredModel = model || session?.model;
  const chain = buildOpencodeAttemptChain(cliName, preferredModel);
  const tried = new Set();
  const attempts = [];
  let lastError = null;
  let switchedToBackup = false;

  // Track continue attempts per model
  const continueAttempts = new Map();
  const MAX_CONTINUE_ATTEMPTS = 3;
  const CONTINUE_MESSAGE = '[CONTINUE] Please continue from where you left off.';
  // Sentinel returned by tryOption() when the same model should be retried
  const RETRY_SAME_MODEL = Symbol('retrySameModel');

  // Track last error type per model to spot persistent errors
  const lastErrorTypes = new Map();

  log('Fallback sequence initiated', {
    sessionId: session?.id,
    messageId: message?.id,
    primaryModel: preferredModel,
    cliName,
    chainLength: chain.length,
    timestamp: new Date().toISOString()
  });

  // Returns a result object on success, an Error to hand back to the caller,
  // RETRY_SAME_MODEL to retry this option, or null to move to the next option.
  const tryOption = async (option, isBackup = false) => {
    const key = `${option.provider}:${option.model}`;
    if (tried.has(key)) return null;
    tried.add(key);

    const limit = isProviderLimited(option.provider, option.model);
    if (limit.limited) {
      attempts.push({
        model: option.model,
        provider: option.provider,
        error: `limit: ${limit.reason}`,
        classification: 'rateLimit'
      });
      return null;
    }

    try {
      resetMessageStreamingFields(message);

      // Handle continue messages
      let messageContent = content;
      const continueCount = continueAttempts.get(key) || 0;

      if (continueCount > 0 && continueCount <= MAX_CONTINUE_ATTEMPTS) {
        messageContent = `${CONTINUE_MESSAGE}\n\n${content}`;
        log('Sending continue message', {
          model: option.model,
          provider: option.provider,
          attempt: continueCount,
          modelKey: key
        });
      }

      const result = await sendToOpencode({
        session,
        model: option.model,
        content: messageContent,
        message,
        cli: cliName,
        streamCallback,
        opencodeSessionId
      });

      const normalizedResult = (result && typeof result === 'object') ? result : { reply: result };

      // Token usage tracking (existing code)
      let tokensUsed = 0;
      let tokenSource = 'none';
      let tokenExtractionLog = [];

      if (result && typeof result === 'object' && result.tokensUsed > 0) {
        tokensUsed = result.tokensUsed;
        tokenSource = result.tokenSource || 'result';
        tokenExtractionLog = result.tokenExtractionLog || [];
      } else {
        tokensUsed = extractTokenUsageFromResult(normalizedResult, [messageContent], { allowEstimate: false });
        if (tokensUsed > 0) {
          tokenSource = 'response-extracted';
          tokenExtractionLog.push({ method: 'extractTokenUsageFromResult', success: true, value: tokensUsed });
        }
      }

      // Success: reset counters
      continueAttempts.delete(key);
      lastErrorTypes.delete(key);

      recordProviderUsage(option.provider, option.model, tokensUsed, 1);

      if (attempts.length) {
        log('opencode succeeded after fallback', { attempts, model: option.model, provider: option.provider });
      }

      return {
        reply: normalizedResult.reply,
        model: option.model,
        attempts,
        provider: option.provider,
        raw: normalizedResult.raw,
        tokensUsed,
        tokenSource,
        tokenExtractionLog
      };

    } catch (err) {
      lastError = err;

      const errorData = {
        model: option.model,
        provider: option.provider,
        error: err.message || String(err),
        code: err.code || null,
        timestamp: new Date().toISOString()
      };

      // Check for early termination
      if (err.earlyTermination) {
        const partialOutputLength = (message?.partialOutput || '').length;
        const hasSubstantialOutput = partialOutputLength > 500;

        if (hasSubstantialOutput) {
          log('Blocking fallback - model has substantial output despite early termination', {
            model: option.model,
            provider: option.provider,
            error: err.message,
            partialOutputLength
          });
          return err;
        }

        // Increment continue counter
        const currentCount = continueAttempts.get(key) || 0;
        continueAttempts.set(key, currentCount + 1);

        log('Early termination detected', {
          model: option.model,
          provider: option.provider,
          continueAttempt: currentCount + 1,
          maxAttempts: MAX_CONTINUE_ATTEMPTS
        });

        // Retry with same model if under limit
        if (currentCount + 1 < MAX_CONTINUE_ATTEMPTS) {
          errorData.earlyTermination = true;
          errorData.continueAttempt = currentCount + 1;
          errorData.willContinue = true;
          attempts.push(errorData);

          // Remove from tried set to allow retry with same option
          tried.delete(key);

          return RETRY_SAME_MODEL;
        }

        // Switch to next model after MAX_CONTINUE_ATTEMPTS
        log('Max continue attempts reached, switching model', {
          model: option.model,
          provider: option.provider,
          totalAttempts: MAX_CONTINUE_ATTEMPTS
        });

        attempts.push(errorData);
        return null;
      }

      // Classify provider error
      const classification = classifyProviderError(err, option.provider);
      errorData.classification = classification.category;

      // Track error types to detect a persistent error on the same model
      const lastErrorType = lastErrorTypes.get(key);

      if (lastErrorType === classification.category &&
          classification.category !== 'unknown') {
        // Same error type twice in a row - might be persistent error
        log('Repeated error type detected, may need immediate switch', {
          model: option.model,
          provider: option.provider,
          errorType: classification.category
        });
      }
      // Always record the latest category so the next failure can be compared
      lastErrorTypes.set(key, classification.category);

      if (classification.action === 'return') {
        // User/permission errors - return to user
        log('User/permission error - returning to user', {
          category: classification.category,
          model: option.model,
          provider: option.provider
        });
        err.willNotFallback = true;
        return err;
      }

      if (classification.action === 'wait') {
        // Transient/rate limit errors - wait before switch
        log(`Provider error (${classification.category}) - waiting ${classification.waitTime}ms`, {
          model: option.model,
          provider: option.provider,
          category: classification.category,
          waitTime: classification.waitTime
        });

        errorData.willWait = true;
        errorData.waitTime = classification.waitTime;
        attempts.push(errorData);

        // Wait before allowing next attempt
        await new Promise(resolve => setTimeout(resolve, classification.waitTime));

        return null;
      }

      // Switch immediately for auth/unknown errors
      errorData.immediateSwitch = true;
      attempts.push(errorData);

      return null;
    }
  };

  // Try each option in order, retrying the same option while it requests a continue
  const runChain = async (options, isBackup = false) => {
    for (const option of options) {
      let result;
      do {
        result = await tryOption(option, isBackup);
      } while (result === RETRY_SAME_MODEL);
      // A returned Error goes back to the caller without trying further models
      if (result instanceof Error) throw result;
      if (result) return result;
    }
    return null;
  };

  // Try each option in chain
  const primaryResult = await runChain(chain);
  if (primaryResult) return primaryResult;

  // Try backup model if configured
  const backupModel = (providerLimits.opencodeBackupModel || '').trim();
  if (backupModel) {
    const backupChain = buildOpencodeAttemptChain(cliName, backupModel);
    const backupResult = await runChain(backupChain, true);
    if (backupResult) return backupResult;
  }

  const err = new Error(`All ${cliName.toUpperCase()} models failed`);
  err.attempts = attempts;
  err.cause = lastError;
  throw err;
}
```
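
The early-termination branch reduces to a pure decision that can be unit-tested without calling a provider. A minimal sketch, assuming a hypothetical helper name `decideAfterEarlyTermination` and reusing the thresholds from the code above (500 characters of partial output, a limit of 3):

```javascript
// Decides what to do after an early termination.
//   previousCount       - terminations already recorded for this model
//   partialOutputLength - characters of output produced before the stop
function decideAfterEarlyTermination({
  previousCount,
  partialOutputLength,
  maxAttempts = 3,
  substantialChars = 500,
}) {
  // Substantial output: keep what was produced, no retry and no switch.
  if (partialOutputLength > substantialChars) {
    return { step: 'keepPartial', count: previousCount };
  }
  const count = previousCount + 1;
  // Below the limit: send a continue message to the same model.
  // At the limit: give up on this model and switch.
  return { step: count < maxAttempts ? 'continue' : 'switch', count };
}
```

With the default limit of 3, the first and second terminations yield `continue` and the third yields `switch`, matching the `Count < 3` / `Count >= 3` branches of the classification flow.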

---

## Summary of Files Modified

1. **`MODEL_FALLBACK_IMPROVEMENT_PLAN.md`** (NEW) - This comprehensive plan document
2. **`chat/server.js`** - Major modifications:
   - Add `classifyProviderError()` function (~120 lines)
   - Modify `sendToOpencodeWithFallback()` with continue message logic
   - Update `shouldFallbackCliError()` to handle tool errors

3. **`opencode/packages/opencode/src/session/processor.ts`** - Minor modifications:
   - Add `classifyToolError()` function
   - Enhance tool-error case with error type classification

---

## Testing Checklist

- [ ] Tool errors don't trigger fallback
- [ ] Early termination sends continue messages (max 3 attempts)
- [ ] Provider errors with >=500 status wait 30s before switch
- [ ] Rate limit errors (429) wait 30s before switch
- [ ] Auth errors (401, 402) switch immediately
- [ ] Permission errors (403) return to user without switch
- [ ] User errors (400, 413) return to user without switch
- [ ] Continue attempts reset on successful response
- [ ] Fallback chain respects continue attempts per model
- [ ] Logging captures all error classifications and actions

---

## Monitoring & Analytics

Add tracking for:
1. Error type distribution
2. Continue message frequency
3. Provider error wait times
4. Model switch patterns
5. Tool error vs provider error ratios

Export to monitoring system for analysis and optimization.
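
The five tracked quantities can be collected with a small in-memory recorder before any export format is chosen. This is a sketch only: `createFallbackMetrics` and its method names are hypothetical, and the export step is deliberately left out because the plan does not name a monitoring system.

```javascript
// In-memory counters for the metrics listed above.
function createFallbackMetrics() {
  const errorsByCategory = new Map(); // category -> count
  const switches = new Map();         // "from -> to" -> count
  let continueMessages = 0;
  let totalWaitMs = 0;

  const bump = (map, key) => map.set(key, (map.get(key) || 0) + 1);

  return {
    recordError(category) { bump(errorsByCategory, category); },
    recordContinue() { continueMessages += 1; },
    recordWait(ms) { totalWaitMs += ms; },
    recordSwitch(from, to) { bump(switches, `${from} -> ${to}`); },

    // Plain-object view suitable for logging or later export.
    snapshot() {
      const toolErrors = errorsByCategory.get('toolError') || 0;
      let total = 0;
      for (const n of errorsByCategory.values()) total += n;
      const providerErrors = total - toolErrors;
      return {
        errorsByCategory: Object.fromEntries(errorsByCategory),
        continueMessages,
        totalWaitMs,
        switches: Object.fromEntries(switches),
        // null when there are no provider errors to divide by
        toolToProviderRatio: providerErrors === 0 ? null : toolErrors / providerErrors,
      };
    },
  };
}
```

A call such as `metrics.recordError(classification.category)` could sit next to each `attempts.push(errorData)` in the fallback loop; where exactly to hook it in is left open here.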