docs: Add comprehensive plan for Chutes AI tool call counting issue
Documents the issue where tool calls count as separate Chutes AI requests, proposed solutions, technical analysis, and user concerns about breaking sequential workflows.

Includes:
- Root cause analysis of Vercel AI SDK multi-step execution
- 4 proposed solution options with pros/cons
- User concerns about model context and workflow breaks
- Code references and technical diagrams
- Recommended next steps for testing and implementation

Relates to: Tool call execution flow in session management

# Chutes AI Tool Call Counting Issue - Analysis and Plan

**Status:** Under Investigation
**Date:** February 11, 2026
**Related Files:**

- `opencode/packages/opencode/src/session/llm.ts` (lines 87-167)
- `opencode/packages/opencode/src/session/processor.ts`
- `opencode/packages/opencode/src/session/prompt.ts` (lines 716-745, 755-837)

---

## Executive Summary

When using Chutes AI with opencode, each tool call round-trip is billed as a separate API request, causing excessive costs. The cause is the Vercel AI SDK's `streamText` function, which automatically handles multi-step tool execution.

---

## The Problem

### Current Behavior (Undesired)

The Vercel AI SDK's `streamText` function automatically executes tools and sends results back to the model in **multiple HTTP requests**:

```
Step 1: Initial Request
- Model receives prompt + available tools
- Model returns: "read file X"
- SDK executes tool locally

Step 2: Automatic Follow-up Request (NEW API CALL!)
- SDK sends tool results back to model
- Model returns: "edit file X with changes"
- SDK executes tool locally

Step 3: Automatic Follow-up Request (NEW API CALL!)
- SDK sends tool results back to model
- Model returns final response
- Stream ends
```

**Result:** Each step after the first counts as a separate Chutes AI API request, multiplying costs.

### Root Cause

In `llm.ts` lines 87-167, `streamText` is called with tools that include `execute` functions:

```typescript
// prompt.ts lines 716-745
const result = await item.execute(args, ctx) // <-- Tools have execute functions
```

When tools have `execute` functions, the SDK automatically:

1. Executes the tools when the model requests them
2. Sends the results back to the model in a **new API request**
3. Continues this process for multiple steps (see the request-counting sketch below)

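To make these extra round-trips visible, the HTTP layer can be instrumented directly. Below is a minimal sketch, assuming Chutes exposes an OpenAI-compatible endpoint and that the provider accepts a custom `fetch` (as AI SDK providers generally do); the base URL, model id, and tool are placeholders, not opencode's real configuration:

```typescript
import { streamText, tool, jsonSchema } from "ai"
import { createOpenAICompatible } from "@ai-sdk/openai-compatible"

let requestCount = 0

const chutes = createOpenAICompatible({
  name: "chutes",
  baseURL: "https://llm.chutes.ai/v1", // assumed endpoint
  apiKey: process.env.CHUTES_API_KEY ?? "",
  // Count every HTTP request the SDK sends to the provider.
  fetch: async (url, init) => {
    requestCount++
    return fetch(url, init)
  },
})

const result = streamText({
  model: chutes("placeholder-model-id"),
  prompt: "Read config.json and summarize it",
  tools: {
    read: tool({
      description: "Read a file",
      inputSchema: jsonSchema<{ path: string }>({
        type: "object",
        properties: { path: { type: "string" } },
        required: ["path"],
      }),
      // Because execute is present, the SDK can run the tool itself and
      // issue a follow-up request carrying the result.
      execute: async ({ path }) => `contents of ${path}`,
    }),
  },
})

await result.consumeStream()
console.log(`HTTP requests made: ${requestCount}`) // >1 means tool calls multiplied requests
```
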
---

## User Concerns (Critical Issues)

### Concern #1: Model Won't See Tool Results in Time

**Issue:** If we limit to `maxSteps: 1`, the model will:

1. Call "read file"
2. SDK executes it
3. SDK STOPS (doesn't send results back)
4. Model never sees the file contents to make edit decisions

**Impact:** Breaks sequential workflows like read→edit.

### Concern #2: Model Can't Do Multiple Tool Calls

**Issue:** Will the model be limited to only one tool call per session/iteration?

**Impact:** Complex multi-step tasks become impossible.

### Concern #3: Session Completion Timing

**Issue:** Will tool results only be available after the entire session finishes?

**Impact:** Model can't react to tool outputs in real-time.

---

## Technical Analysis

### How Tool Execution Currently Works

1. **Opencode's Outer Loop** (`prompt.ts:282`):

   ```typescript
   while (true) {
     // Each iteration is one "step"
     const tools = await resolveTools({...})
     const result = await processor.process({tools, ...})
   }
   ```

2. **SDK's Internal Multi-Step** (`llm.ts:87`):

   ```typescript
   const result = streamText({
     tools, // Tools with execute functions
     // No maxSteps or stopWhen parameter!
   })
   ```

3. **Processor Handles Events** (`processor.ts:94`, a simplified consumer is sketched below):
   - `tool-input-start`: Tool call begins
   - `tool-call`: Tool is called
   - `tool-result`: Tool execution completes
   - `finish-step`: Step ends

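The event names above correspond to parts emitted on the SDK's `fullStream`. As a rough illustration of the consumption pattern (handler bodies are placeholders, not processor.ts's real logic):

```typescript
// Simplified consumer of the stream parts listed above.
async function consume(result: { fullStream: AsyncIterable<{ type: string }> }) {
  for await (const part of result.fullStream) {
    switch (part.type) {
      case "tool-input-start": // a tool call begins streaming its input
        break
      case "tool-call": // the complete tool call is available
        break
      case "tool-result": // tool execution finished (its execute ran locally)
        break
      case "finish-step": // one request/response step has ended
        break
    }
  }
}
```
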
### Message Flow

**Current Flow (SDK Multi-Step):**

```
┌─────────────┐     ┌────────────┐     ┌────────────────┐
│  Opencode   │────▶│    SDK     │────▶│   Chutes AI    │
│    Loop     │     │ streamText │     │ API Request 1  │
└─────────────┘     └────────────┘     └────────────────┘
                          │                    │
                          │                    ▼
                          │           ┌────────────────┐
                          │           │ Model decides  │
                          │           │ to call tools  │
                          │           └────────────────┘
                          │                    │
                          ▼                    │
                    ┌────────────┐             │
                    │  Executes  │             │
                    │   tools    │             │
                    └────────────┘             │
                          │                    │
                          ▼                    ▼
                ┌────────────────┐     ┌──────────────┐
                │   Chutes AI    │◀────│  SDK sends   │
                │ API Request 2  │     │ tool results │
                └────────────────┘     └──────────────┘
```

**Proposed Flow (maxSteps: 1):**

```
┌─────────────┐     ┌──────────────┐     ┌────────────────┐
│  Opencode   │────▶│     SDK      │────▶│   Chutes AI    │
│    Loop     │     │  streamText  │     │ API Request 1  │
└─────────────┘     │ (maxSteps:1) │     └────────────────┘
       ▲            └──────────────┘             │
       │                   │                     │
       │                   ▼                     ▼
       │             ┌──────────┐       ┌────────────────┐
       │             │ Executes │       │ Model decides  │
       │             │  tools   │       │ to call tools  │
       │             └──────────┘       └────────────────┘
       │                   │                     │
       │                   │              ┌──────┘
       │                   │              │ SDK STOPS
       │                   ▼              │ (doesn't send
       │          ┌────────────────┐      │  results back)
       │          │  Tool results  │      │
       │          │   stored in    │      │
       │          │    opencode    │      │
       │          └────────────────┘      │
       │                   │              │
       └───────────────────┴──────────────┘
                           │
            Next opencode loop iteration
          Tool results included in messages
                           │
                           ▼
           ┌──────────────────────────────┐
           │ Model now sees tool results  │
           │ and can make next decision   │
           └──────────────────────────────┘
```

---

## Proposed Solutions

### Option A: Add `maxSteps: 1` Parameter (Quick Fix)

**Change:** Add to `llm.ts:87`. In AI SDK 5, the old `maxSteps` parameter was replaced by `stopWhen`, so limiting execution to a single step is written as `stopWhen: stepCountIs(1)`:

```typescript
import { stepCountIs } from 'ai'

const result = streamText({
  // ... existing options
  stopWhen: stepCountIs(1),
  // ...
})
```

**Pros:**

- Prevents the SDK from making multiple LLM calls internally
- Each `streamText()` call = exactly 1 Chutes AI request
- Opencode's outer loop handles iterations with full control

**Cons:**

- **MAY BREAK SEQUENTIAL WORKFLOWS**: Model won't see tool results until the next opencode loop iteration
- Tool execution still happens, but results aren't automatically fed back

**Risk Level:** HIGH - May break multi-step tool workflows

### Option B: Remove `execute` Functions from Tools

**Change:** Modify `prompt.ts` to pass tools WITHOUT `execute` functions:

```typescript
// Instead of:
tools[item.id] = tool({
  execute: async (args, options) => { ... } // Remove this
})

// Use:
tools[item.id] = tool({
  description: item.description,
  inputSchema: jsonSchema(schema as any),
  // NO execute function - SDK won't auto-execute
})
```

Then manually execute tools in `processor.ts` when `tool-call` events are received (a rough sketch follows below).

**Pros:**

- SDK never automatically executes tools
- Full control over execution flow
- No hidden API requests

**Cons:**

- Requires significant refactoring of tool handling
- Need to manually implement tool execution loop
- Risk of introducing bugs

**Risk Level:** MEDIUM-HIGH - Requires substantial code changes

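For illustration, here is a rough sketch of what that manual loop could look like, assuming AI SDK 5's stream part and message shapes; `executeTool` is a hypothetical stand-in for opencode's tool registry, not an existing function:

```typescript
import { streamText, type LanguageModel, type ModelMessage } from "ai"

// Hypothetical stand-in for opencode's tool registry.
declare function executeTool(name: string, input: unknown): Promise<string>

async function runOneStep(model: LanguageModel, messages: ModelMessage[]) {
  // Tools would be passed WITHOUT execute functions, so the SDK only
  // reports tool calls instead of running them and re-requesting.
  const result = streamText({ model, messages /*, tools: toolsWithoutExecute */ })

  const toolMessages: ModelMessage[] = []
  for await (const part of result.fullStream) {
    if (part.type === "tool-call") {
      // Execute the tool on our side of the loop.
      const output = await executeTool(part.toolName, part.input)
      toolMessages.push({
        role: "tool",
        content: [
          {
            type: "tool-result",
            toolCallId: part.toolCallId,
            toolName: part.toolName,
            output: { type: "text", value: output },
          },
        ],
      })
    }
  }
  // The caller appends the assistant's tool-call message plus these results
  // to the history, so the model sees them on the next opencode iteration
  // without an SDK-initiated extra request.
  return toolMessages
}
```
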
### Option C: Provider-Specific Configuration for Chutes

**Change:** Detect the Chutes provider and apply special handling:

```typescript
const result = streamText({
  // ... existing options
  ...(input.model.providerID === 'chutes' && {
    stopWhen: stepCountIs(1),
  }),
  // ...
})
```

**Pros:**

- Only affects Chutes AI; other providers work as before
- Minimal code changes
- Can test specifically with Chutes

**Cons:**

- Still has the same risks as Option A
- Provider-specific code adds complexity

**Risk Level:** MEDIUM - Targeted fix but still risky

### Option D: Keep Current Behavior + Documentation

**Change:** None; just document the behavior

**Pros:**

- No code changes = no risk of breaking anything
- Works correctly for sequential workflows

**Cons:**

- Chutes AI users pay for multiple requests
- Not a real solution

**Risk Level:** NONE - But doesn't solve the problem

---

## Key Findings

1. **SDK Version:** Using `ai@5.0.124` (from root package.json line 43)

2. **Default Behavior:** According to the docs, `stopWhen` defaults to `stepCountIs(1)`. It is not explicitly set in the code, yet the observed behavior suggests multi-step execution is enabled; this discrepancy needs to be verified empirically

3. **Tool Execution:** Even with `maxSteps: 1`, tools WILL still execute because they have `execute` functions; the SDK just won't automatically send results back to the model

4. **Message Conversion:** `MessageV2.toModelMessages()` (line 656 in message-v2.ts) already handles converting tool results back to model messages for the next iteration

5. **Opencode's Loop:** The outer `while (true)` loop in `prompt.ts:282` manages the conversation flow and WILL include tool results in the next iteration's messages (illustrated below)

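To make findings 4 and 5 concrete, here is a simplified illustration of what the next iteration's request contains after a single-step tool call; the message shapes follow AI SDK 5's `ModelMessage`, not opencode's actual `MessageV2` types:

```typescript
import type { ModelMessage } from "ai"

// The next streamText() call receives the whole tool exchange as history,
// so the model answers already knowing the file contents.
const nextRequestMessages: ModelMessage[] = [
  { role: "user", content: "Fix the bug in config.json" },
  {
    role: "assistant",
    content: [{ type: "tool-call", toolCallId: "call_1", toolName: "read", input: { path: "config.json" } }],
  },
  {
    role: "tool",
    content: [
      {
        type: "tool-result",
        toolCallId: "call_1",
        toolName: "read",
        output: { type: "text", value: '{ "port": 8080 }' },
      },
    ],
  },
]
```
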
---

## Critical Questions to Resolve

1. **Does the model ACTUALLY lose context with `maxSteps: 1`?**
   - Theory: SDK executes tools, stores results, opencode loop includes them in next iteration
   - Need to verify: does the model see results in time to make sequential decisions?

2. **What happens to parallel tool calls?**
   - If the model calls 3 tools at once, will they all execute before the next iteration?
   - Or will opencode's loop serialize them?

3. **How does this affect Chutes AI billing specifically?**
   - Does Chutes count: (a) HTTP requests, (b) tokens, or (c) conversation steps?
   - If (a), then `maxSteps: 1` definitely helps
   - If (b) or (c), it may not help as much

4. **Can we test without affecting production?**
   - Need a test environment or feature flag
   - Should A/B test with different providers

---

## Recommended Next Steps

1. **Create a Test Branch:** Implement Option A (`maxSteps: 1`) in isolation, e.g. behind a feature flag (see the sketch after this list)
2. **Test Sequential Workflows:** Verify read→edit workflows still work
3. **Monitor Request Count:** Log actual HTTP requests to the Chutes API
4. **Measure Latency:** Check whether response times change significantly
5. **Test Parallel Tool Calls:** Ensure multiple tools in one step work correctly
6. **Document Behavior:** Update documentation to explain the flow
7. **Consider Option B:** If Option A breaks workflows, implement manual tool execution

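For step 1, the change could sit behind a temporary environment flag so it is easy to A/B test; the flag name `OPENCODE_SINGLE_STEP` is hypothetical, and the snippet mirrors the fragment style of Options A and C rather than a complete `streamText` call:

```typescript
import { stepCountIs } from "ai"

// Hypothetical flag so Option A can be toggled per environment.
const singleStep = process.env.OPENCODE_SINGLE_STEP === "1"

const result = streamText({
  // ... existing options from llm.ts:87-167
  ...(singleStep && { stopWhen: stepCountIs(1) }),
})
```
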
---

## Code References

### streamText Call (llm.ts:87-167)

```typescript
const result = streamText({
  onError(error) { ... },
  async experimental_repairToolCall(failed) { ... },
  temperature: params.temperature,
  topP: params.topP,
  topK: params.topK,
  providerOptions: ProviderTransform.providerOptions(input.model, params.options),
  activeTools: Object.keys(tools).filter((x) => x !== "invalid"),
  tools, // <-- These have execute functions!
  maxOutputTokens,
  abortSignal: input.abort,
  headers: { ... },
  maxRetries: 0,
  messages: [ ... ],
  model: wrapLanguageModel({ ... }),
  experimental_telemetry: { ... },
  // MISSING: maxSteps or stopWhen parameter!
})
```

### Tool Definition with Execute (prompt.ts:716-745)

```typescript
tools[item.id] = tool({
  id: item.id as any,
  description: item.description,
  inputSchema: jsonSchema(schema as any),
  async execute(args, options) {
    const ctx = context(args, options)
    await Plugin.trigger("tool.execute.before", ...)
    const result = await item.execute(args, ctx) // <-- Execute function!
    await Plugin.trigger("tool.execute.after", ...)
    return result
  },
})
```

### Opencode's Outer Loop (prompt.ts:282)

```typescript
let step = 0
while (true) {
  SessionStatus.set(sessionID, { type: "busy" })
  step++

  const tools = await resolveTools({...})

  const result = await processor.process({
    tools,
    model,
    // ...
  })

  if (result === "stop") break
}
```

---

## Conclusion

The issue is confirmed: the Vercel AI SDK's automatic multi-step execution causes multiple Chutes AI API requests per conversation turn. The proposed fix of adding `maxSteps: 1` (in AI SDK 5 terms, `stopWhen: stepCountIs(1)`) would reduce this to one request per opencode loop iteration.

However, the user's concerns about breaking sequential workflows are valid and need thorough testing before implementation. The recommended approach is to create a test branch, verify all workflow types, and then decide on the best solution.

**Priority:** HIGH
**Effort:** LOW for Option A, HIGH for Option B
**Risk:** MEDIUM-HIGH (may break existing workflows)