Documents the issue where tool calls count as separate Chutes AI requests, covering proposed solutions, technical analysis, and user concerns about breaking sequential workflows. Includes:
- Root cause analysis of Vercel AI SDK multi-step execution
- 4 proposed solution options with pros/cons
- User concerns about model context and workflow breaks
- Code references and technical diagrams
- Recommended next steps for testing and implementation
Relates to: Tool call execution flow in session management
Chutes AI Tool Call Counting Issue - Analysis and Plan
Status: Under Investigation
Date: February 11, 2026
Related Files:
- opencode/packages/opencode/src/session/llm.ts (lines 87-167)
- opencode/packages/opencode/src/session/processor.ts
- opencode/packages/opencode/src/session/prompt.ts (lines 716-745, 755-837)
Executive Summary
When using Chutes AI with opencode, each tool call counts as a separate billable API request, causing excessive billing. The cause is the Vercel AI SDK's streamText function automatically handling multi-step tool execution.
The Problem
Current Behavior (Undesired)
The Vercel AI SDK's streamText function automatically executes tools and sends results back to the model in multiple HTTP requests:
Step 1: Initial Request
- Model receives prompt + available tools
- Model returns: "read file X"
- SDK executes tool locally
Step 2: Automatic Follow-up Request (NEW API CALL!)
- SDK sends tool results back to model
- Model returns: "edit file X with changes"
- SDK executes tool locally
Step 3: Automatic Follow-up Request (NEW API CALL!)
- SDK sends tool results back to model
- Model returns final response
- Stream ends
Result: Each step after the first counts as a separate Chutes AI API request, multiplying costs.
Root Cause
In llm.ts lines 87-167, streamText is called with tools that include execute functions:
// prompt.ts lines 716-745
const result = await item.execute(args, ctx) // <-- Tools have execute functions
When tools have execute functions, the SDK automatically:
- Executes the tools when the model requests them
- Sends the results back to the model in a new API request
- Continues this process for multiple steps (see the sketch below)
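A minimal sketch of the shape that triggers this, assuming AI SDK v5 APIs (model and readFile are placeholders, not opencode's actual code):

import { streamText, tool, jsonSchema } from 'ai'
import type { LanguageModel } from 'ai'

declare const model: LanguageModel // placeholder for the configured Chutes model
declare function readFile(path: string): Promise<string> // placeholder

// A tool WITH an execute function: the SDK runs it itself and, with
// multi-step enabled, sends the result back in a fresh API request.
const tools = {
  read: tool({
    description: 'Read a file from disk',
    inputSchema: jsonSchema<{ path: string }>({
      type: 'object',
      properties: { path: { type: 'string' } },
      required: ['path'],
    }),
    execute: async ({ path }) => readFile(path), // auto-executed by the SDK
  }),
}

const result = streamText({
  model,
  messages: [{ role: 'user', content: 'Fix the bug in x.ts' }],
  tools,
  // Nothing here caps the SDK's internal step loop.
})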
User Concerns (Critical Issues)
Concern #1: Model Won't See Tool Results in Time
Issue: If we limit to maxSteps: 1, the model will:
- Call "read file"
- SDK executes it
- SDK STOPS (doesn't send results back)
- Model never sees the file contents to make edit decisions
Impact: Breaks sequential workflows like read→edit.
Concern #2: Model Can't Do Multiple Tool Calls
Issue: Will the model be limited to only one tool call per session/iteration?
Impact: Complex multi-step tasks become impossible.
Concern #3: Session Completion Timing
Issue: Will tool results only be available after the entire session finishes?
Impact: Model can't react to tool outputs in real-time.
Technical Analysis
How Tool Execution Currently Works
- Opencode's Outer Loop (prompt.ts:282):

while (true) {
  // Each iteration is one "step"
  const tools = await resolveTools({...})
  const result = await processor.process({ tools, ... })
}

- SDK's Internal Multi-Step (llm.ts:87):

const result = streamText({
  tools, // Tools with execute functions
  // No maxSteps or stopWhen parameter!
})

- Processor Handles Events (processor.ts:94), one per event type (a condensed sketch follows):
  - tool-input-start: a tool call begins
  - tool-call: the tool is called
  - tool-result: tool execution completes
  - finish-step: the step ends
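A condensed sketch of that event handling, assuming AI SDK v5 stream part types (the real processor.ts is considerably more involved):

import type { streamText } from 'ai'

async function handleEvents(result: ReturnType<typeof streamText>) {
  for await (const part of result.fullStream) {
    switch (part.type) {
      case 'tool-input-start':
        // the model began streaming a tool call's arguments
        break
      case 'tool-call':
        // a complete tool call arrived (part.toolName, part.input)
        break
      case 'tool-result':
        // the SDK ran execute() and produced part.output
        break
      case 'finish-step':
        // one LLM round-trip ended, i.e. one billed API request
        break
    }
  }
}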
Message Flow
Current Flow (SDK Multi-Step):
┌─────────────┐ ┌──────────┐ ┌──────────────┐
│ Opencode │────▶│ SDK │────▶│ Chutes AI │
│ Loop │ │ streamText│ │ API Request 1│
└─────────────┘ └──────────┘ └──────────────┘
│ │
│ ▼
│ ┌──────────────┐
│ │ Model decides│
│ │ to call tools│
│ └──────────────┘
│ │
▼ │
┌──────────┐ │
│ Executes │ │
│ tools │ │
└──────────┘ │
│ │
▼ ▼
┌──────────────┐ ┌──────────────┐
│ Chutes AI │◀────│ SDK sends │
│ API Request 2│ │ tool results │
└──────────────┘ └──────────────┘
Proposed Flow (maxSteps: 1):
┌─────────────┐ ┌──────────┐ ┌──────────────┐
│ Opencode │────▶│ SDK │────▶│ Chutes AI │
│ Loop │ │ streamText│ │ API Request 1│
└─────────────┘ │ (maxSteps:1)│ └──────────────┘
▲ └──────────┘ │
│ │ │
│ ▼ ▼
│ ┌──────────┐ ┌──────────────┐
│ │ Executes │ │ Model decides│
│ │ tools │ │ to call tools│
│ └──────────┘ └──────────────┘
│ │ │
│ │ ┌──────┘
│ │ │ SDK STOPS
│ ▼ │ (doesn't send
│ ┌──────────────┐ │ results back)
│ │ Tool results │ │
│ │ stored in │ │
│ │ opencode │ │
│ └──────────────┘ │
│ │ │
└───────────────────┴─────────────┘
│
Next opencode loop iteration
Tool results included in messages
│
▼
┌──────────────────────────────┐
│ Model now sees tool results │
│ and can make next decision │
└──────────────────────────────┘
Proposed Solutions
Option A: Add maxSteps: 1 Parameter (Quick Fix)
Change: Add to llm.ts:87 (in AI SDK v5, stopWhen: stepCountIs(1) is the equivalent of the older maxSteps: 1):
import { stepCountIs } from 'ai'

const result = streamText({
  // ... existing options
  stopWhen: stepCountIs(1),
  // ...
})
Pros:
- Prevents SDK from making multiple LLM calls internally
- Each streamText() call = exactly 1 Chutes AI request
- Opencode's outer loop handles iterations with full control
Cons:
- MAY BREAK SEQUENTIAL WORKFLOWS: Model won't see tool results until next opencode loop iteration
- Tool execution still happens but results aren't automatically fed back
Risk Level: HIGH - May break multi-step tool workflows
Option B: Remove execute Functions from Tools
Change: Modify prompt.ts to pass tools WITHOUT execute functions:
// Instead of:
tools[item.id] = tool({
  execute: async (args, options) => { ... } // Remove this
})

// Use:
tools[item.id] = tool({
  description: item.description,
  inputSchema: jsonSchema(schema as any),
  // NO execute function - SDK won't auto-execute
})
Then manually execute tools in processor.ts when tool-call events are received.
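A rough sketch of that manual loop, assuming AI SDK v5 part shapes; runTool and saveToolResult are hypothetical stand-ins for opencode's plugin hooks and message storage:

import { streamText, tool, jsonSchema } from 'ai'
import type { LanguageModel, ModelMessage } from 'ai'

declare const model: LanguageModel // placeholder
declare const messages: ModelMessage[] // placeholder
declare function runTool(name: string, input: unknown): Promise<string> // hypothetical executor
declare function saveToolResult(id: string, output: string): void // hypothetical bookkeeping

// Tools defined WITHOUT execute: the SDK only reports the call.
const tools = {
  read: tool({
    description: 'Read a file',
    inputSchema: jsonSchema<{ path: string }>({
      type: 'object',
      properties: { path: { type: 'string' } },
      required: ['path'],
    }),
    // no execute function, so there is no hidden follow-up request
  }),
}

const result = streamText({ model, messages, tools })

for await (const part of result.fullStream) {
  if (part.type === 'tool-call') {
    // Execute ourselves, then store the output so the next opencode
    // loop iteration includes it via MessageV2.toModelMessages().
    const output = await runTool(part.toolName, part.input)
    saveToolResult(part.toolCallId, output)
  }
}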
Pros:
- SDK never automatically executes tools
- Full control over execution flow
- No hidden API requests
Cons:
- Requires significant refactoring of tool handling
- Need to manually implement tool execution loop
- Risk of introducing bugs
Risk Level: MEDIUM-HIGH - Requires substantial code changes
Option C: Provider-Specific Configuration for Chutes
Change: Detect Chutes provider and apply special handling:
const result = streamText({
  // ... existing options
  ...(input.model.providerID === 'chutes' && {
    stopWhen: stepCountIs(1),
  }),
  // ...
})
Pros:
- Only affects Chutes AI, other providers work as before
- Minimal code changes
- Can test specifically with Chutes
Cons:
- Still has the same risks as Option A
- Provider-specific code adds complexity
Risk Level: MEDIUM - Targeted fix but still risky
Option D: Keep Current Behavior + Documentation
Change: None - just document the behavior
Pros:
- No code changes = no risk of breaking anything
- Works correctly for sequential workflows
Cons:
- Chutes AI users pay for multiple requests
- Not a real solution
Risk Level: NONE - But doesn't solve the problem
Key Findings
- SDK Version: using ai@5.0.124 (root package.json, line 43)
- Default Behavior: according to the docs, stopWhen defaults to stepCountIs(1), but it is not explicitly set in the code, and the observed behavior suggests multi-step is enabled
- Tool Execution: even with maxSteps: 1, tools WILL still execute because they have execute functions; the SDK just won't automatically send results back to the model
- Message Conversion: MessageV2.toModelMessages() (line 656 in message-v2.ts) already converts tool results back into model messages for the next iteration (see the illustrative sketch below)
- Opencode's Loop: the outer while (true) loop in prompt.ts:282 manages the conversation flow and WILL include tool results in the next iteration's messages
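For illustration only, a hypothetical example of what those converted messages could look like for one completed tool call, based on AI SDK v5 message shapes (an assumption, not the literal toModelMessages() output):

import type { ModelMessage } from 'ai'

const nextIterationMessages: ModelMessage[] = [
  // The assistant's tool call from the previous streamText() call...
  {
    role: 'assistant',
    content: [{ type: 'tool-call', toolCallId: 'call_1', toolName: 'read', input: { path: 'x.ts' } }],
  },
  // ...followed by the stored tool result, so the model sees the file
  // contents at the start of the next opencode loop iteration.
  {
    role: 'tool',
    content: [
      {
        type: 'tool-result',
        toolCallId: 'call_1',
        toolName: 'read',
        output: { type: 'text', value: '...file contents...' },
      },
    ],
  },
]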
Critical Questions to Resolve
- Does the model ACTUALLY lose context with maxSteps: 1?
  - Theory: the SDK executes tools, stores results, and the opencode loop includes them in the next iteration
  - Need to verify: does the model see results in time to make sequential decisions?
- What happens to parallel tool calls?
  - If the model calls 3 tools at once, will they all execute before the next iteration?
  - Or will opencode's loop serialize them?
- How does this affect Chutes AI billing specifically?
  - Does Chutes count (a) HTTP requests, (b) tokens, or (c) conversation steps?
  - If (a), then maxSteps: 1 definitely helps
  - If (b) or (c), it may not help as much
- Can we test without affecting production?
  - Need a test environment or feature flag
  - Should A/B test with different providers
Recommended Next Steps
- Create a Test Branch: Implement Option A (maxSteps: 1) in isolation
- Test Sequential Workflows: Verify read→edit workflows still work
- Monitor Request Count: Log actual HTTP requests to the Chutes API (see the sketch below)
- Measure Latency: Check if response times change significantly
- Test Parallel Tool Calls: Ensure multiple tools in one step work correctly
- Document Behavior: Update documentation to explain the flow
- Consider Option B: If Option A breaks workflows, implement manual tool execution
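For the request-count step, one low-touch option is a counting fetch wrapper. A sketch, assuming Chutes is wired up through @ai-sdk/openai-compatible and accepts a custom fetch (official AI SDK providers do); the baseURL and env var name are placeholders:

import { createOpenAICompatible } from '@ai-sdk/openai-compatible'

let requestCount = 0

// Every outbound HTTP request to the provider passes through here.
const countingFetch: typeof fetch = async (input, init) => {
  requestCount++
  console.log(`[chutes] HTTP request #${requestCount}`)
  return fetch(input, init)
}

const chutes = createOpenAICompatible({
  name: 'chutes',
  baseURL: 'https://example-chutes-endpoint/v1', // placeholder, use the real endpoint
  apiKey: process.env.CHUTES_API_KEY, // illustrative env var name
  fetch: countingFetch,
})
// Pass e.g. chutes('some-model-id') as the model for streamText.

If the per-turn count drops from N steps to 1 after adding stopWhen: stepCountIs(1), the fix addresses the billing issue, assuming Chutes bills per HTTP request (question 3 above).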
Code References
streamText Call (llm.ts:87-167)
const result = streamText({
  onError(error) { ... },
  async experimental_repairToolCall(failed) { ... },
  temperature: params.temperature,
  topP: params.topP,
  topK: params.topK,
  providerOptions: ProviderTransform.providerOptions(input.model, params.options),
  activeTools: Object.keys(tools).filter((x) => x !== "invalid"),
  tools, // <-- These have execute functions!
  maxOutputTokens,
  abortSignal: input.abort,
  headers: { ... },
  maxRetries: 0,
  messages: [ ... ],
  model: wrapLanguageModel({ ... }),
  experimental_telemetry: { ... },
  // MISSING: maxSteps or stopWhen parameter!
})
Tool Definition with Execute (prompt.ts:716-745)
tools[item.id] = tool({
  id: item.id as any,
  description: item.description,
  inputSchema: jsonSchema(schema as any),
  async execute(args, options) {
    const ctx = context(args, options)
    await Plugin.trigger("tool.execute.before", ...)
    const result = await item.execute(args, ctx) // <-- Execute function!
    await Plugin.trigger("tool.execute.after", ...)
    return result
  },
})
Opencode's Outer Loop (prompt.ts:282)
let step = 0
while (true) {
  SessionStatus.set(sessionID, { type: "busy" })
  step++
  const tools = await resolveTools({...})
  const result = await processor.process({
    tools,
    model,
    // ...
  })
  if (result === "stop") break
}
Conclusion
The issue is confirmed: the Vercel AI SDK's automatic multi-step execution causes multiple Chutes AI API requests per conversation turn. The proposed fix of adding maxSteps: 1 or stopWhen: stepCountIs(1) would reduce this to one request per opencode loop iteration.
However, the user's concerns about breaking sequential workflows are valid and need thorough testing before implementation. The recommended approach is to create a test branch, verify all workflow types, and then decide on the best solution.
Priority: HIGH
Effort: LOW for Option A, HIGH for Option B
Risk: MEDIUM-HIGH (may break existing workflows)