diff --git a/opencode/CHUTES_AI_TOOL_CALL_ISSUE_PLAN.md b/opencode/CHUTES_AI_TOOL_CALL_ISSUE_PLAN.md
new file mode 100644
index 0000000..3727d26
--- /dev/null
+++ b/opencode/CHUTES_AI_TOOL_CALL_ISSUE_PLAN.md
@@ -0,0 +1,392 @@
# Chutes AI Tool Call Counting Issue - Analysis and Plan

**Status:** Under Investigation
**Date:** February 11, 2026
**Related Files:**
- `opencode/packages/opencode/src/session/llm.ts` (lines 87-167)
- `opencode/packages/opencode/src/session/processor.ts`
- `opencode/packages/opencode/src/session/prompt.ts` (lines 716-745, 755-837)

---

## Executive Summary

When using Chutes AI with opencode, each tool call in a conversation turn is billed as a separate API request, causing excessive costs. The root cause is the Vercel AI SDK's `streamText` function, which handles multi-step tool execution automatically and issues a new HTTP request for every step.

---

## The Problem

### Current Behavior (Undesired)

The Vercel AI SDK's `streamText` function automatically executes tools and sends their results back to the model in **multiple HTTP requests**:

```
Step 1: Initial Request
- Model receives prompt + available tools
- Model returns: "read file X"
- SDK executes tool locally

Step 2: Automatic Follow-up Request (NEW API CALL!)
- SDK sends tool results back to model
- Model returns: "edit file X with changes"
- SDK executes tool locally

Step 3: Automatic Follow-up Request (NEW API CALL!)
- SDK sends tool results back to model
- Model returns final response
- Stream ends
```

**Result:** Each step after the first counts as a separate Chutes AI API request, multiplying costs.

### Root Cause

In `llm.ts` lines 87-167, `streamText` is called with tools that include `execute` functions:

```typescript
// prompt.ts lines 716-745
const result = await item.execute(args, ctx) // <-- Tools have execute functions
```

When tools have `execute` functions, the SDK automatically:
1. Executes the tools when the model requests them
2. Sends the results back to the model in a **new API request**
3. Continues this process for multiple steps (a standalone sketch of this inner loop follows)
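To make that inner loop concrete, here is a minimal, self-contained sketch of the SDK's multi-step behavior. It is illustrative rather than opencode's actual code: `model` is a placeholder for any Chutes-backed model, and `stopWhen: stepCountIs(5)` is set explicitly to show the step limit that `llm.ts` currently leaves unset:

```typescript
import { readFile } from "node:fs/promises"
import { streamText, tool, stepCountIs, type LanguageModel } from "ai"
import { z } from "zod"

declare const model: LanguageModel // placeholder: any model routed through Chutes AI

const result = streamText({
  model,
  stopWhen: stepCountIs(5), // permit up to 5 steps; llm.ts sets no explicit stopWhen
  tools: {
    read: tool({
      description: "Read a file from disk",
      inputSchema: z.object({ path: z.string() }),
      // Because execute is present, the SDK runs the tool itself...
      execute: async ({ path }) => readFile(path, "utf8"),
    }),
  },
  prompt: "Read config.ts and summarize it",
})

// ...and after every tool result the SDK silently sends a NEW request to the
// provider carrying the accumulated messages, until the model stops calling
// tools or stopWhen is satisfied. Each round trip is a separately billed call.
for await (const text of result.textStream) process.stdout.write(text)
```

If the v5 default of `stopWhen: stepCountIs(1)` really applied here, this call would stop after the first tool execution instead of sending follow-up requests; see Key Finding #2 for why that default is in question.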
---

## User Concerns (Critical Issues)

### Concern #1: Model Won't See Tool Results in Time

**Issue:** If we limit to `maxSteps: 1`, the flow becomes:
1. The model calls "read file"
2. The SDK executes it
3. The SDK STOPS (doesn't send results back)
4. The model never sees the file contents it needs to decide on edits

**Impact:** Breaks sequential workflows like read→edit.

### Concern #2: Model Can't Do Multiple Tool Calls

**Issue:** Will the model be limited to only one tool call per session/iteration?

**Impact:** Complex multi-step tasks become impossible.

### Concern #3: Session Completion Timing

**Issue:** Will tool results only be available after the entire session finishes?

**Impact:** The model can't react to tool outputs in real time.

---

## Technical Analysis

### How Tool Execution Currently Works

1. **Opencode's Outer Loop** (`prompt.ts:282`):
   ```typescript
   while (true) {
     // Each iteration is one "step"
     const tools = await resolveTools({...})
     const result = await processor.process({tools, ...})
   }
   ```

2. **SDK's Internal Multi-Step** (`llm.ts:87`):
   ```typescript
   const result = streamText({
     tools, // Tools with execute functions
     // No maxSteps or stopWhen parameter!
   })
   ```

3. **Processor Handles Events** (`processor.ts:94`):
   - `tool-input-start`: Tool call begins
   - `tool-call`: Tool is called
   - `tool-result`: Tool execution completes
   - `finish-step`: Step ends

### Message Flow

**Current Flow (SDK Multi-Step):**
```
┌─────────────┐     ┌────────────┐     ┌──────────────┐
│  Opencode   │────▶│    SDK     │────▶│  Chutes AI   │
│    Loop     │     │ streamText │     │ API Request 1│
└─────────────┘     └────────────┘     └──────────────┘
                          │                    │
                          │                    ▼
                          │            ┌──────────────┐
                          │            │ Model decides│
                          │            │ to call tools│
                          │            └──────────────┘
                          │                    │
                          ▼                    │
                    ┌────────────┐             │
                    │  Executes  │             │
                    │   tools    │             │
                    └────────────┘             │
                          │                    │
                          ▼                    ▼
                   ┌──────────────┐     ┌──────────────┐
                   │  Chutes AI   │◀────│  SDK sends   │
                   │ API Request 2│     │ tool results │
                   └──────────────┘     └──────────────┘
```

**Proposed Flow (maxSteps: 1):**
```
┌─────────────┐     ┌──────────────┐     ┌──────────────┐
│  Opencode   │────▶│     SDK      │────▶│  Chutes AI   │
│    Loop     │     │  streamText  │     │ API Request 1│
└─────────────┘     │ (maxSteps: 1)│     └──────────────┘
       ▲            └──────────────┘            │
       │                    │                   ▼
       │                    │           ┌──────────────┐
       │                    │           │ Model decides│
       │                    │           │ to call tools│
       │                    │           └──────────────┘
       │                    ▼                   │
       │            ┌──────────────┐     SDK STOPS here
       │            │   Executes   │     (does not send
       │            │    tools     │     results back)
       │            └──────────────┘            │
       │                    │                   │
       │                    ▼                   │
       │            ┌──────────────┐            │
       │            │ Tool results │            │
       │            │  stored in   │            │
       │            │   opencode   │            │
       │            └──────────────┘            │
       │                    │                   │
       └────────────────────┴───────────────────┘
                            │
              Next opencode loop iteration:
            tool results included in messages
                            │
                            ▼
            ┌──────────────────────────────┐
            │ Model now sees tool results  │
            │ and can make next decision   │
            └──────────────────────────────┘
```

---

## Proposed Solutions

### Option A: Add `maxSteps: 1` Parameter (Quick Fix)

**Change:** Add a step limit in `llm.ts:87`. In `ai@5.x` this is spelled `stopWhen: stepCountIs(1)`; `maxSteps` was the v4 name for the same control:

```typescript
import { stepCountIs } from 'ai'

const result = streamText({
  // ... existing options
  stopWhen: stepCountIs(1),
  // ...
})
```

**Pros:**
- Prevents the SDK from making multiple LLM calls internally
- Each `streamText()` call = exactly 1 Chutes AI request
- Opencode's outer loop handles iterations with full control

**Cons:**
- **MAY BREAK SEQUENTIAL WORKFLOWS:** the model won't see tool results until the next opencode loop iteration
- Tool execution still happens, but results aren't automatically fed back

**Risk Level:** HIGH - May break multi-step tool workflows

### Option B: Remove `execute` Functions from Tools

**Change:** Modify `prompt.ts` to pass tools WITHOUT `execute` functions:

```typescript
// Instead of:
tools[item.id] = tool({
  execute: async (args, options) => { ... }, // Remove this
})

// Use:
tools[item.id] = tool({
  description: item.description,
  inputSchema: jsonSchema(schema as any),
  // NO execute function - SDK won't auto-execute
})
```

Then manually execute tools in `processor.ts` when `tool-call` events are received, as in the sketch below.
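A minimal sketch of that manual loop, assuming a hypothetical `runTool` helper that stands in for opencode's real tool registry (`item.execute` plus plugin hooks); none of this is the actual opencode code:

```typescript
import { streamText, tool, jsonSchema, type LanguageModel, type ModelMessage } from "ai"

declare const model: LanguageModel // placeholder model
// Hypothetical stand-in for opencode's real tool executor.
declare function runTool(name: string, input: unknown): Promise<string>

async function singleStep(messages: ModelMessage[]) {
  const result = streamText({
    model,
    messages,
    tools: {
      read: tool({
        description: "Read a file",
        inputSchema: jsonSchema({
          type: "object",
          properties: { path: { type: "string" } },
          required: ["path"],
        }),
        // No execute function: the SDK only *reports* the tool call and
        // never issues a follow-up request on its own.
      }),
    },
  })

  for await (const part of result.fullStream) {
    if (part.type === "tool-call") {
      // Exactly one HTTP request has been made by this point. We run the
      // tool ourselves and persist the result so the next iteration of
      // opencode's outer loop can include it in `messages`.
      const output = await runTool(part.toolName, part.input)
      // ...store `output` via opencode's message store...
    }
  }
}
```

The outer loop in `prompt.ts` would then re-enter `singleStep` with the updated messages: one billable request per step, with nothing hidden inside the SDK.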
**Pros:**
- SDK never automatically executes tools
- Full control over the execution flow
- No hidden API requests

**Cons:**
- Requires significant refactoring of tool handling
- Need to manually implement the tool execution loop
- Risk of introducing bugs

**Risk Level:** MEDIUM-HIGH - Requires substantial code changes

### Option C: Provider-Specific Configuration for Chutes

**Change:** Detect the Chutes provider and apply special handling:

```typescript
const result = streamText({
  // ... existing options
  ...(input.model.providerID === 'chutes' && {
    stopWhen: stepCountIs(1),
  }),
  // ...
})
```

**Pros:**
- Only affects Chutes AI; other providers work as before
- Minimal code changes
- Can test specifically with Chutes

**Cons:**
- Still has the same risks as Option A
- Provider-specific code adds complexity

**Risk Level:** MEDIUM - Targeted fix but still risky

### Option D: Keep Current Behavior + Documentation

**Change:** None - just document the behavior

**Pros:**
- No code changes = no risk of breaking anything
- Works correctly for sequential workflows

**Cons:**
- Chutes AI users pay for multiple requests
- Not a real solution

**Risk Level:** NONE - But doesn't solve the problem

---

## Key Findings

1. **SDK Version:** Using `ai@5.0.124` (root `package.json`, line 43)

2. **Default Behavior:** According to the v5 docs, `stopWhen` defaults to `stepCountIs(1)`. It is not set explicitly in our code, yet the observed behavior suggests multi-step execution is happening. This contradiction itself needs verifying (see Critical Questions)

3. **Tool Execution:** Even with `maxSteps: 1`, tools WILL still execute because they have `execute` functions; the SDK just won't automatically send the results back to the model

4. **Message Conversion:** `MessageV2.toModelMessages()` (line 656 in `message-v2.ts`) already converts stored tool results back into model messages for the next iteration (see the appendix at the end of this document for the message shapes involved)

5. **Opencode's Loop:** The outer `while (true)` loop in `prompt.ts:282` manages the conversation flow and WILL include tool results in the next iteration's messages

---

## Critical Questions to Resolve

1. **Does the model ACTUALLY lose context with `maxSteps: 1`?**
   - Theory: the SDK executes tools and stores the results, and the opencode loop includes them in the next iteration
   - Need to verify: does the model see results in time to make sequential decisions?

2. **What happens to parallel tool calls?**
   - If the model calls 3 tools at once, will they all execute before the next iteration?
   - Or will opencode's loop serialize them?

3. **How does this affect Chutes AI billing specifically?**
   - Does Chutes count (a) HTTP requests, (b) tokens, or (c) conversation steps?
   - If (a), then `maxSteps: 1` definitely helps
   - If (b) or (c), it may not help as much

4. **Can we test without affecting production?**
   - Need a test environment or feature flag
   - Should A/B test with different providers

---

## Recommended Next Steps

1. **Create a Test Branch:** Implement Option A (`maxSteps: 1`) in isolation
2. **Test Sequential Workflows:** Verify read→edit workflows still work
3. **Monitor Request Count:** Log actual HTTP requests to the Chutes API (see the sketch below)
4. **Measure Latency:** Check if response times change significantly
5. **Test Parallel Tool Calls:** Ensure multiple tools in one step work correctly
6. **Document Behavior:** Update documentation to explain the flow
7. **Consider Option B:** If Option A breaks workflows, implement manual tool execution
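For step 3, one low-friction way to count requests is to wrap the provider's `fetch`. This sketch assumes the Chutes provider can be built with `@ai-sdk/openai-compatible` and that `https://llm.chutes.ai/v1` is the endpoint; both are assumptions about wiring that actually lives elsewhere in opencode:

```typescript
import { createOpenAICompatible } from "@ai-sdk/openai-compatible"

let requestCount = 0

// Counts every outbound HTTP request the provider makes, including any
// hidden follow-up requests from the SDK's multi-step loop.
const countingFetch: typeof fetch = async (input, init) => {
  requestCount++
  const url = typeof input === "string" ? input : input instanceof URL ? input.href : input.url
  console.log(`[chutes] request #${requestCount}: ${url}`)
  return fetch(input, init)
}

const chutes = createOpenAICompatible({
  name: "chutes",
  baseURL: "https://llm.chutes.ai/v1", // assumed endpoint
  apiKey: process.env.CHUTES_API_KEY,
  fetch: countingFetch,
})

// Any model built from this provider now increments requestCount per call,
// e.g. chutes("some-model-id") passed into streamText.
```

Comparing `requestCount` for the same task with and without `stopWhen: stepCountIs(1)` answers Critical Question #3(a) directly.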
---

## Code References

### streamText Call (llm.ts:87-167)
```typescript
const result = streamText({
  onError(error) { ... },
  async experimental_repairToolCall(failed) { ... },
  temperature: params.temperature,
  topP: params.topP,
  topK: params.topK,
  providerOptions: ProviderTransform.providerOptions(input.model, params.options),
  activeTools: Object.keys(tools).filter((x) => x !== "invalid"),
  tools, // <-- These have execute functions!
  maxOutputTokens,
  abortSignal: input.abort,
  headers: { ... },
  maxRetries: 0,
  messages: [ ... ],
  model: wrapLanguageModel({ ... }),
  experimental_telemetry: { ... },
  // MISSING: maxSteps or stopWhen parameter!
})
```

### Tool Definition with Execute (prompt.ts:716-745)
```typescript
tools[item.id] = tool({
  id: item.id as any,
  description: item.description,
  inputSchema: jsonSchema(schema as any),
  async execute(args, options) {
    const ctx = context(args, options)
    await Plugin.trigger("tool.execute.before", ...)
    const result = await item.execute(args, ctx) // <-- Execute function!
    await Plugin.trigger("tool.execute.after", ...)
    return result
  },
})
```

### Opencode's Outer Loop (prompt.ts:282)
```typescript
while (true) {
  SessionStatus.set(sessionID, { type: "busy" })
  let step = 0
  step++

  const tools = await resolveTools({...})

  const result = await processor.process({
    tools,
    model,
    // ...
  })

  if (result === "stop") break
}
```

---

## Conclusion

The issue is confirmed: the Vercel AI SDK's automatic multi-step execution causes multiple Chutes AI API requests per conversation turn. The proposed fix of adding `maxSteps: 1` (spelled `stopWhen: stepCountIs(1)` in v5) would reduce this to one request per opencode loop iteration.

However, the user's concerns about breaking sequential workflows are valid and need thorough testing before implementation. The recommended approach is to create a test branch, verify all workflow types, and then decide on the best solution.

**Priority:** HIGH
**Effort:** LOW for Option A, HIGH for Option B
**Risk:** MEDIUM-HIGH (may break existing workflows)
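---

## Appendix: Tool-Result Message Shapes (Key Finding #4)

To make Key Finding #4 concrete, here is a hedged sketch of roughly what `MessageV2.toModelMessages()` has to produce so that the next request carries the previous step's tool results. The field names follow the `ai@5.x` `ModelMessage` types as we understand them (`input`/`output` replaced v4's `args`/`result`); all values are invented for illustration:

```typescript
import type { ModelMessage } from "ai"

// Invented example values; the real conversion lives in message-v2.ts:656.
const nextIterationMessages: ModelMessage[] = [
  { role: "user", content: "Fix the typo in config.ts" },
  {
    // The model's tool call from the previous step.
    role: "assistant",
    content: [
      { type: "tool-call", toolCallId: "call_1", toolName: "read", input: { path: "config.ts" } },
    ],
  },
  {
    // The locally executed result, replayed so the model finally sees it
    // at the start of the next opencode loop iteration.
    role: "tool",
    content: [
      {
        type: "tool-result",
        toolCallId: "call_1",
        toolName: "read",
        output: { type: "text", value: "export const prot = 3000" },
      },
    ],
  },
]
```

Because this replay happens on the next outer-loop iteration, `maxSteps: 1` defers rather than destroys the model's view of tool results, which is exactly the behavior Concern #1 needs to be tested against.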