Token Efficiency Guide for OpenCode CLI

This guide documents strategies to optimize token usage in the OpenCode CLI without compromising output quality.

Overview

Token efficiency is critical for:

  • Reducing API costs
  • Avoiding context window overflows
  • Improving response latency
  • Enabling longer conversations

OpenCode already has several optimization mechanisms (compaction, truncation, pruning), but there are opportunities to improve further.


Current Token Management

Existing Mechanisms

| Mechanism | Location | Description |
|-----------|----------|-------------|
| Compaction | src/session/compaction.ts | Summarizes conversation history when context is exceeded |
| Truncation | src/tool/truncation.ts | Limits tool outputs to 2000 lines / 50KB |
| Pruning | src/session/compaction.ts:41-90 | Removes old tool outputs beyond 40K tokens |
| Token Estimation | src/util/token.ts | Uses 4:1 character-to-token ratio |
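
For reference, the current estimator reduces to a one-liner; a sketch of what src/util/token.ts does today, assuming the 4:1 ratio is the whole of it:

// Current estimator (sketch): roughly 4 characters per token
export namespace Token {
  export function estimate(input: string): number {
    return Math.ceil(input.length / 4)
  }
}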

Token Flow

User Input → Message Construction → System Prompts → Tool Definitions → Message History → LLM API

1. Smart Compaction Strategy

Current Behavior: Compaction triggers reactively, only after token usage exceeds the threshold.

Improvements:

  • Predictive Compaction: Analyze token growth patterns and compact proactively before reaching limits
  • Configurable Thresholds: Allow compaction at 70-80% of context instead of 100%
  • Task-Aware Triggers: Compact before expensive operations (file edits, builds)
// Example: Predictive compaction logic (sketch)
function predictCompactionNeed(
  messages: MessageV2.WithParts[],
  currentTotal: number,
  contextLimit: number,
): boolean {
  // Average token growth over the five most recent messages
  const recentGrowth = messages.slice(-5).reduce((acc, msg) => {
    return acc + Token.estimate(JSON.stringify(msg.parts))
  }, 0)
  const trend = recentGrowth / 5

  // Project three messages ahead; compact once we cross 70% of context
  const projectedTotal = currentTotal + trend * 3
  return projectedTotal > contextLimit * 0.7
}

Files to Modify:

  • src/session/compaction.ts
  • src/session/prompt.ts

2. Enhanced Token Estimation

Current Behavior: Simple 4:1 character ratio estimation.

Improvements:

  • Use tiktoken for accurate OpenAI/Anthropic tokenization
  • Add provider-specific token estimators
  • Cache token counts to avoid recalculation
// src/util/token.ts - Enhanced estimation
import { get_encoding } from "@dqbd/tiktoken"

const encoder = get_encoding("cl100k_base")

export namespace Token {
  export function estimate(input: string): number {
    return encoder.encode(input).length
  }

  export function estimateMessages(messages: ModelMessage[]): number {
    const perMessageOverhead = 3 // <|start|> role content <|end|>
    const base = messages.length * perMessageOverhead
    const content = messages.reduce((acc, msg) => {
      if (typeof msg.content === "string") {
        return acc + encoder.encode(msg.content).length
      }
      return acc + encoder.encode(JSON.stringify(msg.content)).length
    }, 0)
    return base + content
  }
}
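
The third bullet, caching token counts, can sit on top of this as a thin memo layer; a sketch (estimateCached and the cache Map are hypothetical, not existing code):

// Hypothetical memo layer over the tiktoken encoder
const tokenCache = new Map<string, number>()

export function estimateCached(input: string): number {
  const key = Bun.hash(input).toString() // fast non-cryptographic hash
  const hit = tokenCache.get(key)
  if (hit !== undefined) return hit
  const count = encoder.encode(input).length
  tokenCache.set(key, count)
  return count
}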

Files to Modify:

  • src/util/token.ts
  • src/provider/provider.ts (add token limits)

3. Intelligent Tool Output Management

Current Behavior: Fixed truncation at 2000 lines / 50KB.

Improvements:

  • Content-Aware Truncation:
    • Code: Keep function signatures, truncate bodies
    • Logs: Keep head+tail, truncate middle
    • JSON: Preserve structure, truncate arrays
    • Errors: Never truncate
// src/tool/truncation.ts - Smart truncation
export async function smartTruncate(
  content: string,
  options: SmartTruncateOptions = {}
): Promise<Truncate.Result> {
  const { fileType = detectFileType(content), maxTokens = 8000 } = options

  switch (fileType) {
    case "code":
      return truncateCode(content, maxTokens)
    case "logs":
      return truncateLogs(content, maxTokens)
    case "json":
      return truncateJSON(content, maxTokens)
    case "error":
      return { content, truncated: false }
    default:
      return genericTruncate(content, maxTokens)
  }
}

function truncateCode(content: string, maxTokens: number): Truncate.Result {
  const lines = content.split("\n")
  const result: string[] = []

  let currentTokens = 0
  const overheadPerLine = 2 // rough per-line allowance for newline/formatting tokens

  for (const line of lines) {
    const lineTokens = Token.estimate(line)
    if (currentTokens + lineTokens + overheadPerLine > maxTokens) {
      break
    }

    // Always include function signatures
    if (line.match(/^(function|class|const|let|var|export|interface|type)/)) {
      result.push(line)
      currentTokens += lineTokens
      continue
    }

    // Once a block opens, summarize the body instead of emitting it
    if (result.length > 0 && result[result.length - 1].match(/{\s*$/)) {
      result.push(`  // ${lines.length - result.length} lines truncated...`)
      break
    }

    result.push(line)
    currentTokens += lineTokens
  }

  return formatResult(result, content)
}
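
truncateLogs, referenced in the switch above, would implement the head+tail strategy from the list; a sketch (the split ratios are assumptions):

// Keep the head and tail of a log, truncating the middle
function truncateLogs(content: string, maxTokens: number): Truncate.Result {
  if (Token.estimate(content) <= maxTokens) {
    return { content, truncated: false }
  }
  const lines = content.split("\n")
  const budgetChars = maxTokens * 4 // char budget via the 4:1 fallback ratio

  // 60% of the budget to the head (setup, first errors), 40% to the tail (final state)
  const head: string[] = []
  let used = 0
  for (const line of lines) {
    if (used + line.length > budgetChars * 0.6) break
    head.push(line)
    used += line.length + 1
  }

  const tail: string[] = []
  used = 0
  for (let i = lines.length - 1; i >= head.length; i--) {
    if (used + lines[i].length > budgetChars * 0.4) break
    tail.unshift(lines[i])
    used += lines[i].length + 1
  }

  const omitted = lines.length - head.length - tail.length
  return {
    content: [...head, `... ${omitted} middle lines truncated ...`, ...tail].join("\n"),
    truncated: true,
  }
}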

Files to Modify:

  • src/tool/truncation.ts
  • Add src/tool/truncation/code.ts
  • Add src/tool/truncation/logs.ts

4. Message History Optimization

Current Behavior: Full message history sent until compaction.

Improvements:

  • Importance Scoring: Prioritize messages by importance
  • Selective History: Remove low-value messages
  • Ephemeral Messages: Mark transient context for removal
// Message importance scoring
const MESSAGE_IMPORTANCE = {
  user_request: 100,
  file_edit: 90,
  agent_completion: 80,
  tool_success: 60,
  tool_output: 50,
  intermediate_result: 30,
  system_reminder: 20,
}

function scoreMessage(message: MessageV2.WithParts): number {
  let score = 0

  // Role-based scoring
  if (message.info.role === "user") score += MESSAGE_IMPORTANCE.user_request
  if (message.info.role === "assistant") {
    if (message.parts.some(p => p.type === "tool" && p.tool === "edit")) {
      score += MESSAGE_IMPORTANCE.file_edit
    } else {
      score += MESSAGE_IMPORTANCE.agent_completion
    }
  }

  // Tool call scoring
  for (const part of message.parts) {
    if (part.type === "tool") {
      const toolScore = getToolImportanceScore(part.tool)
      score += toolScore
    }
  }

  return score
}

// Selective history retention
async function getOptimizedHistory(
  sessionID: string,
  maxTokens: number
): Promise<MessageV2.WithParts[]> {
  const messages = await Session.messages({ sessionID })
  const scored = messages.map(msg => ({
    message: msg,
    score: scoreMessage(msg),
    tokens: Token.estimate(JSON.stringify(msg.parts)),
  }))

  // Always keep the most recent user message, regardless of score
  const lastUser = [...scored].reverse().find(s => s.message.info.role === "user")

  scored.sort((a, b) => b.score - a.score)

  const result: typeof scored = []
  let usedTokens = 0
  if (lastUser) {
    result.push(lastUser)
    usedTokens += lastUser.tokens
  }

  for (const item of scored) {
    if (item === lastUser) continue
    if (usedTokens + item.tokens > maxTokens) break

    // Keep only messages above the importance threshold
    if (item.score >= 50) {
      result.push(item)
      usedTokens += item.tokens
    }
  }

  // Restore chronological order (message IDs are assumed sortable)
  result.sort((a, b) => (a.message.info.id < b.message.info.id ? -1 : 1))
  return result.map(item => item.message)
}
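
getToolImportanceScore, used in scoreMessage above, is left undefined; it could map tool names onto the same scale (the mapping below is an assumption):

// Hypothetical tool-name → importance mapping
function getToolImportanceScore(tool: string): number {
  const scores: Record<string, number> = {
    edit: MESSAGE_IMPORTANCE.file_edit,
    write: MESSAGE_IMPORTANCE.file_edit,
    bash: MESSAGE_IMPORTANCE.tool_success,
    read: MESSAGE_IMPORTANCE.tool_output,
    grep: MESSAGE_IMPORTANCE.intermediate_result,
  }
  return scores[tool] ?? MESSAGE_IMPORTANCE.tool_output
}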

Files to Modify:

  • src/session/message-v2.ts
  • src/session/prompt.ts

5. System Prompt Compression

Current Behavior: Provider-specific prompts loaded from text files.

Improvements:

  • Audit and compress prompts
  • Move optional instructions to first user message
  • Create "minimal" mode for quick tasks
// src/session/system.ts - Compressed prompts
export namespace SystemPrompt {
  // Core instructions (always sent)
  const CORE_PROMPT = `You are an expert software engineering assistant.`

  // Optional instructions (sent based on context)
  const OPTIONAL_PROMPTS = {
    code_quality: `Focus on clean, maintainable code with proper error handling.`,
    testing: `Always write tests for new functionality.`,
    documentation: `Document complex logic and API surfaces.`,
  }

  export async function getCompressedPrompt(
    model: Provider.Model,
    context: PromptContext
  ): Promise<string[]> {
    const prompts: string[] = [CORE_PROMPT]

    // Add model-specific base prompt
    const basePrompt = getBasePrompt(model)
    prompts.push(basePrompt)

    // Conditionally add optional prompts
    if (context.needsQualityFocus) {
      prompts.push(OPTIONAL_PROMPTS.code_quality)
    }
    if (context.needsTesting) {
      prompts.push(OPTIONAL_PROMPTS.testing)
    }

    return prompts
  }
}
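
The "minimal" mode from the list then falls out naturally: pass a context with every optional flag off, so only the core and model base prompts are sent. For example (hypothetical call site):

// Minimal mode: core + base prompt only, no optional instructions
const prompts = await SystemPrompt.getCompressedPrompt(model, {
  needsQualityFocus: false,
  needsTesting: false,
})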

Files to Modify:

  • src/session/system.ts
  • src/session/prompt/*.txt

6. Smart Grep Result Limits

Current Behavior: Hard limit of 100 matches.

Improvements:

  • Reduce default to 50 matches
  • Add priority scoring based on relevance
  • Group matches by file
// src/tool/grep.ts - Enhanced result handling
import path from "node:path"

const DEFAULT_MATCH_LIMIT = 50
const PRIORITY_WEIGHTS = {
  recently_modified: 1.5,
  same_directory: 1.3,
  matching_extension: 1.2,
  exact_match: 1.1,
}

interface MatchPriority {
  match: Match
  score: number
}

function scoreMatch(match: Match, context: GrepContext): number {
  let score = 1.0

  // Recently modified files
  const fileAge = Date.now() - match.modTime
  if (fileAge < 7 * 24 * 60 * 60 * 1000) {
    score *= PRIORITY_WEIGHTS.recently_modified
  }

  // Same directory as current work
  if (match.path.startsWith(context.cwd)) {
    score *= PRIORITY_WEIGHTS.same_directory
  }

  // Matching extension
  if (context.targetExtensions.includes(path.extname(match.path))) {
    score *= PRIORITY_WEIGHTS.matching_extension
  }

  return score
}

export async function execute(params: GrepParams, ctx: Tool.Context) {
  const results = await ripgrep(params)
  const scored: MatchPriority[] = results.map(match => ({
    match,
    score: scoreMatch(match, ctx),
  }))

  scored.sort((a, b) => b.score - a.score)

  const limit = params.limit ?? DEFAULT_MATCH_LIMIT
  const topMatches = scored.slice(0, limit)

  return formatGroupedOutput(topMatches)
}

function formatGroupedOutput(matches: MatchPriority[]): ToolResult {
  // Group matches by file, as promised in the improvements list
  const byFile = groupBy(matches, m => m.match.path)

  const output: string[] = []
  output.push(`Found ${matches.length} matches across ${byFile.size} files\n`)

  for (const [file, fileMatches] of byFile) {
    output.push(`\n${file}:`)
    for (const { match, score } of fileMatches.slice(0, 10)) {
      const relevance = score > 1.0 ? " [high relevance]" : ""
      output.push(`  Line ${match.lineNum}: ${match.lineText}${relevance}`)
    }
    if (fileMatches.length > 10) {
      output.push(`  ... and ${fileMatches.length - 10} more`)
    }
  }

  return { output: output.join("\n") }
}
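
groupBy above is assumed rather than imported; a minimal Map-based version:

// Helper: group items into a Map keyed by the selector (insertion-ordered)
function groupBy<T, K>(items: T[], key: (item: T) => K): Map<K, T[]> {
  const groups = new Map<K, T[]>()
  for (const item of items) {
    const k = key(item)
    const group = groups.get(k)
    if (group) group.push(item)
    else groups.set(k, [item])
  }
  return groups
}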

Files to Modify:

  • src/tool/grep.ts

7. Web Search Context Optimization

Current Behavior: 10,000 character default limit.

Improvements:

  • Reduce default to 6,000 characters
  • Content quality scoring
  • Query-relevant extraction
// src/tool/websearch.ts - Optimized content extraction
const DEFAULT_CONTEXT_CHARS = 6000

interface ContentQualityScore {
  source: string
  score: number
  relevantSections: string[]
}

function scoreAndExtract(
  content: string,
  query: string
): ContentQualityScore {
  const paragraphs = content.split(/\n\n+/)
  const queryTerms = query.toLowerCase().split(/\s+/)

  const scored = paragraphs.map((para, i) => {
    const lower = para.toLowerCase()
    const termMatches = queryTerms.filter(term => lower.includes(term)).length
    const density = termMatches / Math.max(para.length, 1)
    const position = i / paragraphs.length

    return {
      para,
      score: termMatches * 2 + density * 1000 + (1 - position) * 0.5,
    }
  })

  scored.sort((a, b) => b.score - a.score)

  const relevant: string[] = []
  let usedChars = 0

  for (const { para } of scored) {
    if (usedChars + para.length > DEFAULT_CONTEXT_CHARS) break
    relevant.push(para)
    usedChars += para.length
  }

  return {
    source: content.substring(0, DEFAULT_CONTEXT_CHARS),
    score: scored[0]?.score ?? 0,
    relevantSections: relevant,
  }
}

export async function execute(params: WebSearchParams, ctx: Tool.Context) {
  const response = await exaSearch({
    query: params.query,
    numResults: params.numResults ?? 8,
    type: params.type ?? "auto",
    livecrawl: params.livecrawl ?? "fallback",
    contextMaxCharacters: DEFAULT_CONTEXT_CHARS,
  })

  const optimized = optimizeResults(response.results, params.query)

  return {
    output: formatOptimizedResults(optimized),
    metadata: {
      originalChars: response.totalChars,
      optimizedChars: optimized.totalChars,
      savings: 1 - (optimized.totalChars / response.totalChars),
    },
  }
}
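
optimizeResults and formatOptimizedResults are assumed above; the former would apply scoreAndExtract per result and tally the reduction. A sketch (the result shape is an assumption):

// Hypothetical per-result aggregation over the search response
function optimizeResults(results: { url: string; text: string }[], query: string) {
  const optimized = results.map(r => ({
    url: r.url,
    sections: scoreAndExtract(r.text, query).relevantSections,
  }))
  const totalChars = optimized.reduce(
    (acc, r) => acc + r.sections.join("\n\n").length,
    0,
  )
  return { results: optimized, totalChars }
}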

Files to Modify:

  • src/tool/websearch.ts

8. File Read Optimization

Current Behavior: Full file content sent unless offset/limit specified.

Improvements:

  • Default limits based on file type
  • Smart offset detection (function boundaries)
// src/tool/read.ts - Optimized file reading
import path from "node:path"
import mime from "mime-types"

const FILE_TYPE_CONFIGS: Record<string, FileReadConfig> = {
  ".json": { defaultLimit: Infinity, truncate: false },
  ".md": { defaultLimit: 2000, truncate: true },
  ".ts": { defaultLimit: 400, truncate: true },
  ".js": { defaultLimit: 400, truncate: true },
  ".py": { defaultLimit: 400, truncate: true },
  ".yml": { defaultLimit: 500, truncate: true },
  ".yaml": { defaultLimit: 500, truncate: true },
  ".txt": { defaultLimit: 1000, truncate: true },
  default: { defaultLimit: 300, truncate: true },
}

export async function execute(params: ReadParams, ctx: Tool.Context) {
  const ext = path.extname(params.filePath)
  const config = FILE_TYPE_CONFIGS[ext] ?? FILE_TYPE_CONFIGS.default

  const offset = params.offset ?? 0
  const limit = params.limit ?? config.defaultLimit

  const file = Bun.file(params.filePath)
  const content = await file.text()
  const lines = content.split("\n")

  if (!config.truncate || lines.length <= limit + offset) {
    return {
      output: content,
      attachments: [],
    }
  }

  const displayedLines = lines.slice(offset, offset + limit)
  const output = [
    ...displayedLines,
    "",
    `... ${lines.length - displayedLines.length} lines truncated ...`,
    "",
    `File: ${params.filePath}`,
    `Lines: ${offset + 1}-${offset + limit} of ${lines.length}`,
  ].join("\n")

  return {
    output,
    attachments: [{
      type: "file",
      filename: params.filePath,
      mime: mime.lookup(params.filePath) || "text/plain",
      url: `data:text/plain;base64,${Buffer.from(content).toString("base64")}`,
    }],
  }
}
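
The second bullet, smart offset detection, is not implemented above; one possible heuristic snaps the requested offset back to the nearest preceding declaration (the regex and lookback window are assumptions):

// Hypothetical: snap a line offset back to the nearest enclosing declaration
function snapToBoundary(lines: string[], offset: number): number {
  const boundary = /^(export\s+)?(async\s+)?(function|class|const|interface|type)\b/
  for (let i = Math.min(offset, lines.length - 1); i >= Math.max(0, offset - 50); i--) {
    if (boundary.test(lines[i])) return i
  }
  return offset
}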

Files to Modify:

  • src/tool/read.ts

9. Context Window Budgeting

Current Behavior: Fixed 32K output token reservation.

Improvements:

  • Dynamic budget allocation based on task type
  • Model-specific optimizations
// src/session/prompt.ts - Dynamic budget allocation
const TASK_BUDGETS: Record<string, TaskBudget> = {
  code_generation: { inputRatio: 0.5, outputRatio: 0.5 },
  exploration: { inputRatio: 0.8, outputRatio: 0.2 },
  qa: { inputRatio: 0.7, outputRatio: 0.3 },
  refactoring: { inputRatio: 0.6, outputRatio: 0.4 },
  debugging: { inputRatio: 0.7, outputRatio: 0.3 },
  default: { inputRatio: 0.6, outputRatio: 0.4 },
}

interface BudgetCalculation {
  inputBudget: number
  outputBudget: number
  totalBudget: number
}

function calculateBudget(
  model: Provider.Model,
  taskType: string,
  estimatedInputTokens: number
): BudgetCalculation {
  const config = TASK_BUDGETS[taskType] ?? TASK_BUDGETS.default
  const modelContext = model.limit.context
  const modelMaxOutput = model.limit.output

  // Dynamic budget based on task type
  const baseBudget = Math.min(modelContext, estimatedInputTokens * 2)
  const outputBudget = Math.min(
    modelMaxOutput,
    Math.floor(baseBudget * config.outputRatio),
    SessionPrompt.OUTPUT_TOKEN_MAX
  )
  const inputBudget = Math.floor(baseBudget * config.inputRatio)

  return {
    inputBudget,
    outputBudget,
    totalBudget: inputBudget + outputBudget,
  }
}

async function checkAndAdjustPrompt(
  messages: ModelMessage[],
  budget: BudgetCalculation
): Promise<ModelMessage[]> {
  const currentTokens = Token.estimateMessages(messages)

  if (currentTokens <= budget.inputBudget) {
    return messages
  }

  // Need to reduce - prioritize recent messages
  const result = await pruneMessagesToBudget(messages, budget.inputBudget)
  return result
}
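
pruneMessagesToBudget is referenced but not defined; a simple sketch that drops the oldest non-system messages until the estimate fits:

// Drop oldest non-system messages until the estimate fits the budget
async function pruneMessagesToBudget(
  messages: ModelMessage[],
  budget: number
): Promise<ModelMessage[]> {
  const result = [...messages]
  while (Token.estimateMessages(result) > budget) {
    const idx = result.findIndex(m => m.role !== "system")
    if (idx === -1) break // only system prompts left; nothing safe to drop
    result.splice(idx, 1)
  }
  return result
}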

Files to Modify:

  • src/session/prompt.ts
  • src/session/compaction.ts

10. Duplicate Detection

Current Behavior: No deduplication of content.

Improvements:

  • Hash and track tool outputs
  • Skip identical subsequent calls
  • Cache read file contents
// src/session/duplicate-detection.ts
const outputHashCache = new Map<string, string>()

function getContentHash(content: string): string {
  // Bun.CryptoHasher for a stable SHA-256 digest
  return new Bun.CryptoHasher("sha256").update(content).digest("hex")
}

async function deduplicateToolOutput(
  toolId: string,
  input: Record<string, unknown>,
  content: string
): Promise<{ isDuplicate: boolean; output: string }> {
  const hash = getContentHash(content)
  const key = `${toolId}:${JSON.stringify(input)}:${hash}`

  if (outputHashCache.has(key)) {
    return {
      isDuplicate: true,
      output: outputHashCache.get(key)!,
    }
  }

  outputHashCache.set(key, content)
  return { isDuplicate: false, output: content }
}

// In tool execution
async function executeTool(tool: Tool.Info, args: Record<string, unknown>) {
  const content = await tool.execute(args)

  const { isDuplicate, output } = await deduplicateToolOutput(
    tool.id,
    args,
    content.output
  )

  if (isDuplicate) {
    log.debug("Skipping duplicate tool output", { tool: tool.id })
    return {
      ...content,
      output: `[Previous identical output: ${content.output.substring(0, 100)}...]`,
      metadata: { ...content.metadata, duplicate: true },
    }
  }

  return content
}
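
One caveat: outputHashCache above grows without bound over a long session. A minimal size cap (the eviction policy is an assumption):

// Cap the cache; evict the oldest entry first (Map preserves insertion order)
const MAX_CACHE_ENTRIES = 1000

function cacheSet(key: string, value: string) {
  if (outputHashCache.size >= MAX_CACHE_ENTRIES) {
    const oldest = outputHashCache.keys().next().value
    if (oldest !== undefined) outputHashCache.delete(oldest)
  }
  outputHashCache.set(key, value)
}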

Files to Modify:

  • Add src/session/duplicate-detection.ts
  • src/tool/tool.ts

Implementation Priority

Phase 1: High Impact, Low Risk

| Priority | Improvement | Estimated Token Savings | Risk |
|----------|-------------|-------------------------|------|
| 1 | Enhanced Token Estimation | 5-15% | Low |
| 2 | Smart Grep Limits | 10-20% | Low |
| 3 | Web Search Optimization | 20-30% | Low |
| 4 | System Prompt Compression | 5-10% | Low |

Phase 2: Medium Impact, Medium Risk

| Priority | Improvement | Estimated Token Savings | Risk |
|----------|-------------|-------------------------|------|
| 5 | Tool Output Management | 15-25% | Medium |
| 6 | Message History Optimization | 20-30% | Medium |
| 7 | File Read Limits | 10-20% | Medium |

Phase 3: High Impact, Higher Complexity

| Priority | Improvement | Estimated Token Savings | Risk |
|----------|-------------|-------------------------|------|
| 8 | Smart Compaction | 25-40% | High |
| 9 | Context Budgeting | 15-25% | High |
| 10 | Duplicate Detection | 10-15% | Medium |

Quality Preservation

Testing Strategy

  1. A/B Testing: Compare outputs before/after each optimization
  2. Quality Metrics: Track success rate, user satisfaction, task completion
  3. Rollback Mechanism: Config flags to disable optimizations per-session
// Config schema for optimization controls
import { z } from "zod"

const OptimizationConfig = z.object({
  smart_compaction: z.boolean().default(true),
  enhanced_estimation: z.boolean().default(true),
  smart_truncation: z.boolean().default(true),
  message_pruning: z.boolean().default(true),
  system_prompt_compression: z.boolean().default(true),
  grep_optimization: z.boolean().default(true),
  websearch_optimization: z.boolean().default(true),
  file_read_limits: z.boolean().default(true),
  context_budgeting: z.boolean().default(true),
  duplicate_detection: z.boolean().default(true),
})

// Usage
const config = await Config.get()
const optimizations = config.optimizations ?? {}
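
Each flag then gates the corresponding code path; for example (legacyTruncate is hypothetical shorthand for the current fixed-limit behavior):

// Fall back to the legacy path when an optimization is disabled
const result = optimizations.smart_truncation
  ? await smartTruncate(content)
  : await legacyTruncate(content) // hypothetical: existing 2000-line/50KB truncation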

Monitoring

// Token efficiency metrics
export async function trackTokenMetrics(sessionID: string) {
  const messages = await Session.messages({ sessionID })

  const metrics = {
    totalTokens: 0,
    inputTokens: 0,
    outputTokens: 0,
    optimizationSavings: 0,
    compactionCount: 0,
    truncationCount: 0,
  }

  for (const msg of messages) {
    // Only messages that carry token usage contribute (assistant messages; schema assumed)
    const tokens = msg.info.tokens
    if (tokens) {
      metrics.totalTokens += tokens.input + tokens.output
      metrics.inputTokens += tokens.input
      metrics.outputTokens += tokens.output
    }

    if (msg.info.mode === "compaction") {
      metrics.compactionCount++
    }
  }

  return metrics
}

Configuration

Environment Variables

# Token optimization controls
OPENCODE_TOKEN_ESTIMATION=accurate    # accurate (tiktoken) or legacy (4:1)
OPENCODE_TRUNCATION_MODE=smart        # smart or legacy (fixed limits)
OPENCODE_COMPACTION_THRESHOLD=0.7     # trigger at 70% of context
OPENCODE_GREP_LIMIT=50                # default match limit
OPENCODE_WEBSEARCH_CHARS=6000         # default context characters
OPENCODE_FILE_READ_LIMIT=400          # default lines for code files
OPENCODE_OUTPUT_BUDGET_RATIO=0.4      # fraction of the budget reserved for output
OPENCODE_DUPLICATE_DETECTION=true     # enable the duplicate-output cache

Per-Model Configuration

{
  "models": {
    "gpt-4o": {
      "context_limit": 128000,
      "output_limit": 16384,
      "token_budget": {
        "code_generation": { "input_ratio": 0.5, "output_ratio": 0.5 },
        "exploration": { "input_ratio": 0.8, "output_ratio": 0.2 }
      }
    },
    "claude-sonnet-4-20250514": {
      "context_limit": 200000,
      "output_limit": 8192,
      "supports_prompt_cache": true
    }
  }
}

Migration Guide

Upgrading from Legacy Token Estimation

// Before (4:1 ratio)
const tokens = content.length / 4

// After (tiktoken)
const tokens = Token.estimate(content)

Upgrading from Legacy Truncation

// Before (fixed limits)
if (lines.length > 2000 || bytes > 51200) {
  truncate(content)
}

// After (smart truncation)
const result = await Truncate.smart(content, {
  fileType: detectFileType(content),
  maxTokens: 8000,
})

Best Practices

  1. Measure First: Always measure token usage before and after changes
  2. Incrementally Roll Out: Deploy optimizations gradually
  3. User Control: Allow users to override defaults
  4. Monitor Quality: Track task success rates alongside token savings
  5. Fallback Ready: Have fallback mechanisms for when optimizations fail

References

  • Files: src/util/token.ts, src/tool/truncation.ts, src/session/compaction.ts, src/session/prompt.ts, src/session/message-v2.ts, src/tool/grep.ts, src/tool/websearch.ts, src/tool/read.ts, src/session/system.ts
  • Dependencies: tiktoken, @dqbd/tiktoken
  • Related Issues: Context overflow handling, token tracking, prompt optimization