Token Efficiency Guide for OpenCode CLI
This guide documents strategies to optimize token usage in the OpenCode CLI without compromising output quality.
Overview
Token efficiency is critical for:
- Reducing API costs
- Avoiding context window overflows
- Improving response latency
- Enabling longer conversations
OpenCode already has several optimization mechanisms (compaction, truncation, pruning), but there are opportunities to improve further.
Current Token Management
Existing Mechanisms
| Mechanism | Location | Description |
|---|---|---|
| Compaction | src/session/compaction.ts | Summarizes conversation history when context is exceeded |
| Truncation | src/tool/truncation.ts | Limits tool outputs to 2000 lines / 50KB |
| Pruning | src/session/compaction.ts:41-90 | Removes old tool outputs beyond 40K tokens |
| Token Estimation | src/util/token.ts | Uses 4:1 character-to-token ratio |
Token Flow
User Input → Message Construction → System Prompts → Tool Definitions → Message History → LLM API
Recommended Improvements
1. Smart Compaction Strategy
Current Behavior: Compaction triggers reactively, only after token usage exceeds the threshold.
Improvements:
- Predictive Compaction: Analyze token growth patterns and compact proactively before reaching limits
- Configurable Thresholds: Allow compaction at 70-80% of context instead of 100%
- Task-Aware Triggers: Compact before expensive operations (file edits, builds)
// Example: Predictive compaction logic (sketch; token accounting is approximate)
async function predictCompactionNeed(
  messages: MessageV2.WithParts[],
  contextLimit: number,
): Promise<boolean> {
  const estimate = (msg: MessageV2.WithParts) => Token.estimate(JSON.stringify(msg.parts))
  // Average token growth over the most recent messages
  const recent = messages.slice(-5)
  const trend = recent.reduce((acc, msg) => acc + estimate(msg), 0) / Math.max(recent.length, 1)
  const current = messages.reduce((acc, msg) => acc + estimate(msg), 0)
  // Project three messages ahead and compact early at 70% of the context window
  return current + trend * 3 > contextLimit * 0.7
}
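A task-aware trigger (the third bullet above) could call this check right before known-expensive tool calls. A minimal sketch; the tool names and the SessionCompaction.run entry point are assumptions for illustration, not the actual OpenCode API:
// Hypothetical pre-tool hook: compact proactively before expensive operations
const EXPENSIVE_TOOLS = new Set(["edit", "write", "bash"])
async function compactBeforeExpensiveTool(tool: string, sessionID: string, contextLimit: number) {
  if (!EXPENSIVE_TOOLS.has(tool)) return
  const messages = await Session.messages({ sessionID })
  if (await predictCompactionNeed(messages, contextLimit)) {
    // Summarize history now so the expensive tool output has room to land
    await SessionCompaction.run({ sessionID })
  }
}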
Files to Modify:
- src/session/compaction.ts
- src/session/prompt.ts
2. Enhanced Token Estimation
Current Behavior: Simple 4:1 character ratio estimation.
Improvements:
- Use tiktoken for accurate OpenAI/Anthropic tokenization
- Add provider-specific token estimators
- Cache token counts to avoid recalculation
// src/util/token.ts - Enhanced estimation (sketch; assumes the @dqbd/tiktoken WASM bindings)
import { get_encoding } from "@dqbd/tiktoken"
const encoder = get_encoding("cl100k_base")
export namespace Token {
export function estimate(input: string): number {
return encoder.encode(input).length
}
export function estimateMessages(messages: ModelMessage[]): number {
const perMessageOverhead = 3 // <|start|> role content <|end|>
const base = messages.length * perMessageOverhead
const content = messages.reduce((acc, msg) => {
if (typeof msg.content === "string") {
return acc + encoder.encode(msg.content).length
}
return acc + encoder.encode(JSON.stringify(msg.content)).length
}, 0)
return base + content
}
}
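The caching bullet above is not covered by the estimator itself; a minimal memoization sketch (the size cap and eviction policy are arbitrary choices for illustration):
// Memoize estimates for strings that recur verbatim (system prompts, unchanged tool output)
const tokenCache = new Map<string, number>()
const TOKEN_CACHE_MAX = 5000
export function estimateCached(input: string): number {
  const hit = tokenCache.get(input)
  if (hit !== undefined) return hit
  const count = Token.estimate(input)
  if (tokenCache.size >= TOKEN_CACHE_MAX) {
    // Evict the oldest entry; Maps iterate in insertion order
    const oldest = tokenCache.keys().next().value
    if (oldest !== undefined) tokenCache.delete(oldest)
  }
  tokenCache.set(input, count)
  return count
}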
Files to Modify:
- src/util/token.ts
- src/provider/provider.ts (add token limits)
3. Intelligent Tool Output Management
Current Behavior: Fixed truncation at 2000 lines / 50KB.
Improvements:
- Content-Aware Truncation:
- Code: Keep function signatures, truncate bodies
- Logs: Keep head+tail, truncate middle
- JSON: Preserve structure, truncate arrays
- Errors: Never truncate
// src/tool/truncation.ts - Smart truncation
export async function smartTruncate(
content: string,
options: SmartTruncateOptions = {}
): Promise<Truncate.Result> {
const { fileType = detectFileType(content), maxTokens = 8000 } = options
switch (fileType) {
case "code":
return truncateCode(content, maxTokens)
case "logs":
return truncateLogs(content, maxTokens)
case "json":
return truncateJSON(content, maxTokens)
case "error":
return { content, truncated: false }
default:
return genericTruncate(content, maxTokens)
}
}
function truncateCode(content: string, maxTokens: number): Truncate.Result {
const lines = content.split("\n")
const result: string[] = []
let currentTokens = 0
const overheadPerLine = 2 // rough per-line allowance for newline/formatting tokens
for (const line of lines) {
const lineTokens = Token.estimate(line)
if (currentTokens + lineTokens + overheadPerLine > maxTokens) {
break
}
// Always include function signatures
if (line.match(/^(function|class|const|let|var|export|interface|type)/)) {
result.push(line)
currentTokens += lineTokens
continue
}
// Once the previous kept line opens a block, note the omission and stop
if (result.length > 0 && result[result.length - 1].match(/^{$/)) {
result.push(` // ${lines.length - result.length} lines truncated...`)
break
}
result.push(line)
currentTokens += lineTokens
}
return formatResult(result, content)
}
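truncateLogs (the head+tail case from the list above) is referenced by smartTruncate but not shown; a minimal sketch using the same Token.estimate budget:
// Keep the start and end of a log and drop the middle; failures usually
// surface at the tail while setup context sits at the head
function truncateLogs(content: string, maxTokens: number): Truncate.Result {
  if (Token.estimate(content) <= maxTokens) return { content, truncated: false }
  const lines = content.split("\n")
  const take = (source: string[], budget: number) => {
    const out: string[] = []
    let used = 0
    for (const line of source) {
      const cost = Token.estimate(line) + 1
      if (used + cost > budget) break
      out.push(line)
      used += cost
    }
    return out
  }
  const head = take(lines, Math.floor(maxTokens * 0.3))
  const tail = take([...lines].reverse(), Math.floor(maxTokens * 0.6)).reverse()
  const omitted = Math.max(lines.length - head.length - tail.length, 0)
  return {
    content: [...head, `... ${omitted} lines truncated ...`, ...tail].join("\n"),
    truncated: true,
  }
}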
Files to Modify:
- src/tool/truncation.ts
- Add src/tool/truncation/code.ts
- Add src/tool/truncation/logs.ts
4. Message History Optimization
Current Behavior: Full message history sent until compaction.
Improvements:
- Importance Scoring: Prioritize messages by importance
- Selective History: Remove low-value messages
- Ephemeral Messages: Mark transient context for removal
// Message importance scoring
const MESSAGE_IMPORTANCE = {
user_request: 100,
file_edit: 90,
agent_completion: 80,
tool_success: 60,
tool_output: 50,
intermediate_result: 30,
system_reminder: 20,
}
function scoreMessage(message: MessageV2.WithParts): number {
let score = 0
// Role-based scoring
if (message.info.role === "user") score += MESSAGE_IMPORTANCE.user_request
if (message.info.role === "assistant") {
if (message.parts.some(p => p.type === "tool" && p.tool === "edit")) {
score += MESSAGE_IMPORTANCE.file_edit
} else {
score += MESSAGE_IMPORTANCE.agent_completion
}
}
// Tool call scoring
for (const part of message.parts) {
if (part.type === "tool") {
const toolScore = getToolImportanceScore(part.tool)
score += toolScore
}
}
return score
}
// Selective history retention
async function getOptimizedHistory(
sessionID: string,
maxTokens: number
): Promise<MessageV2.WithParts[]> {
const messages = await Session.messages({ sessionID })
const scored = messages.map(msg => ({
message: msg,
score: scoreMessage(msg),
tokens: Token.estimate(msg),
}))
scored.sort((a, b) => b.score - a.score)
const result: MessageV2.WithParts[] = []
let usedTokens = 0
for (const item of scored) {
if (usedTokens + item.tokens > maxTokens) break
// Always keep last user message
if (item.message.info.role === "user" &&
result.length > 0 &&
result[result.length - 1].info.id < item.message.info.id) {
result.push(item.message)
usedTokens += item.tokens
continue
}
// Keep if high importance score
if (item.score >= 50) {
result.push(item.message)
usedTokens += item.tokens
}
}
return result.reverse()
}
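getToolImportanceScore is used in scoreMessage but not defined; one possible mapping onto the weights above (the tool names and values are illustrative):
// Map tool IDs onto the importance table; unknown tools get a low default
const TOOL_IMPORTANCE: Record<string, number> = {
  edit: MESSAGE_IMPORTANCE.file_edit,
  write: MESSAGE_IMPORTANCE.file_edit,
  bash: MESSAGE_IMPORTANCE.tool_success,
  read: MESSAGE_IMPORTANCE.tool_output,
  grep: MESSAGE_IMPORTANCE.intermediate_result,
  glob: MESSAGE_IMPORTANCE.intermediate_result,
}
function getToolImportanceScore(tool: string): number {
  return TOOL_IMPORTANCE[tool] ?? MESSAGE_IMPORTANCE.intermediate_result
}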
Files to Modify:
- src/session/message-v2.ts
- src/session/prompt.ts
5. System Prompt Compression
Current Behavior: Provider-specific prompts loaded from text files.
Improvements:
- Audit and compress prompts
- Move optional instructions to first user message
- Create "minimal" mode for quick tasks
// src/session/system.ts - Compressed prompts
export namespace SystemPrompt {
// Core instructions (always sent)
const CORE_PROMPT = `You are an expert software engineering assistant.`
// Optional instructions (sent based on context)
const OPTIONAL_PROMPTS = {
code_quality: `Focus on clean, maintainable code with proper error handling.`,
testing: `Always write tests for new functionality.`,
documentation: `Document complex logic and API surfaces.`,
}
export async function getCompressedPrompt(
model: Provider.Model,
context: PromptContext
): Promise<string[]> {
const prompts: string[] = [CORE_PROMPT]
// Add model-specific base prompt
const basePrompt = getBasePrompt(model)
prompts.push(basePrompt)
// Conditionally add optional prompts
if (context.needsQualityFocus) {
prompts.push(OPTIONAL_PROMPTS.code_quality)
}
if (context.needsTesting) {
prompts.push(OPTIONAL_PROMPTS.testing)
}
return prompts
}
}
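The "minimal" mode from the list above could skip the optional prompts entirely for quick tasks; a short sketch, added inside the same namespace next to getCompressedPrompt (getBasePrompt is used as above):
// Minimal mode for quick tasks: core instruction plus the model's base prompt only
export async function getMinimalPrompt(model: Provider.Model): Promise<string[]> {
  return [CORE_PROMPT, getBasePrompt(model)]
}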
Files to Modify:
- src/session/system.ts
- src/session/prompt/*.txt
6. Smart Grep Result Limits
Current Behavior: Hard limit of 100 matches.
Improvements:
- Reduce default to 50 matches
- Add priority scoring based on relevance
- Group matches by file
// src/tool/grep.ts - Enhanced result handling
const DEFAULT_MATCH_LIMIT = 50
const PRIORITY_WEIGHTS = {
recently_modified: 1.5,
same_directory: 1.3,
matching_extension: 1.2,
exact_match: 1.1,
}
interface MatchPriority {
match: Match
score: number
}
function scoreMatch(match: Match, context: GrepContext): number {
let score = 1.0
// Recently modified files
const fileAge = Date.now() - match.modTime
if (fileAge < 7 * 24 * 60 * 60 * 1000) {
score *= PRIORITY_WEIGHTS.recently_modified
}
// Same directory as current work
if (match.path.startsWith(context.cwd)) {
score *= PRIORITY_WEIGHTS.same_directory
}
// Matching extension
if (context.targetExtensions.includes(path.extname(match.path))) {
score *= PRIORITY_WEIGHTS.matching_extension
}
return score
}
export async function execute(params: GrepParams, ctx: Tool.Context) {
const results = await ripgrep(params)
const scored: MatchPriority[] = results.map(match => ({
match,
score: scoreMatch(match, ctx),
}))
scored.sort((a, b) => b.score - a.score)
const limit = params.limit ?? DEFAULT_MATCH_LIMIT
const topMatches = scored.slice(0, limit)
return formatGroupedOutput(topMatches)
}
function formatGroupedOutput(matches: MatchPriority[]): ToolResult {
const byFile = groupBy(matches, m => m.match.path)
const output: string[] = []
output.push(`Found ${matches.length} matches across ${byFile.size} files\n`)
for (const [file, fileMatches] of byFile) {
output.push(`\n${file}:`)
for (const { match, score } of fileMatches.slice(0, 10)) {
const relevance = score > 1.0 ? " [high relevance]" : ""
output.push(` Line ${match.lineNum}: ${match.lineText}${relevance}`)
}
if (fileMatches.length > 10) {
output.push(` ... and ${fileMatches.length - 10} more`)
}
}
return { output: output.join("\n") }
}
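groupBy is assumed above rather than imported; a small helper returning a Map (matching the byFile.size usage):
// Group items into a Map keyed by `key`, preserving first-seen order
function groupBy<T, K>(items: T[], key: (item: T) => K): Map<K, T[]> {
  const groups = new Map<K, T[]>()
  for (const item of items) {
    const k = key(item)
    const bucket = groups.get(k)
    if (bucket) bucket.push(item)
    else groups.set(k, [item])
  }
  return groups
}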
Files to Modify:
src/tool/grep.ts
7. Web Search Context Optimization
Current Behavior: 10,000 character default limit.
Improvements:
- Reduce default to 6,000 characters
- Content quality scoring
- Query-relevant extraction
// src/tool/websearch.ts - Optimized content extraction
const DEFAULT_CONTEXT_CHARS = 6000
interface ContentQualityScore {
source: string
score: number
relevantSections: string[]
}
function scoreAndExtract(
content: string,
query: string
): ContentQualityScore {
const paragraphs = content.split(/\n\n+/)
const queryTerms = query.toLowerCase().split(/\s+/)
const scored = paragraphs.map(para => {
const lower = para.toLowerCase()
const termMatches = queryTerms.filter(term => lower.includes(term)).length
const density = termMatches / para.length
const position = paragraphs.indexOf(para) / paragraphs.length
return {
para,
score: termMatches * 2 + density * 1000 + (1 - position) * 0.5,
}
})
scored.sort((a, b) => b.score - a.score)
const relevant: string[] = []
let usedChars = 0
for (const { para } of scored) {
if (usedChars + para.length > DEFAULT_CONTEXT_CHARS) break
relevant.push(para)
usedChars += para.length
}
return {
source: content.substring(0, DEFAULT_CONTEXT_CHARS),
score: scored[0]?.score ?? 0,
relevantSections: relevant,
}
}
export async function execute(params: WebSearchParams, ctx: Tool.Context) {
const response = await exaSearch({
query: params.query,
numResults: params.numResults ?? 8,
type: params.type ?? "auto",
livecrawl: params.livecrawl ?? "fallback",
contextMaxCharacters: DEFAULT_CONTEXT_CHARS,
})
const optimized = optimizeResults(response.results, params.query)
return {
output: formatOptimizedResults(optimized),
metadata: {
originalChars: response.totalChars,
optimizedChars: optimized.totalChars,
savings: 1 - (optimized.totalChars / response.totalChars),
},
}
}
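optimizeResults and formatOptimizedResults are referenced above but not shown; a sketch that runs scoreAndExtract per result (the url/text fields are assumptions about the Exa response shape):
interface OptimizedResult {
  url: string
  sections: string[]
  chars: number
}
function optimizeResults(results: { url: string; text: string }[], query: string) {
  const optimized: OptimizedResult[] = results.map((r) => {
    const { relevantSections } = scoreAndExtract(r.text, query)
    return {
      url: r.url,
      sections: relevantSections,
      chars: relevantSections.reduce((acc, s) => acc + s.length, 0),
    }
  })
  return {
    results: optimized,
    totalChars: optimized.reduce((acc, r) => acc + r.chars, 0),
  }
}
function formatOptimizedResults(optimized: ReturnType<typeof optimizeResults>): string {
  // One block per source: URL followed by the query-relevant paragraphs
  return optimized.results.map((r) => `${r.url}\n${r.sections.join("\n\n")}`).join("\n\n---\n\n")
}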
Files to Modify:
src/tool/websearch.ts
8. File Read Optimization
Current Behavior: Full file content sent unless offset/limit specified.
Improvements:
- Default limits based on file type
- Smart offset detection (function boundaries)
// src/tool/read.ts - Optimized file reading
const FILE_TYPE_CONFIGS: Record<string, FileReadConfig> = {
".json": { defaultLimit: Infinity, truncate: false },
".md": { defaultLimit: 2000, truncate: true },
".ts": { defaultLimit: 400, truncate: true },
".js": { defaultLimit: 400, truncate: true },
".py": { defaultLimit: 400, truncate: true },
".yml": { defaultLimit: 500, truncate: true },
".yaml": { defaultLimit: 500, truncate: true },
".txt": { defaultLimit: 1000, truncate: true },
default: { defaultLimit: 300, truncate: true },
}
export async function execute(params: ReadParams, ctx: Tool.Context) {
const ext = path.extname(params.filePath)
const config = FILE_TYPE_CONFIGS[ext] ?? FILE_TYPE_CONFIGS.default
const offset = params.offset ?? 0
const limit = params.limit ?? config.defaultLimit
const file = Bun.file(params.filePath)
const content = await file.text()
const lines = content.split("\n")
if (!config.truncate || lines.length <= limit + offset) {
return {
output: content,
attachments: [],
}
}
const displayedLines = lines.slice(offset, offset + limit)
const output = [
...displayedLines,
"",
`... ${lines.length - displayedLines.length} lines truncated ...`,
"",
`File: ${params.filePath}`,
`Lines: ${offset + 1}-${offset + limit} of ${lines.length}`,
].join("\n")
return {
output,
attachments: [{
type: "file",
filename: params.filePath,
mime: mime.lookup(params.filePath) || "text/plain",
url: `data:text/plain;base64,${Buffer.from(content).toString("base64")}`,
}],
}
}
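The smart-offset bullet above (starting reads at function boundaries) is not part of the sketch; one simple approach is to snap a requested offset back to the nearest top-level declaration (the regex is a rough heuristic):
// Snap a requested offset backwards to the nearest declaration so a partial
// read starts at a function/class boundary instead of mid-body
const DECLARATION = /^(export\s+)?(async\s+)?(function|class|interface|type|const|let|var)\b/
function snapToDeclaration(lines: string[], offset: number): number {
  for (let i = Math.min(offset, lines.length - 1); i >= 0; i--) {
    if (DECLARATION.test(lines[i])) return i
  }
  return offset
}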
Files to Modify:
src/tool/read.ts
9. Context Window Budgeting
Current Behavior: Fixed 32K output token reservation.
Improvements:
- Dynamic budget allocation based on task type
- Model-specific optimizations
// src/session/prompt.ts - Dynamic budget allocation
const TASK_BUDGETS: Record<string, TaskBudget> = {
code_generation: { inputRatio: 0.5, outputRatio: 0.5 },
exploration: { inputRatio: 0.8, outputRatio: 0.2 },
qa: { inputRatio: 0.7, outputRatio: 0.3 },
refactoring: { inputRatio: 0.6, outputRatio: 0.4 },
debugging: { inputRatio: 0.7, outputRatio: 0.3 },
default: { inputRatio: 0.6, outputRatio: 0.4 },
}
interface BudgetCalculation {
inputBudget: number
outputBudget: number
totalBudget: number
}
function calculateBudget(
model: Provider.Model,
taskType: string,
estimatedInputTokens: number
): BudgetCalculation {
const config = TASK_BUDGETS[taskType] ?? TASK_BUDGETS.default
const modelContext = model.limit.context
const modelMaxOutput = model.limit.output
// Dynamic budget based on task type
const baseBudget = Math.min(modelContext, estimatedInputTokens * 2)
const outputBudget = Math.min(
modelMaxOutput,
Math.floor(baseBudget * config.outputRatio),
SessionPrompt.OUTPUT_TOKEN_MAX
)
const inputBudget = Math.floor(baseBudget * config.inputRatio)
return {
inputBudget,
outputBudget,
totalBudget: inputBudget + outputBudget,
}
}
async function checkAndAdjustPrompt(
messages: ModelMessage[],
budget: BudgetCalculation
): Promise<ModelMessage[]> {
const currentTokens = Token.estimateMessages(messages)
if (currentTokens <= budget.inputBudget) {
return messages
}
// Need to reduce - prioritize recent messages
const result = await pruneMessagesToBudget(messages, budget.inputBudget)
return result
}
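pruneMessagesToBudget is referenced above but not defined; a minimal sketch that keeps the newest messages and drops older ones once the input budget is spent:
// Walk backwards from the newest message; everything that no longer fits is dropped
async function pruneMessagesToBudget(
  messages: ModelMessage[],
  inputBudget: number
): Promise<ModelMessage[]> {
  const kept: ModelMessage[] = []
  let used = 0
  for (let i = messages.length - 1; i >= 0; i--) {
    const tokens = Token.estimateMessages([messages[i]])
    if (used + tokens > inputBudget) break
    kept.unshift(messages[i])
    used += tokens
  }
  return kept
}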
Files to Modify:
- src/session/prompt.ts
- src/session/compaction.ts
10. Duplicate Detection
Current Behavior: No deduplication of content.
Improvements:
- Hash and track tool outputs
- Skip identical subsequent calls
- Cache read file contents
// src/session/duplicate-detection.ts
const outputHashCache = new Map<string, string>()
function getContentHash(content: string): string {
return new Bun.CryptoHasher("sha256").update(content).digest("hex")
}
async function deduplicateToolOutput(
toolId: string,
input: Record<string, unknown>,
content: string
): Promise<{ isDuplicate: boolean; output: string }> {
const hash = getContentHash(content)
const key = `${toolId}:${JSON.stringify(input)}:${hash}`
if (outputHashCache.has(key)) {
return {
isDuplicate: true,
output: outputHashCache.get(key)!,
}
}
outputHashCache.set(key, content)
return { isDuplicate: false, output: content }
}
// In tool execution
async function executeTool(tool: Tool.Info, args: Record<string, unknown>) {
const content = await tool.execute(args)
const { isDuplicate, output } = await deduplicateToolOutput(
tool.id,
args,
content.output
)
if (isDuplicate) {
log.debug("Skipping duplicate tool output", { tool: tool.id })
return {
...content,
output: `[Previous identical output: ${content.output.substring(0, 100)}...]`,
metadata: { ...content.metadata, duplicate: true },
}
}
return content
}
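The "cache read file contents" bullet above is separate from the output hash cache; a sketch keyed on path plus mtime so a modified file naturally misses the cache (a size cap like the one on the token cache would still be needed in practice):
// File content cache keyed by path + mtime
import { stat } from "node:fs/promises"
const fileCache = new Map<string, string>()
async function readCached(filePath: string): Promise<string> {
  const { mtimeMs } = await stat(filePath)
  const key = `${filePath}:${mtimeMs}`
  const hit = fileCache.get(key)
  if (hit !== undefined) return hit
  const content = await Bun.file(filePath).text()
  fileCache.set(key, content)
  return content
}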
Files to Modify:
- Add src/session/duplicate-detection.ts
- src/tool/tool.ts
Implementation Priority
Phase 1: High Impact, Low Risk
| Priority | Improvement | Estimated Token Savings | Risk |
|---|---|---|---|
| 1 | Enhanced Token Estimation | 5-15% | Low |
| 2 | Smart Grep Limits | 10-20% | Low |
| 3 | Web Search Optimization | 20-30% | Low |
| 4 | System Prompt Compression | 5-10% | Low |
Phase 2: Medium Impact, Medium Risk
| Priority | Improvement | Estimated Token Savings | Risk |
|---|---|---|---|
| 5 | Tool Output Management | 15-25% | Medium |
| 6 | Message History Optimization | 20-30% | Medium |
| 7 | File Read Limits | 10-20% | Medium |
Phase 3: High Impact, Higher Complexity
| Priority | Improvement | Estimated Token Savings | Risk |
|---|---|---|---|
| 8 | Smart Compaction | 25-40% | High |
| 9 | Context Budgeting | 15-25% | High |
| 10 | Duplicate Detection | 10-15% | Medium |
Quality Preservation
Testing Strategy
- A/B Testing: Compare outputs before/after each optimization
- Quality Metrics: Track success rate, user satisfaction, task completion
- Rollback Mechanism: Config flags to disable optimizations per-session
// Config schema for optimization controls
const OptimizationConfig = z.object({
smart_compaction: z.boolean().default(true),
enhanced_estimation: z.boolean().default(true),
smart_truncation: z.boolean().default(true),
message_pruning: z.boolean().default(true),
system_prompt_compression: z.boolean().default(true),
grep_optimization: z.boolean().default(true),
websearch_optimization: z.boolean().default(true),
file_read_limits: z.boolean().default(true),
context_budgeting: z.boolean().default(true),
duplicate_detection: z.boolean().default(true),
})
// Usage
const config = await Config.get()
const optimizations = config.optimizations ?? {}
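Each flag then gates its optimization at the call site, with the legacy behavior as the fallback; for example (smartTruncate is the function sketched earlier, and legacyTruncate stands in for the current fixed-limit path):
// Per-session opt-out: fall back to the legacy fixed limits when disabled
const result = optimizations.smart_truncation !== false
  ? await smartTruncate(content, { maxTokens: 8000 })
  : await legacyTruncate(content)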
Monitoring
// Token efficiency metrics
export async function trackTokenMetrics(sessionID: string) {
const messages = await Session.messages({ sessionID })
const metrics = {
totalTokens: 0,
inputTokens: 0,
outputTokens: 0,
optimizationSavings: 0,
compactionCount: 0,
truncationCount: 0,
}
for (const msg of messages) {
metrics.totalTokens += msg.tokens.input + msg.tokens.output
metrics.inputTokens += msg.tokens.input
metrics.outputTokens += msg.tokens.output
if (msg.info.mode === "compaction") {
metrics.compactionCount++
}
}
return metrics
}
Configuration
Environment Variables
# Token optimization controls
OPENCODE_TOKEN_ESTIMATION=accurate # accurate (tiktoken) or legacy (4:1)
OPENCODE_TRUNCATION_MODE=smart # smart or legacy (fixed limits)
OPENCODE_COMPACTION_THRESHOLD=0.7 # trigger at 70% of context
OPENCODE_GREP_LIMIT=50 # default match limit
OPENCODE_WEBSEARCH_CHARS=6000 # default context characters
OPENCODE_FILE_READ_LIMIT=400 # default lines for code files
OPENCODE_OUTPUT_BUDGET_RATIO=0.4 # percentage for output
OPENCODE_DUPLICATE_DETECTION=true # enable cache
Per-Model Configuration
{
"models": {
"gpt-4o": {
"context_limit": 128000,
"output_limit": 16384,
"token_budget": {
"code_generation": { "input_ratio": 0.5, "output_ratio": 0.5 },
"exploration": { "input_ratio": 0.8, "output_ratio": 0.2 }
}
},
"claude-sonnet-4-20250514": {
"context_limit": 200000,
"output_limit": 8192,
"supports_prompt_cache": true
}
}
}
Migration Guide
Upgrading from Legacy Token Estimation
// Before (4:1 ratio)
const tokens = content.length / 4
// After (tiktoken)
const tokens = Token.estimate(content)
Upgrading from Legacy Truncation
// Before (fixed limits)
if (lines.length > 2000 || bytes > 51200) {
truncate(content)
}
// After (smart truncation)
const result = await Truncate.smart(content, {
fileType: detectFileType(content),
maxTokens: 8000,
})
Best Practices
- Measure First: Always measure token usage before and after changes
- Incrementally Roll Out: Deploy optimizations gradually
- User Control: Allow users to override defaults
- Monitor Quality: Track task success rates alongside token savings
- Fallback Ready: Have fallback mechanisms for when optimizations fail
References
- Files: src/util/token.ts, src/tool/truncation.ts, src/session/compaction.ts, src/session/prompt.ts, src/session/message-v2.ts, src/tool/grep.ts, src/tool/websearch.ts, src/tool/read.ts, src/session/system.ts
- Dependencies: tiktoken, @dqbd/tiktoken
- Related Issues: Context overflow handling, token tracking, prompt optimization