# Token Efficiency Guide for OpenCode CLI

This guide documents strategies to optimize token usage in the OpenCode CLI without compromising output quality.

## Overview

Token efficiency is critical for:

- Reducing API costs
- Avoiding context window overflows
- Improving response latency
- Enabling longer conversations

OpenCode already has several optimization mechanisms (compaction, truncation, pruning), but there are opportunities to improve further.

---

## Current Token Management

### Existing Mechanisms

| Mechanism | Location | Description |
|-----------|----------|-------------|
| Compaction | `src/session/compaction.ts` | Summarizes conversation history when the context limit is exceeded |
| Truncation | `src/tool/truncation.ts` | Limits tool outputs to 2000 lines / 50KB |
| Pruning | `src/session/compaction.ts:41-90` | Removes old tool outputs beyond 40K tokens |
| Token Estimation | `src/util/token.ts` | Uses a 4:1 character-to-token ratio |

### Token Flow

```
User Input → Message Construction → System Prompts → Tool Definitions → Message History → LLM API
```

---

## Recommended Improvements

### 1. Smart Compaction Strategy

**Current Behavior:** Compaction triggers reactively when tokens exceed the threshold.

**Improvements:**

- **Predictive Compaction:** Analyze token growth patterns and compact proactively before reaching limits
- **Configurable Thresholds:** Allow compaction at 70-80% of context instead of 100%
- **Task-Aware Triggers:** Compact before expensive operations (file edits, builds)

```typescript
// Example: Predictive compaction logic
async function predictCompactionNeed(messages: MessageV2.WithParts[]): Promise<boolean> {
  // Estimate recent token growth from the last five messages
  const recentGrowth = messages.slice(-5).reduce((acc, msg) => {
    return acc + Token.estimate(msg.content)
  }, 0)
  const trend = recentGrowth / 5
  // Project three messages ahead and compare against 70% of the context limit
  const projectedTotal = Token.estimate(allMessages) + trend * 3
  return projectedTotal > contextLimit * 0.7
}
```

**Files to Modify:**

- `src/session/compaction.ts`
- `src/session/prompt.ts`

---

### 2. Enhanced Token Estimation

**Current Behavior:** Simple 4:1 character ratio estimation.

**Improvements:**

- Use `tiktoken` for accurate OpenAI/Anthropic tokenization
- Add provider-specific token estimators
- Cache token counts to avoid recalculation

```typescript
// src/util/token.ts - Enhanced estimation
import cl100k_base from "@dqbd/tiktoken/cl100k_base"

const encoder = cl100k_base()

export namespace Token {
  export function estimate(input: string): number {
    return encoder.encode(input).length
  }

  export function estimateMessages(messages: ModelMessage[]): number {
    const perMessageOverhead = 3 // <|start|> role content <|end|>
    const base = messages.length * perMessageOverhead
    const content = messages.reduce((acc, msg) => {
      if (typeof msg.content === "string") {
        return acc + encoder.encode(msg.content).length
      }
      return acc + encoder.encode(JSON.stringify(msg.content)).length
    }, 0)
    return base + content
  }
}
```

**Files to Modify:**

- `src/util/token.ts`
- `src/provider/provider.ts` (add token limits)

---

### 3. Intelligent Tool Output Management

**Current Behavior:** Fixed truncation at 2000 lines / 50KB.
**Improvements:**

- **Content-Aware Truncation:**
  - **Code:** Keep function signatures, truncate bodies
  - **Logs:** Keep head+tail, truncate middle
  - **JSON:** Preserve structure, truncate arrays
  - **Errors:** Never truncate

```typescript
// src/tool/truncation.ts - Smart truncation
export async function smartTruncate(
  content: string,
  options: SmartTruncateOptions = {}
): Promise<Truncate.Result> {
  const { fileType = detectFileType(content), maxTokens = 8000 } = options

  switch (fileType) {
    case "code":
      return truncateCode(content, maxTokens)
    case "logs":
      return truncateLogs(content, maxTokens)
    case "json":
      return truncateJSON(content, maxTokens)
    case "error":
      return { content, truncated: false }
    default:
      return genericTruncate(content, maxTokens)
  }
}

function truncateCode(content: string, maxTokens: number): Truncate.Result {
  const lines = content.split("\n")
  const result: string[] = []
  let currentTokens = 0
  const overheadPerLine = 2 // rough per-line token overhead

  for (const line of lines) {
    const lineTokens = Token.estimate(line)
    if (currentTokens + lineTokens + overheadPerLine > maxTokens) {
      break
    }

    // Always include function signatures
    if (line.match(/^(function|class|const|let|var|export|interface|type)/)) {
      result.push(line)
      currentTokens += lineTokens
      continue
    }

    // Skip implementation details once inside a truncated body
    if (result.length > 0 && result[result.length - 1].match(/^{$/)) {
      result.push(`  // ${lines.length - result.length} lines truncated...`)
      break
    }

    result.push(line)
    currentTokens += lineTokens
  }

  return formatResult(result, content)
}
```

**Files to Modify:**

- `src/tool/truncation.ts`
- Add `src/tool/truncation/code.ts`
- Add `src/tool/truncation/logs.ts`

---

### 4. Message History Optimization

**Current Behavior:** Full message history sent until compaction.
**Improvements:**

- **Importance Scoring:** Prioritize messages by importance
- **Selective History:** Remove low-value messages
- **Ephemeral Messages:** Mark transient context for removal

```typescript
// Message importance scoring
const MESSAGE_IMPORTANCE = {
  user_request: 100,
  file_edit: 90,
  agent_completion: 80,
  tool_success: 60,
  tool_output: 50,
  intermediate_result: 30,
  system_reminder: 20,
}

function scoreMessage(message: MessageV2.WithParts): number {
  let score = 0

  // Role-based scoring
  if (message.info.role === "user") score += MESSAGE_IMPORTANCE.user_request
  if (message.info.role === "assistant") {
    if (message.parts.some(p => p.type === "tool" && p.tool === "edit")) {
      score += MESSAGE_IMPORTANCE.file_edit
    } else {
      score += MESSAGE_IMPORTANCE.agent_completion
    }
  }

  // Tool call scoring
  for (const part of message.parts) {
    if (part.type === "tool") {
      score += getToolImportanceScore(part.tool)
    }
  }

  return score
}

// Selective history retention
async function getOptimizedHistory(
  sessionID: string,
  maxTokens: number
): Promise<MessageV2.WithParts[]> {
  const messages = await Session.messages({ sessionID })
  const scored = messages.map(msg => ({
    message: msg,
    score: scoreMessage(msg),
    tokens: Token.estimate(msg), // assumes an estimator overload for full messages
  }))

  scored.sort((a, b) => b.score - a.score)

  const result: MessageV2.WithParts[] = []
  let usedTokens = 0

  for (const item of scored) {
    if (usedTokens + item.tokens > maxTokens) break

    // Always keep the most recent user message
    if (
      item.message.info.role === "user" &&
      result.length > 0 &&
      result[result.length - 1].info.id < item.message.info.id
    ) {
      result.push(item.message)
      usedTokens += item.tokens
      continue
    }

    // Keep if high importance score
    if (item.score >= 50) {
      result.push(item.message)
      usedTokens += item.tokens
    }
  }

  return result.reverse()
}
```

**Files to Modify:**

- `src/session/message-v2.ts`
- `src/session/prompt.ts`

---

### 5. System Prompt Compression

**Current Behavior:** Provider-specific prompts loaded from text files.
**Improvements:**

- Audit and compress prompts
- Move optional instructions to the first user message
- Create a "minimal" mode for quick tasks

```typescript
// src/session/system.ts - Compressed prompts
export namespace SystemPrompt {
  // Core instructions (always sent)
  const CORE_PROMPT = `You are an expert software engineering assistant.`

  // Optional instructions (sent based on context)
  const OPTIONAL_PROMPTS = {
    code_quality: `Focus on clean, maintainable code with proper error handling.`,
    testing: `Always write tests for new functionality.`,
    documentation: `Document complex logic and API surfaces.`,
  }

  export async function getCompressedPrompt(
    model: Provider.Model,
    context: PromptContext
  ): Promise<string[]> {
    const prompts: string[] = [CORE_PROMPT]

    // Add model-specific base prompt
    prompts.push(getBasePrompt(model))

    // Conditionally add optional prompts
    if (context.needsQualityFocus) {
      prompts.push(OPTIONAL_PROMPTS.code_quality)
    }
    if (context.needsTesting) {
      prompts.push(OPTIONAL_PROMPTS.testing)
    }

    return prompts
  }
}
```

**Files to Modify:**

- `src/session/system.ts`
- `src/session/prompt/*.txt`

---

### 6. Smart Grep Result Limits

**Current Behavior:** Hard limit of 100 matches.
**Improvements:**

- Reduce default to 50 matches
- Add priority scoring based on relevance
- Group matches by file

```typescript
// src/tool/grep.ts - Enhanced result handling
const DEFAULT_MATCH_LIMIT = 50

const PRIORITY_WEIGHTS = {
  recently_modified: 1.5,
  same_directory: 1.3,
  matching_extension: 1.2,
  exact_match: 1.1,
}

interface MatchPriority {
  match: Match
  score: number
}

function scoreMatch(match: Match, context: GrepContext): number {
  let score = 1.0

  // Recently modified files (within the last week)
  const fileAge = Date.now() - match.modTime
  if (fileAge < 7 * 24 * 60 * 60 * 1000) {
    score *= PRIORITY_WEIGHTS.recently_modified
  }

  // Same directory as current work
  if (match.path.startsWith(context.cwd)) {
    score *= PRIORITY_WEIGHTS.same_directory
  }

  // Matching extension
  if (context.targetExtensions.includes(path.extname(match.path))) {
    score *= PRIORITY_WEIGHTS.matching_extension
  }

  return score
}

export async function execute(params: GrepParams, ctx: Tool.Context) {
  const results = await ripgrep(params)

  const scored: MatchPriority[] = results.map(match => ({
    match,
    score: scoreMatch(match, ctx),
  }))

  scored.sort((a, b) => b.score - a.score)

  const limit = params.limit ?? DEFAULT_MATCH_LIMIT
  const topMatches = scored.slice(0, limit)

  return formatGroupedOutput(topMatches)
}

function formatGroupedOutput(matches: MatchPriority[]): ToolResult {
  const byFile = groupBy(matches, m => path.dirname(m.match.path))
  const output: string[] = []

  output.push(`Found ${matches.length} matches across ${byFile.size} files\n`)

  for (const [dir, fileMatches] of byFile) {
    output.push(`\n${dir}:`)
    for (const { match, score } of fileMatches.slice(0, 10)) {
      const relevance = score > 1.0 ? " [high relevance]" : ""
      output.push(`  Line ${match.lineNum}: ${match.lineText}${relevance}`)
    }
    if (fileMatches.length > 10) {
      output.push(`  ... and ${fileMatches.length - 10} more`)
    }
  }

  return { output: output.join("\n") }
}
```

**Files to Modify:**

- `src/tool/grep.ts`

---

### 7. Web Search Context Optimization

**Current Behavior:** 10,000 character default limit.

**Improvements:**

- Reduce default to 6,000 characters
- Content quality scoring
- Query-relevant extraction

```typescript
// src/tool/websearch.ts - Optimized content extraction
const DEFAULT_CONTEXT_CHARS = 6000

interface ContentQualityScore {
  source: string
  score: number
  relevantSections: string[]
}

function scoreAndExtract(
  content: string,
  query: string
): ContentQualityScore {
  const paragraphs = content.split(/\n\n+/)
  const queryTerms = query.toLowerCase().split(/\s+/)

  const scored = paragraphs.map((para, index) => {
    const lower = para.toLowerCase()
    const termMatches = queryTerms.filter(term => lower.includes(term)).length
    const density = termMatches / para.length
    const position = index / paragraphs.length
    return {
      para,
      // Weight term matches, term density, and earlier position
      score: termMatches * 2 + density * 1000 + (1 - position) * 0.5,
    }
  })

  scored.sort((a, b) => b.score - a.score)

  const relevant: string[] = []
  let usedChars = 0
  for (const { para } of scored) {
    if (usedChars + para.length > DEFAULT_CONTEXT_CHARS) break
    relevant.push(para)
    usedChars += para.length
  }

  return {
    source: content.substring(0, DEFAULT_CONTEXT_CHARS),
    score: scored[0]?.score ?? 0,
    relevantSections: relevant,
  }
}

export async function execute(params: WebSearchParams, ctx: Tool.Context) {
  const response = await exaSearch({
    query: params.query,
    numResults: params.numResults ?? 8,
    type: params.type ?? "auto",
    livecrawl: params.livecrawl ?? "fallback",
    contextMaxCharacters: DEFAULT_CONTEXT_CHARS,
  })

  const optimized = optimizeResults(response.results, params.query)

  return {
    output: formatOptimizedResults(optimized),
    metadata: {
      originalChars: response.totalChars,
      optimizedChars: optimized.totalChars,
      savings: 1 - optimized.totalChars / response.totalChars,
    },
  }
}
```

**Files to Modify:**

- `src/tool/websearch.ts`

---

### 8. File Read Optimization

**Current Behavior:** Full file content sent unless offset/limit specified.
**Improvements:**

- Default limits based on file type
- Smart offset detection (function boundaries)

```typescript
// src/tool/read.ts - Optimized file reading
const FILE_TYPE_CONFIGS: Record<string, { defaultLimit: number; truncate: boolean }> = {
  ".json": { defaultLimit: Infinity, truncate: false },
  ".md": { defaultLimit: 2000, truncate: true },
  ".ts": { defaultLimit: 400, truncate: true },
  ".js": { defaultLimit: 400, truncate: true },
  ".py": { defaultLimit: 400, truncate: true },
  ".yml": { defaultLimit: 500, truncate: true },
  ".yaml": { defaultLimit: 500, truncate: true },
  ".txt": { defaultLimit: 1000, truncate: true },
  default: { defaultLimit: 300, truncate: true },
}

export async function execute(params: ReadParams, ctx: Tool.Context) {
  const ext = path.extname(params.filePath)
  const config = FILE_TYPE_CONFIGS[ext] ?? FILE_TYPE_CONFIGS.default

  const offset = params.offset ?? 0
  const limit = params.limit ?? config.defaultLimit

  const file = Bun.file(params.filePath)
  const content = await file.text()
  const lines = content.split("\n")

  if (!config.truncate || lines.length <= limit + offset) {
    return {
      output: content,
      attachments: [],
    }
  }

  const displayedLines = lines.slice(offset, offset + limit)
  const output = [
    ...displayedLines,
    "",
    `... ${lines.length - displayedLines.length} lines truncated ...`,
    "",
    `File: ${params.filePath}`,
    `Lines: ${offset + 1}-${offset + limit} of ${lines.length}`,
  ].join("\n")

  return {
    output,
    attachments: [
      {
        type: "file",
        filename: params.filePath,
        mime: mime.lookup(params.filePath) || "text/plain",
        url: `data:text/plain;base64,${Buffer.from(content).toString("base64")}`,
      },
    ],
  }
}
```

**Files to Modify:**

- `src/tool/read.ts`

---

### 9. Context Window Budgeting

**Current Behavior:** Fixed 32K output token reservation.
**Improvements:**

- Dynamic budget allocation based on task type
- Model-specific optimizations

```typescript
// src/session/prompt.ts - Dynamic budget allocation
const TASK_BUDGETS: Record<string, { inputRatio: number; outputRatio: number }> = {
  code_generation: { inputRatio: 0.5, outputRatio: 0.5 },
  exploration: { inputRatio: 0.8, outputRatio: 0.2 },
  qa: { inputRatio: 0.7, outputRatio: 0.3 },
  refactoring: { inputRatio: 0.6, outputRatio: 0.4 },
  debugging: { inputRatio: 0.7, outputRatio: 0.3 },
  default: { inputRatio: 0.6, outputRatio: 0.4 },
}

interface BudgetCalculation {
  inputBudget: number
  outputBudget: number
  totalBudget: number
}

function calculateBudget(
  model: Provider.Model,
  taskType: string,
  estimatedInputTokens: number
): BudgetCalculation {
  const config = TASK_BUDGETS[taskType] ?? TASK_BUDGETS.default
  const modelContext = model.limit.context
  const modelMaxOutput = model.limit.output

  // Dynamic budget based on task type
  const baseBudget = Math.min(modelContext, estimatedInputTokens * 2)
  const outputBudget = Math.min(
    modelMaxOutput,
    Math.floor(baseBudget * config.outputRatio),
    SessionPrompt.OUTPUT_TOKEN_MAX
  )
  const inputBudget = Math.floor(baseBudget * config.inputRatio)

  return {
    inputBudget,
    outputBudget,
    totalBudget: inputBudget + outputBudget,
  }
}

async function checkAndAdjustPrompt(
  messages: ModelMessage[],
  budget: BudgetCalculation
): Promise<ModelMessage[]> {
  const currentTokens = Token.estimateMessages(messages)

  if (currentTokens <= budget.inputBudget) {
    return messages
  }

  // Need to reduce - prioritize recent messages
  return pruneMessagesToBudget(messages, budget.inputBudget)
}
```

**Files to Modify:**

- `src/session/prompt.ts`
- `src/session/compaction.ts`

---

### 10. Duplicate Detection

**Current Behavior:** No deduplication of content.
**Improvements:**

- Hash and track tool outputs
- Skip identical subsequent calls
- Cache read file contents

```typescript
// src/session/duplicate-detection.ts
const outputHashCache = new Map<string, string>()

function getContentHash(content: string): string {
  // SHA-256 via Bun's CryptoHasher
  return new Bun.CryptoHasher("sha256").update(content).digest("hex")
}

async function deduplicateToolOutput(
  toolId: string,
  input: Record<string, unknown>,
  content: string
): Promise<{ isDuplicate: boolean; output: string }> {
  const hash = getContentHash(content)
  const key = `${toolId}:${JSON.stringify(input)}:${hash}`

  if (outputHashCache.has(key)) {
    return {
      isDuplicate: true,
      output: outputHashCache.get(key)!,
    }
  }

  outputHashCache.set(key, content)
  return { isDuplicate: false, output: content }
}

// In tool execution
async function executeTool(tool: Tool.Info, args: Record<string, unknown>) {
  const content = await tool.execute(args)
  const { isDuplicate } = await deduplicateToolOutput(
    tool.id,
    args,
    content.output
  )

  if (isDuplicate) {
    log.debug("Skipping duplicate tool output", { tool: tool.id })
    return {
      ...content,
      output: `[Previous identical output: ${content.output.substring(0, 100)}...]`,
      metadata: { ...content.metadata, duplicate: true },
    }
  }

  return content
}
```

**Files to Modify:**

- Add `src/session/duplicate-detection.ts`
- `src/tool/tool.ts`

---

## Implementation Priority

### Phase 1: High Impact, Low Risk

| Priority | Improvement | Estimated Token Savings | Risk |
|----------|-------------|-------------------------|------|
| 1 | Enhanced Token Estimation | 5-15% | Low |
| 2 | Smart Grep Limits | 10-20% | Low |
| 3 | Web Search Optimization | 20-30% | Low |
| 4 | System Prompt Compression | 5-10% | Low |

### Phase 2: Medium Impact, Medium Risk

| Priority | Improvement | Estimated Token Savings | Risk |
|----------|-------------|-------------------------|------|
| 5 | Tool Output Management | 15-25% | Medium |
| 6 | Message History Optimization | 20-30% | Medium |
| 7 | File Read Limits | 10-20% | Medium |

### Phase 3: High Impact, Higher Complexity

| Priority | Improvement | Estimated Token Savings | Risk |
|----------|-------------|-------------------------|------|
| 8 | Smart Compaction | 25-40% | High |
| 9 | Context Budgeting | 15-25% | High |
| 10 | Duplicate Detection | 10-15% | Medium |

---

## Quality Preservation

### Testing Strategy

1. **A/B Testing:** Compare outputs before/after each optimization
2. **Quality Metrics:** Track success rate, user satisfaction, task completion
3. **Rollback Mechanism:** Config flags to disable optimizations per session

```typescript
// Config schema for optimization controls
const OptimizationConfig = z.object({
  smart_compaction: z.boolean().default(true),
  enhanced_estimation: z.boolean().default(true),
  smart_truncation: z.boolean().default(true),
  message_pruning: z.boolean().default(true),
  system_prompt_compression: z.boolean().default(true),
  grep_optimization: z.boolean().default(true),
  websearch_optimization: z.boolean().default(true),
  file_read_limits: z.boolean().default(true),
  context_budgeting: z.boolean().default(true),
  duplicate_detection: z.boolean().default(true),
})

// Usage
const config = await Config.get()
const optimizations = config.optimizations ?? {}
```

### Monitoring

```typescript
// Token efficiency metrics
export async function trackTokenMetrics(sessionID: string) {
  const messages = await Session.messages({ sessionID })

  const metrics = {
    totalTokens: 0,
    inputTokens: 0,
    outputTokens: 0,
    optimizationSavings: 0,
    compactionCount: 0,
    truncationCount: 0,
  }

  for (const msg of messages) {
    metrics.totalTokens += msg.tokens.input + msg.tokens.output
    metrics.inputTokens += msg.tokens.input
    metrics.outputTokens += msg.tokens.output

    if (msg.info.mode === "compaction") {
      metrics.compactionCount++
    }
  }

  return metrics
}
```

---

## Configuration

### Environment Variables

```bash
# Token optimization controls
OPENCODE_TOKEN_ESTIMATION=accurate  # accurate (tiktoken) or legacy (4:1)
OPENCODE_TRUNCATION_MODE=smart      # smart or legacy (fixed limits)
OPENCODE_COMPACTION_THRESHOLD=0.7   # trigger at 70% of context
OPENCODE_GREP_LIMIT=50              # default match limit
OPENCODE_WEBSEARCH_CHARS=6000       # default context characters
OPENCODE_FILE_READ_LIMIT=400        # default lines for code files
OPENCODE_OUTPUT_BUDGET_RATIO=0.4    # fraction of budget reserved for output
OPENCODE_DUPLICATE_DETECTION=true   # enable output cache
```

### Per-Model Configuration

```json
{
  "models": {
    "gpt-4o": {
      "context_limit": 128000,
      "output_limit": 16384,
      "token_budget": {
        "code_generation": { "input_ratio": 0.5, "output_ratio": 0.5 },
        "exploration": { "input_ratio": 0.8, "output_ratio": 0.2 }
      }
    },
    "claude-sonnet-4-20250514": {
      "context_limit": 200000,
      "output_limit": 8192,
      "supports_prompt_cache": true
    }
  }
}
```

---

## Migration Guide

### Upgrading from Legacy Token Estimation

```typescript
// Before (4:1 ratio)
const tokens = content.length / 4

// After (tiktoken)
const tokens = Token.estimate(content)
```

### Upgrading from Legacy Truncation

```typescript
// Before (fixed limits)
if (lines.length > 2000 || bytes > 51200) {
  truncate(content)
}

// After (smart truncation)
const result = await Truncate.smart(content, {
  fileType: detectFileType(content),
  maxTokens: 8000,
})
```

---

## Best Practices
1. **Measure First:** Always measure token usage before and after changes
2. **Roll Out Incrementally:** Deploy optimizations gradually
3. **User Control:** Allow users to override defaults
4. **Monitor Quality:** Track task success rates alongside token savings
5. **Fallback Ready:** Have fallback mechanisms for when optimizations fail

---

## References

- **Files:** `src/util/token.ts`, `src/tool/truncation.ts`, `src/session/compaction.ts`, `src/session/prompt.ts`, `src/session/message-v2.ts`, `src/tool/grep.ts`, `src/tool/websearch.ts`, `src/tool/read.ts`, `src/session/system.ts`
- **Dependencies:** `tiktoken`, `@dqbd/tiktoken`
- **Related Issues:** Context overflow handling, token tracking, prompt optimization
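## Appendix: Baseline Estimation Sketch

The "Measure First" practice and the legacy-estimation migration above can be exercised without any dependencies. The sketch below is standalone illustrative code, not OpenCode's actual `Token` namespace: `SimpleMessage`, `legacyEstimate`, and `estimateMessages` are hypothetical names, and the 3-token per-message overhead is the assumption from the `estimateMessages` example in section 2.

```typescript
// Standalone sketch: compare the legacy 4:1 character heuristic with a
// per-message-overhead estimate, as a cheap baseline before A/B testing.

interface SimpleMessage {
  role: "user" | "assistant"
  content: string
}

// Legacy heuristic: roughly 4 characters per token
function legacyEstimate(input: string): number {
  return Math.ceil(input.length / 4)
}

// Overhead-aware variant: add a fixed per-message cost for role/framing tokens
function estimateMessages(messages: SimpleMessage[]): number {
  const perMessageOverhead = 3
  return messages.reduce(
    (acc, msg) => acc + perMessageOverhead + legacyEstimate(msg.content),
    0
  )
}

const history: SimpleMessage[] = [
  { role: "user", content: "Refactor src/util/token.ts to cache results." },
  { role: "assistant", content: "Done. I memoized Token.estimate per string." },
]

const naive = history.reduce((acc, m) => acc + legacyEstimate(m.content), 0)
const withOverhead = estimateMessages(history)
// The overhead-aware count exceeds the naive count by 3 tokens per message
console.log(naive, withOverhead)
```

Swapping `legacyEstimate` for a real tokenizer changes only the inner function, so the overhead accounting carries over unchanged when migrating to `tiktoken`.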