Token Efficiency Guide for OpenCode CLI
This guide documents strategies to optimize token usage in the OpenCode CLI without compromising output quality.
Overview
Token efficiency is critical for:
- Reducing API costs
- Avoiding context window overflows
- Improving response latency
- Enabling longer conversations
OpenCode already has several optimization mechanisms (compaction, truncation, pruning), but there are opportunities to improve further.
Current Token Management
Existing Mechanisms
| Mechanism | Location | Description |
|---|---|---|
| Compaction | src/session/compaction.ts | Summarizes conversation history when context is exceeded |
| Truncation | src/tool/truncation.ts | Limits tool outputs to 2000 lines / 50KB |
| Pruning | src/session/compaction.ts:41-90 | Removes old tool outputs beyond 40K tokens |
| Token Estimation | src/util/token.ts | Uses 4:1 character-to-token ratio |
Token Flow
User Input → Message Construction → System Prompts → Tool Definitions → Message History → LLM API
Recommended Improvements
1. Smart Compaction Strategy
Current Behavior: Compaction triggers reactively, only after token usage exceeds the threshold.
Improvements:
- Predictive Compaction: Analyze token growth patterns and compact proactively before reaching limits
- Configurable Thresholds: Allow compaction at 70-80% of context instead of 100%
- Task-Aware Triggers: Compact before expensive operations (file edits, builds)
// Example: Predictive compaction logic (sketch; token accounting is approximate)
async function predictCompactionNeed(
  messages: MessageV2.WithParts[],
  contextLimit: number,
): Promise<boolean> {
  const estimate = (msg: MessageV2.WithParts) => Token.estimate(JSON.stringify(msg.parts))
  // Average token growth over the most recent messages
  const recent = messages.slice(-5)
  const trend = recent.reduce((acc, msg) => acc + estimate(msg), 0) / Math.max(recent.length, 1)
  const current = messages.reduce((acc, msg) => acc + estimate(msg), 0)
  // Project three messages ahead and compact early at 70% of the context window
  return current + trend * 3 > contextLimit * 0.7
}
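A task-aware trigger (the third bullet above) could call this check right before known-expensive tool calls. A minimal sketch; the tool names and the SessionCompaction.run entry point are assumptions for illustration, not the actual OpenCode API:
// Hypothetical pre-tool hook: compact proactively before expensive operations
const EXPENSIVE_TOOLS = new Set(["edit", "write", "bash"])
async function compactBeforeExpensiveTool(tool: string, sessionID: string, contextLimit: number) {
  if (!EXPENSIVE_TOOLS.has(tool)) return
  const messages = await Session.messages({ sessionID })
  if (await predictCompactionNeed(messages, contextLimit)) {
    // Summarize history now so the expensive tool output has room to land
    await SessionCompaction.run({ sessionID })
  }
}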
Files to Modify:
- src/session/compaction.ts
- src/session/prompt.ts
2. Enhanced Token Estimation
Current Behavior: Simple 4:1 character ratio estimation.
Improvements:
- Use tiktoken for accurate OpenAI/Anthropic tokenization
- Add provider-specific token estimators
- Cache token counts to avoid recalculation
// src/util/token.ts - Enhanced estimation (sketch; assumes the @dqbd/tiktoken WASM bindings)
import { get_encoding } from "@dqbd/tiktoken"
const encoder = get_encoding("cl100k_base")
export namespace Token {
export function estimate(input: string): number {
return encoder.encode(input).length
}
export function estimateMessages(messages: ModelMessage[]): number {
const perMessageOverhead = 3 // <|start|> role content <|end|>
const base = messages.length * perMessageOverhead
const content = messages.reduce((acc, msg) => {
if (typeof msg.content === "string") {
return acc + encoder.encode(msg.content).length
}
return acc + encoder.encode(JSON.stringify(msg.content)).length
}, 0)
return base + content
}
}
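The caching bullet above is not covered by the estimator itself; a minimal memoization sketch (the size cap and eviction policy are arbitrary choices for illustration):
// Memoize estimates for strings that recur verbatim (system prompts, unchanged tool output)
const tokenCache = new Map<string, number>()
const TOKEN_CACHE_MAX = 5000
export function estimateCached(input: string): number {
  const hit = tokenCache.get(input)
  if (hit !== undefined) return hit
  const count = Token.estimate(input)
  if (tokenCache.size >= TOKEN_CACHE_MAX) {
    // Evict the oldest entry; Maps iterate in insertion order
    const oldest = tokenCache.keys().next().value
    if (oldest !== undefined) tokenCache.delete(oldest)
  }
  tokenCache.set(input, count)
  return count
}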
Files to Modify:
- src/util/token.ts
- src/provider/provider.ts (add token limits)
3. Intelligent Tool Output Management
Current Behavior: Fixed truncation at 2000 lines / 50KB.
Improvements:
- Content-Aware Truncation:
- Code: Keep function signatures, truncate bodies
- Logs: Keep head+tail, truncate middle
- JSON: Preserve structure, truncate arrays
- Errors: Never truncate
// src/tool/truncation.ts - Smart truncation
export async function smartTruncate(
content: string,
options: SmartTruncateOptions = {}
): Promise<Truncate.Result> {
const { fileType = detectFileType(content), maxTokens = 8000 } = options
switch (fileType) {
case "code":
return truncateCode(content, maxTokens)
case "logs":
return truncateLogs(content, maxTokens)
case "json":
return truncateJSON(content, maxTokens)
case "error":
return { content, truncated: false }
default:
return genericTruncate(content, maxTokens)
}
}
function truncateCode(content: string, maxTokens: number): Truncate.Result {
const lines = content.split("\n")
const result: string[] = []
let currentTokens = 0
const overheadPerLine = 2 // rough per-line allowance for newline/formatting tokens
for (const line of lines) {
const lineTokens = Token.estimate(line)
if (currentTokens + lineTokens + overheadPerLine > maxTokens) {
break
}
// Always include function signatures
if (line.match(/^(function|class|const|let|var|export|interface|type)/)) {
result.push(line)
currentTokens += lineTokens
continue
}
// Once the previous kept line opens a block, note the omission and stop
if (result.length > 0 && result[result.length - 1].match(/^{$/)) {
result.push(` // ${lines.length - result.length} lines truncated...`)
break
}
result.push(line)
currentTokens += lineTokens
}
return formatResult(result, content)
}
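truncateLogs (the head+tail case from the list above) is referenced by smartTruncate but not shown; a minimal sketch using the same Token.estimate budget:
// Keep the start and end of a log and drop the middle; failures usually
// surface at the tail while setup context sits at the head
function truncateLogs(content: string, maxTokens: number): Truncate.Result {
  if (Token.estimate(content) <= maxTokens) return { content, truncated: false }
  const lines = content.split("\n")
  const take = (source: string[], budget: number) => {
    const out: string[] = []
    let used = 0
    for (const line of source) {
      const cost = Token.estimate(line) + 1
      if (used + cost > budget) break
      out.push(line)
      used += cost
    }
    return out
  }
  const head = take(lines, Math.floor(maxTokens * 0.3))
  const tail = take([...lines].reverse(), Math.floor(maxTokens * 0.6)).reverse()
  const omitted = Math.max(lines.length - head.length - tail.length, 0)
  return {
    content: [...head, `... ${omitted} lines truncated ...`, ...tail].join("\n"),
    truncated: true,
  }
}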
Files to Modify:
- src/tool/truncation.ts
- Add src/tool/truncation/code.ts
- Add src/tool/truncation/logs.ts
4. Message History Optimization
Current Behavior: Full message history sent until compaction.
Improvements:
- Importance Scoring: Prioritize messages by importance
- Selective History: Remove low-value messages
- Ephemeral Messages: Mark transient context for removal
// Message importance scoring
const MESSAGE_IMPORTANCE = {
user_request: 100,
file_edit: 90,
agent_completion: 80,
tool_success: 60,
tool_output: 50,
intermediate_result: 30,
system_reminder: 20,
}
function scoreMessage(message: MessageV2.WithParts): number {
let score = 0
// Role-based scoring
if (message.info.role === "user") score += MESSAGE_IMPORTANCE.user_request
if (message.info.role === "assistant") {
if (message.parts.some(p => p.type === "tool" && p.tool === "edit")) {
score += MESSAGE_IMPORTANCE.file_edit
} else {
score += MESSAGE_IMPORTANCE.agent_completion
}
}
// Tool call scoring
for (const part of message.parts) {
if (part.type === "tool") {
const toolScore = getToolImportanceScore(part.tool)
score += toolScore
}
}
return score
}
// Selective history retention
async function getOptimizedHistory(
sessionID: string,
maxTokens: number
): Promise<MessageV2.WithParts[]> {
const messages = await Session.messages({ sessionID })
const scored = messages.map(msg => ({
message: msg,
score: scoreMessage(msg),
tokens: Token.estimate(msg),
}))
scored.sort((a, b) => b.score - a.score)
const result: MessageV2.WithParts[] = []
let usedTokens = 0
for (const item of scored) {
if (usedTokens + item.tokens > maxTokens) break
// Always keep last user message
if (item.message.info.role === "user" &&
result.length > 0 &&
result[result.length - 1].info.id < item.message.info.id) {
result.push(item.message)
usedTokens += item.tokens
continue
}
// Keep if high importance score
if (item.score >= 50) {
result.push(item.message)
usedTokens += item.tokens
}
}
return result.reverse()
}
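getToolImportanceScore is used in scoreMessage but not defined; one possible mapping onto the weights above (the tool names and values are illustrative):
// Map tool IDs onto the importance table; unknown tools get a low default
const TOOL_IMPORTANCE: Record<string, number> = {
  edit: MESSAGE_IMPORTANCE.file_edit,
  write: MESSAGE_IMPORTANCE.file_edit,
  bash: MESSAGE_IMPORTANCE.tool_success,
  read: MESSAGE_IMPORTANCE.tool_output,
  grep: MESSAGE_IMPORTANCE.intermediate_result,
  glob: MESSAGE_IMPORTANCE.intermediate_result,
}
function getToolImportanceScore(tool: string): number {
  return TOOL_IMPORTANCE[tool] ?? MESSAGE_IMPORTANCE.intermediate_result
}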
Files to Modify:
- src/session/message-v2.ts
- src/session/prompt.ts
5. System Prompt Compression
Current Behavior: Provider-specific prompts loaded from text files.
Improvements:
- Audit and compress prompts
- Move optional instructions to first user message
- Create "minimal" mode for quick tasks
// src/session/system.ts - Compressed prompts
export namespace SystemPrompt {
// Core instructions (always sent)
const CORE_PROMPT = `You are an expert software engineering assistant.`
// Optional instructions (sent based on context)
const OPTIONAL_PROMPTS = {
code_quality: `Focus on clean, maintainable code with proper error handling.`,
testing: `Always write tests for new functionality.`,
documentation: `Document complex logic and API surfaces.`,
}
export async function getCompressedPrompt(
model: Provider.Model,
context: PromptContext
): Promise<string[]> {
const prompts: string[] = [CORE_PROMPT]
// Add model-specific base prompt
const basePrompt = getBasePrompt(model)
prompts.push(basePrompt)
// Conditionally add optional prompts
if (context.needsQualityFocus) {
prompts.push(OPTIONAL_PROMPTS.code_quality)
}
if (context.needsTesting) {
prompts.push(OPTIONAL_PROMPTS.testing)
}
return prompts
}
}
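The "minimal" mode from the list above could skip the optional prompts entirely for quick tasks; a short sketch, added inside the same namespace next to getCompressedPrompt (getBasePrompt is used as above):
// Minimal mode for quick tasks: core instruction plus the model's base prompt only
export async function getMinimalPrompt(model: Provider.Model): Promise<string[]> {
  return [CORE_PROMPT, getBasePrompt(model)]
}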
Files to Modify:
- src/session/system.ts
- src/session/prompt/*.txt
6. Smart Grep Result Limits
Current Behavior: Hard limit of 100 matches.
Improvements:
- Reduce default to 50 matches
- Add priority scoring based on relevance
- Group matches by file
// src/tool/grep.ts - Enhanced result handling
const DEFAULT_MATCH_LIMIT = 50
const PRIORITY_WEIGHTS = {
recently_modified: 1.5,
same_directory: 1.3,
matching_extension: 1.2,
exact_match: 1.1,
}
interface MatchPriority {
match: Match
score: number
}
function scoreMatch(match: Match, context: GrepContext): number {
let score = 1.0
// Recently modified files
const fileAge = Date.now() - match.modTime
if (fileAge < 7 * 24 * 60 * 60 * 1000) {
score *= PRIORITY_WEIGHTS.recently_modified
}
// Same directory as current work
if (match.path.startsWith(context.cwd)) {
score *= PRIORITY_WEIGHTS.same_directory
}
// Matching extension
if (context.targetExtensions.includes(path.extname(match.path))) {
score *= PRIORITY_WEIGHTS.matching_extension
}
return score
}
export async function execute(params: GrepParams, ctx: Tool.Context) {
const results = await ripgrep(params)
const scored: MatchPriority[] = results.map(match => ({
match,
score: scoreMatch(match, ctx),
}))
scored.sort((a, b) => b.score - a.score)
const limit = params.limit ?? DEFAULT_MATCH_LIMIT
const topMatches = scored.slice(0, limit)
return formatGroupedOutput(topMatches)
}
function formatGroupedOutput(matches: MatchPriority[]): ToolResult {
const byFile = groupBy(matches, m => m.match.path)
const output: string[] = []
output.push(`Found ${matches.length} matches across ${byFile.size} files\n`)
for (const [file, fileMatches] of byFile) {
output.push(`\n${file}:`)
for (const { match, score } of fileMatches.slice(0, 10)) {
const relevance = score > 1.0 ? " [high relevance]" : ""
output.push(` Line ${match.lineNum}: ${match.lineText}${relevance}`)
}
if (fileMatches.length > 10) {
output.push(` ... and ${fileMatches.length - 10} more`)
}
}
return { output: output.join("\n") }
}
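groupBy is assumed above rather than imported; a small helper returning a Map (matching the byFile.size usage):
// Group items into a Map keyed by `key`, preserving first-seen order
function groupBy<T, K>(items: T[], key: (item: T) => K): Map<K, T[]> {
  const groups = new Map<K, T[]>()
  for (const item of items) {
    const k = key(item)
    const bucket = groups.get(k)
    if (bucket) bucket.push(item)
    else groups.set(k, [item])
  }
  return groups
}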
Files to Modify:
src/tool/grep.ts
7. Web Search Context Optimization
Current Behavior: 10,000 character default limit.
Improvements:
- Reduce default to 6,000 characters
- Content quality scoring
- Query-relevant extraction
// src/tool/websearch.ts - Optimized content extraction
const DEFAULT_CONTEXT_CHARS = 6000
interface ContentQualityScore {
source: string
score: number
relevantSections: string[]
}
function scoreAndExtract(
content: string,
query: string
): ContentQualityScore {
const paragraphs = content.split(/\n\n+/)
const queryTerms = query.toLowerCase().split(/\s+/)
const scored = paragraphs.map(para => {
const lower = para.toLowerCase()
const termMatches = queryTerms.filter(term => lower.includes(term)).length
const density = termMatches / para.length
const position = paragraphs.indexOf(para) / paragraphs.length
return {
para,
score: termMatches * 2 + density * 1000 + (1 - position) * 0.5,
}
})
scored.sort((a, b) => b.score - a.score)
const relevant: string[] = []
let usedChars = 0
for (const { para } of scored) {
if (usedChars + para.length > DEFAULT_CONTEXT_CHARS) break
relevant.push(para)
usedChars += para.length
}
return {
source: content.substring(0, DEFAULT_CONTEXT_CHARS),
score: scored[0]?.score ?? 0,
relevantSections: relevant,
}
}
export async function execute(params: WebSearchParams, ctx: Tool.Context) {
const response = await exaSearch({
query: params.query,
numResults: params.numResults ?? 8,
type: params.type ?? "auto",
livecrawl: params.livecrawl ?? "fallback",
contextMaxCharacters: DEFAULT_CONTEXT_CHARS,
})
const optimized = optimizeResults(response.results, params.query)
return {
output: formatOptimizedResults(optimized),
metadata: {
originalChars: response.totalChars,
optimizedChars: optimized.totalChars,
savings: 1 - (optimized.totalChars / response.totalChars),
},
}
}
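optimizeResults and formatOptimizedResults are referenced above but not shown; a sketch that runs scoreAndExtract per result (the url/text fields are assumptions about the Exa response shape):
interface OptimizedResult {
  url: string
  sections: string[]
  chars: number
}
function optimizeResults(results: { url: string; text: string }[], query: string) {
  const optimized: OptimizedResult[] = results.map((r) => {
    const { relevantSections } = scoreAndExtract(r.text, query)
    return {
      url: r.url,
      sections: relevantSections,
      chars: relevantSections.reduce((acc, s) => acc + s.length, 0),
    }
  })
  return {
    results: optimized,
    totalChars: optimized.reduce((acc, r) => acc + r.chars, 0),
  }
}
function formatOptimizedResults(optimized: ReturnType<typeof optimizeResults>): string {
  // One block per source: URL followed by the query-relevant paragraphs
  return optimized.results.map((r) => `${r.url}\n${r.sections.join("\n\n")}`).join("\n\n---\n\n")
}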
Files to Modify:
src/tool/websearch.ts
8. File Read Optimization
Current Behavior: Full file content sent unless offset/limit specified.
Improvements:
- Default limits based on file type
- Smart offset detection (function boundaries)
// src/tool/read.ts - Optimized file reading
const FILE_TYPE_CONFIGS: Record<string, FileReadConfig> = {
".json": { defaultLimit: Infinity, truncate: false },
".md": { defaultLimit: 2000, truncate: true },
".ts": { defaultLimit: 400, truncate: true },
".js": { defaultLimit: 400, truncate: true },
".py": { defaultLimit: 400, truncate: true },
".yml": { defaultLimit: 500, truncate: true },
".yaml": { defaultLimit: 500, truncate: true },
".txt": { defaultLimit: 1000, truncate: true },
default: { defaultLimit: 300, truncate: true },
}
export async function execute(params: ReadParams, ctx: Tool.Context) {
const ext = path.extname(params.filePath)
const config = FILE_TYPE_CONFIGS[ext] ?? FILE_TYPE_CONFIGS.default
const offset = params.offset ?? 0
const limit = params.limit ?? config.defaultLimit
const file = Bun.file(params.filePath)
const content = await file.text()
const lines = content.split("\n")
if (!config.truncate || lines.length <= limit + offset) {
return {
output: content,
attachments: [],
}
}
const displayedLines = lines.slice(offset, offset + limit)
const output = [
...displayedLines,
"",
`... ${lines.length - displayedLines.length} lines truncated ...`,
"",
`File: ${params.filePath}`,
`Lines: ${offset + 1}-${offset + limit} of ${lines.length}`,
].join("\n")
return {
output,
attachments: [{
type: "file",
filename: params.filePath,
mime: mime.lookup(params.filePath) || "text/plain",
url: `data:text/plain;base64,${Buffer.from(content).toString("base64")}`,
}],
}
}
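The smart-offset bullet above (starting reads at function boundaries) is not part of the sketch; one simple approach is to snap a requested offset back to the nearest top-level declaration (the regex is a rough heuristic):
// Snap a requested offset backwards to the nearest declaration so a partial
// read starts at a function/class boundary instead of mid-body
const DECLARATION = /^(export\s+)?(async\s+)?(function|class|interface|type|const|let|var)\b/
function snapToDeclaration(lines: string[], offset: number): number {
  for (let i = Math.min(offset, lines.length - 1); i >= 0; i--) {
    if (DECLARATION.test(lines[i])) return i
  }
  return offset
}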
Files to Modify:
src/tool/read.ts
9. Context Window Budgeting
Current Behavior: Fixed 32K output token reservation.
Improvements:
- Dynamic budget allocation based on task type
- Model-specific optimizations
// src/session/prompt.ts - Dynamic budget allocation
const TASK_BUDGETS: Record<string, TaskBudget> = {
code_generation: { inputRatio: 0.5, outputRatio: 0.5 },
exploration: { inputRatio: 0.8, outputRatio: 0.2 },
qa: { inputRatio: 0.7, outputRatio: 0.3 },
refactoring: { inputRatio: 0.6, outputRatio: 0.4 },
debugging: { inputRatio: 0.7, outputRatio: 0.3 },
default: { inputRatio: 0.6, outputRatio: 0.4 },
}
interface BudgetCalculation {
inputBudget: number
outputBudget: number
totalBudget: number
}
function calculateBudget(
model: Provider.Model,
taskType: string,
estimatedInputTokens: number
): BudgetCalculation {
const config = TASK_BUDGETS[taskType] ?? TASK_BUDGETS.default
const modelContext = model.limit.context
const modelMaxOutput = model.limit.output
// Dynamic budget based on task type
const baseBudget = Math.min(modelContext, estimatedInputTokens * 2)
const outputBudget = Math.min(
modelMaxOutput,
Math.floor(baseBudget * config.outputRatio),
SessionPrompt.OUTPUT_TOKEN_MAX
)
const inputBudget = Math.floor(baseBudget * config.inputRatio)
return {
inputBudget,
outputBudget,
totalBudget: inputBudget + outputBudget,
}
}
async function checkAndAdjustPrompt(
messages: ModelMessage[],
budget: BudgetCalculation
): Promise<ModelMessage[]> {
const currentTokens = Token.estimateMessages(messages)
if (currentTokens <= budget.inputBudget) {
return messages
}
// Need to reduce - prioritize recent messages
const result = await pruneMessagesToBudget(messages, budget.inputBudget)
return result
}
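pruneMessagesToBudget is referenced above but not defined; a minimal sketch that keeps the newest messages and drops older ones once the input budget is spent:
// Walk backwards from the newest message; everything that no longer fits is dropped
async function pruneMessagesToBudget(
  messages: ModelMessage[],
  inputBudget: number
): Promise<ModelMessage[]> {
  const kept: ModelMessage[] = []
  let used = 0
  for (let i = messages.length - 1; i >= 0; i--) {
    const tokens = Token.estimateMessages([messages[i]])
    if (used + tokens > inputBudget) break
    kept.unshift(messages[i])
    used += tokens
  }
  return kept
}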
Files to Modify:
- src/session/prompt.ts
- src/session/compaction.ts
10. Duplicate Detection
Current Behavior: No deduplication of content.
Improvements:
- Hash and track tool outputs
- Skip identical subsequent calls
- Cache read file contents
// src/session/duplicate-detection.ts
const outputHashCache = new Map<string, string>()
function getContentHash(content: string): string {
return new Bun.CryptoHasher("sha256").update(content).digest("hex")
}
async function deduplicateToolOutput(
toolId: string,
input: Record<string, unknown>,
content: string
): Promise<{ isDuplicate: boolean; output: string }> {
const hash = getContentHash(content)
const key = `${toolId}:${JSON.stringify(input)}:${hash}`
if (outputHashCache.has(key)) {
return {
isDuplicate: true,
output: outputHashCache.get(key)!,
}
}
outputHashCache.set(key, content)
return { isDuplicate: false, output: content }
}
// In tool execution
async function executeTool(tool: Tool.Info, args: Record<string, unknown>) {
const content = await tool.execute(args)
const { isDuplicate, output } = await deduplicateToolOutput(
tool.id,
args,
content.output
)
if (isDuplicate) {
log.debug("Skipping duplicate tool output", { tool: tool.id })
return {
...content,
output: `[Previous identical output: ${content.output.substring(0, 100)}...]`,
metadata: { ...content.metadata, duplicate: true },
}
}
return content
}
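The "cache read file contents" bullet above is separate from the output hash cache; a sketch keyed on path plus mtime so a modified file naturally misses the cache (a size cap like the one on the token cache would still be needed in practice):
// File content cache keyed by path + mtime
import { stat } from "node:fs/promises"
const fileCache = new Map<string, string>()
async function readCached(filePath: string): Promise<string> {
  const { mtimeMs } = await stat(filePath)
  const key = `${filePath}:${mtimeMs}`
  const hit = fileCache.get(key)
  if (hit !== undefined) return hit
  const content = await Bun.file(filePath).text()
  fileCache.set(key, content)
  return content
}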
Files to Modify:
- Add src/session/duplicate-detection.ts
- src/tool/tool.ts
Implementation Priority
Phase 1: High Impact, Low Risk
| Priority | Improvement | Estimated Token Savings | Risk |
|---|---|---|---|
| 1 | Enhanced Token Estimation | 5-15% | Low |
| 2 | Smart Grep Limits | 10-20% | Low |
| 3 | Web Search Optimization | 20-30% | Low |
| 4 | System Prompt Compression | 5-10% | Low |
Phase 2: Medium Impact, Medium Risk
| Priority | Improvement | Estimated Token Savings | Risk |
|---|---|---|---|
| 5 | Tool Output Management | 15-25% | Medium |
| 6 | Message History Optimization | 20-30% | Medium |
| 7 | File Read Limits | 10-20% | Medium |
Phase 3: High Impact, Higher Complexity
| Priority | Improvement | Estimated Token Savings | Risk |
|---|---|---|---|
| 8 | Smart Compaction | 25-40% | High |
| 9 | Context Budgeting | 15-25% | High |
| 10 | Duplicate Detection | 10-15% | Medium |
Quality Preservation
Testing Strategy
- A/B Testing: Compare outputs before/after each optimization
- Quality Metrics: Track success rate, user satisfaction, task completion
- Rollback Mechanism: Config flags to disable optimizations per-session
// Config schema for optimization controls
const OptimizationConfig = z.object({
smart_compaction: z.boolean().default(true),
enhanced_estimation: z.boolean().default(true),
smart_truncation: z.boolean().default(true),
message_pruning: z.boolean().default(true),
system_prompt_compression: z.boolean().default(true),
grep_optimization: z.boolean().default(true),
websearch_optimization: z.boolean().default(true),
file_read_limits: z.boolean().default(true),
context_budgeting: z.boolean().default(true),
duplicate_detection: z.boolean().default(true),
})
// Usage
const config = await Config.get()
const optimizations = config.optimizations ?? {}
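Each flag then gates its optimization at the call site, with the legacy behavior as the fallback; for example (smartTruncate is the function sketched earlier, and legacyTruncate stands in for the current fixed-limit path):
// Per-session opt-out: fall back to the legacy fixed limits when disabled
const result = optimizations.smart_truncation !== false
  ? await smartTruncate(content, { maxTokens: 8000 })
  : await legacyTruncate(content)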
Monitoring
// Token efficiency metrics
export async function trackTokenMetrics(sessionID: string) {
const messages = await Session.messages({ sessionID })
const metrics = {
totalTokens: 0,
inputTokens: 0,
outputTokens: 0,
optimizationSavings: 0,
compactionCount: 0,
truncationCount: 0,
}
for (const msg of messages) {
metrics.totalTokens += msg.tokens.input + msg.tokens.output
metrics.inputTokens += msg.tokens.input
metrics.outputTokens += msg.tokens.output
if (msg.info.mode === "compaction") {
metrics.compactionCount++
}
}
return metrics
}
Configuration
Environment Variables
# Token optimization controls
OPENCODE_TOKEN_ESTIMATION=accurate # accurate (tiktoken) or legacy (4:1)
OPENCODE_TRUNCATION_MODE=smart # smart or legacy (fixed limits)
OPENCODE_COMPACTION_THRESHOLD=0.7 # trigger at 70% of context
OPENCODE_GREP_LIMIT=50 # default match limit
OPENCODE_WEBSEARCH_CHARS=6000 # default context characters
OPENCODE_FILE_READ_LIMIT=400 # default lines for code files
OPENCODE_OUTPUT_BUDGET_RATIO=0.4 # percentage for output
OPENCODE_DUPLICATE_DETECTION=true # enable cache
Per-Model Configuration
{
"models": {
"gpt-4o": {
"context_limit": 128000,
"output_limit": 16384,
"token_budget": {
"code_generation": { "input_ratio": 0.5, "output_ratio": 0.5 },
"exploration": { "input_ratio": 0.8, "output_ratio": 0.2 }
}
},
"claude-sonnet-4-20250514": {
"context_limit": 200000,
"output_limit": 8192,
"supports_prompt_cache": true
}
}
}
Migration Guide
Upgrading from Legacy Token Estimation
// Before (4:1 ratio)
const tokens = content.length / 4
// After (tiktoken)
const tokens = Token.estimate(content)
Upgrading from Legacy Truncation
// Before (fixed limits)
if (lines.length > 2000 || bytes > 51200) {
truncate(content)
}
// After (smart truncation)
const result = await Truncate.smart(content, {
fileType: detectFileType(content),
maxTokens: 8000,
})
Best Practices
- Measure First: Always measure token usage before and after changes
- Incrementally Roll Out: Deploy optimizations gradually
- User Control: Allow users to override defaults
- Monitor Quality: Track task success rates alongside token savings
- Fallback Ready: Have fallback mechanisms for when optimizations fail
References
- Files: src/util/token.ts, src/tool/truncation.ts, src/session/compaction.ts, src/session/prompt.ts, src/session/message-v2.ts, src/tool/grep.ts, src/tool/websearch.ts, src/tool/read.ts, src/session/system.ts
- Dependencies: tiktoken, @dqbd/tiktoken
- Related Issues: Context overflow handling, token tracking, prompt optimization