# Token Efficiency Guide for OpenCode CLI
This guide documents strategies to optimize token usage in the OpenCode CLI without compromising output quality.
## Overview
Token efficiency is critical for:
- Reducing API costs
- Avoiding context window overflows
- Improving response latency
- Enabling longer conversations
OpenCode already has several optimization mechanisms (compaction, truncation, pruning), but there are opportunities to improve further.
---
## Current Token Management
### Existing Mechanisms
| Mechanism | Location | Description |
|-----------|----------|-------------|
| Compaction | `src/session/compaction.ts` | Summarizes conversation history when context is exceeded |
| Truncation | `src/tool/truncation.ts` | Limits tool outputs to 2000 lines / 50KB |
| Pruning | `src/session/compaction.ts:41-90` | Removes old tool outputs beyond 40K tokens |
| Token Estimation | `src/util/token.ts` | Uses 4:1 character-to-token ratio |
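For reference, the legacy estimator amounts to a 4:1 heuristic along these lines (a minimal sketch; see `src/util/token.ts` for the actual implementation):
```typescript
// Sketch of the legacy heuristic in src/util/token.ts: roughly four
// characters per token. Cheap, but can drift noticeably on code-heavy
// or non-English content.
export namespace Token {
  const CHARS_PER_TOKEN = 4

  export function estimate(input: string): number {
    return Math.ceil(input.length / CHARS_PER_TOKEN)
  }
}
```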
### Token Flow
```
User Input → Message Construction → System Prompts → Tool Definitions → Message History → LLM API
```
---
## Recommended Improvements
### 1. Smart Compaction Strategy
**Current Behavior:** Compaction triggers reactively once token usage exceeds the context threshold.
**Improvements:**
- **Predictive Compaction:** Analyze token growth patterns and compact proactively before reaching limits
- **Configurable Thresholds:** Allow compaction at 70-80% of context instead of 100%
- **Task-Aware Triggers:** Compact before expensive operations (file edits, builds)
```typescript
// Example: Predictive compaction logic
async function predictCompactionNeed(
  messages: MessageV2.WithParts[],
  contextLimit: number,
): Promise<boolean> {
  // Average token growth over the last five messages.
  const recentGrowth = messages.slice(-5).reduce((acc, msg) => {
    return acc + Token.estimate(JSON.stringify(msg.parts))
  }, 0)
  const trend = recentGrowth / 5
  // Project three messages ahead; compact early if we would cross 70%.
  const total = messages.reduce((acc, msg) => {
    return acc + Token.estimate(JSON.stringify(msg.parts))
  }, 0)
  return total + trend * 3 > contextLimit * 0.7
}
```
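Task-aware triggers (the third bullet above) could hook tool dispatch; a sketch, assuming hypothetical `shouldCompact` and `Compaction.run` entry points into `src/session/compaction.ts`:
```typescript
// Hypothetical task-aware trigger: compact before expensive operations
// so large edits and builds start with fresh headroom.
const EXPENSIVE_TOOLS = new Set(["edit", "write", "bash"])

async function beforeToolCall(tool: string, sessionID: string): Promise<void> {
  if (!EXPENSIVE_TOOLS.has(tool)) return
  // shouldCompact / Compaction.run are assumed helpers; names are
  // illustrative, not the existing API.
  if (await shouldCompact(sessionID)) {
    await Compaction.run(sessionID)
  }
}
```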
**Files to Modify:**
- `src/session/compaction.ts`
- `src/session/prompt.ts`
---
### 2. Enhanced Token Estimation
**Current Behavior:** Simple 4:1 character ratio estimation.
**Improvements:**
- Use `tiktoken` for accurate OpenAI/Anthropic tokenization
- Add provider-specific token estimators
- Cache token counts to avoid recalculation
```typescript
// src/util/token.ts - Enhanced estimation
import { get_encoding } from "@dqbd/tiktoken"
import type { ModelMessage } from "ai"

// cl100k_base covers GPT-4-era OpenAI models; other providers need
// their own encodings (the per-provider estimators noted above).
const encoder = get_encoding("cl100k_base")

export namespace Token {
  export function estimate(input: string): number {
    return encoder.encode(input).length
  }

  export function estimateMessages(messages: ModelMessage[]): number {
    const perMessageOverhead = 3 // <|start|> role content <|end|>
    const base = messages.length * perMessageOverhead
    const content = messages.reduce((acc, msg) => {
      if (typeof msg.content === "string") {
        return acc + encoder.encode(msg.content).length
      }
      return acc + encoder.encode(JSON.stringify(msg.content)).length
    }, 0)
    return base + content
  }
}
```
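The caching bullet above is not shown in the example; a minimal memoization sketch (the key scheme is illustrative, and a real version should hash the full content):
```typescript
// Illustrative memoization over Token.estimate. The cache key uses
// length plus a prefix, which can collide; a production version should
// use a proper content hash (e.g. Bun.CryptoHasher) instead.
const tokenCache = new Map<string, number>()

export function estimateCached(input: string): number {
  const key = `${input.length}:${input.slice(0, 64)}`
  const cached = tokenCache.get(key)
  if (cached !== undefined) return cached
  const count = Token.estimate(input)
  tokenCache.set(key, count)
  return count
}
```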
**Files to Modify:**
- `src/util/token.ts`
- `src/provider/provider.ts` (add token limits)
---
### 3. Intelligent Tool Output Management
**Current Behavior:** Fixed truncation at 2000 lines / 50KB.
**Improvements:**
- **Content-Aware Truncation:**
- **Code:** Keep function signatures, truncate bodies
- **Logs:** Keep head+tail, truncate middle
- **JSON:** Preserve structure, truncate arrays
- **Errors:** Never truncate
```typescript
// src/tool/truncation.ts - Smart truncation
export async function smartTruncate(
  content: string,
  options: SmartTruncateOptions = {},
): Promise<Truncate.Result> {
  const { fileType = detectFileType(content), maxTokens = 8000 } = options
  switch (fileType) {
    case "code":
      return truncateCode(content, maxTokens)
    case "logs":
      return truncateLogs(content, maxTokens)
    case "json":
      return truncateJSON(content, maxTokens)
    case "error":
      // Errors are never truncated.
      return { content, truncated: false }
    default:
      return genericTruncate(content, maxTokens)
  }
}

function truncateCode(content: string, maxTokens: number): Truncate.Result {
  const lines = content.split("\n")
  const result: string[] = []
  let currentTokens = 0
  const overheadPerLine = 2 // newline plus indentation, roughly

  for (const line of lines) {
    const lineTokens = Token.estimate(line) + overheadPerLine
    // Always keep declarations so the structure survives truncation.
    if (/^(function|class|const|let|var|export|interface|type)\b/.test(line)) {
      result.push(line)
      currentTokens += lineTokens
      continue
    }
    if (currentTokens + lineTokens > maxTokens) {
      // Budget exhausted: note how much was dropped and stop.
      result.push(`  // ${lines.length - result.length} lines truncated...`)
      break
    }
    result.push(line)
    currentTokens += lineTokens
  }
  return formatResult(result, content)
}
```
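The logs case ("keep head+tail") could look like the following sketch, reusing the `Truncate.Result` shape and `Token.estimate` from above:
```typescript
// Sketch of the logs case: keep head and tail, drop the middle.
function truncateLogs(content: string, maxTokens: number): Truncate.Result {
  const lines = content.split("\n")
  const head: string[] = []
  const tail: string[] = []
  let used = 0
  // Fill the head with up to half the budget (setup and early errors).
  for (const line of lines) {
    const t = Token.estimate(line)
    if (used + t > maxTokens / 2) break
    head.push(line)
    used += t
  }
  // Fill the tail with the other half (the most recent output).
  used = 0
  for (let i = lines.length - 1; i >= head.length; i--) {
    const t = Token.estimate(lines[i])
    if (used + t > maxTokens / 2) break
    tail.unshift(lines[i])
    used += t
  }
  const omitted = lines.length - head.length - tail.length
  if (omitted <= 0) return { content, truncated: false }
  return {
    content: [...head, `... ${omitted} lines omitted ...`, ...tail].join("\n"),
    truncated: true,
  }
}
```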
**Files to Modify:**
- `src/tool/truncation.ts`
- Add `src/tool/truncation/code.ts`
- Add `src/tool/truncation/logs.ts`
---
### 4. Message History Optimization
**Current Behavior:** Full message history sent until compaction.
**Improvements:**
- **Importance Scoring:** Prioritize messages by importance
- **Selective History:** Remove low-value messages
- **Ephemeral Messages:** Mark transient context for removal
```typescript
// Message importance scoring
const MESSAGE_IMPORTANCE = {
  user_request: 100,
  file_edit: 90,
  agent_completion: 80,
  tool_success: 60,
  tool_output: 50,
  intermediate_result: 30,
  system_reminder: 20,
}

function scoreMessage(message: MessageV2.WithParts): number {
  let score = 0
  // Role-based scoring
  if (message.info.role === "user") score += MESSAGE_IMPORTANCE.user_request
  if (message.info.role === "assistant") {
    if (message.parts.some((p) => p.type === "tool" && p.tool === "edit")) {
      score += MESSAGE_IMPORTANCE.file_edit
    } else {
      score += MESSAGE_IMPORTANCE.agent_completion
    }
  }
  // Tool call scoring
  for (const part of message.parts) {
    if (part.type === "tool") {
      score += getToolImportanceScore(part.tool)
    }
  }
  return score
}

// Selective history retention
async function getOptimizedHistory(
  sessionID: string,
  maxTokens: number,
): Promise<MessageV2.WithParts[]> {
  const messages = await Session.messages({ sessionID })
  const scored = messages.map((msg) => ({
    message: msg,
    score: scoreMessage(msg),
    tokens: Token.estimate(JSON.stringify(msg.parts)),
  }))
  // The most recent user message is always kept, regardless of budget.
  const lastUser = [...messages].reverse().find((m) => m.info.role === "user")

  scored.sort((a, b) => b.score - a.score)
  const result: MessageV2.WithParts[] = []
  let usedTokens = 0
  for (const item of scored) {
    const mustKeep = item.message === lastUser
    if (!mustKeep && usedTokens + item.tokens > maxTokens) continue
    if (mustKeep || item.score >= 50) {
      result.push(item.message)
      usedTokens += item.tokens
    }
  }
  // Restore chronological order (message IDs sort lexicographically).
  return result.sort((a, b) => a.info.id.localeCompare(b.info.id))
}
```
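Ephemeral messages (the third bullet above) could be tagged at creation and stripped before each request; a sketch with a hypothetical `ephemeral` flag on parts:
```typescript
// Hypothetical "ephemeral" flag on message parts: transient context
// (progress pings, status banners) is dropped before each request.
interface EphemeralFlag {
  ephemeral?: boolean
}

function stripEphemeral(messages: MessageV2.WithParts[]): MessageV2.WithParts[] {
  return messages.map((msg) => ({
    ...msg,
    parts: msg.parts.filter((p) => !(p as typeof p & EphemeralFlag).ephemeral),
  }))
}
```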
**Files to Modify:**
- `src/session/message-v2.ts`
- `src/session/prompt.ts`
---
### 5. System Prompt Compression
**Current Behavior:** Provider-specific prompts loaded from text files.
**Improvements:**
- Audit and compress prompts
- Move optional instructions to first user message
- Create "minimal" mode for quick tasks
```typescript
// src/session/system.ts - Compressed prompts
export namespace SystemPrompt {
  // Core instructions (always sent)
  const CORE_PROMPT = `You are an expert software engineering assistant.`

  // Optional instructions (sent based on context)
  const OPTIONAL_PROMPTS = {
    code_quality: `Focus on clean, maintainable code with proper error handling.`,
    testing: `Always write tests for new functionality.`,
    documentation: `Document complex logic and API surfaces.`,
  }

  export async function getCompressedPrompt(
    model: Provider.Model,
    context: PromptContext,
  ): Promise<string[]> {
    const prompts: string[] = [CORE_PROMPT]
    // Add model-specific base prompt
    prompts.push(getBasePrompt(model))
    // Conditionally add optional prompts
    if (context.needsQualityFocus) {
      prompts.push(OPTIONAL_PROMPTS.code_quality)
    }
    if (context.needsTesting) {
      prompts.push(OPTIONAL_PROMPTS.testing)
    }
    return prompts
  }
}
```
**Files to Modify:**
- `src/session/system.ts`
- `src/session/prompt/*.txt`
---
### 6. Smart Grep Result Limits
**Current Behavior:** Hard limit of 100 matches.
**Improvements:**
- Reduce default to 50 matches
- Add priority scoring based on relevance
- Group matches by file
```typescript
// src/tool/grep.ts - Enhanced result handling
import path from "node:path"

const DEFAULT_MATCH_LIMIT = 50

const PRIORITY_WEIGHTS = {
  recently_modified: 1.5,
  same_directory: 1.3,
  matching_extension: 1.2,
}

interface MatchPriority {
  match: Match
  score: number
}

// GrepContext carries the working directory and target extensions,
// derived from the tool context.
function scoreMatch(match: Match, context: GrepContext): number {
  let score = 1.0
  // Boost files modified within the last week.
  const fileAge = Date.now() - match.modTime
  if (fileAge < 7 * 24 * 60 * 60 * 1000) {
    score *= PRIORITY_WEIGHTS.recently_modified
  }
  // Boost matches under the current working directory.
  if (match.path.startsWith(context.cwd)) {
    score *= PRIORITY_WEIGHTS.same_directory
  }
  // Boost matches with a relevant extension.
  if (context.targetExtensions.includes(path.extname(match.path))) {
    score *= PRIORITY_WEIGHTS.matching_extension
  }
  return score
}

export async function execute(params: GrepParams, ctx: Tool.Context) {
  const results = await ripgrep(params)
  const context = toGrepContext(ctx) // adapt Tool.Context to GrepContext
  const scored: MatchPriority[] = results.map((match) => ({
    match,
    score: scoreMatch(match, context),
  }))
  scored.sort((a, b) => b.score - a.score)
  const limit = params.limit ?? DEFAULT_MATCH_LIMIT
  return formatGroupedOutput(scored.slice(0, limit))
}

function formatGroupedOutput(matches: MatchPriority[]): ToolResult {
  // Group by file so related hits read together.
  const byFile = groupBy(matches, (m) => m.match.path)
  const output: string[] = []
  output.push(`Found ${matches.length} matches across ${byFile.size} files\n`)
  for (const [file, fileMatches] of byFile) {
    output.push(`\n${file}:`)
    for (const { match, score } of fileMatches.slice(0, 10)) {
      const relevance = score > 1.0 ? " [high relevance]" : ""
      output.push(`  Line ${match.lineNum}: ${match.lineText}${relevance}`)
    }
    if (fileMatches.length > 10) {
      output.push(`  ... and ${fileMatches.length - 10} more`)
    }
  }
  return { output: output.join("\n") }
}
```
**Files to Modify:**
- `src/tool/grep.ts`
---
### 7. Web Search Context Optimization
**Current Behavior:** 10,000 character default limit.
**Improvements:**
- Reduce default to 6,000 characters
- Content quality scoring
- Query-relevant extraction
```typescript
// src/tool/websearch.ts - Optimized content extraction
const DEFAULT_CONTEXT_CHARS = 6000

interface ContentQualityScore {
  source: string
  score: number
  relevantSections: string[]
}

function scoreAndExtract(content: string, query: string): ContentQualityScore {
  const paragraphs = content.split(/\n\n+/)
  const queryTerms = query.toLowerCase().split(/\s+/)
  const scored = paragraphs.map((para, index) => {
    const lower = para.toLowerCase()
    const termMatches = queryTerms.filter((term) => lower.includes(term)).length
    const density = termMatches / Math.max(para.length, 1)
    // Earlier paragraphs get a small positional boost.
    const position = index / paragraphs.length
    return {
      para,
      score: termMatches * 2 + density * 1000 + (1 - position) * 0.5,
    }
  })
  scored.sort((a, b) => b.score - a.score)

  const relevant: string[] = []
  let usedChars = 0
  for (const { para } of scored) {
    if (usedChars + para.length > DEFAULT_CONTEXT_CHARS) break
    relevant.push(para)
    usedChars += para.length
  }
  return {
    source: content.substring(0, DEFAULT_CONTEXT_CHARS),
    score: scored[0]?.score ?? 0,
    relevantSections: relevant,
  }
}

export async function execute(params: WebSearchParams, ctx: Tool.Context) {
  const response = await exaSearch({
    query: params.query,
    numResults: params.numResults ?? 8,
    type: params.type ?? "auto",
    livecrawl: params.livecrawl ?? "fallback",
    contextMaxCharacters: DEFAULT_CONTEXT_CHARS,
  })
  const optimized = optimizeResults(response.results, params.query)
  return {
    output: formatOptimizedResults(optimized),
    metadata: {
      originalChars: response.totalChars,
      optimizedChars: optimized.totalChars,
      savings: 1 - optimized.totalChars / response.totalChars,
    },
  }
}
```
**Files to Modify:**
- `src/tool/websearch.ts`
---
### 8. File Read Optimization
**Current Behavior:** Full file content sent unless offset/limit specified.
**Improvements:**
- Default limits based on file type
- Smart offset detection (function boundaries)
```typescript
// src/tool/read.ts - Optimized file reading
import path from "node:path"
import mime from "mime-types"

const FILE_TYPE_CONFIGS: Record<string, FileReadConfig> = {
  ".json": { defaultLimit: Infinity, truncate: false },
  ".md": { defaultLimit: 2000, truncate: true },
  ".ts": { defaultLimit: 400, truncate: true },
  ".js": { defaultLimit: 400, truncate: true },
  ".py": { defaultLimit: 400, truncate: true },
  ".yml": { defaultLimit: 500, truncate: true },
  ".yaml": { defaultLimit: 500, truncate: true },
  ".txt": { defaultLimit: 1000, truncate: true },
  default: { defaultLimit: 300, truncate: true },
}

export async function execute(params: ReadParams, ctx: Tool.Context) {
  const ext = path.extname(params.filePath)
  const config = FILE_TYPE_CONFIGS[ext] ?? FILE_TYPE_CONFIGS.default
  const offset = params.offset ?? 0
  const limit = params.limit ?? config.defaultLimit

  const file = Bun.file(params.filePath)
  const content = await file.text()
  const lines = content.split("\n")

  if (!config.truncate || lines.length <= limit + offset) {
    return { output: content, attachments: [] }
  }

  const displayedLines = lines.slice(offset, offset + limit)
  const lastLine = Math.min(offset + limit, lines.length)
  const output = [
    ...displayedLines,
    "",
    `... ${lines.length - displayedLines.length} lines truncated ...`,
    "",
    `File: ${params.filePath}`,
    `Lines: ${offset + 1}-${lastLine} of ${lines.length}`,
  ].join("\n")

  return {
    output,
    attachments: [
      {
        type: "file",
        filename: params.filePath,
        mime: mime.lookup(params.filePath) || "text/plain",
        url: `data:text/plain;base64,${Buffer.from(content).toString("base64")}`,
      },
    ],
  }
}
```
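Smart offset detection is not shown above; a sketch that snaps a requested offset back to the nearest enclosing declaration (the boundary regex and helper name are illustrative):
```typescript
// Hypothetical helper: walk back from the requested offset to the
// nearest declaration so the window starts at a complete unit.
const BOUNDARY = /^(export\s+)?(async\s+)?(function|class|interface|type|const)\b/

function snapToBoundary(lines: string[], offset: number): number {
  for (let i = Math.min(offset, lines.length - 1); i >= 0; i--) {
    if (BOUNDARY.test(lines[i])) return i
  }
  return offset
}
```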
**Files to Modify:**
- `src/tool/read.ts`
---
### 9. Context Window Budgeting
**Current Behavior:** Fixed 32K output token reservation.
**Improvements:**
- Dynamic budget allocation based on task type
- Model-specific optimizations
```typescript
// src/session/prompt.ts - Dynamic budget allocation
const TASK_BUDGETS: Record<string, TaskBudget> = {
  code_generation: { inputRatio: 0.5, outputRatio: 0.5 },
  exploration: { inputRatio: 0.8, outputRatio: 0.2 },
  qa: { inputRatio: 0.7, outputRatio: 0.3 },
  refactoring: { inputRatio: 0.6, outputRatio: 0.4 },
  debugging: { inputRatio: 0.7, outputRatio: 0.3 },
  default: { inputRatio: 0.6, outputRatio: 0.4 },
}

interface BudgetCalculation {
  inputBudget: number
  outputBudget: number
  totalBudget: number
}

function calculateBudget(
  model: Provider.Model,
  taskType: string,
  estimatedInputTokens: number,
): BudgetCalculation {
  const config = TASK_BUDGETS[taskType] ?? TASK_BUDGETS.default
  const modelContext = model.limit.context
  const modelMaxOutput = model.limit.output
  // Scale the working budget to the task, capped by the model context.
  const baseBudget = Math.min(modelContext, estimatedInputTokens * 2)
  const outputBudget = Math.min(
    modelMaxOutput,
    Math.floor(baseBudget * config.outputRatio),
    SessionPrompt.OUTPUT_TOKEN_MAX,
  )
  const inputBudget = Math.floor(baseBudget * config.inputRatio)
  return {
    inputBudget,
    outputBudget,
    totalBudget: inputBudget + outputBudget,
  }
}

async function checkAndAdjustPrompt(
  messages: ModelMessage[],
  budget: BudgetCalculation,
): Promise<ModelMessage[]> {
  const currentTokens = Token.estimateMessages(messages)
  if (currentTokens <= budget.inputBudget) {
    return messages
  }
  // Over budget: prune, preferring to keep the most recent messages.
  return pruneMessagesToBudget(messages, budget.inputBudget)
}
```
**Files to Modify:**
- `src/session/prompt.ts`
- `src/session/compaction.ts`
---
### 10. Duplicate Detection
**Current Behavior:** No deduplication of content.
**Improvements:**
- Hash and track tool outputs
- Skip identical subsequent calls
- Cache read file contents
```typescript
// src/session/duplicate-detection.ts
const outputHashCache = new Map<string, string>()

function getContentHash(content: string): string {
  // Bun.CryptoHasher provides streaming SHA-256.
  return new Bun.CryptoHasher("sha256").update(content).digest("hex")
}

async function deduplicateToolOutput(
  toolId: string,
  input: Record<string, unknown>,
  content: string,
): Promise<{ isDuplicate: boolean; output: string }> {
  const hash = getContentHash(content)
  const key = `${toolId}:${JSON.stringify(input)}:${hash}`
  if (outputHashCache.has(key)) {
    return { isDuplicate: true, output: outputHashCache.get(key)! }
  }
  outputHashCache.set(key, content)
  return { isDuplicate: false, output: content }
}

// In tool execution
async function executeTool(tool: Tool.Info, args: Record<string, unknown>) {
  const content = await tool.execute(args)
  const { isDuplicate } = await deduplicateToolOutput(tool.id, args, content.output)
  if (isDuplicate) {
    log.debug("Skipping duplicate tool output", { tool: tool.id })
    return {
      ...content,
      output: `[Previous identical output: ${content.output.substring(0, 100)}...]`,
      metadata: { ...content.metadata, duplicate: true },
    }
  }
  return content
}
```
**Files to Modify:**
- Add `src/session/duplicate-detection.ts`
- `src/tool/tool.ts`
---
## Implementation Priority
### Phase 1: High Impact, Low Risk
| Priority | Improvement | Estimated Token Savings | Risk |
|----------|-------------|------------------------|------|
| 1 | Enhanced Token Estimation | 5-15% | Low |
| 2 | Smart Grep Limits | 10-20% | Low |
| 3 | Web Search Optimization | 20-30% | Low |
| 4 | System Prompt Compression | 5-10% | Low |
### Phase 2: Medium Impact, Medium Risk
| Priority | Improvement | Estimated Token Savings | Risk |
|----------|-------------|------------------------|------|
| 5 | Tool Output Management | 15-25% | Medium |
| 6 | Message History Optimization | 20-30% | Medium |
| 7 | File Read Limits | 10-20% | Medium |
### Phase 3: High Impact, Higher Complexity
| Priority | Improvement | Estimated Token Savings | Risk |
|----------|-------------|------------------------|------|
| 8 | Smart Compaction | 25-40% | High |
| 9 | Context Budgeting | 15-25% | High |
| 10 | Duplicate Detection | 10-15% | Medium |
---
## Quality Preservation
### Testing Strategy
1. **A/B Testing:** Compare outputs before/after each optimization
2. **Quality Metrics:** Track success rate, user satisfaction, task completion
3. **Rollback Mechanism:** Config flags to disable optimizations per-session
```typescript
// Config schema for optimization controls
import { z } from "zod"

const OptimizationConfig = z.object({
  smart_compaction: z.boolean().default(true),
  enhanced_estimation: z.boolean().default(true),
  smart_truncation: z.boolean().default(true),
  message_pruning: z.boolean().default(true),
  system_prompt_compression: z.boolean().default(true),
  grep_optimization: z.boolean().default(true),
  websearch_optimization: z.boolean().default(true),
  file_read_limits: z.boolean().default(true),
  context_budgeting: z.boolean().default(true),
  duplicate_detection: z.boolean().default(true),
})

// Usage
const config = await Config.get()
const optimizations = config.optimizations ?? {}
```
### Monitoring
```typescript
// Token efficiency metrics
export async function trackTokenMetrics(sessionID: string) {
  const messages = await Session.messages({ sessionID })
  const metrics = {
    totalTokens: 0,
    inputTokens: 0,
    outputTokens: 0,
    optimizationSavings: 0, // populated by the individual optimizations
    compactionCount: 0,
    truncationCount: 0, // incremented by the truncation layer
  }
  for (const msg of messages) {
    // Per-message token counts are tracked by the session layer.
    metrics.totalTokens += msg.tokens.input + msg.tokens.output
    metrics.inputTokens += msg.tokens.input
    metrics.outputTokens += msg.tokens.output
    if (msg.info.mode === "compaction") {
      metrics.compactionCount++
    }
  }
  return metrics
}
```
---
## Configuration
### Environment Variables
```bash
# Token optimization controls
OPENCODE_TOKEN_ESTIMATION=accurate # accurate (tiktoken) or legacy (4:1)
OPENCODE_TRUNCATION_MODE=smart # smart or legacy (fixed limits)
OPENCODE_COMPACTION_THRESHOLD=0.7 # trigger at 70% of context
OPENCODE_GREP_LIMIT=50 # default match limit
OPENCODE_WEBSEARCH_CHARS=6000 # default context characters
OPENCODE_FILE_READ_LIMIT=400 # default lines for code files
OPENCODE_OUTPUT_BUDGET_RATIO=0.4 # percentage for output
OPENCODE_DUPLICATE_DETECTION=true # enable cache
```
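A sketch of reading these variables at startup with the documented defaults (the `TokenEnv` loader itself is hypothetical):
```typescript
// Hypothetical startup loader for the variables above; names match the
// table, defaults match the documented values.
const TokenEnv = {
  estimation: process.env.OPENCODE_TOKEN_ESTIMATION ?? "accurate",
  truncationMode: process.env.OPENCODE_TRUNCATION_MODE ?? "smart",
  compactionThreshold: Number(process.env.OPENCODE_COMPACTION_THRESHOLD ?? "0.7"),
  grepLimit: Number(process.env.OPENCODE_GREP_LIMIT ?? "50"),
  websearchChars: Number(process.env.OPENCODE_WEBSEARCH_CHARS ?? "6000"),
  fileReadLimit: Number(process.env.OPENCODE_FILE_READ_LIMIT ?? "400"),
  outputBudgetRatio: Number(process.env.OPENCODE_OUTPUT_BUDGET_RATIO ?? "0.4"),
  duplicateDetection: (process.env.OPENCODE_DUPLICATE_DETECTION ?? "true") === "true",
}
```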
### Per-Model Configuration
```json
{
  "models": {
    "gpt-4o": {
      "context_limit": 128000,
      "output_limit": 16384,
      "token_budget": {
        "code_generation": { "input_ratio": 0.5, "output_ratio": 0.5 },
        "exploration": { "input_ratio": 0.8, "output_ratio": 0.2 }
      }
    },
    "claude-sonnet-4-20250514": {
      "context_limit": 200000,
      "output_limit": 8192,
      "supports_prompt_cache": true
    }
  }
}
```
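For validation at load time, a zod schema mirroring the JSON above could look like this (hypothetical, in the style of `OptimizationConfig` earlier):
```typescript
// Hypothetical schema for the per-model config shown above.
import { z } from "zod"

const TokenBudget = z.object({
  input_ratio: z.number().min(0).max(1),
  output_ratio: z.number().min(0).max(1),
})

const ModelConfig = z.object({
  context_limit: z.number().int().positive(),
  output_limit: z.number().int().positive(),
  supports_prompt_cache: z.boolean().optional(),
  token_budget: z.record(TokenBudget).optional(),
})

const ModelsConfig = z.object({ models: z.record(ModelConfig) })
```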
---
## Migration Guide
### Upgrading from Legacy Token Estimation
```typescript
// Before (4:1 ratio)
const tokens = content.length / 4

// After (tiktoken)
const tokens = Token.estimate(content)
```
### Upgrading from Legacy Truncation
```typescript
// Before (fixed limits)
if (lines.length > 2000 || bytes > 51200) {
  truncate(content)
}

// After (smart truncation)
const result = await smartTruncate(content, {
  fileType: detectFileType(content),
  maxTokens: 8000,
})
```
---
## Best Practices
1. **Measure First:** Always measure token usage before and after changes
2. **Incrementally Roll Out:** Deploy optimizations gradually
3. **User Control:** Allow users to override defaults
4. **Monitor Quality:** Track task success rates alongside token savings
5. **Fallback Ready:** Have fallback mechanisms for when optimizations fail
---
## References
- **Files:** `src/util/token.ts`, `src/tool/truncation.ts`, `src/session/compaction.ts`, `src/session/prompt.ts`, `src/session/message-v2.ts`, `src/tool/grep.ts`, `src/tool/websearch.ts`, `src/tool/read.ts`, `src/session/system.ts`
- **Dependencies:** `tiktoken`, `@dqbd/tiktoken`
- **Related Issues:** Context overflow handling, token tracking, prompt optimization