feat: Add support for multiple AI providers (bytez, llm7.io, aimlapi.com, routeway.ai, g4f.dev) and fix Chutes loader
- Add custom loaders for bytez, llm7, aimlapi, routeway, and g4f providers
- Add provider definitions to models-api.json with sample models
- Add provider icon names to types.ts
- Chutes loader already exists and should work with the CHUTES_API_KEY env var

Providers added:

- bytez: Uses BYTEZ_API_KEY, OpenAI-compatible
- llm7: Uses LLM7_API_KEY (optional), OpenAI-compatible
- aimlapi: Uses AIMLAPI_API_KEY, OpenAI-compatible
- routeway: Uses ROUTEWAY_API_KEY, OpenAI-compatible
- g4f: Uses G4F_API_KEY (optional), free tier available
New file: opencode/TOKEN_EFFICIENCY_GUIDE.md (879 lines added)
# Token Efficiency Guide for OpenCode CLI

This guide documents strategies to optimize token usage in the OpenCode CLI without compromising output quality.

## Overview

Token efficiency is critical for:

- Reducing API costs
- Avoiding context window overflows
- Improving response latency
- Enabling longer conversations

OpenCode already has several optimization mechanisms (compaction, truncation, pruning), but there are opportunities to improve further.

---
## Current Token Management

### Existing Mechanisms

| Mechanism | Location | Description |
|-----------|----------|-------------|
| Compaction | `src/session/compaction.ts` | Summarizes conversation history when context is exceeded |
| Truncation | `src/tool/truncation.ts` | Limits tool outputs to 2000 lines / 50KB |
| Pruning | `src/session/compaction.ts:41-90` | Removes old tool outputs beyond 40K tokens |
| Token Estimation | `src/util/token.ts` | Uses a 4:1 character-to-token ratio |
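For comparison, the legacy 4:1 heuristic amounts to roughly the following (a minimal sketch; the exact code in `src/util/token.ts` may differ):

```typescript
// Sketch of the legacy heuristic: roughly 4 characters per token.
export namespace Token {
  export function estimate(input: string): number {
    return Math.ceil(input.length / 4)
  }
}
```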
### Token Flow

```
User Input → Message Construction → System Prompts → Tool Definitions → Message History → LLM API
```

---
## Recommended Improvements

### 1. Smart Compaction Strategy

**Current Behavior:** Compaction triggers reactively when tokens exceed the threshold.

**Improvements:**

- **Predictive Compaction:** Analyze token growth patterns and compact proactively before reaching limits
- **Configurable Thresholds:** Allow compaction at 70-80% of context instead of 100%
- **Task-Aware Triggers:** Compact before expensive operations (file edits, builds); a sketch follows the example below

```typescript
// Example: Predictive compaction logic (sketch; assumes a string-based Token.estimate)
const messageTokens = (msg: MessageV2.WithParts) => Token.estimate(JSON.stringify(msg.parts))

async function predictCompactionNeed(
  messages: MessageV2.WithParts[],
  contextLimit: number,
): Promise<boolean> {
  // Average token growth across the five most recent messages
  const recentGrowth = messages.slice(-5).reduce((acc, msg) => acc + messageTokens(msg), 0)
  const trend = recentGrowth / 5

  // Project three messages ahead and compare against 70% of the context window
  const currentTotal = messages.reduce((acc, msg) => acc + messageTokens(msg), 0)
  const projectedTotal = currentTotal + trend * 3
  return projectedTotal > contextLimit * 0.7
}
```
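The task-aware trigger can then be a thin wrapper that runs compaction ahead of known-expensive operations. A minimal sketch, assuming a `Compaction.run` entry point (the name is illustrative; the real entry point lives in `src/session/compaction.ts`):

```typescript
// Hypothetical wrapper: compact proactively before an expensive operation.
async function beforeExpensiveOperation(sessionID: string, contextLimit: number) {
  const messages = await Session.messages({ sessionID })
  if (await predictCompactionNeed(messages, contextLimit)) {
    await Compaction.run(sessionID) // illustrative entry point
  }
}
```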
**Files to Modify:**

- `src/session/compaction.ts`
- `src/session/prompt.ts`

---
### 2. Enhanced Token Estimation

**Current Behavior:** Simple 4:1 character-ratio estimation.

**Improvements:**

- Use `tiktoken` for accurate OpenAI/Anthropic tokenization
- Add provider-specific token estimators
- Cache token counts to avoid recalculation (a sketch follows the example below)

```typescript
// src/util/token.ts - Enhanced estimation
import { get_encoding } from "@dqbd/tiktoken"

const encoder = get_encoding("cl100k_base")

export namespace Token {
  export function estimate(input: string): number {
    return encoder.encode(input).length
  }

  export function estimateMessages(messages: ModelMessage[]): number {
    const perMessageOverhead = 3 // <|start|> role content <|end|>
    const base = messages.length * perMessageOverhead
    const content = messages.reduce((acc, msg) => {
      if (typeof msg.content === "string") {
        return acc + encoder.encode(msg.content).length
      }
      return acc + encoder.encode(JSON.stringify(msg.content)).length
    }, 0)
    return base + content
  }
}
```
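Caching can sit on top of the estimator; a minimal memoization sketch (the cache bound and eviction policy are illustrative):

```typescript
// Hypothetical memoized wrapper to avoid re-encoding unchanged strings.
const tokenCountCache = new Map<string, number>()

export function estimateCached(input: string): number {
  const cached = tokenCountCache.get(input)
  if (cached !== undefined) return cached
  const count = Token.estimate(input)
  // Naive size bound; a real implementation would use LRU eviction
  if (tokenCountCache.size > 10_000) tokenCountCache.clear()
  tokenCountCache.set(input, count)
  return count
}
```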
**Files to Modify:**

- `src/util/token.ts`
- `src/provider/provider.ts` (add token limits)

---
### 3. Intelligent Tool Output Management

**Current Behavior:** Fixed truncation at 2000 lines / 50KB.

**Improvements:**

- **Content-Aware Truncation:**
  - **Code:** Keep function signatures, truncate bodies
  - **Logs:** Keep head+tail, truncate middle (a sketch follows the example below)
  - **JSON:** Preserve structure, truncate arrays
  - **Errors:** Never truncate

```typescript
// src/tool/truncation.ts - Smart truncation
export async function smartTruncate(
  content: string,
  options: SmartTruncateOptions = {}
): Promise<Truncate.Result> {
  const { fileType = detectFileType(content), maxTokens = 8000 } = options

  switch (fileType) {
    case "code":
      return truncateCode(content, maxTokens)
    case "logs":
      return truncateLogs(content, maxTokens)
    case "json":
      return truncateJSON(content, maxTokens)
    case "error":
      return { content, truncated: false }
    default:
      return genericTruncate(content, maxTokens)
  }
}

function truncateCode(content: string, maxTokens: number): Truncate.Result {
  const lines = content.split("\n")
  const result: string[] = []

  let currentTokens = 0
  const overheadPerLine = 2 // small fixed cost per line for newlines/indentation

  for (const line of lines) {
    const lineTokens = Token.estimate(line)
    if (currentTokens + lineTokens + overheadPerLine > maxTokens) {
      break
    }

    // Always include function signatures
    if (line.match(/^(function|class|const|let|var|export|interface|type)/)) {
      result.push(line)
      currentTokens += lineTokens
      continue
    }

    // Once an opening brace sits on its own line, elide the implementation body
    if (result.length > 0 && result[result.length - 1].match(/^{$/)) {
      result.push(`  // ${lines.length - result.length} lines truncated...`)
      break
    }

    result.push(line)
    currentTokens += lineTokens
  }

  // formatResult is a helper that assembles a Truncate.Result from the kept lines
  return formatResult(result, content)
}
```
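The log branch referenced above could keep the head and tail and drop the middle; a minimal sketch of `truncateLogs`, using the same `Truncate.Result`-style shape assumed throughout this section:

```typescript
// Hypothetical head+tail truncation for log output.
function truncateLogs(content: string, maxTokens: number): Truncate.Result {
  const lines = content.split("\n")
  // Rough line budget derived from the token budget (heuristic)
  const budgetLines = Math.max(20, Math.floor(maxTokens / 10))
  if (lines.length <= budgetLines) return { content, truncated: false }

  const head = lines.slice(0, Math.floor(budgetLines / 2))
  const tail = lines.slice(-Math.floor(budgetLines / 2))
  const dropped = lines.length - head.length - tail.length
  return {
    content: [...head, `... ${dropped} lines truncated ...`, ...tail].join("\n"),
    truncated: true,
  }
}
```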
**Files to Modify:**

- `src/tool/truncation.ts`
- Add `src/tool/truncation/code.ts`
- Add `src/tool/truncation/logs.ts`

---
### 4. Message History Optimization

**Current Behavior:** The full message history is sent until compaction.

**Improvements:**

- **Importance Scoring:** Prioritize messages by importance
- **Selective History:** Remove low-value messages
- **Ephemeral Messages:** Mark transient context for removal (a sketch follows the example below)

```typescript
// Message importance scoring
const MESSAGE_IMPORTANCE = {
  user_request: 100,
  file_edit: 90,
  agent_completion: 80,
  tool_success: 60,
  tool_output: 50,
  intermediate_result: 30,
  system_reminder: 20,
}

function scoreMessage(message: MessageV2.WithParts): number {
  let score = 0

  // Role-based scoring
  if (message.info.role === "user") score += MESSAGE_IMPORTANCE.user_request
  if (message.info.role === "assistant") {
    if (message.parts.some(p => p.type === "tool" && p.tool === "edit")) {
      score += MESSAGE_IMPORTANCE.file_edit
    } else {
      score += MESSAGE_IMPORTANCE.agent_completion
    }
  }

  // Tool call scoring
  for (const part of message.parts) {
    if (part.type === "tool") {
      score += getToolImportanceScore(part.tool)
    }
  }

  return score
}

// Selective history retention
async function getOptimizedHistory(
  sessionID: string,
  maxTokens: number
): Promise<MessageV2.WithParts[]> {
  const messages = await Session.messages({ sessionID })
  const scored = messages.map(msg => ({
    message: msg,
    score: scoreMessage(msg),
    tokens: Token.estimate(JSON.stringify(msg)),
  }))

  scored.sort((a, b) => b.score - a.score)

  const result: MessageV2.WithParts[] = []
  let usedTokens = 0

  for (const item of scored) {
    if (usedTokens + item.tokens > maxTokens) break

    // Always keep user messages newer than anything already kept
    if (item.message.info.role === "user" &&
        result.length > 0 &&
        result[result.length - 1].info.id < item.message.info.id) {
      result.push(item.message)
      usedTokens += item.tokens
      continue
    }

    // Keep if high importance score
    if (item.score >= 50) {
      result.push(item.message)
      usedTokens += item.tokens
    }
  }

  // Restore chronological order (ids are assumed to sort by creation time)
  return result.sort((a, b) => (a.info.id < b.info.id ? -1 : 1))
}
```
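Ephemeral marking could be as simple as a flag that the history builder strips once a message is no longer recent; a minimal sketch (the `ephemeral` field is hypothetical):

```typescript
// Hypothetical: drop messages explicitly marked as transient context.
type MaybeEphemeral = MessageV2.WithParts & { ephemeral?: boolean }

function stripEphemeral(messages: MaybeEphemeral[], keepLast: number = 2): MaybeEphemeral[] {
  // Keep ephemeral messages only while they are still recent
  const cutoff = messages.length - keepLast
  return messages.filter((msg, i) => !msg.ephemeral || i >= cutoff)
}
```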
**Files to Modify:**

- `src/session/message-v2.ts`
- `src/session/prompt.ts`

---
### 5. System Prompt Compression

**Current Behavior:** Provider-specific prompts are loaded from text files.

**Improvements:**

- Audit and compress prompts
- Move optional instructions to the first user message
- Create a "minimal" mode for quick tasks

```typescript
// src/session/system.ts - Compressed prompts
export namespace SystemPrompt {
  // Core instructions (always sent)
  const CORE_PROMPT = `You are an expert software engineering assistant.`

  // Optional instructions (sent based on context)
  const OPTIONAL_PROMPTS = {
    code_quality: `Focus on clean, maintainable code with proper error handling.`,
    testing: `Always write tests for new functionality.`,
    documentation: `Document complex logic and API surfaces.`,
  }

  export async function getCompressedPrompt(
    model: Provider.Model,
    context: PromptContext
  ): Promise<string[]> {
    const prompts: string[] = [CORE_PROMPT]

    // Add the model-specific base prompt (helper resolving the per-provider text file)
    const basePrompt = getBasePrompt(model)
    prompts.push(basePrompt)

    // Conditionally add optional prompts
    if (context.needsQualityFocus) {
      prompts.push(OPTIONAL_PROMPTS.code_quality)
    }
    if (context.needsTesting) {
      prompts.push(OPTIONAL_PROMPTS.testing)
    }

    return prompts
  }
}
```

**Files to Modify:**

- `src/session/system.ts`
- `src/session/prompt/*.txt`

---
### 6. Smart Grep Result Limits

**Current Behavior:** Hard limit of 100 matches.

**Improvements:**

- Reduce the default to 50 matches
- Add priority scoring based on relevance
- Group matches by file

```typescript
// src/tool/grep.ts - Enhanced result handling
const DEFAULT_MATCH_LIMIT = 50
const PRIORITY_WEIGHTS = {
  recently_modified: 1.5,
  same_directory: 1.3,
  matching_extension: 1.2,
  exact_match: 1.1,
}

interface MatchPriority {
  match: Match
  score: number
}

function scoreMatch(match: Match, context: GrepContext): number {
  let score = 1.0

  // Recently modified files (within the last week)
  const fileAge = Date.now() - match.modTime
  if (fileAge < 7 * 24 * 60 * 60 * 1000) {
    score *= PRIORITY_WEIGHTS.recently_modified
  }

  // Same directory as current work
  if (match.path.startsWith(context.cwd)) {
    score *= PRIORITY_WEIGHTS.same_directory
  }

  // Matching extension
  if (context.targetExtensions.includes(path.extname(match.path))) {
    score *= PRIORITY_WEIGHTS.matching_extension
  }

  return score
}

export async function execute(params: GrepParams, ctx: Tool.Context) {
  const results = await ripgrep(params)
  // GrepContext (cwd, targetExtensions) is assumed derivable from the tool context
  const scored: MatchPriority[] = results.map(match => ({
    match,
    score: scoreMatch(match, ctx),
  }))

  scored.sort((a, b) => b.score - a.score)

  const limit = params.limit ?? DEFAULT_MATCH_LIMIT
  const topMatches = scored.slice(0, limit)

  return formatGroupedOutput(topMatches)
}

function formatGroupedOutput(matches: MatchPriority[]): ToolResult {
  const byFile = groupBy(matches, m => m.match.path)

  const output: string[] = []
  output.push(`Found ${matches.length} matches across ${byFile.size} files\n`)

  for (const [file, fileMatches] of byFile) {
    output.push(`\n${file}:`)
    for (const { match, score } of fileMatches.slice(0, 10)) {
      const relevance = score > 1.0 ? " [high relevance]" : ""
      output.push(`  Line ${match.lineNum}: ${match.lineText}${relevance}`)
    }
    if (fileMatches.length > 10) {
      output.push(`  ... and ${fileMatches.length - 10} more`)
    }
  }

  return { output: output.join("\n") }
}
```

**Files to Modify:**

- `src/tool/grep.ts`

---
### 7. Web Search Context Optimization

**Current Behavior:** 10,000-character default limit.

**Improvements:**

- Reduce the default to 6,000 characters
- Content quality scoring
- Query-relevant extraction

```typescript
// src/tool/websearch.ts - Optimized content extraction
const DEFAULT_CONTEXT_CHARS = 6000

interface ContentQualityScore {
  source: string
  score: number
  relevantSections: string[]
}

function scoreAndExtract(
  content: string,
  query: string
): ContentQualityScore {
  const paragraphs = content.split(/\n\n+/)
  const queryTerms = query.toLowerCase().split(/\s+/)

  const scored = paragraphs.map((para, index) => {
    const lower = para.toLowerCase()
    const termMatches = queryTerms.filter(term => lower.includes(term)).length
    const density = termMatches / para.length
    const position = index / paragraphs.length

    return {
      para,
      // Favor more query terms, higher term density, and earlier position
      score: termMatches * 2 + density * 1000 + (1 - position) * 0.5,
    }
  })

  scored.sort((a, b) => b.score - a.score)

  const relevant: string[] = []
  let usedChars = 0

  for (const { para } of scored) {
    if (usedChars + para.length > DEFAULT_CONTEXT_CHARS) break
    relevant.push(para)
    usedChars += para.length
  }

  return {
    source: content.substring(0, DEFAULT_CONTEXT_CHARS),
    score: scored[0]?.score ?? 0,
    relevantSections: relevant,
  }
}

export async function execute(params: WebSearchParams, ctx: Tool.Context) {
  const response = await exaSearch({
    query: params.query,
    numResults: params.numResults ?? 8,
    type: params.type ?? "auto",
    livecrawl: params.livecrawl ?? "fallback",
    contextMaxCharacters: DEFAULT_CONTEXT_CHARS,
  })

  // optimizeResults applies scoreAndExtract per result; formatOptimizedResults renders them
  const optimized = optimizeResults(response.results, params.query)

  return {
    output: formatOptimizedResults(optimized),
    metadata: {
      originalChars: response.totalChars,
      optimizedChars: optimized.totalChars,
      savings: 1 - (optimized.totalChars / response.totalChars),
    },
  }
}
```

**Files to Modify:**

- `src/tool/websearch.ts`

---
### 8. File Read Optimization

**Current Behavior:** Full file content is sent unless offset/limit is specified.

**Improvements:**

- Default limits based on file type
- Smart offset detection (function boundaries); a sketch follows the example below

```typescript
// src/tool/read.ts - Optimized file reading
const FILE_TYPE_CONFIGS: Record<string, FileReadConfig> = {
  ".json": { defaultLimit: Infinity, truncate: false },
  ".md": { defaultLimit: 2000, truncate: true },
  ".ts": { defaultLimit: 400, truncate: true },
  ".js": { defaultLimit: 400, truncate: true },
  ".py": { defaultLimit: 400, truncate: true },
  ".yml": { defaultLimit: 500, truncate: true },
  ".yaml": { defaultLimit: 500, truncate: true },
  ".txt": { defaultLimit: 1000, truncate: true },
  default: { defaultLimit: 300, truncate: true },
}

export async function execute(params: ReadParams, ctx: Tool.Context) {
  const ext = path.extname(params.filePath)
  const config = FILE_TYPE_CONFIGS[ext] ?? FILE_TYPE_CONFIGS.default

  const offset = params.offset ?? 0
  const limit = params.limit ?? config.defaultLimit

  const file = Bun.file(params.filePath)
  const content = await file.text()
  const lines = content.split("\n")

  if (!config.truncate || lines.length <= limit + offset) {
    return {
      output: content,
      attachments: [],
    }
  }

  const displayedLines = lines.slice(offset, offset + limit)
  const output = [
    ...displayedLines,
    "",
    `... ${lines.length - displayedLines.length} lines truncated ...`,
    "",
    `File: ${params.filePath}`,
    `Lines: ${offset + 1}-${offset + limit} of ${lines.length}`,
  ].join("\n")

  return {
    output,
    attachments: [{
      type: "file",
      filename: params.filePath,
      // mime.lookup as provided by e.g. the mime-types package
      mime: mime.lookup(params.filePath) || "text/plain",
      url: `data:text/plain;base64,${Buffer.from(content).toString("base64")}`,
    }],
  }
}
```
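Smart offset detection could snap the requested offset back to the nearest enclosing declaration so a read never starts mid-function; a minimal regex-based sketch (the helper is hypothetical):

```typescript
// Hypothetical: move the start offset back to the nearest declaration line.
function snapToFunctionBoundary(lines: string[], offset: number): number {
  const declaration = /^(export\s+)?(async\s+)?(function|class|const|interface|type)\b/
  for (let i = Math.min(offset, lines.length - 1); i >= 0; i--) {
    if (declaration.test(lines[i])) return i
  }
  return offset
}
```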
**Files to Modify:**

- `src/tool/read.ts`

---
### 9. Context Window Budgeting

**Current Behavior:** Fixed 32K output-token reservation.

**Improvements:**

- Dynamic budget allocation based on task type
- Model-specific optimizations

```typescript
// src/session/prompt.ts - Dynamic budget allocation
const TASK_BUDGETS: Record<string, TaskBudget> = {
  code_generation: { inputRatio: 0.5, outputRatio: 0.5 },
  exploration: { inputRatio: 0.8, outputRatio: 0.2 },
  qa: { inputRatio: 0.7, outputRatio: 0.3 },
  refactoring: { inputRatio: 0.6, outputRatio: 0.4 },
  debugging: { inputRatio: 0.7, outputRatio: 0.3 },
  default: { inputRatio: 0.6, outputRatio: 0.4 },
}

interface BudgetCalculation {
  inputBudget: number
  outputBudget: number
  totalBudget: number
}

function calculateBudget(
  model: Provider.Model,
  taskType: string,
  estimatedInputTokens: number
): BudgetCalculation {
  const config = TASK_BUDGETS[taskType] ?? TASK_BUDGETS.default
  const modelContext = model.limit.context
  const modelMaxOutput = model.limit.output

  // Dynamic budget based on task type
  const baseBudget = Math.min(modelContext, estimatedInputTokens * 2)
  const outputBudget = Math.min(
    modelMaxOutput,
    Math.floor(baseBudget * config.outputRatio),
    SessionPrompt.OUTPUT_TOKEN_MAX
  )
  const inputBudget = Math.floor(baseBudget * config.inputRatio)

  return {
    inputBudget,
    outputBudget,
    totalBudget: inputBudget + outputBudget,
  }
}

async function checkAndAdjustPrompt(
  messages: ModelMessage[],
  budget: BudgetCalculation
): Promise<ModelMessage[]> {
  const currentTokens = Token.estimateMessages(messages)

  if (currentTokens <= budget.inputBudget) {
    return messages
  }

  // Over budget - prune, prioritizing recent messages
  // (pruneMessagesToBudget drops low-priority messages until under budget)
  return pruneMessagesToBudget(messages, budget.inputBudget)
}
```

**Files to Modify:**

- `src/session/prompt.ts`
- `src/session/compaction.ts`

---
### 10. Duplicate Detection

**Current Behavior:** No deduplication of content.

**Improvements:**

- Hash and track tool outputs
- Skip identical subsequent calls
- Cache read file contents

```typescript
// src/session/duplicate-detection.ts
const outputHashCache = new Map<string, string>()

function getContentHash(content: string): string {
  return new Bun.CryptoHasher("sha256").update(content).digest("hex")
}

async function deduplicateToolOutput(
  toolId: string,
  input: Record<string, unknown>,
  content: string
): Promise<{ isDuplicate: boolean; output: string }> {
  const hash = getContentHash(content)
  const key = `${toolId}:${JSON.stringify(input)}:${hash}`

  if (outputHashCache.has(key)) {
    return {
      isDuplicate: true,
      output: outputHashCache.get(key)!,
    }
  }

  outputHashCache.set(key, content)
  return { isDuplicate: false, output: content }
}

// In tool execution
async function executeTool(tool: Tool.Info, args: Record<string, unknown>) {
  const content = await tool.execute(args)

  const { isDuplicate, output } = await deduplicateToolOutput(
    tool.id,
    args,
    content.output
  )

  if (isDuplicate) {
    log.debug("Skipping duplicate tool output", { tool: tool.id })
    return {
      ...content,
      output: `[Previous identical output: ${content.output.substring(0, 100)}...]`,
      metadata: { ...content.metadata, duplicate: true },
    }
  }

  return content
}
```

**Files to Modify:**

- Add `src/session/duplicate-detection.ts`
- `src/tool/tool.ts`

---
## Implementation Priority

### Phase 1: High Impact, Low Risk

| Priority | Improvement | Estimated Token Savings | Risk |
|----------|-------------|------------------------|------|
| 1 | Enhanced Token Estimation | 5-15% | Low |
| 2 | Smart Grep Limits | 10-20% | Low |
| 3 | Web Search Optimization | 20-30% | Low |
| 4 | System Prompt Compression | 5-10% | Low |

### Phase 2: Medium Impact, Medium Risk

| Priority | Improvement | Estimated Token Savings | Risk |
|----------|-------------|------------------------|------|
| 5 | Tool Output Management | 15-25% | Medium |
| 6 | Message History Optimization | 20-30% | Medium |
| 7 | File Read Limits | 10-20% | Medium |

### Phase 3: High Impact, Higher Complexity

| Priority | Improvement | Estimated Token Savings | Risk |
|----------|-------------|------------------------|------|
| 8 | Smart Compaction | 25-40% | High |
| 9 | Context Budgeting | 15-25% | High |
| 10 | Duplicate Detection | 10-15% | Medium |

---
## Quality Preservation

### Testing Strategy

1. **A/B Testing:** Compare outputs before/after each optimization
2. **Quality Metrics:** Track success rate, user satisfaction, task completion
3. **Rollback Mechanism:** Config flags to disable optimizations per session

```typescript
// Config schema for optimization controls
const OptimizationConfig = z.object({
  smart_compaction: z.boolean().default(true),
  enhanced_estimation: z.boolean().default(true),
  smart_truncation: z.boolean().default(true),
  message_pruning: z.boolean().default(true),
  system_prompt_compression: z.boolean().default(true),
  grep_optimization: z.boolean().default(true),
  websearch_optimization: z.boolean().default(true),
  file_read_limits: z.boolean().default(true),
  context_budgeting: z.boolean().default(true),
  duplicate_detection: z.boolean().default(true),
})

// Usage
const config = await Config.get()
const optimizations = config.optimizations ?? {}
```
### Monitoring

```typescript
// Token efficiency metrics
export async function trackTokenMetrics(sessionID: string) {
  const messages = await Session.messages({ sessionID })

  const metrics = {
    totalTokens: 0,
    inputTokens: 0,
    outputTokens: 0,
    optimizationSavings: 0, // populated by the individual optimizations
    compactionCount: 0,
    truncationCount: 0, // populated by the truncation layer
  }

  for (const msg of messages) {
    metrics.totalTokens += msg.info.tokens.input + msg.info.tokens.output
    metrics.inputTokens += msg.info.tokens.input
    metrics.outputTokens += msg.info.tokens.output

    if (msg.info.mode === "compaction") {
      metrics.compactionCount++
    }
  }

  return metrics
}
```

---
## Configuration

### Environment Variables

```bash
# Token optimization controls
OPENCODE_TOKEN_ESTIMATION=accurate   # accurate (tiktoken) or legacy (4:1)
OPENCODE_TRUNCATION_MODE=smart       # smart or legacy (fixed limits)
OPENCODE_COMPACTION_THRESHOLD=0.7    # trigger at 70% of context
OPENCODE_GREP_LIMIT=50               # default match limit
OPENCODE_WEBSEARCH_CHARS=6000        # default context characters
OPENCODE_FILE_READ_LIMIT=400         # default lines for code files
OPENCODE_OUTPUT_BUDGET_RATIO=0.4     # fraction of the budget reserved for output
OPENCODE_DUPLICATE_DETECTION=true    # enable the dedup cache
```
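These variables would need a small shim that parses them with sane fallbacks; a minimal sketch (variable names from the block above, parsing logic illustrative):

```typescript
// Hypothetical: read optimization settings from the environment with defaults.
function envNumber(name: string, fallback: number): number {
  const raw = process.env[name]
  const parsed = raw === undefined ? NaN : Number(raw)
  return Number.isFinite(parsed) ? parsed : fallback
}

const compactionThreshold = envNumber("OPENCODE_COMPACTION_THRESHOLD", 0.7)
const grepLimit = envNumber("OPENCODE_GREP_LIMIT", 50)
const duplicateDetection = process.env.OPENCODE_DUPLICATE_DETECTION !== "false"
```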
### Per-Model Configuration

```json
{
  "models": {
    "gpt-4o": {
      "context_limit": 128000,
      "output_limit": 16384,
      "token_budget": {
        "code_generation": { "input_ratio": 0.5, "output_ratio": 0.5 },
        "exploration": { "input_ratio": 0.8, "output_ratio": 0.2 }
      }
    },
    "claude-sonnet-4-20250514": {
      "context_limit": 200000,
      "output_limit": 8192,
      "supports_prompt_cache": true
    }
  }
}
```

---
## Migration Guide

### Upgrading from Legacy Token Estimation

```typescript
// Before (4:1 ratio)
const tokens = content.length / 4

// After (tiktoken)
const tokens = Token.estimate(content)
```

### Upgrading from Legacy Truncation

```typescript
// Before (fixed limits)
if (lines.length > 2000 || bytes > 51200) {
  truncate(content)
}

// After (smart truncation)
const result = await Truncate.smart(content, {
  fileType: detectFileType(content),
  maxTokens: 8000,
})
```

---
## Best Practices

1. **Measure First:** Always measure token usage before and after changes (a sketch follows this list)
2. **Roll Out Incrementally:** Deploy optimizations gradually
3. **User Control:** Allow users to override defaults
4. **Monitor Quality:** Track task success rates alongside token savings
5. **Fallback Ready:** Have fallback mechanisms for when optimizations fail
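Measurement can reuse `trackTokenMetrics` from the Monitoring section; a minimal before/after sketch (session handling illustrative):

```typescript
// Hypothetical: compare token usage for the same task with and without an optimization.
async function compareRuns(baselineSession: string, optimizedSession: string) {
  const before = await trackTokenMetrics(baselineSession)
  const after = await trackTokenMetrics(optimizedSession)
  const savings = 1 - after.totalTokens / before.totalTokens
  console.log(`Token savings: ${(savings * 100).toFixed(1)}%`)
}
```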
---

## References

- **Files:** `src/util/token.ts`, `src/tool/truncation.ts`, `src/session/compaction.ts`, `src/session/prompt.ts`, `src/session/message-v2.ts`, `src/tool/grep.ts`, `src/tool/websearch.ts`, `src/tool/read.ts`, `src/session/system.ts`
- **Dependencies:** `tiktoken`, `@dqbd/tiktoken`
- **Related Issues:** Context overflow handling, token tracking, prompt optimization