# Token Efficiency Guide for OpenCode CLI
This guide documents strategies to optimize token usage in the OpenCode CLI without compromising output quality.
## Overview
Token efficiency is critical for:
- Reducing API costs
- Avoiding context window overflows
- Improving response latency
- Enabling longer conversations
OpenCode already has several optimization mechanisms (compaction, truncation, pruning), but there are opportunities to improve further.
---
## Current Token Management
### Existing Mechanisms
| Mechanism | Location | Description |
|-----------|----------|-------------|
| Compaction | `src/session/compaction.ts` | Summarizes conversation history when context is exceeded |
| Truncation | `src/tool/truncation.ts` | Limits tool outputs to 2000 lines / 50KB |
| Pruning | `src/session/compaction.ts:41-90` | Removes old tool outputs beyond 40K tokens |
| Token Estimation | `src/util/token.ts` | Uses 4:1 character-to-token ratio |
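For reference, the legacy estimator amounts to a 4:1 heuristic along these lines (a minimal sketch; see `src/util/token.ts` for the actual implementation):
```typescript
// Sketch of the legacy heuristic in src/util/token.ts: roughly four
// characters per token. Cheap, but can drift noticeably on code-heavy
// or non-English content.
export namespace Token {
  const CHARS_PER_TOKEN = 4

  export function estimate(input: string): number {
    return Math.ceil(input.length / CHARS_PER_TOKEN)
  }
}
```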
### Token Flow
```
User Input → Message Construction → System Prompts → Tool Definitions → Message History → LLM API
```
---
## Recommended Improvements
### 1. Smart Compaction Strategy
**Current Behavior:** Compaction triggers reactively once token usage exceeds the context threshold.
**Improvements:**
- **Predictive Compaction:** Analyze token growth patterns and compact proactively before reaching limits
- **Configurable Thresholds:** Allow compaction at 70-80% of context instead of 100%
- **Task-Aware Triggers:** Compact before expensive operations (file edits, builds)
```typescript
// Example: Predictive compaction logic
async function predictCompactionNeed(
  messages: MessageV2.WithParts[],
  contextLimit: number,
): Promise<boolean> {
  // Average token growth over the last five messages.
  const recentGrowth = messages.slice(-5).reduce((acc, msg) => {
    return acc + Token.estimate(JSON.stringify(msg.parts))
  }, 0)
  const trend = recentGrowth / 5
  // Project three messages ahead; compact early if we would cross 70%.
  const total = messages.reduce((acc, msg) => {
    return acc + Token.estimate(JSON.stringify(msg.parts))
  }, 0)
  return total + trend * 3 > contextLimit * 0.7
}
```
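Task-aware triggers (the third bullet above) could hook tool dispatch; a sketch, assuming hypothetical `shouldCompact` and `Compaction.run` entry points into `src/session/compaction.ts`:
```typescript
// Hypothetical task-aware trigger: compact before expensive operations
// so large edits and builds start with fresh headroom.
const EXPENSIVE_TOOLS = new Set(["edit", "write", "bash"])

async function beforeToolCall(tool: string, sessionID: string): Promise<void> {
  if (!EXPENSIVE_TOOLS.has(tool)) return
  // shouldCompact / Compaction.run are assumed helpers; names are
  // illustrative, not the existing API.
  if (await shouldCompact(sessionID)) {
    await Compaction.run(sessionID)
  }
}
```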
**Files to Modify:**
- `src/session/compaction.ts`
- `src/session/prompt.ts`
---
### 2. Enhanced Token Estimation
**Current Behavior:** Simple 4:1 character ratio estimation.
**Improvements:**
- Use `tiktoken` for accurate OpenAI/Anthropic tokenization
- Add provider-specific token estimators
- Cache token counts to avoid recalculation
```typescript
// src/util/token.ts - Enhanced estimation
import { get_encoding } from "@dqbd/tiktoken"
import type { ModelMessage } from "ai"

// cl100k_base covers GPT-4-era OpenAI models; other providers need
// their own encodings (the per-provider estimators noted above).
const encoder = get_encoding("cl100k_base")

export namespace Token {
  export function estimate(input: string): number {
    return encoder.encode(input).length
  }

  export function estimateMessages(messages: ModelMessage[]): number {
    const perMessageOverhead = 3 // <|start|> role content <|end|>
    const base = messages.length * perMessageOverhead
    const content = messages.reduce((acc, msg) => {
      if (typeof msg.content === "string") {
        return acc + encoder.encode(msg.content).length
      }
      return acc + encoder.encode(JSON.stringify(msg.content)).length
    }, 0)
    return base + content
  }
}
```
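The caching bullet above is not shown in the example; a minimal memoization sketch (the key scheme is illustrative, and a real version should hash the full content):
```typescript
// Illustrative memoization over Token.estimate. The cache key uses
// length plus a prefix, which can collide; a production version should
// use a proper content hash (e.g. Bun.CryptoHasher) instead.
const tokenCache = new Map<string, number>()

export function estimateCached(input: string): number {
  const key = `${input.length}:${input.slice(0, 64)}`
  const cached = tokenCache.get(key)
  if (cached !== undefined) return cached
  const count = Token.estimate(input)
  tokenCache.set(key, count)
  return count
}
```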
**Files to Modify:**
- `src/util/token.ts`
- `src/provider/provider.ts` (add token limits)
---
### 3. Intelligent Tool Output Management
**Current Behavior:** Fixed truncation at 2000 lines / 50KB.
**Improvements:**
- **Content-Aware Truncation:**
- **Code:** Keep function signatures, truncate bodies
- **Logs:** Keep head+tail, truncate middle
- **JSON:** Preserve structure, truncate arrays
- **Errors:** Never truncate
```typescript
// src/tool/truncation.ts - Smart truncation
export async function smartTruncate(
  content: string,
  options: SmartTruncateOptions = {},
): Promise<Truncate.Result> {
  const { fileType = detectFileType(content), maxTokens = 8000 } = options
  switch (fileType) {
    case "code":
      return truncateCode(content, maxTokens)
    case "logs":
      return truncateLogs(content, maxTokens)
    case "json":
      return truncateJSON(content, maxTokens)
    case "error":
      // Errors are never truncated.
      return { content, truncated: false }
    default:
      return genericTruncate(content, maxTokens)
  }
}

function truncateCode(content: string, maxTokens: number): Truncate.Result {
  const lines = content.split("\n")
  const result: string[] = []
  let currentTokens = 0
  const overheadPerLine = 2 // newline plus indentation, roughly

  for (const line of lines) {
    const lineTokens = Token.estimate(line) + overheadPerLine
    // Always keep declarations so the structure survives truncation.
    if (/^(function|class|const|let|var|export|interface|type)\b/.test(line)) {
      result.push(line)
      currentTokens += lineTokens
      continue
    }
    if (currentTokens + lineTokens > maxTokens) {
      // Budget exhausted: note how much was dropped and stop.
      result.push(`  // ${lines.length - result.length} lines truncated...`)
      break
    }
    result.push(line)
    currentTokens += lineTokens
  }
  return formatResult(result, content)
}
```
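The logs case ("keep head+tail") could look like the following sketch, reusing the `Truncate.Result` shape and `Token.estimate` from above:
```typescript
// Sketch of the logs case: keep head and tail, drop the middle.
function truncateLogs(content: string, maxTokens: number): Truncate.Result {
  const lines = content.split("\n")
  const head: string[] = []
  const tail: string[] = []
  let used = 0
  // Fill the head with up to half the budget (setup and early errors).
  for (const line of lines) {
    const t = Token.estimate(line)
    if (used + t > maxTokens / 2) break
    head.push(line)
    used += t
  }
  // Fill the tail with the other half (the most recent output).
  used = 0
  for (let i = lines.length - 1; i >= head.length; i--) {
    const t = Token.estimate(lines[i])
    if (used + t > maxTokens / 2) break
    tail.unshift(lines[i])
    used += t
  }
  const omitted = lines.length - head.length - tail.length
  if (omitted <= 0) return { content, truncated: false }
  return {
    content: [...head, `... ${omitted} lines omitted ...`, ...tail].join("\n"),
    truncated: true,
  }
}
```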
**Files to Modify:**
- `src/tool/truncation.ts`
- Add `src/tool/truncation/code.ts`
- Add `src/tool/truncation/logs.ts`
---
### 4. Message History Optimization
**Current Behavior:** Full message history sent until compaction.
**Improvements:**
- **Importance Scoring:** Prioritize messages by importance
- **Selective History:** Remove low-value messages
- **Ephemeral Messages:** Mark transient context for removal
```typescript
// Message importance scoring
const MESSAGE_IMPORTANCE = {
  user_request: 100,
  file_edit: 90,
  agent_completion: 80,
  tool_success: 60,
  tool_output: 50,
  intermediate_result: 30,
  system_reminder: 20,
}

function scoreMessage(message: MessageV2.WithParts): number {
  let score = 0
  // Role-based scoring
  if (message.info.role === "user") score += MESSAGE_IMPORTANCE.user_request
  if (message.info.role === "assistant") {
    if (message.parts.some((p) => p.type === "tool" && p.tool === "edit")) {
      score += MESSAGE_IMPORTANCE.file_edit
    } else {
      score += MESSAGE_IMPORTANCE.agent_completion
    }
  }
  // Tool call scoring
  for (const part of message.parts) {
    if (part.type === "tool") {
      score += getToolImportanceScore(part.tool)
    }
  }
  return score
}

// Selective history retention
async function getOptimizedHistory(
  sessionID: string,
  maxTokens: number,
): Promise<MessageV2.WithParts[]> {
  const messages = await Session.messages({ sessionID })
  const scored = messages.map((msg) => ({
    message: msg,
    score: scoreMessage(msg),
    tokens: Token.estimate(JSON.stringify(msg.parts)),
  }))
  // The most recent user message is always kept, regardless of budget.
  const lastUser = [...messages].reverse().find((m) => m.info.role === "user")

  scored.sort((a, b) => b.score - a.score)
  const result: MessageV2.WithParts[] = []
  let usedTokens = 0
  for (const item of scored) {
    const mustKeep = item.message === lastUser
    if (!mustKeep && usedTokens + item.tokens > maxTokens) continue
    if (mustKeep || item.score >= 50) {
      result.push(item.message)
      usedTokens += item.tokens
    }
  }
  // Restore chronological order (message IDs sort lexicographically).
  return result.sort((a, b) => a.info.id.localeCompare(b.info.id))
}
```
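Ephemeral messages (the third bullet above) could be tagged at creation and stripped before each request; a sketch with a hypothetical `ephemeral` flag on parts:
```typescript
// Hypothetical "ephemeral" flag on message parts: transient context
// (progress pings, status banners) is dropped before each request.
interface EphemeralFlag {
  ephemeral?: boolean
}

function stripEphemeral(messages: MessageV2.WithParts[]): MessageV2.WithParts[] {
  return messages.map((msg) => ({
    ...msg,
    parts: msg.parts.filter((p) => !(p as typeof p & EphemeralFlag).ephemeral),
  }))
}
```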
**Files to Modify:**
- `src/session/message-v2.ts`
- `src/session/prompt.ts`
---
### 5. System Prompt Compression
**Current Behavior:** Provider-specific prompts loaded from text files.
**Improvements:**
- Audit and compress prompts
- Move optional instructions to first user message
- Create "minimal" mode for quick tasks
```typescript
// src/session/system.ts - Compressed prompts
export namespace SystemPrompt {
  // Core instructions (always sent)
  const CORE_PROMPT = `You are an expert software engineering assistant.`

  // Optional instructions (sent based on context)
  const OPTIONAL_PROMPTS = {
    code_quality: `Focus on clean, maintainable code with proper error handling.`,
    testing: `Always write tests for new functionality.`,
    documentation: `Document complex logic and API surfaces.`,
  }

  export async function getCompressedPrompt(
    model: Provider.Model,
    context: PromptContext,
  ): Promise<string[]> {
    const prompts: string[] = [CORE_PROMPT]
    // Add model-specific base prompt
    prompts.push(getBasePrompt(model))
    // Conditionally add optional prompts
    if (context.needsQualityFocus) {
      prompts.push(OPTIONAL_PROMPTS.code_quality)
    }
    if (context.needsTesting) {
      prompts.push(OPTIONAL_PROMPTS.testing)
    }
    return prompts
  }
}
```
**Files to Modify:**
- `src/session/system.ts`
- `src/session/prompt/*.txt`
---
### 6. Smart Grep Result Limits
**Current Behavior:** Hard limit of 100 matches.
**Improvements:**
- Reduce default to 50 matches
- Add priority scoring based on relevance
- Group matches by file
```typescript
// src/tool/grep.ts - Enhanced result handling
import path from "node:path"

const DEFAULT_MATCH_LIMIT = 50

const PRIORITY_WEIGHTS = {
  recently_modified: 1.5,
  same_directory: 1.3,
  matching_extension: 1.2,
}

interface MatchPriority {
  match: Match
  score: number
}

// GrepContext carries the working directory and target extensions,
// derived from the tool context.
function scoreMatch(match: Match, context: GrepContext): number {
  let score = 1.0
  // Boost files modified within the last week.
  const fileAge = Date.now() - match.modTime
  if (fileAge < 7 * 24 * 60 * 60 * 1000) {
    score *= PRIORITY_WEIGHTS.recently_modified
  }
  // Boost matches under the current working directory.
  if (match.path.startsWith(context.cwd)) {
    score *= PRIORITY_WEIGHTS.same_directory
  }
  // Boost matches with a relevant extension.
  if (context.targetExtensions.includes(path.extname(match.path))) {
    score *= PRIORITY_WEIGHTS.matching_extension
  }
  return score
}

export async function execute(params: GrepParams, ctx: Tool.Context) {
  const results = await ripgrep(params)
  const context = toGrepContext(ctx) // adapt Tool.Context to GrepContext
  const scored: MatchPriority[] = results.map((match) => ({
    match,
    score: scoreMatch(match, context),
  }))
  scored.sort((a, b) => b.score - a.score)
  const limit = params.limit ?? DEFAULT_MATCH_LIMIT
  return formatGroupedOutput(scored.slice(0, limit))
}

function formatGroupedOutput(matches: MatchPriority[]): ToolResult {
  // Group by file so related hits read together.
  const byFile = groupBy(matches, (m) => m.match.path)
  const output: string[] = []
  output.push(`Found ${matches.length} matches across ${byFile.size} files\n`)
  for (const [file, fileMatches] of byFile) {
    output.push(`\n${file}:`)
    for (const { match, score } of fileMatches.slice(0, 10)) {
      const relevance = score > 1.0 ? " [high relevance]" : ""
      output.push(`  Line ${match.lineNum}: ${match.lineText}${relevance}`)
    }
    if (fileMatches.length > 10) {
      output.push(`  ... and ${fileMatches.length - 10} more`)
    }
  }
  return { output: output.join("\n") }
}
```
**Files to Modify:**
- `src/tool/grep.ts`
---
### 7. Web Search Context Optimization
**Current Behavior:** 10,000 character default limit.
**Improvements:**
- Reduce default to 6,000 characters
- Content quality scoring
- Query-relevant extraction
```typescript
// src/tool/websearch.ts - Optimized content extraction
const DEFAULT_CONTEXT_CHARS = 6000

interface ContentQualityScore {
  source: string
  score: number
  relevantSections: string[]
}

function scoreAndExtract(content: string, query: string): ContentQualityScore {
  const paragraphs = content.split(/\n\n+/)
  const queryTerms = query.toLowerCase().split(/\s+/)
  const scored = paragraphs.map((para, index) => {
    const lower = para.toLowerCase()
    const termMatches = queryTerms.filter((term) => lower.includes(term)).length
    const density = termMatches / Math.max(para.length, 1)
    // Earlier paragraphs get a small positional boost.
    const position = index / paragraphs.length
    return {
      para,
      score: termMatches * 2 + density * 1000 + (1 - position) * 0.5,
    }
  })
  scored.sort((a, b) => b.score - a.score)

  const relevant: string[] = []
  let usedChars = 0
  for (const { para } of scored) {
    if (usedChars + para.length > DEFAULT_CONTEXT_CHARS) break
    relevant.push(para)
    usedChars += para.length
  }
  return {
    source: content.substring(0, DEFAULT_CONTEXT_CHARS),
    score: scored[0]?.score ?? 0,
    relevantSections: relevant,
  }
}

export async function execute(params: WebSearchParams, ctx: Tool.Context) {
  const response = await exaSearch({
    query: params.query,
    numResults: params.numResults ?? 8,
    type: params.type ?? "auto",
    livecrawl: params.livecrawl ?? "fallback",
    contextMaxCharacters: DEFAULT_CONTEXT_CHARS,
  })
  const optimized = optimizeResults(response.results, params.query)
  return {
    output: formatOptimizedResults(optimized),
    metadata: {
      originalChars: response.totalChars,
      optimizedChars: optimized.totalChars,
      savings: 1 - optimized.totalChars / response.totalChars,
    },
  }
}
```
**Files to Modify:**
- `src/tool/websearch.ts`
---
### 8. File Read Optimization
**Current Behavior:** Full file content sent unless offset/limit specified.
**Improvements:**
- Default limits based on file type
- Smart offset detection (function boundaries)
```typescript
// src/tool/read.ts - Optimized file reading
import path from "node:path"
import mime from "mime-types"

const FILE_TYPE_CONFIGS: Record<string, FileReadConfig> = {
  ".json": { defaultLimit: Infinity, truncate: false },
  ".md": { defaultLimit: 2000, truncate: true },
  ".ts": { defaultLimit: 400, truncate: true },
  ".js": { defaultLimit: 400, truncate: true },
  ".py": { defaultLimit: 400, truncate: true },
  ".yml": { defaultLimit: 500, truncate: true },
  ".yaml": { defaultLimit: 500, truncate: true },
  ".txt": { defaultLimit: 1000, truncate: true },
  default: { defaultLimit: 300, truncate: true },
}

export async function execute(params: ReadParams, ctx: Tool.Context) {
  const ext = path.extname(params.filePath)
  const config = FILE_TYPE_CONFIGS[ext] ?? FILE_TYPE_CONFIGS.default
  const offset = params.offset ?? 0
  const limit = params.limit ?? config.defaultLimit

  const file = Bun.file(params.filePath)
  const content = await file.text()
  const lines = content.split("\n")

  if (!config.truncate || lines.length <= limit + offset) {
    return { output: content, attachments: [] }
  }

  const displayedLines = lines.slice(offset, offset + limit)
  const lastLine = Math.min(offset + limit, lines.length)
  const output = [
    ...displayedLines,
    "",
    `... ${lines.length - displayedLines.length} lines truncated ...`,
    "",
    `File: ${params.filePath}`,
    `Lines: ${offset + 1}-${lastLine} of ${lines.length}`,
  ].join("\n")

  return {
    output,
    attachments: [
      {
        type: "file",
        filename: params.filePath,
        mime: mime.lookup(params.filePath) || "text/plain",
        url: `data:text/plain;base64,${Buffer.from(content).toString("base64")}`,
      },
    ],
  }
}
```
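Smart offset detection is not shown above; a sketch that snaps a requested offset back to the nearest enclosing declaration (the boundary regex and helper name are illustrative):
```typescript
// Hypothetical helper: walk back from the requested offset to the
// nearest declaration so the window starts at a complete unit.
const BOUNDARY = /^(export\s+)?(async\s+)?(function|class|interface|type|const)\b/

function snapToBoundary(lines: string[], offset: number): number {
  for (let i = Math.min(offset, lines.length - 1); i >= 0; i--) {
    if (BOUNDARY.test(lines[i])) return i
  }
  return offset
}
```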
**Files to Modify:**
- `src/tool/read.ts`
---
### 9. Context Window Budgeting
**Current Behavior:** Fixed 32K output token reservation.
**Improvements:**
- Dynamic budget allocation based on task type
- Model-specific optimizations
```typescript
// src/session/prompt.ts - Dynamic budget allocation
const TASK_BUDGETS: Record<string, TaskBudget> = {
  code_generation: { inputRatio: 0.5, outputRatio: 0.5 },
  exploration: { inputRatio: 0.8, outputRatio: 0.2 },
  qa: { inputRatio: 0.7, outputRatio: 0.3 },
  refactoring: { inputRatio: 0.6, outputRatio: 0.4 },
  debugging: { inputRatio: 0.7, outputRatio: 0.3 },
  default: { inputRatio: 0.6, outputRatio: 0.4 },
}

interface BudgetCalculation {
  inputBudget: number
  outputBudget: number
  totalBudget: number
}

function calculateBudget(
  model: Provider.Model,
  taskType: string,
  estimatedInputTokens: number,
): BudgetCalculation {
  const config = TASK_BUDGETS[taskType] ?? TASK_BUDGETS.default
  const modelContext = model.limit.context
  const modelMaxOutput = model.limit.output
  // Scale the working budget to the task, capped by the model context.
  const baseBudget = Math.min(modelContext, estimatedInputTokens * 2)
  const outputBudget = Math.min(
    modelMaxOutput,
    Math.floor(baseBudget * config.outputRatio),
    SessionPrompt.OUTPUT_TOKEN_MAX,
  )
  const inputBudget = Math.floor(baseBudget * config.inputRatio)
  return {
    inputBudget,
    outputBudget,
    totalBudget: inputBudget + outputBudget,
  }
}

async function checkAndAdjustPrompt(
  messages: ModelMessage[],
  budget: BudgetCalculation,
): Promise<ModelMessage[]> {
  const currentTokens = Token.estimateMessages(messages)
  if (currentTokens <= budget.inputBudget) {
    return messages
  }
  // Over budget: prune, preferring to keep the most recent messages.
  return pruneMessagesToBudget(messages, budget.inputBudget)
}
```
**Files to Modify:**
- `src/session/prompt.ts`
- `src/session/compaction.ts`
---
### 10. Duplicate Detection
**Current Behavior:** No deduplication of content.
**Improvements:**
- Hash and track tool outputs
- Skip identical subsequent calls
- Cache read file contents
```typescript
// src/session/duplicate-detection.ts
const outputHashCache = new Map<string, string>()

function getContentHash(content: string): string {
  // Bun.CryptoHasher provides streaming SHA-256.
  return new Bun.CryptoHasher("sha256").update(content).digest("hex")
}

async function deduplicateToolOutput(
  toolId: string,
  input: Record<string, unknown>,
  content: string,
): Promise<{ isDuplicate: boolean; output: string }> {
  const hash = getContentHash(content)
  const key = `${toolId}:${JSON.stringify(input)}:${hash}`
  if (outputHashCache.has(key)) {
    return { isDuplicate: true, output: outputHashCache.get(key)! }
  }
  outputHashCache.set(key, content)
  return { isDuplicate: false, output: content }
}

// In tool execution
async function executeTool(tool: Tool.Info, args: Record<string, unknown>) {
  const content = await tool.execute(args)
  const { isDuplicate } = await deduplicateToolOutput(tool.id, args, content.output)
  if (isDuplicate) {
    log.debug("Skipping duplicate tool output", { tool: tool.id })
    return {
      ...content,
      output: `[Previous identical output: ${content.output.substring(0, 100)}...]`,
      metadata: { ...content.metadata, duplicate: true },
    }
  }
  return content
}
```
**Files to Modify:**
- Add `src/session/duplicate-detection.ts`
- `src/tool/tool.ts`
---
## Implementation Priority
### Phase 1: High Impact, Low Risk
| Priority | Improvement | Estimated Token Savings | Risk |
|----------|-------------|------------------------|------|
| 1 | Enhanced Token Estimation | 5-15% | Low |
| 2 | Smart Grep Limits | 10-20% | Low |
| 3 | Web Search Optimization | 20-30% | Low |
| 4 | System Prompt Compression | 5-10% | Low |
### Phase 2: Medium Impact, Medium Risk
| Priority | Improvement | Estimated Token Savings | Risk |
|----------|-------------|------------------------|------|
| 5 | Tool Output Management | 15-25% | Medium |
| 6 | Message History Optimization | 20-30% | Medium |
| 7 | File Read Limits | 10-20% | Medium |
### Phase 3: High Impact, Higher Complexity
| Priority | Improvement | Estimated Token Savings | Risk |
|----------|-------------|------------------------|------|
| 8 | Smart Compaction | 25-40% | High |
| 9 | Context Budgeting | 15-25% | High |
| 10 | Duplicate Detection | 10-15% | Medium |
---
## Quality Preservation
### Testing Strategy
1. **A/B Testing:** Compare outputs before/after each optimization
2. **Quality Metrics:** Track success rate, user satisfaction, task completion
3. **Rollback Mechanism:** Config flags to disable optimizations per-session
```typescript
// Config schema for optimization controls
import { z } from "zod"

const OptimizationConfig = z.object({
  smart_compaction: z.boolean().default(true),
  enhanced_estimation: z.boolean().default(true),
  smart_truncation: z.boolean().default(true),
  message_pruning: z.boolean().default(true),
  system_prompt_compression: z.boolean().default(true),
  grep_optimization: z.boolean().default(true),
  websearch_optimization: z.boolean().default(true),
  file_read_limits: z.boolean().default(true),
  context_budgeting: z.boolean().default(true),
  duplicate_detection: z.boolean().default(true),
})

// Usage
const config = await Config.get()
const optimizations = config.optimizations ?? {}
```
### Monitoring
```typescript
// Token efficiency metrics
export async function trackTokenMetrics(sessionID: string) {
  const messages = await Session.messages({ sessionID })
  const metrics = {
    totalTokens: 0,
    inputTokens: 0,
    outputTokens: 0,
    optimizationSavings: 0, // populated by the individual optimizations
    compactionCount: 0,
    truncationCount: 0, // incremented by the truncation layer
  }
  for (const msg of messages) {
    // Per-message token counts are tracked by the session layer.
    metrics.totalTokens += msg.tokens.input + msg.tokens.output
    metrics.inputTokens += msg.tokens.input
    metrics.outputTokens += msg.tokens.output
    if (msg.info.mode === "compaction") {
      metrics.compactionCount++
    }
  }
  return metrics
}
```
---
## Configuration
### Environment Variables
```bash
# Token optimization controls
OPENCODE_TOKEN_ESTIMATION=accurate # accurate (tiktoken) or legacy (4:1)
OPENCODE_TRUNCATION_MODE=smart # smart or legacy (fixed limits)
OPENCODE_COMPACTION_THRESHOLD=0.7 # trigger at 70% of context
OPENCODE_GREP_LIMIT=50 # default match limit
OPENCODE_WEBSEARCH_CHARS=6000 # default context characters
OPENCODE_FILE_READ_LIMIT=400 # default lines for code files
OPENCODE_OUTPUT_BUDGET_RATIO=0.4 # percentage for output
OPENCODE_DUPLICATE_DETECTION=true # enable cache
```
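A sketch of reading these variables at startup with the documented defaults (the `TokenEnv` loader itself is hypothetical):
```typescript
// Hypothetical startup loader for the variables above; names match the
// table, defaults match the documented values.
const TokenEnv = {
  estimation: process.env.OPENCODE_TOKEN_ESTIMATION ?? "accurate",
  truncationMode: process.env.OPENCODE_TRUNCATION_MODE ?? "smart",
  compactionThreshold: Number(process.env.OPENCODE_COMPACTION_THRESHOLD ?? "0.7"),
  grepLimit: Number(process.env.OPENCODE_GREP_LIMIT ?? "50"),
  websearchChars: Number(process.env.OPENCODE_WEBSEARCH_CHARS ?? "6000"),
  fileReadLimit: Number(process.env.OPENCODE_FILE_READ_LIMIT ?? "400"),
  outputBudgetRatio: Number(process.env.OPENCODE_OUTPUT_BUDGET_RATIO ?? "0.4"),
  duplicateDetection: (process.env.OPENCODE_DUPLICATE_DETECTION ?? "true") === "true",
}
```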
### Per-Model Configuration
```json
{
  "models": {
    "gpt-4o": {
      "context_limit": 128000,
      "output_limit": 16384,
      "token_budget": {
        "code_generation": { "input_ratio": 0.5, "output_ratio": 0.5 },
        "exploration": { "input_ratio": 0.8, "output_ratio": 0.2 }
      }
    },
    "claude-sonnet-4-20250514": {
      "context_limit": 200000,
      "output_limit": 8192,
      "supports_prompt_cache": true
    }
  }
}
```
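For validation at load time, a zod schema mirroring the JSON above could look like this (hypothetical, in the style of `OptimizationConfig` earlier):
```typescript
// Hypothetical schema for the per-model config shown above.
import { z } from "zod"

const TokenBudget = z.object({
  input_ratio: z.number().min(0).max(1),
  output_ratio: z.number().min(0).max(1),
})

const ModelConfig = z.object({
  context_limit: z.number().int().positive(),
  output_limit: z.number().int().positive(),
  supports_prompt_cache: z.boolean().optional(),
  token_budget: z.record(TokenBudget).optional(),
})

const ModelsConfig = z.object({ models: z.record(ModelConfig) })
```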
---
## Migration Guide
### Upgrading from Legacy Token Estimation
```typescript
// Before (4:1 ratio)
const tokens = content.length / 4

// After (tiktoken)
const tokens = Token.estimate(content)
```
### Upgrading from Legacy Truncation
```typescript
// Before (fixed limits)
if (lines.length > 2000 || bytes > 51200) {
  truncate(content)
}

// After (smart truncation)
const result = await smartTruncate(content, {
  fileType: detectFileType(content),
  maxTokens: 8000,
})
```
---
## Best Practices
1. **Measure First:** Always measure token usage before and after changes
2. **Incrementally Roll Out:** Deploy optimizations gradually
3. **User Control:** Allow users to override defaults
4. **Monitor Quality:** Track task success rates alongside token savings
5. **Fallback Ready:** Have fallback mechanisms for when optimizations fail
---
## References
- **Files:** `src/util/token.ts`, `src/tool/truncation.ts`, `src/session/compaction.ts`, `src/session/prompt.ts`, `src/session/message-v2.ts`, `src/tool/grep.ts`, `src/tool/websearch.ts`, `src/tool/read.ts`, `src/session/system.ts`
- **Dependencies:** `tiktoken`, `@dqbd/tiktoken`
- **Related Issues:** Context overflow handling, token tracking, prompt optimization