# Token Efficiency Guide for OpenCode CLI

This guide documents strategies to optimize token usage in the OpenCode CLI without compromising output quality.

## Overview

Token efficiency is critical for:

- Reducing API costs
- Avoiding context window overflows
- Improving response latency
- Enabling longer conversations

OpenCode already has several optimization mechanisms (compaction, truncation, pruning), but there are opportunities to improve further.

---

## Current Token Management

### Existing Mechanisms

| Mechanism | Location | Description |
|-----------|----------|-------------|
| Compaction | `src/session/compaction.ts` | Summarizes conversation history when the context window is exceeded |
| Truncation | `src/tool/truncation.ts` | Limits tool outputs to 2000 lines / 50 KB |
| Pruning | `src/session/compaction.ts:41-90` | Removes old tool outputs beyond 40K tokens |
| Token Estimation | `src/util/token.ts` | Uses a 4:1 character-to-token ratio |
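
For reference, the legacy estimator amounts to a character-count heuristic along these lines (a sketch; the exact code in `src/util/token.ts` may differ):

```typescript
// Legacy heuristic: assume roughly 4 characters per token
export function estimate(input: string): number {
  return Math.ceil(input.length / 4)
}
```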

### Token Flow

```
User Input → Message Construction → System Prompts → Tool Definitions → Message History → LLM API
```

---

## Recommended Improvements

### 1. Smart Compaction Strategy

**Current Behavior:** Compaction triggers reactively once the token count exceeds the threshold.

**Improvements:**

- **Predictive Compaction:** Analyze token growth patterns and compact proactively before reaching limits
- **Configurable Thresholds:** Allow compaction at 70-80% of context instead of 100%
- **Task-Aware Triggers:** Compact before expensive operations (file edits, builds)

```typescript
// Example: predictive compaction logic (sketch)
function predictCompactionNeed(
  messages: MessageV2.WithParts[],
  contextLimit: number,
): boolean {
  // Estimate a message's size from its serialized parts (rough approximation)
  const size = (msg: MessageV2.WithParts) => Token.estimate(JSON.stringify(msg.parts))

  // Average token growth over the last five messages
  const recent = messages.slice(-5)
  const trend = recent.reduce((acc, msg) => acc + size(msg), 0) / Math.max(recent.length, 1)

  // Project three messages ahead and compare against 70% of the context window
  const currentTotal = messages.reduce((acc, msg) => acc + size(msg), 0)
  const projectedTotal = currentTotal + trend * 3

  return projectedTotal > contextLimit * 0.7
}
```
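
The check would run before each request is dispatched (and, per the task-aware bullet, before expensive operations). A hypothetical call site, assuming a `Compaction.run` entry point exists:

```typescript
// Before dispatching the next request to the LLM
if (predictCompactionNeed(messages, model.limit.context)) {
  // Summarize older history proactively instead of waiting for an overflow error
  await Compaction.run({ sessionID })
}
```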

**Files to Modify:**

- `src/session/compaction.ts`
- `src/session/prompt.ts`

---

### 2. Enhanced Token Estimation

**Current Behavior:** Simple 4:1 character-to-token ratio estimation.

**Improvements:**

- Use `tiktoken` for accurate OpenAI/Anthropic tokenization
- Add provider-specific token estimators
- Cache token counts to avoid recalculation

```typescript
// src/util/token.ts - Enhanced estimation
import { get_encoding } from "@dqbd/tiktoken"
import type { ModelMessage } from "ai"

const encoder = get_encoding("cl100k_base")

export namespace Token {
  export function estimate(input: string): number {
    return encoder.encode(input).length
  }

  export function estimateMessages(messages: ModelMessage[]): number {
    const perMessageOverhead = 3 // <|start|> role content <|end|>
    const base = messages.length * perMessageOverhead
    const content = messages.reduce((acc, msg) => {
      if (typeof msg.content === "string") {
        return acc + encoder.encode(msg.content).length
      }
      return acc + encoder.encode(JSON.stringify(msg.content)).length
    }, 0)
    return base + content
  }
}
```
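
The caching bullet can wrap the same estimator in a small bounded memo; a sketch (the cache size is an arbitrary choice):

```typescript
// Bounded memoization for repeated estimates of the same content
const cache = new Map<string, number>()
const CACHE_LIMIT = 5_000

export function estimateCached(input: string): number {
  const hit = cache.get(input)
  if (hit !== undefined) return hit

  const tokens = Token.estimate(input)
  if (cache.size >= CACHE_LIMIT) {
    // Drop the oldest entry (Map preserves insertion order)
    cache.delete(cache.keys().next().value!)
  }
  cache.set(input, tokens)
  return tokens
}
```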

**Files to Modify:**

- `src/util/token.ts`
- `src/provider/provider.ts` (add token limits)

---

### 3. Intelligent Tool Output Management

**Current Behavior:** Fixed truncation at 2000 lines / 50 KB.

**Improvements:**

- **Content-Aware Truncation:**
  - **Code:** Keep function signatures, truncate bodies
  - **Logs:** Keep head + tail, truncate middle
  - **JSON:** Preserve structure, truncate arrays
  - **Errors:** Never truncate

```typescript
// src/tool/truncation.ts - Smart truncation
// detectFileType, truncateJSON, genericTruncate, and formatResult are sibling
// helpers (not shown); truncateLogs is sketched further below.
export async function smartTruncate(
  content: string,
  options: SmartTruncateOptions = {},
): Promise<Truncate.Result> {
  const { fileType = detectFileType(content), maxTokens = 8000 } = options

  switch (fileType) {
    case "code":
      return truncateCode(content, maxTokens)
    case "logs":
      return truncateLogs(content, maxTokens)
    case "json":
      return truncateJSON(content, maxTokens)
    case "error":
      return { content, truncated: false }
    default:
      return genericTruncate(content, maxTokens)
  }
}

function truncateCode(content: string, maxTokens: number): Truncate.Result {
  const lines = content.split("\n")
  const result: string[] = []

  let currentTokens = 0
  const overheadPerLine = 2 // rough allowance for the newline and indentation

  for (const line of lines) {
    const lineTokens = Token.estimate(line)
    if (currentTokens + lineTokens + overheadPerLine > maxTokens) {
      break
    }

    // Always include function signatures
    if (line.match(/^(function|class|const|let|var|export|interface|type)/)) {
      result.push(line)
      currentTokens += lineTokens
      continue
    }

    // Once a block body opens, summarize the remaining implementation instead of emitting it
    if (result.length > 0 && result[result.length - 1].match(/^{$/)) {
      result.push(`  // ${lines.length - result.length} lines truncated...`)
      break
    }

    result.push(line)
    currentTokens += lineTokens
  }

  return formatResult(result, content)
}
```
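
For the log case, the `truncateLogs` branch above would keep the head and tail of the output and elide the middle. A minimal sketch (the 2/3-head / 1/3-tail split and the line-budget conversion are arbitrary choices):

```typescript
// Head + tail truncation for log-like output
function truncateLogs(content: string, maxTokens: number): Truncate.Result {
  if (Token.estimate(content) <= maxTokens) {
    return { content, truncated: false }
  }

  const lines = content.split("\n")
  // Convert the token budget back into an approximate line budget
  const avgTokensPerLine = Math.max(Token.estimate(content) / lines.length, 1)
  const lineBudget = Math.max(Math.floor(maxTokens / avgTokensPerLine), 2)

  const headCount = Math.ceil(lineBudget * (2 / 3))
  const tailCount = lineBudget - headCount
  const omitted = lines.length - headCount - tailCount

  const truncated = [
    ...lines.slice(0, headCount),
    `... ${omitted} lines omitted ...`,
    ...lines.slice(lines.length - tailCount),
  ].join("\n")

  return { content: truncated, truncated: true }
}
```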

**Files to Modify:**

- `src/tool/truncation.ts`
- Add `src/tool/truncation/code.ts`
- Add `src/tool/truncation/logs.ts`

---

### 4. Message History Optimization

**Current Behavior:** Full message history sent until compaction.

**Improvements:**

- **Importance Scoring:** Prioritize messages by importance
- **Selective History:** Remove low-value messages
- **Ephemeral Messages:** Mark transient context for removal

```typescript
// Message importance scoring
const MESSAGE_IMPORTANCE = {
  user_request: 100,
  file_edit: 90,
  agent_completion: 80,
  tool_success: 60,
  tool_output: 50,
  intermediate_result: 30,
  system_reminder: 20,
}

function scoreMessage(message: MessageV2.WithParts): number {
  let score = 0

  // Role-based scoring
  if (message.info.role === "user") score += MESSAGE_IMPORTANCE.user_request
  if (message.info.role === "assistant") {
    if (message.parts.some((p) => p.type === "tool" && p.tool === "edit")) {
      score += MESSAGE_IMPORTANCE.file_edit
    } else {
      score += MESSAGE_IMPORTANCE.agent_completion
    }
  }

  // Tool call scoring; getToolImportanceScore maps tool names onto the same scale (not shown)
  for (const part of message.parts) {
    if (part.type === "tool") {
      score += getToolImportanceScore(part.tool)
    }
  }

  return score
}

// Selective history retention
async function getOptimizedHistory(
  sessionID: string,
  maxTokens: number,
): Promise<MessageV2.WithParts[]> {
  const messages = await Session.messages({ sessionID })
  const scored = messages.map((msg) => ({
    message: msg,
    score: scoreMessage(msg),
    tokens: Token.estimate(JSON.stringify(msg)),
  }))

  scored.sort((a, b) => b.score - a.score)

  const result: MessageV2.WithParts[] = []
  let usedTokens = 0

  for (const item of scored) {
    if (usedTokens + item.tokens > maxTokens) break

    // User messages are always retained while budget remains
    if (item.message.info.role === "user") {
      result.push(item.message)
      usedTokens += item.tokens
      continue
    }

    // Keep everything else only if it scores highly enough
    if (item.score >= 50) {
      result.push(item.message)
      usedTokens += item.tokens
    }
  }

  // Restore chronological order (assumes message ids sort by creation time)
  return result.sort((a, b) => (a.info.id < b.info.id ? -1 : 1))
}
```
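
The ephemeral-messages idea could piggyback on the same pass: parts tagged as transient are stripped before the request is built. A minimal sketch, assuming a hypothetical `ephemeral` flag on parts (no such field exists today):

```typescript
// Strip parts marked as transient (status updates, system reminders)
// before the history is serialized for the provider.
function stripEphemeral(messages: MessageV2.WithParts[]): MessageV2.WithParts[] {
  return messages.map((msg) => ({
    ...msg,
    parts: msg.parts.filter((part) => !(part as { ephemeral?: boolean }).ephemeral),
  }))
}
```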

**Files to Modify:**

- `src/session/message-v2.ts`
- `src/session/prompt.ts`

---

### 5. System Prompt Compression

**Current Behavior:** Provider-specific prompts loaded from text files.

**Improvements:**

- Audit and compress prompts
- Move optional instructions to the first user message
- Create a "minimal" mode for quick tasks

```typescript
// src/session/system.ts - Compressed prompts
export namespace SystemPrompt {
  // Core instructions (always sent)
  const CORE_PROMPT = `You are an expert software engineering assistant.`

  // Optional instructions (sent based on context)
  const OPTIONAL_PROMPTS = {
    code_quality: `Focus on clean, maintainable code with proper error handling.`,
    testing: `Always write tests for new functionality.`,
    documentation: `Document complex logic and API surfaces.`,
  }

  export async function getCompressedPrompt(
    model: Provider.Model,
    context: PromptContext,
  ): Promise<string[]> {
    const prompts: string[] = [CORE_PROMPT]

    // Add model-specific base prompt
    const basePrompt = getBasePrompt(model)
    prompts.push(basePrompt)

    // Conditionally add optional prompts
    if (context.needsQualityFocus) {
      prompts.push(OPTIONAL_PROMPTS.code_quality)
    }
    if (context.needsTesting) {
      prompts.push(OPTIONAL_PROMPTS.testing)
    }

    return prompts
  }
}
```
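
The "minimal" mode for quick tasks could short-circuit the same entry point; a sketch inside the same namespace, assuming a `minimal` flag is added to `PromptContext`:

```typescript
// Quick tasks skip every optional instruction and ship only the core prompt
export async function getMinimalPrompt(
  model: Provider.Model,
  context: PromptContext,
): Promise<string[]> {
  if (context.minimal) return [CORE_PROMPT]
  return getCompressedPrompt(model, context)
}
```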

**Files to Modify:**

- `src/session/system.ts`
- `src/session/prompt/*.txt`

---

### 6. Smart Grep Result Limits

**Current Behavior:** Hard limit of 100 matches.

**Improvements:**

- Reduce default to 50 matches
- Add priority scoring based on relevance
- Group matches by file

```typescript
// src/tool/grep.ts - Enhanced result handling
import path from "node:path"

const DEFAULT_MATCH_LIMIT = 50
const PRIORITY_WEIGHTS = {
  recently_modified: 1.5,
  same_directory: 1.3,
  matching_extension: 1.2,
  exact_match: 1.1,
}

interface MatchPriority {
  match: Match
  score: number
}

interface GrepContext {
  cwd: string
  targetExtensions: string[]
}

function scoreMatch(match: Match, context: GrepContext): number {
  let score = 1.0

  // Recently modified files
  const fileAge = Date.now() - match.modTime
  if (fileAge < 7 * 24 * 60 * 60 * 1000) {
    score *= PRIORITY_WEIGHTS.recently_modified
  }

  // Same directory as current work
  if (match.path.startsWith(context.cwd)) {
    score *= PRIORITY_WEIGHTS.same_directory
  }

  // Matching extension
  if (context.targetExtensions.includes(path.extname(match.path))) {
    score *= PRIORITY_WEIGHTS.matching_extension
  }

  return score
}

export async function execute(params: GrepParams, ctx: Tool.Context) {
  const results = await ripgrep(params)

  // Derive the scoring context from the tool invocation (field names are illustrative)
  const context: GrepContext = {
    cwd: ctx.cwd,
    targetExtensions: params.include ? [path.extname(params.include)] : [],
  }

  const scored: MatchPriority[] = results.map((match) => ({
    match,
    score: scoreMatch(match, context),
  }))

  scored.sort((a, b) => b.score - a.score)

  const limit = params.limit ?? DEFAULT_MATCH_LIMIT
  const topMatches = scored.slice(0, limit)

  return formatGroupedOutput(topMatches)
}

function formatGroupedOutput(matches: MatchPriority[]): ToolResult {
  // groupBy returns a Map keyed by the callback result (helper not shown)
  const byFile = groupBy(matches, (m) => m.match.path)

  const output: string[] = []
  output.push(`Found ${matches.length} matches across ${byFile.size} files\n`)

  for (const [file, fileMatches] of byFile) {
    output.push(`\n${file}:`)
    for (const { match, score } of fileMatches.slice(0, 10)) {
      const relevance = score > 1.0 ? " [high relevance]" : ""
      output.push(`  Line ${match.lineNum}: ${match.lineText}${relevance}`)
    }
    if (fileMatches.length > 10) {
      output.push(`  ... and ${fileMatches.length - 10} more`)
    }
  }

  return { output: output.join("\n") }
}
```

**Files to Modify:**

- `src/tool/grep.ts`

---

### 7. Web Search Context Optimization

**Current Behavior:** 10,000-character default limit.

**Improvements:**

- Reduce default to 6,000 characters
- Content quality scoring
- Query-relevant extraction

```typescript
// src/tool/websearch.ts - Optimized content extraction
const DEFAULT_CONTEXT_CHARS = 6000

interface ContentQualityScore {
  source: string
  score: number
  relevantSections: string[]
}

function scoreAndExtract(
  content: string,
  query: string,
): ContentQualityScore {
  const paragraphs = content.split(/\n\n+/)
  const queryTerms = query.toLowerCase().split(/\s+/)

  const scored = paragraphs.map((para, index) => {
    const lower = para.toLowerCase()
    const termMatches = queryTerms.filter((term) => lower.includes(term)).length
    const density = termMatches / Math.max(para.length, 1)
    const position = index / paragraphs.length

    return {
      para,
      score: termMatches * 2 + density * 1000 + (1 - position) * 0.5,
    }
  })

  scored.sort((a, b) => b.score - a.score)

  const relevant: string[] = []
  let usedChars = 0

  for (const { para } of scored) {
    if (usedChars + para.length > DEFAULT_CONTEXT_CHARS) break
    relevant.push(para)
    usedChars += para.length
  }

  return {
    source: content.substring(0, DEFAULT_CONTEXT_CHARS),
    score: scored[0]?.score ?? 0,
    relevantSections: relevant,
  }
}

// exaSearch, optimizeResults, and formatOptimizedResults are existing/sibling helpers (not shown)
export async function execute(params: WebSearchParams, ctx: Tool.Context) {
  const response = await exaSearch({
    query: params.query,
    numResults: params.numResults ?? 8,
    type: params.type ?? "auto",
    livecrawl: params.livecrawl ?? "fallback",
    contextMaxCharacters: DEFAULT_CONTEXT_CHARS,
  })

  const optimized = optimizeResults(response.results, params.query)

  return {
    output: formatOptimizedResults(optimized),
    metadata: {
      originalChars: response.totalChars,
      optimizedChars: optimized.totalChars,
      savings: 1 - optimized.totalChars / response.totalChars,
    },
  }
}
```

**Files to Modify:**

- `src/tool/websearch.ts`

---

### 8. File Read Optimization

**Current Behavior:** Full file content sent unless offset/limit specified.

**Improvements:**

- Default limits based on file type
- Smart offset detection (function boundaries)

```typescript
// src/tool/read.ts - Optimized file reading
import path from "node:path"
import mime from "mime-types"

interface FileReadConfig {
  defaultLimit: number
  truncate: boolean
}

const FILE_TYPE_CONFIGS: Record<string, FileReadConfig> = {
  ".json": { defaultLimit: Infinity, truncate: false },
  ".md": { defaultLimit: 2000, truncate: true },
  ".ts": { defaultLimit: 400, truncate: true },
  ".js": { defaultLimit: 400, truncate: true },
  ".py": { defaultLimit: 400, truncate: true },
  ".yml": { defaultLimit: 500, truncate: true },
  ".yaml": { defaultLimit: 500, truncate: true },
  ".txt": { defaultLimit: 1000, truncate: true },
  default: { defaultLimit: 300, truncate: true },
}

export async function execute(params: ReadParams, ctx: Tool.Context) {
  const ext = path.extname(params.filePath)
  const config = FILE_TYPE_CONFIGS[ext] ?? FILE_TYPE_CONFIGS.default

  const offset = params.offset ?? 0
  const limit = params.limit ?? config.defaultLimit

  const file = Bun.file(params.filePath)
  const content = await file.text()
  const lines = content.split("\n")

  if (!config.truncate || lines.length <= limit + offset) {
    return {
      output: content,
      attachments: [],
    }
  }

  const displayedLines = lines.slice(offset, offset + limit)
  const output = [
    ...displayedLines,
    "",
    `... ${lines.length - displayedLines.length} lines truncated ...`,
    "",
    `File: ${params.filePath}`,
    `Lines: ${offset + 1}-${offset + displayedLines.length} of ${lines.length}`,
  ].join("\n")

  return {
    output,
    attachments: [{
      type: "file",
      filename: params.filePath,
      mime: mime.lookup(params.filePath) || "text/plain",
      url: `data:text/plain;base64,${Buffer.from(content).toString("base64")}`,
    }],
  }
}
```
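
The smart-offset bullet could snap a requested offset back to the nearest declaration so a windowed read starts at a function boundary rather than mid-body; a rough sketch (the declaration regex mirrors the one used by `truncateCode`, and the lookback window is arbitrary):

```typescript
// Move a requested offset back to the closest preceding declaration line
function snapToDeclaration(lines: string[], offset: number, maxLookback = 50): number {
  const declaration = /^(export\s+)?(async\s+)?(function|class|const|interface|type)\b/
  const start = Math.min(offset, lines.length - 1)
  for (let i = start; i >= Math.max(start - maxLookback, 0); i--) {
    if (declaration.test(lines[i])) return i
  }
  return offset
}
```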

**Files to Modify:**

- `src/tool/read.ts`

---

### 9. Context Window Budgeting

**Current Behavior:** Fixed 32K output token reservation.

**Improvements:**

- Dynamic budget allocation based on task type
- Model-specific optimizations

```typescript
// src/session/prompt.ts - Dynamic budget allocation
interface TaskBudget {
  inputRatio: number
  outputRatio: number
}

const TASK_BUDGETS: Record<string, TaskBudget> = {
  code_generation: { inputRatio: 0.5, outputRatio: 0.5 },
  exploration: { inputRatio: 0.8, outputRatio: 0.2 },
  qa: { inputRatio: 0.7, outputRatio: 0.3 },
  refactoring: { inputRatio: 0.6, outputRatio: 0.4 },
  debugging: { inputRatio: 0.7, outputRatio: 0.3 },
  default: { inputRatio: 0.6, outputRatio: 0.4 },
}

interface BudgetCalculation {
  inputBudget: number
  outputBudget: number
  totalBudget: number
}

function calculateBudget(
  model: Provider.Model,
  taskType: string,
  estimatedInputTokens: number,
): BudgetCalculation {
  const config = TASK_BUDGETS[taskType] ?? TASK_BUDGETS.default
  const modelContext = model.limit.context
  const modelMaxOutput = model.limit.output

  // Dynamic budget based on task type
  const baseBudget = Math.min(modelContext, estimatedInputTokens * 2)
  const outputBudget = Math.min(
    modelMaxOutput,
    Math.floor(baseBudget * config.outputRatio),
    SessionPrompt.OUTPUT_TOKEN_MAX,
  )
  const inputBudget = Math.floor(baseBudget * config.inputRatio)

  return {
    inputBudget,
    outputBudget,
    totalBudget: inputBudget + outputBudget,
  }
}

async function checkAndAdjustPrompt(
  messages: ModelMessage[],
  budget: BudgetCalculation,
): Promise<ModelMessage[]> {
  const currentTokens = Token.estimateMessages(messages)

  if (currentTokens <= budget.inputBudget) {
    return messages
  }

  // Need to reduce - prioritize recent messages
  // pruneMessagesToBudget drops or summarizes older messages until the budget fits (not shown)
  const result = await pruneMessagesToBudget(messages, budget.inputBudget)
  return result
}
```

**Files to Modify:**

- `src/session/prompt.ts`
- `src/session/compaction.ts`

---

### 10. Duplicate Detection

**Current Behavior:** No deduplication of content.

**Improvements:**

- Hash and track tool outputs
- Skip identical subsequent calls
- Cache read file contents

```typescript
// src/session/duplicate-detection.ts
// Note: this cache is unbounded; a real implementation would cap or expire entries.
const outputHashCache = new Map<string, string>()

function getContentHash(content: string): string {
  const hasher = new Bun.CryptoHasher("sha256")
  hasher.update(content)
  return hasher.digest("hex")
}

async function deduplicateToolOutput(
  toolId: string,
  input: Record<string, unknown>,
  content: string,
): Promise<{ isDuplicate: boolean; output: string }> {
  const hash = getContentHash(content)
  const key = `${toolId}:${JSON.stringify(input)}:${hash}`

  if (outputHashCache.has(key)) {
    return {
      isDuplicate: true,
      output: outputHashCache.get(key)!,
    }
  }

  outputHashCache.set(key, content)
  return { isDuplicate: false, output: content }
}

// In tool execution
async function executeTool(tool: Tool.Info, args: Record<string, unknown>) {
  const content = await tool.execute(args)

  const { isDuplicate } = await deduplicateToolOutput(
    tool.id,
    args,
    content.output,
  )

  if (isDuplicate) {
    log.debug("Skipping duplicate tool output", { tool: tool.id })
    return {
      ...content,
      output: `[Previous identical output: ${content.output.substring(0, 100)}...]`,
      metadata: { ...content.metadata, duplicate: true },
    }
  }

  return content
}
```
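
The read-cache bullet could reuse the same idea keyed by path and modification time, so unchanged files are not re-read and re-sent in full; a sketch (invalidation is by mtime only):

```typescript
// Cache file contents keyed by path + modification time; a changed mtime misses the cache
const fileCache = new Map<string, string>()

export async function readCached(filePath: string): Promise<string> {
  const file = Bun.file(filePath)
  const key = `${filePath}:${file.lastModified}`

  const cached = fileCache.get(key)
  if (cached !== undefined) return cached

  const content = await file.text()
  fileCache.set(key, content)
  return content
}
```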

**Files to Modify:**

- Add `src/session/duplicate-detection.ts`
- `src/tool/tool.ts`

---

## Implementation Priority

### Phase 1: High Impact, Low Risk

| Priority | Improvement | Estimated Token Savings | Risk |
|----------|-------------|-------------------------|------|
| 1 | Enhanced Token Estimation | 5-15% | Low |
| 2 | Smart Grep Limits | 10-20% | Low |
| 3 | Web Search Optimization | 20-30% | Low |
| 4 | System Prompt Compression | 5-10% | Low |

### Phase 2: Medium Impact, Medium Risk

| Priority | Improvement | Estimated Token Savings | Risk |
|----------|-------------|-------------------------|------|
| 5 | Tool Output Management | 15-25% | Medium |
| 6 | Message History Optimization | 20-30% | Medium |
| 7 | File Read Limits | 10-20% | Medium |

### Phase 3: High Impact, Higher Complexity

| Priority | Improvement | Estimated Token Savings | Risk |
|----------|-------------|-------------------------|------|
| 8 | Smart Compaction | 25-40% | High |
| 9 | Context Budgeting | 15-25% | High |
| 10 | Duplicate Detection | 10-15% | Medium |

---

## Quality Preservation

### Testing Strategy

1. **A/B Testing:** Compare outputs before/after each optimization
2. **Quality Metrics:** Track success rate, user satisfaction, task completion
3. **Rollback Mechanism:** Config flags to disable optimizations per-session

```typescript
// Config schema for optimization controls
import { z } from "zod"

const OptimizationConfig = z.object({
  smart_compaction: z.boolean().default(true),
  enhanced_estimation: z.boolean().default(true),
  smart_truncation: z.boolean().default(true),
  message_pruning: z.boolean().default(true),
  system_prompt_compression: z.boolean().default(true),
  grep_optimization: z.boolean().default(true),
  websearch_optimization: z.boolean().default(true),
  file_read_limits: z.boolean().default(true),
  context_budgeting: z.boolean().default(true),
  duplicate_detection: z.boolean().default(true),
})

// Usage
const config = await Config.get()
const optimizations = config.optimizations ?? {}
```

### Monitoring

```typescript
// Token efficiency metrics
// Field paths on the stored message record (info.tokens, info.mode) are illustrative.
export async function trackTokenMetrics(sessionID: string) {
  const messages = await Session.messages({ sessionID })

  const metrics = {
    totalTokens: 0,
    inputTokens: 0,
    outputTokens: 0,
    optimizationSavings: 0,
    compactionCount: 0,
    truncationCount: 0,
  }

  for (const msg of messages) {
    const usage = msg.info.tokens ?? { input: 0, output: 0 }
    metrics.totalTokens += usage.input + usage.output
    metrics.inputTokens += usage.input
    metrics.outputTokens += usage.output

    if (msg.info.mode === "compaction") {
      metrics.compactionCount++
    }
  }

  return metrics
}
```

---

## Configuration

### Environment Variables

```bash
# Token optimization controls
OPENCODE_TOKEN_ESTIMATION=accurate   # accurate (tiktoken) or legacy (4:1)
OPENCODE_TRUNCATION_MODE=smart       # smart or legacy (fixed limits)
OPENCODE_COMPACTION_THRESHOLD=0.7    # trigger at 70% of context
OPENCODE_GREP_LIMIT=50               # default match limit
OPENCODE_WEBSEARCH_CHARS=6000        # default context characters
OPENCODE_FILE_READ_LIMIT=400         # default lines for code files
OPENCODE_OUTPUT_BUDGET_RATIO=0.4     # share of the budget reserved for output
OPENCODE_DUPLICATE_DETECTION=true    # enable cache
```
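
These variables are proposals, not existing flags; reading them could look like the following sketch, with the documented values as defaults:

```typescript
// Proposed env overrides with the defaults documented above
const env = process.env

export const TokenConfig = {
  estimation: env.OPENCODE_TOKEN_ESTIMATION ?? "accurate",
  truncationMode: env.OPENCODE_TRUNCATION_MODE ?? "smart",
  compactionThreshold: Number(env.OPENCODE_COMPACTION_THRESHOLD ?? "0.7"),
  grepLimit: Number(env.OPENCODE_GREP_LIMIT ?? "50"),
  websearchChars: Number(env.OPENCODE_WEBSEARCH_CHARS ?? "6000"),
  fileReadLimit: Number(env.OPENCODE_FILE_READ_LIMIT ?? "400"),
  outputBudgetRatio: Number(env.OPENCODE_OUTPUT_BUDGET_RATIO ?? "0.4"),
  duplicateDetection: env.OPENCODE_DUPLICATE_DETECTION !== "false",
}
```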

### Per-Model Configuration

```json
{
  "models": {
    "gpt-4o": {
      "context_limit": 128000,
      "output_limit": 16384,
      "token_budget": {
        "code_generation": { "input_ratio": 0.5, "output_ratio": 0.5 },
        "exploration": { "input_ratio": 0.8, "output_ratio": 0.2 }
      }
    },
    "claude-sonnet-4-20250514": {
      "context_limit": 200000,
      "output_limit": 8192,
      "supports_prompt_cache": true
    }
  }
}
```

---

## Migration Guide

### Upgrading from Legacy Token Estimation

```typescript
// Before (4:1 ratio)
const tokens = content.length / 4

// After (tiktoken)
const tokens = Token.estimate(content)
```

### Upgrading from Legacy Truncation

```typescript
// Before (fixed limits)
if (lines.length > 2000 || bytes > 51200) {
  truncate(content)
}

// After (smart truncation, see Section 3)
const result = await smartTruncate(content, {
  fileType: detectFileType(content),
  maxTokens: 8000,
})
```

---

## Best Practices

1. **Measure First:** Always measure token usage before and after changes
2. **Incremental Rollout:** Deploy optimizations gradually
3. **User Control:** Allow users to override defaults
4. **Monitor Quality:** Track task success rates alongside token savings
5. **Fallback Ready:** Have fallback mechanisms for when optimizations fail

---

## References

- **Files:** `src/util/token.ts`, `src/tool/truncation.ts`, `src/session/compaction.ts`, `src/session/prompt.ts`, `src/session/message-v2.ts`, `src/tool/grep.ts`, `src/tool/websearch.ts`, `src/tool/read.ts`, `src/session/system.ts`
- **Dependencies:** `tiktoken`, `@dqbd/tiktoken`
- **Related Issues:** Context overflow handling, token tracking, prompt optimization