feat: Add support for multiple AI providers (bytez, llm7.io, aimlapi.com, routeway.ai, g4f.dev) and fix Chutes loader
- Add custom loaders for bytez, llm7, aimlapi, routeway, and g4f providers
- Add provider definitions to models-api.json with sample models
- Add provider icon names to types.ts
- Chutes loader already exists and should work with the CHUTES_API_KEY env var

Providers added:

- bytez: Uses BYTEZ_API_KEY, OpenAI-compatible
- llm7: Uses LLM7_API_KEY (optional), OpenAI-compatible
- aimlapi: Uses AIMLAPI_API_KEY, OpenAI-compatible
- routeway: Uses ROUTEWAY_API_KEY, OpenAI-compatible
- g4f: Uses G4F_API_KEY (optional), free tier available
New file: opencode/TOKEN_EFFICIENCY_GUIDE.md (879 lines added)
# Token Efficiency Guide for OpenCode CLI

This guide documents strategies to optimize token usage in the OpenCode CLI without compromising output quality.

## Overview

Token efficiency is critical for:

- Reducing API costs
- Avoiding context window overflows
- Improving response latency
- Enabling longer conversations

OpenCode already has several optimization mechanisms (compaction, truncation, pruning), but there are opportunities to improve further.

---
## Current Token Management

### Existing Mechanisms

| Mechanism | Location | Description |
|-----------|----------|-------------|
| Compaction | `src/session/compaction.ts` | Summarizes conversation history when context is exceeded |
| Truncation | `src/tool/truncation.ts` | Limits tool outputs to 2000 lines / 50KB |
| Pruning | `src/session/compaction.ts:41-90` | Removes old tool outputs beyond 40K tokens |
| Token Estimation | `src/util/token.ts` | Uses a 4:1 character-to-token ratio |
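For comparison, the legacy 4:1 heuristic amounts to roughly the following (a minimal sketch; the exact code in `src/util/token.ts` may differ):

```typescript
// Sketch of the legacy heuristic: roughly 4 characters per token.
export namespace Token {
  export function estimate(input: string): number {
    return Math.ceil(input.length / 4)
  }
}
```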
### Token Flow

```
User Input → Message Construction → System Prompts → Tool Definitions → Message History → LLM API
```

---
## Recommended Improvements

### 1. Smart Compaction Strategy

**Current Behavior:** Compaction triggers reactively when tokens exceed the threshold.

**Improvements:**

- **Predictive Compaction:** Analyze token growth patterns and compact proactively before reaching limits
- **Configurable Thresholds:** Allow compaction at 70-80% of context instead of 100%
- **Task-Aware Triggers:** Compact before expensive operations (file edits, builds); a sketch follows the example below

```typescript
// Example: Predictive compaction logic (sketch; assumes a string-based Token.estimate)
const messageTokens = (msg: MessageV2.WithParts) => Token.estimate(JSON.stringify(msg.parts))

async function predictCompactionNeed(
  messages: MessageV2.WithParts[],
  contextLimit: number,
): Promise<boolean> {
  // Average token growth across the five most recent messages
  const recentGrowth = messages.slice(-5).reduce((acc, msg) => acc + messageTokens(msg), 0)
  const trend = recentGrowth / 5

  // Project three messages ahead and compare against 70% of the context window
  const currentTotal = messages.reduce((acc, msg) => acc + messageTokens(msg), 0)
  const projectedTotal = currentTotal + trend * 3
  return projectedTotal > contextLimit * 0.7
}
```
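The task-aware trigger can then be a thin wrapper that runs compaction ahead of known-expensive operations. A minimal sketch, assuming a `Compaction.run` entry point (the name is illustrative; the real entry point lives in `src/session/compaction.ts`):

```typescript
// Hypothetical wrapper: compact proactively before an expensive operation.
async function beforeExpensiveOperation(sessionID: string, contextLimit: number) {
  const messages = await Session.messages({ sessionID })
  if (await predictCompactionNeed(messages, contextLimit)) {
    await Compaction.run(sessionID) // illustrative entry point
  }
}
```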
**Files to Modify:**

- `src/session/compaction.ts`
- `src/session/prompt.ts`

---
### 2. Enhanced Token Estimation

**Current Behavior:** Simple 4:1 character-ratio estimation.

**Improvements:**

- Use `tiktoken` for accurate OpenAI/Anthropic tokenization
- Add provider-specific token estimators
- Cache token counts to avoid recalculation (a sketch follows the example below)

```typescript
// src/util/token.ts - Enhanced estimation
import { get_encoding } from "@dqbd/tiktoken"

const encoder = get_encoding("cl100k_base")

export namespace Token {
  export function estimate(input: string): number {
    return encoder.encode(input).length
  }

  export function estimateMessages(messages: ModelMessage[]): number {
    const perMessageOverhead = 3 // <|start|> role content <|end|>
    const base = messages.length * perMessageOverhead
    const content = messages.reduce((acc, msg) => {
      if (typeof msg.content === "string") {
        return acc + encoder.encode(msg.content).length
      }
      return acc + encoder.encode(JSON.stringify(msg.content)).length
    }, 0)
    return base + content
  }
}
```
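Caching can sit on top of the estimator; a minimal memoization sketch (the cache bound and eviction policy are illustrative):

```typescript
// Hypothetical memoized wrapper to avoid re-encoding unchanged strings.
const tokenCountCache = new Map<string, number>()

export function estimateCached(input: string): number {
  const cached = tokenCountCache.get(input)
  if (cached !== undefined) return cached
  const count = Token.estimate(input)
  // Naive size bound; a real implementation would use LRU eviction
  if (tokenCountCache.size > 10_000) tokenCountCache.clear()
  tokenCountCache.set(input, count)
  return count
}
```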
**Files to Modify:**

- `src/util/token.ts`
- `src/provider/provider.ts` (add token limits)

---
### 3. Intelligent Tool Output Management

**Current Behavior:** Fixed truncation at 2000 lines / 50KB.

**Improvements:**

- **Content-Aware Truncation:**
  - **Code:** Keep function signatures, truncate bodies
  - **Logs:** Keep head+tail, truncate middle (a sketch follows the example below)
  - **JSON:** Preserve structure, truncate arrays
  - **Errors:** Never truncate

```typescript
// src/tool/truncation.ts - Smart truncation
export async function smartTruncate(
  content: string,
  options: SmartTruncateOptions = {}
): Promise<Truncate.Result> {
  const { fileType = detectFileType(content), maxTokens = 8000 } = options

  switch (fileType) {
    case "code":
      return truncateCode(content, maxTokens)
    case "logs":
      return truncateLogs(content, maxTokens)
    case "json":
      return truncateJSON(content, maxTokens)
    case "error":
      return { content, truncated: false }
    default:
      return genericTruncate(content, maxTokens)
  }
}

function truncateCode(content: string, maxTokens: number): Truncate.Result {
  const lines = content.split("\n")
  const result: string[] = []

  let currentTokens = 0
  const overheadPerLine = 2 // small fixed cost per line for newlines/indentation

  for (const line of lines) {
    const lineTokens = Token.estimate(line)
    if (currentTokens + lineTokens + overheadPerLine > maxTokens) {
      break
    }

    // Always include function signatures
    if (line.match(/^(function|class|const|let|var|export|interface|type)/)) {
      result.push(line)
      currentTokens += lineTokens
      continue
    }

    // Once an opening brace sits on its own line, elide the implementation body
    if (result.length > 0 && result[result.length - 1].match(/^{$/)) {
      result.push(`  // ${lines.length - result.length} lines truncated...`)
      break
    }

    result.push(line)
    currentTokens += lineTokens
  }

  // formatResult is a helper that assembles a Truncate.Result from the kept lines
  return formatResult(result, content)
}
```
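The log branch referenced above could keep the head and tail and drop the middle; a minimal sketch of `truncateLogs`, using the same `Truncate.Result`-style shape assumed throughout this section:

```typescript
// Hypothetical head+tail truncation for log output.
function truncateLogs(content: string, maxTokens: number): Truncate.Result {
  const lines = content.split("\n")
  // Rough line budget derived from the token budget (heuristic)
  const budgetLines = Math.max(20, Math.floor(maxTokens / 10))
  if (lines.length <= budgetLines) return { content, truncated: false }

  const head = lines.slice(0, Math.floor(budgetLines / 2))
  const tail = lines.slice(-Math.floor(budgetLines / 2))
  const dropped = lines.length - head.length - tail.length
  return {
    content: [...head, `... ${dropped} lines truncated ...`, ...tail].join("\n"),
    truncated: true,
  }
}
```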
**Files to Modify:**

- `src/tool/truncation.ts`
- Add `src/tool/truncation/code.ts`
- Add `src/tool/truncation/logs.ts`

---
### 4. Message History Optimization

**Current Behavior:** The full message history is sent until compaction.

**Improvements:**

- **Importance Scoring:** Prioritize messages by importance
- **Selective History:** Remove low-value messages
- **Ephemeral Messages:** Mark transient context for removal (a sketch follows the example below)

```typescript
// Message importance scoring
const MESSAGE_IMPORTANCE = {
  user_request: 100,
  file_edit: 90,
  agent_completion: 80,
  tool_success: 60,
  tool_output: 50,
  intermediate_result: 30,
  system_reminder: 20,
}

function scoreMessage(message: MessageV2.WithParts): number {
  let score = 0

  // Role-based scoring
  if (message.info.role === "user") score += MESSAGE_IMPORTANCE.user_request
  if (message.info.role === "assistant") {
    if (message.parts.some(p => p.type === "tool" && p.tool === "edit")) {
      score += MESSAGE_IMPORTANCE.file_edit
    } else {
      score += MESSAGE_IMPORTANCE.agent_completion
    }
  }

  // Tool call scoring
  for (const part of message.parts) {
    if (part.type === "tool") {
      score += getToolImportanceScore(part.tool)
    }
  }

  return score
}

// Selective history retention
async function getOptimizedHistory(
  sessionID: string,
  maxTokens: number
): Promise<MessageV2.WithParts[]> {
  const messages = await Session.messages({ sessionID })
  const scored = messages.map(msg => ({
    message: msg,
    score: scoreMessage(msg),
    tokens: Token.estimate(JSON.stringify(msg)),
  }))

  scored.sort((a, b) => b.score - a.score)

  const result: MessageV2.WithParts[] = []
  let usedTokens = 0

  for (const item of scored) {
    if (usedTokens + item.tokens > maxTokens) break

    // Always keep user messages newer than anything already kept
    if (item.message.info.role === "user" &&
        result.length > 0 &&
        result[result.length - 1].info.id < item.message.info.id) {
      result.push(item.message)
      usedTokens += item.tokens
      continue
    }

    // Keep if high importance score
    if (item.score >= 50) {
      result.push(item.message)
      usedTokens += item.tokens
    }
  }

  // Restore chronological order (ids are assumed to sort by creation time)
  return result.sort((a, b) => (a.info.id < b.info.id ? -1 : 1))
}
```
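Ephemeral marking could be as simple as a flag that the history builder strips once a message is no longer recent; a minimal sketch (the `ephemeral` field is hypothetical):

```typescript
// Hypothetical: drop messages explicitly marked as transient context.
type MaybeEphemeral = MessageV2.WithParts & { ephemeral?: boolean }

function stripEphemeral(messages: MaybeEphemeral[], keepLast: number = 2): MaybeEphemeral[] {
  // Keep ephemeral messages only while they are still recent
  const cutoff = messages.length - keepLast
  return messages.filter((msg, i) => !msg.ephemeral || i >= cutoff)
}
```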
**Files to Modify:**

- `src/session/message-v2.ts`
- `src/session/prompt.ts`

---
### 5. System Prompt Compression

**Current Behavior:** Provider-specific prompts are loaded from text files.

**Improvements:**

- Audit and compress prompts
- Move optional instructions to the first user message
- Create a "minimal" mode for quick tasks

```typescript
// src/session/system.ts - Compressed prompts
export namespace SystemPrompt {
  // Core instructions (always sent)
  const CORE_PROMPT = `You are an expert software engineering assistant.`

  // Optional instructions (sent based on context)
  const OPTIONAL_PROMPTS = {
    code_quality: `Focus on clean, maintainable code with proper error handling.`,
    testing: `Always write tests for new functionality.`,
    documentation: `Document complex logic and API surfaces.`,
  }

  export async function getCompressedPrompt(
    model: Provider.Model,
    context: PromptContext
  ): Promise<string[]> {
    const prompts: string[] = [CORE_PROMPT]

    // Add the model-specific base prompt (helper resolving the per-provider text file)
    const basePrompt = getBasePrompt(model)
    prompts.push(basePrompt)

    // Conditionally add optional prompts
    if (context.needsQualityFocus) {
      prompts.push(OPTIONAL_PROMPTS.code_quality)
    }
    if (context.needsTesting) {
      prompts.push(OPTIONAL_PROMPTS.testing)
    }

    return prompts
  }
}
```

**Files to Modify:**

- `src/session/system.ts`
- `src/session/prompt/*.txt`

---
### 6. Smart Grep Result Limits

**Current Behavior:** Hard limit of 100 matches.

**Improvements:**

- Reduce the default to 50 matches
- Add priority scoring based on relevance
- Group matches by file

```typescript
// src/tool/grep.ts - Enhanced result handling
const DEFAULT_MATCH_LIMIT = 50
const PRIORITY_WEIGHTS = {
  recently_modified: 1.5,
  same_directory: 1.3,
  matching_extension: 1.2,
  exact_match: 1.1,
}

interface MatchPriority {
  match: Match
  score: number
}

function scoreMatch(match: Match, context: GrepContext): number {
  let score = 1.0

  // Recently modified files (within the last week)
  const fileAge = Date.now() - match.modTime
  if (fileAge < 7 * 24 * 60 * 60 * 1000) {
    score *= PRIORITY_WEIGHTS.recently_modified
  }

  // Same directory as current work
  if (match.path.startsWith(context.cwd)) {
    score *= PRIORITY_WEIGHTS.same_directory
  }

  // Matching extension
  if (context.targetExtensions.includes(path.extname(match.path))) {
    score *= PRIORITY_WEIGHTS.matching_extension
  }

  return score
}

export async function execute(params: GrepParams, ctx: Tool.Context) {
  const results = await ripgrep(params)
  // GrepContext (cwd, targetExtensions) is assumed derivable from the tool context
  const scored: MatchPriority[] = results.map(match => ({
    match,
    score: scoreMatch(match, ctx),
  }))

  scored.sort((a, b) => b.score - a.score)

  const limit = params.limit ?? DEFAULT_MATCH_LIMIT
  const topMatches = scored.slice(0, limit)

  return formatGroupedOutput(topMatches)
}

function formatGroupedOutput(matches: MatchPriority[]): ToolResult {
  const byFile = groupBy(matches, m => m.match.path)

  const output: string[] = []
  output.push(`Found ${matches.length} matches across ${byFile.size} files\n`)

  for (const [file, fileMatches] of byFile) {
    output.push(`\n${file}:`)
    for (const { match, score } of fileMatches.slice(0, 10)) {
      const relevance = score > 1.0 ? " [high relevance]" : ""
      output.push(`  Line ${match.lineNum}: ${match.lineText}${relevance}`)
    }
    if (fileMatches.length > 10) {
      output.push(`  ... and ${fileMatches.length - 10} more`)
    }
  }

  return { output: output.join("\n") }
}
```

**Files to Modify:**

- `src/tool/grep.ts`

---
### 7. Web Search Context Optimization

**Current Behavior:** 10,000-character default limit.

**Improvements:**

- Reduce the default to 6,000 characters
- Content quality scoring
- Query-relevant extraction

```typescript
// src/tool/websearch.ts - Optimized content extraction
const DEFAULT_CONTEXT_CHARS = 6000

interface ContentQualityScore {
  source: string
  score: number
  relevantSections: string[]
}

function scoreAndExtract(
  content: string,
  query: string
): ContentQualityScore {
  const paragraphs = content.split(/\n\n+/)
  const queryTerms = query.toLowerCase().split(/\s+/)

  const scored = paragraphs.map((para, index) => {
    const lower = para.toLowerCase()
    const termMatches = queryTerms.filter(term => lower.includes(term)).length
    const density = termMatches / para.length
    const position = index / paragraphs.length

    return {
      para,
      // Favor more query terms, higher term density, and earlier position
      score: termMatches * 2 + density * 1000 + (1 - position) * 0.5,
    }
  })

  scored.sort((a, b) => b.score - a.score)

  const relevant: string[] = []
  let usedChars = 0

  for (const { para } of scored) {
    if (usedChars + para.length > DEFAULT_CONTEXT_CHARS) break
    relevant.push(para)
    usedChars += para.length
  }

  return {
    source: content.substring(0, DEFAULT_CONTEXT_CHARS),
    score: scored[0]?.score ?? 0,
    relevantSections: relevant,
  }
}

export async function execute(params: WebSearchParams, ctx: Tool.Context) {
  const response = await exaSearch({
    query: params.query,
    numResults: params.numResults ?? 8,
    type: params.type ?? "auto",
    livecrawl: params.livecrawl ?? "fallback",
    contextMaxCharacters: DEFAULT_CONTEXT_CHARS,
  })

  // optimizeResults applies scoreAndExtract per result; formatOptimizedResults renders them
  const optimized = optimizeResults(response.results, params.query)

  return {
    output: formatOptimizedResults(optimized),
    metadata: {
      originalChars: response.totalChars,
      optimizedChars: optimized.totalChars,
      savings: 1 - (optimized.totalChars / response.totalChars),
    },
  }
}
```

**Files to Modify:**

- `src/tool/websearch.ts`

---
### 8. File Read Optimization

**Current Behavior:** Full file content is sent unless offset/limit is specified.

**Improvements:**

- Default limits based on file type
- Smart offset detection (function boundaries); a sketch follows the example below

```typescript
// src/tool/read.ts - Optimized file reading
const FILE_TYPE_CONFIGS: Record<string, FileReadConfig> = {
  ".json": { defaultLimit: Infinity, truncate: false },
  ".md": { defaultLimit: 2000, truncate: true },
  ".ts": { defaultLimit: 400, truncate: true },
  ".js": { defaultLimit: 400, truncate: true },
  ".py": { defaultLimit: 400, truncate: true },
  ".yml": { defaultLimit: 500, truncate: true },
  ".yaml": { defaultLimit: 500, truncate: true },
  ".txt": { defaultLimit: 1000, truncate: true },
  default: { defaultLimit: 300, truncate: true },
}

export async function execute(params: ReadParams, ctx: Tool.Context) {
  const ext = path.extname(params.filePath)
  const config = FILE_TYPE_CONFIGS[ext] ?? FILE_TYPE_CONFIGS.default

  const offset = params.offset ?? 0
  const limit = params.limit ?? config.defaultLimit

  const file = Bun.file(params.filePath)
  const content = await file.text()
  const lines = content.split("\n")

  if (!config.truncate || lines.length <= limit + offset) {
    return {
      output: content,
      attachments: [],
    }
  }

  const displayedLines = lines.slice(offset, offset + limit)
  const output = [
    ...displayedLines,
    "",
    `... ${lines.length - displayedLines.length} lines truncated ...`,
    "",
    `File: ${params.filePath}`,
    `Lines: ${offset + 1}-${offset + limit} of ${lines.length}`,
  ].join("\n")

  return {
    output,
    attachments: [{
      type: "file",
      filename: params.filePath,
      // mime.lookup as provided by e.g. the mime-types package
      mime: mime.lookup(params.filePath) || "text/plain",
      url: `data:text/plain;base64,${Buffer.from(content).toString("base64")}`,
    }],
  }
}
```
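Smart offset detection could snap the requested offset back to the nearest enclosing declaration so a read never starts mid-function; a minimal regex-based sketch (the helper is hypothetical):

```typescript
// Hypothetical: move the start offset back to the nearest declaration line.
function snapToFunctionBoundary(lines: string[], offset: number): number {
  const declaration = /^(export\s+)?(async\s+)?(function|class|const|interface|type)\b/
  for (let i = Math.min(offset, lines.length - 1); i >= 0; i--) {
    if (declaration.test(lines[i])) return i
  }
  return offset
}
```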
**Files to Modify:**

- `src/tool/read.ts`

---
### 9. Context Window Budgeting

**Current Behavior:** Fixed 32K output-token reservation.

**Improvements:**

- Dynamic budget allocation based on task type
- Model-specific optimizations

```typescript
// src/session/prompt.ts - Dynamic budget allocation
const TASK_BUDGETS: Record<string, TaskBudget> = {
  code_generation: { inputRatio: 0.5, outputRatio: 0.5 },
  exploration: { inputRatio: 0.8, outputRatio: 0.2 },
  qa: { inputRatio: 0.7, outputRatio: 0.3 },
  refactoring: { inputRatio: 0.6, outputRatio: 0.4 },
  debugging: { inputRatio: 0.7, outputRatio: 0.3 },
  default: { inputRatio: 0.6, outputRatio: 0.4 },
}

interface BudgetCalculation {
  inputBudget: number
  outputBudget: number
  totalBudget: number
}

function calculateBudget(
  model: Provider.Model,
  taskType: string,
  estimatedInputTokens: number
): BudgetCalculation {
  const config = TASK_BUDGETS[taskType] ?? TASK_BUDGETS.default
  const modelContext = model.limit.context
  const modelMaxOutput = model.limit.output

  // Dynamic budget based on task type
  const baseBudget = Math.min(modelContext, estimatedInputTokens * 2)
  const outputBudget = Math.min(
    modelMaxOutput,
    Math.floor(baseBudget * config.outputRatio),
    SessionPrompt.OUTPUT_TOKEN_MAX
  )
  const inputBudget = Math.floor(baseBudget * config.inputRatio)

  return {
    inputBudget,
    outputBudget,
    totalBudget: inputBudget + outputBudget,
  }
}

async function checkAndAdjustPrompt(
  messages: ModelMessage[],
  budget: BudgetCalculation
): Promise<ModelMessage[]> {
  const currentTokens = Token.estimateMessages(messages)

  if (currentTokens <= budget.inputBudget) {
    return messages
  }

  // Over budget - prune, prioritizing recent messages
  // (pruneMessagesToBudget drops low-priority messages until under budget)
  return pruneMessagesToBudget(messages, budget.inputBudget)
}
```

**Files to Modify:**

- `src/session/prompt.ts`
- `src/session/compaction.ts`

---
### 10. Duplicate Detection

**Current Behavior:** No deduplication of content.

**Improvements:**

- Hash and track tool outputs
- Skip identical subsequent calls
- Cache read file contents

```typescript
// src/session/duplicate-detection.ts
const outputHashCache = new Map<string, string>()

function getContentHash(content: string): string {
  return new Bun.CryptoHasher("sha256").update(content).digest("hex")
}

async function deduplicateToolOutput(
  toolId: string,
  input: Record<string, unknown>,
  content: string
): Promise<{ isDuplicate: boolean; output: string }> {
  const hash = getContentHash(content)
  const key = `${toolId}:${JSON.stringify(input)}:${hash}`

  if (outputHashCache.has(key)) {
    return {
      isDuplicate: true,
      output: outputHashCache.get(key)!,
    }
  }

  outputHashCache.set(key, content)
  return { isDuplicate: false, output: content }
}

// In tool execution
async function executeTool(tool: Tool.Info, args: Record<string, unknown>) {
  const content = await tool.execute(args)

  const { isDuplicate, output } = await deduplicateToolOutput(
    tool.id,
    args,
    content.output
  )

  if (isDuplicate) {
    log.debug("Skipping duplicate tool output", { tool: tool.id })
    return {
      ...content,
      output: `[Previous identical output: ${content.output.substring(0, 100)}...]`,
      metadata: { ...content.metadata, duplicate: true },
    }
  }

  return content
}
```

**Files to Modify:**

- Add `src/session/duplicate-detection.ts`
- `src/tool/tool.ts`

---
## Implementation Priority

### Phase 1: High Impact, Low Risk

| Priority | Improvement | Estimated Token Savings | Risk |
|----------|-------------|------------------------|------|
| 1 | Enhanced Token Estimation | 5-15% | Low |
| 2 | Smart Grep Limits | 10-20% | Low |
| 3 | Web Search Optimization | 20-30% | Low |
| 4 | System Prompt Compression | 5-10% | Low |

### Phase 2: Medium Impact, Medium Risk

| Priority | Improvement | Estimated Token Savings | Risk |
|----------|-------------|------------------------|------|
| 5 | Tool Output Management | 15-25% | Medium |
| 6 | Message History Optimization | 20-30% | Medium |
| 7 | File Read Limits | 10-20% | Medium |

### Phase 3: High Impact, Higher Complexity

| Priority | Improvement | Estimated Token Savings | Risk |
|----------|-------------|------------------------|------|
| 8 | Smart Compaction | 25-40% | High |
| 9 | Context Budgeting | 15-25% | High |
| 10 | Duplicate Detection | 10-15% | Medium |

---
## Quality Preservation

### Testing Strategy

1. **A/B Testing:** Compare outputs before/after each optimization
2. **Quality Metrics:** Track success rate, user satisfaction, task completion
3. **Rollback Mechanism:** Config flags to disable optimizations per session

```typescript
// Config schema for optimization controls
const OptimizationConfig = z.object({
  smart_compaction: z.boolean().default(true),
  enhanced_estimation: z.boolean().default(true),
  smart_truncation: z.boolean().default(true),
  message_pruning: z.boolean().default(true),
  system_prompt_compression: z.boolean().default(true),
  grep_optimization: z.boolean().default(true),
  websearch_optimization: z.boolean().default(true),
  file_read_limits: z.boolean().default(true),
  context_budgeting: z.boolean().default(true),
  duplicate_detection: z.boolean().default(true),
})

// Usage
const config = await Config.get()
const optimizations = config.optimizations ?? {}
```
### Monitoring

```typescript
// Token efficiency metrics
export async function trackTokenMetrics(sessionID: string) {
  const messages = await Session.messages({ sessionID })

  const metrics = {
    totalTokens: 0,
    inputTokens: 0,
    outputTokens: 0,
    optimizationSavings: 0, // populated by the individual optimizations
    compactionCount: 0,
    truncationCount: 0, // populated by the truncation layer
  }

  for (const msg of messages) {
    metrics.totalTokens += msg.info.tokens.input + msg.info.tokens.output
    metrics.inputTokens += msg.info.tokens.input
    metrics.outputTokens += msg.info.tokens.output

    if (msg.info.mode === "compaction") {
      metrics.compactionCount++
    }
  }

  return metrics
}
```

---
## Configuration

### Environment Variables

```bash
# Token optimization controls
OPENCODE_TOKEN_ESTIMATION=accurate   # accurate (tiktoken) or legacy (4:1)
OPENCODE_TRUNCATION_MODE=smart       # smart or legacy (fixed limits)
OPENCODE_COMPACTION_THRESHOLD=0.7    # trigger at 70% of context
OPENCODE_GREP_LIMIT=50               # default match limit
OPENCODE_WEBSEARCH_CHARS=6000        # default context characters
OPENCODE_FILE_READ_LIMIT=400         # default lines for code files
OPENCODE_OUTPUT_BUDGET_RATIO=0.4     # fraction of the budget reserved for output
OPENCODE_DUPLICATE_DETECTION=true    # enable the dedup cache
```
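These variables would need a small shim that parses them with sane fallbacks; a minimal sketch (variable names from the block above, parsing logic illustrative):

```typescript
// Hypothetical: read optimization settings from the environment with defaults.
function envNumber(name: string, fallback: number): number {
  const raw = process.env[name]
  const parsed = raw === undefined ? NaN : Number(raw)
  return Number.isFinite(parsed) ? parsed : fallback
}

const compactionThreshold = envNumber("OPENCODE_COMPACTION_THRESHOLD", 0.7)
const grepLimit = envNumber("OPENCODE_GREP_LIMIT", 50)
const duplicateDetection = process.env.OPENCODE_DUPLICATE_DETECTION !== "false"
```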
### Per-Model Configuration

```json
{
  "models": {
    "gpt-4o": {
      "context_limit": 128000,
      "output_limit": 16384,
      "token_budget": {
        "code_generation": { "input_ratio": 0.5, "output_ratio": 0.5 },
        "exploration": { "input_ratio": 0.8, "output_ratio": 0.2 }
      }
    },
    "claude-sonnet-4-20250514": {
      "context_limit": 200000,
      "output_limit": 8192,
      "supports_prompt_cache": true
    }
  }
}
```

---
## Migration Guide

### Upgrading from Legacy Token Estimation

```typescript
// Before (4:1 ratio)
const tokens = content.length / 4

// After (tiktoken)
const tokens = Token.estimate(content)
```

### Upgrading from Legacy Truncation

```typescript
// Before (fixed limits)
if (lines.length > 2000 || bytes > 51200) {
  truncate(content)
}

// After (smart truncation)
const result = await Truncate.smart(content, {
  fileType: detectFileType(content),
  maxTokens: 8000,
})
```

---
## Best Practices

1. **Measure First:** Always measure token usage before and after changes (a sketch follows this list)
2. **Roll Out Incrementally:** Deploy optimizations gradually
3. **User Control:** Allow users to override defaults
4. **Monitor Quality:** Track task success rates alongside token savings
5. **Fallback Ready:** Have fallback mechanisms for when optimizations fail
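Measurement can reuse `trackTokenMetrics` from the Monitoring section; a minimal before/after sketch (session handling illustrative):

```typescript
// Hypothetical: compare token usage for the same task with and without an optimization.
async function compareRuns(baselineSession: string, optimizedSession: string) {
  const before = await trackTokenMetrics(baselineSession)
  const after = await trackTokenMetrics(optimizedSession)
  const savings = 1 - after.totalTokens / before.totalTokens
  console.log(`Token savings: ${(savings * 100).toFixed(1)}%`)
}
```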
---

## References

- **Files:** `src/util/token.ts`, `src/tool/truncation.ts`, `src/session/compaction.ts`, `src/session/prompt.ts`, `src/session/message-v2.ts`, `src/tool/grep.ts`, `src/tool/websearch.ts`, `src/tool/read.ts`, `src/session/system.ts`
- **Dependencies:** `tiktoken`, `@dqbd/tiktoken`
- **Related Issues:** Context overflow handling, token tracking, prompt optimization