
Token Tracking Improvements

Summary

Fixed all identified token tracking issues to ensure accurate token reporting from OpenCode. The system now logs every extraction attempt with detailed failure reasons, making it clear when estimation is used and why.

Issues Fixed

1. Multiple Fallback Layers Obscure Real Usage

Before: Each fallback layer failed silently, making it unclear which method provided the count. After:

  • Added tokenExtractionLog array that tracks every extraction attempt
  • Each method logs success/failure with detailed reasons
  • Final log shows complete extraction history
  • Token source is explicitly tracked: 'stream', 'response-openai', 'response-google', 'response-direct', 'output-json', 'output-text', 'session', or 'estimate-improved'
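
A minimal sketch of how an extraction log might accumulate entries. The logAttempt helper and entry fields are illustrative, not the literal chat/server.js code; the method names follow the failure reasons listed in the Debugging Guide below.

const tokenExtractionLog = [];

// Illustrative helper: record one extraction attempt with its outcome.
function logAttempt(method, success, detail = {}) {
  tokenExtractionLog.push({ method, success, ...detail, at: Date.now() });
}

logAttempt('stream', false, { reason: 'message.opencodeTokensUsed not set during streaming' });
logAttempt('session_query', true, { tokens: 1234, source: 'session' });
console.log(tokenExtractionLog); // complete extraction history, in order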

2. Token Extraction from OpenCode Output is Unreliable

Before: Relied on regex pattern matching without verification. After:

  • Added validation for all extracted token values
  • Logs exactly which pattern matched and what value was found
  • If a pattern matches but validation fails, the failure is logged with its reason
  • Tracks ANSI stripping and format issues in logs
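
As a sketch of this path (the real patterns in chat/server.js may differ), the text parser strips ANSI escapes, tries a pattern, and only accepts the value after it passes validateTokenCount() (described under issue 5):

// Illustrative only: the pattern and helper signature are assumptions.
function extractTokensFromText(output, contentLength) {
  const clean = output.replace(/\x1b\[[0-9;]*m/g, ''); // strip ANSI color codes
  const match = clean.match(/tokens[:\s]+(\d+)/i);
  if (!match) {
    return { success: false, reason: 'No text token patterns matched' };
  }
  const tokens = Number(match[1]);
  const check = validateTokenCount(tokens, contentLength);
  if (!check.valid) {
    return { success: false, reason: check.reason, matchedValue: tokens };
  }
  return { success: true, tokens, source: 'output-text' };
}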

3. Session Info API Not Guaranteed to Work

Before: Multiple command variations tried but failures were generic. After:

  • Enhanced getOpencodeSessionTokenUsage() with comprehensive logging:
    • Logs each command attempt with full command string
    • Logs response details (stdout/stderr sample)
    • Logs JSON parsing attempts and available keys
    • Logs validation of extracted values
    • Returns detailed attempt history
  • Uses emojis for visual clarity: 🔍 (searching), ✓ (success), ✗ (failed), ⚠️ (warning)
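
A sketch of the query loop (the command string mirrors the Session Query Logging example below; the loop structure, the parsed.tokens field name, and the return shape are assumptions):

const { execFile } = require('child_process');
const { promisify } = require('util');
const execFileAsync = promisify(execFile);

// Sketch: try each candidate command until one yields a valid token count.
async function getOpencodeSessionTokenUsage(sessionId) {
  const attempts = [];
  const candidates = [
    ['session', 'info', '--id', sessionId, '--json'],
    // ...other command variations
  ];
  for (const args of candidates) {
    console.log(`  → Trying: opencode ${args.join(' ')}`);
    try {
      const { stdout } = await execFileAsync('opencode', args);
      const parsed = JSON.parse(stdout);
      console.log('  ✓ JSON parse successful', { parsedKeys: Object.keys(parsed) });
      if (Number.isFinite(parsed.tokens) && parsed.tokens > 0) {
        return { tokens: parsed.tokens, attempts };
      }
      attempts.push({ args, reason: 'session query returned 0 tokens' });
    } catch (err) {
      attempts.push({ args, reason: `Command execution failed: ${err.message}` });
    }
  }
  return { tokens: 0, attempts }; // attempt history travels with the result
}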

4. Streaming Capture is Opportunistic

Before: Silently failed if tokens didn't appear during streaming. After:

  • Logs when stream capture is attempted
  • Logs if message.opencodeTokensUsed is not set
  • Validates captured tokens before accepting
  • Tracks whether streaming was available
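
Sketch of the stream-capture check (field and helper names follow the failure reasons documented below; validateTokenCount() is described under issue 5):

// Sketch: accept stream-captured tokens only after validation.
function tryStreamTokens(message, contentLength, log) {
  if (message.opencodeTokensUsed == null) {
    log.push({ method: 'stream', success: false,
               reason: 'message.opencodeTokensUsed not set during streaming' });
    return null;
  }
  const check = validateTokenCount(message.opencodeTokensUsed, contentLength);
  if (!check.valid) {
    log.push({ method: 'stream', success: false, reason: check.reason });
    return null;
  }
  log.push({ method: 'stream', success: true, source: 'stream' });
  return message.opencodeTokensUsed;
}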

5. No Verification or Retry Logic

Before: No validation that extracted tokens are reasonable. After:

  • Added validateTokenCount() function that checks:
    • Token is a positive finite number
    • Token doesn't exceed reasonable maximum (1M)
    • Chars-per-token ratio is within reasonable bounds (0.5-15)
  • All extraction methods validate tokens before accepting
  • Failed validations are logged with a specific reason
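
A minimal sketch of these checks (the real validateTokenCount() in chat/server.js also supports an optional adjustedTokens in its return value, omitted here):

const MAX_REASONABLE_TOKENS = 1_000_000;

function validateTokenCount(tokens, contentLength) {
  if (!Number.isFinite(tokens) || tokens <= 0) {
    return { valid: false, reason: 'tokens is not a positive finite number' };
  }
  if (tokens > MAX_REASONABLE_TOKENS) {
    return { valid: false, reason: `tokens exceeds reasonable maximum of ${MAX_REASONABLE_TOKENS}` };
  }
  if (contentLength > 0) {
    const ratio = contentLength / tokens;
    if (ratio < 0.5 || ratio > 15) {
      return { valid: false,
               reason: `chars per token is suspiciously ${ratio < 0.5 ? 'low' : 'high'} (${ratio.toFixed(2)})` };
    }
  }
  return { valid: true };
}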

6. Estimation is Always Used as Fallback

Before: Impossible to know when real tracking failed vs when estimation was needed. After:

  • Estimation only used after all extraction methods fail
  • Logs comprehensive details when falling back to estimation:
    • Input/output token breakdown
    • Content lengths
    • Calculation formula
    • Complete extraction attempt history
  • Emits a console warning ([TOKEN_TRACKING] ⚠️ ESTIMATION USED:) with full context
  • Token source explicitly marked as 'estimate-improved'
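
Sketch of the fallback itself. The 4-chars-per-token divisor and the way input and output tokens combine are assumptions; the actual formula is included in the runtime log:

// Sketch: only reached after every extraction method has failed.
function estimateTokens(inputText, outputText, messageId, model, extractionLog) {
  const CHARS_PER_TOKEN = 4; // assumed heuristic
  const inputTokens = Math.ceil(inputText.length / CHARS_PER_TOKEN);
  const outputTokens = Math.ceil(outputText.length / CHARS_PER_TOKEN);
  const estimatedTokens = inputTokens + outputTokens;
  console.warn('[TOKEN_TRACKING] ⚠️ ESTIMATION USED:', JSON.stringify({
    messageId, model, estimatedTokens, inputTokens, outputTokens,
    reason: 'All token extraction methods failed',
    extractionLog,
  }, null, 2));
  return { tokens: estimatedTokens, tokenSource: 'estimate-improved' };
}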

7. Provider Usage Recording May Not Match Actual Consumption

Before: Provider limits based on potentially inaccurate estimates. After:

  • Provider usage recording now happens after validation
  • Token source is tracked alongside usage
  • Logs clearly indicate when estimated vs actual tokens are recorded
  • Can audit provider usage accuracy via token source
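
A sketch of recording usage together with its source so accuracy can be audited later (recordProviderUsage and its persistence layer are hypothetical; the log line matches the format shown under Testing Instructions below):

// Hypothetical: attach the token source to every usage record.
function recordProviderUsage(userId, model, tokensUsed, tokenSource) {
  console.log(`[USAGE] processMessage: recording tokens user=${userId} ` +
              `tokens=${tokensUsed} model=${model} source=${tokenSource}`);
  // ...persist { userId, model, tokensUsed, tokenSource, at: Date.now() }
}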

8. Early Termination/Error Cases Default to Estimation

Before: Error paths skipped real token counting. After:

  • Extraction attempts still occur even in error cases
  • Logs show why extraction failed (missing session, no output, etc.)
  • Estimation details include context about error condition
  • Can distinguish "no tokens found" from "extraction failed"
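
Sketch of the error path, reusing the helpers sketched above (names and structure are assumptions): extraction still runs on whatever output and session are available, and every skipped step is logged.

// Sketch: error handling still attempts real extraction before estimating.
async function extractTokensAfterError(err, partialOutput, sessionId, log) {
  log.push({ method: 'cli', success: false,
             reason: `Command execution failed: ${err.message}` });
  if (partialOutput) {
    const fromText = extractTokensFromText(partialOutput, partialOutput.length);
    if (fromText.success) return { tokens: fromText.tokens, source: fromText.source };
    log.push({ method: 'text_parsing', success: false, reason: fromText.reason });
  } else {
    log.push({ method: 'text_parsing', success: false, reason: 'no finalOutput available' });
  }
  if (sessionId) {
    const fromSession = await getOpencodeSessionTokenUsage(sessionId);
    if (fromSession.tokens > 0) return { tokens: fromSession.tokens, source: 'session' };
    log.push({ method: 'session_query', success: false, reason: 'session query returned 0 tokens' });
  } else {
    log.push({ method: 'session_query', success: false, reason: 'no opencodeSessionId' });
  }
  return null; // caller falls back to estimation, with the full log as context
}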

New Logging Structure

Successful Token Extraction

✓ Token extraction: Using token usage captured from stream
  { tokensUsed: 1234, tokenSource: 'stream', messageId: 'msg_abc' }
✅ Token extraction successful
  { tokensUsed: 1234, tokenSource: 'stream', extractionLog: [...] }

Failed Extraction → Estimation

✗ Token extraction: Stream tokens failed validation
  { tokens: 9999999, reason: 'tokens exceeds reasonable maximum of 1000000' }
✗ Token extraction: No token fields found in parsed response
  { availableKeys: ['reply', 'model'], checkedFields: [...] }
✗ Token extraction: Session query returned 0 tokens
⚠️ Token extraction: All methods failed, will fall back to estimation
  { extractionLog: [...full history...], hadStream: false, hadParsed: true }
[TOKEN_TRACKING] ⚠️ ESTIMATION USED: {
  "messageId": "msg_abc",
  "model": "gpt-4",
  "estimatedTokens": 150,
  "inputTokens": 100,
  "outputTokens": 80,
  "reason": "All token extraction methods failed"
}

Session Query Logging

🔍 getOpencodeSessionTokenUsage: Starting session token query
  { sessionId: 'ses_123', candidateCount: 5 }
  → Trying: opencode session info --id ses_123 --json
  ← Response received { hasStdout: true, stdoutLength: 245 }
  ✓ JSON parse successful { parsedKeys: ['session', 'tokens'] }
✅ getOpencodeSessionTokenUsage: Successfully extracted tokens from JSON
  { tokens: 1234, command: 'session info --id ses_123 --json' }

Code Changes

Files Modified

  • chat/server.js - Main implementation file

Functions Enhanced

  1. sendToOpencode() (lines ~5990-6160)

    • Added tokenExtractionLog array
    • Added validation for all extraction methods
    • Enhanced logging with ✓/✗ indicators
    • Returns { ..., tokenSource, tokenExtractionLog }
  2. getOpencodeSessionTokenUsage() (lines ~9651-9750)

    • Complete rewrite with detailed logging
    • Tracks all command attempts
    • Validates extracted values
    • Returns attempt history in logs
  3. sendToOpencodeWithFallback() (lines ~6324-6430)

    • Uses improved token extraction from sendToOpencode()
    • Adds validation and logging at fallback level
    • Emits console warnings for estimation
    • Passes through tokenSource and tokenExtractionLog
  4. processMessage() (lines ~6630-6730)

    • Uses tokenSource and tokenExtractionLog from result
    • Adds comprehensive estimation logging
    • Tracks extraction attempts across entire message flow
    • Emits detailed console warnings
  5. NEW: validateTokenCount() (lines ~4271-4330)

    • Validates token is positive finite number
    • Checks against reasonable maximum (1M)
    • Validates chars-per-token ratio (0.5-15)
    • Returns { valid, reason, adjustedTokens? }

Testing Instructions

1. Monitor Logs for Token Extraction

Start the server and watch logs:

cd chat
npm start

Send a message and look for token extraction logs:

  • Should see 🔍 for search attempts
  • Should see ✓ for successful extraction
  • Should see ✗ for failed attempts
  • Should see ⚠️ if estimation is used

2. Verify Token Source Tracking

Check console output for each message:

[USAGE] processMessage: recording tokens user=user_123 tokens=150 model=gpt-4 source=stream

Token source should be one of:

  • stream - Captured during CLI streaming
  • response-openai - From OpenAI format response
  • response-google - From Google/Gemini format
  • response-direct - From direct token fields
  • output-json - Parsed from JSON in output
  • output-text - Parsed from text patterns
  • session - From session info query
  • estimate-improved - Estimation fallback

3. Test Estimation Scenarios

To verify estimation logging:

  1. Use a model that doesn't report tokens
  2. Disable session info (remove workspace dir)
  3. Check for console warning:
[TOKEN_TRACKING] ⚠️ ESTIMATION USED: {
  "messageId": "...",
  "estimatedTokens": 150,
  "inputTokens": 100,
  "outputTokens": 80,
  "reason": "All token extraction methods failed",
  "extractionLog": [...]
}

4. Verify Validation

Test validation by checking logs for:

  • Very large token counts (> 1M) → should be rejected
  • Suspicious chars-per-token ratios → should be logged
  • Token validation failures → should fall back to next method

5. Session Info Query Testing

Check session query logs:

# Look for detailed session query logs
grep "getOpencodeSessionTokenUsage" chat/logs/*.log

Should see:

  • All command attempts
  • Response samples
  • JSON parsing results
  • Validation results

Debugging Guide

When Tokens Are Estimated

If you see estimation warnings, check the extractionLog field to see why:

{
  "extractionLog": [
    { "method": "stream", "success": false, "reason": "message.opencodeTokensUsed not set during streaming" },
    { "method": "parsed_response", "success": false, "reason": "no parsed response available" },
    { "method": "text_parsing", "success": false, "reason": "no finalOutput available" },
    { "method": "session_query", "success": false, "reason": "no opencodeSessionId" }
  ]
}

This tells you exactly which methods were tried and why each failed.

Common Failure Reasons

Stream capture fails:

  • message.opencodeTokensUsed not set during streaming - OpenCode didn't output tokens in expected format during streaming

Parsed response fails:

  • None of the expected token fields found - Response doesn't have standard token fields
  • parsed is not an object - Response couldn't be parsed as JSON

Text parsing fails:

  • No JSON token patterns matched - Output doesn't contain JSON token data
  • No text token patterns matched - Output doesn't contain text like "tokens: 123"

Session query fails:

  • no opencodeSessionId - Session ID not available
  • session query returned 0 tokens - Command succeeded but found no tokens
  • Command execution failed - OpenCode CLI command failed (see error details)

Validation fails:

  • tokens exceeds reasonable maximum - Suspiciously high token count
  • chars per token is suspiciously low/high - Token count doesn't match content length

Future Improvements

Potential enhancements (not implemented yet):

  1. Store tokenSource in message record for historical analysis
  2. Add metrics dashboard showing token source distribution
  3. Implement retry logic with exponential backoff for session queries
  4. Add post-completion verification step that double-checks token counts
  5. Query OpenCode session logs/files directly if API calls fail
  6. Implement confidence scoring for extracted tokens
  7. Add alerting when estimation rate exceeds threshold

Files Reference

  • Implementation: chat/server.js
  • This Document: TOKEN_TRACKING_IMPROVEMENTS.md
  • Related Docs:
    • TOKEN_TRACKING_FIXES.md - Previous multiplier fixes
    • TOKEN_USAGE_IMPLEMENTATION.md - Overall system architecture
    • TOKEN_USAGE_COMPLETION_SUMMARY.md - Completion summary