# Token Tracking Improvements

## Summary
Fixed all identified token tracking issues to ensure accurate token reporting from OpenCode. The system now provides comprehensive logging at every extraction attempt with detailed failure reasons, making it clear when estimation is being used and why.
## Issues Fixed
### 1. ✅ Multiple Fallback Layers Obscure Real Usage
**Before:** Each fallback layer failed silently, making it unclear which method provided the count.

**After:**
- Added a `tokenExtractionLog` array that tracks every extraction attempt (entry shape sketched below)
- Each method logs success/failure with detailed reasons
- The final log shows the complete extraction history
- The token source is explicitly tracked as one of `'stream'`, `'response-openai'`, `'response-google'`, `'response-direct'`, `'output-json'`, `'output-text'`, `'session'`, or `'estimate-improved'`
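For illustration, a minimal sketch of the entry shape and a recording helper, assuming the `extractionLog` format shown in the Debugging Guide below; the helper name `logExtractionAttempt` is hypothetical, not a function in `chat/server.js`:

```js
// Hypothetical helper: appends one attempt to the log, in the order tried.
// The entry shape mirrors the extractionLog examples in the Debugging Guide.
function logExtractionAttempt(tokenExtractionLog, method, success, details = {}) {
  tokenExtractionLog.push({ method, success, ...details });
}

// Usage inside an extraction pass:
const tokenExtractionLog = [];
logExtractionAttempt(tokenExtractionLog, 'stream', false, {
  reason: 'message.opencodeTokensUsed not set during streaming',
});
logExtractionAttempt(tokenExtractionLog, 'text_parsing', true, { tokens: 1234 });
```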
### 2. ✅ Token Extraction from OpenCode Output is Unreliable
**Before:** Relied on regex pattern matching without verification.

**After:**
- Added validation for all extracted token values
- Logs exactly which pattern matched and what value was found (see the sketch after this list)
- If a pattern matches but validation fails, the failure is logged with a reason
- ANSI stripping and format issues are tracked in the logs
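A minimal sketch of the match-then-validate flow, assuming output text such as `tokens: 123`; the pattern list is illustrative rather than the exact patterns in `chat/server.js`, and `validateTokenCount()` is the validator sketched under issue 5 below:

```js
// Illustrative text patterns; the real server may use different ones.
const TEXT_TOKEN_PATTERNS = [
  { name: 'tokens-colon', re: /tokens:\s*(\d+)/i },
  { name: 'total-tokens', re: /total[_\s]tokens["']?\s*[:=]\s*(\d+)/i },
];

function extractTokensFromText(output, tokenExtractionLog) {
  const cleaned = output.replace(/\x1b\[[0-9;]*m/g, ''); // strip ANSI color codes
  for (const { name, re } of TEXT_TOKEN_PATTERNS) {
    const match = cleaned.match(re);
    if (!match) continue;
    const tokens = Number(match[1]);
    const check = validateTokenCount(tokens, cleaned.length);
    tokenExtractionLog.push({
      method: 'text_parsing', pattern: name, success: check.valid,
      ...(check.valid ? { tokens } : { reason: check.reason }),
    });
    if (check.valid) return tokens;
  }
  tokenExtractionLog.push({
    method: 'text_parsing', success: false, reason: 'No text token patterns matched',
  });
  return null;
}
```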
### 3. ✅ Session Info API Not Guaranteed to Work
**Before:** Multiple command variations were tried, but failures were generic.

**After:**
- Enhanced `getOpencodeSessionTokenUsage()` with comprehensive logging:
  - Logs each command attempt with the full command string
  - Logs response details (stdout/stderr sample)
  - Logs JSON parsing attempts and available keys
  - Logs validation of extracted values
- Returns detailed attempt history
- Uses emojis for visual clarity: 🔍 (searching), ✓ (success), ✗ (failed), ⚠️ (warning)
### 4. ✅ Streaming Capture is Opportunistic
**Before:** Silently failed if tokens didn't appear during streaming.

**After:**
- Logs when stream capture is attempted
- Logs if `message.opencodeTokensUsed` is not set (see the sketch after this list)
- Validates captured tokens before accepting them
- Tracks whether streaming was available
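A minimal sketch of that guard, assuming a `message` object that carries `opencodeTokensUsed` when OpenCode reports usage mid-stream; the surrounding names are illustrative, and `validateTokenCount()` is sketched under issue 5 below:

```js
// Attempt stream capture first; fall through to other methods on failure.
function tryStreamCapture(message, contentLength, tokenExtractionLog) {
  if (message.opencodeTokensUsed == null) {
    tokenExtractionLog.push({
      method: 'stream', success: false,
      reason: 'message.opencodeTokensUsed not set during streaming',
    });
    return null;
  }
  const check = validateTokenCount(message.opencodeTokensUsed, contentLength);
  tokenExtractionLog.push({
    method: 'stream', success: check.valid,
    ...(check.valid ? { tokens: message.opencodeTokensUsed } : { reason: check.reason }),
  });
  return check.valid ? message.opencodeTokensUsed : null;
}
```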
### 5. ✅ No Verification or Retry Logic
**Before:** No validation that extracted tokens are reasonable.

**After:**
- Added a `validateTokenCount()` function (sketched below) that checks:
  - The token count is a positive finite number
  - The token count doesn't exceed a reasonable maximum (1M)
  - The chars-per-token ratio is within reasonable bounds (0.5-15)
- All extraction methods validate tokens before accepting them
- Failed validations are logged with a specific reason
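A minimal sketch of those checks, using the `{ valid, reason, adjustedTokens? }` return shape listed under Code Changes and the stated bounds (1M maximum, 0.5-15 chars per token); the real implementation in `chat/server.js` may differ in detail, and the optional `adjustedTokens` path is omitted here:

```js
const MAX_REASONABLE_TOKENS = 1_000_000;
const MIN_CHARS_PER_TOKEN = 0.5;
const MAX_CHARS_PER_TOKEN = 15;

function validateTokenCount(tokens, contentLength) {
  if (!Number.isFinite(tokens) || tokens <= 0) {
    return { valid: false, reason: 'tokens is not a positive finite number' };
  }
  if (tokens > MAX_REASONABLE_TOKENS) {
    return { valid: false, reason: `tokens exceeds reasonable maximum of ${MAX_REASONABLE_TOKENS}` };
  }
  if (contentLength > 0) {
    const charsPerToken = contentLength / tokens;
    if (charsPerToken < MIN_CHARS_PER_TOKEN || charsPerToken > MAX_CHARS_PER_TOKEN) {
      const direction = charsPerToken < MIN_CHARS_PER_TOKEN ? 'low' : 'high';
      return { valid: false, reason: `chars per token is suspiciously ${direction} (${charsPerToken.toFixed(2)})` };
    }
  }
  return { valid: true };
}
```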
### 6. ✅ Estimation is Always Used as Fallback
**Before:** Impossible to know when real tracking failed vs. when estimation was needed.

**After:**
- Estimation is only used after all extraction methods fail
- Logs comprehensive details when falling back to estimation:
  - Input/output token breakdown
  - Content lengths
  - Calculation formula
  - Complete extraction attempt history
- Emits a console warning, `[TOKEN_TRACKING] ⚠️ ESTIMATION USED:`, with full context (sketched below)
- The token source is explicitly marked as `'estimate-improved'`
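A minimal sketch of the fallback step; the ~4-chars-per-token heuristic here is an assumption for illustration, not necessarily the estimator used in `chat/server.js`:

```js
// Fallback used only after every extraction method has failed.
function estimateAndWarn({ messageId, model, inputText, outputText, tokenExtractionLog }) {
  // Assumed heuristic: roughly 4 characters per token.
  const inputTokens = Math.ceil(inputText.length / 4);
  const outputTokens = Math.ceil(outputText.length / 4);
  const estimatedTokens = inputTokens + outputTokens;
  console.warn('[TOKEN_TRACKING] ⚠️ ESTIMATION USED:', JSON.stringify({
    messageId, model, estimatedTokens, inputTokens, outputTokens,
    reason: 'All token extraction methods failed',
    extractionLog: tokenExtractionLog,
  }, null, 2));
  return { tokensUsed: estimatedTokens, tokenSource: 'estimate-improved' };
}
```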
### 7. ✅ Provider Usage Recording May Not Match Actual Consumption
**Before:** Provider limits were based on potentially inaccurate estimates.

**After:**
- Provider usage recording now happens after validation
- The token source is tracked alongside usage
- Logs clearly indicate when estimated vs. actual tokens are recorded
- Provider usage accuracy can be audited via the token source
### 8. ✅ Early Termination/Error Cases Default to Estimation
**Before:** Error paths skipped real token counting.

**After:**
- Extraction attempts still occur even in error cases
- Logs show why extraction failed (missing session, no output, etc.)
- Estimation details include context about the error condition
- "No tokens found" can be distinguished from "extraction failed"
## New Logging Structure

### Successful Token Extraction

```
✓ Token extraction: Using token usage captured from stream
  { tokensUsed: 1234, tokenSource: 'stream', messageId: 'msg_abc' }
✅ Token extraction successful
  { tokensUsed: 1234, tokenSource: 'stream', extractionLog: [...] }
```
### Failed Extraction → Estimation

```
✗ Token extraction: Stream tokens failed validation
  { tokens: 9999999, reason: 'tokens exceeds reasonable maximum of 1000000' }
✗ Token extraction: No token fields found in parsed response
  { availableKeys: ['reply', 'model'], checkedFields: [...] }
✗ Token extraction: Session query returned 0 tokens
⚠️ Token extraction: All methods failed, will fall back to estimation
  { extractionLog: [...full history...], hadStream: false, hadParsed: true }
[TOKEN_TRACKING] ⚠️ ESTIMATION USED: {
  "messageId": "msg_abc",
  "model": "gpt-4",
  "estimatedTokens": 150,
  "inputTokens": 100,
  "outputTokens": 80,
  "reason": "All token extraction methods failed"
}
```
### Session Query Logging

```
🔍 getOpencodeSessionTokenUsage: Starting session token query
  { sessionId: 'ses_123', candidateCount: 5 }
→ Trying: opencode session info --id ses_123 --json
← Response received { hasStdout: true, stdoutLength: 245 }
✓ JSON parse successful { parsedKeys: ['session', 'tokens'] }
✅ getOpencodeSessionTokenUsage: Successfully extracted tokens from JSON
  { tokens: 1234, command: 'session info --id ses_123 --json' }
```
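For illustration, the query loop behind these logs might look roughly like the sketch below, using Node's `child_process`. The `opencode session info --id <id> --json` invocation is taken from the log example above; the single-entry candidate list, the timeout, and the `parsed.tokens` field are assumptions, not the exact CLI surface:

```js
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');
const execFileAsync = promisify(execFile);

async function getOpencodeSessionTokenUsage(sessionId) {
  // Candidate invocations; the real code tries several variations.
  const candidates = [
    ['session', 'info', '--id', sessionId, '--json'],
  ];
  for (const args of candidates) {
    console.log(`→ Trying: opencode ${args.join(' ')}`);
    try {
      const { stdout } = await execFileAsync('opencode', args, { timeout: 10_000 });
      const parsed = JSON.parse(stdout);
      console.log('✓ JSON parse successful', { parsedKeys: Object.keys(parsed) });
      const tokens = parsed.tokens; // assumed field, per the parsedKeys above
      if (validateTokenCount(tokens, 0).valid) {
        return { tokens, command: args.join(' ') };
      }
    } catch (err) {
      console.log('✗ Command or parse failed', { command: args.join(' '), error: err.message });
    }
  }
  return null;
}
```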
## Code Changes

### Files Modified

- `chat/server.js` - Main implementation file

### Functions Enhanced
- `sendToOpencode()` (lines ~5990-6160)
  - Added `tokenExtractionLog` array
  - Added validation for all extraction methods
  - Enhanced logging with ✓/✗ indicators
  - Returns `{ ..., tokenSource, tokenExtractionLog }`
- `getOpencodeSessionTokenUsage()` (lines ~9651-9750)
  - Complete rewrite with detailed logging
  - Tracks all command attempts
  - Validates extracted values
  - Returns attempt history in logs
- `sendToOpencodeWithFallback()` (lines ~6324-6430)
  - Uses improved token extraction from `sendToOpencode()`
  - Adds validation and logging at the fallback level
  - Emits console warnings for estimation
  - Passes through `tokenSource` and `tokenExtractionLog`
- `processMessage()` (lines ~6630-6730)
  - Uses `tokenSource` and `tokenExtractionLog` from the result
  - Adds comprehensive estimation logging
  - Tracks extraction attempts across the entire message flow
  - Emits detailed console warnings
- NEW: `validateTokenCount()` (lines ~4271-4330)
  - Validates the token count is a positive finite number
  - Checks against a reasonable maximum (1M)
  - Validates the chars-per-token ratio (0.5-15)
  - Returns `{ valid, reason, adjustedTokens? }`
## Testing Instructions

### 1. Monitor Logs for Token Extraction

Start the server and watch the logs:

```bash
cd chat
npm start
```
Send a message and look for token extraction logs. You should see:
- 🔍 for search attempts
- ✓ for successful extraction
- ✗ for failed attempts
- ⚠️ if estimation is used
### 2. Verify Token Source Tracking

Check the console output for each message:

```
[USAGE] processMessage: recording tokens user=user_123 tokens=150 model=gpt-4 source=stream
```

The token source should be one of:
- `stream` - Captured during CLI streaming
- `response-openai` - From an OpenAI-format response
- `response-google` - From Google/Gemini format
- `response-direct` - From direct token fields
- `output-json` - Parsed from JSON in the output
- `output-text` - Parsed from text patterns
- `session` - From a session info query
- `estimate-improved` - Estimation fallback
### 3. Test Estimation Scenarios

To verify estimation logging:
- Use a model that doesn't report tokens
- Disable session info (remove the workspace dir)
- Check for the console warning:

```
[TOKEN_TRACKING] ⚠️ ESTIMATION USED: {
  "messageId": "...",
  "estimatedTokens": 150,
  "inputTokens": 100,
  "outputTokens": 80,
  "reason": "All token extraction methods failed",
  "extractionLog": [...]
}
```
### 4. Verify Validation

Test validation by checking the logs for:
- Very large token counts (> 1M) → should be rejected
- Suspicious chars-per-token ratios → should be logged
- Token validation failures → should fall back to the next method
### 5. Session Info Query Testing

Check the session query logs:

```bash
# Look for detailed session query logs
grep "getOpencodeSessionTokenUsage" chat/logs/*.log
```

You should see:
- All command attempts
- Response samples
- JSON parsing results
- Validation results
## Debugging Guide

### When Tokens Are Estimated

If you see estimation warnings, check the `extractionLog` field to see why:
```json
{
  "extractionLog": [
    { "method": "stream", "success": false, "reason": "message.opencodeTokensUsed not set during streaming" },
    { "method": "parsed_response", "success": false, "reason": "no parsed response available" },
    { "method": "text_parsing", "success": false, "reason": "no finalOutput available" },
    { "method": "session_query", "success": false, "reason": "no opencodeSessionId" }
  ]
}
```
This tells you exactly which methods were tried and why each failed.
### Common Failure Reasons

**Stream capture fails:**
- `message.opencodeTokensUsed not set during streaming` - OpenCode didn't output tokens in the expected format during streaming

**Parsed response fails:**
- `None of the expected token fields found` - Response doesn't have standard token fields
- `parsed is not an object` - Response couldn't be parsed as JSON

**Text parsing fails:**
- `No JSON token patterns matched` - Output doesn't contain JSON token data
- `No text token patterns matched` - Output doesn't contain text like "tokens: 123"

**Session query fails:**
- `no opencodeSessionId` - Session ID not available
- `session query returned 0 tokens` - Command succeeded but found no tokens
- `Command execution failed` - OpenCode CLI command failed (see error details)

**Validation fails:**
- `tokens exceeds reasonable maximum` - Suspiciously high token count
- `chars per token is suspiciously low/high` - Token count doesn't match content length
## Future Improvements

Potential enhancements (not implemented yet):
- Store `tokenSource` in the message record for historical analysis
- Add a metrics dashboard showing token source distribution
- Implement retry logic with exponential backoff for session queries
- Add a post-completion verification step that double-checks token counts
- Query OpenCode session logs/files directly if API calls fail
- Implement confidence scoring for extracted tokens
- Add alerting when the estimation rate exceeds a threshold
## Files Reference

- Implementation: `chat/server.js`
- This document: `TOKEN_TRACKING_IMPROVEMENTS.md`
- Related docs:
  - `TOKEN_TRACKING_FIXES.md` - Previous multiplier fixes
  - `TOKEN_USAGE_IMPLEMENTATION.md` - Overall system architecture
  - `TOKEN_USAGE_COMPLETION_SUMMARY.md` - Completion summary