How the measurement was made

Capture one real SDK request, count three valid variants, then subtract adjacent results.

Capture the request
Run the Agent SDK's query() against a loopback endpoint and record the outgoing Messages request before inference. The capture uses isolated settings and no claude_code preset.
- Capture procedure: SDK request capture
- Observed request: Captured SDK request
- Provenance record: Capture provenance
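As a rough illustration of the capture step, the sketch below runs a loopback HTTP server that records each POSTed JSON body and then refuses the request, so nothing is forwarded and no inference runs. It is not the project's capture script: the function name `make_capture_server`, the 503 response, and the error payload are assumptions made for this example, and pointing the SDK at the loopback address (for instance through a base-URL setting such as `ANTHROPIC_BASE_URL`) is also assumed rather than taken from the source.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_capture_server(captured, host="127.0.0.1", port=0):
    """Return an HTTPServer that appends every POSTed JSON request to `captured`.

    Hypothetical helper for illustration; port=0 lets the OS pick a free port.
    """

    class CaptureHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            raw = self.rfile.read(length)
            # Record the path, headers, and parsed body of the outgoing request.
            captured.append({
                "path": self.path,
                "headers": dict(self.headers.items()),
                "body": json.loads(raw or b"{}"),
            })
            # Refuse the request: this server only records, it never proxies.
            payload = json.dumps({
                "type": "error",
                "error": {"type": "capture_only", "message": "request recorded"},
            }).encode()
            self.send_response(503)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, *args):
            # Keep the console quiet; the captured list is the record.
            pass

    return HTTPServer((host, port), CaptureHandler)


# Example use (not run here):
#   captured = []
#   server = make_capture_server(captured, port=8765)
#   server.serve_forever()   # stop with Ctrl-C, then dump `captured` as JSON
```

Keeping the recorded headers alongside the body matters here because the comparisons hold beta headers fixed as well as the request fields.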
Count three valid requests
Because count_tokens accepts complete Messages requests, each valid variant removes one category from the same captured request while holding everything else fixed.
1. A, full captured request: all captured countable context. 31,432 tokens
2. B, without tool definitions: the same request with tools removed. 1,787 tokens
3. C, without tools or skill descriptions: the same request with both removed. 301 tokens
- Measurement specification: Measurement plan
- Measurement ledger: Raw count records
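The three variants can be sketched as pure transformations of one captured request body. This is a minimal illustration, not the measurement code itself: `build_variants`, `count_variants`, and the `strip_skill_descriptions` callback are hypothetical names, and where skill descriptions live inside the request is deliberately left to the caller because the text above does not say. The `count` callable is also an assumption; with the Anthropic Python SDK it might wrap `client.messages.count_tokens(...)` and return `.input_tokens`, provided the body contains only fields that endpoint accepts.

```python
import copy


def build_variants(request, strip_skill_descriptions):
    """Build variants A, B, and C from one captured Messages request body.

    `strip_skill_descriptions` is a caller-supplied (hypothetical) function
    that takes a request dict and returns it with skill descriptions removed.
    Every variant is a deep copy, so the captured request is never mutated.
    """
    a = copy.deepcopy(request)          # A: full captured request
    b = copy.deepcopy(request)
    b.pop("tools", None)                # B: tool definitions removed
    c = strip_skill_descriptions(copy.deepcopy(b))  # C: tools and skills removed
    return {"A": a, "B": b, "C": c}


def count_variants(variants, count):
    """Apply a token-counting callable to each variant and collect the results."""
    return {name: count(body) for name, body in variants.items()}
```

Deriving C from B rather than from A keeps the chain adjacent: each step removes exactly one category from the previous variant, which is what makes the later subtraction meaningful.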
Subtract adjacent counts
Subtract adjacent results to isolate the marginal count of each removed category. The three derived values reconstruct the full request.

- Tools included: 31,432 - 1,787 = 29,645 tokens
- Skill descriptions: 1,787 - 301 = 1,486 tokens
- Base request: 301 remaining = 301 tokens

Removing the user prompt returned 31,428 tokens, 4 fewer than the full request. This supports the headline but is outside the additive accounting.
- Derivation procedure: Token measurement
- Generated result: Machine-readable result
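The subtraction can be checked mechanically. The short sketch below uses the three counts reported above; the function name `marginal_counts` and the dictionary keys are chosen for this example only.

```python
# Reported counts for variants A, B, and C (tokens).
counts = {"A": 31_432, "B": 1_787, "C": 301}


def marginal_counts(counts):
    """Subtract adjacent variants to isolate each removed category."""
    return {
        "tools": counts["A"] - counts["B"],
        "skill_descriptions": counts["B"] - counts["C"],
        "base_request": counts["C"],
    }


parts = marginal_counts(counts)

# The three derived values must add back up to the full request.
assert sum(parts.values()) == counts["A"]
print(parts)
```

Because the differences telescope, the sum equals the full count by construction. The check guards against transcription errors rather than measurement errors.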

How to read the numbers

The overview is additive. Item estimates are not.

Controls

Each comparison keeps the model, system, messages, thinking, output configuration, beta headers, and other fields fixed.

Meaning

Anthropic describes the token count as an estimate, and a count may include tokens that Anthropic adds automatically for system optimizations. The differences therefore measure marginal request cost, not standalone JSON tokenization.

Item estimates

The 40 per-item calls compare items against one baseline. Use them to rank items; do not sum them.
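A minimal sketch of how per-item estimates might be used for ranking only. It assumes each per-item call counts the shared baseline plus exactly one item. That direction is an assumption, since the text does not say whether items are added to or removed from the baseline. `rank_items` and its parameters are hypothetical names.

```python
def rank_items(baseline_tokens, tokens_by_item):
    """Rank items by their estimated marginal tokens against one shared baseline.

    Assumes tokens_by_item[name] is the count for the baseline plus that one
    item. Returns (name, estimate) pairs, largest first. The estimates are for
    ordering only and are intentionally never summed.
    """
    estimates = {
        name: tokens - baseline_tokens
        for name, tokens in tokens_by_item.items()
    }
    return sorted(estimates.items(), key=lambda pair: pair[1], reverse=True)
```

The function returns an ordering and no total, which mirrors the guidance above: every estimate is measured against the same baseline, so the estimates are comparable with one another but are not additive parts of the overview.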

- Interpretation constraints: Primary sources

Reproduce the investigation

One entrypoint rebuilds the evidence chain: it captures the request, runs the counts, and rebuilds the publication.

- Replay entrypoint: Pipeline entrypoint