How the measurement was made

Capture one real SDK request, count three valid variants, then subtract adjacent results.

  1. Capture the request

    Run the Agent SDK's query() against a loopback endpoint and record the outgoing Messages request before inference. The capture uses isolated settings and no claude_code preset.

  2. Count three valid requests

    Because count_tokens accepts complete Messages requests, each valid variant removes one category from the same captured request while holding everything else fixed.

    1. A Full captured request All captured countable context 31,432 tokens
    2. B Without tool definitions The same request with tools removed 1,787 tokens
    3. C Without tools or skill descriptions The same request with both removed 301 tokens
  3. Subtract adjacent counts

    Subtract adjacent results to isolate the marginal count of each removed category. The three derived values reconstruct the full request.

    Tools included
    31,432 - 1,787
    29,645 tokens
    Skill descriptions
    1,787 - 301
    1,486 tokens
    Base request
    301 remaining
    301 tokens

    Removing the user prompt returned 31,428 tokens, a 4-token difference. This supports the headline but is outside the additive accounting.

How to read the numbers

The overview is additive. Item estimates are not.

Controls

Each comparison keeps the model, system, messages, thinking, output configuration, beta headers, and other fields fixed.

Meaning

Anthropic calls token counting an estimate and may add optimization tokens. The differences measure marginal request cost, not standalone JSON tokenization.

Item estimates

The 40 per-item calls compare items against one baseline. Use them to rank items; do not sum them.

Reproduce the investigation

One entrypoint rebuilds the evidence chain.

The entrypoint captures, counts, and rebuilds the publication.