LLM Token Counter for GPT, Claude, Llama & Gemini

Count tokens for ChatGPT, Claude, Llama, Gemini & Mistral instantly

Use this online LLM token counter to see exactly how many tokens your prompt becomes for any major language model. Counts are exact for OpenAI models (cl100k_base for GPT-4 / GPT-3.5, o200k_base for GPT-4o, GPT-4.1, o1, and o3, via tiktoken) and estimated for Claude, Llama 3, Gemini, and Mistral. Everything runs in your browser; nothing is uploaded.


Tokenizer Reference

How each model is counted

Model              | Method | Vocab / Ratio    | Notes
GPT-4o / GPT-4.1   | Exact  | o200k_base       | 200k-token vocab, better on non-English text
GPT-4 / GPT-3.5    | Exact  | cl100k_base      | 100k-token vocab, original ChatGPT tokenizer
o1 / o3            | Exact  | o200k_base       | Same vocabulary as GPT-4o
Claude 3 / 3.5 / 4 | Approx | ~3.5 chars/token | No public tokenizer; ratio per Anthropic docs
Llama 3            | Approx | ~3.8 chars/token | SentencePiece, 128k vocab
Gemini 1.5 / 2     | Approx | ~4.0 chars/token | SentencePiece BPE, ~256k vocab
Mistral / Mixtral  | Approx | ~3.7 chars/token | SentencePiece, derived from Llama 2

Understanding LLM Tokens

What is a token?

A token is the unit of text a language model actually sees. Modern LLMs do not read characters or words; they read sequences of tokens drawn from a fixed vocabulary, typically built with byte-pair encoding (BPE) or SentencePiece.

A token can be a whole word ("the"), a fragment ("token", "ization"), a punctuation mark, or even a single byte. The same string is broken into different tokens depending on the model.

Why token counts matter

  • Context window: every model has a hard limit (e.g. 128k tokens for GPT-4o, 200k for Claude). Exceeding it truncates the request or errors out; a budget-check sketch follows this list.
  • Cost: APIs bill per million input and output tokens. A 10k-token prompt costs roughly 10x a 1k-token prompt.
  • Latency: longer inputs and outputs take longer. Counting tokens up front lets you predict response time.
  • Streaming: output token count tells you how big a response can be before you hit a stop condition.
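
The context-window check is just addition and comparison. Here is a minimal TypeScript sketch; the window sizes are the commonly cited figures for these models, but treat them as assumptions and verify against current docs:

```typescript
// Sketch: will prompt + requested output fit the model's context window?
// Window sizes here are illustrative assumptions; confirm current limits.
const CONTEXT_WINDOWS: Record<string, number> = {
  "gpt-4o": 128_000,
  "claude-3-5-sonnet": 200_000,
};

function fitsInContext(
  model: string,
  inputTokens: number,
  maxOutputTokens: number,
): boolean {
  const window = CONTEXT_WINDOWS[model];
  if (window === undefined) throw new Error(`Unknown model: ${model}`);
  return inputTokens + maxOutputTokens <= window;
}

// A 120k-token prompt leaves no room for a 16k-token reply on GPT-4o.
console.log(fitsInContext("gpt-4o", 120_000, 16_000)); // false
```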

Exact vs. approximate counts

OpenAI ships tiktoken as open source, so we can run the real tokenizer in your browser using a JavaScript port. The number you see for GPT-4o, GPT-4, GPT-3.5, o1, and o3 matches what the OpenAI API will charge you for.
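
In code, that looks roughly like the sketch below, assuming the pure-JS js-tiktoken package (one JavaScript port of tiktoken; the exact import style depends on your bundler):

```typescript
import { getEncoding } from "js-tiktoken";

// o200k_base covers GPT-4o / GPT-4.1 / o1 / o3;
// use "cl100k_base" instead for GPT-4 / GPT-3.5.
const enc = getEncoding("o200k_base");

const tokens = enc.encode("Tokenization splits text into subword units.");
console.log(tokens.length); // the count the OpenAI API bills for this string
```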

Anthropic, Google, Meta, and Mistral do not ship browser-friendly tokenizers. For those models we use each provider's published average chars-per-token ratio. The estimate is typically within 5–10% of the real count, which is fine for budgeting and prompt design.
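
The estimate itself is nothing more than character count divided by the ratio. A minimal sketch using the ratios from the reference table above:

```typescript
// Approximate chars-per-token ratios from the reference table above.
// These are estimates for budgeting, not billing-exact numbers.
const CHARS_PER_TOKEN = {
  claude: 3.5,
  llama3: 3.8,
  gemini: 4.0,
  mistral: 3.7,
} as const;

function estimateTokens(
  text: string,
  family: keyof typeof CHARS_PER_TOKEN,
): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN[family]);
}

console.log(estimateTokens("How many tokens is this sentence?", "claude")); // 10
```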

If you need a perfectly exact count for Claude, hit Anthropic's messages/count_tokens endpoint. For Llama / Mistral, run the model's tokenizer.json through Hugging Face's tokenizers library locally.
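
For the Claude route, here is a server-side sketch of that endpoint call. The request shape follows Anthropic's documented count_tokens API; the model name is an example, and the API key must stay server-side:

```typescript
// Sketch: exact Claude token count via Anthropic's count_tokens endpoint.
async function countClaudeTokens(text: string, apiKey: string): Promise<number> {
  const res = await fetch("https://api.anthropic.com/v1/messages/count_tokens", {
    method: "POST",
    headers: {
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-3-5-sonnet-latest", // example model name
      messages: [{ role: "user", content: text }],
    }),
  });
  if (!res.ok) throw new Error(`count_tokens failed: ${res.status}`);
  const data = await res.json();
  return data.input_tokens; // per Anthropic's documented response shape
}
```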

Tips for shorter prompts

  • Compress instructions: "Reply concisely" beats a 5-sentence preamble.
  • Strip code comments and trailing whitespace before sending source (a sketch follows this list).
  • Use shorthand: JSON Lines or compact YAML beats prose for structured data.
  • Cache system prompts via the model's prompt-caching feature if available.
  • Switch tokenizers: non-English text is often half the tokens on GPT-4o (o200k_base) vs GPT-4 (cl100k_base).
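
The whitespace tip is easy to automate. A minimal sketch that handles only the safe cases (trailing whitespace and runs of blank lines; stripping comments reliably needs a real parser for your source language):

```typescript
// Sketch: cheap prompt shrinking before sending source code.
// Trailing whitespace and stacked blank lines are pure token waste.
function shrinkSource(code: string): string {
  return code
    .split("\n")
    .map((line) => line.replace(/\s+$/, "")) // drop trailing whitespace
    .join("\n")
    .replace(/\n{3,}/g, "\n\n"); // collapse runs of blank lines to one
}
```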

LLM Token Counter FAQ

What is an LLM token?

A token is the smallest unit of text an LLM processes. It can be a whole word, part of a word, a punctuation mark, or even a single character. On average, 1 token ≈ 4 characters or ¾ of an English word, but the exact split depends on the model's tokenizer.

How accurate is this token counter?

For OpenAI models the count is exact: the tool runs the real tiktoken vocabulary in your browser. For Claude, Llama, Gemini, and Mistral, we estimate using each provider's published average chars-per-token ratio. Estimates are typically within 5–10% of the true count.

Why does the same text produce different counts on different models?

Each model has its own tokenizer trained on its own vocabulary. GPT-4o's o200k_base is much more efficient on non-English text than GPT-4's cl100k_base. The same paragraph can easily differ by 20–40% across models.

Does this tool send my prompt to a server?

No. Everything runs in your browser. Your prompt never leaves your device, which makes this tool safe for confidential or sensitive text.

How do I estimate API cost?

Multiply input tokens by the model's per-million input price, and expected output tokens by the per-million output price. Example: GPT-4o at $2.50 per 1M input tokens means a 1,000-token prompt costs $0.0025.
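
The same arithmetic as code, covering both sides of the bill (prices are illustrative; read the provider's current pricing page):

```typescript
// Sketch: cost in USD = tokens / 1,000,000 * per-million price.
function estimateCostUSD(
  inputTokens: number,
  outputTokens: number,
  inputPricePerM: number,
  outputPricePerM: number,
): number {
  return (inputTokens / 1e6) * inputPricePerM + (outputTokens / 1e6) * outputPricePerM;
}

// 1,000 input tokens at $2.50/M plus 500 output tokens at $10/M:
console.log(estimateCostUSD(1_000, 500, 2.5, 10).toFixed(4)); // "0.0075"
```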

What is tiktoken?

Tiktoken is OpenAI's open-source BPE tokenizer. cl100k_base powers GPT-3.5 and GPT-4; o200k_base powers GPT-4o and the o1 / o3 reasoning models. This tool ships a JavaScript port so you get the same numbers as the OpenAI API.

Does Anthropic publish a Claude tokenizer?

No downloadable tokenizer is published for Claude 3 / 3.5 / Sonnet 4. For exact counts, use Anthropic's messages/count_tokens endpoint. For browser-based estimation, Anthropic recommends ~3.5 characters per token.

Why are non-English prompts more expensive?

Most tokenizer vocabularies are English-heavy. Non-Latin scripts and accented characters often split into more tokens. o200k_base partially fixes this for major languages, but it is worth checking before sending non-English prompts at scale.
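
You can check the effect yourself with the two OpenAI vocabularies, since both run in the browser. A sketch with js-tiktoken (the exact counts depend on the text; the gap is usually largest for non-Latin scripts):

```typescript
import { getEncoding } from "js-tiktoken";

// Same string, two vocabularies: o200k_base typically needs fewer
// tokens for non-English text than cl100k_base.
const text = "Počítání tokenů je před odesláním promptu užitečné.";
const cl100k = getEncoding("cl100k_base").encode(text).length;
const o200k = getEncoding("o200k_base").encode(text).length;

console.log({ cl100k, o200k }); // expect o200k to be noticeably lower
```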