LLM Token Counter for GPT, Claude, Llama & Gemini

Count tokens for ChatGPT, Claude, Llama, Gemini & Mistral instantly

Use this online LLM token counter to see exactly how many tokens your prompt becomes for any major language model. Counts are exact for OpenAI models (cl100k_base for GPT-4 / GPT-3.5, o200k_base for GPT-4o, GPT-4.1, o1, and o3, via tiktoken) and estimated for Claude, Llama 3, Gemini, and Mistral. Everything runs in your browser; nothing is uploaded.


Tokenizer Reference

How each model is counted

Model              | Method | Vocab / Ratio    | Notes
GPT-4o / GPT-4.1   | Exact  | o200k_base       | 200k-token vocab, better on non-English text
GPT-4 / GPT-3.5    | Exact  | cl100k_base      | 100k-token vocab, original ChatGPT tokenizer
o1 / o3            | Exact  | o200k_base       | Same vocabulary as GPT-4o
Claude 3 / 3.5 / 4 | Approx | ~3.5 chars/token | No public tokenizer; ratio per Anthropic docs
Llama 3            | Approx | ~3.8 chars/token | SentencePiece, 128k vocab
Gemini 1.5 / 2     | Approx | ~4.0 chars/token | SentencePiece BPE, ~256k vocab
Mistral / Mixtral  | Approx | ~3.7 chars/token | SentencePiece, derived from Llama 2

Understanding LLM Tokens

What is a token?

A token is the unit of text a language model actually sees. Modern LLMs do not read characters or words; they read sequences of tokens drawn from a fixed vocabulary, typically built with byte-pair encoding (BPE) or SentencePiece.

A token can be a whole word ("the"), a fragment ("token", "ization"), a punctuation mark, or even a single byte. The same string is broken into different tokens depending on the model.

Why token counts matter

  • Context window: every model has a hard limit (e.g. 128k tokens for GPT-4o, 200k for Claude). Exceeding it truncates the request or errors out; a budget-check sketch follows this list.
  • Cost: APIs bill per million input and output tokens. A 10k-token prompt costs roughly 10x a 1k-token prompt.
  • Latency: longer inputs and outputs take longer. Counting tokens up front lets you predict response time.
  • Streaming: output token count tells you how big a response can be before you hit a stop condition.
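
The context-window check is just addition and comparison. Here is a minimal TypeScript sketch; the window sizes are the commonly cited figures for these models, but treat them as assumptions and verify against current docs:

```typescript
// Sketch: will prompt + requested output fit the model's context window?
// Window sizes here are illustrative assumptions; confirm current limits.
const CONTEXT_WINDOWS: Record<string, number> = {
  "gpt-4o": 128_000,
  "claude-3-5-sonnet": 200_000,
};

function fitsInContext(
  model: string,
  inputTokens: number,
  maxOutputTokens: number,
): boolean {
  const window = CONTEXT_WINDOWS[model];
  if (window === undefined) throw new Error(`Unknown model: ${model}`);
  return inputTokens + maxOutputTokens <= window;
}

// A 120k-token prompt leaves no room for a 16k-token reply on GPT-4o.
console.log(fitsInContext("gpt-4o", 120_000, 16_000)); // false
```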

Exact vs. approximate counts

OpenAI ships tiktoken as open source, so we can run the real tokenizer in your browser using a JavaScript port. The number you see for GPT-4o, GPT-4, GPT-3.5, o1, and o3 matches what the OpenAI API will charge you for.
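
In code, that looks roughly like the sketch below, assuming the pure-JS js-tiktoken package (one JavaScript port of tiktoken; the exact import style depends on your bundler):

```typescript
import { getEncoding } from "js-tiktoken";

// o200k_base covers GPT-4o / GPT-4.1 / o1 / o3;
// use "cl100k_base" instead for GPT-4 / GPT-3.5.
const enc = getEncoding("o200k_base");

const tokens = enc.encode("Tokenization splits text into subword units.");
console.log(tokens.length); // the count the OpenAI API bills for this string
```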

Anthropic, Google, Meta, and Mistral do not ship browser-friendly tokenizers. For those models we use each provider's published average chars-per-token ratio. The estimate is typically within 5–10% of the real count, which is fine for budgeting and prompt design.
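
The estimate itself is nothing more than character count divided by the ratio. A minimal sketch using the ratios from the reference table above:

```typescript
// Approximate chars-per-token ratios from the reference table above.
// These are estimates for budgeting, not billing-exact numbers.
const CHARS_PER_TOKEN = {
  claude: 3.5,
  llama3: 3.8,
  gemini: 4.0,
  mistral: 3.7,
} as const;

function estimateTokens(
  text: string,
  family: keyof typeof CHARS_PER_TOKEN,
): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN[family]);
}

console.log(estimateTokens("How many tokens is this sentence?", "claude")); // 10
```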

If you need a perfectly exact count for Claude, hit Anthropic's messages/count_tokens endpoint. For Llama / Mistral, run the model's tokenizer.json through Hugging Face's tokenizers library locally.
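
For the Claude route, here is a server-side sketch of that endpoint call. The request shape follows Anthropic's documented count_tokens API; the model name is an example, and the API key must stay server-side:

```typescript
// Sketch: exact Claude token count via Anthropic's count_tokens endpoint.
async function countClaudeTokens(text: string, apiKey: string): Promise<number> {
  const res = await fetch("https://api.anthropic.com/v1/messages/count_tokens", {
    method: "POST",
    headers: {
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-3-5-sonnet-latest", // example model name
      messages: [{ role: "user", content: text }],
    }),
  });
  if (!res.ok) throw new Error(`count_tokens failed: ${res.status}`);
  const data = await res.json();
  return data.input_tokens; // per Anthropic's documented response shape
}
```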

Tips for shorter prompts

  • Compress instructions: "Reply concisely" beats a 5-sentence preamble.
  • Strip code comments and trailing whitespace before sending source (a sketch follows this list).
  • Use shorthand: JSON Lines or compact YAML beats prose for structured data.
  • Cache system prompts via the model's prompt-caching feature if available.
  • Switch tokenizers: non-English text is often half the tokens on GPT-4o (o200k_base) vs GPT-4 (cl100k_base).
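
The whitespace tip is easy to automate. A minimal sketch that handles only the safe cases (trailing whitespace and runs of blank lines; stripping comments reliably needs a real parser for your source language):

```typescript
// Sketch: cheap prompt shrinking before sending source code.
// Trailing whitespace and stacked blank lines are pure token waste.
function shrinkSource(code: string): string {
  return code
    .split("\n")
    .map((line) => line.replace(/\s+$/, "")) // drop trailing whitespace
    .join("\n")
    .replace(/\n{3,}/g, "\n\n"); // collapse runs of blank lines to one
}
```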

LLM Token Counter FAQ

What is an LLM token?

A token is the smallest unit of text an LLM processes. It can be a whole word, part of a word, a punctuation mark, or even a single character. On average, 1 token ≈ 4 characters or ¾ of an English word, but the exact split depends on the model's tokenizer.

How accurate is this token counter?

For OpenAI models the count is exact: the tool runs the real tiktoken vocabulary in your browser. For Claude, Llama, Gemini, and Mistral, we estimate using each provider's published average chars-per-token ratio. Estimates are typically within 5–10% of the true count.

Why does the same text produce different counts on different models?

Each model has its own tokenizer trained on its own vocabulary. GPT-4o's o200k_base is much more efficient on non-English text than GPT-4's cl100k_base. The same paragraph can easily differ by 20–40% across models.

Does this tool send my prompt to a server?

No. Everything runs in your browser. Your prompt never leaves your device, which makes this tool safe for confidential or sensitive text.

How do I estimate API cost?

Multiply input tokens by the model's per-million input price, and expected output tokens by the per-million output price. Example: GPT-4o at $2.50 per 1M input tokens means a 1,000-token prompt costs $0.0025.
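
The same arithmetic as code, covering both sides of the bill (prices are illustrative; read the provider's current pricing page):

```typescript
// Sketch: cost in USD = tokens / 1,000,000 * per-million price.
function estimateCostUSD(
  inputTokens: number,
  outputTokens: number,
  inputPricePerM: number,
  outputPricePerM: number,
): number {
  return (inputTokens / 1e6) * inputPricePerM + (outputTokens / 1e6) * outputPricePerM;
}

// 1,000 input tokens at $2.50/M plus 500 output tokens at $10/M:
console.log(estimateCostUSD(1_000, 500, 2.5, 10).toFixed(4)); // "0.0075"
```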

What is tiktoken?

Tiktoken is OpenAI's open-source BPE tokenizer. cl100k_base powers GPT-3.5 and GPT-4; o200k_base powers GPT-4o and the o1 / o3 reasoning models. This tool ships a JavaScript port so you get the same numbers as the OpenAI API.

Does Anthropic publish a Claude tokenizer?

No downloadable tokenizer is published for Claude 3 / 3.5 / Sonnet 4. For exact counts, use Anthropic's messages/count_tokens endpoint. For browser-based estimation, Anthropic recommends ~3.5 characters per token.

Why are non-English prompts more expensive?

Most tokenizer vocabularies are English-heavy. Non-Latin scripts and accented characters often split into more tokens. o200k_base partially fixes this for major languages, but it is worth checking before sending non-English prompts at scale.
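
You can check the effect yourself with the two OpenAI vocabularies, since both run in the browser. A sketch with js-tiktoken (the exact counts depend on the text; the gap is usually largest for non-Latin scripts):

```typescript
import { getEncoding } from "js-tiktoken";

// Same string, two vocabularies: o200k_base typically needs fewer
// tokens for non-English text than cl100k_base.
const text = "Počítání tokenů je před odesláním promptu užitečné.";
const cl100k = getEncoding("cl100k_base").encode(text).length;
const o200k = getEncoding("o200k_base").encode(text).length;

console.log({ cl100k, o200k }); // expect o200k to be noticeably lower
```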