💰

LLM Cost Calculator

Compare token costs across cloud APIs and local GPU inference. See when it pays to own the hardware.

Provider / Model              Input ($/1M tokens)  Output ($/1M tokens)  Monthly cost  Context  Note
OpenAI GPT-4o                 $2.50                $10.00                -             128K     Best general-purpose API
OpenAI GPT-4o mini            $0.15                $0.60                 -             128K     Cheapest OpenAI model
Anthropic Claude 3.5 Sonnet   $3.00                $15.00                -             200K     Best coding / reasoning
Anthropic Claude 3 Haiku      $0.25                $1.25                 -             200K     Fast Anthropic option
Google Gemini 1.5 Flash       $0.07                $0.30                 -             1M       Lowest API cost, huge context
Google Gemini 1.5 Pro         $1.25                $5.00                 -             2M       Largest context window
Groq Llama 3 70B              $0.59                $0.79                 -             8K       Fastest API inference
Groq Mixtral 8x7B             $0.27                $0.27                 -             32K      Fast and cheap
Local RTX 4090 (Llama 3 8B)   $0.00                $0.00                 -             8K       Hardware + electricity only
Local RTX 4090 (Llama 3 70B)  $0.00                $0.00                 -             8K       Requires quantization
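
The monthly cost for an API row follows directly from its two per-token prices. A minimal sketch of that arithmetic, assuming a hypothetical workload of 50M input and 10M output tokens per month (the function name and the volumes are illustrative, not taken from the calculator itself):

```python
def monthly_api_cost(input_tokens: float, output_tokens: float,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the monthly API cost in USD.

    Prices are in USD per 1M tokens, as in the table above.
    """
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


# Hypothetical monthly volume (an assumption, not a figure from this page).
INPUT_TOKENS = 50_000_000
OUTPUT_TOKENS = 10_000_000

# Prices copied from the GPT-4o and GPT-4o mini rows of the table.
gpt4o = monthly_api_cost(INPUT_TOKENS, OUTPUT_TOKENS, 2.50, 10.00)
gpt4o_mini = monthly_api_cost(INPUT_TOKENS, OUTPUT_TOKENS, 0.15, 0.60)

print(f"GPT-4o:      ${gpt4o:.2f}/month")
print(f"GPT-4o mini: ${gpt4o_mini:.2f}/month")
```

Output tokens usually cost several times more than input tokens, so the input/output split of a workload matters as much as its total volume.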

When does local win?

At your selected volume, the calculator shows the approximate monthly cost of the cloud APIs.

It also estimates how many months a local RTX 4090 setup takes to break even, assuming electricity costs $0.12/kWh and the GPU is in use 40% of the time.
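
One way to sketch the break-even estimate is to divide the upfront hardware cost by the monthly saving over the cloud bill. The $0.12/kWh rate and 40% utilization come from the text above; everything else here is an assumption for illustration: the $2,000 hardware cost and $225 cloud bill are hypothetical, the GPU is assumed to draw its rated 450 W while in use, a month is taken as 730 hours, and idle draw, the rest of the system, and maintenance are ignored.

```python
from typing import Optional

HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def monthly_electricity_cost(power_watts: float, utilization: float,
                             price_per_kwh: float) -> float:
    """Electricity cost in USD per month for a GPU that is busy
    `utilization` (0..1) of the time and draws `power_watts` while busy."""
    kwh = (power_watts / 1000) * HOURS_PER_MONTH * utilization
    return kwh * price_per_kwh


def break_even_months(hardware_cost: float, cloud_monthly_cost: float,
                      power_watts: float = 450.0,   # assumption: rated GPU power
                      utilization: float = 0.40,    # from the text above
                      price_per_kwh: float = 0.12   # from the text above
                      ) -> Optional[float]:
    """Months until the hardware pays for itself, or None if running
    locally never becomes cheaper than the cloud bill."""
    electricity = monthly_electricity_cost(power_watts, utilization, price_per_kwh)
    monthly_saving = cloud_monthly_cost - electricity
    if monthly_saving <= 0:
        return None
    return hardware_cost / monthly_saving


# Hypothetical inputs: $2,000 of hardware against a $225/month cloud bill.
months = break_even_months(hardware_cost=2000.0, cloud_monthly_cost=225.0)
print("never" if months is None else f"{months:.1f} months")
```

Returning None when the saving is zero or negative keeps low-volume cases explicit: if the cloud bill is smaller than the electricity cost alone, the hardware never pays for itself.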

Local inference avoids per-token pricing but requires upfront hardware, maintenance, and uptime. Use this calculator to find your crossover point.
