LLM Cost Calculator
Compare token costs across cloud APIs and local GPU inference. See when it pays to own the hardware.
| Provider / Model | Input ($/1M tokens) | Output ($/1M tokens) | Monthly cost | Context (tokens) | Note |
|---|---|---|---|---|---|
| OpenAI GPT-4o | $2.50 | $10.00 | — | 128K | Best general-purpose API |
| OpenAI GPT-4o mini | $0.15 | $0.60 | — | 128K | Cheapest OpenAI model |
| Anthropic Claude 3.5 Sonnet | $3.00 | $15.00 | — | 200K | Best coding / reasoning |
| Anthropic Claude 3 Haiku | $0.25 | $1.25 | — | 200K | Fast Anthropic option |
| Google Gemini 1.5 Flash | $0.07 | $0.30 | — | 1000K | Lowest API cost, huge context |
| Google Gemini 1.5 Pro | $1.25 | $5.00 | — | 2000K | Largest context window |
| Groq Llama 3 70B | $0.59 | $0.79 | — | 8K | Fastest API inference |
| Groq Mixtral 8x7B | $0.27 | $0.27 | — | 32K | Fast and cheap |
| Local RTX 4090 (Llama 3 8B) | $0.00 | $0.00 | — | 8K | Hardware + electricity only |
| Local RTX 4090 (Llama 3 70B) | $0.00 | $0.00 | — | 8K | Requires quantization |
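
The Monthly cost column depends on the token volume you select. As a rough sketch of how such a figure could be derived from the per-million prices in the table, the snippet below multiplies monthly input and output volumes by the listed rates. The function name, the dictionary, and the example workload are illustrative assumptions, not the calculator's actual implementation.

```python
# Prices in USD per 1M tokens as (input, output), copied from the table above.
PRICES_PER_MILLION = {
    "OpenAI GPT-4o": (2.50, 10.00),
    "OpenAI GPT-4o mini": (0.15, 0.60),
    "Anthropic Claude 3.5 Sonnet": (3.00, 15.00),
    "Google Gemini 1.5 Flash": (0.07, 0.30),
}


def monthly_api_cost(model, input_tokens, output_tokens):
    """Return the monthly API bill in USD for the given token volumes."""
    input_price, output_price = PRICES_PER_MILLION[model]
    input_cost = (input_tokens / 1_000_000) * input_price
    output_cost = (output_tokens / 1_000_000) * output_price
    return input_cost + output_cost


# Hypothetical workload: 100M input tokens and 20M output tokens per month.
cost = monthly_api_cost("OpenAI GPT-4o", 100_000_000, 20_000_000)
print(f"${cost:,.2f}/month")  # prints "$450.00/month"
```

Output tokens are priced several times higher than input tokens for most models in the table, so the input/output split of a workload matters as much as its total volume.
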
When does local win?
At your selected volume, cloud APIs cost roughly —/month.
A local RTX 4090 setup breaks even in about — months if electricity is $0.12/kWh and the GPU is used 40% of the time.
Local inference avoids per-token pricing, but it requires an upfront hardware purchase, ongoing maintenance, and responsibility for your own uptime. Use this calculator to find your crossover point.
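
The crossover point can be approximated by dividing the hardware cost by the monthly savings (API bill minus electricity). The sketch below uses the $0.12/kWh rate and 40% utilization stated above. The 450 W power draw (the RTX 4090's rated board power), the 730 hours per month, and the $2,000 hardware price in the example are assumptions added for illustration, and all function names are hypothetical. It ignores maintenance, the rest of the system's power draw, and whether one GPU can actually serve the volume.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12


def monthly_electricity_cost(gpu_watts, utilization, usd_per_kwh):
    """Return the monthly electricity cost in USD for one GPU."""
    kwh = (gpu_watts / 1000) * utilization * HOURS_PER_MONTH
    return kwh * usd_per_kwh


def break_even_months(hardware_cost_usd, monthly_api_cost_usd,
                      gpu_watts=450.0, utilization=0.40, usd_per_kwh=0.12):
    """Return months until hardware pays for itself, or None if it never does."""
    electricity = monthly_electricity_cost(gpu_watts, utilization, usd_per_kwh)
    savings = monthly_api_cost_usd - electricity
    if savings <= 0:
        # Electricity alone costs at least as much as the API bill.
        return None
    return hardware_cost_usd / savings


# Hypothetical inputs: a $2,000 GPU replacing a $450/month API bill.
months = break_even_months(hardware_cost_usd=2000.0, monthly_api_cost_usd=450.0)
print(f"Break-even in about {months:.1f} months")  # prints "Break-even in about 4.6 months"
```

Returning `None` when the savings are zero or negative makes the "local never wins" case explicit instead of producing a negative or infinite month count.
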