Is it cheaper to run LLMs locally or use an API?

At low volume, APIs are cheaper because there is no upfront hardware cost. Above roughly 10-50 million tokens per month, a local GPU such as an RTX 4090 can undercut API pricing, especially for smaller models.

Which LLM API has the lowest cost per 1M tokens?

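Among the models compared here, Google Gemini 1.5 Flash has the lowest list price at $0.075 per 1M input tokens ($0.30 per 1M output tokens). OpenAI GPT-4o mini is next on input price at $0.15 per 1M, while Groq Mixtral 8x7B charges a flat $0.27 per 1M tokens for both input and output.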

LLM Cost Calculator

Compare token costs across cloud APIs and local GPU inference. See when it pays to own the hardware.

Input tokens per call

Output tokens per call

Calls per month

Provider / Model	Input ($/1M tokens)	Output ($/1M tokens)	Monthly cost	Context	Note
OpenAI GPT-4o	$2.50	$10.00	—	128K	Best general-purpose API
OpenAI GPT-4o mini	$0.15	$0.60	—	128K	Cheapest OpenAI model
Anthropic Claude 3.5 Sonnet	$3.00	$15.00	—	200K	Best coding / reasoning
Anthropic Claude 3 Haiku	$0.25	$1.25	—	200K	Fast Anthropic option
Google Gemini 1.5 Flash	$0.075	$0.30	—	1M	Lowest API cost, huge context
Google Gemini 1.5 Pro	$1.25	$5.00	—	2M	Largest context window
Groq Llama 3 70B	$0.59	$0.79	—	8K	Fastest API inference
Groq Mixtral 8x7B	$0.27	$0.27	—	32K	Fast and cheap
Local RTX 4090 (Llama 3 8B)	$0.00	$0.00	—	8K	Hardware + electricity only
Local RTX 4090 (Llama 3 70B)	$0.00	$0.00	—	8K	Requires quantization
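
The Monthly cost column is derived from the three inputs above and each row's per-token prices. A minimal sketch of that arithmetic follows, assuming cost scales linearly with tokens and ignoring free tiers, caching discounts, and batch pricing. The names `Pricing`, `monthly_cost`, and `PRICES` are illustrative, and the workload figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


def monthly_cost(pricing: Pricing,
                 input_tokens_per_call: int,
                 output_tokens_per_call: int,
                 calls_per_month: int) -> float:
    """Return the monthly API bill in USD for a fixed per-call token profile."""
    input_tokens = input_tokens_per_call * calls_per_month
    output_tokens = output_tokens_per_call * calls_per_month
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


# A few rows from the table above (list prices as quoted on this page).
PRICES = {
    "OpenAI GPT-4o": Pricing(2.50, 10.00),
    "OpenAI GPT-4o mini": Pricing(0.15, 0.60),
    "Google Gemini 1.5 Flash": Pricing(0.075, 0.30),
    "Groq Mixtral 8x7B": Pricing(0.27, 0.27),
}

if __name__ == "__main__":
    # Hypothetical workload: 1,000 input and 500 output tokens per call,
    # 100,000 calls per month.
    for name, pricing in PRICES.items():
        cost = monthly_cost(pricing, 1_000, 500, 100_000)
        print(f"{name}: ${cost:,.2f}/month")
```

Because output tokens are usually priced higher than input tokens, the ranking can change with the input/output mix: at this hypothetical workload, the flat-priced Groq Mixtral 8x7B comes out slightly cheaper than GPT-4o mini even though its input price is higher.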

When does local win?

At your selected volume, cloud APIs cost roughly —/month.

A local RTX 4090 setup breaks even in about — months if electricity is $0.12/kWh and the GPU is used 40% of the time.

Local inference avoids per-token pricing but requires upfront hardware, maintenance, and uptime. Use this calculator to find your crossover point.
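
One way to estimate the crossover point is to divide the hardware price by the monthly savings, where savings are the avoided API bill minus the electricity bill. The sketch below uses the $0.12/kWh rate and 40% utilization stated above; the $2,000 hardware price, 450 W power draw, and 730 hours per month are assumptions for illustration only, and the function names are hypothetical. It does not check whether a single GPU can actually serve the selected volume, and it leaves out maintenance and downtime.

```python
from typing import Optional

HOURS_PER_MONTH = 730.0  # assumption: average hours in a month


def electricity_cost_per_month(power_watts: float,
                               utilization: float,
                               price_per_kwh: float) -> float:
    """Monthly electricity cost in USD for a GPU busy `utilization` of the time."""
    kwh = power_watts / 1000.0 * HOURS_PER_MONTH * utilization
    return kwh * price_per_kwh


def break_even_months(hardware_cost: float,
                      api_cost_per_month: float,
                      power_watts: float,
                      utilization: float = 0.40,
                      price_per_kwh: float = 0.12) -> Optional[float]:
    """Months until the hardware pays for itself, or None if it never does."""
    power_bill = electricity_cost_per_month(power_watts, utilization, price_per_kwh)
    savings = api_cost_per_month - power_bill
    if savings <= 0:
        return None  # the API is cheaper than the electricity alone
    return hardware_cost / savings


if __name__ == "__main__":
    # Assumed figures: $2,000 for the GPU setup, 450 W draw under load,
    # and a $750/month API bill being replaced.
    months = break_even_months(hardware_cost=2_000,
                               api_cost_per_month=750,
                               power_watts=450)
    if months is None:
        print("Local hardware never breaks even at this volume.")
    else:
        print(f"Break-even after about {months:.1f} months")
```

Returning `None` when savings are zero or negative keeps the low-volume case explicit: if the API bill is smaller than the power bill, owning the hardware never pays off.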
