AI Agent API Pricing Comparator
Compare LLM API costs for your agent workload. Pick providers, set your token volume, and see every option ranked by monthly cost, cheapest first.
Workload
Prices per 1M tokens. Last updated: 2026-06-29. See notes.
| Provider / Model | Input | Cached input | Output | Monthly cost | Best for |
|---|---|---|---|---|---|
| OpenAI GPT-4o `gpt-4o` | $2.50 | $1.250 | $10.00 | — | General agent tasks, vision, tool use |
| OpenAI GPT-4o-mini `gpt-4o-mini` | $0.15 | $0.075 | $0.60 | — | Cheap high-volume agents |
| Anthropic Claude 3.5 Sonnet `claude-3-5-sonnet-20240620` | $3.00 | $0.300 | $15.00 | — | Long context, reasoning, coding |
| Anthropic Claude 3 Haiku `claude-3-haiku-20240307` | $0.25 | $0.030 | $1.25 | — | Fast, cheap agent calls |
| Google Gemini 1.5 Pro `gemini-1.5-pro` | $3.50 | — | $10.50 | — | Massive context, multimodal |
| Google Gemini 1.5 Flash `gemini-1.5-flash` | $0.35 | — | $1.05 | — | Low-latency, high-volume |
| Groq Llama 3.1 70B `llama-3.1-70b-versatile` | $0.59 | — | $0.79 | — | Fast open-weight inference |
| Together AI Llama 3.1 70B `llama-3.1-70b` | $0.90 | — | $0.90 | — | Open-weight fine-tuning hub |
| Cohere Command R+ `command-r-plus` | $2.50 | — | $10.00 | — | RAG and enterprise search |
Cheapest option
Enter a workload to see the cheapest provider.
Frequently asked questions
Which LLM API is cheapest for high-volume agents?
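For most high-volume text agents, GPT-4o-mini ($0.15 input / $0.60 output), Claude 3 Haiku ($0.25 / $1.25), and Gemini 1.5 Flash ($0.35 / $1.05) have the lowest listed rates per 1M tokens. Groq Llama 3.1 70B ($0.59 / $0.79) is competitive for fast open-weight inference, and its $0.79 output rate undercuts Claude 3 Haiku and Gemini 1.5 Flash on output-heavy workloads. Because the ranking depends on your input-to-output ratio, enter your actual token counts in this calculator to find the cheapest provider for your workload.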
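To make the ranking concrete, here is a minimal Python sketch of how such a comparison could be computed. It is not this calculator's actual code: the formula (tokens divided by 1M, times the listed rate, summed over input and output, with no caching) is an assumption, and the names `PRICES`, `monthly_cost`, and `rank_by_monthly_cost` are hypothetical. The prices are copied from the table above.

```python
# Listed prices in USD per 1M tokens (input, output), copied from the table.
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3-haiku-20240307": (0.25, 1.25),
    "gemini-1.5-flash": (0.35, 1.05),
    "llama-3.1-70b-versatile": (0.59, 0.79),  # Groq
}


def monthly_cost(input_tokens, output_tokens, input_price, output_price):
    """Assumed formula: (tokens / 1M) * rate, summed over input and output."""
    return input_tokens / 1e6 * input_price + output_tokens / 1e6 * output_price


def rank_by_monthly_cost(input_tokens, output_tokens, prices=PRICES):
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = {
        model: monthly_cost(input_tokens, output_tokens, in_price, out_price)
        for model, (in_price, out_price) in prices.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


# Hypothetical workload: 50M input tokens and 10M output tokens per month.
ranked = rank_by_monthly_cost(50_000_000, 10_000_000)
for model, cost in ranked:
    print(f"{model}: ${cost:.2f}")
```

For this hypothetical workload the sketch ranks GPT-4o-mini first at $13.50, followed by Claude 3 Haiku at $25.00, Gemini 1.5 Flash at $28.00, and Groq Llama 3.1 70B at $37.40. A more output-heavy workload can change the order.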
How does cached input pricing work?
OpenAI and Anthropic discount input tokens that hit a context cache. Enter your typical cache-hit percentage and the calculator applies the cached-input rate to that portion while billing the rest at the standard input rate.
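The split described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the calculator's own implementation: the function name `monthly_input_cost` is hypothetical, and falling back to the standard input rate when a provider lists no cached rate is an assumption.

```python
def monthly_input_cost(input_tokens, cache_hit_pct, input_price, cached_price=None):
    """Blend standard and cached input rates (USD per 1M tokens).

    cache_hit_pct is a percentage from 0 to 100.
    """
    if not 0 <= cache_hit_pct <= 100:
        raise ValueError("cache_hit_pct must be between 0 and 100")
    if cached_price is None:
        # Assumption: no listed cached rate means every token bills at the standard rate.
        cached_price = input_price
    millions = input_tokens / 1e6
    cached_millions = millions * cache_hit_pct / 100
    uncached_millions = millions - cached_millions
    return uncached_millions * input_price + cached_millions * cached_price


# Hypothetical workload: 100M input tokens per month, 40% cache hits,
# at the GPT-4o rates listed in the table ($2.50 standard, $1.25 cached).
gpt4o_cost = monthly_input_cost(100_000_000, 40, 2.50, 1.25)
print(f"${gpt4o_cost:.2f}")
```

In this example 60M tokens bill at $2.50 and 40M at $1.25, so the input cost is $200.00 instead of $250.00 with no cache hits.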
What is not included in this estimate?
This estimate covers token and request fees only. It does not include image or audio tokens, fine-tuning, storage, bandwidth, vector database costs, or the overhead of fallbacks and retries. Use the LLM Cost Calculator and Agent Memory Cost Calculator to estimate the full total cost of ownership (TCO) of an agent.
How often are prices updated?
Prices are reviewed monthly. Last updated: 2026-06-29. Always confirm current rates on the provider's pricing page before committing to a budget.
Prices are per 1M tokens in USD. Check provider sites for current rates.