THE RIG — COMPARISON

Best Local LLM Hardware

Mini PCs vs. GPUs for running Ollama and local AI agents. Benchmarked on llama3.2 3B Q4_K_M.

Agent Score (0–100):

A practical rating for local-agent inference. 80–100 means desktop-class VRAM and fast CUDA/ROCm throughput. 40–60 means usable for light agents but not for large models or long contexts.

Product	Type	Price	Agent Score	Best For
Geekom A8	mini-pc	$899	52 tok/s	Entry to mid-range local inference
Beelink SER7 Pro	mini-pc	$649	42 tok/s	Entry to mid-range local inference
Intel NUC 13 Pro	mini-pc	$480	not listed	Entry to mid-range local inference
NVIDIA RTX 4090	gpu	$1599	95 tok/s	Large models / maximum speed
AMD Radeon RX 7900 XTX	gpu	$999	78 tok/s	Large models / maximum speed
AMD RX 7900 XT	gpu	$899	not listed	Large models / maximum speed
NVIDIA RTX 4080 SUPER	gpu	$999	not listed	Large models / maximum speed

FAQ

What is the best mini PC for running local LLMs?

The Beelink SER7 Pro is the best value for most users, offering ~42 tok/s on llama3.2 3B. For more headroom, the Geekom A8 with Ryzen 9 8945HS reaches ~52 tok/s.
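
The value claim can be checked with simple arithmetic: dollars paid per token per second of throughput. The sketch below uses only the prices and tok/s figures quoted on this page for the two mini PCs that have a listed throughput; the function name is illustrative, and the metric ignores RAM, storage, and power draw.

```python
def dollars_per_tok_s(price_usd: float, tok_per_s: float) -> float:
    """Cost of each token-per-second of throughput; lower means better value."""
    if tok_per_s <= 0:
        raise ValueError("throughput must be positive")
    return price_usd / tok_per_s

# Figures quoted on this page (llama3.2 3B Q4_K_M): name -> (price in USD, tok/s).
mini_pcs = {
    "Beelink SER7 Pro": (649, 42),
    "Geekom A8": (899, 52),
}

value = {name: dollars_per_tok_s(p, t) for name, (p, t) in mini_pcs.items()}
best = min(value, key=value.get)

# Print cheapest-per-throughput first.
for name, v in sorted(value.items(), key=lambda kv: kv[1]):
    print(f"{name}: ${v:.2f} per tok/s")
```

On these numbers the Beelink comes out at about $15.45 per tok/s against about $17.29 for the Geekom, which is consistent with calling it the better value.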

Do I need a dedicated GPU for local AI inference?

No. Modern AMD Ryzen mini PCs with RDNA 3 integrated graphics can run models in the 7B to 13B parameter range. A dedicated GPU such as the RTX 4090 is needed only for larger models or maximum speed.
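
To judge whether integrated graphics is fast enough for your own workload, you can measure throughput directly. This is a minimal sketch, assuming an Ollama server running at its default local address and a model already pulled under the tag `llama3.2:3b` (both assumptions, adjust as needed). It relies on the `eval_count` and `eval_duration` (nanoseconds) fields that Ollama returns from a non-streaming `/api/generate` call; the function names and the prompt are illustrative.

```python
import json
import urllib.request

def tokens_per_second(response: dict) -> float:
    """Generation speed from an Ollama /api/generate response.

    eval_count is the number of generated tokens;
    eval_duration is the generation time in nanoseconds.
    """
    return response["eval_count"] / (response["eval_duration"] / 1e9)

def benchmark(model: str = "llama3.2:3b",
              prompt: str = "Explain RAID 5 in three sentences.",
              host: str = "http://localhost:11434") -> float:
    """Run one non-streaming generation and return tok/s.

    Requires a running Ollama server at `host` (assumed default address).
    """
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return tokens_per_second(json.load(resp))
```

Calling `benchmark()` a few times and averaging gives a steadier figure than a single run, since the first call also pays the model load time.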

How much RAM do I need for Ollama?

32GB is the sweet spot for 7B models. 64GB lets you run 13B models comfortably or multiple smaller models at once.
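
The arithmetic behind these figures can be roughed out as parameters times bits per weight, divided by eight. The sketch below assumes about 4.8 bits per weight for a 4-bit quantization such as Q4_K_M, which is an approximation rather than a published constant. It counts weights only: context (KV cache), runtime buffers, and the operating system all need room on top.

```python
def weights_gb(params_billion: float, bits_per_weight: float = 4.8) -> float:
    """Approximate size of quantized weights in GB (10^9 bytes).

    bits_per_weight=4.8 is an assumed average for a 4-bit quantization.
    Excludes KV cache, runtime buffers, and the operating system.
    """
    return params_billion * bits_per_weight / 8

for size in (3, 7, 13):
    print(f"{size}B: ~{weights_gb(size):.1f} GB of weights")
```

Under these assumptions a 7B model needs roughly 4 GB for weights and a 13B model roughly 8 GB. On a mini PC the integrated GPU shares system RAM with everything else, so the recommendations above leave generous headroom for long contexts and other running software.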
