How to Build a Local LLM Rig
Step-by-step guide to building a local LLM rig: GPU, CPU, RAM, SSD, power supply, and software setup for running large language models at home.
Choose your GPU
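For local LLMs, VRAM is the bottleneck: the model weights and the context cache need to fit in GPU memory to run at full speed. The NVIDIA RTX 4090 has 24GB and is the best consumer option. The AMD RX 7900 XTX offers 24GB at a lower price, but it does not run CUDA; it relies on AMD's ROCm stack, which fewer tools support.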
Verdict: For most builders, start with the RTX 4090 for compatibility and ecosystem.
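To make the VRAM constraint concrete, here is a rough sizing sketch. It assumes weight size is simply parameter count times bits per weight, that a 4-bit quantization averages about 4.5 bits per weight, and that 2GB is reserved for context and runtime overhead. The function names and the overhead figure are illustrative assumptions, not measured values.

```python
def estimate_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float = 24.0, overhead_gb: float = 2.0) -> bool:
    """True if the weights plus an assumed fixed overhead fit in VRAM."""
    return estimate_weights_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


# Compare an 8B and a 70B model at 16-bit and at an assumed ~4.5 bits per weight.
for name, params in [("8B", 8), ("70B", 70)]:
    for bits in (16, 4.5):
        size = estimate_weights_gb(params, bits)
        verdict = "fits" if fits_in_vram(params, bits) else "does not fit"
        print(f"{name} at {bits} bits: ~{size:.1f} GB, {verdict} in 24 GB")
```

Under these assumptions an 8B model fits on a 24GB card even at 16-bit, while a 70B model does not fit even at 4-bit.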
Select CPU, RAM, and storage
Pair the GPU with a modern CPU with 8 or more cores, 64GB of DDR5 RAM, and a 2TB NVMe SSD. LLM model files are large, often tens of gigabytes each, so fast storage matters for loading models and for any context caches saved to disk.
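As a rough illustration of why storage speed matters, the sketch below estimates how long it takes to read a model file from disk. The throughput figures are assumed round numbers for sequential reads, not benchmarks of any particular drive, and real load times also depend on the loader and on file caching.

```python
def load_time_seconds(file_gb: float, read_gb_per_s: float) -> float:
    """Best-case time to read a model file at a given sequential read speed."""
    return file_gb / read_gb_per_s


# Assumed sequential read speeds in GB/s (illustrative, not measured).
drives = {"SATA SSD": 0.5, "NVMe SSD": 5.0}

for drive, speed in drives.items():
    seconds = load_time_seconds(40, speed)
    print(f"40 GB model from {drive}: about {seconds:.0f} s")
```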
Assemble the rig
Install the CPU and RAM on the motherboard, mount the NVMe SSD, seat the GPU in the top PCIe x16 slot, and connect the PSU cables, including the GPU's dedicated power connector. Ensure adequate airflow: the RTX 4090 alone can draw around 450W under load.
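Power supply sizing follows from the GPU's draw. The sketch below adds up component power and applies a headroom factor. The CPU and "other" wattages and the 1.3 headroom multiplier are assumptions chosen for illustration; check the specifications of your actual parts.

```python
import math


def recommended_psu_watts(gpu_w: float, cpu_w: float, other_w: float,
                          headroom: float = 1.3) -> int:
    """Sum component draw, add headroom, and round up to the next 50 W."""
    total = (gpu_w + cpu_w + other_w) * headroom
    return math.ceil(total / 50) * 50


# 450 W GPU, with an assumed 150 W CPU and 75 W for drives, fans, and board.
print(recommended_psu_watts(gpu_w=450, cpu_w=150, other_w=75))
```

With these assumed figures the result is 900, which suggests looking at power supplies in the 900W to 1000W range.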
Install software
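Install Linux or Windows, then install Ollama, LM Studio, or llama.cpp. Download a quantized model sized for your GPU. At 4-bit quantization, Llama 3 8B fits comfortably within 24GB VRAM. Larger models such as Llama 3 70B Q4 (roughly 40GB) or Mixtral 8x7B Q4 (roughly 26GB) exceed 24GB, so part of the model has to be offloaded to system RAM, which slows generation.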
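When a quantized model is larger than the available VRAM, llama.cpp can place only some layers on the GPU (its `-ngl` / `--n-gpu-layers` option) and run the rest on the CPU. The sketch below estimates a starting value for that setting. It assumes all layers are the same size and reserves 3GB of VRAM for context and runtime buffers; both are simplifications, and the helper name is hypothetical.

```python
def gpu_layers(model_gb: float, n_layers: int,
               vram_gb: float = 24.0, reserve_gb: float = 3.0) -> int:
    """Estimate how many equally sized layers fit in the usable VRAM."""
    usable_gb = vram_gb - reserve_gb
    if usable_gb <= 0:
        return 0
    per_layer_gb = model_gb / n_layers
    return min(n_layers, int(usable_gb / per_layer_gb))


# A ~40 GB model with 80 layers (for example, a 70B model at 4-bit).
print(gpu_layers(model_gb=40, n_layers=80))

# A ~5 GB model with 32 layers fits entirely on the GPU.
print(gpu_layers(model_gb=5, n_layers=32))
```

Treat the result as a first guess: lower it if the runtime reports out-of-memory errors, and raise it if VRAM is left unused.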
Benchmark and deploy
Run inference benchmarks and measure tokens per second for both prompt processing and generation, then expose the model through a local HTTP API for agent workflows or chat interfaces.
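One way to measure generation speed is through Ollama's local HTTP API, which reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds) in its non-streaming `/api/generate` response. The sketch below assumes Ollama is running on its default port 11434 and that the model has already been pulled. The helper names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def tokens_per_second(result: dict) -> float:
    """Generation speed from eval_count and eval_duration (in nanoseconds)."""
    return result["eval_count"] / (result["eval_duration"] / 1e9)


def run_benchmark(model: str, prompt: str, url: str = OLLAMA_URL) -> float:
    """Send one non-streaming request and return generated tokens per second."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return tokens_per_second(json.load(response))
```

For example, `run_benchmark("llama3", "Explain PCIe lanes in one paragraph.")` returns a single speed figure, where `llama3` stands in for whichever model you have installed. Averaging several runs gives a steadier number. The same endpoint is what a chat interface or agent would call.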