Can a crypto mining GPU be used for local LLM inference?

Yes. GPUs that mined Ethereum Classic, Ravencoin, Ergo, or similar coins are the same NVIDIA and AMD cards used for local LLM inference. For inference, the limiting factors are VRAM size, FP16/FP8 compute, and memory bandwidth.

How do you convert GPU hashrate to LLM tokens per second?

Mining hashrate is not directly equivalent to inference throughput. You can estimate a card's FP16 compute from the algorithm's approximate hashrate-to-FLOPS ratio and the card's known performance, then divide by the model's FLOPs per token (roughly 2 × its parameter count for a dense model) to get a rough tok/s figure.
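As a rough illustration, the conversion can be sketched in a few lines of Python. The `gflops_per_mhs` ratio is a hypothetical placeholder (this page does not publish its per-algorithm values), and the 2 × parameters FLOPs-per-token rule is an approximation for dense models, so treat the result as a compute-only upper bound.

```python
def estimate_tok_per_s(hashrate_mhs, gflops_per_mhs, params_billion, utilization=0.5):
    """Compute-only estimate of token throughput.

    gflops_per_mhs is an assumed per-algorithm ratio, not a published constant.
    """
    # Back out peak compute from hashrate: MH/s * (GFLOPS per MH/s) -> FLOP/s
    peak_flops = hashrate_mhs * gflops_per_mhs * 1e9
    # Approximation: a dense forward pass costs about 2 FLOPs per parameter per token
    flops_per_token = 2 * params_billion * 1e9
    return peak_flops * utilization / flops_per_token


# Placeholder inputs: 120 MH/s, 300 GFLOPS per MH/s (36 TFLOPS peak), 8B model, 50% utilization
print(estimate_tok_per_s(120, 300, 8, utilization=0.5))  # 1125.0
```

The figure ignores memory bandwidth entirely, which is why real single-stream speeds are usually far lower.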

Which mining GPUs are best for local AI inference?

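RTX 3090/3090 Ti and RTX 4090 are popular because they combine high FP16 throughput with 24 GB of VRAM, enough for Q4 models up to about the 22B class on a single card. A 70B parameter model at Q4 quantization needs about 42 GB, so it takes two 24 GB cards or a larger multi-GPU rig.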

Why does inference speed depend on more than teraflops?

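Single-stream token generation is memory-bandwidth bound, especially for large models, because every new token requires reading the model weights from VRAM. Prompt prefill and large batches are closer to compute-bound. Real tok/s also depends on quantization, context length, batch size, and software stack.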
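A minimal sketch of that trade-off takes the lower of a bandwidth ceiling and a compute ceiling. It assumes every generated token reads the full set of weights once and costs about 2 FLOPs per parameter. The function name and the example hardware figures are illustrative, not values used by this page.

```python
def decode_tok_per_s(model_gb, bandwidth_gbs, peak_tflops, params_billion, utilization=0.5):
    """Return (tok/s, limiting factor) for single-stream decoding.

    Assumes the whole model is read from VRAM once per token.
    """
    # Bandwidth ceiling: how many times per second the weights can be streamed
    bandwidth_bound = bandwidth_gbs / model_gb
    # Compute ceiling: usable FLOP/s divided by ~2 FLOPs per parameter per token
    compute_bound = peak_tflops * 1e12 * utilization / (2 * params_billion * 1e9)
    if bandwidth_bound < compute_bound:
        return bandwidth_bound, "bandwidth"
    return compute_bound, "compute"


# Illustrative inputs: 6.5 GB model, 936 GB/s memory bandwidth, 35 TFLOPS assumed peak
rate, limit = decode_tok_per_s(6.5, 936, 35, 8)
print(rate, limit)  # 144.0 bandwidth
```

With these inputs the bandwidth ceiling is several times lower than the compute ceiling, so teraflops alone would overstate the result.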


Hashrate to Inference Converter

Turn your crypto-mining hashrate into an estimated local LLM inference capacity. See which models fit and how many tokens/sec you might generate.

Mining setup

Hashrate

Mining algorithm

Ethereum Classic and, historically, ETH. Memory-bound; FLOPs per hash are low.

Power efficiency (J/MH)

Typical range: 0.18-0.7 J/MH

Electricity ($/kWh)

Inference utilization (% of peak compute)

Real inference rarely uses 100% of peak FP16.

Available VRAM (GB)

Estimated peak FP16 compute

—

Inference power

—

Power cost / month

—

Best-fit model

—

Estimated tok/s by model

Throughput depends on quantization, context length, and batch size. Treat these as rough directional estimates.

Model	Use case	VRAM needed	Est. tok/s	Fit
Llama 3.1 8B Q4	Fast local chat	6.5 GB	—	—
Llama 3.1 70B Q4	High-capability agent	42 GB	—	—
Qwen2.5 14B Q4	Balanced coding assistant	10 GB	—	—
DeepSeek-V3 / R1 Q4 (MoE)	Reasoning / coding heavy	75 GB	—	—
Mistral Small 22B Q4	Agentic reasoning	15 GB	—	—
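The fit column can be approximated with a simple lookup over the VRAM figures in the table above. This is a sketch: the rule that "best fit" means the largest model that still fits is an assumption, since the page does not define it.

```python
# VRAM requirements in GB, copied from the table above
MODELS = [
    ("Llama 3.1 8B Q4", 6.5),
    ("Qwen2.5 14B Q4", 10.0),
    ("Mistral Small 22B Q4", 15.0),
    ("Llama 3.1 70B Q4", 42.0),
    ("DeepSeek-V3 / R1 Q4 (MoE)", 75.0),
]


def models_that_fit(vram_gb):
    """Names of all models whose VRAM requirement is within the available VRAM."""
    return [name for name, need_gb in MODELS if need_gb <= vram_gb]


def best_fit(vram_gb):
    """Largest model that fits (assumed definition of 'best fit'), or None."""
    fitting = [(need_gb, name) for name, need_gb in MODELS if need_gb <= vram_gb]
    return max(fitting)[1] if fitting else None


print(models_that_fit(24))  # the 8B, 14B, and 22B entries
print(best_fit(24))         # Mistral Small 22B Q4
```

A four-card 24 GB rig (96 GB total) would pass every row under this check, although splitting a model across cards adds overhead that the lookup ignores.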

How the estimate works

Hashrate → FLOPS: we use the algorithm's approximate MH/s-to-GFLOPS ratio to back out peak FP16 compute.
Utilization: real inference uses 40-90% of peak FP16 depending on batch size and memory bandwidth.
VRAM fit: only models that fit in your available VRAM are marked green.
Power: inferred from efficiency and hashrate (J/MH × MH/s = W), then costed at your electricity rate.
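The power step is plain unit arithmetic and can be sketched as follows. The function names and the 24 hours × 30 days month are assumptions for illustration. As in the text, inference is assumed to draw the same power as mining.

```python
def inference_power_watts(hashrate_mhs, joules_per_mh):
    """(J/MH) * (MH/s) = J/s = W."""
    return hashrate_mhs * joules_per_mh


def monthly_power_cost(watts, usd_per_kwh, hours_per_day=24.0, days=30):
    """Electricity cost for running at a constant draw over an assumed 30-day month."""
    kwh = watts / 1000 * hours_per_day * days
    return kwh * usd_per_kwh


# Example: a 500 MH/s rig at 0.5 J/MH, paying $0.10 per kWh
watts = inference_power_watts(500, 0.5)          # 250 W
print(watts, monthly_power_cost(watts, 0.10))    # about $18 per month
```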

Reality checks

• Mining and inference stress different parts of the card. A card that was used for mining may have degraded memory.
• Large models are memory-bandwidth bound; teraflops alone overstate speed.
• Batch processing and prompt prefill can swing real tok/s by 2-5×.
• Always verify VRAM headroom; quantization tables are approximate.
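For the VRAM headroom check, one hedged way to sanity-test a quantization table is to estimate weight size from parameter count. The sketch assumes about 4.5 bits per weight for a Q4 block format (a 16-bit scale stored per 32 four-bit weights) and a hypothetical fixed `overhead_gb` for KV cache and runtime buffers. Real overhead grows with context length.

```python
def weight_size_gb(params_billion, bits_per_weight=4.5):
    """Approximate size of quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def has_headroom(params_billion, vram_gb, overhead_gb=2.0, bits_per_weight=4.5):
    """True if weights plus an assumed fixed overhead fit in the available VRAM."""
    return weight_size_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


print(weight_size_gb(70))    # 39.375 GB of weights for a 70B model
print(has_headroom(70, 24))  # False: one 24 GB card is not enough
print(has_headroom(70, 48))  # True: two 24 GB cards leave some room
```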


Hashrate to Inference Converter

RTX 4090

RTX 4080 Super

RTX 3090 Ti

RTX 3090

RTX 4070 Ti Super

RX 7900 XTX

RX 6950 XT

RTX 3090 x4 rig
