⛏️
Hashrate to Inference Converter
Turn your crypto-mining hashrate into an estimated local LLM inference capacity. See which models fit and how many tokens/sec you might generate.
Mining setup
Used by Ethereum Classic and, historically, Ethereum. Memory-bound; FLOPs per hash are low.
Typical range: 0.18-0.7 J/MH
Real inference rarely uses 100% of peak FP16.
Estimated peak FP16 compute
—
Inference power
—
Power cost / month
—
Best-fit model
—
—
Estimated tok/s by model
Throughput depends on quantization, context length, and batch size. Treat these as rough directional estimates.
| Model | Typical use | VRAM needed | Est. tok/s | Fit |
|---|---|---|---|---|
| Llama 3.1 8B Q4 | Fast local chat | 6.5 GB | — | — |
| Llama 3.1 70B Q4 | High-capability agent | 42 GB | — | — |
| Qwen2.5 14B Q4 | Balanced coding assistant | 10 GB | — | — |
| DeepSeek-V3 / R1 Q4 (MoE, 671B total parameters) | Reasoning / coding heavy | ~400 GB | — | — |
| Mistral Small 22B Q4 | Agentic reasoning | 15 GB | — | — |
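The Fit column can be read as a simple comparison between a model's VRAM requirement and the VRAM you have. The sketch below is one way to express that check, not the converter's actual code. The names `MODELS`, `fits`, and `best_fit`, the 1 GB headroom default, and the rule that "best fit" means the largest model that still fits are all assumptions made for illustration. The VRAM figures are copied from four rows of the table.

```python
from typing import Optional

# (model name, VRAM needed in GB), copied from four rows of the table above.
MODELS = [
    ("Llama 3.1 8B Q4", 6.5),
    ("Qwen2.5 14B Q4", 10.0),
    ("Mistral Small 22B Q4", 15.0),
    ("Llama 3.1 70B Q4", 42.0),
]


def fits(vram_needed_gb: float, vram_available_gb: float,
         headroom_gb: float = 1.0) -> bool:
    """True if the model plus an assumed safety headroom fits in VRAM."""
    return vram_needed_gb + headroom_gb <= vram_available_gb


def best_fit(vram_available_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Name of the largest listed model that fits, or None if none do.

    Picking the largest fitting model is an assumed rule, not a documented one.
    """
    candidates = [
        (name, need) for name, need in MODELS
        if fits(need, vram_available_gb, headroom_gb)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1])[0]
```

For a 24 GB card this selects Mistral Small 22B Q4, and for a 4 GB card it returns `None`, because even the 8B model needs 6.5 GB before headroom.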
How the estimate works
- Hashrate → FLOPS: we multiply your hashrate (MH/s) by an approximate GFLOPS-per-MH/s ratio for the selected algorithm to back out peak FP16 compute.
- Utilization: real inference uses 40-90% of peak FP16 depending on batch size and memory bandwidth.
- VRAM fit: only models that fit in your available VRAM are marked green.
- Power: inferred as efficiency (J/MH) × hashrate (MH/s), which gives watts, then costed at your electricity rate.
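The steps above can be sketched as a few lines of arithmetic. This is a minimal illustration under stated assumptions, not the converter's real implementation. The ratio `ASSUMED_GFLOPS_PER_MHS` is a placeholder value with no empirical basis, the 30-day month is an assumption, and the function and field names are hypothetical. The only unit identity relied on is that (MH/s) × (J/MH) = J/s = W.

```python
from dataclasses import dataclass

# Placeholder ratio: GFLOPS of peak FP16 per MH/s of hashrate.
# The real value depends on algorithm and hardware; this number is assumed.
ASSUMED_GFLOPS_PER_MHS = 500.0

# Assumed billing period: a 30-day month.
HOURS_PER_MONTH = 24 * 30


@dataclass
class Estimate:
    peak_fp16_tflops: float
    effective_tflops: float
    power_watts: float
    cost_per_month: float


def estimate(hashrate_mhs: float, efficiency_j_per_mh: float,
             utilization: float, price_per_kwh: float,
             gflops_per_mhs: float = ASSUMED_GFLOPS_PER_MHS) -> Estimate:
    """Back out compute, power, and monthly cost from mining figures."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")

    # Hashrate -> peak compute, converting GFLOPS to TFLOPS.
    peak_tflops = hashrate_mhs * gflops_per_mhs / 1000.0

    # Utilization: only a fraction of peak is usable during inference.
    effective_tflops = peak_tflops * utilization

    # Power: (MH/s) * (J/MH) = J/s = W.
    power_watts = hashrate_mhs * efficiency_j_per_mh

    # Cost: kW * hours per month * price per kWh.
    cost_per_month = power_watts / 1000.0 * HOURS_PER_MONTH * price_per_kwh

    return Estimate(peak_tflops, effective_tflops, power_watts, cost_per_month)
```

For example, `estimate(100, 0.5, 0.6, 0.10)` yields 50 W of power and roughly 3.60 per month in electricity. The TFLOPS figures it returns are only as meaningful as the placeholder ratio.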
Reality checks
- Mining and inference stress different parts of the card. A card that has been used for mining may have degraded memory.
- Large models are memory-bandwidth bound; teraflops alone overstate speed.
- Batch processing and prompt prefill can swing real tok/s by 2-5×.
- Always verify VRAM headroom; quantization tables are approximate.
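To see why teraflops alone overstate speed, it can help to compare two rough ceilings for single-stream decoding. The sketch below uses two common rules of thumb, both assumptions here rather than anything this page specifies: a dense model reads roughly all of its weights once per generated token, and it spends roughly 2 FLOPs per parameter per token. The function names and example numbers are hypothetical.

```python
def bandwidth_bound_tok_s(mem_bandwidth_gb_s: float, weights_gb: float) -> float:
    """Ceiling from memory traffic: each token reads all weights once (assumed)."""
    return mem_bandwidth_gb_s / weights_gb


def compute_bound_tok_s(effective_tflops: float, params_billion: float) -> float:
    """Ceiling from arithmetic: about 2 FLOPs per parameter per token (assumed)."""
    flops_per_token = 2.0 * params_billion * 1e9
    return effective_tflops * 1e12 / flops_per_token


def decode_ceiling_tok_s(mem_bandwidth_gb_s: float, weights_gb: float,
                         effective_tflops: float, params_billion: float) -> float:
    """The lower of the two ceilings limits batch-size-1 decoding."""
    return min(
        bandwidth_bound_tok_s(mem_bandwidth_gb_s, weights_gb),
        compute_bound_tok_s(effective_tflops, params_billion),
    )
```

With hypothetical inputs of 1000 GB/s of bandwidth, 5 GB of weights, 30 effective TFLOPS, and 8 billion parameters, the bandwidth ceiling is 200 tok/s while the compute ceiling is 1875 tok/s, so memory bandwidth sets the limit. Larger batches reuse each weight read across several sequences, which is one reason batching can change real throughput so much.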