LUMINAL
Inference Benchmarks
GPU Benchmark Reports
Performance reports for popular open-source models on production GPU hardware.
MiniMaxAI/MiniMax-M2.5
8xH200
456B MoE, 45.9B active
Peak Decode Speed: 145.2 tok/s

deepseek-ai/DeepSeek-R1-0528
8xH200
671B MoE, 37B active
Peak Decode Speed: 93.3 tok/s

moonshotai/Kimi-K2.5
8xH200
1T MoE, 32B active
Peak Decode Speed: 115.1 tok/s

New
Qwen/Qwen3.5-35B-A3B-FP8
1xH100
35B MoE, 3B active, FP8
Peak Decode Speed: 172.5 tok/s

Qwen/Qwen3-Coder-30B-A3B-Instruct
NVIDIA H100 PCIe
30B MoE, 3B active
Peak Decode Speed: 152.9 tok/s