Inference Benchmarks

GPU Benchmark Reports

Performance reports for popular open-source models on production GPU hardware.

MiniMaxAI/MiniMax-M2.5

8xH200 · 456B MoE, 45.9B active
145.2 tok/s
Peak Decode Speed

deepseek-ai/DeepSeek-R1-0528

8xH200 · 671B MoE, 37B active
93.3 tok/s
Peak Decode Speed

moonshotai/Kimi-K2.5

8xH200 · 1T MoE, 32B active
115.1 tok/s
Peak Decode Speed
New

Qwen/Qwen3.5-35B-A3B-FP8

1xH100 · 35B MoE, 3B active, FP8
172.5 tok/s
Peak Decode Speed

Qwen/Qwen3-Coder-30B-A3B-Instruct

NVIDIA H100 PCIe · 30B MoE, 3B active
152.9 tok/s
Peak Decode Speed