Inference Benchmark

Qwen/Qwen3-Coder-30B-A3B-Instruct on NVIDIA H100 PCIe

Model
Qwen/Qwen3-Coder-30B-A3B-Instruct
GPU
NVIDIA H100 PCIe
Scenarios
Chatbot, RAG / QA, Agentic, Tool Calling Agentic

Metric Glossary

Quick definitions for the metrics used throughout this report. All timing metrics are measured on a live streaming endpoint.

Context Length
Input size
The number of tokens provided as input (prompt + chat history). Longer context increases prefill cost and often increases TTFT.
Prefill
Compute phase
The "prompt processing" phase where the model ingests the full context and builds KV cache. Prefill cost scales roughly with context length.
Decode
Compute phase
The "generation" phase where the model produces new output tokens after the first token. Decode speed is usually reported as tok/s.
TTFT
Latency
Time To First Token: time from request start to the first streamed token. Dominated by prefill + scheduling/queueing overhead.
ITL
Latency
Inter-Token Latency: average time between successive streamed tokens during decode. Often shown in ms/token. Lower feels more "snappy".
Decode Speed
Throughput
Output tokens per second. Per-user = tokens/sec for one request stream, system = sum across all concurrent streams.
Per-User vs System
Interpretation
Per-user shows the experience of a single client (speed/latency). System total shows total server capacity across concurrent requests.
Scaling Efficiency
Concurrency
How close the system is to perfect scaling as concurrency increases: system_throughput / (bs1_throughput x batch_size). 100% = no loss.
E2E Latency
Latency
End-to-end time for the request to finish streaming: TTFT + decode_time. This is what a user feels for "full completion time".
Batch Size / Concurrency
Load
Number of concurrent requests in flight. Higher concurrency typically improves system tok/s but reduces per-user tok/s.
Rule of thumb: TTFT is mostly about prefill + queueing; ITL is mostly about decode smoothness. "Per-user" metrics reflect UX; "system" metrics reflect capacity.
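The definitions above can be made concrete with a small helper that derives every timing metric from a single token-arrival trace. This is a sketch, not the benchmark's actual code, and the trace below is synthetic:

```python
def stream_metrics(request_start: float, token_times: list[float]) -> dict:
    """Derive the glossary metrics from one streamed request.

    request_start: wall-clock time the request was sent.
    token_times:   wall-clock arrival time of each streamed token.
    """
    ttft = token_times[0] - request_start            # Time To First Token
    decode_time = token_times[-1] - token_times[0]   # first token -> last token
    n_decoded = len(token_times) - 1                 # tokens produced after the first
    itl_ms = decode_time / n_decoded * 1000          # Inter-Token Latency (ms/token)
    decode_tok_s = n_decoded / decode_time           # per-user decode speed
    e2e = ttft + decode_time                         # end-to-end latency
    return {"ttft_s": ttft, "itl_ms": itl_ms,
            "decode_tok_s": decode_tok_s, "e2e_s": e2e}

# Synthetic trace: first token 30 ms after request start,
# then one token every 6.5 ms (129 tokens total).
start = 100.0
times = [start + 0.030 + i * 0.0065 for i in range(129)]
m = stream_metrics(start, times)
```

Per-user decode speed and ITL are reciprocals of each other, which is why the report can present either one for the same underlying measurement.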
Peak Decode Speed
152.9 tok/s
Single user (batch size 1)
Max System Throughput
2961 tok/s
Decode at batch size 24
Best TTFT
0.03 s
1024 in, single user
Scaling Efficiency
39% @ bs=32
vs ideal linear scaling

Performance Charts

Performance across context lengths at different concurrency levels. The top edge of each band shows single-user performance; the bottom edge shows performance at maximum tested concurrency.

Per-User Throughput Range

Per-user generation speed across context lengths. Band shows range from single-user to max concurrency.
Per-user throughput range chart
Per-user decode throughput (tok/s) from single-user to 32-user concurrency across 4 context-length scenarios.
Peak per-user decode speed is 152.9 tok/s (single user, Chatbot). At maximum concurrency (32 users), per-user speed ranges from 59.2 to 81.7 tok/s across context lengths — the shaded band shows how much individual throughput degrades under load.

Time to First Token Range

TTFT across context lengths. Lower is better for responsive user experience.
Time to first token range chart
Time to first token (seconds) from single-user to 32-user concurrency across context lengths.
Best TTFT is 29ms (Chatbot, 2 concurrent); the worst case is 0.97s (Tool Calling Agentic, 32 concurrent). Best-case TTFT is well under 100ms. Industry references: ~50ms at 1K context, ~2s at 32K, ~18s at 128K on top hardware.

System Throughput

Aggregate decode throughput (tok/s) across all concurrent requests at each context length.
System throughput chart
Aggregate system decode throughput (tok/s) at each concurrency level across 4 context-length scenarios.
Condition | Peak System Throughput (tok/s) | Peak Per-User (tok/s) | Tokens/Hour
Single user | 153 | 152.9 | 550,800
Mid concurrency (8 reqs) | 1,121 | 140.2 | 4,035,600
Max concurrency (32 reqs) | 2,614 | 81.7 | 9,410,400
At peak throughput (24 concurrent requests, Chatbot), this configuration produces approximately 10.7 million tokens per hour. Under the heaviest load, per-user decode speed drops as low as 38.8 tok/s (Tool Calling Agentic, 24 concurrent). Single-user performance reaches 152.9 tok/s, and scaling efficiency at 32x concurrency is 39%.
Peak system throughput is 2961.0 tok/s (Chatbot, 24 concurrent users), equivalent to ~10,659,600 tokens/hour. Higher concurrency increases total throughput as the GPU processes more requests in parallel, even though per-user speed decreases.
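Scaling efficiency and tokens-per-hour follow directly from the glossary formulas. A quick check using the Chatbot figures from this report (152.9 tok/s single-user, 59.2 tok/s per user at 32 concurrent) and the max-concurrency row of the table above:

```python
def scaling_efficiency(per_user_tok_s: float, bs1_tok_s: float) -> float:
    """system / (bs1 * batch) reduces to per_user / bs1,
    since system throughput = per_user * batch."""
    return per_user_tok_s / bs1_tok_s * 100

def tokens_per_hour(system_tok_s: float) -> float:
    """Aggregate decode throughput converted to an hourly budget."""
    return system_tok_s * 3600

eff_32 = scaling_efficiency(59.2, 152.9)  # Chatbot at 32 concurrent users
tph_max = tokens_per_hour(2614)           # max-concurrency row of the table
```

Both results line up with the report: roughly 39% efficiency at 32x concurrency and about 9.4M tokens/hour at 2,614 tok/s.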

Technical Analysis

Deep dive into per-user metrics across context lengths at different concurrency levels.

What good looks like

Inter-Token Latency: <10ms excellent, <20ms good, >70ms poor. Under 20ms feels instantaneous to users.
Per-User Decode: >150 tok/s excellent (short ctx), >80 tok/s good (long ctx). Below 30 tok/s noticeably slow.
Scaling Efficiency: >90% excellent, >70% good, <50% poor. Measures how well throughput scales with concurrency.
TTFT: <100ms excellent (short ctx), <2s good (32K ctx), <20s acceptable (128K ctx). Grows with context length.

Inter-Token Latency

Average inter-token latency across context lengths at each concurrency level. Lower is better — under 20ms feels instantaneous.
Inter-token latency chart
Average inter-token latency (ms) derived from per-user decode speed. The 20ms threshold marks the boundary of perceptually instant streaming.
Best inter-token latency is 6.5ms (Chatbot, single user), rising to 25.8ms under maximum load (Tool Calling Agentic, 24 concurrent). The single-user figure is well under the 20ms threshold where streaming feels instantaneous; under the heaviest load the system slips just above it.
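ITL and per-user decode speed are two views of the same quantity: average ITL in milliseconds is simply 1000 divided by tokens per second. A quick check against the figures in this section:

```python
def itl_ms_from_decode(tok_s: float) -> float:
    """Average inter-token latency (ms) implied by a per-user decode speed."""
    return 1000.0 / tok_s

single_user = itl_ms_from_decode(152.9)  # Chatbot, single user
max_load = itl_ms_from_decode(38.8)      # Tool Calling Agentic, 24 concurrent
```

The conversion reproduces the 6.5ms and 25.8ms values quoted above, which is why the chart can be "derived from per-user decode speed".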

Per-User Decode Speed

Per-user decode speed across context lengths at each concurrency level. Higher is better.
Per-user decode speed chart
Per-user decode speed (tok/s) at each concurrency level. Higher is better for interactive use.
Per-user decode speed ranges from 152.9 tok/s (Chatbot, single user) down to 38.8 tok/s (Tool Calling Agentic, 24 concurrent). Peak speed matches top-tier single-user performance.

Scaling Efficiency

How well throughput scales with concurrency. 100% means no per-user speed loss when adding users.
Scaling efficiency chart
Scaling efficiency (%) at each concurrency level relative to ideal linear scaling.
At 32x concurrency, scaling efficiency ranges from 39% (Chatbot) to 54% (RAG / QA). Values above 90% are considered excellent; below 50% indicates severe resource contention.

Capacity Analysis

How many concurrent requests can the system handle before quality degrades below acceptable thresholds? Each scenario shows measured data at tested concurrency levels.
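The capacity question can be framed as: given measured (concurrency, ITL, TTFT) points, supported capacity is the highest tested concurrency at which both metrics stay within thresholds. A sketch, with hypothetical measurements (not this report's data) and threshold values borrowed from the "What good looks like" section:

```python
def max_supported_concurrency(points, itl_limit_ms: float, ttft_limit_s: float) -> int:
    """points: list of (concurrency, itl_ms, ttft_s) measured on the endpoint.
    Returns the highest tested concurrency meeting both quality thresholds,
    or 0 if none does."""
    ok = [c for c, itl, ttft in points
          if itl <= itl_limit_ms and ttft <= ttft_limit_s]
    return max(ok, default=0)

# Hypothetical measurements shaped like a long-context scenario:
measured = [(1, 8.0, 0.3), (4, 15.0, 0.5), (8, 30.0, 0.8),
            (16, 80.0, 1.5), (32, 120.0, 3.0)]
cap = max_supported_concurrency(measured, itl_limit_ms=70, ttft_limit_s=2.0)
```

Here the 16-user point already exceeds the 70ms ITL limit, so the reported capacity would be 8 users, mirroring how the Tool Calling Agentic scenario caps out below the maximum tested concurrency.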

Chatbot Capacity
32 users
1K in / 128 out
RAG / QA Capacity
32 users
4K in / 512 out
Agentic Capacity
32 users
16K in / 2K out
Tool Calling Agentic Capacity
8 users
32K in / 4K out

Chatbot

Context length: 1,024 input + 128 output tokens.
Chatbot capacity chart
ITL (left axis)
TTFT (right axis)
Quality thresholds
This scenario stays within quality thresholds at every tested level, up to the maximum tested concurrency of 32 requests; higher concurrency was not tested.

RAG / QA

Context length: 4,096 input + 512 output tokens.
RAG / QA capacity chart
ITL (left axis)
TTFT (right axis)
Quality thresholds
This scenario stays within quality thresholds at every tested level, up to the maximum tested concurrency of 32 requests; higher concurrency was not tested.

Agentic

Context length: 16,384 input + 2,048 output tokens.
Agentic capacity chart
ITL (left axis)
TTFT (right axis)
Quality thresholds
This scenario stays within quality thresholds at every tested level, up to the maximum tested concurrency of 32 requests; higher concurrency was not tested.

Tool Calling Agentic

Context length: 32,768 input + 4,096 output tokens.
Tool Calling Agentic capacity chart
ITL (left axis)
TTFT (right axis)
Quality thresholds
This scenario supports up to 8 concurrent users within quality thresholds. Beyond that, ITL or TTFT exceeds acceptable limits. The system was tested up to 32 concurrent requests.

Methodology

Benchmarks were run against a live endpoint using streaming completions. TTFT, inter-token latency, and decode speed are measured directly from the token stream. All capacity charts show only measured data points at tested concurrency levels.

TTFT (Time to First Token)
Measured from request start to first streamed token
Decode speed
Measured from token stream: output_tokens / decode_time
Scaling efficiency
actual_throughput / (bs1_throughput x batch_size) x 100
E2E latency
TTFT + decode_time (measured end-to-end)
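The measurement loop described above can be sketched as follows, with a simulated token stream standing in for the live endpoint (the generator is a stand-in, not the benchmark's actual client; timings are illustrative):

```python
import time

def fake_stream(n_tokens: int, ttft_s: float, itl_s: float):
    """Stand-in for a streaming completions endpoint."""
    time.sleep(ttft_s)            # prefill + scheduling delay
    for _ in range(n_tokens):
        yield "tok"
        time.sleep(itl_s)         # decode pacing between tokens

def measure(stream):
    """Time a token stream exactly as the methodology describes."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in stream:
        now = time.perf_counter()
        if first is None:
            first = now           # first streamed token -> TTFT
        count += 1
    end = time.perf_counter()
    ttft = first - start
    decode_time = end - first
    return {
        "ttft_s": ttft,
        # decode speed: output_tokens / decode_time (tokens after the first)
        "decode_tok_s": (count - 1) / decode_time if decode_time > 0 else 0.0,
        "e2e_s": end - start,     # E2E latency = TTFT + decode_time
    }

m = measure(fake_stream(n_tokens=20, ttft_s=0.05, itl_s=0.005))
```

Against a real endpoint, the same `measure` loop would consume the server's streamed chunks instead of `fake_stream`; only the source of tokens changes, not the timing logic.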