Mixture‑of‑Experts (MoE) Model Speed Calculator

Use this tool to estimate the theoretical per‑token throughput of large Mixture‑of‑Experts models on a two‑GPU setup. Fill in the hardware characteristics and model details below, or choose a predefined model from the drop‑down to prefill the fields. Real-world performance is lower, typically around half of the theoretical maximum. All calculations assume that memory bandwidth, not compute capability, is the limiting factor; this assumption may not hold for certain models (such as gpt-oss on GPUs that do not natively support FP4) or certain GPUs (such as most non-NVIDIA GPUs or any GPU without tensor cores).

This calculator assumes that the first GPU is reserved for the dense parameters and the KV cache (context), and that the second GPU holds the MoE parameters. PCIe speed is mostly irrelevant as long as weights are not transferred between devices.

To ignore the second GPU, set its bandwidth to the same value as the system RAM bandwidth. To estimate RAM-only speed, set both the first and second GPU bandwidth to 0.
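The bandwidth-bound estimate described above can be sketched roughly as follows. This is a minimal illustration, not the calculator's actual formula: it assumes that every generated token reads the active dense weights and the whole KV cache from GPU 1, then the active MoE weights from GPU 2, and that the two reads happen one after the other so their times add. The function name, the parameter names, and the `efficiency` factor are hypothetical, and the sketch ignores VRAM capacity, spillover to system RAM, and the special bandwidth settings mentioned above.

```python
def estimate_tokens_per_second(
    dense_params: float,      # active dense parameters per token
    moe_params: float,        # active MoE parameters per token
    kv_cache_gb: float,       # KV cache size in GB
    bits_per_weight: float,   # quantization level, e.g. 4, 8, or 16
    gpu1_bw_gbs: float,       # GPU 1 VRAM bandwidth in GB/s
    gpu2_bw_gbs: float,       # GPU 2 VRAM bandwidth in GB/s
    efficiency: float = 1.0,  # assumed fraction of the theoretical maximum
) -> float:
    """Rough memory-bandwidth-bound throughput estimate in tokens per second."""
    if gpu1_bw_gbs <= 0 or gpu2_bw_gbs <= 0:
        raise ValueError("this sketch needs positive bandwidths")

    # Convert parameter counts to gigabytes read per token (1 GB = 1e9 bytes).
    dense_gb = dense_params * bits_per_weight / 8 / 1e9
    moe_gb = moe_params * bits_per_weight / 8 / 1e9

    # Assumption: GPU 1 streams the dense weights plus the KV cache,
    # GPU 2 streams the active expert weights, and the two run in sequence.
    seconds_gpu1 = (dense_gb + kv_cache_gb) / gpu1_bw_gbs
    seconds_gpu2 = moe_gb / gpu2_bw_gbs

    return efficiency / (seconds_gpu1 + seconds_gpu2)


# Illustrative numbers only; they do not describe any particular model or GPU.
theoretical = estimate_tokens_per_second(10e9, 20e9, 5.0, 4, 1000.0, 500.0)
realistic = estimate_tokens_per_second(10e9, 20e9, 5.0, 4, 1000.0, 500.0, efficiency=0.5)
print(f"theoretical: {theoretical:.1f} tok/s, at half efficiency: {realistic:.1f} tok/s")
```

With these made-up inputs, GPU 1 reads 10 GB per token (0.01 s) and GPU 2 reads 10 GB per token (0.02 s), so the theoretical figure is about 33.3 tokens per second, and about 16.7 with the "around half" rule of thumb applied.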

GPU Parameters

GPU 1 preset

Single GPU

GPU 2 preset

GPU 1 VRAM capacity (GB)

GPU 2 VRAM capacity (GB)

GPU 1 VRAM bandwidth (GB/s)

GPU 2 VRAM bandwidth (GB/s)

System RAM bandwidth (GB/s)

Model Parameters

Choose Model:

Model total size (parameters)

Active dense parameters per token

Active MoE parameters per token

KV cache size (GB)

Quantization level