Tag

MoE

1 post tagged MoE.

May 22, 202613 min read

Switch Transformers: What the Sparse MoE Scaling Paper Actually Says

Every modern large model — Mixtral, DeepSeek, Gemini — routes tokens through sparse experts. The design decisions in all of them trace back to one 2021 Google Brain paper. The paper is worth reading because the failure modes live in the routing logic, not the math.

AISystemsProductionLLMsTrainingMoE