The road to MoE
2025-08-01, access: $$$ Pro
model-intro text DeepSeek MoE

General coverage of the "Mixture of Experts" (MoE) technique, plus the specifics of DeepSeek's "fine-grained expert segmentation" and "shared expert isolation" enhancements to it, along with some load-balancing tricks; all of these went into their recently notable model.
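
To make the two DeepSeek enhancements concrete, here is a minimal PyTorch sketch, not DeepSeek's actual code: the class names (`Expert`, `MoELayer`) and all sizes (`d_model`, `n_routed`, `top_k`, etc.) are illustrative assumptions. It shows many small routed experts of which only `top_k` fire per token (fine-grained segmentation) plus a couple of always-on experts that bypass the router (shared expert isolation). The load-balancing tricks mentioned above are omitted.

```python
# Minimal sketch of a DeepSeek-style MoE layer; names and sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """One small feed-forward expert; fine-grained segmentation means
    many of these, each narrower than a full dense FFN."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)
        self.down = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.gelu(self.up(x)))


class MoELayer(nn.Module):
    def __init__(self, d_model: int = 512, d_hidden: int = 128,
                 n_routed: int = 16, n_shared: int = 2, top_k: int = 4):
        super().__init__()
        # Routed experts: only top_k of them are activated per token.
        self.routed = nn.ModuleList(Expert(d_model, d_hidden) for _ in range(n_routed))
        # Shared experts: isolated from routing, applied to every token.
        self.shared = nn.ModuleList(Expert(d_model, d_hidden) for _ in range(n_shared))
        self.router = nn.Linear(d_model, n_routed, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)               # (n_tokens, n_routed)
        top_w, top_idx = torch.topk(scores, self.top_k, dim=-1)  # per-token expert choice
        top_w = top_w / top_w.sum(dim=-1, keepdim=True)          # renormalize kept weights

        # Shared-expert path: every token goes through every shared expert.
        out = torch.zeros_like(x)
        for expert in self.shared:
            out = out + expert(x)

        # Routed path: scatter the top-k weights back to a dense (n_tokens, n_routed)
        # matrix and weight each expert's output. (Real implementations dispatch only
        # the selected tokens to each expert instead of running all tokens through all.)
        dense_w = torch.zeros_like(scores).scatter(-1, top_idx, top_w)
        for e, expert in enumerate(self.routed):
            w = dense_w[:, e:e + 1]          # (n_tokens, 1) weight for expert e
            if w.sum() > 0:                  # skip experts no token was routed to
                out = out + w * expert(x)
        return out


# Example: a batch of 8 token embeddings through the layer.
layer = MoELayer()
y = layer(torch.randn(8, 512))
```

The "load-balancing tricks" the article covers (auxiliary losses or router-bias adjustments that keep tokens spread evenly across the routed experts) would sit on top of a layer like this and are not shown in the sketch.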