Tag: DeepSeek
Table lookups again, with Engram
2026-03-30 DeepSeek MoE RAG text Popular techniques in language modelling, including RAG, MoE, and attention itself, amount to replacing as much as possible of a neural network model with different kinds of table lookups. In this recent paper from DeepSeek's research group, they attempt another such replacement: shifting factual knowledge out of the model weights as such, into a separate hash table. Access: $ Basic
The road to MoE
2025-08-01 model-intro text DeepSeek MoE General coverage of the "Mixture of Experts" (MoE) technique, and specific details of DeepSeek's "fine-grained expert segmentation" and "shared expert isolation" enhancements to it, as well as some load-balancing tricks, all of which went into their recently-notable model. Access: $$$ Pro