---
title: "ICLR2025 Jin: MOE++ zero computation experts"
source: https://www.jemoka.com/posts/kbhiclr2025_jin_moe_zero_computation_experts/
---

## Motivation

A fixed number of experts is activated for every token.

## Key Insight

MoE++ allows the number of activated experts to be adaptive.

## Method

Three key contributions:

1. zero-computation experts (in addition to the normal FFN experts):
   - zero: discard the input, \(E\qty(x) = 0\)
   - copy: pass the input through, \(E\qty(x) = x\) ("skip")
   - constant: \(E\qty(x) = \alpha_{a} x + \alpha_{b} v_{\theta}\)
2. pathway-aware router (with additional loss augmentation, where we learn a \(\tau_{\theta}\) to decide)
3. something else I missed

### zero-computation experts

- simple, so easy tokens can be handled quickly
- adding these new experts is relatively low cost
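
As a rough illustration of the zero, copy, and constant experts listed above, here is a minimal NumPy sketch. It is not the paper's implementation: the function names, the ReLU FFN, the plain softmax top-k router, and the fixed scalar values for \(\alpha_{a}\) and \(\alpha_{b}\) are all assumptions made for the example, and the pathway-aware router and its loss are not modeled.

```python
import numpy as np

def zero_expert(x):
    # E(x) = 0: discard the input entirely.
    return np.zeros_like(x)

def copy_expert(x):
    # E(x) = x: pass the input through unchanged ("skip").
    return x

def make_constant_expert(v, alpha_a, alpha_b):
    # E(x) = alpha_a * x + alpha_b * v, where v is a learned vector.
    # Assumption: alpha_a and alpha_b are fixed scalars here.
    def expert(x):
        return alpha_a * x + alpha_b * v
    return expert

def make_ffn_expert(w_in, w_out):
    # An ordinary two-layer FFN expert with ReLU (assumed activation).
    def expert(x):
        return np.maximum(x @ w_in, 0.0) @ w_out
    return expert

def route_top_k(x, experts, router_w, k=2):
    # Hypothetical plain top-k softmax router over all experts.
    logits = x @ router_w                      # shape: (n_experts,)
    top = np.argsort(logits)[-k:]              # indices of the k largest logits
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                       # softmax over the selected experts
    return sum(g * experts[i](x) for g, i in zip(gates, top))
```

When the router picks only zero-computation experts for a token, the cost of that token is a few elementwise operations rather than FFN matrix multiplications, which is the sense in which easy tokens can be handled quickly.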