---
title: "EMNLP2025 Eo: Expert Generalization in MoE in IFT"
source: https://www.jemoka.com/posts/kbhemnlp2025_eo_expert_generalization_in_moe/
---

## One-Liner

cluster the input, then activate a separate expert group for the assigned cluster.

## Motivation

- heterogeneity of the input instruction-tuning data poses difficulty for MoE
- routing only operates at the token level, so it can't deal with sequence-level generalization

## Novelty

Architecture to enable hierarchical expert routing.

## Notable Methods

### Mixture of Clustered Experts

Dual-stage routing mechanism.

- group the \(M\) experts into groups of \(N\) experts (i.e. \(M = \qty(N, \dots, N)\))
- k-means cluster the sequence embedding at input
- given the assigned cluster, only route to the assigned subgroup

## Results

- outperforms MoE baselines
- demonstrates expert group specialization
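
The dual-stage routing described under Mixture of Clustered Experts can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names (`kmeans`, `assign_cluster`, `route_tokens`), the use of the mean token vector as the sequence embedding, the linear router, and the top-k softmax gating are all assumptions made for the example. Stage 1 assigns a sequence to a cluster; stage 2 does ordinary token-level routing, but only over the experts in that cluster's group.

```python
import numpy as np

def _nearest(x, centroids):
    # index of the closest centroid (squared Euclidean) for every row of x
    d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return d.argmin(1)

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means over sequence embeddings x of shape (n, d)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        assign = _nearest(x, centroids)
        for c in range(k):
            if (assign == c).any():  # leave empty clusters where they are
                centroids[c] = x[assign == c].mean(0)
    return centroids, _nearest(x, centroids)

def assign_cluster(seq_emb, centroids):
    """Stage 1: map one sequence embedding (d,) to its nearest cluster id."""
    return int(_nearest(seq_emb[None, :], centroids)[0])

def route_tokens(tokens, router_w, cluster, group_size, top_k=2):
    """Stage 2: token-level top-k routing restricted to one expert group.

    tokens: (T, d); router_w: (d, num_groups * group_size).
    Returns global expert ids (T, top_k) and gate weights (T, top_k).
    """
    lo = cluster * group_size
    logits = tokens @ router_w[:, lo:lo + group_size]  # (T, group_size)
    top = np.argsort(-logits, axis=1)[:, :top_k]       # best experts in the group
    top_logits = np.take_along_axis(logits, top, axis=1)
    gates = np.exp(top_logits - top_logits.max(1, keepdims=True))
    gates /= gates.sum(1, keepdims=True)               # softmax over the chosen experts
    return top + lo, gates

# Hypothetical sizes: 3 groups of N = 4 experts, hidden size 8.
rng = np.random.default_rng(0)
d, num_groups, group_size = 8, 3, 4
centroids, _ = kmeans(rng.normal(size=(30, d)), num_groups)
router_w = rng.normal(size=(d, num_groups * group_size))
tokens = rng.normal(size=(5, d))
cluster = assign_cluster(tokens.mean(0), centroids)    # assumed: mean-pooled embedding
expert_ids, gates = route_tokens(tokens, router_w, cluster, group_size)
```

Every id in `expert_ids` falls inside the assigned group's range, which is the point of the second stage: tokens from one sequence never reach experts belonging to another cluster.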