---
title: "ICLR2025 Li: MoE is secretly an embedding"
source: https://www.jemoka.com/posts/kbhiclr2025_li_moe_is_secretly_an_embedding/
---

## Motivation

Can we directly extract embeddings from the routing weights of an MoE forward pass (i.e., instead of the traditional residual stream information)?

## Key Insight

Residual states and routing weights offer complementary strengths as semantic search embeddings (i.e., when one method fails, the other more often succeeds).

## Method

Create an aggregate embedding:

\begin{equation}
E_{j} = X_{j} + \alpha W_{j}
\end{equation}

where \(W_{j}\) is the routing weight of the residual, and \(X_{j}\) is the residual.
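
To make the aggregate concrete, here is a minimal sketch of \(E_{j} = X_{j} + \alpha W_{j}\) in plain Python. It rests on assumptions not stated in the note: that \(X_{j}\) and \(W_{j}\) are vectors of the same length (or have already been brought to a common length), and that \(\alpha\) is a scalar mixing weight. The function name `aggregate_embedding` is hypothetical and exists only for illustration.

```python
from typing import List, Sequence


def aggregate_embedding(
    residual: Sequence[float],
    routing: Sequence[float],
    alpha: float,
) -> List[float]:
    """Return E_j = X_j + alpha * W_j, computed elementwise.

    Assumption: `residual` (X_j) and `routing` (W_j) have the same length.
    The note does not say how the two are aligned when their sizes differ.
    """
    if len(residual) != len(routing):
        raise ValueError("residual and routing vectors must have the same length")
    return [x + alpha * w for x, w in zip(residual, routing)]


# Toy values chosen for illustration only, not taken from any model.
x_j = [1.0, 2.0]   # residual state X_j
w_j = [0.5, 0.5]   # routing weights W_j
e_j = aggregate_embedding(x_j, w_j, alpha=2.0)
print(e_j)  # [2.0, 3.0]
```

Setting `alpha=0.0` gives back the residual embedding on its own, so \(\alpha\) controls how much the routing signal contributes.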