---
title: "ICLR2025 Wu: Retrieval Head Explains Long Context"
source: https://www.jemoka.com/posts/kbhiclr2025_wu_retrieval_head_explains_long_context/
---

## Motivation

Previous work has found "heads" that each perform some specific mechanism, such as retrieval from the context.

## Retrieval Head

Using the Needle in a Haystack framework, the authors show that retrieval heads exist in transformers.

## Key Insight

Certain heads perform retrieval, as measured by the retrieval score.

## Methods

### Measuring Retrieval Behavior

"Retrieval score": how often a head engages in copy-paste behavior. A head copy-pastes the current token when both of the following hold:

- token inclusion: the current generated token \(w\) is in the needle
- maximal attention: the context position receiving the head's maximum attention score holds that same token
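The two copy-paste criteria and the resulting score can be sketched in plain Python. This is a minimal sketch that rests on several assumptions:

- A single head's attention weights arrive as one list per decoding step.
- Tokens are integer ids.
- The needle occupies a known contiguous span of context positions.

The function names are hypothetical. Two further choices follow my reading of the paper and should be treated as assumptions:

- The most-attended position must fall inside the needle.
- The score is normalized by the number of distinct needle tokens, so it lies in \([0, 1]\).

```python
from typing import Sequence


def is_copy_paste(
    generated_token: int,
    attention: Sequence[float],
    context_tokens: Sequence[int],
    needle_span: range,
) -> bool:
    """Return True if one head copy-pastes `generated_token` at one decoding step.

    attention[i] is the head's attention weight from the current decoding
    position to context position i; needle_span holds the needle's positions.
    """
    needle_tokens = {context_tokens[i] for i in needle_span}
    # Token inclusion: the generated token must occur in the needle.
    if generated_token not in needle_tokens:
        return False
    # Maximal attention: the most-attended position must hold the same token.
    # Also requiring that position to lie inside the needle is an assumption.
    top = max(range(len(attention)), key=lambda i: attention[i])
    return top in needle_span and context_tokens[top] == generated_token


def retrieval_score(
    generated_tokens: Sequence[int],
    attentions: Sequence[Sequence[float]],
    context_tokens: Sequence[int],
    needle_span: range,
) -> float:
    """Fraction of distinct needle tokens this head copy-pasted (assumed normalization)."""
    needle_tokens = {context_tokens[i] for i in needle_span}
    copied = {
        tok
        for tok, attn in zip(generated_tokens, attentions)
        if is_copy_paste(tok, attn, context_tokens, needle_span)
    }
    return len(copied & needle_tokens) / len(needle_tokens)


# Toy example: six context tokens, with the needle at positions 2, 3 and 4.
context = [10, 11, 12, 13, 14, 15]
needle = range(2, 5)
generated = [12, 13, 99]
attn_per_step = [
    [0.1, 0.1, 0.6, 0.1, 0.05, 0.05],  # peaks on position 2 (token 12): a copy
    [0.1, 0.5, 0.1, 0.2, 0.05, 0.05],  # peaks on position 1, outside the needle
    [0.2, 0.2, 0.2, 0.2, 0.1, 0.1],    # token 99 does not occur in the needle
]
print(retrieval_score(generated, attn_per_step, context, needle))  # -> 0.3333333333333333
```

In this toy run only the first step passes both checks, so the head copies one of the three needle tokens. The second step generates a needle token, but the head attends elsewhere. That case counts as generation without retrieval by this head.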