@jemoka / Jemoka Knowledge Base / wiki/papers/acl2025.md
---
title: "ACL 2025 Paper Notes"
type: paper-collection
source-count: 11
status: active
---

# ACL 2025 Paper Notes

11 papers reviewed.

### ACL2025 Huang: Making in Multi-Hop QA

Question: can we find a good context permutation to improve reasoning capabilities?

Notable Methods:

Two key evaluations:

- evaluating relationships between gold documents; performance relates to the distance between documents (but fine-tuning helps)
- investigating the effects of different attention masks (i.e., the use of prefix vs. continuation masks)

IC Score: an attention-based context attribution method.

New Concepts:

Key insight: correct answers will have a single peak of IC scores at go...

*[Full note](https://www.jemoka.com/posts/kbhacl2025_huang_making_in_multi_hop_qa/)*

### ACL2025 Index

Talks:

- ACL2025 Keynote: Luke Zettlemoyer
- ACL2025 Orals: Language Modeling 1
- ACL2025 Orals: QA

Posters:

- ACL2025 Monday Morning Posters
- ACL2025 Tuesday Morning Posters
- ACL2025 Tuesday Afternoon Posters

Takes:

- mayhaps we can apply the thoughtbubbles intuition to BLT token pruning?

*[Full note](https://www.jemoka.com/posts/kbhacl2025_index/)*

### ACL2025 Keynote: Luke Zettlemoyer

Naively: “almost everything comes from pretraining.” How much will simple supervision radically change the behavior of our language model?

Key Directions:

- data long-tail: tokenizer-free LLMs
- data modules: how do we specialize quickly?

Tokenizer-Free LM:

Byte-level LMs are just more expensive (i.e., there are just a bunch more residual streams, and that’s pretty bad). High-level intuition: take the input bytes, create some “strides”/“patches”, and then se...

*[Full note](https://www.jemoka.com/posts/kbhacl2025_keynote/)*

### ACL2025 Li: TokAlign Token Alignment

Method to adapt tokenization across models.

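
A minimal sketch of embedding-based token alignment, under stated assumptions: both vocabularies already have token embeddings in a comparable space (toy 2-D vectors here), and every name and value below is hypothetical. It builds a pairwise cosine-similarity grid between target and source tokens, then seeds each target token's embedding from its most similar source token. It illustrates the idea only and is not TokAlign's actual implementation; the tuning step is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alignment_grid(target_emb, source_emb):
    """Pairwise grid: grid[t][s] = cosine(target token t, source token s)."""
    return {
        t: {s: cosine(tv, sv) for s, sv in source_emb.items()}
        for t, tv in target_emb.items()
    }


def init_from_nearest(grid, source_model_emb):
    """Seed each target token with the model embedding of its most similar source token."""
    init = {}
    for t, row in grid.items():
        best = max(row, key=row.get)  # source token with the highest similarity
        init[t] = list(source_model_emb[best])
    return init


# Toy, hypothetical data: comparison embeddings for both vocabularies,
# plus the source model's own (3-D) embedding table.
source_emb = {"hello": [1.0, 0.0], "world": [0.0, 1.0]}
target_emb = {"hel": [0.9, 0.1], "wor": [0.2, 0.8]}
source_model_emb = {"hello": [0.5, -0.5, 0.1], "world": [-0.3, 0.7, 0.2]}

grid = alignment_grid(target_emb, source_emb)
new_emb = init_from_nearest(grid, source_model_emb)
```

Copying the single nearest neighbour is the simplest possible initialization; averaging over several similar tokens would be a natural variation.
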
Notable Methods:

- use pairwise cosine similarity between token embeddings to create a grid of alignment
- initialize new adapted embeddings from each id’s most similar tokens
- tune

*[Full note](https://www.jemoka.com/posts/kbhacl2025_li/)*

### ACL2025 Monday Morning Posters

- ACL2025 Zhang: FaithfulRAG: Fact level conflict modeling. Key insight: RAG performance degrades when the model's context and parametric knowledge mismatch; identify those conflicts and use a three-step iterative method to improve context faithfulness.
- ACL2025 Ding: LLM reasoning capability via scalable question synthesis. Key insight: generate free-form questions conditioned only on BOS, then distill and DPO to get a nice question generation dataset and directly fine-tune.
- ACL2025 Wen: synthetic data strategy ...

*[Full note](https://www.jemoka.com/posts/kbhacl2025_monday_morning_posters/)*

### ACL2025 Orals: Efficient NLP

*[Full note](https://www.jemoka.com/posts/kbhacl2025_orals_efficient_nlp/)*

### ACL2025 Orals: Language Modeling 1

- ACL2025 Li: TokAlign Token Alignment
- ACL2025 Pagoni: Patches Scale Better Than Tokens

*[Full note](https://www.jemoka.com/posts/kbhacl2025_orals_language_modeling_1/)*

### ACL2025 Orals: QA

- ACL2025 Huang: Making in Multi-Hop QA

*[Full note](https://www.jemoka.com/posts/kbhacl2025_orals/)*

### ACL2025 Pagoni: Patches Scale Better Than Tokens

One-Liner: “Patches in groups of tokenization scale better than tokens”

Motivation / Novelty:

- typical byte-level LMs are very expensive because there are many tokens
- it's hard to go beyond 4-6 bytes per token: Zipf’s Law
- so, we model them as token patches

Notable Methods:

token patch: “how do we segment the byte sequence into patches?” Insight: group predictable tokens after every hard choice! i.e., once you train a model, there are “obvious” patcher...

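
The "new patch after every hard choice" idea can be sketched as follows. This is an illustration under assumptions, not the paper's implementation: the per-byte `difficulty` scores stand in for something like a small model's next-byte uncertainty, and both the scores and the `threshold` are made up here.

```python
def segment_into_patches(data: bytes, difficulty: list[float], threshold: float) -> list[bytes]:
    """Split `data` into patches, opening a new patch at every hard byte.

    difficulty[i] is a hypothetical score for how hard byte i is to predict;
    a byte scoring above `threshold` starts a new patch, and the easy bytes
    that follow are grouped into the same patch.
    """
    if len(data) != len(difficulty):
        raise ValueError("need exactly one difficulty score per byte")

    patches = []
    start = 0
    for i in range(1, len(data)):
        if difficulty[i] > threshold:  # hard choice: close the current patch
            patches.append(data[start:i])
            start = i
    if data:
        patches.append(data[start:])  # final patch
    return patches


# Toy example with invented scores: the first byte of each word is "hard".
data = b"the cat"
difficulty = [2.5, 0.3, 0.2, 0.4, 2.1, 0.5, 0.3]
patches = segment_into_patches(data, difficulty, threshold=1.0)
```

With these scores the input splits into `b"the "` and `b"cat"`: each patch begins at a hard byte and absorbs the predictable bytes after it.
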
*[Full note](https://www.jemoka.com/posts/kbhacl2025_pagoni/)*

### ACL2025 Tuesday Morning Posters

- ACL2025 Katz: segment based attention masking. Key insight: allow bidirectional attention.
- ACL2025 Monodorf: exploring modular structures in transformer-based language models. Key insight: learn circuit compositions by learning a binary mask for both faithfulness and sparsity.
- ACL2025 Li: some more samples of next token prediction. Key insight: samples with a large gap between generation probability and ground truth cause a more dramatic effect when intervened on.
- ACL2025 Kim...

*[Full note](https://www.jemoka.com/posts/kbhacl2025_tuesday_morning_posters/)*

### ACL2025 Workshop: Web Agents

*[Full note](https://www.jemoka.com/posts/kbhacl2025_workshop_web_agents/)*