---
title: "ACL 2025 Paper Notes"
type: paper-collection
source-count: 11
status: active
---

# ACL 2025 Paper Notes

11 papers reviewed.

### ACL2025 Huang: Making in Multi-Hop QA

Question: can we find a good context permutation to improve reasoning capabilities?

Notable Methods: two key evaluations:

- evaluating relationships between gold documents; performance depends on the distance between documents (but fine-tuning helps)
- investigating the effects of different attention masks (i.e., prefix vs. continuation masks)

IC Score: an attention-based context attribution method.

New Concepts: key insight: correct answers will have a single peak of IC scores at go...

*[Full note](https://www.jemoka.com/posts/kbhacl2025_huang_making_in_multi_hop_qa/)*

### ACL2025 Index

Talks:

- ACL2025 Keynote: Luke Zettemoyer
- ACL2025 Orals: Language Modeling 1
- ACL2025 Orals: QA

Posters:

- ACL2025 Monday Morning Posters
- ACL2025 Tuesday Morning Posters
- ACL2025 Tuesday Afternoon Posters

Takes:

- mayhaps we can apply the thoughtbubbles intuition to BLT token pruning?

*[Full note](https://www.jemoka.com/posts/kbhacl2025_index/)*

### ACL2025 Keynote: Luke Zettemoyer

Naively: "almost everything comes from pretraining." How much simple supervision does it take to radically change the behavior of our language model?

Key Directions:

- data long-tail: tokenizer-free LLMs
- data modules: how do we specialize quickly?

Tokenizer-Free LM: byte-level LMs are just more expensive (i.e., there are just a bunch more residual streams, and that's pretty bad). High-level intuition: take the input bytes, create some "strides"/"patches", and then se...

*[Full note](https://www.jemoka.com/posts/kbhacl2025_keynote/)*

### ACL2025 Li: TokAlign Token Alignment

Method to adapt tokenization across models.

Notable Methods:

1. use pairwise cosine similarity between token embeddings to create an alignment grid
2. initialize new adapted embeddings for each id from its most similar tokens
3. tune
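The alignment and initialization steps noted here can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes both vocabularies' embeddings already live in a comparable space of the same dimension (the note does not say how that is arranged), and the function names `cosine`, `alignment_grid`, and `init_adapted` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def alignment_grid(tgt_emb, src_emb):
    """Pairwise similarity grid: one row per target id, one column per source id."""
    return [[cosine(t, s) for s in src_emb] for t in tgt_emb]


def init_adapted(tgt_emb, src_emb):
    """For each target id, copy the embedding of its most similar source id.

    Assumption: a single nearest neighbor is used; the tuning step from the
    note would start from these vectors and is not shown here.
    """
    grid = alignment_grid(tgt_emb, src_emb)
    best = [max(range(len(row)), key=row.__getitem__) for row in grid]
    return best, [list(src_emb[j]) for j in best]


# Toy embeddings, made up for illustration only.
src_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tgt_emb = [[0.0, 2.0], [3.0, 3.1], [5.0, 0.1]]
best, adapted = init_adapted(tgt_emb, src_emb)
print(best)
```

Copying a single nearest neighbor is the simplest choice; averaging the top few most similar source embeddings would be an equally plausible reading of the note.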

*[Full note](https://www.jemoka.com/posts/kbhacl2025_li/)*

### ACL2025 Monday Morning Posters

- ACL2025 Zhang: FaithfulRAG: Fact level conflict modeling. Key insight: RAG performance degrades when the model's context and parametric knowledge mismatch; identify those conflicts and use a three-step iterative method to improve context faithfulness.
- ACL2025 Ding: LLM reasoning capability via scalable question synthesis. Key insight: generate free-form questions conditioned only on BOS, then distill and DPO to get a nice question generation dataset and directly fine-tune.
- ACL2025 Wen: synthetic data strategy ...

*[Full note](https://www.jemoka.com/posts/kbhacl2025_monday_morning_posters/)*

### ACL2025 Orals: Efficient NLP

*[Full note](https://www.jemoka.com/posts/kbhacl2025_orals_efficient_nlp/)*

### ACL2025 Orals: Language Modeling 1

- ACL2025 Li: TokAlign Token Alignment
- ACL2025 Pagoni: Patches Scale Better Than Tokens

*[Full note](https://www.jemoka.com/posts/kbhacl2025_orals_language_modeling_1/)*

### ACL2025 Orals: QA

ACL2025 Huang: Making in Multi-Hop QA

*[Full note](https://www.jemoka.com/posts/kbhacl2025_orals/)*

### ACL2025 Pagoni: Patches Scale Better Than Tokens

One-Liner: "Patches in groups of tokenization scale better than tokens"

Motivation / Novelty:

- typical byte-level LMs are very expensive because there are many tokens
- it's hard to go beyond 4-6 bytes per token: Zipf's Law
- so, we model them as token patches

Notable Methods: token patch. "How do we segment the byte sequence into patches?" Insight: group predictable tokens after every hard choice! i.e., once you train a model, there are "obvious"
patcher...
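One way to picture "group predictable tokens after every hard choice" is a threshold rule over per-byte difficulty scores. The sketch below is an assumption-laden illustration, not the paper's patcher: the `difficulty` list stands in for whatever signal a trained model would give (for example, next-byte uncertainty), and `segment_into_patches` and `threshold` are hypothetical names.

```python
def segment_into_patches(data: bytes, difficulty: list[float], threshold: float) -> list[bytes]:
    """Start a new patch at every byte whose difficulty exceeds the threshold.

    Assumption: difficulty[i] scores how hard byte i is to predict from the
    bytes before it. Easy bytes are appended to the current patch.
    """
    if len(data) != len(difficulty):
        raise ValueError("need one difficulty score per byte")
    patches = []
    start = 0
    for i in range(1, len(data)):
        if difficulty[i] > threshold:
            patches.append(data[start:i])  # close the patch before the hard byte
            start = i
    if data:
        patches.append(data[start:])  # flush the final patch
    return patches


# Made-up scores: word starts are "hard", word continuations are "easy".
scores = [3.0, 0.2, 0.1, 2.5, 2.8, 0.3, 0.2]
patches = segment_into_patches(b"the cat", scores, threshold=1.0)
print(patches)
```

With this rule, patch length adapts to the data: long predictable runs collapse into one patch, while hard positions each open a new one.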

*[Full note](https://www.jemoka.com/posts/kbhacl2025_pagoni/)*

### ACL2025 Tuesday Morning Posters

- ACL2025 Katz: segment-based attention masking. Key insight: allow bidirectional attention.
- ACL2025 Monodorf: exploring modular structures in transformer-based language models. Key insight: learn circuit compositions by learning a binary mask for both faithfulness and sparsity.
- ACL2025 Li: some more samples of next token prediction. Key insight: when there's a large gap between the generation probability and the ground truth, intervening on those samples causes a more dramatic effect.
- ACL2025 Kim...
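For the segment-based attention masking entry above, the note only records "allow bidirectional attention". A rough sketch of one possible reading follows, under the assumption that tokens attend bidirectionally within their own segment and causally across segments; the function `segment_attention_mask` and the segment ids are hypothetical and not taken from the paper.

```python
def segment_attention_mask(segment_ids):
    """Boolean mask where mask[q][k] is True if query q may attend to key k.

    Assumption: segment_ids is non-decreasing along the sequence. A query
    sees every token in its own segment (bidirectional) and in all earlier
    segments (causal across segments).
    """
    n = len(segment_ids)
    return [[segment_ids[k] <= segment_ids[q] for k in range(n)] for q in range(n)]


def causal_mask(n):
    """Standard causal mask, for comparison."""
    return [[k <= q for k in range(n)] for q in range(n)]


# Two prompt segments of two tokens each.
mask = segment_attention_mask([0, 0, 1, 1])
for row in mask:
    print(["x" if allowed else "." for allowed in row])
```

Compared with `causal_mask(4)`, the first token of each segment can additionally see the later tokens of its own segment, which is the only difference under this reading.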

*[Full note](https://www.jemoka.com/posts/kbhacl2025_tuesday_morning_posters/)*

### ACL2025 Workshop: Web Agents

*[Full note](https://www.jemoka.com/posts/kbhacl2025_workshop_web_agents/)*