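---
title: "ACL2025 Pagoni: Patches Scale Better Than Tokens"
source: https://www.jemoka.com/posts/kbhacl2025_pagoni/
---

## One-Liner

“Patches (dynamically sized groups of bytes) scale better than tokens.”

## Motivation / Novelty

- Typical byte-level LMs are very expensive, because modeling raw bytes means many more sequence positions than modeling subword tokens.
- Subword tokenizers, on the other hand, rarely get beyond 4-6 bytes per token on average: by Zipf’s Law, growing the vocabulary mostly adds rare tokens, so average token length grows only slowly.
- So, we group the bytes into patches and model the patches instead.

## Notable Methods

### token patch

“How do we segment the byte sequence into patches?”

Insight: group predictable bytes after every hard choice! That is, once you train a (small, byte-level) model, there are “obvious” bytes that it predicts with low uncertainty. A new patch starts only at a hard-to-predict byte, and the obvious bytes that follow are absorbed into that patch.

The patcher (bytes to patches) and unpatcher (patches to bytes) connect the two representations via cross-attention.

## Key Figs

## New Concepts

## Notes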
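A minimal sketch of the “group predictable bytes after every hard choice” idea, under loudly stated assumptions: a toy bigram byte model (the hypothetical `train_bigram`) stands in for the small byte-level LM, and a single global `threshold` on next-byte entropy decides what counts as a “hard choice”. A byte opens a new patch when the model's entropy for predicting it exceeds the threshold; otherwise it joins the current patch. All names here are illustrative, not the paper's code.

```python
import math
from collections import Counter, defaultdict


def train_bigram(corpus: bytes) -> dict[int, Counter]:
    """Hypothetical stand-in for the small byte LM: counts of the next byte
    conditioned only on the previous byte."""
    counts: dict[int, Counter] = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1
    return counts


def next_byte_entropy(counts: dict[int, Counter], prev: int) -> float:
    """Shannon entropy (bits) of the model's next-byte distribution after `prev`.
    Unseen contexts get 8 bits, the entropy of a uniform distribution over 256 bytes."""
    dist = counts.get(prev)
    if not dist:
        return 8.0
    total = sum(dist.values())
    return -sum((c / total) * math.log2(c / total) for c in dist.values())


def entropy_patches(data: bytes, counts: dict[int, Counter], threshold: float) -> list[bytes]:
    """Split `data` into patches: byte i starts a new patch when the entropy of
    predicting it (given byte i-1) exceeds `threshold`; otherwise it is appended
    to the current patch."""
    if not data:
        return []
    patches, start = [], 0
    for i in range(1, len(data)):
        if next_byte_entropy(counts, data[i - 1]) > threshold:  # a "hard choice"
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches


if __name__ == "__main__":
    corpus = b"the cat sat on the mat. the dog sat on the log. " * 20
    model = train_bigram(corpus)
    print(entropy_patches(b"the cat sat on the log.", model, threshold=1.0))
```

Raising `threshold` yields fewer, longer patches (more bytes per patch, hence fewer steps for the large model), while `float("inf")` collapses the input into one patch. The patches always concatenate back to the original bytes, so the segmentation is lossless.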