@jemoka / Jemoka Knowledge Base / raw/paper/iclr2025/kbhiclr2025_kilani_mrt5_tokenizer_free.md
---
title: "ICLR2025 Kilani: MrT5 Tokenizer-Free"
source: https://www.jemoka.com/posts/kbhiclr2025_kilani_mrt5_tokenizer_free/
---

## Motivation

ByT5 is very expensive (because you have to carry a residual for every damn byte-level token).

## MrT5

MrT5 uses a soft attention masking gate at pretraining time to delete unused tokens; at inference time we use a hard cut.

Cool: MrT5 learns language-specific compression rates (different languages get different rates).
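A rough sketch of the soft-gate vs. hard-cut idea, not the paper's implementation. Everything named here is an assumption for illustration: the gate scale `K = -30`, the `K / 2` cut threshold, the helper names, and the hand-picked gate scores (in the real model the scores would come from a learned layer over the encoder hidden states).

```python
import math

K = -30.0  # assumed gate scale: gate values lie in (K, 0); near K means "delete"


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def delete_gate(scores, k=K):
    """Map one raw score per token to a gate value in (k, 0)."""
    return [k * sigmoid(s) for s in scores]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def soft_masked_attention(logits, gate):
    """Pretraining-style path: add each key's gate value to its attention logit.

    Tokens stay in the sequence, but heavily gated keys get ~0 attention weight,
    and the operation stays differentiable.
    """
    return [softmax([l + g for l, g in zip(row, gate)]) for row in logits]


def hard_delete(tokens, gate, k=K):
    """Inference-style path: physically drop tokens whose gate is below k / 2."""
    return [t for t, g in zip(tokens, gate) if g >= k / 2]


tokens = list(b"cat!")            # four byte tokens
scores = [-8.0, 8.0, -8.0, 8.0]   # hypothetical gate scores: delete tokens 1 and 3
gate = delete_gate(scores)

uniform_logits = [[0.0] * len(tokens) for _ in tokens]
attn = soft_masked_attention(uniform_logits, gate)  # soft: gated keys are ignored
kept = hard_delete(tokens, gate)                    # hard: sequence actually shrinks
compression = 1 - len(kept) / len(tokens)           # fraction of tokens deleted
```

The soft path saves no compute by itself, since every token is still processed; the payoff only shows up once the hard cut shortens the sequence that later layers have to carry.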