Suggest edit — ICLR2025 Snell: Optimality of Scaling LLM Test-Time Compute

Title

Name

Note

---
title: "ICLR2025 Snell: Optimality of Scaling LLM Test-Time Compute"
source: https://www.jemoka.com/posts/kbhiclr2025_snell_scaling_llm_test_time_compute/
---

Compute-Optimal Scaling Compute-Optimal Scaling is the notion of selecting the optimal configuration (beam width, search budget, etc.) dynamically / for binned question.
Approaches to &ldquo;Scaling Test-Time Compute&rdquo; Three primary approaches:
best-of-n: roll out a bunch, reject Beam Search: check against intermediate lookahead search: MCTSish (do lookahead rollouts) Key insight On easy qusetion, beam search shows over-optimization and best of n is good on medium/hard questions, beam search is better Lookahead seems bad?
Method Learning a Value Function [Wang 2312.08935]
Sequential vs. Parallel Sampling in Scaling Test Time Compute What&rsquo;s the trade-off between sampling a bunch in parallel vs. thinking sequentially.
Key insight on easy questions, sequential is best on harder questions, there&rsquo;s a good ratio of trade offs Test Time vs. Pretraining compute Key insight on easy questions, test time scaling is good on medium/hard questions, pretraining scaling is better Questions why do you think lookahead search works worse/not better than others? is there some problem specific heuristics that trigger this?