@jemoka / Jemoka Knowledge Base / wiki/concepts/betazero.md
---
title: "BetaZero"
type: concept
related: [Monte Carlo Tree Search, Belief, Double Progressive Widening, Letsdrive, Despot]
source: https://www.jemoka.com/posts/kbhbetazero/
confidence: high
status: active
---

## Background

Recall AlphaZero, whose tree search repeats four steps:

1. Selection (UCB 1, or DPW, etc.)
2. Expansion (generate possible belief nodes)
3. Simulation (if it's a brand new node: Rollout, etc.)
4. Backpropagation (backpropagate your values up)

## Key Idea

Remove the need for heuristics in MCTS, which removes inductive bias.

## Approach

We keep the ol' neural network, which maps a belief \(b_{t}\) to a policy \(p_{t}\) and a value \(v_{t}\):

\begin{equation}
f_{\theta}(b_{t}) = (p_{t}, v_{t})
\end{equation}

### Policy Evaluation

Do \(n\) episodes of MCTS, then use cross entropy to improve \(f\), treating the policy produced by MCTS as the ground truth policy.

### Action Selection

Uses Double Progressive Widening.

Importantly, there is no need to use a heuristic (or, worse yet, random Rollouts) for action selection.

## Difference vs. LetsDrive

- LetsDrive uses DESPOT
- BetaZero uses MCTS with belief states
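To make the Action Selection and Policy Evaluation steps above more concrete, here is a minimal Python sketch of the two ingredients they name: a progressive widening test and a cross-entropy loss against a search-derived policy. The function names, the defaults `k=1.0` and `alpha=0.5`, and the use of normalized root visit counts as the target (an AlphaZero-style choice) are all assumptions for illustration. This note does not specify BetaZero's exact target or hyperparameters.

```python
import math


def should_widen(num_children: int, num_visits: int,
                 k: float = 1.0, alpha: float = 0.5) -> bool:
    """Progressive widening test: add a child while |children| <= k * N^alpha.

    Double progressive widening applies this same test twice: once to the
    action children of a belief node, once to the belief children of an
    action node. k and alpha are hypothetical defaults.
    """
    return num_children <= k * (num_visits ** alpha)


def policy_target(visit_counts: dict) -> dict:
    """Normalize root visit counts into a target policy (assumed AlphaZero-style)."""
    total = sum(visit_counts.values())
    return {action: n / total for action, n in visit_counts.items()}


def cross_entropy(target: dict, predicted: dict, eps: float = 1e-12) -> float:
    """Cross entropy H(target, predicted) = -sum_a target(a) * log predicted(a)."""
    return -sum(p * math.log(predicted.get(action, 0.0) + eps)
                for action, p in target.items())


# Usage: a node visited 4 times with 2 children may still widen (2 <= 1 * 4**0.5).
can_add_action = should_widen(num_children=2, num_visits=4)

# Usage: MCTS visited "left" 3 times and "right" once; the network is uniform.
target = policy_target({"left": 3, "right": 1})
loss = cross_entropy(target, {"left": 0.5, "right": 0.5})
```

Minimizing `loss` with respect to the network parameters pulls the predicted policy toward what the search preferred, which is how the network can come to stand in for a hand-written heuristic or a random rollout.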