---
title: "POMCP"
source: https://www.jemoka.com/posts/kbhpomcp/
---

Previous Monte-Carlo tree search methods were not competitive with PBVI, SARSOP, etc., but those methods are affected by close-up history.

key point: Monte-Carlo rollouts with best-first tree search + an unweighted particle filter (instead of categorical beliefs)

## Background

- History: a trajectory \(h = \{a_1, o_1, \dots\}\)
- Generative model: we draw a random sample of a possible next state (weighted by the action you took, meaning an instantiation of \(s' \sim T(\cdot \mid s,a)\)) and the reward \(R(s,a)\) from the current state
- Rollout: keep sampling at each point, rolling out and calculating future reward

## Monte-Carlo tree search

1. loop:
    1. sample \(s\) from the belief distribution \(B(h)\) for each node and call that the node state
    2. loop until we reach a leaf:
        1. choose an action to explore using UCB1 via the belief
        2. get observation, reward, next state
    3. add leaf node, add node for each available action
    4. Rollout
    5. backpropagate the obtained value, with discounts, backwards via the POMDP Bellman Backup

During runtime, we choose the action with the best value, prune the tree given what you observed, and do this again.
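
The search loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the toy "tiger" problem, the names `generative_model`, `Node`, `simulate`, `plan`, and `prune`, the fixed search depth, and the exploration constant `c=200.0` (picked to roughly match the scale of the toy rewards) are all hypothetical. The backup here is a running mean of sampled discounted returns, which stands in for the Bellman backup mentioned above.

```python
import math
import random

GAMMA = 0.95
ACTIONS = ["listen", "open-left", "open-right"]  # hypothetical toy "tiger" problem


def generative_model(s, a, rng):
    """Sample (s', o, r); the state s is the door hiding the tiger."""
    if a == "listen":
        other = "left" if s == "right" else "right"
        o = s if rng.random() < 0.85 else other  # noisy observation
        return s, o, -1.0
    opened = "left" if a == "open-left" else "right"
    r = -100.0 if opened == s else 10.0
    return rng.choice(["left", "right"]), "reset", r  # tiger is re-placed


class Node:
    """One history h: visit counts, value estimates, particles, children."""

    def __init__(self):
        self.n = 0
        self.na = {a: 0 for a in ACTIONS}
        self.q = {a: 0.0 for a in ACTIONS}
        self.particles = []  # unweighted particle belief B(h)
        self.children = {}   # (action, observation) -> Node


def rollout(s, depth, rng):
    """Discounted return of a uniformly random policy for `depth` steps."""
    total, discount = 0.0, 1.0
    for _ in range(depth):
        s, _, r = generative_model(s, rng.choice(ACTIONS), rng)
        total += discount * r
        discount *= GAMMA
    return total


def ucb1(node, c):
    """Pick an untried action first, otherwise maximise the UCB1 score."""
    for a in ACTIONS:
        if node.na[a] == 0:
            return a
    return max(ACTIONS, key=lambda a: node.q[a]
               + c * math.sqrt(math.log(node.n) / node.na[a]))


def simulate(node, s, depth, rng, c):
    """Run one simulation from `node` with sampled state s; return its return."""
    if depth == 0:
        return 0.0
    node.particles.append(s)
    a = ucb1(node, c)
    s2, o, r = generative_model(s, a, rng)
    child = node.children.get((a, o))
    if child is None:  # reached a leaf: expand it, then roll out
        child = node.children[(a, o)] = Node()
        child.particles.append(s2)
        ret = r + GAMMA * rollout(s2, depth - 1, rng)
    else:
        ret = r + GAMMA * simulate(child, s2, depth - 1, rng, c)
    # Backup (assumption): running mean of the discounted returns.
    node.n += 1
    node.na[a] += 1
    node.q[a] += (ret - node.q[a]) / node.na[a]
    return ret


def plan(root, belief, n_sims, depth, rng, c=200.0):
    """Run n_sims simulations from particles in `belief`; return the best action."""
    for _ in range(n_sims):
        simulate(root, rng.choice(belief), depth, rng, c)
    return max(ACTIONS, key=lambda a: root.q[a])


def prune(root, a, o):
    """Keep only the subtree matching the executed action and observation."""
    return root.children.get((a, o), Node())
```

A typical use would be `root = Node()`, then `plan(root, ["left", "right"], 2000, 5, random.Random(0))` to pick an action, then `root = prune(root, action, observation)` after acting. The particles stored in the surviving node serve as the new belief, so no explicit belief update is needed; this is the sense in which the particle filter is unweighted.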