@jemoka / Jemoka Knowledge Base / raw/concept/kbhpomcpow.md
---
title: "POMCPOW"
source: https://www.jemoka.com/posts/kbhpomcpow/
---

POMDPs with continuous actions are hard. The usual starting point is POMCP, i.e. (belief update + MCTS).

So instead, let's try improving that. Unlike plain POMCP, not only do we have \(B(h)\), we also have \(W(h)\), which holds the weight of each sampled state. Naively applying POMCP to continuous states gives a very wide tree, because each sampled state will never be the same as one sampled before.

## double progressive widening

We want to use sampling to sample from the observation space. Widening over observations alone eventually leads to a suboptimal QMDP policy, because each new observation node ends up holding a single state, so there is no state uncertainty left below it.

## POMCPOW

1. Get an action from the ActionProgressiveWiden function.
2. Get an observation. If the action node already has too many observation children, we prune: discard the new observation and stick the next state onto a previous observation.
3. Weight the attached state by the observation likelihood \(Z(o|s,a,s')\).

Hyperparameters: \(k, \alpha, C\)

## PFT-DPW

1. MCTS
2. Particle filters
3. Double Progressive Widening
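
The POMCPOW observation step described above can be sketched roughly as follows. This is a minimal illustration, not the reference implementation: the names `ActionNode`, `widen_allowed`, `observation_step`, `sample_state`, and `gaussian_z` are hypothetical, the widening test \(|C| \le k N^{\alpha}\) is the usual progressive widening rule, and the 1-D Gaussian observation model is only an assumed stand-in for \(Z(o|s,a,s')\) (with \(s\) and \(a\) folded into the callable).

```python
import math
import random
from collections import defaultdict


class ActionNode:
    """Statistics for one history-action pair (h, a). Hypothetical container."""

    def __init__(self):
        self.n = 0                        # visit count N(ha)
        self.obs_counts = {}              # M(hao): how often each observation child was used
        self.states = defaultdict(list)   # B(hao): next states attached to each child
        self.weights = defaultdict(list)  # W(hao): likelihood weight of each attached state


def widen_allowed(num_children, visits, k, alpha):
    """Progressive widening test: allow a new child while |C| <= k * N^alpha."""
    return num_children <= k * visits ** alpha


def gaussian_z(o, s_next, sigma=1.0):
    """Assumed observation likelihood Z(o | s, a, s'): unnormalized 1-D Gaussian."""
    return math.exp(-0.5 * ((o - s_next) / sigma) ** 2)


def observation_step(node, s_next, o, likelihood, k, alpha, rng=random):
    """Attach (s_next, o) below `node` and return the observation actually used."""
    node.n += 1
    if widen_allowed(len(node.obs_counts), node.n, k, alpha):
        # Room to widen: keep the freshly generated observation as a child.
        node.obs_counts.setdefault(o, 0)
    else:
        # Too many children: discard o and reuse an existing child,
        # chosen in proportion to how often each child has been used.
        existing = list(node.obs_counts)
        counts = [node.obs_counts[x] for x in existing]
        o = rng.choices(existing, weights=counts)[0]
    node.obs_counts[o] += 1
    # Stick the next state onto the chosen observation, weighted by Z.
    node.states[o].append(s_next)
    node.weights[o].append(likelihood(o, s_next))
    return o


def sample_state(node, o, rng=random):
    """Draw a state from B(hao) in proportion to W(hao) to continue the simulation."""
    return rng.choices(node.states[o], weights=node.weights[o])[0]
```

With `k=1` and `alpha=0.0` the node is capped at two observation children, so repeated calls to `observation_step` keep piling weighted states onto those two children instead of growing the tree sideways. Larger `k` and `alpha` trade belief quality per node for more observation branches.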