@jemoka / Jemoka Knowledge Base / wiki/concepts/qmdp.md
---
title: "QMDP"
type: concept
related: [Alpha Vector]
source: https://www.jemoka.com/posts/kbhqmdp/
confidence: high
status: active
---

QMDP computes one alpha vector per action by iterating:

\begin{equation}
\alpha^{(k+1)}_{a}(s) = R(s,a) + \gamma \sum_{s'} T(s'|s,a) \max_{a'} \alpha^{(k)}_{a'}(s')
\end{equation}

This gives you a set of alpha vectors, one corresponding to each action.

Time complexity per iteration: \(O(|S|^{2}|A|^{2})\)

Note that we never actually use anything partially observable in this update. Once we have the alpha vectors, we need to use one-step lookahead in the POMDP (which does use transitions) to actually turn them into a policy.

We can deal with a continuous state space by using some estimate of the value function: instead of alpha vectors, we just use a value-function estimate, such as one from Q-learning.
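
The QMDP update can be sketched in a few lines of plain Python for a small discrete problem. This is a minimal illustration, not the source's implementation: the function names `qmdp_alpha_vectors` and `greedy_action`, the nested-list layout of `R` and `T`, the fixed iteration count, and the two-state toy problem are all assumptions made for the example. For brevity, `greedy_action` picks the action whose alpha vector has the largest inner product with the belief, which is a simpler stand-in for the full one-step lookahead.

```python
def qmdp_alpha_vectors(R, T, gamma, iterations=100):
    """Iterate the QMDP update and return alpha[a][s].

    R[s][a]     : immediate reward (assumed layout)
    T[s][a][s2] : probability of moving to s2 from s under action a (assumed layout)
    """
    n_states = len(R)
    n_actions = len(R[0])
    # One alpha vector per action, initialised to zero.
    alpha = [[0.0] * n_states for _ in range(n_actions)]
    for _ in range(iterations):
        # Written to mirror the equation term by term: the max over a' sits
        # inside the sum over s', giving O(|S|^2 |A|^2) work per iteration.
        alpha = [
            [
                R[s][a] + gamma * sum(
                    T[s][a][s2] * max(alpha[a2][s2] for a2 in range(n_actions))
                    for s2 in range(n_states)
                )
                for s in range(n_states)
            ]
            for a in range(n_actions)
        ]
    return alpha


def greedy_action(alpha, belief):
    """Pick argmax_a of (alpha_a . belief); a simplification of one-step lookahead."""
    values = [sum(w * b for w, b in zip(vec, belief)) for vec in alpha]
    return max(range(len(alpha)), key=lambda a: values[a])


# Hypothetical toy problem: two states, action 0 = stay, action 1 = switch.
# Staying in state 1 pays reward 1; everything else pays 0.
R = [[0.0, 0.0],
     [1.0, 0.0]]
T = [[[1.0, 0.0], [0.0, 1.0]],   # from state 0: stay, switch
     [[0.0, 1.0], [1.0, 0.0]]]   # from state 1: stay, switch
alpha = qmdp_alpha_vectors(R, T, gamma=0.5)
action_if_in_state_0 = greedy_action(alpha, [1.0, 0.0])
action_if_in_state_1 = greedy_action(alpha, [0.0, 1.0])
```

Nothing in `qmdp_alpha_vectors` touches an observation model, which matches the point above: the belief only enters when the alpha vectors are turned into an action.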