---
title: "Partially Observable Markov Decision Process"
source: https://www.jemoka.com/posts/kbhpartially_observable_markov_decision_process/
---

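A Partially Observable Markov Decision Process (POMDP) is a Markov Decision Process with state uncertainty: the agent cannot observe the state directly and instead receives observations that depend on it.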
Components:

- states
- actions (given state)
- transition function (given state and actions)
- reward function
- Belief System
  - beliefs
  - observations
  - observation model \(O(o|a,s')\)

As always, we desire to find a \(\pi\) such that we can:
\begin{equation} \underset{\pi \in \Pi}{\text{maximize}}\ \mathbb{E} \qty[ \sum_{t=0}^{\infty} \gamma^{t} R(b_{t}, \pi(b_{t}))] \end{equation}
whereby our \(\pi\), instead of taking in a state as input, takes in a belief (a distribution over possible states).
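One way to read \(R(b, a)\) in the objective above, assuming the reward function from the components list is defined over states and actions as \(R(s, a)\), is as the expected reward under the belief \(b\):

\begin{equation} R(b, a) = \sum_{s} b(s) R(s, a) \end{equation}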
observation and states: "where are we, and how sure are we about that?"

- beliefs and filters
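A belief is typically maintained by a filter. As a minimal sketch, assuming discrete states and the observation model \(O(o|a,s')\) from the components list, the update after taking action \(a\) and observing \(o\) is \(b'(s') \propto O(o|a,s') \sum_{s} T(s'|s,a) b(s)\). The function name `update_belief`, the nested-list layout of `T` and `O`, and the two-state "listen" example are illustrative assumptions, not part of the original note.

```python
def update_belief(belief, action, observation, T, O):
    """One step of a discrete Bayes filter (hypothetical helper).

    belief[s]          -- current probability of state s
    T[s][action][s2]   -- assumed layout for P(s2 | s, action)
    O[action][s2][o]   -- assumed layout for P(o | action, s2)
    """
    n = len(belief)
    # Predict: push the current belief through the transition model.
    predicted = [
        sum(belief[s] * T[s][action][s2] for s in range(n))
        for s2 in range(n)
    ]
    # Correct: weight each next state by the observation likelihood.
    unnormalized = [
        O[action][s2][observation] * predicted[s2] for s2 in range(n)
    ]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]


# Illustrative two-state problem: listening never changes the state,
# and the observation matches the true state 85% of the time.
LISTEN = 0
T = [[[1.0, 0.0]], [[0.0, 1.0]]]      # T[s][a][s2]
O = [[[0.85, 0.15], [0.15, 0.85]]]    # O[a][s2][o]

b0 = [0.5, 0.5]
b1 = update_belief(b0, LISTEN, 0, T, O)   # observe "state 0" once
b2 = update_belief(b1, LISTEN, 0, T, O)   # observe "state 0" again
```

Starting from a uniform belief, each consistent observation shifts more probability mass toward state 0, which is the "how sure are we" part of the question above.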
policy representations: "how do we represent a policy?"

- a tree: conditional plan
- a graph
- with utility: alpha vectors, where we just take the top action of the conditional plan each alpha vector was computed from

policy evaluations: "how good is our policy / what's the utility?"
- conditional plan evaluation

policy solutions: "how do we make that policy better?"
- exact solutions
  - optimal value function for POMDP
  - POMDP value-iteration
- approximate solutions: estimate an alpha vector, and then use a policy representation
  - upper-bounds for alpha vectors
  - lower-bounds for alpha vectors
- online solutions
  - Online POMDP Methods
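The approximate route above (estimate alpha vectors, then act through a policy representation) can be sketched with QMDP, one well-known way to obtain upper-bound alpha vectors; it is used here as an illustration and is not necessarily the method the note links to. QMDP runs value iteration on the underlying fully observable MDP and keeps one alpha vector per action. The policy then evaluates \(U(b) = \max_{\alpha} \alpha^{\top} b\) and takes the action attached to the maximizing alpha vector. All function names, the nested-list layouts, and the toy problem are assumptions made for this example.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def qmdp_alpha_vectors(T, R, gamma, iterations=100):
    """QMDP sketch: one alpha vector per action.

    T[s][a][s2] -- assumed layout for P(s2 | s, a)
    R[s][a]     -- assumed layout for the state-based reward
    """
    n_states, n_actions = len(R), len(R[0])
    alphas = [[0.0] * n_states for _ in range(n_actions)]
    for _ in range(iterations):
        # Value of each state if it were fully observed.
        v = [max(alphas[a][s] for a in range(n_actions))
             for s in range(n_states)]
        alphas = [
            [R[s][a] + gamma * sum(T[s][a][s2] * v[s2]
                                   for s2 in range(n_states))
             for s in range(n_states)]
            for a in range(n_actions)
        ]
    return alphas


def utility(belief, alphas):
    # U(b) = max over alpha vectors of alpha . b
    return max(dot(alpha, belief) for alpha in alphas)


def best_action(belief, alphas):
    # alphas[a] belongs to action a, so the arg-max index is the action.
    return max(range(len(alphas)), key=lambda a: dot(alphas[a], belief))


# Toy problem: the state never changes; action 0 pays 1 in state 0,
# action 1 pays 1 in state 1.
T = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]  # T[s][a][s2]
R = [[1.0, 0.0], [0.0, 1.0]]                               # R[s][a]
alphas = qmdp_alpha_vectors(T, R, gamma=0.5)

action_when_mostly_state_0 = best_action([0.8, 0.2], alphas)
action_when_mostly_state_1 = best_action([0.3, 0.7], alphas)
```

Because QMDP assumes the state becomes fully observable after one step, its alpha vectors tend to overestimate the true utility at convergence, which is why it serves as an upper bound rather than an exact solution.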