@jemoka / Jemoka Knowledge Base / raw/concept/kbhcontroller.md
---
title: "controller"
source: https://www.jemoka.com/posts/kbhcontroller/
---

a controller is a policy representation that maintains its own state.

## constituents

- \(X\): a set of nodes (hidden, internal states)
- \(\Psi(a|x)\): probability of taking action \(a\) while at node \(x\)
- \(\eta(x'|x,a,o)\): transition probability between hidden states

## requirements

Controllers are nice because:

- we don't have to maintain a belief over time: we need an initial belief, and then we can create beliefs as we'd like without much worry
- controllers can be made shorter than conditional plans

## additional information

### finite state controller

A finite state controller has a finite amount of hidden internal state.

Consider the crying baby problem. We will declare two internal states:

\begin{equation}
x_1, x_2
\end{equation}

Given our observations and our internal states, we can declare the transitions \(\eta\) and an action probability \(\Psi\).

We essentially declare a policy vis-à-vis our observations. It can be a sequence: for instance, a policy whereby you feed only after the baby cries twice can be declared as a chain of nodes, where each cry advances to the next node and the last node feeds.

#### finite state controller evaluation

\begin{equation}
U(x, s) = \sum_{a} \Psi(a|x) \qty[R(s,a) + \gamma \qty(\sum_{s'} T(s'|s,a) \sum_{o} O(o|a, s') \sum_{x'} \eta(x'|x,a,o) U(x', s')) ]
\end{equation}

which is conditional plan evaluation, except that we know even less: both the action and the next node are stochastic, so we take expectations over \(\Psi\) and \(\eta\) as well.

To construct alpha vectors:

\begin{equation}
\alpha_{x} = \qty[U(x, s_1), \dots, U(x, s_{n})]
\end{equation}

we just make one alpha vector per node. So the entire plan is represented as usual by \(\Gamma\), a set of alpha vectors. And yes, you can do alpha vector pruning.

\begin{align}
U(x,b) = b^{\top} \alpha_{x}
\end{align}

The node we want to start at is:

\begin{equation}
x^{*} = \arg\max_{x} U(x,b)
\end{equation}

### solving for \(\Psi\) and \(\eta\)

- policy iteration: incrementally add nodes and evaluate the resulting controller
- nonlinear programming: this can be posed as a nonlinear optimization problem
- controller gradient ascent
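
The finite state controller evaluation above can be sketched in code. What follows is a minimal illustration, not the note's own implementation: it assumes a toy crying baby model with two states, two actions, and two observations, and every number in `T`, `O`, and `R`, as well as the particular two-node controller (`PSI`, `eta`), is a made-up assumption chosen for readability. It solves for \(U(x, s)\) by repeatedly applying the evaluation equation (a fixed-point iteration), builds one alpha vector per node, and picks the starting node for an initial belief.

```python
from itertools import product

# Toy crying baby model. All names and numbers here are illustrative assumptions.
STATES = ["hungry", "sated"]
ACTIONS = ["feed", "ignore"]
OBS = ["cry", "quiet"]
NODES = ["x1", "x2"]
GAMMA = 0.9


def T(s2, s, a):
    """T(s'|s,a): feeding sates the baby; otherwise hunger persists,
    and a sated baby becomes hungry with probability 0.1 (assumed)."""
    if a == "feed":
        p_hungry = 0.0
    elif s == "hungry":
        p_hungry = 1.0
    else:
        p_hungry = 0.1
    return p_hungry if s2 == "hungry" else 1.0 - p_hungry


def O(o, a, s2):
    """O(o|a,s'): a hungry baby cries more often (assumed 0.8 vs 0.1)."""
    p_cry = 0.8 if s2 == "hungry" else 0.1
    return p_cry if o == "cry" else 1.0 - p_cry


def R(s, a):
    """R(s,a): hunger costs 10, feeding costs 5 (assumed)."""
    return (-10.0 if s == "hungry" else 0.0) + (-5.0 if a == "feed" else 0.0)


# Hypothetical controller: node x1 ignores, node x2 feeds.
PSI = {"x1": {"feed": 0.0, "ignore": 1.0},
       "x2": {"feed": 1.0, "ignore": 0.0}}


def eta(x2, x, a, o):
    """eta(x'|x,a,o): go to the feeding node after a cry, else to x1."""
    target = "x2" if o == "cry" else "x1"
    return 1.0 if x2 == target else 0.0


def backup(U, x, s):
    """Right-hand side of the controller evaluation equation for (x, s)."""
    total = 0.0
    for a in ACTIONS:
        future = sum(
            T(s2, s, a) * O(o, a, s2) * eta(x2, x, a, o) * U[(x2, s2)]
            for s2, o, x2 in product(STATES, OBS, NODES)
        )
        total += PSI[x][a] * (R(s, a) + GAMMA * future)
    return total


def evaluate(iters=500):
    """Iterate the backup until U(x, s) has (numerically) converged."""
    U = {(x, s): 0.0 for x in NODES for s in STATES}
    for _ in range(iters):
        U = {(x, s): backup(U, x, s) for (x, s) in U}
    return U


U = evaluate()

# One alpha vector per node: alpha_x = [U(x, s_1), ..., U(x, s_n)].
alpha = {x: [U[(x, s)] for s in STATES] for x in NODES}


def utility(x, b):
    """U(x, b) = b^T alpha_x for a belief b ordered like STATES."""
    return sum(bi * ai for bi, ai in zip(b, alpha[x]))


b0 = [0.5, 0.5]  # assumed initial belief over (hungry, sated)
x_star = max(NODES, key=lambda x: utility(x, b0))
print(alpha, x_star)
```

Because the evaluation equation is linear in \(U\), the same values could also be obtained with a direct linear solve over the \(|X| \cdot |S|\) unknowns; the iteration is used here only to keep the sketch dependency-free.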