[[
wikihub
]]
Search
⌘K
Explore
People
For Agents
Sign in
Explore
People
For Agents
Sign in
@jemoka / Jemoka Knowledge Base / wiki/concepts/hsvi.md
Suggest edit
Cancel
Submit suggestion
Title
Name
Note
--- title: "HSVI" type: concept related: [Point Based Value Iteration, Fast Informed Bound, Hsvi, Alpha Vector, Blind Lower Bound] source: https://www.jemoka.com/posts/kbhhsvi/ confidence: high status: active --- Improving PBVI without sacrificing quality. Initialization We first initialize HSVI with a set of alpha vectors \(\Gamma\), representing the lower-bound, and a list of tuples of \((b, U(b))\) named \(\Upsilon\), representing the upper-bound. We call the value functions they generate as \(\bar{V}\) and \(\underline V\). Lower Bound Set of alpha vectors: best-action worst-state (HSVI1), blind lower bound (HSVI2) Calculating \(\underline{V}(b)\) \begin{equation} \underline{V}_{\Gamma} = \max_{\alpha} \alpha^{\top}b \end{equation} Upper Bound Fast Informed Bound solving fully-observable MDP Project \(b\) into the point-set Projected the upper bound onto a convex hull (HSVI2: via approximate convex hull projection) Calculating \(\bar{V}(b)\) Recall that though the lower-bound is given by alpha vectors, the upper bound is given in terms of a series of tuples \((b, U(b)) \in \Upsilon\). HSVI1: we figure the upper bound for any given \(b\) by projecting onto the convex hull formed by points on \(\Upsilon\) HSVI2: approximate linear projection Update Begin with state \(b = b_0\). Repeat: at every step, we perform a local update for upper and lower bound using the current \(b\) the lower bound is updated using PBVI Backup on \(b, \Gamma\) the upper bound is updated using POMDP Bellman Update on \(b, \Upsilon\), putting the new \((b, u(b))\) in the set \(\Upsilon\). Then, we update our belief via the usual: \begin{equation} b \leftarrow update(b, a^{*}, o^{*}) \end{equation} where \(a^{*}\) and \(o^{*}\) are determined by… IE-MAX Heuristic IE-MAX Heuristic is used to determine \(a^{*}\), whereby we choose the action such that: \begin{equation} a^{*} = \arg\max_{a}Q^{(\bar{V})}(b) \end{equation} yes, we choose the next action which maximizes the upper bound of the utility we can get. weighted excess uncertainty weighted excess uncertainty is used to determine \(o^{*}\). Suppose we are depth \(d\) loops in the search tree (i.e. this is our $d$th chain), we define: \begin{equation} \text{excess}(b,t) = (\bar{V}(b)-\underline{V}(b)) - \epsilon \gamma^{-t} \end{equation} “how far away are we from converging to a value uncertainty of no more than \(\epsilon\), given we are depth \(t\) in? and, we choose the observation \(o^{*}\) such that: \begin{equation} o^{*} = \arg\max_{o} \qty[p(o|b,a^{*}) \text{excess}(update(b,a,o), t+1)] \end{equation} where, \begin{align} P(o|b,a) &= \sum_{s}^{} p(o|s,a) b(s) \\ &= \sum_{s}^{} \sum_{s’}^{} T(s’|s,a) O(o|s’,a) b(s) \end{align}