Suggest edit — HSVI

Title

Name

Note

---
title: "HSVI"
type: concept
related: [Point Based Value Iteration, Fast Informed Bound, Hsvi, Alpha Vector, Blind Lower Bound]
source: https://www.jemoka.com/posts/kbhhsvi/
confidence: high
status: active
---

Improving PBVI without sacrificing quality.
Initialization We first initialize HSVI with a set of alpha vectors $\Gamma$, representing the lower-bound, and a list of tuples of $(b, U(b))$ named $\Upsilon$, representing the upper-bound. We call the value functions they generate as $\bar{V}$ and $\underline V$.
Lower Bound Set of alpha vectors: best-action worst-state (HSVI1), blind lower bound (HSVI2)
Calculating $\underline{V}(b)$ \begin{equation} \underline{V}_{\Gamma} = \max_{\alpha} \alpha^{\top}b \end{equation}
Upper Bound Fast Informed Bound
solving fully-observable MDP Project $b$ into the point-set Projected the upper bound onto a convex hull (HSVI2: via approximate convex hull projection) Calculating $\bar{V}(b)$ Recall that though the lower-bound is given by alpha vectors, the upper bound is given in terms of a series of tuples $(b, U(b)) \in \Upsilon$.
HSVI1: we figure the upper bound for any given $b$ by projecting onto the convex hull formed by points on $\Upsilon$ HSVI2: approximate linear projection Update Begin with state $b = b_0$.
Repeat:
at every step, we perform a local update for upper and lower bound using the current $b$
the lower bound is updated using PBVI Backup on $b, \Gamma$ the upper bound is updated using POMDP Bellman Update on $b, \Upsilon$, putting the new $(b, u(b))$ in the set $\Upsilon$. Then, we update our belief via the usual:
\begin{equation} b \leftarrow update(b, a^{*}, o^{*}) \end{equation}
where $a^{*}$ and $o^{*}$ are determined by&hellip;
IE-MAX Heuristic IE-MAX Heuristic is used to determine $a^{*}$, whereby we choose the action such that:
\begin{equation} a^{*} = \arg\max_{a}Q^{(\bar{V})}(b) \end{equation}
yes, we choose the next action which maximizes the upper bound of the utility we can get.
weighted excess uncertainty weighted excess uncertainty is used to determine $o^{*}$. Suppose we are depth $d$ loops in the search tree (i.e. this is our $d$th chain), we define:
\begin{equation} \text{excess}(b,t) = (\bar{V}(b)-\underline{V}(b)) - \epsilon \gamma^{-t} \end{equation}
&ldquo;how far away are we from converging to a value uncertainty of no more than $\epsilon$, given we are depth $t$ in?
and, we choose the observation $o^{*}$ such that:
\begin{equation} o^{*} = \arg\max_{o} \qty[p(o|b,a^{*}) \text{excess}(update(b,a,o), t+1)] \end{equation}
where,
\begin{align} P(o|b,a) &amp;= \sum_{s}^{} p(o|s,a) b(s) \\ &amp;= \sum_{s}^{} \sum_{s&rsquo;}^{} T(s&rsquo;|s,a) O(o|s&rsquo;,a) b(s) \end{align}