---
title: "Approximate Value Function"
type: concept
related: [Action Value Function, Kernel Smoothing, Markov Decision Process, Utility Function, Utility Theory]
source: https://www.jemoka.com/posts/kbhapproximate_value_function/
confidence: high
status: active
---

How do we solve a Markov Decision Process with a continuous state space?
Let there be a value function parameterized on \(\theta\):
\begin{equation} U_{\theta}(s) \end{equation}
Let us find the value-function policy of this utility:
\begin{equation} \pi(s) = \arg\max_{a} \qty(R(s,a) + \gamma \sum_{s'}^{} T(s'|s,a) U_{\theta}(s')) \end{equation}
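The policy above can be sketched in code as a one-step lookahead. This is a minimal illustration, not a definitive implementation: the names `greedy_policy`, `toy_R`, `toy_T`, and `toy_U` are hypothetical, the sum over \(s'\) is taken over an explicit finite list `next_states`, and `toy_U` is only a stand-in for a fitted \(U_{\theta}\).

```python
def greedy_policy(s, actions, R, T, next_states, U, gamma=0.9):
    """Pick argmax_a of R(s, a) + gamma * sum_{s'} T(s' | s, a) * U(s')."""
    def lookahead(a):
        expected_next = sum(T(sp, s, a) * U(sp) for sp in next_states)
        return R(s, a) + gamma * expected_next
    return max(actions, key=lookahead)


# Hypothetical toy problem: states on a grid in [0, 1], actions step left/right.
def toy_R(s, a):
    return 0.0  # no immediate reward, so the choice depends on U alone


def toy_T(sp, s, a):
    target = min(1.0, max(0.0, s + 0.5 * a))  # deterministic, clamped move
    return 1.0 if sp == target else 0.0


def toy_U(s):
    return s  # stand-in for a fitted U_theta: utility grows with s


best = greedy_policy(0.5, [-1, 1], toy_R, toy_T, [0.0, 0.5, 1.0], toy_U)
```

Because `toy_U` increases with the state, the lookahead prefers the action that moves right from `s = 0.5`.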
We now create a finite sampling of our state space, which may be infinitely large (for instance, continuous):
\begin{equation} S \subseteq \mathcal{S} \end{equation}
where \(S\) is a set of discrete states \(\{s_1, \dots, s_{m}\}\).
Now, what next? Generally:

- Initialize \(U_{\theta}\)
- Loop until convergence:
    - For all \(s_{i} \in S\), let \(u_{i} = \max_{a} \qty(R(s_{i},a) + \gamma \sum_{s'}^{} T(s'|s_{i},a) U_{\theta}(s'))\), the utility at those discrete state samples \(s_{i}\)
    - Then, fit a \(\theta\) so that \(U_{\theta}(s_{i})\) is close to \(u_{i}\)
- To get \(T\): take a finite sampling of next states, or fit a function to it
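
The loop above can be sketched as follows, assuming the simplest global approximator, a line \(U_{\theta}(s) = \theta_{0} + \theta_{1} s\), over a one-dimensional state. Everything named here (`fit_line`, `fitted_value_iteration`, the toy reward and transition model) is a hypothetical illustration; next states are given as a finite list of `(probability, state)` pairs, which is one way of getting a finite sampling for \(T\).

```python
def fit_line(xs, ys):
    """Least-squares fit of y ~ theta0 + theta1 * x; returns (theta0, theta1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    theta1 = sxy / sxx
    return (y_bar - theta1 * x_bar, theta1)


def fitted_value_iteration(samples, actions, R, next_state_dist,
                           gamma=0.9, max_iters=500, tol=1e-8):
    theta = (0.0, 0.0)  # initialize U_theta to zero everywhere
    for _ in range(max_iters):
        def U(s, th=theta):
            return th[0] + th[1] * s
        # Backup: u_i = max_a R(s_i, a) + gamma * sum_{s'} T(s'|s_i, a) U(s')
        targets = [
            max(R(s, a) + gamma * sum(p * U(sp) for p, sp in next_state_dist(s, a))
                for a in actions)
            for s in samples
        ]
        # Fit theta so that U_theta(s_i) is close to u_i
        new_theta = fit_line(samples, targets)
        change = max(abs(n - o) for n, o in zip(new_theta, theta))
        theta = new_theta
        if change < tol:
            break  # stop on small parameter change or after max_iters
    return theta


# Hypothetical toy MDP on [0, 1]: reward equals the state, moves are noisy.
def toy_reward(s, a):
    return s


def toy_next_state_dist(s, a):
    moved = min(1.0, max(0.0, s + a))
    return [(0.8, moved), (0.2, s)]  # 80% move as intended, 20% stay put


samples = [i / 10 for i in range(11)]
theta = fitted_value_iteration(samples, [-0.1, 0.1], toy_reward, toy_next_state_dist)
```

The `max_iters` cap is there because the parameters are not guaranteed to settle; the loop stops either when \(\theta\) changes by less than `tol` or when the cap is reached.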
BUT: Convergence is not guaranteed.
There are two main specific approaches to achieve this:
- global approximation
    - linreg: a best-fit line of state value vs. utility value
    - polynomial fit: a best-fit polynomial, whereby \(U_{\theta}(s) = \theta^{T}\beta(s)\), where each \(\beta_{j}(s)=s^{j-1}\)
    - a neural network: train a model with parameters \(\theta\) which produces the utility calculations for you, \(M_{\theta}(s) = U_{\theta}(s)\)
- local approximation
    - make a sampling in your continuous state space to discretize it
    - do any utility function thing you'd like (policy evaluation or value iteration) to get some set of \(\theta_{i}\), which is the utility for being in each sampled discrete state \(s_{i}\)
    - whenever you need to calculate \(U(s)\) of a particular state, use one of:
        - linear interpolation
        - k nearest neighbor
        - kernel smoothing
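
The three local lookups can be sketched for a one-dimensional state as below. This is a rough illustration under stated assumptions: the function names are hypothetical, `theta[i]` holds the utility already computed for sampled state `sample_states[i]`, the samples are sorted ascending for interpolation, and a Gaussian kernel is one possible choice for the smoothing weights.

```python
import math
from bisect import bisect_right


def interpolate_value(s, sample_states, theta):
    """1-D linear interpolation; sample_states must be sorted ascending."""
    if s <= sample_states[0]:
        return theta[0]
    if s >= sample_states[-1]:
        return theta[-1]
    hi = bisect_right(sample_states, s)  # first sample strictly greater than s
    lo = hi - 1
    t = (s - sample_states[lo]) / (sample_states[hi] - sample_states[lo])
    return (1 - t) * theta[lo] + t * theta[hi]


def knn_value(s, sample_states, theta, k=2):
    """Average the stored utilities of the k sampled states nearest to s."""
    order = sorted(range(len(sample_states)), key=lambda i: abs(sample_states[i] - s))
    nearest = order[:k]
    return sum(theta[i] for i in nearest) / len(nearest)


def kernel_value(s, sample_states, theta, bandwidth=0.1):
    """Gaussian-kernel-weighted average of the stored utilities.

    Assumes s is close enough to some sample that the weights do not all
    underflow to zero.
    """
    weights = [math.exp(-((s - si) ** 2) / (2 * bandwidth ** 2)) for si in sample_states]
    total = sum(weights)
    return sum(w * t for w, t in zip(weights, theta)) / total


# Hypothetical stored utilities at three sampled states.
sample_states = [0.0, 0.5, 1.0]
theta = [0.0, 1.0, 4.0]

u_interp = interpolate_value(0.25, sample_states, theta)
u_knn = knn_value(0.4, sample_states, theta, k=2)
u_kernel = kernel_value(0.4, sample_states, theta)
```

All three return a weighted combination of the stored \(\theta_{i}\); they differ only in which samples contribute and how much weight each gets.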