[[
wikihub
]]
Search
⌘K
Explore
People
For Agents
Sign in
Explore
People
For Agents
Sign in
@jemoka / Jemoka Knowledge Base / wiki/concepts/approximate_value_function.md
Suggest edit
Cancel
Submit suggestion
Title
Name
Note
--- title: "Approximate Value Function" type: concept related: [Action Value Function, Kernel Smoothing, Markov Decision Process, Utility Function, Utility Theory] source: https://www.jemoka.com/posts/kbhapproximate_value_function/ confidence: high status: active --- How do we deal with Markov Decision Process solution with continuous state space? Let there be a value function parameterized on \(\theta\): \begin{equation} U_{\theta}(s) \end{equation} Let us find the value-function policy of this utility: \begin{equation} \pi(s) = \arg\max_{a} \qty(R(s,a) + \gamma \sum_{s’}^{} T(s’|s,a) U_{\theta}(s’)) \end{equation} We now create a finite sampling of our state space, which maybe infinitely large (for instance, continuous): \begin{equation} S \in \mathcal{S} \end{equation} where, \(S\) is a set of discrete states \(\{s_1, \dots, s_{m}\}\). Now, what next? generally: Loop until convergence: Initialize \(u_{\theta}\) For all \(s_{i} \in S\), let \(u_{i} = \max_{a} R(s,a) + \gamma \sum_{s’}^{}T(s’|s,a) u_{\theta}(s’)\), the utility at those discrete state samples \(s_{i}\) Then, fit a \(\theta\) so that \(U_{\theta}(s_{i})\) is close to \(u_{i}\) to get \(T\): get a finite sampling of next states, or fit a function to it. BUT: Convergence is not guaranteed. There are two main specific approaches to achieve this: global approximation linreg a best-fit line of state value vs. utility value polynomial fit a best-fit line, whereby \(U_{\theta}(s) = \theta^{T}\beta(s)\), where each \(\beta_{j}(s)=s^{j-1}\). a frigin neural network (train a model with parameters \(\theta\) which produces the utility calculations for you \(M_{\theta}(s) = U_{\theta}(s)\)) local approximation make a sampling in your continuous state space to discretized it do any utility function thing you’d like (policy evaluation or value iteration) to get some set of \(\theta_{i}\), which is the utility for being in each sampled discrete state \(s_{i}\) whenever you need to calculate \(U(s)\) of a particular state… linearly interpolate k nearest neighbor kernel smoothing