---
title: "continuous state MDP"
type: concept
related: [Value Iteration, Markov Decision Process]
source: https://www.jemoka.com/posts/kbhcontinuous_state_mdp/
confidence: high
status: active
---

The Bellman equation and related methods such as value iteration are really designed for discrete state spaces. However, we'd really like to support continuous state spaces as well! Suppose we have \(S \subseteq \mathbb{R}^{n}\); what can we do?
Discretization
We can pretend that our system is a discrete-state MDP by chopping the state space up into small blocks. The resulting \(V\) is then a step function over the original continuous space: constant within each block. Recall that this can blow up quickly: for \(S \subseteq \mathbb{R}^{n}\), dividing each axis into \(k\) values gives \(k^{n}\) blocks!
Also, instead of using the same grid size for every state variable, we may use more steps along the state variables to which the output is more sensitive.
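A minimal sketch of the discretization idea, assuming a hypothetical 2-D state (position and velocity) with hand-picked bounds. The names `cell_index`, `num_cells`, `lows`, `highs`, and `bins` are illustrative and not from the original note.

```python
from math import prod

def cell_index(state, lows, highs, bins):
    """Map a continuous state to a tuple of per-axis bin indices."""
    index = []
    for x, lo, hi, k in zip(state, lows, highs, bins):
        # Fraction of the way along this axis, scaled to k bins.
        i = int((x - lo) / (hi - lo) * k)
        # Clamp so out-of-range states fall into the nearest edge bin.
        index.append(min(max(i, 0), k - 1))
    return tuple(index)

def num_cells(bins):
    """Total number of blocks: the product of the per-axis bin counts."""
    return prod(bins)

# Hypothetical bounds: position in [-1, 1], velocity in [-2, 2].
lows, highs = (-1.0, -2.0), (1.0, 2.0)
bins = (4, 8)  # more steps on the (assumed) more sensitive second axis

V = {}  # step-function value table, keyed by block
cell = cell_index((0.3, -1.9), lows, highs, bins)
V[cell] = 0.0
```

With a uniform grid, `num_cells([k] * n)` equals `k ** n`, which is the blow-up described above: 10 steps on each of 6 axes already gives a million blocks.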
Using a Function
Use a traditional function approximator (linear regression, neural network, etc.) as a proxy for \(V\).
To do this, we need a model \(f\) of the MDP that predicts the next state, \(s' = f\qty(s,a)\), so that \(s'\) approximates a draw \(s' \sim T\qty(\cdot|s,a)\). The model may also be stochastic, namely \(s' = f\qty(s,a) + \varepsilon\) with Gaussian noise \(\varepsilon \sim \mathcal{N}\), which makes for a more robust model. You can obtain \(f\) from data or from physics / expert design.
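As a rough illustration of such a model, here is a sketch that assumes a hypothetical point-mass system with state (position, velocity) and independent zero-mean Gaussian noise on each component. The dynamics `f`, the time step `DT`, and the noise scale `sigma` are all made up for the example.

```python
import random

DT = 0.1  # hypothetical time step

def f(s, a):
    """Hypothetical deterministic model: a point mass pushed by action a."""
    pos, vel = s
    return (pos + DT * vel, vel + DT * a)

def sample_next_state(s, a, sigma=0.05, rng=random):
    """Draw s' = f(s, a) + eps, with eps ~ N(0, sigma^2) on each component."""
    return tuple(x + rng.gauss(0.0, sigma) for x in f(s, a))

rng = random.Random(0)  # seeded generator so runs are reproducible
s_next = sample_next_state((0.0, 1.0), 2.0, rng=rng)
```

Setting `sigma=0.0` recovers the deterministic model, so the same function can serve both cases.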
After we have this, given a state \(s\), call \(\phi\qty(s)\) the features of state \(s\). Then, we write:
\begin{equation} V\qty(s) = \theta^{T} \phi\qty(s) \end{equation}
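To make the linear form concrete, here is a small sketch assuming a scalar state and hypothetical polynomial features. The feature map and the parameter values are illustrative only, not learned.

```python
def phi(s):
    """Hypothetical features of a scalar state: bias, linear, quadratic."""
    return [1.0, s, s * s]

def value(theta, s):
    """Linear value estimate: the dot product of theta and phi(s)."""
    return sum(t * x for t, x in zip(theta, phi(s)))

theta = [0.5, 0.0, -1.0]  # illustrative parameters, not learned
v = value(theta, 2.0)
```

The estimate is linear in the parameters even though it is quadratic in the state, which is what lets ordinary regression fit it.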
Recall also, if we determinize our MDP, we have the Bellman equation as:
\begin{equation} V\qty(s) = R\qty(s) + \gamma \max_{a} V\qty(s'), \text{ where } s' = T\qty(s,a) \end{equation}
Now we have all the pieces to perform a particular type of value iteration:
Fitted Value Iteration
Sample \(s^{(1)}, \dots, s^{(n)} \in S\). Initialize parameters \(\theta\) to seed a model \(V_{\theta}\qty(s) = \theta^{T} s\).
Repeat:
for each \(i = 1, \dots, n\), compute \(y^{(i)} = R\qty(s^{(i)}) + \gamma \max_{a} V_{\theta}\qty(T\qty(s^{(i)},a))\);
then update your model \(V_{\theta}\) as usual, fitting \(\theta\) by regression so that \(V_{\theta}\qty(s^{(i)}) \approx y^{(i)}\).
If the transitions are stochastic, we can just sample the next state several times (say, 10) for each action and average \(V_{\theta}\) over the samples, approximating the expectation over next states in a Monte Carlo way.
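The loop above can be sketched in plain Python. This is only an illustration under stated assumptions: a hypothetical 1-D toy MDP in which each action scales the state, reward \(R(s) = -s^{2}\), quadratic features, and a least-squares fit via the normal equations. None of these choices come from the original note.

```python
import random

GAMMA = 0.9
ACTIONS = (0.5, 0.9)  # hypothetical actions: each one scales the state

def R(s):
    return -s * s  # assumed reward: penalize distance from the origin

def T(s, a):
    return a * s  # assumed deterministic transition

def phi(s):
    return [1.0, s, s * s]  # assumed features: bias, linear, quadratic

def V(theta, s):
    return sum(t * x for t, x in zip(theta, phi(s)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def least_squares(states, targets):
    """Fit theta minimizing the squared error between V(theta, s_i) and y_i."""
    feats = [phi(s) for s in states]
    d = len(feats[0])
    A = [[sum(f[j] * f[k] for f in feats) for k in range(d)] for j in range(d)]
    b = [sum(f[j] * y for f, y in zip(feats, targets)) for j in range(d)]
    return solve(A, b)

def fitted_value_iteration(states, iterations=50):
    theta = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        # Bellman backup at each sampled state, using the current fit.
        targets = [R(s) + GAMMA * max(V(theta, T(s, a)) for a in ACTIONS)
                   for s in states]
        # Supervised regression step: refit theta to the new targets.
        theta = least_squares(states, targets)
    return theta

rng = random.Random(0)
states = [rng.uniform(-1.0, 1.0) for _ in range(50)]
theta = fitted_value_iteration(states)
```

For this toy problem the Bellman fixed point happens to be exactly quadratic, \(V^{*}(s) = -s^{2} / (1 - 0.25\gamma)\), so `theta` should approach roughly `[0, 0, -1.29]`. In general the fit is only approximate, and the iteration is not guaranteed to converge. For stochastic transitions, one option is to replace `V(theta, T(s, a))` with an average of `V` over several sampled next states.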