@jemoka / Jemoka Knowledge Base / wiki/concepts/continuous_state_mdp.md
---
title: "continuous state MDP"
type: concept
related: [Value Iteration, Markov Decision Process]
source: https://www.jemoka.com/posts/kbhcontinuous_state_mdp/
confidence: high
status: active
---

The Bellman Equation and related methods are really designed for discrete state spaces. However, we'd really like to be able to support continuous state spaces! Suppose we have \(S \subseteq \mathbb{R}^{n}\): what can we do?

## Discretization

We can just pretend that our system is a discrete-state MDP by chopping the state space up into small blocks. Solving the discretized MDP gives one value per block, so the resulting \(V\) over the original continuous space is a step function.

Recall that this can blow up quickly: for \(S \subseteq \mathbb{R}^{n}\), dividing each axis into \(k\) values gives \(k^{n}\) discrete states!

Also, instead of using the same grid size for every state variable, we may use finer steps along the state variables to which the output is more sensitive.

## Using a Function

Use a traditional function approximator (linear regression, neural network, etc.) as a proxy for \(V\).

To do this, we need a model \(f\) of the MDP that produces next states \(s' = f\qty(s,a)\) distributed like the true transitions, \(s' \sim T\qty(\cdot \mid s,a)\). The model may also be stochastic: namely, we can add noise, something like \(s' = f\qty(s,a) + \varepsilon\) where \(\varepsilon \sim \mathcal{N}\) is Gaussian, to make a more robust model.

You can obtain \(f\) via data or via physics / expert design.

After we have this, given a state \(s\), call \(\phi\qty(s)\) the features of state \(s\). Then, we write:

\begin{equation}
V\qty(s) = \theta^{T} \phi\qty(s)
\end{equation}

Recall also that, if we determinize our MDP, the Bellman equation becomes:

\begin{equation}
V\qty(s) = R\qty(s) + \gamma \max_{a} V\qty(s'), \text{ where } s' = T\qty(s,a)
\end{equation}

Now we have all the pieces to perform a particular type of value iteration:

## Fitted Value Iteration

1. Sample \(s^{(1)}, \dots, s^{(m)} \in S\).
2. Initialize parameters \(\theta\) to seed a model \(V_{\theta}\qty(s) = \theta^{T} \phi\qty(s)\).
3. Repeat:
   1. for each \(i = 1 \dots m\), compute the target \(y^{(i)} = R\qty(s^{(i)}) + \gamma \max_{a} V_{\theta}\qty(T\qty(s^{(i)},a))\)
   2. update your model \(V_{\theta}\) as usual, by fitting \(\theta\) to the pairs \(\qty(s^{(i)}, y^{(i)})\) with supervised regression

If your transition has stochasticity, we can just sample it several times (say, 10) for each state and action, then average \(V_{\theta}\) over the sampled next states to approximate the expected next-state value in a Monte Carlo way.
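
The loop above can be sketched in a few lines of NumPy. This is a minimal illustration under loud assumptions, not the note's reference implementation: the one-dimensional state, the reward `reward`, the deterministic dynamics `transition`, the three-element action set, and the polynomial feature map `features` are all hypothetical choices made only so the example runs end to end.

```python
import numpy as np

def features(s):
    """Hypothetical feature map phi(s) = [1, s, s^2] for scalar states."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    return np.stack([np.ones_like(s), s, s**2], axis=-1)

def reward(s):
    """Toy reward (assumption): penalize distance from the origin."""
    return -np.asarray(s, dtype=float) ** 2

def transition(s, a):
    """Toy deterministic dynamics (assumption): shift by a, stay in [-1, 1]."""
    return np.clip(s + a, -1.0, 1.0)

def value(theta, s):
    """Linear value model V_theta(s) = theta^T phi(s)."""
    return features(s) @ theta

def fitted_value_iteration(states, actions, gamma=0.9, n_iters=50):
    phi = features(states)                # (m, d) design matrix
    theta = np.zeros(phi.shape[1])        # seed model: V_theta = 0
    for _ in range(n_iters):
        # next_vals[i, j] = V_theta(T(s_i, a_j))
        next_vals = np.stack(
            [value(theta, transition(states, a)) for a in actions], axis=1
        )
        # Bellman targets y_i = R(s_i) + gamma * max_a V_theta(T(s_i, a))
        y = reward(states) + gamma * next_vals.max(axis=1)
        # "update your model as usual": least-squares fit of theta to (s_i, y_i)
        theta, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return theta

# Sampled states (here an evenly spaced grid) and a small discrete action set
states = np.linspace(-1.0, 1.0, 21)
actions = [-0.1, 0.0, 0.1]
theta = fitted_value_iteration(states, actions)
print("theta:", theta)
print("V(0), V(1):", value(theta, 0.0)[0], value(theta, 1.0)[0])
```

In this toy setup the fitted value is higher near the origin than at the edges, as the reward suggests it should be. For a stochastic model, the `transition` call could be replaced by an average of `value` over several sampled next states.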