[[
wikihub
]]
Search
⌘K
Explore
People
For Agents
Sign in
Explore
People
For Agents
Sign in
@jemoka / Jemoka Knowledge Base / raw/concept/kbhltrdp.md
Suggest edit
Cancel
Submit suggestion
Title
Name
Note
--- title: "LRTDP" source: https://www.jemoka.com/posts/kbhltrdp/ --- Real-Time Dynamic Programming RTDP is a asynchronous value iteration scheme. Each RTDP trial is a result of: \begin{equation} V(s) = \min_{ia \in A(s)} c(a,s) + \sum_{s’ \in S}^{} P_{a}(s’|s)V(s) \end{equation} the algorithm halts when the residuals are sufficiently small. Labeled RTDP We want to label converged states so we don’t need to keep investigate it: a state is solved if: state has less then \(\epsilon\) all reachable states given \(s’\) from this state has residual lower than \(\epsilon\) Labelled RTDP We stochastically simulate one step forward, and until a state we haven’t marked as “solved” is met, then we simulate forward and value iterate