---
title: "SU-CS229 NOV062025"
source: https://www.jemoka.com/posts/kbhsu_cs229_nov062025/
date: 2025-11-06
---

# Key Sequence

## Notation

## New Concepts

- Markov Decision Process
- Bellman Equation
- optimal policy
- value iteration

## Important Results / Claims

## Questions

## Interesting Factoids

# 229 MDP notation

\(S\) (states), \(A\) (actions), \(P_{(s,a)}\qty(s') = T\qty(s' | s,a)\) (transition probabilities), \(\gamma\) (discount), \(R\qty(s,a)\) (reward).

FUN FACT: a discount factor \(\gamma < 1\) makes value iteration converge.

Value of a policy \(\pi\), as the expected discounted return starting from \(s_{0} = s\):

\begin{equation}
V^{\pi}\qty(s) = \mathbb{E}\qty[R\qty(s_{0},a_{0}) + \gamma R\qty(s_{1}, a_{1}) + \gamma^{2} R\qty(s_{2}, a_{2}) + \dots \mid s_{0} = s, \pi]
\end{equation}

The same value written recursively (the Bellman Equation), with the action chosen by the policy, \(a = \pi\qty(s)\):

\begin{equation}
V^{\pi}\qty(s) = R\qty(s, \pi\qty(s)) + \gamma \sum_{s'} P_{s,\pi\qty(s)}\qty(s') V^{\pi}\qty(s')
\end{equation}

The optimal value function takes the best policy at every state:

\begin{equation}
V^{*}\qty(s) = \max_{\pi} V^{\pi}\qty(s)
\end{equation}

What if we don't know the transitions? Just learn the transitions! This raises the exploration vs. exploitation tradeoff.
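As a rough illustration of the value iteration and "learn the transitions" ideas above, here is a minimal Python sketch. It is not the lecture's code. The two-state toy MDP, the function names (`q_value`, `value_iteration`, `estimate_transitions`), the stopping tolerance, and the count-based transition estimate are all assumptions made for the example. The update it repeats is the standard Bellman optimality backup, \(V\qty(s) \leftarrow \max_{a} \qty[R\qty(s,a) + \gamma \sum_{s'} P_{(s,a)}\qty(s') V\qty(s')]\).

```python
from collections import Counter, defaultdict


def q_value(s, a, V, P, R, gamma):
    """One-step lookahead: R(s,a) + gamma * sum_{s'} P_{(s,a)}(s') V(s')."""
    return R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())


def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Repeat the Bellman optimality backup until the values stop changing.

    P[(s, a)] is a dict {s': probability}; R[(s, a)] is a float.
    Returns (V, policy), where policy is greedy with respect to V.
    """
    V = {s: 0.0 for s in states}
    for _ in range(max_iters):
        new_V = {s: max(q_value(s, a, V, P, R, gamma) for a in actions)
                 for s in states}
        delta = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if delta < tol:  # assumed stopping rule; gamma < 1 guarantees we get here
            break
    policy = {s: max(actions, key=lambda a: q_value(s, a, V, P, R, gamma))
              for s in states}
    return V, policy


def estimate_transitions(experience):
    """Estimate P_{(s,a)}(s') from observed (s, a, s') triples by counting.

    Assumption for this sketch: a plain maximum-likelihood estimate with no
    smoothing; unseen (s, a) pairs are simply absent from the result.
    """
    counts = defaultdict(Counter)
    for s, a, s_next in experience:
        counts[(s, a)][s_next] += 1
    return {sa: {s2: n / sum(c.values()) for s2, n in c.items()}
            for sa, c in counts.items()}


# Hypothetical toy MDP: staying in state "B" pays 1, everything else pays 0.
states = ["A", "B"]
actions = ["stay", "go"]
P = {
    ("A", "stay"): {"A": 1.0}, ("A", "go"): {"B": 1.0},
    ("B", "stay"): {"B": 1.0}, ("B", "go"): {"A": 1.0},
}
R = {("A", "stay"): 0.0, ("A", "go"): 0.0, ("B", "stay"): 1.0, ("B", "go"): 0.0}

V_star, pi_star = value_iteration(states, actions, P, R, gamma=0.9)
```

On this toy problem the values settle near \(V^{*}\qty(B) = 1/(1-\gamma) = 10\) and \(V^{*}\qty(A) = \gamma \cdot 10 = 9\), with the greedy policy going from A to B and then staying. When the transitions are unknown, one option is to pass the output of `estimate_transitions` in place of `P`, provided every state-action pair has been visited at least once. That requirement is where the exploration side of the tradeoff comes in.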