@jemoka / Jemoka Knowledge Base / raw/course/cs229/kbhsu_cs229_nov102025.md
---
title: "SU-CS229 NOV102025"
source: https://www.jemoka.com/posts/kbhsu_cs229_nov102025/
date: 2025-11-10
---

## Key Sequence

### Notation

- continuous state MDP

### New Concepts

### Important Results / Claims

### Questions

### Interesting Factoids

- “sometimes we may want to model at a slower rate than the data is collected; for instance, your helicopter really doesn’t move anywhere in a hundredth of a second, so there is little to learn at that timescale, but you can collect data that fast”

## Debugging RL

RL should work when:

1. The simulator is good
2. The RL algorithm correctly maximizes \(V^{\pi}\)
3. The reward is chosen such that maximizing expected payoff corresponds to your goal

### Diagnostics

1. check your simulator: if your policy works in sim but not IRL, your sim is bad
2. if \(V^{\text{RL}} < V^{\text{human}}\), then your RL algorithm is bad: it fails to maximize value under your reward
3. if \(V^{\text{RL}} \geq V^{\text{human}}\), yet the human still performs the task better, then your objective (reward) function is bad
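
The three diagnostics above can be read as a decision procedure. Here is a minimal sketch of it, under assumptions that are not in the notes: the checks run in the order listed, the value comparison only matters once the policy has failed in simulation too, and both values are estimated under the same reward. The function name `diagnose_rl`, its parameters, and the returned labels are all hypothetical.

```python
def diagnose_rl(works_in_sim: bool, works_in_real: bool,
                v_rl: float, v_human: float) -> str:
    """Return which component to suspect, following the notes' diagnostics.

    v_rl and v_human are assumed to be value estimates of the learned
    policy and of a human operator under the SAME reward function.
    """
    if works_in_real:
        # Nothing to debug: the policy already achieves the goal.
        return "ok"
    if works_in_sim:
        # Works in simulation but not in reality: the simulator is off.
        return "simulator"
    if v_rl < v_human:
        # The human scores higher under the reward, so the RL algorithm
        # is failing to maximize the value function.
        return "rl_algorithm"
    # RL scores at least as well as the human on the reward, yet it still
    # fails the task: the reward does not capture the goal.
    return "reward"


if __name__ == "__main__":
    print(diagnose_rl(works_in_sim=False, works_in_real=False,
                      v_rl=12.0, v_human=30.0))
```

The ordering matters in this sketch: the simulator check comes first because the value comparison is only meaningful if the simulator used to estimate the two values can be trusted.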