@jemoka / Jemoka Knowledge Base / wiki/concepts/local_policy_search.md
---
title: "Local Policy Search"
type: concept
related: [Parameter, Utility Theory]
source: https://www.jemoka.com/posts/kbhlocal_policy_search/
confidence: high
status: active
---

We begin with a policy parameterized however you'd like, with randomly seeded weights. Then:

1. Sample a local set of parameters, one perturbation \(\pm \alpha\) per direction in the parameter vector (for instance, for a parameter vector in 2-space: up, down, left, right in latent space), and use each new parameter vector to seed a policy.
2. Check each policy for its utility via Monte Carlo policy evaluation.
3. If any of the adjacent points are better, move there.
4. If none of the adjacent points are better, set \(\alpha = 0.5 \alpha\) (shrinking the up/down/left/right step) and try again.

We continue until \(\alpha\) drops below some \(\epsilon\).

Note: with billions of parameters, this method is not very feasible, because every step needs a roll-out utility estimate for each of the \(2n\) perturbed policies (two per parameter).
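
The procedure described here can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the note's own implementation: `mc_utility` is a hypothetical Monte Carlo evaluator for a made-up 1-D problem with a linear policy, its fixed seed is an added choice so that every candidate is scored against the same noise, and the search moves to the best improving neighbor, which is one way to read "if any of the adjacent points are better, we move there".

```python
import random


def mc_utility(theta, n_rollouts=30, depth=10, seed=0):
    """Hypothetical Monte Carlo policy evaluation on a toy 1-D problem.

    The policy is linear, a = theta[0] * s + theta[1], and the reward
    at each step is -s^2, so good policies keep the state near zero.
    """
    rng = random.Random(seed)  # assumption: same noise for every candidate
    total = 0.0
    for _ in range(n_rollouts):
        s = rng.uniform(-1.0, 1.0)            # random start state
        for _ in range(depth):
            a = theta[0] * s + theta[1]       # action from the policy
            s = s + a + rng.gauss(0.0, 0.05)  # noisy transition
            total += -s * s                   # accumulate reward
    return total / n_rollouts


def local_policy_search(theta, utility, alpha=1.0, epsilon=1e-3):
    """Perturb each parameter by +/- alpha; halve alpha when stuck."""
    theta = list(theta)
    best = utility(theta)
    while alpha > epsilon:
        # Build and score the 2n neighbors (two per parameter).
        neighbors = []
        for i in range(len(theta)):
            for sign in (1.0, -1.0):
                candidate = list(theta)
                candidate[i] += sign * alpha
                neighbors.append((utility(candidate), candidate))
        u, candidate = max(neighbors, key=lambda pair: pair[0])
        if u > best:
            theta, best = candidate, u   # a neighbor is better: move there
        else:
            alpha *= 0.5                 # no neighbor is better: shrink step
    return theta, best


theta, best = local_policy_search([0.0, 0.0], mc_utility)
print(theta, best)
```

Each pass through the loop calls `utility` once per neighbor, so the cost per step grows linearly with the number of parameters; that is the roll-out cost the note warns about.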