@jemoka / Jemoka Knowledge Base / wiki/concepts/undirected_exploration.md
---
title: "Undirected Exploration"
type: concept
related: [Undirected Exploration]
source: https://www.jemoka.com/posts/kbhundirected_exploration/
confidence: high
status: active
---

base epsilon-greedy:

- choose a random action with probability \(\epsilon\)
- otherwise, choose the action with the best expectation, \(\arg\max_{a} Q(s,a)\)

## epsilon-greedy exploration with decay

Sometimes, approaches are suggested to decay \(\epsilon\), whereby, at each timestep:

\begin{equation}
\epsilon \leftarrow \alpha \epsilon
\end{equation}

where \(\alpha \in (0,1)\) is called the "decay factor."

## Explore-then-commit

Select actions uniformly at random for \(k\) steps; then switch to the greedy policy and stay there.
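
The three strategies can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (`epsilon_greedy`, `decay_epsilon`, `explore_then_commit`) and the representation of \(Q(s,\cdot)\) as a plain list of per-action estimates for a single state are assumptions made for the example.

```python
import random


def greedy_action(q_values):
    """Return the index of the action with the highest estimated value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy_action(q_values)


def decay_epsilon(epsilon, alpha):
    """Apply one decay step: epsilon <- alpha * epsilon, with alpha in (0, 1)."""
    return alpha * epsilon


def explore_then_commit(q_values, step, k, rng):
    """Act uniformly at random for the first k steps, then act greedily."""
    if step < k:
        return rng.randrange(len(q_values))
    return greedy_action(q_values)


# Hypothetical value estimates for three actions in one state.
q = [0.1, 0.7, 0.3]
rng = random.Random(0)

# Epsilon-greedy with decay: exploration shrinks geometrically over time.
epsilon, alpha = 1.0, 0.9
for t in range(50):
    action = epsilon_greedy(q, epsilon, rng)
    epsilon = decay_epsilon(epsilon, alpha)
```

In a full agent the estimates in `q` would be updated from observed rewards between steps; that part is omitted here because the note only covers action selection.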