@jemoka / Jemoka Knowledge Base / wiki/concepts/action_value_function.md
---
title: "action-value function"
type: concept
related: [Action Value Function, Advantage Function, Policy, Utility Theory]
source: https://www.jemoka.com/posts/kbhaction_value_function/
confidence: high
status: active
---

The action-value function measures the quality of taking a particular action at a state: the “expected discounted return when following a policy from \(s\) and taking \(a\)”:

\begin{equation}
Q(s,a) = R(s,a) + \gamma \sum_{s'} T(s'|s,a) U(s')
\end{equation}

where \(R(s,a)\) is the immediate reward, \(\gamma\) is the discount factor, and \(T(s'|s,a)\) is the transition probability from \(s\) to \(s'\) given action \(a\).

## value function

Therefore, the utility of being in a state (called the value function) is:

\begin{equation}
U(s) = \max_{a} Q(s,a)
\end{equation}

“the utility that gains the best action-value”

## value-function policy

A value-function policy is a policy that maximizes the action-value:

\begin{equation}
\pi(s) = \arg\max_{a} Q(s,a)
\end{equation}

“the policy that takes the best action to maximize action-value”

We call this \(\pi\) the “greedy policy with respect to \(U\)”.

## advantage

See advantage function.
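
As a concrete illustration of the three definitions above, here is a minimal Python sketch on a made-up two-state, two-action problem. Everything in it is an assumption for illustration and not part of the original note: the rewards `R`, the transitions `T`, the discount `GAMMA`, the function names, and the utility vector `U`, which was picked by hand so that it satisfies \(U(s) = \max_{a} Q(s,a)\) for this toy problem.

```python
# Hypothetical toy problem: every number below is invented for illustration.
GAMMA = 0.9          # discount factor (gamma)
STATES = [0, 1]
ACTIONS = [0, 1]

# R[s][a]: immediate reward for taking action a in state s.
R = [
    [0.0, 1.0],
    [2.0, 0.0],
]

# T[s][a][s2]: probability of landing in state s2 after taking a in s.
# Deterministic here: in state 0, action 0 stays and action 1 moves to
# state 1; in state 1, action 0 stays and action 1 moves to state 0.
T = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]


def action_value(s, a, U):
    """Q(s,a) = R(s,a) + gamma * sum over s' of T(s'|s,a) * U(s')."""
    return R[s][a] + GAMMA * sum(T[s][a][s2] * U[s2] for s2 in STATES)


def value(s, U):
    """U(s) = max over a of Q(s,a)."""
    return max(action_value(s, a, U) for a in ACTIONS)


def greedy_policy(s, U):
    """pi(s) = argmax over a of Q(s,a): the greedy policy with respect to U."""
    return max(ACTIONS, key=lambda a: action_value(s, a, U))


# Hand-picked utilities (an assumption) that are self-consistent for this
# toy problem: applying value() to them should give the same numbers back.
U = [19.0, 20.0]

if __name__ == "__main__":
    for s in STATES:
        q_row = [action_value(s, a, U) for a in ACTIONS]
        print(f"s={s}  Q(s, .)={q_row}  U(s)={value(s, U)}  pi(s)={greedy_policy(s, U)}")
```

Because `U` is self-consistent for this toy problem, `value(s, U)` reproduces `U[s]` in both states, and `greedy_policy` picks the action whose action-value attains that maximum.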