---
title: "action-value function"
type: concept
related: [Action Value Function, Advantage Function, Policy, Utility Theory]
source: https://www.jemoka.com/posts/kbhaction_value_function/
confidence: high
status: active
---

The action-value function \(Q(s,a)\) measures the quality of taking a particular action in a state, that is, the "expected discounted return when following a policy from \(s\) and taking \(a\)":
\begin{equation} Q(s,a) = R(s,a) + \gamma \sum_{s'} T(s'|s,a) U(s') \end{equation}
where \(T(s'|s,a)\) is the probability of transitioning from \(s\) to \(s'\) given action \(a\), \(R(s,a)\) is the immediate reward, and \(\gamma\) is the discount factor.
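
As a minimal sketch of the formula above, the snippet below evaluates \(Q(s,a)\) for a tiny made-up problem. The dictionary layout, the state and action names (`s0`, `s1`, `stay`, `go`), and all numbers are assumptions chosen for illustration; they do not come from the original note.

```python
def action_value(s, a, R, T, U, gamma):
    """Q(s,a) = R(s,a) + gamma * sum over s' of T(s'|s,a) * U(s')."""
    # T[(s, a)] maps each reachable next state s' to its probability.
    expected_next_utility = sum(p * U[s_next] for s_next, p in T[(s, a)].items())
    return R[(s, a)] + gamma * expected_next_utility


# Hypothetical two-state, two-action problem (illustrative numbers only).
R = {
    ("s0", "stay"): 0.0, ("s0", "go"): 1.0,
    ("s1", "stay"): 2.0, ("s1", "go"): 0.0,
}
T = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"): {"s0": 1.0},
}
U = {"s0": 1.0, "s1": 3.0}  # assumed utility estimates for each state
gamma = 0.9

q_s0_go = action_value("s0", "go", R, T, U, gamma)
# 1.0 + 0.9 * (0.2 * 1.0 + 0.8 * 3.0) = 3.34, up to floating-point rounding
```

Here the utilities \(U\) are simply given; in practice they would come from some solution method, which this note does not cover.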
value function

Therefore, the utility of being in a state (called the value function) is:
\begin{equation} U(s) = \max_{a} Q(s,a) \end{equation}
In words: "the utility obtained by taking the action with the best action-value."
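
A short sketch of this maximization, assuming the action-values are already stored in a nested dictionary `Q[s][a]` (a hypothetical layout with made-up numbers):

```python
# Hypothetical action-value table: Q[state][action].
Q = {
    "s0": {"stay": 0.9, "go": 3.34},
    "s1": {"stay": 4.7, "go": 0.9},
}


def value(s, Q):
    """U(s) = max over a of Q(s,a)."""
    return max(Q[s].values())


U = {s: value(s, Q) for s in Q}
```

Each state's utility is just the largest entry in its row of the table.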
value-function policy

A value-function policy is a policy that maximizes the action-value:
\begin{equation} \pi(s) = \arg\max_{a} Q(s,a) \end{equation}
In words: "the policy that takes the best action to maximize action-value."
We call this \(\pi\) the "greedy policy with respect to \(U\)".
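
The greedy policy can be sketched as a one-step lookahead: compute \(Q(s,a)\) from \(U\) for every action and pick the maximizer. The problem data below (`R`, `T`, `U`, `gamma`, and the state and action names) are hypothetical values for illustration only.

```python
def greedy_action(s, actions, R, T, U, gamma):
    """pi(s) = argmax over a of Q(s,a), with Q computed from U by lookahead."""
    def q(a):
        # Q(s,a) = R(s,a) + gamma * sum over s' of T(s'|s,a) * U(s')
        return R[(s, a)] + gamma * sum(p * U[s2] for s2, p in T[(s, a)].items())
    # max with a key function returns the action whose Q-value is largest.
    return max(actions, key=q)


# Hypothetical two-state, two-action problem (illustrative numbers only).
actions = ["stay", "go"]
R = {
    ("s0", "stay"): 0.0, ("s0", "go"): 1.0,
    ("s1", "stay"): 2.0, ("s1", "go"): 0.0,
}
T = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"): {"s0": 1.0},
}
U = {"s0": 1.0, "s1": 3.0}  # assumed utility estimates
gamma = 0.9

policy = {s: greedy_action(s, actions, R, T, U, gamma) for s in U}
```

With these numbers the policy moves from `s0` toward the higher-utility state `s1` and stays once it is there. Ties are broken by list order, since `max` returns the first maximal element.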
advantage

See advantage function.