---
title: "multiagent reasoning"
source: https://www.jemoka.com/posts/kbhmultiagent_reasoning/
---

simple games

constituents

- agent \(i \in X\), where \(X\) is the set of agents
- joint action space: \(A = A^{1} \times A^{2} \times \dots \times A^{k}\)
- joint action: one action per agent, \(\vec{a} = (a^{1}, \dots, a^{k})\)
- joint reward function: \(R(\vec{a}) = (R^{1}(\vec{a}), \dots, R^{k}(\vec{a}))\)

additional information

prisoner's dilemma

|           | Cooperate | Defect |
|-----------|-----------|--------|
| Cooperate | -1, -1    | -4, 0  |
| Defect    | 0, -4     | -3, -3 |

Each cell lists the row agent's reward first, then the column agent's reward.

traveler's dilemma

- two people write down the price of their luggage, between 2 and 100
- the person who wrote the lower amount gets that value plus 2
- the person who wrote the higher amount gets the lower amount minus 2

joint policy

A joint policy \(\vec{\pi}\) assigns one policy \(\pi^{j}\) to each agent \(j\). The utility of agent \(i\) under a joint policy is:
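\begin{equation} U^{i} (\vec{\pi}) = \sum_{\vec{a} \in A} R^{i}(\vec{a}) \prod_{j} \pi^{j}(a^{j}) \end{equation}
This is essentially the expected reward of agent \(i\): the reward of each joint action, weighted by the probability that the agents, each following their own policy, jointly take it.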
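As a minimal sketch of the utility formula, the snippet below enumerates every joint action of the prisoner's dilemma given earlier and weights each reward by the product of the agents' action probabilities. The names `REWARDS`, `ACTIONS`, and `utility`, the dictionary representation of a policy, and the 0-indexing of agents are assumptions made for illustration.

```python
from itertools import product

# Prisoner's dilemma rewards: (agent 0 action, agent 1 action) -> (reward 0, reward 1).
REWARDS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-4, 0),
    ("defect", "cooperate"): (0, -4),
    ("defect", "defect"): (-3, -3),
}
ACTIONS = ("cooperate", "defect")


def utility(i, joint_policy):
    """Expected reward of agent i (0-indexed, an assumption) under a joint policy.

    joint_policy holds one dict per agent, mapping action -> probability.
    """
    total = 0.0
    for joint_action in product(ACTIONS, repeat=len(joint_policy)):
        # Probability of this joint action: product of each agent's own probability.
        prob = 1.0
        for policy, action in zip(joint_policy, joint_action):
            prob *= policy[action]
        total += REWARDS[joint_action][i] * prob
    return total


uniform = {"cooperate": 0.5, "defect": 0.5}
always_defect = {"cooperate": 0.0, "defect": 1.0}
print(utility(0, [uniform, uniform]))        # average of -1, -4, 0, -3
print(utility(0, [always_defect, uniform]))  # average of 0 and -3
```

Against a uniformly random opponent, always defecting earns agent 0 a higher utility than playing uniformly at random.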
response model

How would other agents respond to our system?

- \(a^{-i}\): the joint action of every agent except agent \(i\)
- \(\vec{a} = (a^{i}, a^{-i})\), so \(R(a^{i}, a^{-i}) = R(\vec{a})\)

best response

The deterministic best-response model for agent \(i\):
\begin{equation} \arg\max_{a^{i} \in A^{i}} U^{i}(a^{i}, \pi^{-i}) \end{equation}
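where agent \(i\)'s response to the other agents' policies \(\pi^{-i}\) is selected deterministically.
For the prisoner's dilemma, this results in both parties defecting: whatever the other agent does, defecting gives the higher reward (0 instead of -1 if the other cooperates, -3 instead of -4 if the other defects).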
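The best-response argument for the prisoner's dilemma can be checked with a short sketch. It assumes a two-agent game, 0-indexed agents, and a dictionary representation of the other agent's policy; `utility_of_action` and `best_response` are illustrative names.

```python
# Prisoner's dilemma rewards: (agent 0 action, agent 1 action) -> (reward 0, reward 1).
REWARDS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-4, 0),
    ("defect", "cooperate"): (0, -4),
    ("defect", "defect"): (-3, -3),
}
ACTIONS = ("cooperate", "defect")


def utility_of_action(i, action, other_policy):
    """U^i(a^i, pi^{-i}) for two agents: agent i plays `action` with certainty."""
    total = 0.0
    for other_action, prob in other_policy.items():
        joint = (action, other_action) if i == 0 else (other_action, action)
        total += REWARDS[joint][i] * prob
    return total


def best_response(i, other_policy):
    """Deterministic best response: the action with the highest utility."""
    return max(ACTIONS, key=lambda a: utility_of_action(i, a, other_policy))


# Defecting is the best response no matter how likely the other agent is to cooperate.
for p_cooperate in (0.0, 0.3, 1.0):
    other = {"cooperate": p_cooperate, "defect": 1.0 - p_cooperate}
    print(p_cooperate, best_response(0, other), best_response(1, other))
```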
softmax response

This is like the Softmax Method:
\begin{equation} \pi^{i}(a^{i}) \propto \exp\qty(\lambda U^{i}(a^{i}, \pi^{-i})) \end{equation}
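A minimal sketch of the softmax response on the prisoner's dilemma follows. The parameter name `precision` stands for \(\lambda\), and the dictionary policy representation and function names are assumptions. Subtracting the largest utility before exponentiating is only a numerical-stability choice and does not change the normalised result.

```python
import math

# Prisoner's dilemma rewards: (agent 0 action, agent 1 action) -> (reward 0, reward 1).
REWARDS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-4, 0),
    ("defect", "cooperate"): (0, -4),
    ("defect", "defect"): (-3, -3),
}
ACTIONS = ("cooperate", "defect")


def utility_of_action(i, action, other_policy):
    """U^i(a^i, pi^{-i}) for two agents: agent i plays `action` with certainty."""
    total = 0.0
    for other_action, prob in other_policy.items():
        joint = (action, other_action) if i == 0 else (other_action, action)
        total += REWARDS[joint][i] * prob
    return total


def softmax_response(i, other_policy, precision):
    """Policy for agent i with probabilities proportional to exp(precision * utility)."""
    utilities = {a: utility_of_action(i, a, other_policy) for a in ACTIONS}
    top = max(utilities.values())  # shift for numerical stability
    weights = {a: math.exp(precision * (u - top)) for a, u in utilities.items()}
    norm = sum(weights.values())
    return {a: w / norm for a, w in weights.items()}


other = {"cooperate": 0.5, "defect": 0.5}
for precision in (0.0, 1.0, 50.0):
    print(precision, softmax_response(0, other, precision))
```

With `precision` at 0 the response is uniform; as it grows, the response concentrates on the best-response action.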
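fictitious play

Play the game repeatedly: each agent keeps counts of the actions the other agents have taken so far, treats the normalised counts as an estimate of their policies, and plays a best response to that estimate.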
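The sketch below runs one simple form of fictitious play on the prisoner's dilemma, assuming two agents who act simultaneously, estimate the opponent's policy from action counts, and best-respond to that estimate. Starting every count at 1 is an assumption chosen for illustration, as are all function names.

```python
# Prisoner's dilemma rewards: (agent 0 action, agent 1 action) -> (reward 0, reward 1).
REWARDS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-4, 0),
    ("defect", "cooperate"): (0, -4),
    ("defect", "defect"): (-3, -3),
}
ACTIONS = ("cooperate", "defect")


def utility_of_action(i, action, other_policy):
    """Expected reward of agent i for playing `action` against other_policy."""
    total = 0.0
    for other_action, prob in other_policy.items():
        joint = (action, other_action) if i == 0 else (other_action, action)
        total += REWARDS[joint][i] * prob
    return total


def best_response(i, other_policy):
    return max(ACTIONS, key=lambda a: utility_of_action(i, a, other_policy))


def fictitious_play(rounds):
    # counts[i][a]: how often agent i has been seen playing a (initial 1s are an assumption).
    counts = [{a: 1 for a in ACTIONS} for _ in range(2)]
    history = []
    for _ in range(rounds):
        joint = []
        for i in range(2):
            seen = counts[1 - i]
            total = sum(seen.values())
            estimate = {a: c / total for a, c in seen.items()}  # empirical policy
            joint.append(best_response(i, estimate))
        # Both agents observe the joint action and update their counts.
        for i, action in enumerate(joint):
            counts[i][action] += 1
        history.append(tuple(joint))
    return history, counts


history, counts = fictitious_play(10)
print(history[-1], counts)
```

In this game the best response does not depend on the estimate, so play settles on mutual defection from the first round; games such as rock paper scissors produce more interesting dynamics.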
Dominant Strategy Equilibrium

A dominant strategy is a policy that is a best response to all other possible agent policies. A Dominant Strategy Equilibrium is a joint policy in which every agent follows a dominant strategy. Not all games have one, because in some games the best response always depends on the others' strategies (e.g. rock paper scissors).
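A rough sketch of a dominance check for two-agent games follows. It compares actions against each deterministic opposing action only, which suffices because utility is linear in the other agent's policy. It uses weak dominance (ties allowed). The rock paper scissors rewards (+1 win, -1 loss, 0 tie) and all names are assumptions for illustration.

```python
PRISONERS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-4, 0),
    ("defect", "cooperate"): (0, -4),
    ("defect", "defect"): (-3, -3),
}

# Assumed rock paper scissors rewards: +1 for a win, -1 for a loss, 0 for a tie.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}
RPS = {}
for a in BEATS:
    for b in BEATS:
        if a == b:
            RPS[(a, b)] = (0, 0)
        elif BEATS[a] == b:
            RPS[(a, b)] = (1, -1)
        else:
            RPS[(a, b)] = (-1, 1)


def reward(rewards, i, own, other):
    """Reward of agent i when it plays `own` and the other agent plays `other`."""
    joint = (own, other) if i == 0 else (other, own)
    return rewards[joint][i]


def dominant_action(rewards, actions, i):
    """Return an action of agent i that is a best response to every opposing action, else None."""
    for a in actions:
        if all(
            reward(rewards, i, a, other) >= reward(rewards, i, alt, other)
            for other in actions
            for alt in actions
        ):
            return a
    return None


print(dominant_action(PRISONERS, ("cooperate", "defect"), 0))  # dominant action exists
print(dominant_action(RPS, tuple(BEATS), 0))                   # no dominant action
```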
Nash Equilibrium

A Nash Equilibrium is a joint policy \(\vec{\pi}\) in which everyone is following their best response, i.e. no one has an incentive to unilaterally change their policy. One exists for every game with a finite action space, provided stochastic policies are allowed. In general, a Nash Equilibrium is very hard to compute: the problem is PPAD-complete, a complexity class whose relationship to NP-complete is unclear.
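The brute-force sketch below searches only deterministic joint policies of a two-agent game, keeping each joint action from which neither agent gains by deviating alone. It does not find stochastic equilibria, so an empty result would not mean no Nash Equilibrium exists. For the traveler's dilemma, the rule that both agents receive the written value on a tie is an assumption, since the notes do not state it; all names are illustrative.

```python
from itertools import product

PRISONERS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-4, 0),
    ("defect", "cooperate"): (0, -4),
    ("defect", "defect"): (-3, -3),
}


def travelers_rewards(low=2, high=100):
    """Traveler's dilemma rewards; equal claims paying their own value is an assumption."""
    rewards = {}
    for a, b in product(range(low, high + 1), repeat=2):
        if a == b:
            rewards[(a, b)] = (a, b)
        elif a < b:
            rewards[(a, b)] = (a + 2, a - 2)
        else:
            rewards[(a, b)] = (b - 2, b + 2)
    return rewards


def pure_nash_equilibria(rewards, actions):
    """Joint actions where no agent can gain by unilaterally switching action."""
    equilibria = []
    for a0, a1 in product(actions, repeat=2):
        r0, r1 = rewards[(a0, a1)]
        no_gain_0 = all(rewards[(alt, a1)][0] <= r0 for alt in actions)
        no_gain_1 = all(rewards[(a0, alt)][1] <= r1 for alt in actions)
        if no_gain_0 and no_gain_1:
            equilibria.append((a0, a1))
    return equilibria


print(pure_nash_equilibria(PRISONERS, ("cooperate", "defect")))
print(pure_nash_equilibria(travelers_rewards(), range(2, 101)))
```

Under these assumptions, the only deterministic equilibrium of the prisoner's dilemma is mutual defection, and the only one of the traveler's dilemma is both agents writing the minimum price.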