@jemoka / Jemoka Knowledge Base / raw/concept/kbhmultiagent_reasoning.md
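---
title: "multiagent reasoning"
source: https://www.jemoka.com/posts/kbhmultiagent_reasoning/
---

## simple games

### constituents

- agents \(i \in X\), where \(X\) is the set of agents
- joint action space: \(A = A^{1} \times A^{2} \times \dots \times A^{k}\)
- joint action: one action per agent, \(\vec{a} = (a^{1}, \dots, a^{k})\)
- joint reward function: \(R(\vec{a}) = \left(R^{1}(\vec{a}), \dots, R^{k}(\vec{a})\right)\)

### additional information

#### prisoner’s dilemma

|           | Cooperate | Defect |
|-----------|-----------|--------|
| Cooperate | -1, -1    | -4, 0  |
| Defect    | 0, -4     | -3, -3 |

Each cell lists the row agent’s reward first and the column agent’s reward second.

#### traveler’s dilemma

Two people each write down the price of their luggage, an amount between 2 and 100:

- the person who wrote the lower amount gets that amount plus 2
- the person who wrote the higher amount gets the lower amount minus 2

## joint policy agent utility

The utility for agent \(i\) under the joint policy \(\vec{\pi}\) is

\begin{equation}
U^{i}(\vec{\pi}) = \sum_{\vec{a} \in A} R^{i}(\vec{a}) \prod_{j} \pi^{j}(a^{j})
\end{equation}

This is the expected reward agent \(i\) receives when each agent \(j\) independently samples its action \(a^{j}\) from its own policy \(\pi^{j}\).

## response model

How would the other agents respond to our system?

- \(a^{-i}\): the joint action of every agent except agent \(i\)
- \(\vec{a} = (a^{i}, a^{-i})\), so \(R(a^{i}, a^{-i}) = R(\vec{a})\)
- likewise, \(\pi^{-i}\) is the joint policy of every agent except agent \(i\)

### best-response

The deterministic best response model for agent \(i\):

\begin{equation}
\arg\max_{a^{i} \in A^{i}} U^{i}(a^{i}, \pi^{-i})
\end{equation}

that is, agent \(i\) deterministically selects the action that maximizes its utility given the other agents’ policies \(\pi^{-i}\). For the prisoner’s dilemma, this results in both parties defecting: whatever the other agent does, defecting gives a higher reward (0 > -1 against Cooperate, -3 > -4 against Defect).

### softmax response

This works like the Softmax Method:

\begin{equation}
\pi^{i}(a^{i}) \propto \exp\left(\lambda U^{i}(a^{i}, \pi^{-i})\right)
\end{equation}

where \(\lambda \geq 0\) is a precision parameter: \(\lambda = 0\) gives a uniformly random policy, and as \(\lambda \to \infty\) the policy approaches the deterministic best response.

### fictitious play

Play the game repeatedly. Each agent keeps counts of the actions every other agent has taken so far, models each of them as playing according to those empirical frequencies, and plays a best response to that model.

## Dominant Strategy Equilibrium

A dominant strategy is a policy that is a best response to every possible policy of the other agents; a Dominant Strategy Equilibrium is a joint policy in which every agent plays a dominant strategy. Not all games have a Dominant Strategy Equilibrium, because in some games the best response always depends on the other agents’ strategies (e.g. in rock-paper-scissors, the best response to rock is paper, but the best response to paper is scissors).

## Nash Equilibrium

A Nash Equilibrium is a joint policy \(\vec{\pi}\) in which every agent follows a best response to the others’ policies, i.e. no agent has an incentive to unilaterally change its policy. Every finite game has at least one Nash Equilibrium, possibly one with stochastic policies. In general, though, a Nash Equilibrium is very hard to compute: the problem is PPAD-complete, and how PPAD relates to NP-completeness is not fully understood.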
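The joint policy utility \(U^{i}\) can be computed by brute-force enumeration of the joint action space. The sketch below assumes a two-agent prisoner’s dilemma encoded from the table above as a dictionary `REWARDS` keyed by joint action, with each policy given as an action-to-probability dictionary; `REWARDS`, `ACTIONS`, and `utility` are illustrative names, and agents are indexed from 0 rather than 1.

```python
from itertools import product

# Prisoner's dilemma from the table: REWARDS[(a^0, a^1)] = (R^0, R^1).
# Agents are indexed from 0 here rather than from 1.
ACTIONS = ("cooperate", "defect")
REWARDS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-4, 0),
    ("defect", "cooperate"): (0, -4),
    ("defect", "defect"): (-3, -3),
}


def utility(i, policies):
    """U^i(pi): sum over joint actions of R^i(a) * prod_j pi^j(a^j).

    policies[j] maps each action of agent j to its probability; actions
    missing from the dictionary are treated as having probability 0.
    """
    total = 0.0
    for joint_action in product(ACTIONS, repeat=len(policies)):
        prob = 1.0
        for j, a_j in enumerate(joint_action):
            prob *= policies[j].get(a_j, 0.0)
        total += REWARDS[joint_action][i] * prob
    return total


uniform = {"cooperate": 0.5, "defect": 0.5}
print(utility(0, [uniform, uniform]))  # -2.0
print(utility(0, [{"defect": 1.0}, {"defect": 1.0}]))  # -3.0
```

Enumerating every joint action costs \(\prod_{j} |A^{j}|\) terms, which is fine for small matrix games like this one.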
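The deterministic best response from the best-response section can be sketched for two agents by averaging \(R^{i}\) over the other agent’s policy \(\pi^{-i}\) and taking the argmax. The prisoner’s dilemma encoding and the helper names `expected_reward` and `best_response` are assumptions for illustration; ties are broken by the order of `ACTIONS`.

```python
# Prisoner's dilemma: REWARDS[(a^0, a^1)] = (reward to agent 0, reward to agent 1).
ACTIONS = ("cooperate", "defect")
REWARDS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-4, 0),
    ("defect", "cooperate"): (0, -4),
    ("defect", "defect"): (-3, -3),
}


def expected_reward(i, own_action, other_policy):
    """U^i(a^i, pi^{-i}) for two agents: average R^i over the other agent's policy."""
    total = 0.0
    for other_action, prob in other_policy.items():
        joint = (own_action, other_action) if i == 0 else (other_action, own_action)
        total += prob * REWARDS[joint][i]
    return total


def best_response(i, other_policy):
    """Deterministic best response: argmax over a^i of U^i(a^i, pi^{-i})."""
    # max() keeps the first maximizer, so ties go to the earlier action in ACTIONS.
    return max(ACTIONS, key=lambda a: expected_reward(i, a, other_policy))


for other in ({"cooperate": 1.0}, {"defect": 1.0}, {"cooperate": 0.5, "defect": 0.5}):
    print(other, best_response(0, other), best_response(1, other))
```

Every line prints `defect defect`, matching the note that both parties defect regardless of the other agent’s policy.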
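The softmax response replaces the argmax with probabilities proportional to \(\exp(\lambda U^{i}(a^{i}, \pi^{-i}))\). This sketch reuses an assumed two-agent prisoner’s dilemma encoding; `softmax_response` and its `lam` parameter (standing in for \(\lambda\)) are illustrative names, not an established API.

```python
import math

# Prisoner's dilemma rewards, as in the table: REWARDS[(a^0, a^1)] = (R^0, R^1).
ACTIONS = ("cooperate", "defect")
REWARDS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-4, 0),
    ("defect", "cooperate"): (0, -4),
    ("defect", "defect"): (-3, -3),
}


def expected_reward(i, own_action, other_policy):
    """U^i(a^i, pi^{-i}) for a two-agent game."""
    total = 0.0
    for other_action, prob in other_policy.items():
        joint = (own_action, other_action) if i == 0 else (other_action, own_action)
        total += prob * REWARDS[joint][i]
    return total


def softmax_response(i, other_policy, lam):
    """pi^i(a^i) proportional to exp(lam * U^i(a^i, pi^{-i}))."""
    utilities = {a: expected_reward(i, a, other_policy) for a in ACTIONS}
    # Shifting every utility by the maximum avoids overflow for large lam and
    # cancels out when the weights are normalized.
    top = max(utilities.values())
    weights = {a: math.exp(lam * (u - top)) for a, u in utilities.items()}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}


against_cooperate = {"cooperate": 1.0}
for lam in (0.0, 1.0, 10.0):
    print(lam, softmax_response(0, against_cooperate, lam))
```

At `lam = 0.0` both actions get probability 0.5; as `lam` grows, almost all of the mass moves onto `defect`, the deterministic best response.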
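Fictitious play can be sketched on rock-paper-scissors, the example game from the Dominant Strategy Equilibrium section. The reward of +1 for a win, -1 for a loss, and 0 for a tie, the prior count of 1 per action, and tie-breaking by the order of `ACTIONS` are all assumptions made for this sketch. In this zero-sum game the agents keep cycling between actions, but the empirical frequencies of their play approach the uniform Nash Equilibrium of 1/3 each.

```python
from collections import Counter

ACTIONS = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def reward(mine, theirs):
    """Assumed zero-sum reward: +1 win, -1 loss, 0 tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def best_response(opponent_counts):
    """Best response to the opponent's empirical action distribution.

    Dividing by the total count would not change the argmax, so raw counts
    are used and the comparison stays in exact integer arithmetic.
    """
    def score(a):
        return sum(reward(a, b) * n for b, n in opponent_counts.items())

    return max(ACTIONS, key=score)


def fictitious_play(rounds):
    # counts[i] records how often agent i has played each action; every count
    # starts at 1 (an assumed uniform prior) so beliefs exist in round 1.
    counts = [Counter({a: 1 for a in ACTIONS}) for _ in range(2)]
    for _ in range(rounds):
        # Both agents choose simultaneously from the counts before this round.
        a0 = best_response(counts[1])
        a1 = best_response(counts[0])
        counts[0][a0] += 1
        counts[1][a1] += 1
    return counts


counts = fictitious_play(10_000)
for i, c in enumerate(counts):
    total = sum(c.values())
    # each frequency is close to 1/3
    print(i, {a: round(c[a] / total, 3) for a in ACTIONS})
```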
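Whether a game has a dominant strategy can be checked directly from the definition in the Dominant Strategy Equilibrium section. This sketch assumes a symmetric two-agent game, so a single `reward(mine, theirs)` function describes both agents; `dominant_actions`, `pd_reward`, and `rps_reward` are hypothetical helpers, and the rock-paper-scissors rewards (+1 win, -1 loss, 0 tie) are an assumption.

```python
def dominant_actions(actions, reward):
    """Return the actions that are a best response to every opponent action.

    Checking pure opponent actions is enough: utility is linear in the
    opponent's policy, so an action that is best against every pure action
    is also best against every mixture of them.
    """
    return [
        a
        for a in actions
        if all(reward(a, b) >= max(reward(x, b) for x in actions) for b in actions)
    ]


PD = {  # prisoner's dilemma, reward to the agent choosing the first action
    ("cooperate", "cooperate"): -1,
    ("cooperate", "defect"): -4,
    ("defect", "cooperate"): 0,
    ("defect", "defect"): -3,
}
RPS_BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def pd_reward(mine, theirs):
    return PD[(mine, theirs)]


def rps_reward(mine, theirs):
    if mine == theirs:
        return 0
    return 1 if RPS_BEATS[mine] == theirs else -1


print(dominant_actions(("cooperate", "defect"), pd_reward))  # ['defect']
print(dominant_actions(("rock", "paper", "scissors"), rps_reward))  # []
```

The prisoner’s dilemma therefore has a Dominant Strategy Equilibrium (both defect), while rock-paper-scissors has none.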
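The traveler’s dilemma gives a small Nash Equilibrium example that can be found by brute force. The sketch assumes whole-number amounts and the standard tie rule that both travelers receive the common amount when they write the same value (the description above does not say what happens on a tie); `reward`, `best_response`, and `pure_nash_equilibria` are illustrative names. Against any amount above 2 the unique best response is to undercut by one, so the only pure-policy Nash Equilibrium is for both travelers to write 2, even though both writing 100 would pay each of them far more.

```python
LOW, HIGH = 2, 100  # allowed amounts, from the description above
AMOUNTS = range(LOW, HIGH + 1)


def reward(mine, theirs):
    """Traveler's dilemma payoff for the traveler who wrote `mine`."""
    if mine < theirs:
        return mine + 2
    if mine > theirs:
        return theirs - 2
    return mine  # assumed tie rule: both receive the common amount


def best_response(theirs):
    """Deterministic best response to a known amount written by the other traveler."""
    return max(AMOUNTS, key=lambda x: reward(x, theirs))


def pure_nash_equilibria():
    # Best achievable reward against each possible amount of the other traveler.
    best = {y: max(reward(x, y) for x in AMOUNTS) for y in AMOUNTS}
    # (x, y) is an equilibrium if neither traveler gains by changing only their own amount.
    return [
        (x, y)
        for x in AMOUNTS
        for y in AMOUNTS
        if reward(x, y) == best[y] and reward(y, x) == best[x]
    ]


print(best_response(100))  # 99
print(pure_nash_equilibria())  # [(2, 2)]
```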