Response Regret

Martin Zinkevich

The concept of regret is designed for the long-term interaction of multiple agents. However, most concepts of regret do not consider even the short-term consequences of an agent’s actions, e.g., how other agents may be nice to you tomorrow if you are nice to them today. For instance, an agent that always defects while playing the Prisoner’s Dilemma will never have any swap or external regret. In this paper, we introduce a new concept of regret, called response regret, that allows one to consider both the immediate and short-term consequences of one’s actions. Thus, instead of measuring only how an action affected the utility on the time step it was played, we also consider the consequences of the action on the next few time steps, subject to the dynamic nature of the other agent’s responses. For example, if the other agent is always nice to us after we are nice to it, then we should always be nice; however, if the other agent sometimes returns favors and sometimes doesn’t, we will not penalize our algorithm for not knowing when these times are. We develop algorithms for both external response regret and swap response regret, and show that if two agents minimize swap response regret, then they converge to the set of correlated equilibria in repeated bimatrix games.
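
The always-defect observation can be checked numerically. The following is a minimal sketch, not the paper's algorithm or its definition of response regret: the payoff values, the tit-for-tat opponent, and all function names are assumptions chosen for illustration. It computes ordinary external regret against the recorded opponent actions, then replays the game with a cooperating agent to show the utility that external regret fails to account for.

```python
# Row-player payoffs for an assumed, standard Prisoner's Dilemma (illustrative values).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def tit_for_tat(our_history):
    """Hypothetical opponent: cooperates first, then copies our previous action."""
    return "C" if not our_history else our_history[-1]


def play(our_policy, rounds):
    """Play against tit-for-tat and return (our actions, opponent actions)."""
    ours, theirs = [], []
    for _ in range(rounds):
        theirs.append(tit_for_tat(ours))  # opponent reacts to our past only
        ours.append(our_policy(ours))
    return ours, theirs


def total_utility(ours, theirs):
    """Sum of our per-step payoffs."""
    return sum(PAYOFF[(a, b)] for a, b in zip(ours, theirs))


def external_regret(ours, theirs):
    """Best fixed action against the recorded opponent sequence, minus realized utility."""
    realized = total_utility(ours, theirs)
    best_fixed = max(total_utility([a] * len(theirs), theirs) for a in "CD")
    return best_fixed - realized


rounds = 10

# Always defect: the opponent cooperates once, then defects forever.
ours, theirs = play(lambda history: "D", rounds)
realized = total_utility(ours, theirs)
regret = external_regret(ours, theirs)

# Replaying the game with a cooperating agent lets the opponent respond differently.
coop_ours, coop_theirs = play(lambda history: "C", rounds)
cooperative = total_utility(coop_ours, coop_theirs)

print(f"always defect: utility={realized}, external regret={regret}")
print(f"always cooperate (opponent allowed to respond): utility={cooperative}")
```

Under these assumed payoffs, external regret holds the opponent's recorded actions fixed, so defecting looks optimal in hindsight even though cooperating against this particular opponent would have earned more. That gap is the kind of short-term consequence the abstract says response regret is meant to capture.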

This page is copyrighted by AAAI. All rights reserved.