Shallow learning

This is a collection of older papers on the topic of learning in multi-agent systems. We sort papers by publication date and survey subtopic. Any additions to this repo are welcome.

Auctions, Bidding and Negotiation

Rational expectations, information acquisition, and competitive bidding by Milgrom PR. Econometrica: Journal of the Econometric Society, 1981. Most rational expectations market equilibrium models are not models of price formation, and naive mechanisms leading to such equilibria can be severely manipulable. In this paper, a bidding model is developed which has the market-like features that bidders act as price takers and that prices convey information. Higher equilibrium prices convey more favorable information about the quality of the objects being sold than do lower prices. Bidders can benefit from trading only if they have a transactions motive or if they have access to inside information. Apart from exceptional cases, prices are not fully revealing. A two stage model is developed in which bidders may acquire information at a cost before bidding and for which the equilibrium price is fully revealing, resolving a well-known paradox.
-
A theory of auctions and competitive bidding by Milgrom PR, Weber RJ. Econometrica: Journal of the Econometric Society, 1982. A model of competitive bidding is developed in which the winning bidder's payoff may depend upon his personal preferences, the preferences of others, and the intrinsic qualities of the object being sold. In this model, the English (ascending) auction generates higher average prices than does the second-price auction. Also, when bidders are risk-neutral, the second-price auction generates higher average prices than the Dutch and first-price auctions. In all of these auctions, the seller can raise the expected price by adopting a policy of providing expert appraisals of the quality of the objects he sells.
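For orientation, the sketch below is a toy Monte Carlo check of the independent-private-values baseline that Milgrom and Weber's affiliated-values ranking departs from: with IPV and risk-neutral bidders, first-price and second-price auctions give the same expected revenue. The uniform value distribution, bidder count, and the equilibrium bid function (n-1)/n · v are standard textbook assumptions, not taken from the paper.

```python
# Toy revenue-equivalence check under independent private values (IPV),
# values uniform on [0, 1]. Illustrative assumptions only.
import random

def second_price_revenue(n: int) -> float:
    """Revenue of one second-price auction: the second-highest value."""
    values = sorted(random.random() for _ in range(n))
    return values[-2]

def first_price_revenue(n: int) -> float:
    """Revenue with the standard IPV equilibrium bid b(v) = (n-1)/n * v."""
    values = [random.random() for _ in range(n)]
    return max(values) * (n - 1) / n

if __name__ == "__main__":
    n, trials = 5, 200_000
    sp = sum(second_price_revenue(n) for _ in range(trials)) / trials
    fp = sum(first_price_revenue(n) for _ in range(trials)) / trials
    # Both estimates should be close to (n-1)/(n+1) = 0.666...
    print(f"second-price ~ {sp:.3f}, first-price ~ {fp:.3f}")
```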
-
Auctions and bidding by McAfee RP, McMillan J. Journal of economic literature, 1987. This paper surveys recent developments in the theory of bidding.
-
Auctions with a stochastic number of bidders by McAfee RP, McMillan J. Journal of economic theory, 1987. Auction theory is generalised by allowing the number of bidders to be stochastic. In a first-price sealed-bid auction with bidders having constant absolute risk aversion, the expected selling price is higher when the bidders do not know how many other bidders there are than when they do know this. Thus the seller should conceal the number of bidders if he can. Moreover, a bidder's ex ante expected utility is the same whether or not there is a policy of concealing the number of bidders: concealment therefore Pareto-dominates announcement. With risk-neutral bidders, the optimal auction is the same whether or not the bidders know who their competitors are.
-
Bargaining and Markets by Martin J. Osborne, Ariel Rubinstein. Academic Press, 1990. The formal theory of bargaining originated with John Nash's work in the early 1950s. This book discusses two recent developments in this theory. The first uses the tool of extensive games to construct theories of bargaining in which time is modeled explicitly. The second applies the theory of bargaining to the study of decentralized markets. Rather than surveying the field, the authors present a select number of models, each of which illustrates a key point, and they give detailed proofs throughout the book. The text has been class-tested in a semester-long graduate course.
-
Bidding rings by McAfee RP, McMillan J. The American Economic Review, 1992. We characterize coordinated bidding strategies in two cases: a weak cartel, in which the bidders cannot make side-payments; and a strong cartel, in which the cartel members can exclude new entrants and can make transfer payments. The weak cartel can do no better than have its members submit identical bids. The strong cartel in effect reauctions the good among the cartel members.
-
A Market-Oriented Programming Environment and its Application to Distributed Multicommodity Flow Problems by M. P. Wellman. JAIR, 1993. Market price systems constitute a well-understood class of mechanisms that under certain conditions provide effective decentralization of decision making with minimal communication overhead. In a market-oriented programming approach to distributed problem solving, we derive the activities and resource allocations for a set of computational agents by computing the competitive equilibrium of an artificial economy. WALRAS provides basic constructs for defining computational market structures, and protocols for deriving their corresponding price equilibria. In a particular realization of this approach for a form of multicommodity flow problem, we see that careful construction of the decision process according to economic principles can lead to efficient distributed resource allocation, and that the behavior of the system can be meaningfully analyzed in economic terms.
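As a rough illustration of computing a competitive equilibrium by iterative price adjustment (the general idea behind market-oriented programming), here is a minimal tâtonnement-style loop for a toy exchange economy. The Cobb-Douglas consumers, step size, and normalisation are illustrative assumptions; this is a sketch, not the WALRAS system itself.

```python
# Minimal tatonnement sketch: raise prices where demand exceeds supply,
# lower them otherwise. Cobb-Douglas consumers are an assumption here.
def excess_demand(prices, endowments, preferences):
    """Aggregate excess demand for Cobb-Douglas consumers."""
    m = len(prices)
    demand = [0.0] * m
    for w, alpha in zip(endowments, preferences):
        wealth = sum(p * e for p, e in zip(prices, w))
        for g in range(m):
            demand[g] += alpha[g] * wealth / prices[g]
    supply = [sum(w[g] for w in endowments) for g in range(m)]
    return [demand[g] - supply[g] for g in range(m)]

def tatonnement(endowments, preferences, step=0.05, iters=2000):
    prices = [1.0] * len(endowments[0])
    for _ in range(iters):
        z = excess_demand(prices, endowments, preferences)
        prices = [max(1e-6, p + step * zg) for p, zg in zip(prices, z)]
        prices = [p / prices[0] for p in prices]   # normalise on good 0
    return prices

# Two consumers, two goods: endowments and Cobb-Douglas weights.
print(tatonnement([[1.0, 0.0], [0.0, 1.0]], [[0.3, 0.7], [0.6, 0.4]]))
```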
-
A Note on Sequential Auctions by Dan Bernhardt, David Scoones. American Economic Review, 1994. This note explores multiobject, sequential, private-value auctions.
-
Production sequencing as negotiation by M. Wooldridge, S. Bussmann, and M. Klosterberg. PAAM, 1996. The production sequencing problem involves a factory generating a product sequence such that when processed, the sequence will both satisfy current orders, and minimize overall costs. In this paper, we argue that production sequencing may fruitfully be considered as a negotiation problem, in which production cells within a factory negotiate over product sequences in order to fairly distribute costs. We begin by describing and formally defining the production sequencing problem; we then investigate its complexity, and present an analysis of it using the tools of game and negotiation theory. We then define a negotiation algorithm for production sequencing, and discuss what assumptions and simplifications must be made in order to make this algorithm suitable for implementation. We conclude by discussing issues for future work.
-
Applying Game Theory to Automated Negotiation by Ken Binmore and Nir Vulkan. Workshop on Economics, Game Theory and the Internet, 1997. With existing technology, it is already possible for personal agents to schedule meetings for their users, to write the small print of an agreement, and for agents to search the Internet for the cheapest price. But serious negotiation cranks the difficulty of the problem up several notches. In this paper, we review what game theory has to offer in the light of experience gained in programming automated agents within the ADEPT (Advanced Decision Environment for Process Tasks) project, which is currently being used by British Telecom for some purposes.
-
Negotiation Decision Functions for Autonomous Agents by Peyman Faratin, Carles Sierra, Nick R. Jennings. Journal of Robotics and Autonomous Systems, 1998. We present a formal model of negotiation between autonomous agents. The purpose of the negotiation is to reach an agreement about the provision of a service by one agent for another. The model defines a range of strategies and tactics that agents can employ to generate initial offers, evaluate proposals and offer counter proposals. The model is based on computationally tractable assumptions, is demonstrated in the domain of business process management, and is empirically evaluated.
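A minimal sketch of a time-dependent concession tactic of the kind this model formalises: the offer for an issue moves from the agent's ideal value towards its reservation value as the deadline approaches. The parameter names and the specific polynomial form below are written from memory of the standard presentation and should be treated as an approximation, not the paper's exact notation.

```python
# Time-dependent concession sketch: beta < 1 concedes slowly (tough),
# beta > 1 concedes quickly (eager). Parameter names are illustrative.
def time_dependent_offer(t, t_max, best, reservation, k=0.1, beta=2.0):
    """Offer for one issue at time t (polynomial concession curve)."""
    alpha = k + (1 - k) * (t / t_max) ** (1.0 / beta)
    return best + alpha * (reservation - best)

# A buyer conceding from 50 (ideal price) towards 100 (reservation price):
for t in range(0, 11, 2):
    print(t, round(time_dependent_offer(t, 10, best=50, reservation=100), 1))
```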
-
Efficient mechanism design by Krishna V, Perry M. Available at SSRN 64934, 1998. We study Bayesian mechanism design in situations where agents’ information may be multi-dimensional, concentrating on mechanisms that lead to efficient allocations. Our main result is that a generalization of the well-known Vickrey-Clarke-Groves mechanism maximizes the planner’s “revenue” among all efficient mechanisms. This result is then used to study multiple object auctions in situations where bidders have privately known “demand curves” and extended to include situations with complementarities across objects or externalities across bidders. We also illustrate how the main result may be used to analyze the possibility of allocating both private and public goods efficiently when budget balance considerations are important. The generalized VCG mechanism, therefore, serves to unify many results in mechanism design theory.
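For reference, the standard Vickrey-Clarke-Groves (pivot) payments that the paper generalises can be stated as follows; the notation below is ours, not the paper's.

```latex
% Let x^* maximize total reported value, and x^*_{-i} maximize it when
% agent i is excluded. Agent i pays the externality it imposes on the rest:
\[
  x^* \in \arg\max_x \sum_j v_j(x), \qquad
  x^*_{-i} \in \arg\max_x \sum_{j \neq i} v_j(x),
\]
\[
  p_i \;=\; \sum_{j \neq i} v_j\bigl(x^*_{-i}\bigr) \;-\; \sum_{j \neq i} v_j\bigl(x^*\bigr).
\]
% Truthful reporting is then a dominant strategy and the chosen x^* is efficient.
```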
-
Sequential Auctions for the Allocation of Resources with Complementarities by Craig Boutilier, Moises Goldszmidt, Bikash Sabata. IJCAI, 1999. Market-based mechanisms such as auctions are being studied as an appropriate means for resource allocation in distributed and multiagent decision problems. When agents value resources in combination rather than in isolation, one generally relies on combinatorial auctions where agents bid for resource bundles, or simultaneous auctions for all resources. We develop a different model, where agents bid for required resources sequentially. This model has the advantage that it can be applied in settings where combinatorial and simultaneous models are infeasible (e.g., when resources are made available at different points in time by different parties), as well as certain benefits in settings where combinatorial models are applicable. We develop a dynamic programming model for agents to compute bidding policies based on estimated distributions over prices. We also describe how these distributions are updated to provide a learning model for bidding behavior.
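The sketch below illustrates the backward-induction idea in a heavily simplified form: an agent needs a single item, faces T auctions in sequence, and treats the price it must beat in each auction as a draw from a known discrete distribution. The optimal bid cap at each stage is the item's value minus the continuation value of waiting. The single-item setting and the price model are assumptions for illustration; the paper's model handles bundles of complementary resources and learned price distributions.

```python
# Backward induction over a known discrete price distribution; the agent
# wins an auction whenever its cap exceeds the drawn price and pays that
# price. This is a simplified illustration, not the paper's full model.
def bid_caps(value, prices, probs, T):
    """Return the optimal bid cap for each of the T remaining auctions."""
    continuation = 0.0            # value of having no auctions left
    caps = []
    for _ in range(T):            # work backwards from the last auction
        cap = value - continuation
        win = sum(pr * (value - p) for p, pr in zip(prices, probs) if p < cap)
        lose = sum(pr for p, pr in zip(prices, probs) if p >= cap) * continuation
        continuation = win + lose
        caps.append(cap)
    return list(reversed(caps))   # caps[0] is the first auction's cap

# Price is 4, 6 or 8 with equal probability; the good is worth 10.
print(bid_caps(10.0, [4, 6, 8], [1/3, 1/3, 1/3], T=3))
```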
-
Automated Strategy Searches in an Electronic Goods Market: Learning and Complex Price Schedules by Christopher H. Brooks, Scott Fay, Rajarshi Das, Jeffrey K. MacKie-Mason, Jeffrey Kephart, Edmund H. Durfee. ACM-EC, 1999. Markets for electronic goods provide the possibility of exploring new and more complex pricing schemes, due to the flexibility of information goods and negligible marginal cost. In this paper we compare dynamic performance across price schedules of varying complexity. We provide a monopolist producer with two machine learning methods which implement a strategy that balances exploitation to maximize current profits against exploration to improve future profits. We find that the complexity of the price schedule affects both the amount of exploration necessary and the aggregate profit received by a producer. In general, simpler price schedules are more robust and give up less profit during the learning periods even though the more complex schedules have higher long-run profits. These results hold for both learning methods, even though the relative performance of the methods is quite sensitive to differences in the smoothness of the profit landscape for different price schedules. Our results have implications for automated learning and strategic pricing in non-stationary environments, which arise when the consumer population changes, individuals change their preferences, or competing firms change their strategies.
-
Auction theory: A guide to the literature by Klemperer P. Journal of economic surveys, 1999. This paper provides an elementary, non-technical, survey of auction theory, by introducing and describing some of the critical papers in the subject. (The most important of these are reproduced in a companion book, The Economic Theory of Auctions, Paul Klemperer (ed.), Edward Elgar (pub.), forthcoming.) We begin with the most fundamental concepts, and then introduce the basic analysis of optimal auctions, the revenue equivalence theorem, and marginal revenues. Subsequent sections address risk-aversion, affiliation, asymmetries, entry, collusion, multi-unit auctions, double auctions, royalties, incentive contracts, and other topics. Appendices contain technical details, some simple worked examples, and bibliographies.
-
Dynamic pricing by software agents by Jeffrey O. Kephart, James E. Hanson, Amy R. Greenwald. Computer Networks, 2000. We envision a future in which the global economy and the Internet will merge, evolving into an information economy bustling with billions of economically motivated software agents that exchange information goods and services with humans and other agents. Economic software agents will differ in important ways from their human counterparts, and these differences may have significant beneficial or harmful effects upon the global economy. It is therefore important to consider the economic incentives and behaviors of economic software agents, and to use every available means to anticipate their collective interactions. We survey research conducted by the Information Economies group at IBM Research aimed at understanding collective interactions among agents that dynamically price information goods or services. In particular, we study the potential impact of widespread shopbot usage on prices, the price dynamics that may ensue from various mixtures of automated pricing agents (or _pricebots_), the potential use of machine-learning algorithms to improve profits, and more generally the interplay among learning, optimization, and dynamics in agent-based information economies. These studies illustrate both beneficial and harmful collective behaviors that can arise in such systems, suggest possible cures for some of the undesired phenomena, and raise fundamental theoretical issues, particularly in the realms of multi-agent learning and dynamic optimization.
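A toy illustration of the cyclical price wars this line of work reports when myopically optimising pricebots repeatedly undercut one another: prices drift down towards cost and then reset to the monopoly level. All numbers and the turn-taking rule below are illustrative assumptions, not the models studied in the paper.

```python
# Two sellers take turns: undercut the rival while that is still
# profitable, otherwise jump back to the monopoly price (an Edgeworth-style
# cycle). Purely illustrative parameters.
def price_war(steps=20, monopoly=1.0, cost=0.2, undercut=0.05):
    prices = [monopoly, monopoly]
    history = []
    for t in range(steps):
        i = t % 2                               # sellers adjust in turn
        rival = prices[1 - i]
        if rival - undercut > cost:
            prices[i] = rival - undercut        # undercut the rival slightly
        else:
            prices[i] = monopoly                # war no longer pays: reset
        history.append(tuple(prices))
    return history

for p in price_war():
    print(p)
```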
-
A theory of auctions and competitive bidding II by Milgrom P, Weber RJ. The economic theory of auctions, 2000. This paper was written in early 1982. On the basis of results from previous research, we had informally conjectured that sequential auctions of the types treated here would yield upward-drifting price sequences at equilibrium. The 1981 RCA transponder lease auction described in the introduction, and then-contemporary comments of Ashenfelter concerning the “afternoon effect” in wine auctions, challenged this conjecture. Clearly, a proof was needed, and we generated one for the case of sequential first-price auctions with winning-bid announcements after each round. We wrote out the equilibrium-characterization and price-sequence proofs, and then sketched out what we thought might be analogous results for a variety of other auction formats. Proofs for the sequential first-price and second-price auctions without price announcements refused to come together, and a bit of further thought indicated a possible reason, which we here add as a bracketed comment just before the “Summary of Results” section of the paper. Without the proofs we sought, the paper languished in informal distribution, until the editor of this volume requested permission to print the paper, and graciously agreed to the inclusion of this foreword.
-
Pricing information bundles in a dynamic environment by J. Kephart, C. Brooks, and R. Das. ACM-EC, 2001. We explore a scenario in which a monopolist producer of information goods seeks to maximize its profits in a market where consumer demand shifts frequently and unpredictably. The producer may set an arbitrarily complex price schedule---a function that maps the set of purchased items to a price. However, lacking direct knowledge of consumer demand, it cannot compute the optimal schedule. Instead, it attempts to optimize profits via trial and error. By means of a simple model of consumer demand and a modified version of a simple nonlinear optimization routine, we study a variety of parametrizations of the price schedule and quantify some of the relationships among learnability, complexity, and profitability. In particular, we show that fixed pricing or simple two-parameter dynamic pricing schedules are preferred when demand shifts frequently, but that dynamic pricing based on more complex schedules tends to be most profitable when demand shifts very infrequently.
-
Automated negotiation: prospects, methods, and challenges by N. Jennings, P. Faratin, A. Lomuscio, S. Parsons, C. Sierra, and M. Wooldridge. International Journal of Group Decision and Negotiation, 2001. The purpose of this paper is to examine the space of negotiation opportunities for autonomous agents, to identify and evaluate some of the key techniques, and to highlight some of the major challenges for future automated negotiation research. This paper is not meant as a survey of the field of automated negotiation. Rather, the descriptions and assessments of the various approaches are generally undertaken with particular reference to work in which the authors have been involved. However, the specific issues raised should be viewed as being broadly applicable.
-
Introduction to the special issue on agent-based computational economics by Leigh Tesfatsion. Journal of Economic Dynamics & Control, 2001. A brief overview of agent-based computational economics (ACE) is given, followed by a synopsis of the articles included in this special issue on ACE and in a companion special issue on ACE scheduled to appear in Computational Economics.
-
Economic Dynamics of Agents in Multiple Auctions by Chris Preist, Andrew Byde, Claudio Bartolini. Autonomous Agents, 2001. Over the last few years, electronic auctions have become an increasingly important aspect of e-commerce, both in the business to business and business to consumer domains. As a result of this, it is often possible to find many auctions selling similar goods on the web. However, when an individual is attempting to purchase such a good, they will usually bid in one, or a small number, of such auctions. This results in two forms of inefficiency. Firstly, the individual may pay more for the good than would be expected in an ideal market. Secondly, some sellers may fail to make a sale that could take place in an ideal market. In this paper, we present an agent that is able to participate in multiple auctions for a given good, placing bids appropriately to secure the cheapest price. We present experiments to show: 1. Current auction markets on the web are inefficient, with trades taking place away from equilibrium price, and not all benefit from trade being extracted. 2. Our agent is able to exploit these inefficiencies, resulting in it making higher profits than the simple strategy of bidding in a small number of auctions. 3. As more participants use our agent, the market becomes more efficient. When all participants use the agent, all trades take place close to equilibrium price, and the market approaches ideal behaviour.
-
A bartering approach to improve multiagent learning by Santiago Ontañón, Enric Plaza. AAMAS, 2002. Multiagent systems offer a new paradigm to organize AI applications. We focus on the application of Case-Based Reasoning to Multiagent systems. CBR offers the individual agents the capability to learn autonomously from experience. In this paper we present a framework for collaboration among agents that use CBR. We present explicit strategies for case bartering that address the issue of agents having a biased view of the data. The outcome of bartering is an improvement of individual agent performance and of overall multiagent system performance that equals the ideal situation where all agents have an unbiased view of the data. We also present empirical results illustrating the robustness of the case bartering process for several configurations of the multiagent system and for three different CBR techniques.
-
Effectiveness of preference elicitation in combinatorial auctions by Benoît Hudson, Tuomas Sandholm. AMEC IV, 2002. Combinatorial auctions where agents can bid on bundles of items are desirable because they allow the agents to express complementarity and substitutability between the items. However, expressing one’s preferences can require bidding on all bundles. Selective incremental preference elicitation by the auctioneer was recently proposed to address this problem [4], but the idea was not evaluated. In this paper we show, experimentally and theoretically, that automated elicitation provides a drastic benefit. In all of the elicitation schemes under study, as the number of items for sale increases, the amount of information elicited is a vanishing fraction of the information collected in traditional “direct revelation mechanisms” where bidders reveal all their valuation information. Most of the elicitation schemes also maintain the benefit as the number of agents increases. We develop more effective elicitation policies for existing query types. We also present a new query type that takes the incremental nature of elicitation to a new level by allowing agents to give approximate answers that are refined only on an as-needed basis. In the process, we present methods for evaluating different types of elicitation policies.
-
Auction Theory by Vijay Krishna. Academic press, 2002. Auction Theory, Second Edition improves upon his 2002 bestseller with a new chapter on package and position auctions as well as end-of-chapter questions and chapter notes. Complete proofs and new material about collusion complement Krishna’s ability to reveal the basic facts of each theory in a style that is clear, concise, and easy to follow. With the addition of a solutions manual and other teaching aids, the 2e continues to serve as the doorway to relevant theory for most students doing empirical work on auctions.
-
Decision Procedures for Multiple Auctions by Andrew Byde, Chris Preist, Nicholas R. Jennings. AAMAS, 2002. This paper presents a decision theoretic framework that an autonomous agent can use to bid effectively across multiple, simultaneous auctions. Specifically, our framework enables an agent to make rational decisions about purchasing multiple goods from a series of auctions that operate different protocols (we deal with the English, Dutch, First-Price Sealed Bid and Vickrey cases). The framework is then used to characterize the optimal decision that an agent should take. Finally, we develop a practical algorithm that provides a heuristic approximation to this ideal.
-
Using similarity criteria to make issue trade-offs in automated negotiations by P. Faratin, C. Sierra, N.R. Jennings. Artificial Intelligence, 2002. Automated negotiation is a key form of interaction in systems that are composed of multiple autonomous agents. The aim of such interactions is to reach agreements through an iterative process of making offers. The content of such proposals are, however, a function of the strategy of the agents. Here we present a strategy called the trade-off strategy where multiple negotiation decision variables are traded-off against one another (e.g., paying a higher price in order to obtain an earlier delivery date or waiting longer in order to obtain a higher quality service). Such a strategy is commonly known to increase the social welfare of agents. Yet, to date, most computational work in this area has ignored the issue of trade-offs, instead aiming to increase social welfare through mechanism design. The aim of this paper is to develop a heuristic computational model of the trade-off strategy and show that it can lead to an increased social welfare of the system. A novel linear algorithm is presented that enables software agents to make trade-offs for multi-dimensional goods for the problem of distributed resource allocation. Our algorithm is motivated by a number of real-world negotiation applications that we have developed and can operate in the presence of varying degrees of uncertainty. Moreover, we show that on average the total time used by the algorithm is linearly proportional to the number of negotiation issues under consideration. This formal analysis is complemented by an empirical evaluation that highlights the operational effectiveness of the algorithm in a range of negotiation scenarios. The algorithm itself operates by using the notion of fuzzy similarity to approximate the preference structure of the other negotiator and then uses a hill-climbing technique to explore the space of possible trade-offs for the one that is most likely to be acceptable
-
Co-evolving automata negotiate with a variety of opponents by D.D.B. van Bragt, J.A. La Poutre. CEC, 2002. Real-life negotiations typically involve multiple parties with (i) different preferences for the different issues and (ii) bargaining strategies which change over time. Such a dynamic environment (with imperfect information) is addressed in this paper with a multi-population evolutionary algorithm (EA). Each population represents an evolving collection of bargaining strategies in our setup. The bargaining strategies are represented by a special kind of finite automata, which require only two transitions per state. We show that such automata (with a limited complexity) are a suitable choice in a computational setting. We furthermore describe an EA which generates highly-efficient bargaining automata in the course of time. A series of computational experiments shows that co-evolving automata are able to discriminate successfully between different opponents, although they receive no explicit information about the identity or preferences of their opponents. These results are important for the further development of evolving automata for real-life (agent system) applications.
-
Truth revelation in approximately efficient combinatorial auctions by Lehmann D, O'Callaghan LI, Shoham Y. Journal of the ACM (JACM), 2002. Some important classical mechanisms considered in Microeconomics and Game Theory require the solution of a difficult optimization problem. This is true of mechanisms for combinatorial auctions, which have in recent years assumed practical importance, and in particular of the gold standard for combinatorial auctions, the Generalized Vickrey Auction (GVA). Traditional analysis of these mechanisms - in particular, their truth revelation properties - assumes that the optimization problems are solved precisely. In reality, these optimization problems can usually be solved only in an approximate fashion. We investigate the impact on such mechanisms of replacing exact solutions by approximate ones. Specifically, we look at a particular greedy optimization method, which has empirically been shown to perform well. We show that the GVA payment scheme does not provide for a truth revealing mechanism. We introduce another scheme that does guarantee truthfulness for a restricted class of players. We demonstrate the latter property by identifying sufficient conditions for a combinatorial auction to be truth-revealing, conditions which have applicability beyond the specific auction studied here.
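A small sketch of one greedy allocation rule studied in this line of work: bids are ranked by amount divided by the square root of bundle size and granted greedily when they do not conflict with earlier winners. Only the allocation step is shown here; the paper's contribution is the payment scheme that restores truthfulness for a restricted class of bidders. The example bids are invented for illustration.

```python
# Greedy winner determination: rank bids by value / sqrt(bundle size),
# grant each bid that does not overlap an already-allocated item.
from math import sqrt

def greedy_allocation(bids):
    """bids: list of (bidder, frozenset_of_items, amount)."""
    ranked = sorted(bids, key=lambda b: b[2] / sqrt(len(b[1])), reverse=True)
    allocated, winners = set(), []
    for bidder, bundle, amount in ranked:
        if not bundle & allocated:          # no conflict with earlier winners
            winners.append((bidder, bundle, amount))
            allocated |= bundle
    return winners

bids = [("a", frozenset({1, 2}), 9.0),
        ("b", frozenset({2, 3}), 7.0),
        ("c", frozenset({3}), 4.0)]
print(greedy_allocation(bids))   # "a" then "c"; "b" conflicts with "a"
```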
-
Negotiating Complex Contracts by Mark Klein, Peyman Faratin, Hiroki Sayama, Yaneer Bar-Yam. Group Decision and Negotiation, 2003. Work to date on computational models of negotiation has focused almost exclusively on defining contracts consisting of one or a few independent issues and tractable contract spaces. Many real-world contracts, by contrast, are much more complex, consisting of multiple inter-dependent issues and intractably large contract spaces. This paper describes a simulated annealing based approach appropriate for negotiating such complex contracts that achieves near-optimal social welfare for negotiations with binary issue dependencies.
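A minimal simulated-annealing sketch over binary, inter-dependent issue vectors, to give a feel for the kind of search involved. The random pairwise utility model and temperature schedule are assumptions for illustration; the protocol in the paper is mediated, with each party responding to the mediator's proposals rather than a single annealer optimising joint utility directly.

```python
# Simulated annealing over binary contracts: flip one issue at a time,
# always accept improvements, accept worsening moves with a probability
# that shrinks as the temperature cools.
import math
import random

def anneal(utility, n_issues, steps=5000, temp=2.0, cooling=0.999):
    contract = [random.randint(0, 1) for _ in range(n_issues)]
    score = utility(contract)
    for _ in range(steps):
        i = random.randrange(n_issues)
        neighbour = contract[:]
        neighbour[i] ^= 1                                  # flip one issue
        delta = utility(neighbour) - score
        if delta >= 0 or random.random() < math.exp(delta / temp):
            contract, score = neighbour, score + delta
        temp *= cooling
    return contract, score

# Joint utility with random pairwise dependencies between issues:
n = 20
pair_w = {(i, j): random.uniform(-1, 1) for i in range(n) for j in range(i + 1, n)}

def joint(contract):
    """Sum of pairwise dependency terms over the selected issues."""
    return sum(w * contract[i] * contract[j] for (i, j), w in pair_w.items())

print(anneal(joint, n))
```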
-
A Fuzzy-Logic Based Bidding Strategy for Autonomous Agents in Continuous Double Auctions by Minghua He, Ho-fung Leung, Nicholas R. Jennings. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2003. Increasingly, many systems are being conceptualized, designed, and implemented as marketplaces in which autonomous software entities (agents) trade services. These services can be commodities in e-commerce applications or data and knowledge services in information economies. In many of these cases, there are both multiple agents that are looking to procure services and multiple agents that are looking to sell services at any one time. Such marketplaces are termed continuous double auctions (CDAs). Against this background, this paper develops new algorithms that buyer and seller agents can use to participate in CDAs. These algorithms employ heuristic fuzzy rules and fuzzy reasoning mechanisms in order to determine the best bid to make given the state of the marketplace. Moreover, we show how an agent can dynamically adjust its bidding behavior to respond effectively to changes in the supply and demand in the marketplace. We then show, by empirical evaluations, how our agents outperform four of the most prominent algorithms previously developed for CDAs (several of which have been shown to outperform human bidders in experimental studies).
-
BOB: Improved winner determination in combinatorial auctions and generalizations by Tuomas Sandholm, Subhash Suri. Artificial Intelligence, 2003. Combinatorial auctions can be used to reach efficient resource and task allocations in multiagent systems where the items are complementary or substitutable. Determining the winners is NP-complete and inapproximable, but it was recently shown that optimal search algorithms do very well on average. This paper presents a more sophisticated search algorithm for optimal (and anytime) winner determination, including structural improvements that reduce search tree size, faster data structures, and optimizations at search nodes based on driving toward, identifying and solving tractable special cases. We also uncover a more general tractable special case, and design algorithms for solving it as well as for solving known tractable special cases substantially faster. We generalize combinatorial auctions to multiple units of each item, to reserve prices on singletons as well as combinations, and to combinatorial exchanges. All of these generalizations support both complementarity and substitutability of the items. Finally, we present algorithms for determining the winners in these generalizations
-
Comparing Equilibria for Game-Theoretic and Evolutionary Bargaining Models by Shaheen Fatima, Michael Wooldridge, Nicholas R. Jennings. AMEC, 2003. Game-theoretic models of bargaining are typically based on the assumption that players have perfect rationality and that they always play an equilibrium strategy. In contrast, research in experimental economics shows that in bargaining between human subjects, participants do not always play the equilibrium strategy. Such agents are said to be boundedly rational. In playing a game against a boundedly rational opponent, a player’s most effective strategy is not the equilibrium strategy, but the one that is the best reply to the opponent’s actual strategy. Against this background, this paper studies the bargaining behavior of boundedly rational agents by using genetic algorithms. Since bargaining involves players with different utility functions, we have two subpopulations – one represents the buyer, and the other represents the seller (i.e., the population is asymmetric). We study the competitive co-evolution of strategies in the two subpopulations for an incomplete information setting, and compare the results with those prescribed by game theory. Our analysis leads to two main conclusions. Firstly, our study shows that although each agent in the game-theoretic model has a strategy that is dominant at every period at which it makes a move, the stable state of the evolutionary model does not always match the game-theoretic equilibrium outcome. Secondly, as the players mutually adapt to each other’s strategy, the stable outcome depends on the initial population
-
A fuzzy constraint based model for bilateral, multi-issue negotiations in semi-competitive environments by Xudong Luo, Nicholas R. Jennings, Nigel Shadbolt, Ho-fung Leung, Jimmy Ho-man Lee. Artificial Intelligence, 2003. This paper develops a fuzzy constraint based model for bilateral multi-issue negotiation in trading environments. In particular, we are concerned with the principled negotiation approach in which agents seek to strike a fair deal for both parties, but which, nevertheless, maximises their own payoff. Thus, there are elements of both competition and cooperation in the negotiation (hence semi-competitive environments). One of the key intuitions of the approach is that there is often more than one option that can satisfy the interests of both parties. So, if the opponent cannot accept an offer then the proponent should endeavour to find an alternative that is equally acceptable to it, but more acceptable to the opponent. That is, the agent should make a trade-off. Only if such a trade-off is not possible should the agent make a concession. Against this background, our model ensures the agents reach a deal that is fair (Pareto-optimal) for both parties if such a solution exists. Moreover, this is achieved by minimising the amount of private information that is revealed. The model uses prioritised fuzzy constraints to represent trade-offs between the different possible values of the negotiation issues and to indicate how concessions should be made when they are necessary. Also by using constraints to express negotiation proposals, the model can cover the negotiation space more efficiently since each exchange covers a region rather than a single point (which is what most existing models deal with). In addition, by incorporating the notion of a reward into our negotiation model, the agents can sometimes reach agreements that would not otherwise be possible.
-
Developing a Bidding Agent for Multiple Heterogeneous Auctions by P Anthony, NR Jennings. ACM TOIT, 2003. Due to the proliferation of online auctions, there is an increasing need to monitor and bid in multiple auctions in order to procure the best deal for the desired good. To this end, this paper reports on the development of a heuristic decision making framework that an autonomous agent can exploit to tackle the problem of bidding across multiple auctions with varying start and end times and with varying protocols (including English, Dutch and Vickrey). The framework is flexible, configurable, and enables the agent to adopt varying tactics and strategies that attempt to ensure that the desired item is delivered in a manner consistent with the user’s preferences. Given this large space of possibilities, we employ a genetic algorithm to search (offline) for effective strategies in common classes of environment. The strategies that emerge from this evolution are then codified into the agent’s reasoning behaviour so that it can select the most appropriate strategy to employ in its prevailing circumstances. The proposed framework has been implemented in a simulated marketplace environment and its effectiveness has been empirically demonstrated.
-
Multi-Issue Negotiation Processes by Evolutionary Simulation, Validation and Social Extensions by Enrico Gerding, David van Bragt and Han La Poutre. Computational Economics, 2003. We describe a system for bilateral negotiations in which artificial agents are generated by an evolutionary algorithm (EA). The negotiations are governed by a finite-horizon version of the alternating-offers protocol. Several issues are negotiated simultaneously. We first analyse and validate the outcomes of the evolutionary system, using the game-theoretic subgame-perfect equilibrium as a benchmark. We then present two extensions of the negotiation model. In the first extension agents take into account the fairness of the obtained payoff. We find that when the fairness norm is consistently applied during the negotiation, agents reach symmetric outcomes which are robust and rather insensitive to the actual fairness settings. In the second extension we model a competitive market situation where agents have multiple bargaining opportunities before reaching the final agreement. Symmetric outcomes are now also obtained, even when the number of bargaining opportunities is small. We furthermore study the influence of search or negotiation costs in this game.
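The game-theoretic benchmark mentioned here can be computed by backward induction. Below is a sketch for a single pie of size 1 (the multi-issue case in the paper is richer): in the last round the proposer keeps everything, and in each earlier round the proposer offers the responder exactly the discounted share the responder could secure by rejecting. The discount factors and round count are illustrative.

```python
# Subgame-perfect split for finite-horizon alternating offers over one pie.
def spe_split(rounds, delta_a, delta_b):
    """(A's share, B's share) agreed immediately; A proposes in round 1."""
    share = 1.0                          # the last round's proposer keeps the pie
    for t in range(rounds - 1, 0, -1):   # walk back towards round 1
        # The responder at round t is the proposer at round t+1; offer it
        # the discounted value of what it would get by waiting.
        resp_delta = delta_b if t % 2 == 1 else delta_a
        share = 1.0 - resp_delta * share
    return share, 1.0 - share

# Three rounds, both players discounting by 0.9 per round:
print(spe_split(3, 0.9, 0.9))   # A keeps 1 - 0.9 * (1 - 0.9) = 0.91
```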
-
Applying Evolutionary Game Theory to Auction Mechanism Design by Andrew Byde. ACM-EC, 2003. In this paper we describe an evolution-based method for evaluating auction mechanisms, and apply it to a space of mechanisms including the standard first- and second-price sealed bid auctions. We replicate results known already in the Auction Theory literature regarding the suitability of different mechanisms for different bidder environments, and extend the literature by establishing the superiority of novel mechanisms over standard mechanisms, for commonly occurring scenarios. Thus this paper simultaneously extends Auction Theory, and provides a systematic method for further such extensions.
-
Walverine: A walrasian trading agent by Shih-Fen Cheng, Evan Leung, Kevin M. Lochner, Kevin O’Malley, Daniel M. Reeves, Julian L. Schvartzman, Michael P. Wellman. AAMAS, 2003. TAC-02 was the third in a series of Trading Agent Competition events fostering research in automating trading strategies by showcasing alternate approaches in an open-invitation market game. TAC presents a challenging travel-shopping scenario where agents must satisfy client preferences for complementary and substitutable goods by interacting through a variety of market types. Michigan’s entry, Walverine, bases its decisions on a competitive (Walrasian) analysis of the TAC travel economy. Using this Walrasian model, we construct a decision-theoretic formulation of the optimal bidding problem, which Walverine solves in each round of bidding for each good. Walverine’s optimal bidding approach, as well as several other features of its overall strategy, are potentially applicable in a broad class of trading environments.
-
The Importance of Ordering in Sequential Auctions by Wedad Elmaghraby. Management Science, 2003. To date, the largest part of the literature on multi-unit auctions has assumed that there are k homogeneous objects being auctioned, where each bidder wishes to win exactly one or all of k units. These modeling assumptions have made the examination of ordering in sequential auctions inconsequential. The aim of this paper is to introduce and highlight the critical influence that ordering can have on the efficiency of an auction. We study a buyer who outsources via sequential second-price auctions two heterogeneous jobs, and faces a diverse set of suppliers with capacity constraints.
-
Bargaining with Posterior Opportunities: An Evolutionary Social Simulation by E. H. Gerding, J.A. La Poutre. LNEMS, 2003. Negotiations have been extensively studied theoretically throughout the years. A well-known bilateral approach is the ultimatum game, where two agents negotiate on how to split a pie or a "dollar": the proposer makes an offer and responder can choose to accept or reject. In this paper a natural extension of the ultimatum game is presented, in which both agents can negotiate with other opponents in case of a disagreement. This way the basics of a competitive market are modelled where for instance a buyer can try several sellers before making a purchase decision. The game is investigated using an evolutionary simulation. The outcomes appear to depend largely on the information available to the agents. We find that if the agents' number of future bargaining opportunities is commonly known, the proposer has the advantage. If this information is held private, however, the responder can obtain a larger share of the pie. For the first case we also provide a game-theoretic analysis and compare the outcome with evolutionary results. Furthermore, the effects of search costs and allowing multiple issues to be negotiated simultaneously are investigated.
-
SouthamptonTAC: An Adaptive Autonomous Trading Agent by Minghua He, Nicholas R. Jennings. ACM Transactions on Internet Technology, 2003. Software agents are increasingly being used to represent humans in on-line auctions. Such agents have the advantages of being able to systematically monitor a wide variety of auctions and then make rapid decisions about what bids to place in what auctions. They can do this continuously and repetitively without losing concentration. Moreover, in complex multiple auction settings, agents may need to modify their behavior in one auction depending on what is happening in another. To provide a means of evaluating and comparing (benchmarking) research methods in this area, the Trading Agent Competition (TAC) was established. This competition involves a number of agents bidding against one another in a number of related auctions (operating different protocols) to purchase travel packages for customers. Against this background, this article describes the design, implementation and evaluation of our adaptive autonomous trading agent, SouthamptonTAC, one of the most successful participants in TAC 2002.
-
Why agents for automated negotiations should be adaptive by D.D.B. van Bragt, J.A. La Poutre. Netnomics, 2003. We show that adaptive agents on the Internet can learn to exploit bidding agents who use a (limited) number of fixed strategies. These learning agents can be generated by adapting a special kind of finite automata with evolutionary algorithms (EAs). Our approach is especially powerful if the adaptive agent participates in frequently occurring micro-transactions, where there is sufficient opportunity for the agent to learn online from past negotiations. More in general, results presented in this paper provide a solid basis for the further development of adaptive agents for Internet applications.
-
K-implementation by Monderer D, Tennenholtz M. In Proceedings of the 4th ACM conference on Electronic Commerce, 2003. This paper discusses an interested party who wishes to influence the behavior of agents in a game (multi-agent interaction), which is not under his control. The interested party cannot design a new game, cannot enforce agents' behavior, cannot enforce payments by the agents, and cannot prohibit strategies available to the agents. However, he can influence the outcome of the game by committing to non-negative monetary transfers for the different strategy profiles that may be selected by the agents. The interested party assumes that agents are rational in the commonly agreed sense that they do not use dominated strategies. Hence, a certain subset of outcomes is implemented in a given game if by adding non-negative payments, rational players will necessarily produce an outcome in this subset. Obviously, by making sufficiently big payments one can implement any desirable outcome. The question is what is the cost of implementation? In this paper we introduce the notion of k-implementation of a desired set of strategy profiles, where k stands for the amount of payment that need to be actually made in order to implement desirable outcomes. A major point in k-implementation is that monetary offers need not necessarily materialize when following desired behaviors. We define and study k-implementation in the contexts of games with complete and incomplete information. In the latter case we mainly focus on the VCG games. Our setting is later extended to deal with mixed strategies using correlation devices. Together, the paper introduces and studies the implementation of desirable outcomes by a reliable party who cannot modify game rules (i.e. provide protocols), complementing previous work in mechanism design, while making it more applicable to many realistic CS settings.
-
Online Learning of Aggregate Knowledge about Non-linear Preferences Applied to Negotiating Prices and Bundles by D.J.A. Somefun, T.B. Klos, J.A. La Poutre. ICEC, 2004. In this paper, we consider a form of multi-issue negotiation where a shop negotiates both the contents and the price of bundles of goods with his customers. We present some key insights about, as well as a procedure for, locating mutually beneficial alternatives to the bundle currently under negotiation. The essence of our approach lies in combining aggregate (anonymous) knowledge of customer preferences with current data about the ongoing negotiation process. The developed procedure either works with already obtained aggregate knowledge or, in the absence of such knowledge, learns the relevant information online. We conduct computer experiments with simulated customers that have nonlinear preferences. We show how, for various types of customers, with distinct negotiation heuristics, our procedure (with and without the necessary aggregate knowledge) increases the speed with which deals are reached, as well as the number and the Pareto efficiency of the deals reached compared to a benchmark.
-
Price Prediction Strategies for Market-Based Scheduling by Jeffrey K. MacKie-Mason, Anna Osepayshvili, Daniel M. Reeves, Michael P. Wellman. AAAI, 2004. In a market-based scheduling mechanism, the allocation of time-specific resources to tasks is governed by a competitive bidding process. Agents bidding for multiple, separately allocated time slots face the risk that they will succeed in obtaining only part of their requirement, incurring expenses for potentially worthless slots. We investigate the use of price prediction strategies to manage such risk. Given an uncertain price forecast, agents follow simple rules for choosing whether and on which time slots to bid. We find that employing price predictions can indeed improve performance over a straightforward baseline in some settings. Using an empirical game-theoretic methodology, we establish Nash equilibrium profiles for restricted strategy sets. This allows us to confirm the stability of price-predicting strategies, and measure overall efficiency. We further experiment with variant strategies to analyze the source of prediction’s power, demonstrate the existence of self-confirming predictions, and compare the performance of alternative prediction methods.
-
An Efficient Turnkey Agent for Repeated Trading with Overall Budget and Preferences by I.B. Vermeulen, D.J.A. Somefun, J.A. La Poutre. Cybernetics and Intelligent Systems, 2004. For various e-commerce applications autonomous agents can do the actual trading on behalf of their users. We consider an agent who trades repeatedly on behalf of his user, given an overall budget and preferences per time step, both specified at the start. For many e-commerce settings such an agent has limited computational resources, limited prior information concerning price fluctuations, and little time for online learning. We therefore develop an efficient heuristic that requires little prior information to work well from the start, even for very rough, nonsmooth problem instances. Extensive computer experiments conducted for a wide variety of customer preferences show virtually no difference in performance between a dynamic programming (DP) approach and the developed heuristic carrying out the agent's task. The DP approach has, however, the important drawback of generally being too computationally intensive.
-
Market-based Recommendation: Agents that Compete for Consumer Attention by SANDER M. BOHTE, ENRICO GERDING, HAN LA POUTRE. ACM TOIT, 2004. The amount of attention space available for recommending suppliers to consumers on e-commerce sites is typically limited. We present a competitive distributed recommendation mechanism based on adaptive software agents for efficiently allocating the “consumer attention space”, or banners. In the example of an electronic shopping mall, the task is delegated to the individual shops, each of which evaluates the information that is available about the consumer and his or her interests (e.g. keywords, product queries, and available parts of a profile). Shops make a monetary bid in an auction where a limited amount of “consumer attention space” for the arriving consumer is sold. Each shop is represented by a software agent that bids for each consumer. This allows shops to rapidly adapt their bidding strategy to focus on consumers interested in their offerings. For various basic and simple models for on-line consumers, shops, and profiles, we demonstrate the feasibility of our system by evolutionary simulations as in the field of agent-based computational economics (ACE). We also develop adaptive software agents that learn bidding-strategies, based on neural networks and strategy exploration heuristics. Furthermore, we address the commercial and technological advantages of this distributed market-based approach. The mechanism we describe is not limited to the example of the electronic shopping mall, but can easily be extended to other domains.
-
Bidding under Uncertainty: Theory and Experiments by Amy Greenwald, Justin Boyan. UAI, 2004. This paper describes a study of agent bidding strategies, assuming combinatorial valuations for complementary and substitutable goods, in three auction environments: sequential auctions, simultaneous auctions, and the Trading Agent Competition (TAC) Classic hotel auction design, a hybrid of sequential and simultaneous auctions. The problem of bidding in sequential auctions is formulated as an MDP, and it is argued that expected marginal utility bidding is the optimal bidding policy. The problem of bidding in simultaneous auctions is formulated as a stochastic program, and it is shown by example that marginal utility bidding is not an optimal bidding policy, even in deterministic settings. Two alternative methods of approximating a solution to this stochastic program are presented: the first method, which relies on expected values, is optimal in deterministic environments; the second method, which samples the nondeterministic environment, is asymptotically optimal as the number of samples tends to infinity. Finally, experiments with these various bidding policies are described in the TAC Classic setting.
-
Automated Bilateral Bargaining about Multiple Attributes in a One-to-Many Setting by E.H. Gerding, D.J.A. Somefun, J.A. La Poutre. ICEC, 2004. Negotiations are an important way of reaching agreements between selfish autonomous agents. In this paper we focus on one-to-many bargaining within the context of agent-mediated electronic commerce. We consider an approach where a seller agent negotiates over multiple interdependent attributes with many buyer agents in a bilateral fashion. In this setting, “fairness,” which corresponds to the notion of envy-freeness in auctions, may be an important business constraint. For the case of virtually unlimited supply (such as information goods), we present a number of one-to-many bargaining strategies for the seller agent, which take into account the fairness constraint, and consider multiple attributes simultaneously. We compare the performance of the bargaining strategies using an evolutionary simulation, especially for the case of impatient buyers. Several of the developed strategies are able to extract almost all the surplus; they utilize the fact that the setting is one-to-many, even though bargaining is bilateral.
-
Coordinating multiple concurrent negotiations by Thuc Duong Nguyen, Nicholas R. Jennings. AAMAS, 2004. To secure good deals, an agent may engage in multiple concurrent negotiations for a particular good or service. However for this to be effective, the agent needs to carefully coordinate its negotiations. At a basic level, such coordination should ensure the agent does not procure more of the good than is needed. But to really derive benefit from such an approach, the agent needs the concurrent encounters to mutually influence one another (e.g. a good price with one opponent should enable an agent to negotiate more strongly in the other interactions). To this end, this paper presents a novel heuristic model for coordinating multiple bilateral negotiations. The model is empirically evaluated and shown to be effective and robust in a range of negotiation scenarios.
-
Agent-Mediated Electronic Commerce by Carles Sierra. AAMAS, 2004. Electronic commerce has been one of the traditional arenas for agent technology. The complexity of these applications has been a challenge for researchers that have developed methodologies, products and systems, having in mind the specificities of trade, the interaction particularities of commerce, the strict notion of commitment and contract, and the clearly shaped conventions and norms that structure the field. In this paper I survey some key areas for agent technology which, although general, are of special importance in electronic commerce, namely, solid development methodologies, negotiation technologies and trust-building mechanisms. I give examples of systems in which I have directly participated, although I also try to refer to the work of other AgentLink Special Interest Group members over the last few years.
-
Bilateral Bargaining in a One-to-Many Bargaining Setting by Enrico Gerding, Koye Somefun, Han La Poutre. AAMAS, 2004. Electronic markets are becoming increasingly transparent with low search cost, strong price competition, and low margins. Automated negotiation enables a business to go beyond price competition. Through the use of autonomous agents, which negotiate on behalf of their owners, a business can obtain flexibility in prices and goods, distinguish between groups of buyers based on their preferences, and even personalize complex goods according to the demands of individual buyers without significantly increasing transaction costs. We focus here on one-to-many bargaining, where a seller agent negotiates, on behalf of a seller, with many buyer agents individually in a bilateral fashion. In many cases, auctions can be used to effectively organize one-to-many bargaining. For various situations, however, auctions may not be the preferred protocol for bargainers. In situations of, for example, flexible or virtually unlimited supply, multiple issues, and/or continuous sale the appropriate auction protocol becomes, at best, much more complex. Consequently, businesses may opt for the intuitive and flexible bilateral bargaining protocol, where the seller agent negotiates bilaterally with one or more buyers simultaneously by exchanging offers and counter offers.
-
Automated Negotiation and Bundling of Information Goods by Koye Somefun, Enrico Gerding, Sander Bohte, Han La Poutre. LNAI, 2004. In this paper, we present a novel system for selling bundles of news items. Through the system, customers bargain with the seller over the price and quality of the delivered goods. The advantage of the developed system is that it allows for a high degree of flexibility in the price, quality, and content of the offered bundles. The price, quality, and content of the delivered goods may, for example, differ based on daily dynamics and personal interest of customers. Autonomous “software agents” execute the negotiation on behalf of the users of the system. To perform the actual negotiation these agents make use of bargaining strategies. We present the novel approach of decomposing bargaining strategies into concession strategies and Pareto efficient search strategies. Additionally, we introduce the orthogonal and orthogonal-DF strategy: two Pareto search strategies. We show through computer experiments that the use of these Pareto search strategies will result in very efficient bargaining outcomes. Moreover, the system is setup such that it is actually in the best interest of the customer to have their agent adhere to this approach of disentangling the bargaining strategy.
-
Self-Confirming Price Prediction for Bidding in Simultaneous Ascending Auctions by Anna Osepayshvili, Michael P. Wellman, Daniel M. Reeves, Jeffrey K. MacKie-Mason. UAI, 2005. Simultaneous ascending auctions present agents with the exposure problem: bidding to acquire a bundle risks the possibility of obtaining an undesired subset of the goods. Auction theory provides little guidance for dealing with this problem. We present a new family of decision-theoretic bidding strategies that use probabilistic predictions of final prices. We focus on self-confirming price distribution predictions, which by definition turn out to be correct when all agents bid decision-theoretically based on them. Bidding based on these is provably not optimal in general, but our experimental evidence indicates the strategy can be quite effective compared to other known methods.
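A rough sketch of the self-confirming idea under strong simplifying assumptions (single-unit demand, point predictions averaged over samples rather than full distributions): predict final prices, let every agent bid decision-theoretically against the prediction, simulate the auctions, and nudge the prediction towards the realised prices until it roughly reproduces itself. All valuation and damping parameters below are invented for illustration.

```python
# Iterate a point price prediction until it (approximately) reproduces the
# average prices that result when agents best-respond to it.
import random

def simulate_prices(prediction, n_agents, n_goods, rng):
    """One round: each agent bids its value on the single good with the
    largest predicted surplus; each price is the second-highest bid."""
    bids = [[] for _ in range(n_goods)]
    for _ in range(n_agents):
        values = [rng.uniform(0, 1) for _ in range(n_goods)]
        surplus = [v - p for v, p in zip(values, prediction)]
        g = max(range(n_goods), key=lambda i: surplus[i])
        if surplus[g] > 0:
            bids[g].append(values[g])
    prices = []
    for b in bids:
        b.sort()
        prices.append(b[-2] if len(b) >= 2 else 0.0)
    return prices

def self_confirming(n_agents=10, n_goods=3, rounds=200, damping=0.1, seed=0):
    rng = random.Random(seed)
    prediction = [0.5] * n_goods
    for _ in range(rounds):
        samples = 50
        realised = [0.0] * n_goods
        for _ in range(samples):
            p = simulate_prices(prediction, n_agents, n_goods, rng)
            realised = [r + x / samples for r, x in zip(realised, p)]
        prediction = [(1 - damping) * p + damping * r
                      for p, r in zip(prediction, realised)]
    return prediction

print(self_confirming())
```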
-
Learning the Structure of Utility Graphs Used in Multi-Issue Negotiation through Collaborative Filtering by Valentin Robu, Han La Poutre. PRIMA, 2005. Graphical utility models represent powerful formalisms for modeling complex agent decisions involving multiple issues. In the context of negotiation, it has been shown that using utility graphs enables reaching Pareto-efficient agreements with a limited number of negotiation steps, even for high-dimensional negotiations involving complex complementarity/substitutability dependencies between multiple issues. This paper considerably extends the results of Valentin et al., (2005), by proposing a method for constructing the utility graphs of buyers automatically, based on previous negotiation data. Our method is based on techniques inspired from item-based collaborative filtering, used in online recommendation algorithms. Experimental results show that our approach is able to retrieve the structure of utility graphs online, with a high degree of accuracy, even for highly non-linear settings and even if a relatively small amount of data about concluded negotiations is available.
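In the spirit of the item-based collaborative filtering mentioned here, a toy sketch that links two issues in the utility graph when they co-occur often in previously concluded deals. The cosine-style co-occurrence measure, the binary deal encoding, and the threshold are illustrative assumptions, not the paper's exact procedure.

```python
# Recover candidate utility-graph edges from past deals via an item-item
# co-occurrence (cosine-style) similarity over binary issue vectors.
from math import sqrt

def utility_graph_edges(deals, n_issues, threshold=0.4):
    """deals: list of binary vectors over issues from past negotiations."""
    edges = []
    for i in range(n_issues):
        for j in range(i + 1, n_issues):
            both = sum(d[i] and d[j] for d in deals)
            ni = sum(d[i] for d in deals)
            nj = sum(d[j] for d in deals)
            if ni and nj and both / sqrt(ni * nj) >= threshold:
                edges.append((i, j))
    return edges

deals = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
print(utility_graph_edges(deals, 4))
```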
-
Modeling Complex Multi-Issue Negotiations Using Utility Graphs by Valentin Robu, D.J.A. Somefun, J.A. La Poutre. AAMAS, 2005. This paper presents an agent strategy for complex bilateral negotiations over many issues with inter-dependent valuations. We use ideas inspired by graph theory and probabilistic influence networks to derive efficient heuristics for negotiations about multiple issues. Experimental results show, under relatively weak assumptions with respect to the structure of the utility functions, that the developed approach leads to Pareto-efficient outcomes. Moreover, Pareto-efficiency can be reached with few negotiation steps, because we explicitly model and utilize the underlying graphical structure of complex utility functions. Consequently, our approach is applicable to domains where reaching an efficient outcome in a limited amount of time is important. Furthermore, unlike other solutions for high-dimensional negotiations, the proposed approach does not require a mediator.
-
Exploring bidding strategies for market-based scheduling by Daniel M. Reeves, Michael P. Wellman, Jeffrey K. MacKie-Mason, Anna Osepayshvili. Decision Support Systems, 2005. A market-based scheduling mechanism allocates resources indexed by time to alternative uses based on the bids of participating agents. Agents are typically interested in multiple time slots of the schedulable resource, with value determined by the earliest deadline by which they can complete their corresponding tasks. Despite the strong complementarities among slots induced by such preferences, it is often infeasible to deploy a mechanism that coordinates allocation across all time slots. We explore the case of separate, simultaneous markets for individual time slots, and the strategic problem it poses for bidding agents. Investigation of the straightforward bidding policy and its variants indicates that the efficacy of particular strategies depends critically on preferences and strategies of other agents, and that the strategy space is far too complex to yield to general game-theoretic analysis. For particular environments, however, it is often possible to derive constrained equilibria through evolutionary search methods.
-
Efficient Methods for Automated Multi-Issue Negotiation: Negotiating over a Two-Part Tariff by D.J.A. Somefun, E.H. Gerding, J.A. La Poutre. International Journal of Intelligent Systems, 2006. In this article, we consider the novel approach of a seller and customer negotiating bilaterally about a two-part tariff, using autonomous software agents. An advantage of this approach is that win–win opportunities can be generated while keeping the problem of preference elicitation as simple as possible. We develop bargaining strategies that software agents can use to conduct the actual bilateral negotiation on behalf of their owners. We present a decomposition of bargaining strategies into concession strategies and Pareto-efficient-search methods: concession and Pareto-search strategies focus on the conceding and win–win aspects of bargaining, respectively. An important technical contribution of this article lies in the development of two Pareto-search methods. Computer experiments show, for various concession strategies, that the respective use of these two Pareto-search methods by the two negotiators results in very efficient bargaining outcomes while negotiators concede the amount specified by their concession strategy.
-
A Heuristic Bidding Strategy for Buying Multiple Goods in Multiple English Auctions by Minghua He, Nicholas R. Jennings, Adam Prugel-Bennett. ACM Transactions on Internet Technology, 2006. This paper presents the design, implementation, and evaluation of a novel bidding algorithm that a software agent can use to obtain multiple goods from multiple overlapping English auctions. Specifically, an Earliest Closest First heuristic algorithm is proposed that uses neurofuzzy techniques to predict the expected closing prices of the auctions and to adapt the agent’s bidding strategy to reflect the type of environment in which it is situated. This algorithm first identifies the set of auctions that are most likely to give the agent the best return and then, according to its attitude to risk, it bids in some other auctions that have approximately similar expected returns, but which finish earlier than those in the best return set. We show through empirical evaluation against a number of methods proposed in the multiple auction literature that our bidding strategy performs effectively and robustly in a wide range of scenarios.
-
The winner determination problem by Lehmann D, Müller R, Sandholm T. Combinatorial auctions, 2006. This part of the book gives a comprehensive overview of the computational challenges in solving the winner determination problem (WDP): given a set of bids in a combinatorial auction, find an allocation of items to bidders (the auctioneer can keep some of the items) that maximizes the auctioneer’s revenue. The bids are expressions in a bidding language, by which bidders report valuations for subsets of items (see Nisan (Chapter 9)). The auctioneer’s revenue is maximized by choosing an allocation that maximizes the sum, over all bidders, of the bidders’ valuations for the subset of items that they receive.
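The objective itself is easy to state in code. Below is a minimal brute-force sketch (not from the book chapter) that enumerates sets of disjoint bundle bids and keeps the revenue-maximizing allocation; the bid format and example values are illustrative assumptions, and practical solvers rely on branch-and-bound or integer programming rather than enumeration.

```python
from itertools import combinations

def winner_determination(bids):
    """Brute-force WDP: choose a set of non-conflicting bids maximizing revenue.

    bids: list of (bundle, price) pairs, where bundle is a frozenset of items.
    Exponential in the number of bids; the WDP is NP-hard in general.
    """
    best_value, best_alloc = 0, []
    for k in range(len(bids) + 1):
        for subset in combinations(bids, k):
            items = [item for bundle, _ in subset for item in bundle]
            if len(items) == len(set(items)):            # accepted bundles must be disjoint
                value = sum(price for _, price in subset)
                if value > best_value:
                    best_value, best_alloc = value, list(subset)
    return best_value, best_alloc

# Illustrative example over items {a, b, c}: best revenue is 9 ({a} plus {b, c}).
bids = [(frozenset({"a", "b"}), 5), (frozenset({"b", "c"}), 6), (frozenset({"a"}), 3)]
print(winner_determination(bids))
```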
-

Bounded Rationality

On complexity as bounded rationality by C.H. Papadimitriou, M. Yannakakis. STOC, 1994. It has been hoped that computational approaches can help resolve some well-known paradoxes in game theory. We prove that if the repeated prisoner’s dilemma is played by finite automata with less than exponentially (in the number of rounds) many states, then cooperation can be achieved as an equilibrium (while with exponentially many states, defection is the only equilibrium). We furthermore prove a generalization to arbitrary games and Pareto optimal points. Finally, we present a general model of polynomially computable games, and characterize in terms of familiar complexity classes ranging from NP to NEXP the natural problems that arise in relation with such games.
-
Inductive Reasoning, Bounded Rationality and the Bar Problem by W. Brian Arthur. American Economic Review, 1994. This paper draws on modern psychology to argue that as humans, in economic decision contexts that are complicated or ill-defined, we use not deductive, but inductive reasoning. That is, in such contexts we induce a variety of working hypotheses or mental models, act upon the most credible, and replace hypotheses with new ones if they cease to work. Inductive reasoning leads to a rich psychological world in which an agent's hypotheses or mental models compete for survival against each other, in an environment formed by other agents' hypotheses or mental models, a world that is both evolutionary and complex. Inductive reasoning can be modeled in a variety of ways. The main body of the paper introduces and models a coordination problem, "the bar problem", in which agents' expectations are forced to be subjective and to differ. It shows that while agents' beliefs never settle down, collectively they form an "ecology" that does converge to an equilibrium pattern.
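The inductive-reasoning dynamic is straightforward to simulate. The sketch below is a hedged toy version (not Arthur's original implementation): each of 100 agents holds a small pool of invented attendance predictors, scores them by past accuracy, and acts on whichever predictor currently scores best; mean attendance then tends to self-organize around the bar's capacity.

```python
import random

N, CAPACITY, WEEKS = 100, 60, 500
history = [random.randint(0, N) for _ in range(5)]   # seed attendance history

def make_predictor():
    """A hypothetical predictor: noisy weighted average of recent attendance."""
    w = [random.random() for _ in range(3)]
    s = sum(w)
    bias = random.randint(-10, 10)
    return lambda h: sum(wi * a for wi, a in zip(w, h[-3:])) / s + bias

# Each agent holds a small ecology of competing predictors and scores them.
agents = [[make_predictor() for _ in range(5)] for _ in range(N)]
scores = [[0.0] * 5 for _ in range(N)]

for week in range(WEEKS):
    best = [max(range(5), key=lambda k: scores[i][k]) for i in range(N)]
    predictions = [agents[i][best[i]](history) for i in range(N)]
    attendance = sum(1 for p in predictions if p < CAPACITY)   # go if predicted uncrowded
    for i in range(N):
        for k in range(5):
            # Penalize each predictor by its absolute error this week.
            scores[i][k] -= abs(agents[i][k](history) - attendance)
    history.append(attendance)

print(sum(history[-50:]) / 50)   # mean attendance tends to hover near CAPACITY
```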
-
Modeling Bounded Rationality by Ariel Rubinstein. MIT Press, 1998. The notion of bounded rationality was initiated in the 1950s by Herbert Simon; only recently has it influenced mainstream economics. In this book, Ariel Rubinstein defines models of bounded rationality as those in which elements of the process of choice are explicitly embedded. The book focuses on the challenges of modeling bounded rationality, rather than on substantial economic implications. In the first part of the book, the author considers the modeling of choice. After discussing some psychological findings, he proceeds to the modeling of procedural rationality, knowledge, memory, the choice of what to know, and group decisions. In the second part, he discusses the fundamental difficulties of modeling bounded rationality in games. He begins with the modeling of a game with procedural rational players and then surveys repeated games with complexity considerations. He ends with a discussion of computability constraints in games. The final chapter includes a critique by Herbert Simon of the author's methodology and the author's response.
-
On the impossibility of predicting the behavior of rational agents by Dean P. Foster, H. Peyton Young. PNAS, 2001. A foundational assumption in economics is that people are rational: they choose optimal plans of action given their predictions about future states of the world. In games of strategy this means that each player’s strategy should be optimal given his or her prediction of the opponents’ strategies. We demonstrate that there is an inherent tension between rationality and prediction when players are uncertain about their opponents’ payoff functions. Specifically, there are games in which it is impossible for perfectly rational players to learn to predict the future behavior of their opponents (even approximately) no matter what learning rule they use. The reason is that in trying to predict the next-period behavior of an opponent, a rational player must take an action this period that the opponent can observe. This observation may cause the opponent to alter his next-period behavior, thus invalidating the first player’s prediction. The resulting feedback loop has the property that, a positive fraction of the time, the predicted probability of some action next period differs substantially from the actual probability with which the action is going to occur. We conclude that there are strategic situations in which it is impossible in principle for perfectly rational agents to learn to predict the future behavior of other perfectly rational agents based solely on their observed actions.
-

Collective Intelligence

Knowledge and common knowledge in a distributed environment by Halpern JY, Moses Y. Journal of the ACM (JACM), 1990. Reasoning about knowledge seems to play a fundamental role in distributed systems. Indeed, such reasoning is a central part of the informal intuitive arguments used in the design of distributed protocols. Communication in a distributed system can be viewed as the act of transforming the system’s state of knowledge. This paper presents a general framework for formalizing and reasoning about knowledge in distributed systems. It is shown that states of knowledge of groups of processors are useful concepts for the design and analysis of distributed protocols. In particular, distributed knowledge corresponds to knowledge that is “distributed” among the members of the group, while common knowledge corresponds to a fact being “publicly known.” The relationship between common knowledge and a variety of desirable actions in a distributed system is illustrated. Furthermore, it is shown that, formally speaking, in practical systems common knowledge cannot be attained. A number of weaker variants of common knowledge that are attainable in many cases of interest are introduced and investigated.
-
Using Collective Intelligence to Route Internet Traffic by David H. Wolpert, Kagan Tumer, Jeremy Frank. NeurIPS, 1998. A COllective INtelligence (COIN) is a set of interacting reinforcement learning (RL) algorithms designed in an automated fashion so that their collective behavior optimizes a global utility function. We summarize the theory of COINs, then present experiments using that theory to design COINs to control internet traffic routing. These experiments indicate that COINs outperform all previously investigated RL-based, shortest path routing algorithms.
-
General Principles of Learning-Based Multi-Agent Systems by David H. Wolpert, Kevin R. Wheeler, Kagan Tumer. International Conference on Autonomous Agents, 1999. We consider the problem of how to design large decentralized multi-agent systems (MAS’s) in an automated fashion, with little or no hand-tuning. Our approach has each agent run a reinforcement learning algorithm. This converts the problem into one of how to automatically set/update the reward functions for each of the agents so that the global goal is achieved. In particular we do not want the agents to “work at cross-purposes” as far as the global goal is concerned. We use the term artificial COllective INtelligence (COIN) to refer to systems that embody solutions to this problem. In this paper we present a summary of a mathematical framework for COINs. We then investigate the real-world applicability of the core concepts of that framework via two computer experiments: we show that our COINs perform near optimally in a difficult variant of Arthur’s bar problem [1] (and in particular avoid the tragedy of the commons for that problem), and we also illustrate optimal performance for our COINs in the leader-follower problem.
-
Optimal Payoff Functions for Members of Collectives by David H. Wolpert, Kagan Tumer. Advances in Complex Systems, 2001. We consider the problem of designing (perhaps massively distributed) collectives of computational processes to maximize a provided "world utility" function. We consider this problem when the behavior of each process in the collective can be cast as striving to maximize its own payoff utility function. For such cases the central design issue is how to initialize/update those payoff utility functions of the individual processes so as to induce behavior of the entire collective having good values of the world utility. Traditional "team game" approaches to this problem simply assign to each process the world utility as its payoff utility function. In previous work we used the "Collective Intelligence" (COIN) framework to derive a better choice of payoff utility functions, one that results in world utility performance up to orders of magnitude superior to that ensuing from the use of the team game utility. In this paper, we extend these results using a novel mathematical framework. Under that new framework we review the derivation of the general class of payoff utility functions that both (i) are easy for the individual processes to try to maximize, and (ii) have the property that if good values of them are achieved, then we are assured a high value of world utility. These are the "Aristocrat Utility" and a new variant of the "Wonderful Life Utility" that was introduced in the previous COIN work. We demonstrate experimentally that using these new utility functions can result in significantly improved performance over that of previously investigated COIN payoff utilities, over and above those previous utilities' superiority to the conventional team game utility. These results also illustrate the substantial superiority of these payoff functions to perhaps the most natural version of the economics technique of "endogenizing externalities."
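The contrast between a team-game payoff and a difference payoff can be shown in a few lines. The sketch below is a hedged toy example rather than the paper's formulation: the world utility is an invented congestion-style function, and the Wonderful-Life-style reward "clamps" an agent's action by simply removing it and taking the difference from the global utility.

```python
def world_utility(actions, capacity=3):
    """Toy world utility: each chosen resource yields value only up to its capacity."""
    counts = {}
    for a in actions:
        counts[a] = counts.get(a, 0) + 1
    return sum(min(c, capacity) for c in counts.values())

def team_game_reward(actions, i):
    # Every agent simply receives the global utility.
    return world_utility(actions)

def wonderful_life_reward(actions, i):
    # Difference reward: global utility minus the utility with agent i's
    # action clamped to a null action (here: removed from the system).
    counterfactual = actions[:i] + actions[i + 1:]
    return world_utility(actions) - world_utility(counterfactual)

actions = ["r1", "r1", "r1", "r1", "r2"]      # four agents crowd resource r1
print(team_game_reward(actions, 0))            # 4: same noisy signal for every agent
print(wonderful_life_reward(actions, 0))       # 0: agent 0 adds nothing at the margin
```

The point of the comparison is that the difference reward isolates an agent's marginal contribution to the world utility, which is what makes it easier for each process to learn from than the raw team-game signal.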
-
Phe-Q: A Pheromone Based Q-Learning by Ndedi Monekosso, Paolo Remagnino. LNAI, 2001. Biological systems have often provided inspiration for the design of artificial systems. One such example of a natural system that has inspired researchers is the ant colony. In this paper an algorithm for multi-agent reinforcement learning, a modified Q-learning, is proposed. The algorithm is inspired by the natural behaviour of ants, which deposit pheromones in the environment to communicate. The benefit besides simulating ant behaviour in a colony is to design complex multiagent systems. Complex behaviour can emerge from relatively simple interacting agents. The proposed Q-learning update equation includes a belief factor. The belief factor reflects the confidence the agent has in the pheromone detected in its environment. Agents communicate implicitly to co-operate in learning to solve a path-planning problem. The results indicate that combining synthetic pheromone with standard Q-learning speeds up the learning process. It will be shown that the agents can be biased towards a preferred solution by adjusting the pheromone deposit and evaporation rates.
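The abstract does not reproduce the exact update equation, so the following is only a hedged sketch of how a pheromone belief term might be folded into a standard Q-learning update; the environment stub, the form of the bonus, and all parameter values are assumptions made purely for illustration.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, XI = 0.1, 0.95, 0.3       # XI weights the pheromone belief term (assumed)
EVAPORATION, DEPOSIT = 0.05, 1.0

Q = defaultdict(float)                   # Q[(state, action)]
pheromone = defaultdict(float)           # synthetic pheromone per state

def phe_q_update(s, a, r, s_next, actions):
    """Q-learning update with an added pheromone belief bonus (assumed form)."""
    belief = XI * pheromone[s_next]      # confidence-weighted pheromone signal
    best_next = max(Q[(s_next, b)] for b in actions)
    target = r + GAMMA * (best_next + belief)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

def step_environment(s, a):
    """Hypothetical environment stub; replace with the real path-planning task."""
    s_next = (s + a) % 10
    return s_next, 1.0 if s_next == 0 else 0.0

actions = [0, 1]
s = 0
for t in range(1000):
    if random.random() < 0.1:
        a = random.choice(actions)                       # epsilon-greedy exploration
    else:
        a = max(actions, key=lambda b: Q[(s, b)])
    s_next, r = step_environment(s, a)
    pheromone[s] = (1 - EVAPORATION) * pheromone[s] + DEPOSIT * (r > 0)  # deposit on success
    phe_q_update(s, a, r, s_next, actions)
    s = s_next
```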
-
Swarm Intelligence by Eric Bonabeau. Emergent technology conference, 2003. Ants, Bees or Termites - all social insects - show impressive collective problem-solving capabilities. Properties associated with their group behaviour like self-organisation, robustness and flexibility are seen as characteristics that artificial systems for optimisation, control or task execution should exhibit. In the last decade, diverse efforts have been made to take social insects as an example and develop algorithms inspired by their strictly self-organised behaviour. These approaches can be subsumed under the concept of "Swarm Intelligence".
-
Multi-type ACO for Light Path Protection by Peter Vrancx, Ann Nowe, Kris Steenhaut. LAMAS, 2005. Backup trees (BTs) are a promising approach to network protection in optical networks. BTs allow us to protect a group of working paths against single network failures, while reserving only a minimum amount of network capacity for backup purposes. The process of constructing a set of working paths together with a backup tree is computationally very expensive, however. In this paper we propose a multi-agent approach based on ant colony optimization (ACO) for solving this problem. ACO algorithms use a set of relatively simple agents that model the behavior of real ants. In our algorithm multiple types of ants are used. Ants of the same type collaborate, but are in competition with the ants of other types. The idea is to let each type find a path in the network that is disjoint with that of other types. We also demonstrate a preliminary version of this algorithm in a series of simple experiments.
-
A Primer on Multiple Intelligences by Matthew N. O. Sadiku, Sarhan M. Musa. Springer, 2021. The concept of intelligence is interesting and is discussed every day. It has been central to the field of psychology and remains hotly debated. It was first introduced by Francis Galton in 1885. Intelligence or cognitive development is a biopsychological potential to process information that can be activated to solve problems. It is the capacity of the individual to act purposefully, to think rationally, and to deal effectively with his environment. The characteristic of intelligence is usually attributed to humans. Traditionally, intelligence is often regarded as a person’s intellectual capacity; something one is born with and that cannot be changed. It is the ability to learn, to profit from experience, the ability to adapt to new situations, and the ability to solve problems. It was generally believed that intelligence was a single entity that was inherited. It is fixed and can be measured. Historically, intelligence has been measured using the IQ test, which is a general measure of one’s cognitive function. Other views of intelligence have emerged in recent years. Today, an increasing number of researchers believe that there exist multiple intelligences, quite independent of each other. People have a unique blend of intelligences. The concept of multiple intelligences represents an effort to reframe the traditional conception of intelligence. Professor Howard Gardner at Harvard’s Graduate School of Education argued that there are better or alternative ways to measure intelligence than standard IQ tests. He first proposed nine intelligences and insisted that all people are born with one or more intelligences. He believed human intelligence was multidimensional. This spectrum of intelligence indicates that people can be smart in a number of different ways. It implies that we are all intelligent in different ways, but there is always a primary, or more dominant, intelligence in each of us. Multiple intelligences theory states that everyone has all intelligences at varying degrees of proficiency, while most will experience more dominant intelligences that impact the way they learn and interact with others. Each type of intelligence presents a distinct component of our total competence. There is no clear consensus about how many intelligences there are. In this book, we consider 19 multiple intelligences. The intelligences are possessed by everyone and vary in degree of development within each individual.
-

Communication

Learning in Distributed Systems and Multiagent Environments by P. Brazdil, M. Gams, S. Sian, L. Torgo, W. van de Velde. LNAI, 1991. The paper begins with the discussion on why we should be concerned with machine learning in the context of distributed AI. The rest of the paper is dedicated to various problems of multiagent learning. First, a common framework for comparing different existing systems is presented. It is pointed out that it is useful to distinguish when the individual agents communicate. Some systems communicate during the learning phase, others during the problem solving phase, for example. It is also important to consider how, that is in what language, the communication is established. The paper analyses several systems in this framework. Particular attention is paid to previous work done by the authors in this area. The paper covers use of redundant knowledge, knowledge integration, evaluation of hypothesis by a community of agents and resolution of language differences between agents.
-
Altruism in the Evolution of Communication by David H. Ackley, Michael L. Littman. Workshop on the Synthesis and Simulation of Living Systems, 1994. Computer models of evolutionary phenomena often assume that the fitness of an individual can be evaluated in isolation, but effective communication requires that individuals interact. Existing models directly reward speakers for improved behavior on the part of the listeners so that, essentially, effective communication is fitness. We present new models in which, even though "speaking truthfully" provides no tangible benefit to the speaker, effective communication nonetheless evolves. A large population is spatially distributed so that "communication range" approximately correlates with "breeding range" so that most of the time "you'll be talking to family", allowing kin selection to encourage the emergence of communication. However, the emergence of altruistic communication also creates niches that can be exploited by "information parasites." The new models display complex and subtle long-term dynamics as the global implications of such social dilemmas are played out.
-
A self-organizing spatial vocabulary by L. Steels. Artificial Life, 1995. Language is a shared set of conventions for mapping meanings to utterances. This paper explores self-organization as the primary mechanism for the formation of a vocabulary. It reports on a computational experiment in which a group of distributed agents develop ways to identify each other using names or spatial descriptions. It is also shown that the proposed mechanism copes with the acquisition of an existing vocabulary by new agents entering the community and with an expansion of the set of meanings.
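A minimal naming-game-style simulation in the spirit of these vocabulary experiments is sketched below. The specific protocol (random speaker/hearer pairs, word invention on failure, dropping competing synonyms on success) is a commonly used simplified variant and an assumption here, not necessarily the exact mechanism of the paper.

```python
import random
import string

N_AGENTS, OBJECTS, GAMES = 20, ["obj%d" % i for i in range(5)], 20000
lexicon = [dict() for _ in range(N_AGENTS)]   # agent -> {object: set of words}

def invent_word():
    return "".join(random.choice(string.ascii_lowercase) for _ in range(4))

for game in range(GAMES):
    speaker, hearer = random.sample(range(N_AGENTS), 2)
    topic = random.choice(OBJECTS)
    words = lexicon[speaker].setdefault(topic, set())
    if not words:
        words.add(invent_word())            # speaker invents a name if it has none
    word = random.choice(sorted(words))
    if word in lexicon[hearer].get(topic, set()):
        # Success: both agents drop competing synonyms (lateral inhibition).
        lexicon[speaker][topic] = {word}
        lexicon[hearer][topic] = {word}
    else:
        lexicon[hearer].setdefault(topic, set()).add(word)   # hearer adopts the word

# After many games the population typically converges on one word per object.
print({obj: lexicon[0].get(obj, set()) for obj in OBJECTS})
```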
-
Self-organising vocabularies by L. Steels. In Proceedings of Artificial Life, 1996. The paper investigates a mechanism by which distributed agents spontaneously and autonomously develop a common vocabulary. The vocabulary is open in the sense that new agents and new meaning may be added at any time. Self-organisation plays a critical role for achieving coherence.
-
Emergent adaptive lexicons by L. Steels. In P. Maes, editor, Proceedings of the Simulation of Adaptive Behavior Conference. MIT Press, 1996. The paper reports experiments to test the hypothesis that language is an autonomous evolving adaptive system maintained by a group of distributed agents without central control. The experiments show how a coherent lexicon may spontaneously emerge in a group of agents engaged in language games and how a lexicon may adapt to cope with new meanings that arise or new agents that enter the group. The lexicon has several characteristics of natural language lexicons, such as polysemy, synonymy and ambiguity.
-
Synthesising the origins of language and meaning using co-evolution, self-organisation and level formation. by L. Steels. Edinburgh University Press, 1997. The paper reports on experiments in which robotic agents and software agents are set up to originate language and meaning. The experiments test the hypothesis that mechanisms for generating complexity commonly found in biosystems, in particular self-organisation, co-evolution, and level formation, also may explain the spontaneous formation, adaptation, and growth in complexity of language.
-
Collective learning and semiotic dynamics by L. Steels. Artificial Life, 1999. We report on a case study in the emergence of a lexicon in a group of autonomous distributed agents situated and grounded in an open environment. Because the agents are autonomous, grounded, and situated, the possible words and possible meanings are not fixed but continuously change as the agents autonomously evolve their communication system and adapt it to novel situations. The case study shows that a complex semiotic dynamics unfolds and that generalisations present in the language are due to processes outside the agent.
-
The puzzle of language evolution by L. Steels. Kognitionswissenschaft, 2000. Linguistics must again concentrate on the evolutionary nature of language, so that language models are more realistic with respect to human natural languages and have a greater explanatory force. Multi-agent systems are proposed as a possible route to develop such evolutionary models and an example is given of a concrete experiment in the origins and evolution of word-meaning based on a multi-agent approach.
-
Evolving Communication without Dedicated Communication Channels by Matt Quinn. ECAL, 2001. Artificial Life models have consistently implemented communication as an exchange of signals over dedicated and functionally isolated channels. I argue that such a feature prevents models from providing a satisfactory account of the origins of communication and present a model in which there are no dedicated channels. Agents controlled by neural networks and equipped with proximity sensors and wheels are presented with a co-ordinated movement task. It is observed that functional, but non-communicative, behaviours which evolve in the early stages of the simulation both make possible, and form the basis of, the communicative behaviour which subsequently evolves.
-
Communication requirements of VCG-like mechanisms in convex environments by Johari R, Tsitsiklis JN. In Proceedings of Allerton Conference, 2005. We develop VCG-like mechanisms for resource allocation environments where the players have concave utility functions, and the resource constraints can be represented through convex inequality constraints; multicommodity flow problems are a prime example. Unlike VCG mechanisms that require each player to communicate an entire utility function, our mechanisms only require each player to communicate a single scalar quantity. Despite the limited communication, we establish the existence of an efficient Nash equilibrium. Under some further assumptions, we also establish that all Nash equilibria are efficient. Our approach defines an entire family of resource allocation mechanisms; as a special case, we recover the class of mechanisms recently introduced by Yang and Hajek for a single resource.
-

Convergent Learning

Equilibrium points of bimatrix games by Lemke CE, Howson, Jr JT. Journal of the Society for Industrial and Applied Mathematics, 1964. An algebraic proof is given of the existence of equilibrium points for bimatrix (or two-person, non-zero-sum) games. The proof is constructive, leading to an efficient scheme for computing an equilibrium point. In a nondegenerate case, the number of equilibrium points is finite and odd. The proof is valid for any ordered field.
-
The equivalence of strong positive association and strategy-proofness by Muller E, Satterthwaite MA. Journal of Economic Theory, 1977. Consider a group that must select one alternative from a set of three or more alternatives. Each member casts a ballot that the voting procedure counts. For a given alternative x, let two ballot profiles C and D have the property that if a member ranks alternative x above alternative y within C, then he also ranks x above y within D. Strong positive association requires that if the voting procedure selects x when the profile is C, then it must also select x when the profile is D. We prove that strong positive association is equivalent to strategy-proofness. It therefore follows that no voting procedure exists that satisfies strong positive association, nondictatorship, and citizens’ sovereignty.
-
Sequential equilibria by Kreps DM, Wilson R. Econometrica: Journal of the Econometric Society, 1982. We propose a new criterion for equilibria of extensive games, in the spirit of Selten's perfectness criteria. This criterion requires that players' strategies be sequentially rational: every decision must be part of an optimal strategy for the remainder of the game. This entails specification of players' beliefs concerning how the game has evolved for each information set, including information sets off the equilibrium path. The properties of sequential equilibria are developed; in particular, we study the topological structure of the set of sequential equilibria. The connections with Selten's trembling-hand perfect equilibria are given.
-
On total functions, existence theorems and computational complexity by Megiddo N, Papadimitriou CH. Theoretical Computer Science, 1991. Nondeterministic multivalued functions with values that are polynomially verifiable and guaranteed to exist form an interesting complexity class between P and NP. We show that this class, which we call TFNP, contains a host of important problems, whose membership in P is currently not known. These include, besides factoring, local optimization, Brouwer's fixed points, a computational version of Sperner's Lemma, bimatrix equilibria in games, and linear complementarity for P-matrices.
-
Learning Mixed Equilibria by Drew Fudenberg, David M. Kreps. Games and Economic Behavior, 1993. We study learning processes for finite strategic-form games, in which players use the history of past play to forecast play in the current period. In a generalization of fictitious play, we assume only that players asymptotically choose best responses to the historical frequencies of opponents' past play. This implies that if the stage-game strategies converge, the limit is a Nash equilibrium. In the basic model, play seems unlikely to converge to a mixed-strategy equilibrium, but such convergence is natural when the stage game is perturbed in the manner of Harsanyi's purification theorem.
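For reference, classical fictitious play, of which this paper studies a generalization, can be sketched in a few lines: each player tracks the empirical frequencies of the opponent's past actions and best-responds to them. The game (matching pennies) and the initial counts below are illustrative choices, not taken from the paper.

```python
import numpy as np

# Matching pennies: a game whose unique equilibrium is mixed (0.5, 0.5).
A = np.array([[1, -1], [-1, 1]])    # row player's payoffs
B = -A                               # column player's payoffs (zero-sum here)

counts_row = np.ones(2)              # empirical counts of the row player's actions
counts_col = np.ones(2)              # empirical counts of the column player's actions

for t in range(10000):
    # Each player best-responds to the opponent's historical frequencies.
    row_action = int(np.argmax(A @ (counts_col / counts_col.sum())))
    col_action = int(np.argmax((counts_row / counts_row.sum()) @ B))
    counts_row[row_action] += 1
    counts_col[col_action] += 1

# The empirical frequencies approach the mixed Nash equilibrium even though
# the period-by-period actions themselves keep cycling.
print(counts_row / counts_row.sum(), counts_col / counts_col.sum())
```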
-
Rational learning leads to Nash equilibrium by Ehud Kalai, Ehud Lehrer. Econometrica, 1993. Each of n players, in an infinitely repeated game, starts with subjective beliefs about his opponents' strategies. If the individual beliefs are compatible with the true strategies chosen, then Bayesian updating will lead in the long run to accurate prediction of the future play of the game. It follows that individual players, who know their own payoff matrices and choose strategies to maximize their expected utility, must eventually play according to a Nash equilibrium of the repeated game. An immediate corollary is that, when playing a Harsanyi-Nash equilibrium of a repeated game of incomplete information about opponents' payoff matrices, players will eventually play a Nash equilibrium of the real game, as if they had complete information.
-
A generalized reinforcement-learning model: Convergence and applications by Michael L. Littman, C. Szepesvari. ICML, 1996. Reinforcement learning is the process by which an autonomous agent uses its experience interacting with an environment to improve its behavior. The Markov decision process (MDP) model is a popular way of formalizing the reinforcement learning problem, but it is by no means the only way. In this paper we show how many of the important theoretical results concerning reinforcement learning in MDPs extend to a generalized MDP model that includes MDPs, two-player games, and MDPs under a worst-case optimality criterion as special cases. The basis of this extension is a stochastic approximation theorem that reduces asynchronous convergence to synchronous convergence.
-
Computation of equilibria in finite games by McKelvey RD, McLennan A. Handbook of computational economics, 1996. We review the current state of the art of methods for numerical computation of Nash equilibria for finite n-person games. Classical path following methods, such as the Lemke-Howson algorithm for two person games, and Scarf-type fixed point algorithms for n-person games provide globally convergent methods for finding a sample equilibrium. For large problems, methods which are not globally convergent, such as sequential linear complementarity methods, may be preferred on the grounds of speed. None of these methods are capable of characterizing the entire set of Nash equilibria. More computationally intensive methods, which derive from the theory of semi-algebraic sets are required for finding all equilibria. These methods can also be applied to compute various equilibrium refinements.
-
Charging and rate control for elastic traffic by Kelly F. European transactions on Telecommunications, 1997. This paper addresses the issues of charging, rate control and routing for a communication network carrying elastic traffic, such as an ATM network offering an available bit rate service. A model is described from which max-min fairness of rates emerges as a limiting special case; more generally, the charges users are prepared to pay influence their allocated rates. In the preferred version of the model, a user chooses the charge per unit time that the user will pay; thereafter the user’s rate is determined by the network according to a proportional fairness criterion applied to the rate per unit charge. A system optimum is achieved when users’ choices of charges and the network’s choice of allocated rates are in equilibrium.
-
The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems by Caroline Claus, Craig Boutilier. AAAI, 1998. Reinforcement learning can provide a robust and natural means for agents to learn how to coordinate their action choices in multiagent systems. We examine some of the factors that can influence the dynamics of the learning process in such a setting. We first distinguish reinforcement learners that are unaware of (or ignore) the presence of other agents from those that explicitly attempt to learn the value of joint actions and the strategies of their counterparts. We study (a simple form of) Q-learning in cooperative multiagent systems under these two perspectives, focusing on the influence of game structure and exploration strategies on convergence to (optimal and suboptimal) Nash equilibria. We then propose alternative optimistic exploration strategies that increase the likelihood of convergence to an optimal equilibrium.
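A hedged sketch of the "independent learners" case is given below: two Q-learners share a team reward but each learns values only over its own actions. The payoff matrix is in the style of the climbing game used in this literature, and the exploration schedule and learning rate are arbitrary illustrative choices.

```python
import random

# A cooperative climbing-style game: both agents receive the same payoff.
# The optimal joint action is (0, 0), but the -30 miscoordination penalties
# drag its Q-values down during exploration, so independent learners
# frequently fail to coordinate on it.
PAYOFF = [[11, -30, 0],
          [-30,  7, 6],
          [  0,  0, 5]]

Q1, Q2 = [0.0] * 3, [0.0] * 3      # each agent learns values for its own actions only
ALPHA = 0.1

def epsilon_greedy(q, eps):
    if random.random() < eps:
        return random.randrange(len(q))
    return max(range(len(q)), key=lambda a: q[a])

for t in range(20000):
    eps = max(0.05, 1.0 - t / 5000.0)          # decaying exploration
    a1 = epsilon_greedy(Q1, eps)
    a2 = epsilon_greedy(Q2, eps)
    r = PAYOFF[a1][a2]                          # shared team reward
    Q1[a1] += ALPHA * (r - Q1[a1])              # each learner treats the other agent
    Q2[a2] += ALPHA * (r - Q2[a2])              # as part of a stationary environment

print(Q1, Q2)   # the greedy joint action often ends up at a suboptimal equilibrium
```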
-
Conjectural Equilibrium in Multiagent Learning by Michael P. Wellman, Junling Hu. Machine Learning, 1998. Learning in a multiagent environment is complicated by the fact that as other agents learn, the environment effectively changes. Moreover, other agents’ actions are often not directly observable, and the actions taken by the learning agent can strongly bias which range of behaviors are encountered. We define the concept of a conjectural equilibrium, where all agents’ expectations are realized, and each agent responds optimally to its expectations. We present a generic multiagent exchange situation, in which competitive behavior constitutes a conjectural equilibrium. We then introduce an agent that executes a more sophisticated strategic learning strategy, building a model of the response of other agents. We find that the system reliably converges to a conjectural equilibrium, but that the final result achieved is highly sensitive to initial belief. In essence, the strategic learner’s actions tend to fulfill its expectations. Depending on the starting point, the agent may be better or worse off than had it not attempted to learn a model of the other agents at all.
-
Predicting how people play games: reinforcement learning in experimental games with unique, mixed strategy equilibria by Ido Erev, Alvin E. Roth. The American Economic Review, 1998. We examine learning in all experiments we could locate involving 100 periods or more of games with a unique equilibrium in mixed strategies, and in a new experiment. We study both the ex post ("best fit") descriptive power of learning models, and their ex ante predictive power, by simulating each experiment using parameters estimated from the other experiments. Even a one-parameter reinforcement learning model robustly outperforms the equilibrium predictions. Predictive power is improved by adding "forgetting" and "experimentation," or by allowing greater rationality as in probabilistic fictitious play. Implications for developing a low-rationality, cognitive game theory are discussed.
-
Worst-case equilibria by Koutsoupias E, Papadimitriou C. In Annual symposium on theoretical aspects of computer science, 1999. In a system in which noncooperative agents share a common resource, we propose the ratio between the worst possible Nash equilibrium and the social optimum as a measure of the effectiveness of the system. Deriving upper and lower bounds for this ratio in a model in which several agents share a very simple network leads to some interesting mathematics, results, and open problems.
-
Nash Convergence of Gradient Dynamics in General-Sum Games by Satinder Singh, Michael Kearns, Yishay Mansour. UAI, 2000. Multi-agent games are becoming an increasingly prevalent formalism for the study of electronic commerce and auctions. The speed at which transactions can take place and the growing complexity of electronic marketplaces makes the study of computationally simple agents an appealing direction. In this work, we analyze the behavior of agents that incrementally adapt their strategy through gradient ascent on expected payoff, in the simple setting of two-player, two-action, iterated general-sum games, and present a surprising result. We show that either the agents will converge to a Nash equilibrium, or if the strategies themselves do not converge, then their average payoffs will nevertheless converge to the payoffs of a Nash equilibrium.
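The gradient dynamics analyzed here admit a compact simulation. The sketch below runs a finite-step-size stand-in for infinitesimal gradient ascent in a 2x2 game; the payoff matrices, step size, and starting point are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

# Payoff matrices for a 2x2 general-sum game (here: matching pennies).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])   # row player
B = -A                                      # column player

def grad_row(p, q):
    # d/dp of the row player's expected payoff, with p = P(row plays action 0).
    return q * (A[0, 0] - A[0, 1] - A[1, 0] + A[1, 1]) + (A[0, 1] - A[1, 1])

def grad_col(p, q):
    # d/dq of the column player's expected payoff, with q = P(col plays action 0).
    return p * (B[0, 0] - B[0, 1] - B[1, 0] + B[1, 1]) + (B[1, 0] - B[1, 1])

p, q, eta = 0.8, 0.3, 0.01
avg_payoff = 0.0
for t in range(1, 200001):
    gp, gq = grad_row(p, q), grad_col(p, q)
    p = min(1.0, max(0.0, p + eta * gp))    # project back onto [0, 1]
    q = min(1.0, max(0.0, q + eta * gq))
    payoff = (p * q * A[0, 0] + p * (1 - q) * A[0, 1]
              + (1 - p) * q * A[1, 0] + (1 - p) * (1 - q) * A[1, 1])
    avg_payoff += (payoff - avg_payoff) / t

# In matching pennies the strategies tend to cycle around the mixed equilibrium,
# while the row player's average payoff tends toward the equilibrium value (0 here).
print(p, q, avg_payoff)
```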
-
A simple adaptive procedure leading to correlated equilibrium by Hart S, Mas-Colell A. Econometrica, 2000. We propose a new and simple adaptive procedure for playing a game: "regret-matching." In this procedure, players may depart from their current play with probabilities that are proportional to measures of regret for not having used other strategies in the past. It is shown that our adaptive procedure guarantees that, with probability one, the empirical distributions of play converge to the set of correlated equilibria of the game.
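Regret matching itself is only a few lines of code. The sketch below runs the procedure in self-play on rock-paper-scissors; the game and horizon are illustrative choices, while the cumulative-regret bookkeeping follows the description in the abstract.

```python
import random
from collections import Counter

A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]   # rock-paper-scissors, row player's payoffs

def regret_matching_policy(regrets):
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0:
        return [1.0 / len(regrets)] * len(regrets)      # uniform if no positive regret
    return [r / total for r in positive]

regrets1, regrets2 = [0.0] * 3, [0.0] * 3
plays = Counter()
for t in range(100000):
    a1 = random.choices(range(3), weights=regret_matching_policy(regrets1))[0]
    a2 = random.choices(range(3), weights=regret_matching_policy(regrets2))[0]
    plays[(a1, a2)] += 1
    for alt in range(3):
        # Cumulative regret for not having played `alt` instead of the realized action.
        regrets1[alt] += A[alt][a2] - A[a1][a2]
        regrets2[alt] += (-A[a1][alt]) - (-A[a1][a2])

# The empirical distribution of joint play approaches the set of correlated
# equilibria (here, roughly uniform over the nine action pairs).
print({pair: round(n / 100000, 3) for pair, n in sorted(plays.items())})
```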
-
Game Networks by La Mura, P. UAI, 2000. We introduce Game networks (G nets), a novel representation for multi-agent decision problems. Compared to other game-theoretic representations, such as strategic or extensive forms, G nets are more structured and more compact; more fundamentally, G nets constitute a computationally advantageous framework for strategic inference, as both probability and utility independencies are captured in the structure of the network and can be exploited in order to simplify the inference process. An important aspect of multi-agent reasoning is the identification of some or all of the strategic equilibria in a game; we present original convergence methods for strategic equilibrium which can take advantage of strategic separabilities in the G net structure in order to simplify the computations. Specifically, we describe a method which identifies a unique equilibrium as a function of the game payoffs, and one which identifies all equilibria.
-
Rational and Convergent Learning in Stochastic Games by Michael Bowling, Manuela Veloso. IJCAI, 2001. This paper investigates the problem of policy learning in multiagent environments using the stochastic game framework, which we briefly overview. We introduce two properties as desirable for a learning agent when in the presence of other learning agents, namely rationality and convergence. We examine existing reinforcement learning algorithms according to these two properties and notice that they fail to simultaneously meet both criteria. We then contribute a new learning algorithm, WoLF policy hillclimbing, that is based on a simple principle: “learn quickly while losing, slowly while winning.” The algorithm is proven to be rational and we present empirical results for a number of stochastic games showing the algorithm converges.
-
Playing is believing: the role of beliefs in multi-agent learning by Yu-Han Chang, Leslie Pack Kaelbling. NeurIPS, 2001. We propose a new classification for multi-agent learning algorithms, with each league of players characterized by both their possible strategies and possible beliefs. Using this classification, we review the optimality of existing algorithms and discuss some insights that can be gained. We propose an incremental improvement to the existing algorithms that seems to achieve average payoffs that are at least the Nash equilibrium payoffs in the long-run against fair opponents.
-
Social agents playing a periodical policy by Ann Nowé, Johan Parent, Katja Verbeeck. ECML, 2001. Coordination is an important issue in multiagent systems. Within the stochastic game framework this problem translates to policy learning in a joint action space. This technique however suffers some important drawbacks like the assumption of the existence of a unique Nash equilibrium and synchronicity, the need for central control, the cost of communication, etc. Moreover in general sum games it is not always clear which policies should be learned. Playing pure Nash equilibria is often unfair to at least one of the players, while playing a mixed strategy doesn’t give any guarantee for coordination and usually results in a sub-optimal payoff for all agents. In this work we show the usefulness of periodical policies, which arise as a side effect of the fairness conditions used by the agents. We are interested in games which assume competition between the players, but where the overall performance can only be as good as the performance of the poorest player. Players are social distributed reinforcement learners, who have to learn to equalize their payoff. Our approach is illustrated on synchronous one-step games as well as on asynchronous job scheduling games.
-
Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games by Xiaofeng Wang, Tuomas Sandholm. NeurIPS, 2002. Multiagent learning is a key problem in AI. In the presence of multiple Nash equilibria, even agents with non-conflicting interests may not be able to learn an optimal coordination policy. The problem is exacerbated if the agents do not know the game and independently receive noisy payoffs. So, multiagent reinforcement learning involves two interrelated problems: identifying the game and learning to play. In this paper, we present optimal adaptive learning, the first algorithm that converges to an optimal Nash equilibrium with probability 1 in any team Markov game. We provide a convergence proof, and show that the algorithm’s parameters are easy to set to meet the convergence conditions.
-
Multiagent learning using a variable learning rate by Michael Bowling, Manuela Veloso. Artificial Intelligence, 2002. Learning to act in a multiagent environment is a difficult problem since the normal definition of an optimal policy no longer applies. The optimal policy at any moment depends on the policies of the other agents. This creates a situation of learning a moving target. Previous learning algorithms have one of two shortcomings depending on their approach. They either converge to a policy that may not be optimal against the specific opponents’ policies, or they may not converge at all. In this article we examine this learning problem in the framework of stochastic games. We look at a number of previous learning algorithms showing how they fail at one of the above criteria. We then contribute a new reinforcement learning technique using a variable learning rate to overcome these shortcomings. Specifically, we introduce the WoLF principle, “Win or Learn Fast”, for varying the learning rate. We examine this technique theoretically, proving convergence in self-play on a restricted class of iterated matrix games. We also present empirical results on a variety of more general stochastic games, in situations of self-play and otherwise, demonstrating the wide applicability of this method.
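A hedged, single-state sketch of WoLF policy hill-climbing is given below. The opponent is modeled as playing uniformly at random purely to keep the example self-contained (the paper's setting is self-play in stochastic games), and the learning rates and horizon are illustrative assumptions.

```python
import random

def wolf_phc(payoff, episodes=50000, alpha=0.1, delta_w=0.01, delta_l=0.04):
    """WoLF policy hill-climbing for one player of a repeated matrix game.

    payoff[a][b] is this player's reward when it plays a and the opponent plays b;
    the uniformly random opponent below is only a stand-in for another learner.
    """
    n = len(payoff)
    Q = [0.0] * n
    pi = [1.0 / n] * n          # current mixed policy
    avg_pi = [1.0 / n] * n      # running average policy
    count = 0
    for t in range(episodes):
        a = random.choices(range(n), weights=pi)[0]
        opp = random.randrange(len(payoff[0]))
        r = payoff[a][opp]
        Q[a] += alpha * (r - Q[a])                      # stateless Q-learning update
        count += 1
        for i in range(n):                              # update the average policy
            avg_pi[i] += (pi[i] - avg_pi[i]) / count
        # "Win or Learn Fast": small step when winning, large step when losing.
        winning = sum(p * q for p, q in zip(pi, Q)) > sum(p * q for p, q in zip(avg_pi, Q))
        delta = delta_w if winning else delta_l
        best = max(range(n), key=lambda i: Q[i])
        for i in range(n):
            pi[i] = min(1.0, pi[i] + delta) if i == best else max(0.0, pi[i] - delta / (n - 1))
        s = sum(pi)
        pi = [p / s for p in pi]                        # renormalize the policy
    return pi
```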
-
Efficient learning equilibrium by Ronen I. Brafman, Moshe Tennenholtz. NeurIPS, 2002. We introduce efficient learning equilibrium (ELE), a normative approach to learning in noncooperative settings. In ELE, the learning algorithms themselves are required to be in equilibrium. In addition, the learning algorithms must arrive at a desired value after polynomial time, and a deviation from the prescribed ELE becomes irrational after polynomial time. We prove the existence of an ELE (where the desired value is the expected payoff in a Nash equilibrium) and of a Pareto-ELE (where the objective is the maximization of social surplus) in repeated games with perfect monitoring. We also show that an ELE does not always exist in the imperfect monitoring case. Finally, we discuss the extension of these results to general-sum stochastic games.
-
Extending Q-Learning to General Adaptive Multi-Agent Systems by Gerald Tesauro. NeurIPS, 2003. Recent multi-agent extensions of Q-Learning require knowledge of other agents’ payoffs and Q-functions, and assume game-theoretic play at all times by all other agents. This paper proposes a fundamentally different approach, dubbed “Hyper-Q” Learning, in which values of mixed strategies rather than base actions are learned, and in which other agents’ strategies are estimated from observed actions via Bayesian inference. Hyper-Q may be effective against many different types of adaptive agents, even if they are persistently dynamic. Against certain broad categories of adaptation, it is argued that Hyper-Q may converge to exact optimal time-varying policies. In tests using Rock-Paper-Scissors, Hyper-Q learns to significantly exploit an Infinitesimal Gradient Ascent (IGA) player, as well as a Policy Hill Climber (PHC) player. Preliminary analysis of Hyper-Q against itself is also presented.
-
Multi-agent influence diagrams for representing and solving games by Koller D, Milch B. Games and economic behavior, 2003. Game theory provides a theoretical basis for defining rational behavior in multi-agent scenarios. However, the computational complexity of finding equilibria in large games has limited game theory’s practical applications. In this paper, we explore the application of structured probabilistic models to multi-agent scenarios. We define multi-agent influence diagrams (MAIDs), which represent games in a way that allows us to take advantage of independence relationships among variables. This representation allows us to define a notion of strategic relevance: D’ is strategically relevant to D if, to optimize the decision rule at D, the decision maker needs to know the decision rule at D’. We provide a sound and complete graphical criterion for determining strategic relevance. We then show how strategic relevance, which is made explicit by the MAID structure, can be exploited in algorithms for computing equilibria to obtain exponential savings in certain classes of games.
-
Existence of Multiagent Equilibria with Limited Agents by Michael Bowling, Manuela Veloso. JAIR, 2004. Multiagent learning is a necessary yet challenging problem as multiagent systems become more prevalent and environments become more dynamic. Much of the groundbreaking work in this area draws on notable results from game theory, in particular, the concept of Nash equilibria. Learners that directly learn an equilibrium obviously rely on their existence. Learners that instead seek to play optimally with respect to the other players also depend upon equilibria since equilibria are fixed points for learning. From another perspective, agents with limitations are real and common. These may be undesired physical limitations as well as self-imposed rational limitations, such as abstraction and approximation techniques, used to make learning tractable. This article explores the interactions of these two important concepts: equilibria and limitations in learning. We introduce the question of whether equilibria continue to exist when agents have limitations. We look at the general effects limitations can have on agent behavior, and define a natural extension of equilibria that accounts for these limitations. Using this formalization, we make three major contributions: (i) a counterexample for the general existence of equilibria with limitations, (ii) sufficient conditions on limitations that preserve their existence, (iii) three general classes of games and limitations that satisfy these conditions. We then present empirical results from a specific multiagent learning algorithm applied to a specific instance of limited agents. These results demonstrate that learning with limitations is feasible, when the conditions outlined by our theoretical analysis hold.
-
Best-Response Multiagent Learning in Non-Stationary Environments by Michael Weinberg, Jeffrey S. Rosenschein. AAMAS, 2004. This paper investigates a relatively new direction in Multiagent Reinforcement Learning. Most multiagent learning techniques focus on Nash equilibria as elements of both the learning algorithm and its evaluation criteria. In contrast, we propose a multiagent learning algorithm that is optimal in the sense of finding a best-response policy, rather than in reaching an equilibrium. We present the first learning algorithm that is provably optimal against restricted classes of non-stationary opponents. The algorithm infers an accurate model of the opponent’s non-stationary strategy, and simultaneously creates a best-response policy against that strategy. Our learning algorithm works within the very general framework of N-player, general-sum stochastic games, and learns both the game structure and its associated optimal policy.
-
A polynomial-time algorithm for Action-Graph Games by Jiang AX, Leyton-Brown K. In PROCEEDINGS OF THE NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2006. Action-Graph Games (AGGs) (Bhat & Leyton-Brown 2004) are a fully expressive game representation which can compactly express strict and context-specific independence and anonymity structure in players’ utility functions. We present an efficient algorithm for computing expected payoffs under mixed strategy profiles. This algorithm runs in time polynomial in the size of the AGG representation (which is itself polynomial in the number of players when the in-degree of the action graph is bounded). We also present an extension to the AGG representation which allows us to compactly represent a wider variety of structured utility functions.
-
AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents by Vincent Conitzer, Tuomas Sandholm. Machine Learning, 2007. Two minimal requirements for a satisfactory multiagent learning algorithm are that it 1. learns to play optimally against stationary opponents and 2. converges to a Nash equilibrium in self-play. The previous algorithm that has come closest, WoLF-IGA, has been proven to have these two properties in 2-player 2-action (repeated) games—assuming that the opponent’s mixed strategy is observable. Another algorithm, ReDVaLeR (which was introduced after the algorithm described in this paper), achieves the two properties in games with arbitrary numbers of actions and players, but still requires that the opponents’ mixed strategies are observable. In this paper we present AWESOME, the first algorithm that is guaranteed to have the two properties in games with arbitrary numbers of actions and players. It is still the only algorithm that does so while only relying on observing the other players’ actual actions (not their mixed strategies). It also learns to play optimally against opponents that eventually become stationary. The basic idea behind AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium) is to try to adapt to the others’ strategies when they appear stationary, but otherwise to retreat to a precomputed equilibrium strategy. We provide experimental results that suggest that AWESOME converges fast in practice. The techniques used to prove the properties of AWESOME are fundamentally different from those used for previous algorithms, and may help in analyzing future multiagent learning algorithms as well.
-
Strong mediated equilibrium by Monderer D, Tennenholtz M. Artificial Intelligence, 2009. This paper discusses the use of mediators in multi-agent systems in order to obtain stability against potential deviations by sets of agents. It is shown that strong equilibrium, a strategy profile that is immune to deviations by coalitions, is rare in game-theoretic terms. The use of mediators is suggested as a way to enrich the set of situations where stability can be obtained. A mediator is defined as a reliable entity that can ask the agents for permission to play on their behalf and is guaranteed to behave in a specified way based on messages received from the agents. A mediator cannot enforce behavior, however, and agents can play the game directly without the mediator's help. A mediator generates a new game for the players, the mediated game, and general results about mediators are proved. The notion of strong mediated equilibrium, which is a strong equilibrium in the mediated game, is discussed and it is shown that desired behaviors that are stable against deviations by coalitions can be obtained using mediators in several classes of settings.
-
An efficient optimal-equilibrium algorithm for two-player game trees by Littman ML, Ravi N, Talwar A, Zinkevich M. arXiv, 2012. Two-player complete-information game trees are perhaps the simplest possible setting for studying general-sum games and the computational problem of finding equilibria. These games admit a simple bottom-up algorithm for finding subgame perfect Nash equilibria efficiently. However, such an algorithm can fail to identify optimal equilibria, such as those that maximize social welfare. The reason is that, counterintuitively, probabilistic action choices are sometimes needed to achieve maximum payoffs. We provide a novel polynomial-time algorithm for this problem that explicitly reasons about stochastic decisions and demonstrate its use in an example card game.
-
Graphical models for game theory by Kearns M, Littman ML, Singh S. arXiv, 2013. We introduce a compact graph-theoretic representation for multi-party game theory. Our main result is a provably correct and efficient algorithm for computing approximate Nash equilibria in one-stage games represented by trees or sparse graphs.
-

Coordination

Learning to coordinate without sharing information by Sandip Sen, Mahendra Sekaran, John Hale. National Conference on Artificial Intelligence, 1994. Researchers in the field of Distributed Artificial Intelligence (DAI) have been developing efficient mechanisms to coordinate the activities of multiple autonomous agents. The need for coordination arises because agents have to share resources and expertise required to achieve their goals. Previous work in the area includes using sophisticated information exchange protocols, investigating heuristics for negotiation, and developing formal models of possibilities of conflict and cooperation among agent interests. In order to handle the changing requirements of continuous and dynamic environments, we propose learning as a means to provide additional possibilities for effective coordination. We use reinforcement learning techniques on a block pushing problem to show that agents can learn complementary policies to follow a desired path without any knowledge about each other. We theoretically analyze and experimentally verify the effects of learning rate on system convergence, and demonstrate benefits of using learned coordination knowledge on similar problems. Reinforcement learning based coordination can be achieved in both cooperative and non-cooperative domains, and in domains with noisy communication channels and other stochastic characteristics that present a formidable challenge to using other coordination schemes.
-
Recursive agent and agent-group tracking in a real-time dynamic environment by M. Tambe. AAAI Press, 1995. Agent tracking is an important capability an intelligent agent requires for interacting with other agents. It involves monitoring the observable actions of other agents as well as inferring their unobserved actions or high-level goals and behaviors. This paper focuses on a key challenge for agent tracking: recursive tracking of individuals or groups of agents. The paper first introduces an approach for tracking recursive agent models. To tame the resultant growth in the tracking effort and aid real-time performance, the paper then presents model sharing, an optimization that involves sharing the effort of tracking multiple models. Such shared models are dynamically unshared as needed -- in effect, a model is selectively tracked if it is dissimilar enough to require unsharing. The paper also discusses the application of recursive modeling in service of deception, and the impact of sensor imperfections. This investigation is based on our on-going effort to build intelligent pilot agents for a real-world synthetic air-combat environment.
-
Learning Conventions in Multiagent Stochastic Domains using Likelihood Estimates by Craig Boutilier. UAI, 1996. Fully cooperative multiagent systems, those in which agents share a joint utility model, are of special interest in AI. A key problem is that of ensuring that the actions of individual agents are coordinated, especially in settings where the agents are autonomous decision makers. We investigate approaches to learning coordinated strategies in stochastic domains where an agent's actions are not directly observable by others. Much recent work in game theory has adopted a Bayesian learning perspective to the more general problem of equilibrium selection, but tends to assume that actions can be observed. We discuss the special problems that arise when actions are not observable, including effects on rates of convergence, and the effect of action failure probabilities and asymmetries. We also use likelihood estimates as a means of generalizing fictitious play learning models in our setting. Finally, we propose the use of maximum likelihood as a means of removing strategies from consideration, with the aim of convergence to a conventional equilibrium, at which point learning and deliberation can cease.
-
Planning, Learning and Coordination in Multiagent Decision Processes by Craig Boutilier. TARK, 1996. There has been a growing interest in AI in the design of multiagent systems, especially in multiagent cooperative planning. In this paper, we investigate the extent to which methods from single-agent planning and learning can be applied in multiagent settings. We survey a number of different techniques from decision-theoretic planning and reinforcement learning and describe a number of interesting issues that arise with regard to coordinating the policies of individual agents. To this end, we describe multiagent Markov decision processes as a general model in which to frame this discussion. These are special n-person cooperative games in which agents share the same utility function. We discuss coordination mechanisms based on imposed conventions (or social laws) as well as learning methods for coordination. Our focus is on the decomposition of sequential decision processes so that coordination can be learned (or imposed) locally, at the level of individual states. We also discuss the use of structured problem representations and their role in the generalization of learned conventions and in approximation.
-
Multiagent coordination with learning classifier systems by Sandip Sen, Mahendra Sekaran. IJCAI, 1996. Researchers in the field of Distributed Artificial Intelligence (DAI) [Bond and Gasser, 1988] have developed a variety of agent coordination schemes under different assumptions about agent capabilities and relationships. Most of these schemes rely on shared knowledge or authority relationships between agents. These kinds of information may not be available or may be manipulated by malevolent agents. We have used reinforcement learning [Barto et al., 1989] as a coordination mechanism that imposes little cognitive burden on agents and does not suffer from the above-mentioned shortcomings [Sekaran and Sen, 1994; Sen et al., 1994]. In this paper, we evaluate a particular reinforcement learning methodology, a genetic algorithm based machine learning mechanism known as classifier systems [Holland, 1986] for developing action policies to optimize environmental feedback. Action policies that provide a mapping between perceptions and actions can be used by multiple agents to learn coordination strategies without having to rely on shared information. These agents are unaware of the capabilities of other agents and may or may not be cognizant of goals to achieve. We show that through repeated problem-solving experience, these agents can develop policies to maximize environmental feedback that can be interpreted as goal achievement from the viewpoint of an external observer. Experimental results from a couple of multiagent domains show that classifier systems can be more effective than the more widely used Q-learning scheme for multiagent coordination
-
A game-theoretic approach to coordination of traffic signal agents by Bazzan A. L. C. PhD thesis, 1997. Innovative control strategies are needed to cope with the increasing urban traffic chaos. In most cases, the currently used strategies are based on a central traffic-responsive control system which can be demanding to implement and maintain. Therefore, a functional and spatial decentralization is desired. For this purpose, distributed artificial intelligence and multi-agent systems have come out with a series of techniques which allow coordination and cooperation. However, in many cases these are reached by means of communication and centrally controlled coordination processes, giving little room for decentralized management. Consequently, there is a lack of decision-support tools at managerial level (traffic control centers) capable of dealing with decentralized policies of control and actually profiting from them. In the present work a coordination concept is used, which overcomes some disadvantages of the existing methods. This concept makes use of techniques of evolutionary game theory: intersections in an arterial are modeled as individually-motivated agents or players taking part in a dynamic process in which not only their own local goals but also a global one has to be taken into account. The role of the traffic manager is facilitated since s/he has to deal only with tactical issues, leaving the operational ones to the agents. Thus the system ultimately provides support for the traffic manager to decide on traffic control policies. Some applications in traffic scenarios are discussed in order to evaluate the feasibility of transferring the responsibility of traffic signal coordination to agents. The results show different performances of the decentralized coordination process in different scenarios (e.g. the flow of vehicles is nearly equal in both opposing directions, one direction has a clearly higher flow, etc.). Therefore, the task of the manager is facilitated once s/he recognizes the scenario and acts accordingly.
-
A Framework for Coordination and Learning Among Teams of Agents by Hung H. Bui, Svetha Venkatesh, Dorota Kieronska. LNAI, 1998. We present a framework for team coordination under incomplete information based on the theory of incomplete information games. When the true distribution of the uncertainty involved is not known in advance, we consider a repeated interaction scenario and show that the agents can learn to estimate this distribution and share their estimations with one another. Over time, as the agents' estimations become more accurate, the utility they can achieve approaches the optimal utility when the true distribution is known, while the communication requirement for exchanging the estimations among the agents can be kept to a minimal level.
-
Individual learning of coordination knowledge by Sandip Sen, Mahendra Sekaran. Journal of Experimental and Theoretical Artificial Intelligence, 1998. Social agents, both human and computational, inhabiting a world containing multiple active agents, need to coordinate their activities. This is because agents share resources, and without proper coordination or “rules of the road”, everybody will be interfering with the plans of others. As such, we need coordination schemes that allow agents to effectively achieve local goals without adversely affecting the problem-solving capabilities of other agents. Researchers in the field of Distributed Artificial Intelligence (DAI) have developed a variety of coordination schemes under different assumptions about agent capabilities and relationships. Whereas some of this research has been motivated by human cognitive biases, other work has approached coordination as an engineering problem of designing the most effective coordination architecture or protocol. We evaluate individual and concurrent learning by multiple, autonomous agents as a means for acquiring coordination knowledge. We show that a uniform reinforcement learning algorithm suffices as a coordination mechanism in both cooperative and adversarial situations. Using a number of multiagent learning scenarios with both tight and loose coupling between agents and with immediate as well as delayed feedback, we demonstrate that agents can consistently develop effective policies to coordinate their actions without explicit information sharing. We demonstrate the viability of using both the Q-learning algorithm and genetic algorithm based classifier systems with different payoff schemes, namely the bucket brigade algorithm (BBA) and the profit sharing plan (PSP), for developing agent coordination on two different multi-agent domains. In addition, we show that a semi-random scheme for action selection is preferable to the more traditional fitness proportionate selection scheme used in classifier systems.
-
Coordinated Reinforcement Learning by Carlos Guestrin, Michail Lagoudakis, Ronald Parr. AAAI, 2002. We present several new algorithms for multiagent reinforcement learning. A common feature of these algorithms is a parameterized, structured representation of a policy or value function. This structure is leveraged in an approach we call coordinated reinforcement learning, by which agents coordinate both their action selection activities and their parameter updates. Within the limits of our parametric representations, the agents will determine a jointly optimal action without explicitly considering every possible action in their exponentially large joint action space. Our methods differ from many previous reinforcement learning approaches to multiagent coordination in that structured communication and coordination between agents appears at the core of both the learning algorithm and the execution architecture. Our experimental results, comparing our approach to other RL methods, illustrate both the quality of the policies obtained and the additional benefits of coordination.
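As a rough illustration of the coordination step such methods rely on, the sketch below assumes the joint value factors into local terms over a chain-shaped coordination graph and selects a joint action by variable elimination rather than enumerating the joint action space. The three-agent instance and payoff tables are invented for the example; this is not the paper's full learning algorithm.

```python
import itertools
import numpy as np

# Toy factored value: Q(a1, a2, a3) = Q12(a1, a2) + Q23(a2, a3) on a chain graph.
rng = np.random.default_rng(0)
Q12 = rng.normal(size=(2, 2))   # local payoff shared by agents 1 and 2
Q23 = rng.normal(size=(2, 2))   # local payoff shared by agents 2 and 3

# Variable elimination: eliminate agents 3 and 1, pick a2, then back-substitute.
f3 = Q23.max(axis=1)            # f3[a2] = max_a3 Q23[a2, a3]
best_a3 = Q23.argmax(axis=1)
f1 = Q12.max(axis=0)            # f1[a2] = max_a1 Q12[a1, a2]
best_a1 = Q12.argmax(axis=0)
a2 = int(np.argmax(f1 + f3))
a1, a3 = int(best_a1[a2]), int(best_a3[a2])
print("joint action:", (a1, a2, a3))

# Sanity check against brute-force enumeration of all joint actions.
brute = max(itertools.product(range(2), repeat=3),
            key=lambda a: Q12[a[0], a[1]] + Q23[a[1], a[2]])
assert np.isclose(Q12[a1, a2] + Q23[a2, a3],
                  Q12[brute[0], brute[1]] + Q23[brute[1], brute[2]])
```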
-
Reinforcement Learning of Coordination in Cooperative Multiagent Systems by Kapetanakis S. and Kudenko D. AAAI, 2002. We report on an investigation of reinforcement learning techniques for the learning of coordination in cooperative multiagent systems. Specifically, we focus on a novel action selection strategy for Q-learning (Watkins 1989). The new technique is applicable to scenarios where mutual observation of actions is not possible. To date, reinforcement learning approaches for such independent agents did not guarantee convergence to the optimal joint action in scenarios with high miscoordination costs. We improve on previous results (Claus & Boutilier 1998) by demonstrating empirically that our extension causes the agents to converge almost always to the optimal joint action even in these difficult cases.
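A toy version of both the miscoordination problem and an optimistic fix: two independent Q-learners play the single-stage climbing game (payoffs as usually quoted from Claus & Boutilier 1998), and an optimistic estimate that keeps the best reward observed per action is used as a crude stand-in for heuristics in the spirit of, though not identical to, the paper's action-selection strategy.

```python
import numpy as np

# Climbing game (common payoff); both agents receive the same reward.
R = np.array([[ 11., -30., 0.],
              [-30.,   7., 6.],
              [  0.,   0., 5.]])
rng = np.random.default_rng(1)

def train(optimistic, episodes=2000, alpha=0.1, eps=0.2):
    q = [rng.normal(0.0, 0.01, 3) for _ in range(2)]  # one value per own action
    for _ in range(episodes):
        acts = [rng.integers(3) if rng.random() < eps else int(np.argmax(q[i]))
                for i in range(2)]
        r = R[acts[0], acts[1]]
        for i in range(2):
            if optimistic:
                # Only move the estimate up: the best reward ever seen for this action.
                q[i][acts[i]] = max(q[i][acts[i]], r)
            else:
                q[i][acts[i]] += alpha * (r - q[i][acts[i]])
    return tuple(int(np.argmax(qi)) for qi in q)

print("plain independent Q :", train(optimistic=False))  # often a suboptimal joint action
print("optimistic learners :", train(optimistic=True))   # (0, 0), the optimum, with high probability
```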
-
Learning to Reach the Pareto Optimal Nash Equilibrium as a Team by Katja Verbeeck, Ann Nowe, Tom Lenaerts, Johan Parent. LNAI, 2002. Coordination is an important issue in multi-agent systems when agents want to maximize their revenue. Often coordination is achieved through communication; however, communication has its price. We are interested in finding an approach where the communication between the agents is kept low, and a global optimal behavior can still be found. In this paper we report on an efficient approach that allows independent reinforcement learning agents to reach a Pareto optimal Nash equilibrium with limited communication. The communication happens at regular time steps and is basically a signal for the agents to start an exploration phase. During each exploration phase, some agents exclude their current best action so as to give the team the opportunity to look for a possibly better Nash equilibrium. This technique of reducing the action space by exclusions was only recently introduced for finding periodical policies in games of conflicting interests. Here, we explore this technique in repeated common interest games with deterministic or stochastic outcomes.
-
Coordination in Multiagent Reinforcement Learning: A Bayesian Approach by Georgios Chalkiadakis, Craig Boutilier. AAMAS, 2003. Much emphasis in multiagent reinforcement learning (MARL) research is placed on ensuring that MARL algorithms (eventually) converge to desirable equilibria. As in standard reinforcement learning, convergence generally requires sufficient exploration of strategy space. However, exploration often comes at a price in the form of penalties or foregone opportunities. In multiagent settings, the problem is exacerbated by the need for agents to “coordinate” their policies on equilibria. We propose a Bayesian model for optimal exploration in MARL problems that allows these exploration costs to be weighed against their expected benefits using the notion of value of information. Unlike standard RL models, this model requires reasoning about how one’s actions will influence the behavior of other agents. We develop tractable approximations to optimal Bayesian exploration, and report on experiments illustrating the benefits of this approach in identical interest games.
-
Coordination of independent learners in cooperative Markov games by Laëtitia Matignon, Guillaume J. Laurent, Nadine Le Fort-Piat. Technical Report, 2009. In the framework of fully cooperative multi-agent systems, independent agents learning by reinforcement must overcome several difficulties, such as coordination and the impact of exploration. The study of these issues first allows us to synthesize the characteristics of existing decentralized reinforcement learning methods for independent learners in cooperative Markov games. Then, given the difficulties encountered by these approaches, we focus on two main skills: optimistic agents, which manage coordination in deterministic environments, and the detection of the stochasticity of a game. Indeed, the key difficulty in stochastic environments is to distinguish between various causes of noise. The SOoN algorithm, standing for “Swing between Optimistic or Neutral”, is then introduced, in which independent learners can adapt automatically to the environment's stochasticity. Empirical results on various cooperative Markov games notably show that SOoN overcomes the main factors of non-coordination and is robust in the face of the exploration of other agents.
-

Cooperation

Distributed intelligence for air fleet control by R. Steeb, S. Cammarata, F. Hayes-Roth, P. Thorndyke, and R. Wesson. Morgan Kaufmann Publishers, 1988. This report summarizes the results of a nine-month investigation of distributed intelligence for air fleet control, conducted for the Information Processing Techniques Office, Defense Advanced Research Projects Agency.
-
Enhancing performance of cooperating agents in realtime diagnostic systems by U. M. Schwuttke and A. G. Quan. IJCAI, 1993. We present a data-driven protocol and a supporting architecture for communication among cooperating intelligent agents in real-time diagnostic systems. The system architecture and the exchange of information among agents are based on simplicity of agents, hierarchical organization of agents, and modular non-overlapping division of the problem domain. These features combine to enable efficient diagnosis of complex system failures in real-time environments with high data volumes and moderate failure rates. Preliminary results of the real-world application of this work to the monitoring and diagnosis of complex systems are discussed in the context of NASA's interplanetary mission operations.
-
Multi-Agent Reinforcement Learning Independent vs Cooperative Agents by Ming Tan. ICML, 1993. Intelligent human agents exist in a cooperative social environment that facilitates learning. They learn not only by trial-and-error but also through cooperation by sharing instantaneous information, episodic experience, and learned knowledge. The key investigations of this paper are: "Given the same number of reinforcement learning agents, will cooperative agents outperform independent agents who do not communicate during learning?" and "What is the price for such cooperation?" Using independent agents as a benchmark, cooperative agents are studied in the following ways: (1) sharing sensation, (2) sharing episodes and (3) sharing learned policies. This paper shows that (a) additional sensation from another agent is beneficial if it can be used efficiently, (b) sharing learned policies or episodes among agents speeds up learning at the cost of communication, and (c) for joint tasks, agents engaging in partnership can significantly outperform independent agents, although they may learn slowly in the beginning. These tradeoffs are not just limited to multi-agent reinforcement learning.
-
Integrating intelligent systems into a cooperating community for electricity distribution management by L. Z. Varga, N. R. Jennings, and D. Cockburn. Expert Systems with Applications, 1994. Systems in which semi-autonomous problem-solving agents communicate and cooperate with one another represent an exciting vision of future computing environments. However, if this vision is ever going to result in commercially viable systems, then consideration must be given to the large software base that exists within many organisations. Success requires the ability to incorporate pre-existing systems alongside purpose-built agents in a cooperating community. This requirement is vital because the former represent a substantial resource investment that companies cannot afford to consign to the scrap heap. We report on our experiences of constructing cooperating communities that contain elements that were pre-existing and some that were developed specifically for incorporation into an integrated environment. The general purpose framework of ARCHON (ARchitecture for Co-operative Heterogeneous ON-line systems) provides the underlying technology to facilitate cooperative problem solving, and the exemplar domain is the real world problem of electricity distribution management. The actual application being developed is called CIDIM (Cooperating Intelligent systems for Distribution system Management). An evolving methodology for designing and developing a mixed system such as this is outlined, based on our experiences in CIDIM and several other real-world industrial applications. It specifies a hybrid top-down and bottom-up approach to integration, identifies the important characteristics that shape multi-agent problem analysis, and outlines key factors that impinge upon the design of the community. This methodology is then used to motivate the design decisions for the CIDIM application. Finally the process of instantiating the individual agents is discussed, some helpful guidelines on testing and evaluating future applications are given, and the implementation of one of CIDIM's cooperative scenarios is described in depth.
-
Learning team strategies with multiple policy-sharing agents: A soccer case study by R. Salustowicz, M. Wiering, and J. Schmidhuber. Technical report, 1997. We use simulated soccer to study multiagent learning. Each team's players (agents) share action set and policy, but may behave differently due to position-dependent inputs. All agents making up a team are rewarded or punished collectively in case of goals. We conduct simulations with varying team sizes, and compare several learning algorithms: TD-Q learning with linear neural networks (TD-Q), Probabilistic Incremental Program Evolution (PIPE), and a PIPE version that learns by coevolution (CO-PIPE). TD-Q is based on learning evaluation functions (EFs) mapping input/action pairs to expected reward. PIPE and CO-PIPE search policy space directly. They use adaptive probability distributions to synthesize programs that calculate action probabilities from current inputs. Our results show that linear TD-Q encounters several difficulties in learning appropriate shared EFs. PIPE and CO-PIPE, however, do not depend on EFs and find good policies faster and more reliably. This suggests that in some multiagent learning scenarios direct search in policy space can offer advantages over EF-based approaches.
-
Evolving Behaviors for Cooperating Agents by Jeffrey K. Bassett, Kenneth A. De Jong. Symposium on Methodologies for Intelligent Systems, 2000. A good deal of progress has been made in the past few years in the design and implementation of control programs for autonomous agents. A natural extension of this work is to consider solving difficult tasks with teams of cooperating agents. Our interest in this area is motivated in part by our involvement in a Navy-sponsored micro air vehicle (MAV) project in which the goal is to solve difficult surveillance tasks using a large team of small inexpensive autonomous air vehicles rather than a few expensive piloted vehicles. Our approach to developing control programs for these MAVs is to use evolutionary computation techniques to evolve behavioral rule sets. In this paper we describe our architecture for achieving this, and we present some of our initial results.
-
Advantages of Cooperation Between Reinforcement Learning Agents in Difficult Stochastic Problems by Hamid R. Berenji, David Vengerov. International Conference on Fuzzy Systems, 2000. This paper presents the first results in understanding the reasons for cooperative advantage between reinforcement learning agents. We consider a cooperation method which consists of using and updating a common policy. We tested this method on a complex fuzzy reinforcement learning problem and found that cooperation brings larger than expected benefits. More precisely, we found that K cooperative agents each learning for N time steps outperform K independent agents each learning in a separate world for K*N time steps. In this paper, we explain the observed phenomenon and determine the necessary conditions for its presence in a wide class of reinforcement learning problems.
-
Learning to Cooperate via Policy Search by Leonid Peshkin, Kee-Eung Kim, Nicolas Meuleau, Leslie Pack Kaelbling. UAI, 2000. Cooperative games are those in which both agents share the same payoff structure. Value-based reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperative games, but they only apply when the game state is completely observable to both agents. Policy search methods are a reasonable alternative to value-based methods for partially observable environments. In this paper, we provide a gradient-based distributed policy-search method for cooperative games and compare the notion of local optimum to that of Nash equilibrium. We demonstrate the effectiveness of this method experimentally in a small, partially observable simulated soccer domain.
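A minimal sketch of the idea in its simplest form, a single-state fully cooperative game: each agent performs REINFORCE-style gradient ascent on its own softmax policy using the shared payoff, and play typically settles on one of the local optima (here, deterministic Nash equilibria). The 2x2 game and hyperparameters are invented for illustration; the paper's setting is sequential and partially observable.

```python
import numpy as np

# Shared payoff: (0,0) pays 5 (globally best), (1,1) pays 4, mismatches pay 0.
R = np.array([[5., 0.],
              [0., 4.]])
rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

theta = [rng.normal(0, 0.1, 2) for _ in range(2)]   # per-agent policy logits
lr, baseline = 0.05, 0.0

for _ in range(5000):
    pis = [softmax(th) for th in theta]
    acts = [rng.choice(2, p=pi) for pi in pis]
    r = R[acts[0], acts[1]]
    baseline += 0.01 * (r - baseline)                # running-average baseline
    for i in range(2):
        grad_logp = -pis[i]
        grad_logp[acts[i]] += 1.0                    # d log pi(a_i) / d theta_i
        theta[i] += lr * (r - baseline) * grad_logp  # each agent ascends the shared reward

print([softmax(th).round(3) for th in theta])  # typically concentrated on one of the two equilibria
```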
-
All learning is local: Multi-agent learning in global reward games by Yu-Han Chang, Tracey Ho, and Leslie Pack Kaelbling. NeurIPS, 2003. In large multiagent games, partial observability, coordination, and credit assignment persistently plague attempts to design good learning algorithms. We provide a simple and efficient algorithm that in part uses a linear system to model the world from a single agent’s limited perspective, and takes advantage of Kalman filtering to allow an agent to construct a good training signal and learn an effective policy.
-
Marginal contribution nets: A compact representation scheme for coalitional games by Ieong S, Shoham Y. In Proceedings of the 6th ACM Conference on Electronic Commerce, 2005. We present a new approach to representing coalitional games based on rules that describe the marginal contributions of the agents. This representation scheme captures characteristics of the interactions among the agents in a natural and concise manner. We also develop efficient algorithms for two of the most important solution concepts, the Shapley value and the core, under this representation. The Shapley value can be computed in time linear in the size of the input. The emptiness of the core can be determined in time exponential only in the treewidth of a graphical interpretation of our representation.
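The per-rule Shapley contributions have a simple closed form that follows from the additivity of the Shapley value over rules, which is what makes the linear-time computation possible. The sketch below implements that closed form for basic rules (a conjunction of positive and negative literals implying a value); the rule set at the bottom is a made-up example, not one from the paper.

```python
from math import factorial
from collections import defaultdict

def shapley_from_mc_net(rules):
    """Shapley values of a coalitional game given as MC-net rules.
    Each rule is (positive_literals, negative_literals, value): it applies to
    a coalition S iff every positive literal is in S and no negative literal is.
    Only the agents mentioned in a rule matter for that rule's contribution."""
    phi = defaultdict(float)
    for P, N, v in rules:
        p, n = len(P), len(N)
        for i in P:
            # i is pivotal iff all of P\{i} precede it and all of N follow it.
            phi[i] += v * factorial(p - 1) * factorial(n) / factorial(p + n)
        for j in N:
            # j switches the rule off iff all of P, and no other member of N, precede it.
            phi[j] -= v * factorial(p) * factorial(n - 1) / factorial(p + n)
    return dict(phi)

# Toy rule set: 5 if agents 1 and 2 are both present; 2 for agent 3 when agent 1 is absent.
rules = [({1, 2}, set(), 5.0),
         ({3}, {1}, 2.0)]
print(shapley_from_mc_net(rules))  # {1: 1.5, 2: 2.5, 3: 1.0}; sums to v(grand coalition) = 5
```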
-
Multi-attribute coalitional games by Ieong S, Shoham Y. In Proceedings of the 7th ACM Conference on Electronic Commerce, 2006. We study coalitional games where the value of cooperation among the agents is solely determined by the attributes the agents possess, with no assumption as to how these attributes jointly determine this value. This framework allows us to model diverse economic interactions by picking the right attributes. We study the computational complexity of two coalitional solution concepts for these games — the Shapley value and the core. We show how the positive results obtained in this paper imply comparable results for other games studied in the literature.
-
Bayesian Coalitional Games by Ieong S, Shoham Y. AAAI, 2008. We introduce Bayesian Coalitional Games (BCGs), a generalization of classical coalitional games to settings with uncertainties. We define the semantics of BCG using the partition model, and generalize the notion of payoffs to contracts among agents. To analyze these games, we extend the solution concept of the core under three natural interpretations—ex ante, ex interim, and ex post—which coincide with the classical definition of the core when there is no uncertainty. In the special case where agents are risk-neutral, we show that checking for core emptiness under all three interpretations can be simplified to linear feasibility problems similar to that of their classical counterpart.
-

Decentralised Partially Observable MDPs

Taming Decentralized POMDPs: Towards Efficient Policy Computation for Multiagent Settings by R. Nair, M. Tambe, M. Yokoo, D. Pynadath, S. Marsella. IJCAI, 2003. The problem of deriving joint policies for a group of agents that maximize some joint reward function can be modeled as a decentralized partially observable Markov decision process (POMDP). Yet, despite the growing importance and applications of decentralized POMDP models in the multiagent arena, few algorithms have been developed for efficiently deriving joint policies for these models. This paper presents a new class of locally optimal algorithms called “Joint Equilibrium based search for policies (JESP)”. We first describe an exhaustive version of JESP and subsequently a novel dynamic programming approach to JESP. Our complexity analysis reveals the potential for exponential speedups due to the dynamic programming approach. These theoretical results are verified via empirical comparisons of the two JESP versions with each other and with a globally optimal brute-force search algorithm. Finally, we prove piece-wise linear and convexity (PWLC) properties, thus taking steps towards developing algorithms for continuous belief states.
-

Decision Theory

A decision-theoretic approach to coordinating multiagent interactions by Piotr J. Gmytrasiewicz, Edmund H. Durfee, David K. Weh. IJCAI, 1991. We describe a decision-theoretic method that an autonomous agent can use to model multiagent situations and behave rationally based on its model. Our approach, which we call the Recursive Modeling Method, explicitly accounts for the recursive nature of multiagent reasoning. Our method lets an agent recursively model another agent's decisions based on probabilistic views of how that agent perceives the multiagent situation, which in turn are derived from hypothesizing how that other agent perceives the initial agent's possible decisions, and so on. Further, we show how the possibility of multiple interactions can affect the decisions of agents, allowing cooperative behavior to emerge as a rational choice of selfish agents that otherwise might behave uncooperatively
-
The Complexity of Decentralized Control of Markov Decision Processes by Daniel S. Bernstein, Shlomo Zilberstein, Neil Immerman. UAI, 2000. Planning for distributed agents with partial state information is considered from a decision theoretic perspective. We describe generalizations of both the MDP and POMDP models that allow for decentralized control. For even a small number of agents, the finite-horizon problems corresponding to both of our models are complete for nondeterministic exponential time. These complexity results illustrate a fundamental difference between centralized and decentralized control of Markov processes. In contrast to the MDP and POMDP problems, the problems we consider provably do not admit polynomial-time algorithms and most likely require doubly exponential time to solve in the worst case. We have thus provided mathematical evidence corresponding to the intuition that decentralized planning problems cannot easily be reduced to centralized problems and solved exactly using established techniques.
-
Decision-Theoretic Bidding Based on Learned Density Models in Simultaneous, Interacting Auctions by P. Stone, R. E. Schapire, M. L. Littman, J. A. Csirik, and D. McAllester. JAIR, 2003. Auctions are becoming an increasingly popular method for transacting business, especially over the Internet. This article presents a general approach to building autonomous bidding agents to bid in multiple simultaneous auctions for interacting goods. A core component of our approach learns a model of the empirical price dynamics based on past data and uses the model to analytically calculate, to the greatest extent possible, optimal bids. We introduce a new and general boosting-based algorithm for conditional density estimation problems of this kind, i.e., supervised learning problems in which the goal is to estimate the entire conditional distribution of the real-valued label. This approach is fully implemented as ATTac-2001, a top-scoring agent in the second Trading Agent Competition (TAC-01). We present experiments demonstrating the effectiveness of our boosting-based price predictor relative to several reasonable alternatives.
-

Dispersion Games

Dispersion games: general definitions and some specific learning results by T. Grenager, R. Powers, Y. Shoham. AAAI, 2002. Dispersion games are the generalization of the anticoordination game to arbitrary numbers of agents and actions. In these games agents prefer outcomes in which the agents are maximally dispersed over the set of possible actions. This class of games models a large number of natural problems, including load balancing in computer science, niche selection in economics, and division of roles within a team in robotics. Our work consists of two main contributions. First, we formally define and characterize some interesting classes of dispersion games. Second, we present several learning strategies that agents can use in these games, including traditional learning rules from game theory and artificial intelligence, as well as some special purpose strategies. We then evaluate analytically and empirically the performance of each of these strategies.
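A quick feel for these games can be had with a naive round-based rule, close in spirit to (though not claimed to be exactly) the simple strategies studied in the paper: agents that collide on an action re-randomize, agents that are alone keep their action, and play stops once the outcome is maximally dispersed.

```python
import numpy as np

# n agents, n actions; a maximally dispersed outcome has one agent per action.
rng = np.random.default_rng(0)

def rounds_to_disperse(n, max_rounds=10_000):
    actions = rng.integers(n, size=n)
    for t in range(1, max_rounds + 1):
        counts = np.bincount(actions, minlength=n)
        if counts.max() == 1:              # full dispersion reached
            return t
        clashed = counts[actions] > 1      # agents sharing an action re-randomize
        actions = np.where(clashed, rng.integers(n, size=n), actions)
    return None

print([rounds_to_disperse(10) for _ in range(5)])  # typically converges after a modest number of rounds
```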
-

Evaluation

Analyzing Complex Strategic Interactions in Multi-Agent Systems by William E. Walsh, Rajarshi Das, Gerald Tesauro, Jeffrey O. Kephart. AAAI, 2002. We develop a model for analyzing complex games with repeated interactions, for which a full game-theoretic analysis is intractable. Our approach treats exogenously specified, heuristic strategies, rather than the atomic actions, as primitive, and computes a heuristic-payoff table specifying the expected payoffs of the joint heuristic strategy space. We analyze two games based on (i) automated dynamic pricing and (ii) continuous double auction. For each game we compute Nash equilibria of previously published heuristic strategies. To determine the most plausible equilibria, we study the replicator dynamics of a large population playing the strategies. In order to account for errors in estimation of payoffs or improvements in strategies, we also analyze the dynamics and equilibria based on perturbed payoffs.
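The replicator-dynamics step used in this style of analysis is small enough to sketch. The payoff matrix below is an invented two-strategy Hawk-Dove game standing in for an empirically estimated heuristic-payoff table; the share of each strategy grows in proportion to its payoff against the current population mix, and the population settles at the mixed equilibrium.

```python
import numpy as np

# Hawk-Dove with V=4, C=6, shifted by +2 so all payoffs are positive
# (the discrete replicator update below needs positive fitness values).
A = np.array([[1., 6.],
              [2., 4.]])
x = np.array([0.1, 0.9])             # initial population shares (hawk, dove)

for _ in range(200):
    fitness = A @ x                  # payoff of each strategy against the mix
    x = x * fitness / (x @ fitness)  # shares grow with relative payoff
print(x.round(3))                    # -> [0.667, 0.333], the mixed equilibrium with V/C = 2/3 hawks
```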
-
Run the GAMUT: A Comprehensive Approach to Evaluating Game-Theoretic Algorithms by Eugene Nudelman, Jennifer Wortman, Yoav Shoham. AAMAS, 2004. We present GAMUT, a suite of game generators designed for testing game-theoretic algorithms. We explain why such a generator is necessary, offer a way of visualizing relationships between the sets of games supported by GAMUT, and give an overview of GAMUT’s architecture. We highlight the importance of using comprehensive test data by benchmarking existing algorithms. We show surprisingly large variation in algorithm performance across different sets of games for two widely-studied problems: computing Nash equilibria and multiagent learning in repeated games.
-
New Criteria and a New Algorithm for Learning in Multi-Agent Systems by Rob Powers, Yoav Shoham. NeurIPS, 2004. We propose a new set of criteria for learning algorithms in multi-agent systems, one that is more stringent and (we argue) better justified than previous proposed criteria. Our criteria, which apply most straightforwardly in repeated games with average rewards, consist of three requirements: (a) against a specified class of opponents (this class is a parameter of the criterion) the algorithm yield a payoff that approaches the payoff of the best response, (b) against other opponents the algorithm’s payoff at least approach (and possibly exceed) the security level payoff (or maximin value), and (c) subject to these requirements, the algorithm achieve a close to optimal payoff in self-play. We furthermore require that these average payoffs be achieved quickly. We then present a novel algorithm, and show that it meets these new criteria for a particular parameter class, the class of stationary opponents. Finally, we show that the algorithm is effective not only in theory, but also empirically. Using a recently introduced comprehensive game theoretic test suite, we show that the algorithm almost universally outperforms previous learning algorithms
-
Learning against opponents with bounded memory by Rob Powers, Yoav Shoham. IJCAI, 2005. Recently, a number of authors have proposed criteria for evaluating learning algorithms in multiagent systems. While well-justified, each of these has generally given little attention to one of the main challenges of a multi-agent setting: the capability of the other agents to adapt and learn as well. We propose extending existing criteria to apply to a class of adaptive opponents with bounded memory which we describe. We then show an algorithm that provably achieves an epsilon-best response against this richer class of opponents while simultaneously guaranteeing a minimum payoff against any opponent and performing well in self-play. This new algorithm also demonstrates strong performance in empirical tests against a variety of opponents in a wide range of environments
-

Evolutionary Game Theory and Strategies

The logic of animal conflict by Maynard Smith, J., Price, G.R. Nature, 1973. Conflicts between animals of the same species usually are of “limited war” type, not causing serious injury. This is often explained as due to group or species selection for behaviour benefiting the species rather than individuals. Game theory and computer simulation analyses show, however, that a “limited war” strategy benefits individual animals as well as the species.
-
Evolution and the Theory of Games by J. Maynard Smith. American Scientist, 1976. In situations characterized by conflict of interest, the best strategy to adopt depends on what others are doing.
-
The Evolution of Cooperation by Robert Axelrod, William D. Hamilton. Science, 1981. Cooperation in organisms, whether bacteria or primates, has been a difficulty for evolutionary theory since Darwin. On the assumption that interactions between pairs of individuals occur on a probabilistic basis, a model is developed based on the concept of an evolutionarily stable strategy in the context of the Prisoner's Dilemma game. Deductions from the model, and the results of a computer tournament show how cooperation based on reciprocity can get started in an asocial world, can thrive while interacting with a wide range of other strategies, and can resist invasion once fully established. Potential applications include specific aspects of territoriality, mating, and disease.
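Axelrod-style tournaments are easy to reproduce in miniature. The sketch below runs a round-robin, including self-play, between tit-for-tat, a grudger, always-defect, and always-cooperate under the standard payoffs R=3, T=5, P=1, S=0; the reciprocating strategies finish on top once enough of the field is willing to cooperate, even though they never beat an opponent in a single match.

```python
import itertools

# Standard iterated prisoner's dilemma payoffs (R=3, T=5, P=1, S=0).
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def tit_for_tat(me, opp):      return 'C' if not opp else opp[-1]
def grudger(me, opp):          return 'D' if 'D' in opp else 'C'
def always_defect(me, opp):    return 'D'
def always_cooperate(me, opp): return 'C'

def play(s1, s2, rounds=200):
    h1, h2, sc1, sc2 = [], [], 0, 0
    for _ in range(rounds):
        a1, a2 = s1(h1, h2), s2(h2, h1)
        p1, p2 = PAYOFF[(a1, a2)]
        h1.append(a1); h2.append(a2)
        sc1 += p1; sc2 += p2
    return sc1, sc2

strategies = {'TFT': tit_for_tat, 'GRUDGER': grudger,
              'ALL-D': always_defect, 'ALL-C': always_cooperate}
totals = {name: 0 for name in strategies}
for (n1, s1), (n2, s2) in itertools.combinations_with_replacement(strategies.items(), 2):
    sc1, sc2 = play(s1, s2)
    totals[n1] += sc1
    if n1 != n2:
        totals[n2] += sc2
print(totals)  # TFT and GRUDGER tie for the lead; ALL-D trails despite never "losing" a match
```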
-
An Introduction to Evolutionary Game Theory by Weibull, Jörgen W. IFN, 1992. Please note that these lecture notes are incomplete and may contain typos and errors. I hope to have a more full-fledged version in a few months, and appreciate in the meantime comments and suggestions.
-
Competitive Environments Evolve Better Solutions for Complex Tasks by Peter J. Angeline, Jordan B. Pollack. ICGA, 1993. In the typical genetic algorithm experiment, the fitness function is constructed to be independent of the contents of the population to provide a consistent objective measure. Such objectivity entails significant knowledge about the environment which suggests either the problem has previously been solved or other non-evolutionary techniques may be more efficient. Furthermore, for many complex tasks an independent fitness function is either impractical or impossible to provide. In this paper, we demonstrate that competitive fitness functions, i.e. fitness functions that are dependent on the constituents of the population, can provide a more robust training environment than independent fitness functions. We describe three differing methods for competitive fitness, and discuss their respective advantages
-
Nash Equilibrium and Evolution by Imitation by Bjornerstedt J., and Weibull, J. The Rational Foundations of Economic Behavior, 1995. Nash's "mass action" interpretation of his equilibrium concept does not presume that the players know the game or are capable of sophisticated calculations. Instead, players are repeatedly and randomly drawn from large populations to play the game, one population for each player position, and base their strategy choice on observed payoffs. The present paper examines in some detail such an interpretation in a class of population dynamics based on adaptation by way of imitation of successful behaviors. Drawing from results in evolutionary game theory, implications of dynamic stability for aggregate Nash equilibrium play are discussed.
-
Evolutionary computing in multi-agent environments by Lawrence Bull, Terence C Fogarty. AAAI, 1996. The fields of Artificial Intelligence and Artificial Life have both focused on complex systems in which agents must cooperate to achieve certain goals. In our work we examine the performance of the genetic algorithm when applied to systems of this type. That is, we examine the use of population-based evolutionary computing techniques within cooperative multi-agent environments. In extending the genetic algorithm to such environments we introduce three macro-level operators to reduce the amount of knowledge required a priori; the joining of agents (symbiogenesis), the transfer of genetic material between agents and the speciation of initially homogeneous agents. These operators are used in conjunction with a generic rule-based framework, a simplified version of Pittsburgh-style classifier systems, which we alter to allow for direct systemic communication to evolve between the thus represented agents. In this paper we use a simulated trail following task to demonstrate these techniques, finding that they can give improved performance
-
Discovery by Genetic Programming of a Cellular Automata Rule that is Better than any Known Rule for the Majority Classification Problem by David Andre, Forrest H Bennett, and John R. Koza. Genetic programming, 1996. It is difficult to program cellular automata. This is especially true when the desired computation requires global communication and global integration of information across great distances in the cellular space. Various human-written algorithms have appeared in the past two decades for the vexatious majority classification task for one-dimensional two-state cellular automata. This paper describes how genetic programming with automatically defined functions evolved a rule for this task with an accuracy of 82.326%. This level of accuracy exceeds that of the original 1978 Gacs-Kurdyumov-Levin (GKL) rule, all other known human-written rules, and all other known rules produced by automated methods. The rule evolved by genetic programming is qualitatively different from all previous rules in that it employs a larger and more intricate repertoire of domains and particles to represent and communicate information across the cellular space.
-
Evolving Cellular Automata with Genetic Algorithms: A Review of Recent Work by Melanie Mitchell, James P. Crutchfield, Rajarshi Das. EvCA, 1996. We review recent work done by our group on applying genetic algorithms (GAs) to the design of cellular automata (CAs) that can perform computations requiring global coordination. A GA was used to evolve CAs for two computational tasks: density classification and synchronization. In both cases, the GA discovered rules that gave rise to sophisticated emergent computational strategies. These strategies can be analyzed using a "computational mechanics" framework in which "particles" carry information and interactions between particles effect information processing. This framework can also be used to explain the process by which the strategies were designed by the GA. The work described here is a first step in employing GAs to engineer useful emergent computation in decentralized multi-processor systems. It is also a first step in understanding how an evolutionary process can produce complex systems with sophisticated collective computational abilities.
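The evaluation loop behind this line of work is straightforward to sketch: a candidate radius-3 rule, encoded as a 128-entry lookup table, is scored by how often it drives a random initial configuration to the all-1s (or all-0s) fixed point matching the initial majority. The lattice size, step budget, and uniform sampling of initial densities below follow common choices in this literature but are only illustrative; the local majority-vote candidate shown is known to perform poorly, which is exactly why evolved rules are of interest.

```python
import numpy as np

RADIUS, N, STEPS = 3, 149, 300      # odd lattice size avoids density ties
rng = np.random.default_rng(0)

def step(state, rule_table):
    # Build the 7-bit neighbourhood index of every cell (circular lattice)
    # and look the next value up in the 128-entry rule table.
    idx = np.zeros_like(state)
    for offset in range(-RADIUS, RADIUS + 1):
        idx = idx * 2 + np.roll(state, -offset)
    return rule_table[idx]

def accuracy(rule_table, trials=200):
    correct = 0
    for _ in range(trials):
        density = rng.random()                        # initial density drawn uniformly
        state = (rng.random(N) < density).astype(int)
        target = int(state.sum() * 2 > N)             # majority bit of the initial configuration
        for _ in range(STEPS):
            state = step(state, rule_table)
        correct += int(np.all(state == target))
    return correct / trials

# Candidate rule: local majority vote over the 7-cell neighbourhood.
bits = (np.arange(128)[:, None] >> np.arange(6, -1, -1)) & 1
majority_rule = (bits.sum(axis=1) >= 4).astype(int)
print(accuracy(majority_rule))   # typically far below the ~0.8 reached by evolved rules
```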
-
Methods for Competitive and Cooperative Co-evolution by John Grefenstette, Robert Daley. AAAI, 1996. We have been investigating evolutionary methods to design behavioral strategies for intelligent robots in multi-agent environments. Such environments resemble an ecological system in which species evolve and adapt in a complex interaction with other evolving and adapting species. This paper will report on our investigations of alternative co-evolutionary approaches in the context of a simulated multi-agent environment.
-
Co-adaptation in a Team by Thomas D. Haynes, Sandip Sen. IJCIO, 1997. We introduce a cooperative co-evolutionary system to facilitate the development of teams of heterogeneous agents. We believe that k different behavioral strategies for controlling the actions of a group of k agents can combine to form a cooperation strategy which efficiently achieves global goals. We both examine the on-line adaption of behavioral strategies utilizing genetic programming and demonstrate the successful co-evolution of cooperative individuals. We present a new crossover mechanism for genetic programming systems in order to facilitate the evolution of more than one member in the team during each crossover operation. Our goal is to reduce the time needed to evolve an effective team.
-
What have we learned from Evolutionary Game Theory so far? by Weibull, Jörgen W. IFN, 1997. Evolutionary theorizing has a long tradition in economics. Only recently has this approach been brought into the framework of noncooperative game theory. Evolutionary game theory studies the robustness of strategic behavior with respect to evolutionary forces in the context of games played many times in large populations of boundedly rational agents. This new strand in economic theory has led to new predictions and opened up doors to other social sciences. The discussion will be focused on the following questions: What distinguishes the evolutionary approach from the rationalistic? What are the most important findings in evolutionary game theory so far? What are the next challenges for evolutionary game theory in economics?
-
Learning through Reinforcement and Replicator Dynamics by Borgers, T., Sarin, R. Journal of Economic Theory, 1997. This paper considers a version of R. R. Bush and F. Mosteller's stochastic learning theory in the context of games. We show that in a continuous time limit the learning model converges to the replicator dynamics of evolutionary game theory. Thus we provide a non-biological interpretation of evolutionary game theory.
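The correspondence can be stated compactly (notation mine, following the usual presentation of Cross's model and the two-population replicator dynamic): after playing action i and receiving a reward r normalized to [0,1], the learner shifts probability mass toward i, and in expectation, as the step size shrinks, the mixed strategy follows the replicator equation.

```latex
\begin{align*}
  x_i &\leftarrow x_i + r\,(1 - x_i), \qquad
       x_j \leftarrow x_j - r\,x_j \quad (j \neq i)
       && \text{Cross's rule after playing $i$ with reward $r \in [0,1]$} \\[4pt]
  \dot{x}_i &= x_i\,\bigl[\,u(e_i, y) - u(x, y)\,\bigr]
       && \text{replicator dynamics obtained in the continuous-time limit}
\end{align*}
```

Here $u(e_i, y)$ is the expected payoff of pure action $i$ against the opponent's mixed strategy $y$, and $u(x, y)$ is the player's current average payoff.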
-
Evolutionary learning of communicating agents by Hitoshi Iba. Journal of Information Sciences, 1998. This paper presents the emergence of cooperative behavior for communicating agents by means of Genetic Programming (GP). Our experimental domains are the pursuit game and the robot navigation task. We conduct experiments with the evolution of the communicating agents and show the effectiveness of the emergent communication in terms of the robustness of generated GP programs. The performance of GP-based multi-agent learning is discussed with comparative experiments by using different breeding strategies, i.e., homogeneous breeding and heterogeneous breeding.
-
ASGA: Improving the ant system by integration with genetic algorithms by T. White, B. Pagurek, and F. Oppacher. In J. R. Koza, W. Banzhaf, K. Chellapilla, K. Deb, M. Dorigo, D. B. Fogel, M. H. Garzon, D. E. Goldberg, H. Iba, and R. Riolo, editors, Genetic Programming, 1998. This paper describes how the Ant System can be improved by self-adaptation of its controlling parameters. Adaptation is achieved by integrating a genetic algorithm with the ant system and maintaining a population of agents (ants) that have been used to generate solutions. These agents have behavior that is inspired by the foraging activities of ants, with each agent capable of simple actions. Problem solving is inherently distributed and arises as a consequence of the self-organization of a collection of agents, or swarm system. This paper applies the Ant System with Genetic Algorithm (ASGA) system to the problem of path finding in networks, demonstrating by experimentation that the hybrid algorithm exhibits improved performance when compared to the basic Ant System.
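For reference, the Ant System machinery that ASGA's genetic algorithm tunes can be sketched in a few lines: ants pick the next node with probability proportional to pheromone^alpha times heuristic^beta, pheromone evaporates at rate rho, and completed solutions deposit reinforcement in proportion to their quality. The small random tour-building instance below is illustrative, not the paper's network path-finding setup; alpha, beta, and rho are exactly the kind of parameters a GA can adapt online.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
coords = rng.random((n, 2))
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1) + np.eye(n)
eta = 1.0 / dist                  # heuristic desirability (inverse distance)
tau = np.ones((n, n))             # pheromone levels

def build_tour(alpha, beta):
    tour = [0]
    while len(tour) < n:
        i = tour[-1]
        w = (tau[i] ** alpha) * (eta[i] ** beta)
        w[tour] = 0.0             # never revisit a node
        tour.append(int(rng.choice(n, p=w / w.sum())))
    return tour

def tour_length(tour):
    return sum(dist[tour[k], tour[(k + 1) % n]] for k in range(n))

alpha, beta, rho, ants = 1.0, 2.0, 0.5, 10    # the kind of parameters ASGA would adapt
for _ in range(50):
    tours = [build_tour(alpha, beta) for _ in range(ants)]
    tau *= (1.0 - rho)                         # evaporation
    for t in tours:                            # reinforcement proportional to tour quality
        for k in range(n):
            a, b = t[k], t[(k + 1) % n]
            tau[a, b] += 1.0 / tour_length(t)
            tau[b, a] = tau[a, b]
print(min(tour_length(t) for t in tours))
```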
-
Evolutionary Games and Population Dynamics by Hofbauer, J., Sigmund, K. Cambridge University Press, 1998. Every form of behaviour is shaped by trial and error. Such stepwise adaptation can occur through individual learning or through natural selection, the basis of evolution. Since the work of Maynard Smith and others, it has been realised how game theory can model this process. Evolutionary game theory replaces the static solutions of classical game theory by a dynamical approach centred not on the concept of rational players but on the population dynamics of behavioural programmes. In this book the authors investigate the nonlinear dynamics of the self-regulation of social and economic behaviour, and of the closely related interactions between species in ecological communities. Replicator equations describe how successful strategies spread and thereby create new conditions which can alter the basis of their success, enabling us to understand the strategic and genetic foundations of the endless chronicle of invasions and extinctions which punctuate evolution. In short, evolutionary game theory describes when to escalate a conflict, how to elicit cooperation, why to expect a balance of the sexes, and how to understand natural selection in mathematical terms.
-
Evolving Multiple Agents by Genetic Programming by Hitoshi Iba. MIT Press, 1999. This work studies the emergence of cooperative behaviour among multiple agents by means of Genetic Programming (GP). Our experimental domains are multi-agent test beds, i.e., the robot navigation task and the Tile World. The world consists of a simulated robot agent and a simulated environment which is both dynamic and unpredictable. In our previous paper, we proposed three types of strategies, i.e., homogeneous breeding, heterogeneous breeding, and co-evolutionary breeding, for the purpose of evolving cooperative behavior. We use heterogeneous breeding in this paper. The Q-learning approach commonly used for multi-agent tasks has difficulty with the combinatorial explosion that arises with many agents, because the state space for the Q-table becomes too large for practical computer resources. We show how successfully GP-based multi-agent learning is applied to multi-agent tasks and compare its performance with Q-learning by experiments. Thereafter, we conduct experiments with the evolution of communicating agents. Communication is an essential factor for the emergence of cooperation, because a collaborative agent must be able to handle situations in which conflicts arise and must be capable of negotiating with other agents to reach an agreement. The effectiveness of the emergent communication is empirically shown in terms of the robustness of generated GP programs.
-
Game Theory Evolving by Herbert Gintis. Princeton University Press, 2000. Since its original publication Game Theory Evolving has been considered the best textbook on evolutionary game theory. This completely revised and updated second edition of Game Theory Evolving contains new material and shows students how to apply game theory to model human behavior in ways that reflect the special nature of sociality and individuality. The textbook continues its in-depth look at cooperation in teams, agent-based simulations, experimental economics, the evolution and diffusion of preferences, and the connection between biology and economics.
-
Resource sharing and coevolution in evolving cellular automata by J. Werfel, M. Mitchell, and J. P. Crutchfield. IEEE, 2000. Evolving one-dimensional cellular automata (CAs) with genetic algorithms has provided insight into how improved performance on a task requiring global coordination emerges when only local interactions are possible. Two approaches that can affect the search efficiency of the genetic algorithm are coevolution, in which a population of problems—in our case, initial configurations of the CA lattice—evolves along with the population of CAs; and resource sharing, in which a greater proportion of a limited fitness resource is assigned to those CAs which correctly solve problems that fewer other CAs in the population can solve. Here we present evidence that, in contrast to what has been suggested elsewhere, the improvements observed when both techniques are used together depend largely on resource sharing alone.
-
Talking Helps: Evolving Communicating Agents for the Predator-Prey Pursuit Problem by Kam-Chuen Jim, C. Lee Giles. Artificial Life, 2000. We analyze a general model of multi-agent communication in which all agents communicate simultaneously to a message board. A genetic algorithm is used to evolve multi-agent languages for the predator agents in a version of the predator-prey pursuit problem. We show that the resulting behavior of the communicating multi-agent system is equivalent to that of a Mealy finite state machine whose states are determined by the agents’ usage of the evolved language. Simulations show that the evolution of a communication language improves the performance of the predators. Increasing the language size (and thus increasing the number of possible states in the Mealy machine) improves the performance even further. Furthermore, the evolved communicating predators perform significantly better than all previous work on similar preys. We introduce a method for incrementally increasing the language size which results in an effective coarse-to-fine search that significantly reduces the evolution time required to find a solution. We present some observations on the effects of language size, experimental setup, and prey difficulty on the evolved Mealy machines. In particular, we observe that the start state is often revisited, and incrementally increasing the language size results in smaller Mealy machines. Finally, a simple rule is derived that provides a pessimistic estimate on the minimum language size that should be used for any multi-agent problem.
-
A Game-Theoretic Approach to the Simple Coevolutionary Algorithm by Sevan G. Ficici, Jordan B. Pollack. LNCS, 2000. The fundamental distinction between ordinary evolutionary algorithms (EA) and co-evolutionary algorithms lies in the interaction between coevolving entities. We believe that this property is essentially game-theoretic in nature. Using game theory, we describe extensions that allow familiar mixing-matrix and Markov-chain models of EAs to address coevolutionary algorithm dynamics. We then employ concepts from evolutionary game theory to examine design aspects of conventional coevolutionary algorithms that are poorly understood.
-
Evolution of biological information by Thomas D. Schneider. Nucleic Acids Research, 2000. How do genetic systems gain information by evolutionary processes? Answering this question precisely requires a robust, quantitative measure of information. Fortunately, 50 years ago Claude Shannon defined information as a decrease in the uncertainty of a receiver. For molecular systems, uncertainty is closely related to entropy and hence has clear connections to the Second Law of Thermodynamics. These aspects of information theory have allowed the development of a straightforward and practical method of measuring information in genetic control systems. Here this method is used to observe information gain in the binding sites for an artificial ‘protein’ in a computer simulation of evolution. The simulation begins with zero information and, as in naturally occurring genetic systems, the information measured in the fully evolved binding sites is close to that needed to locate the sites in the genome. The transition is rapid, demonstrating that information gain can occur by punctuated equilibrium.
-
Tuning Synthetic Pheromones With Evolutionary Computing by J Sauter, HVD Parunak and S Brueckner. ECOMAS, 2001. Agents guided by synthetic pheromones can imitate the dynamics of insects. These systems are well suited to problems such as the control of unmanned robotic vehicles. We have developed a model for controlling robotic vehicles in air combat missions using synthetic pheromones. In the course of our experimentation, we have identified the need for proper tuning of the algorithms to get acceptable performance. We describe pheromones in natural and synthetic systems, and describe the mechanisms we have developed. The role of evolutionary computing in offline and online tuning is discussed.
-
Coevolutionary dynamics in a minimal substrate by R. Watson and J. Pollack. GECCO, 2001. One of the central difficulties of coevolutionary methods arises from "intransitive superiority" -- in a two-player game, for example, the fact that A beats B, and B beats C, does not exclude the possibility that C beats A. Such cyclic superiority in a coevolutionary substrate is hypothesized to cause cycles in the dynamics of the population such that it "chases its own tail" - traveling through some part of strategy space more than once despite apparent improvement with each step. It is often difficult to know whether an application domain contains such difficulties and to verify this hypothesis in the failure of a given coevolutionary set-up. In this paper we wish to elucidate some of the issues and concepts in an abstract domain where the dynamics of coevolution can be studied simply and directly. We define three simple "number games" that illustrate intransitive superiority and resultant oscillatory dynamics, as well as some other relevant concepts. These include the distinction between a player's perceived performance and performance with respect to an external metric, and the significance of strategies with a multidimensional nature. These features alone can also cause oscillatory behavior and coevolutionary failure.
-
An empirical analysis of collaboration methods in cooperative coevolutionary algorithms by R. P. Wiegand, W. Liles, and K. De Jong. GECCO, 2001. Although a variety of coevolutionary methods have been explored over the years, it has only been recently that a general architecture for cooperative coevolution has been proposed. Since that time, the flexibility and success of this cooperative coevolutionary architecture (CCA) has been shown in an array of different kinds of problems. However, many questions about the dynamics of this model, as well as the efficacy of various CCA-specific choices remain unanswered. One such choice surrounds the issue of how the algorithm selects collaborators for evaluation. This paper offers an empirical analysis of various types of collaboration mechanisms and presents some basic advice about how to choose a mechanism which is appropriate for a particular problem.
-
Modeling variation in cooperative coevolution using evolutionary game theory by R. P. Wiegand, W. Liles, and K. De Jong. FOGA, 2002. Though coevolutionary algorithms are currently used for optimization purposes, practitioners are often plagued with difficulties due to the fact that such systems frequently behave in counter intuitive ways that are not well understood. This paper seeks to extend work which uses evolutionary game theory (EGT) as a form of dynamical systems modeling of coevolutionary algorithms in order to begin to answer questions regarding how these systems work. It does this by concentrating on a particular subclass of cooperative coevolutionary algorithms, for which multi-population symmetric evolutionary game theoretic models are known to apply. We examine dynamical behaviors of this model in the context of static function optimization, by both formal analysis, as well as model validation study. Finally, we begin looking at the effects of variation by extending traditional EGT, offering some introductory analysis, as well as model validation. In the course of this study, we investigate the effects of parameterized uniform crossover and bit-flip mutation.
-
Evolutionary dynamics discovered via visualization in the breve simulation environment by L. Spector and J. Klein. International Conference on the Simulation and Synthesis of Living Systems, 2002. We report how breve, a simulation environment with rich 3d graphics, was used to discover significant patterns in the dynamics of a system that evolves controllers for swarms of goal-directed agents. These patterns were discovered via visualization in the sense that we had not considered their relevance or thought to look for them initially, but they became obvious upon visually observing the behavior of the system. In this paper we briefly describe breve and the system of evolving swarms that we implemented within it. We then describe two discovered properties of the evolutionary dynamics of the system: transitions to/from genetic drift regimes and the emergence of collective or multicellular organization. We comment more generally on the utility of 3d visualization for the discovery of biologically significant phenomena and briefly describe our ongoing work in this area. Pointers are provided to online resources including source code and animations that demonstrate several of the described effects. Associated links are available on-line at http://hampshire.edu/lspector/alife8-visualization.html.
-
Analyzing cooperative coevolution with evolutionary game theory by R. P. Wiegand, W. Liles, and K. De Jong. IEEE Press, 2002. The task of understanding coevolutionary algorithms is a very difficult one. These algorithms search landscapes which are in some sense adaptive. As a result, the dynamical behaviors of coevolutionary systems can frequently be even more complex than traditional evolutionary algorithms (EAs). Moreover, traditional EA theory tells us little about coevolutionary algorithms. One major question that has yet to be clearly addressed is whether or not coevolutionary algorithms are well-suited for optimization tasks. Although this question is equally applicable to competitive, as well as cooperative approaches, answering the question for cooperative coevolutionary algorithms is perhaps more attainable. Recently, evolutionary game theoretic (EGT) models have begun to be used to help analyze the dynamical behaviors of coevolutionary algorithms. One type of EGT model which is already reasonably well understood is multi-population symmetric games. We believe these games can be used to analytically model cooperative coevolutionary algorithms. This paper introduces our analysis framework, explaining how and why such models may be generated. It includes some examples illustrating specific theoretical and empirical analyses. We demonstrate that, using our framework, a better understanding of the degree to which cooperative coevolutionary algorithms can be used for optimization can be achieved.
-
Towards a relation between learning agents and evolutionary dynamics by Karl Tuyls, Tom Lenaerts, Katja Verbeeck, Sam Maes. BNAIC, 2002. Modeling learning agents in the context of Multi-agent Systems requires insight in the type and form of interactions with the environment and other agents in the system. Usually, these agents are modeled similar to the different players in a standard game theoretical model. In this paper we examine whether evolutionary game theory, and more specifically the replicator dynamics, is an adequate theoretical model for the study of the dynamics of reinforcement learning agents in a multi-agent system. As a first step in this direction we extend the results of [1, 9] to a more general reinforcement learning framework, i.e. Learning Automata.
-
When Evolving Populations is Better than Coevolving Individuals: The Blind Mice Problem by Thomas Miconi. IJCAI, 2003. This paper is about the evolutionary design of multi-agent systems. An important part of recent research in this domain has been focusing on collaborative coevolutionary methods. We expose possible drawbacks of these methods, and show that for a non-trivial problem called the "blind mice" problem, a classical GA approach in which whole populations are evaluated, selected and crossed together (with a few tweaks) finds an elegant and non-intuitive solution more efficiently than cooperative coevolution. The difference in efficiency grows with the number of agents within the simulation. We propose an explanation for this poorer performance of cooperative coevolution, based on the intrinsic fragility of the evaluation process. This explanation is supported by theoretical and experimental arguments.
-
Exploring the Explorative Advantage of the Cooperative Coevolutionary (1+1) EA by Thomas Jansen, R. Paul Wiegand. GECCO, 2003. Using a well-known cooperative coevolutionary function optimization framework, a very simple cooperative coevolutionary (1+1) EA is defined. This algorithm is investigated in the context of expected optimization time. The focus is on the impact the cooperative coevolutionary approach has and on the possible advantage it may have over more traditional evolutionary approaches. Therefore, a systematic comparison between the expected optimization times of this coevolutionary algorithm and the ordinary (1+1) EA is presented. The main result is that separability of the objective function alone is not sufficient to make the cooperative coevolutionary approach beneficial. By presenting a clearly structured example function and analyzing the algorithms’ performance, it is shown that the cooperative coevolutionary approach comes with new explorative possibilities. This can lead to an immense speed-up of the optimization.
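The sketch below shows the general shape of such a cooperative coevolutionary (1+1) EA in Python. The OneMax objective, the 1/length mutation rate, and the round-robin component order are illustrative assumptions, not the paper's exact construction.

```python
import random

def cc_one_plus_one_ea(n_components=4, comp_len=8, iters=2000, seed=0):
    """Cooperative coevolutionary (1+1) EA sketch on OneMax (illustrative setup).

    One (1+1) EA per component; a mutated component is evaluated by plugging it
    into the current values of all other components and is kept if the overall
    fitness does not decrease.
    """
    rng = random.Random(seed)
    comps = [[rng.randint(0, 1) for _ in range(comp_len)] for _ in range(n_components)]

    def fitness(components):
        return sum(sum(c) for c in components)  # OneMax over the concatenated bit string

    for t in range(iters):
        i = t % n_components                                             # round-robin component
        child = [b ^ (rng.random() < 1.0 / comp_len) for b in comps[i]]  # bit-flip mutation
        trial = comps[:i] + [child] + comps[i + 1:]
        if fitness(trial) >= fitness(comps):                             # accept if not worse
            comps = trial
    return comps, fitness(comps)

print("best fitness:", cc_one_plus_one_ea()[1])
```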
-
On a Dynamical Analysis of Reinforcement Learning in Games: Emergence of Occam’s Razor by Karl Tuyls, Katja Verbeeck, Sam Maes. Lecture Notes in Computer Science, 2003. Modeling learning agents in the context of Multi-agent Systems requires an adequate understanding of their dynamic behaviour. Usually, these agents are modeled similar to the different players in a standard game theoretical model. Unfortunately traditional Game Theory is static and limited in its usefulness. Evolutionary Game Theory improves on this by providing a dynamics which describes how strategies evolve over time. In this paper, we discuss three learning models whose dynamics are related to the Replicator Dynamics (RD). We show how a classical Reinforcement Learning (RL) technique, i.e. Q-learning, relates to the RD. This allows us to better understand the learning process and to determine how complex a RL model should be. More precisely, Occam’s Razor applies in the framework of games, i.e. the simplest model (Cross) suffices for learning equilibria. An experimental verification in all three models is presented.
-
Differential Equations, Dynamical Systems, and an Introduction to Chaos by MW Hirsch, S Smale, RL Devaney. Elsevier, 2003. Hirsch, Devaney, and Smale's classic "Differential Equations, Dynamical Systems, and an Introduction to Chaos" has been used by professors as the primary text for undergraduate and graduate level courses covering differential equations. It provides a theoretical approach to dynamical systems and chaos written for a diverse student population among the fields of mathematics, science, and engineering. Prominent experts provide everything students need to know about dynamical systems as students seek to develop sufficient mathematical skills to analyze the types of differential equations that arise in their area of study. The authors introduce linear systems of differential equations gradually, supported by rigorous exercises and examples. Calculus is required, and specialized advanced topics not usually found in elementary differential equations courses are included, such as discrete dynamical systems and chaotic systems. This is a classic text by three of the world's most prominent mathematicians. It continues the tradition of expository excellence. It contains updated material and expanded applications for use in applied studies.
-
A selection-mutation model for Q-learning in multi-agent systems by Karl Tuyls, Katja Verbeeck, Tom Lenaerts. AAMAS, 2003. Although well understood in the single-agent framework, the use of traditional reinforcement learning (RL) algorithms in multi-agent systems (MAS) is not always justified. The feedback an agent experiences in a MAS is usually influenced by the other agents present in the system. Multi agent environments are therefore non-stationary and convergence and optimality guarantees of RL algorithms are lost. To better understand the dynamics of traditional RL algorithms we analyze the learning process in terms of evolutionary dynamics. More specifically we show how the Replicator Dynamics (RD) can be used as a model for Q-learning in games. The dynamical equations of Q-learning are derived and illustrated by some well chosen experiments. Both reveal an interesting connection between the exploitation-exploration scheme from RL and the selection-mutation mechanisms from evolutionary game theory.
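As a concrete picture of the setting being modelled, the following Python sketch runs two Boltzmann Q-learners in a small matrix game; the game, learning rate, and temperature are illustrative choices, and the paper's contribution is the continuous-time selection-mutation equations that describe the resulting policy trajectories.

```python
import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 1.0]])   # row player's payoffs (matching pennies)
B = -A                                      # zero-sum opponent

alpha, tau, steps = 0.05, 0.5, 5000         # learning rate, temperature, rounds
rng = np.random.default_rng(0)
Qx, Qy = np.zeros(2), np.zeros(2)

def boltzmann(q, tau):
    z = np.exp(q / tau)
    return z / z.sum()

for _ in range(steps):
    x, y = boltzmann(Qx, tau), boltzmann(Qy, tau)
    i, j = rng.choice(2, p=x), rng.choice(2, p=y)
    Qx[i] += alpha * (A[i, j] - Qx[i])      # stateless Q-update for the chosen action
    Qy[j] += alpha * (B[i, j] - Qy[j])

print("final policies:", boltzmann(Qx, tau), boltzmann(Qy, tau))
```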
-
Improving Coevolutionary Search for Optimal Multiagent Behaviors by Liviu Panait, R. Paul Wiegand, Sean Luke. IJCAI, 2003. Evolutionary computation is a useful technique for learning behaviors in multiagent systems. Among the several types of evolutionary computation, one natural and popular method is to coevolve multiagent behaviors in multiple, cooperating populations. Recent research has suggested that coevolutionary systems may favor stability rather than performance in some domains. In order to improve upon existing methods, this paper examines the idea of modifying traditional coevolution, biasing it to search for maximal rewards. We introduce a theoretical justification of the improved method and present experiments in three problem domains. We conclude that biasing can help coevolution find better results in some multiagent problem domains.
-
Extended Replicator Dynamics as a Key to Reinforcement Learning in Multi-agent Systems by Karl Tuyls, Dries Heytens, Ann Nowe, Bernard Manderick. ECML, 2003. Modeling learning agents in the context of Multi-agent Systems requires an adequate understanding of their dynamic behaviour. Evolutionary Game Theory provides a dynamics which describes how strategies evolve over time. Börgers et al. [1] and Tuyls et al. [11] have shown how classical Reinforcement Learning (RL) techniques such as Cross-learning and Q-learning relate to the Replicator Dynamics (RD). This provides a better understanding of the learning process. In this paper, we introduce an extension of the Replicator Dynamics from Evolutionary Game Theory. Based on this new dynamics, a Reinforcement Learning algorithm is developed that attains a stable Nash equilibrium for all types of games. Such an algorithm is lacking for the moment. This kind of dynamics opens an interesting perspective for introducing new Reinforcement Learning algorithms in multi-state games and MultiAgent Systems.
-
A Visual Demonstration of Convergence Properties of Cooperative Coevolution by Liviu Panait, R. Paul Wiegand, Sean Luke. PPSN, 2004. We introduce a model for cooperative coevolutionary algorithms (CCEAs) using partial mixing, which allows us to compute the expected long-run convergence of such algorithms when individuals’ fitness is based on the maximum payoff of some N evaluations with partners chosen at random from the other population. Using this model, we devise novel visualization mechanisms to attempt to qualitatively explain a difficult-to-conceptualize pathology in CCEAs: the tendency for them to converge to suboptimal Nash equilibria. We further demonstrate visually how increasing the size of N, or biasing the fitness to include an ideal-collaboration factor, both improve the likelihood of optimal convergence, and under which initial population configurations they are not much help.
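A minimal sketch of the partial-mixing evaluation scheme assumed by the model, where an individual's fitness is the best payoff obtained over N collaborations with randomly chosen partners; the function and parameter names are placeholders for a concrete cooperative domain.

```python
import random

def partial_mixing_fitness(individual, partner_population, payoff, n_partners=5, rng=random):
    """Fitness = maximum payoff over N collaborations with random partners (sketch).

    `payoff(individual, partner)` and the populations are placeholders; the
    choice of N is the parameter whose effect the paper visualizes.
    """
    partners = [rng.choice(partner_population) for _ in range(n_partners)]
    return max(payoff(individual, p) for p in partners)
```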
-
Analyzing Multi-agent Reinforcement Learning Using Evolutionary Dynamics by ’t Hoen, P.J., Tuyls, K. ECML, 2004. In this paper, we show how the dynamics of Q-learning can be visualized and analyzed from a perspective of Evolutionary Dynamics (ED). More specifically, we show how ED can be used as a model for Q-learning in stochastic games. Analysis of the evolutionary stable strategies and attractors of the derived ED from the Reinforcement Learning (RL) application then predict the desired parameters for RL in MultiAgent Systems (MASs) to achieve Nash equilibria with high utility. Secondly, we show how the derived fine tuning of parameter settings from the ED can support application of the COllective INtelligence (COIN) framework. COIN is a proven engineering approach for learning of cooperative tasks in MASs. We show that the derived link between ED and RL predicts performance of the COIN framework and visualizes the incentives provided in COIN toward cooperative behavior.
-
Selection in Coevolutionary Algorithms and the Inverse Problem by Sevan Ficici, Ofer Melnik, Jordan Pollack. Springer, 2004. The inverse problem in the collective intelligence framework concerns how the private utility functions of agents can be engineered so that their selfish behaviors collectively give rise to a desired world state. In this chapter we examine several selection and fitness-sharing methods used in coevolution and consider their operation with respect to the inverse problem. The methods we test are truncation and linear-rank selection and competitive and similarity-based fitness sharing. Using evolutionary game theory to establish the desired world state, our analyses show that variable-sum games with polymorphic Nash are problematic for these methods. Rather than converge to polymorphic Nash, the methods we test produce cyclic behavior, chaos, or attractors that lack game-theoretic justification and therefore fail to solve the inverse problem. The private utilities of the evolving agents may thus be viewed as poorly factored: improved private utility does not correspond to improved world utility.
-
Spatial embedding and loss of gradient in cooperative coevolutionary algorithms. by R. P. Wiegand and J. Sarma. Springer, 2004. Coevolutionary algorithms offer great promise as adaptive problem solvers but suffer from several known pathologies. Historically, spatially embedded coevolutionary algorithms seem to have succeeded where other coevolutionary approaches fail; however, explanations for this have been largely unexplored. We examine this idea more closely by looking at spatial models in the context of a particular coevolutionary pathology: loss of gradient. We believe that loss of gradient in cooperative coevolution is caused by asymmetries in the problem or initial conditions between populations, driving one population to convergence before another. Spatial models seem to lock populations together in terms of evolutionary change, helping establish a type of dynamic balance to thwart loss of gradient. We construct a tunably asymmetric function optimization problem domain and conduct an empirical study to justify this assertion. We find that spatial restrictions for collaboration and selection can help keep population changes balanced when presented with severe asymmetries in the problem.
-
Emergence of Collective Behavior in Evolving Populations of Flying Agents by L. Spector, J. Klein, C. Perry, and M. Feinstein. Springer, 2005. We demonstrate the emergence of collective behavior in two evolutionary computation systems, one an evolutionary extension of a classic (highly constrained) flocking algorithm and the other a relatively unconstrained system in which the behavior of agents is governed by evolved computer programs. We describe the systems in detail, document the emergence of collective behavior, and argue that these systems present new opportunities for the study of group dynamics in an evolutionary context.
-
On Social Learning and Robust Evolutionary Algorithm Design in Economic Games by Floortje Alkemade, Han La Poutre, Hans Amman. Congress on Evolutionary Computation, 2005. Agent-based computational economics (ACE) combines elements from economics and computer science. In this paper, we focus on the relation between the evolutionary technique that is used and the economic problem that is modeled. In the field of ACE, economic simulations often derive parameter settings for the genetic algorithm directly from the values of the economic model parameters. In this paper we compare two important approaches that are dominating in ACE and show that the above practice may hinder the performance of the GA and thereby hinder agent learning. More specifically, we show that economic model parameters and evolutionary algorithm parameters should be treated separately by comparing the two widely used approaches to social learning with respect to their convergence properties and robustness. This leads to new considerations for the methodological aspects of evolutionary algorithm design within the field of ACE. We also present improved social (ACE) simulation results for the Cournot oligopoly game, yielding (higher profit) Cournot-Nash equilibria instead of the competitive equilibria.
-
The Success and Failure of Tag-Mediated Evolution of Cooperation by Austin McDonald, Sandip Sen. LAMAS, 2005. Use of tags to limit partner selection for playing has been shown to produce stable cooperation in agent populations playing the Prisoner’s Dilemma game. There is, however, a lack of understanding of how and why tags facilitate such cooperation. We start with an empirical investigation that identifies the key dynamics that result in sustainable cooperation in PD. Sufficiently long tags are needed to achieve this effect. A theoretical analysis shows that multiple simulation parameters including tag length, mutation rate and population size will have significant effect on sustaining cooperation. Experiments partially validate these observations. Additionally, we claim that tags only promote mimicking and not coordinated behavior in general, i.e., tags can promote cooperation only if cooperation requires identical actions from all group members. We illustrate the failure of the tag model to sustain cooperation by experimenting with domains where agents need to take complementary actions to maximize payoff.
-

Exploration

Co-adaptive genetic algorithms: An example in Othello strategy by R. Smith and B. Gray. University of Alabama, 1993. This paper focuses on co-adaptive GAs, where population members are independent, and adaptation depends on the evolving population context. Recent research on co-adaptive GAs is reviewed. As an example of co-adaptation, a system for Othello strategy acquisition is presented. Preliminary results illustrate how a co-adaptive GA can continually explore the space of Othello strategies. This exploration yields insight into strategy interactions in Othello. Avenues for future analysis and experimentation with co-adaptive GAs are discussed.
-
An Adaptive Approach for the Exploration-Exploitation Dilemma and Its Application to Economic Systems by Lilia Rejeb, Zahia Guessoum, Rym MHallah. LAMAS, 2005. Learning agents have to deal with the exploration-exploitation dilemma. The choice between exploration and exploitation is very difficult in dynamic systems; in particular in large scale ones such as economic systems. Recent research shows that there is neither an optimal nor a unique solution for this problem. In this paper, we propose an adaptive approach based on metarules to adapt the choice between exploration and exploitation. This new adaptive approach relies on the variations of the performance of the agents. To validate the approach, we apply it to economic systems and compare it to two adaptive methods originally proposed by Wilson: one local and one global. Moreover, we compare different exploration strategies and focus on their influence on the performance of the agents.
-

Extensive Form Games

The complexity of two-person zero-sum games in extensive form by Koller D, Megiddo N. Games and economic behavior, 1992. This paper investigates the complexity of finding max-min strategies for finite two-person zero-sum games in the extensive form. The problem of determining whether a player with imperfect recall can guarantee himself a certain payoff is shown to be NP-hard. When both players have imperfect recall, this is even harder. Moreover, the max-min behavior strategy of such a player may use irrational numbers. Thus, for games with imperfect recall, computing the max-min strategy or the value of the game is a hard problem. For a game with perfect recall, we present an algorithm for computing a max-min behavior strategy, which runs in time polynomial in the size of the game.
-
Efficient computation of equilibria for extensive two-person games by Koller D, Megiddo N, Von Stengel B. Games and economic behavior, 1996. The Nash equilibria of a two-person, non-zero-sum game are the solutions of a certain linear complementarity problem (LCP). In order to use this for solving a game in extensive form, the game must first be converted to a strategic description such as the normal form. The classical normal form, however, is often exponentially large in the size of the game tree. If the game has perfect recall, a linear-sized strategic description is the sequence form. For the resulting small LCP, we show that an equilibrium is found efficiently by Lemke’s algorithm, a generalization of the Lemke–Howson method.
-
On pruning techniques for multi-player games by N. Sturtevant and R. Korf. AAAI, 2000. Max-n (Luckhardt and Irani, 1986) is the extension of the minimax backup rule to multi-player games. We have shown that only a limited version of alpha-beta pruning, shallow pruning, can be applied to a max-n search tree. We extend this work by calculating the exact bounds needed to use this pruning technique. In addition, we show that branch-and-bound pruning, using a monotonic heuristic, has the same limitations as alpha-beta pruning in a max-n tree. We present a hybrid of these algorithms, alpha-beta branch-and-bound pruning, which combines a monotonic heuristic and backed-up values to prune even more effectively. We also briefly discuss the reduction of an n-player game to a "paranoid" 2-player game. In Sergeant Major, a 3-player card game, we averaged node expansions over 200 height-15 trees. Shallow pruning and branch-and-bound each reduced node expansions by a factor of about 100. Alpha-beta branch-and-bound reduced the expansions by an additional factor of 19. The 2-player reduction was a factor of 3 better than alpha-beta branch-and-bound. Using heuristic bounds in the 2-player reduction reduced node expansions by another factor of 12.
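The following Python sketch shows max-n search with shallow pruning. The node interface (is_terminal, payoffs, player, children) is hypothetical, and the pruning step assumes each player's payoff is non-negative and all payoffs sum to at most maxsum, which is what makes the bound sound.

```python
def max_n(node, maxsum, parent_bound=None):
    """Max-n search with shallow pruning (sketch; node interface is hypothetical).

    parent_bound is the best value already secured by the player to move at the
    parent. If the current player's best value reaches maxsum - parent_bound,
    the parent can gain at most parent_bound from this branch, so the remaining
    siblings can be pruned.
    """
    if node.is_terminal():
        return node.payoffs()                 # tuple of payoffs, one per player
    me = node.player()
    best = None
    for child in node.children():
        values = max_n(child, maxsum,
                       parent_bound=None if best is None else best[me])
        if best is None or values[me] > best[me]:
            best = values
        if parent_bound is not None and best[me] >= maxsum - parent_bound:
            break                             # shallow pruning
    return best
```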
-
Learning to play games in extensive form by valuation by Philippe Jehiel, Dov Samet. NAJ Economics, 2001. Game theoretic models of learning which are based on the strategic form of the game cannot explain learning in games with large extensive form. We study learning in such games by using valuation of moves. A valuation for a player is a numeric assessment of her moves that purports to reflect their desirability. We consider a myopic player, who chooses moves with the highest valuation. Each time the game is played, the player revises her valuation by assigning the payoff obtained in the play to each of the moves she has made. We show for a repeated win–lose game that if the player has a winning strategy in the stage game, there is almost surely a time after which she always wins. When a player has more than two payoffs, a more elaborate learning procedure is required. We consider one that associates with each move the average payoff in the rounds in which this move was made. When all players adopt this learning procedure, with some perturbations, then, with probability 1 there is a time after which strategies that are close to subgame perfect equilibrium are played. A single player who adopts this procedure can guarantee only her individually rational payoff.
-

Fictitious Play

Consistency and Cautious Fictitious Play by Drew Fudenberg, David K. Levine. Economic Dynamics and Control, 1995. We study a variation of fictitious play, in which the probability of each action is an exponential function of that action’s utility against the historical frequency of opponents’ play. Regardless of the opponents’ strategies, the utility received by an agent using this rule is nearly the best that could be achieved against the historical frequency. Such rules are approximately optimal in i.i.d. environments, and guarantee nearly the minmax regardless of opponents’ behavior. Fictitious play shares these properties provided it switches “infrequently” between actions. We also study the long run outcomes when all players use consistent and cautious rules.
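A short Python sketch of the smoothed (logit) rule described here; the anti-coordination game, smoothing parameter, and run length are illustrative choices.

```python
import numpy as np

R = np.array([[0.0, 1.0], [1.0, 0.0]])    # row player's payoff, indexed [own action, opponent action]
C = np.array([[0.0, 1.0], [1.0, 0.0]])    # column player's payoff, same indexing
lam = 5.0                                  # larger lam means closer to an exact best response

def smooth_br(payoff, opp_freq, lam):
    u = payoff @ opp_freq                  # expected payoff of each own action
    z = np.exp(lam * (u - u.max()))        # numerically stable softmax
    return z / z.sum()

hist_row, hist_col = np.ones(2), np.ones(2)   # empirical action counts (start at 1)
rng = np.random.default_rng(0)
for _ in range(5000):
    x = smooth_br(R, hist_col / hist_col.sum(), lam)
    y = smooth_br(C, hist_row / hist_row.sum(), lam)
    hist_row[rng.choice(2, p=x)] += 1
    hist_col[rng.choice(2, p=y)] += 1

print("empirical frequencies:", hist_row / hist_row.sum(), hist_col / hist_col.sum())
```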
-
Fictitious play property for games with identical interests by Monderer D, Shapley LS. Journal of economic theory, 1996. An n-person game has identical interests if it is best response equivalent in mixed strategies to a game with identical payoff functions. It is proved that every such game has the fictitious play property.
-

Foundational Theory

Theory of Games and Economic Behaviour by von Neumann, J., Morgenstern, O. Princeton University Press, 1944. This is the classic work upon which modern-day game theory is based. What began more than sixty years ago as a modest proposal that a mathematician and an economist write a short paper together blossomed, in 1944, when Princeton University Press published Theory of Games and Economic Behavior. In it, John von Neumann and Oskar Morgenstern conceived a groundbreaking mathematical theory of economic and social organization, based on a theory of games of strategy. Not only would this revolutionize economics, but the entirely new field of scientific inquiry it yielded--game theory--has since been widely used to analyze a host of real-world phenomena from arms races to optimal policy choices of presidential candidates, from vaccination policy to major league baseball salary negotiations. And it is today established throughout both the social sciences and a wide range of other sciences.
-
Non-cooperative games by J. Nash. Annals of Mathematics, 1951. Theory of n-person games, called non-cooperative, in which no coalitions are assumed.
-
Games with incomplete information played by “Bayesian” players, Part I. The basic model by Harsanyi JC. Management science, 1967. The paper develops a new theory for the analysis of games with incomplete information where the players are uncertain about some important parameters of the game situation, such as the payoff functions, the strategies available to various players, the information other players have about the game, etc. However, each player has a subjective probability distribution over the alternative possibilities.

In most of the paper it is assumed that these probability distributions entertained by the different players are mutually "consistent", in the sense that they can be regarded as conditional probability distributions derived from a certain "basic probability distribution" over the parameters unknown to the various players. But later the theory is extended also to cases where the different players' subjective probability distributions fail to satisfy this consistency assumption.

In cases where the consistency assumption holds, the original game can be replaced by a game where nature first conducts a lottery in accordance with the basic probability distribution, and the outcome of this lottery will decide which particular subgame will be played, i.e., what the actual values of the relevant parameters will be in the game. Yet, each player will receive only partial information about the outcome of the lottery, and about the values of these parameters. However, every player will know the "basic probability distribution" governing the lottery. Thus, technically, the resulting game will be a game with complete information. It is called the Bayes-equivalent of the original game. Part I of the paper describes the basic model and discusses various intuitive interpretations for the latter. Part II shows that the Nash equilibrium points of the Bayes-equivalent game yield "Bayesian equilibrium points" for the original game. Finally, Part III considers the main properties of the "basic probability distribution".
-

Games with incomplete information played by “Bayesian” players, part II. Bayesian equilibrium points by Harsanyi JC. Management science, 1968. Part I of this paper has described a new theory for the analysis of games with incomplete information. It has been shown that, if the various players' subjective probability distributions satisfy a certain mutual-consistency requirement, then any given game with incomplete information will be equivalent to a certain game with complete information, called the "Bayes-equivalent" of the original game, or briefly a "Bayesian game."

Part II of the paper will now show that any Nash equilibrium point of this Bayesian game yields a "Bayesian equilibrium point" for the original game and conversely. This result will then be illustrated by numerical examples, representing two-person zero-sum games with incomplete information. We shall also show how our theory enables us to analyse the problem of exploiting the opponent's erroneous beliefs. However, apart from its indubitable usefulness in locating Bayesian equilibrium points, we shall show, by means of a numerical example (the Bayes-equivalent of a two-person cooperative game), that the normal form of a Bayesian game is in many cases a highly unsatisfactory representation of the game situation and has to be replaced by other representations (e.g., by the semi-normal form). We shall argue that this rather unexpected result is due to the fact that Bayesian games must be interpreted as games with "delayed commitment" whereas the normal-form representation always envisages a game with "immediate commitment."
-

A formal theory of knowledge and action by Moore RC. Formal theories of the commonsense world, 1985. Most work on planning and problem solving within the field of artificial intelligence assumes that the agent has complete knowledge of all relevant aspects of the problem domain and problem situation. In the real world, however, planning and acting must frequently be performed without complete knowledge. This imposes two additional burdens on an intelligent agent trying to act effectively. First, when the agent entertains a plan for achieving some goal, he must consider not only whether the physical prerequisites of the plan have been satisfied, but also whether he has all the information necessary to carry out the plan. Second, he must be able to reason about what he can do to obtain necessary information that he lacks. In this paper, we present a theory of action in which these problems are taken into account, showing how to formalize both the knowledge prerequisites of action and the effects of action on knowledge.
-
On the strategic stability of equilibria by Kohlberg E, Mertens JF. Econometrica: Journal of the Econometric Society, 1986. A basic problem in the theory of noncooperative games is the following: which Nash equilibria are strategically stable, i.e. self-enforcing, and does every game have a strategically stable equilibrium? We list three conditions which seem necessary for strategic stability (backwards induction, iterated dominance, and invariance) and define a set-valued equilibrium concept that satisfies all three of them. We prove that every game has at least one such equilibrium set. Also, we show that the departure from the usual notion of single-valued equilibrium is relatively minor, because the sets reduce to points in all generic games.
-
Notes on the theory of choice by Kreps, D. M. Boulder, CO: Westview Press, 1988. In this book, Professor Kreps presents a first course on the basic models of choice theory that underlie much of economic theory. This course, taught for several years at the Graduate School of Business, Stanford University, gives the student an introduction to the axiomatic method of economic analysis, without placing too heavy a demand on mathematical sophistication. The course begins with the basics of choice and revealed preference theory and then discusses numerical representations of ordinal preference. Models with uncertainty come next: first is von Neumann-Morgenstern utility, and then choice under uncertainty with subjective uncertainty, using the formulation of Anscombe and Aumann, and then sketching the development of Savage's classic theory. Finally, the course delves into a number of special topics, including de Finetti's theorem, modeling choice on a part of a larger problem, dynamic choice, and the empirical evidence against the classic models.
-
A Course in Game Theory by Martin J. Osborne and Ariel Rubinstein. MIT Press, 1994. A Course in Game Theory presents the main ideas of game theory at a level suitable for graduate students and advanced undergraduates, emphasizing the theory's foundations and interpretations of its basic concepts. The authors provide precise definitions and full proofs of results, sacrificing generalities and limiting the scope of the material in order to do so. The text is organized in four parts: strategic games, extensive games with perfect information, extensive games with imperfect information, and coalitional games. It includes over 100 exercises.
-
Potential games by Monderer D, Shapley LS. Games and economic behavior, 1996. We define and discuss several notions of potential functions for games in strategic form. We characterize games that have a potential function, and we present a variety of applications.
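For concreteness, a small worked example of an exact potential function (standard material, not taken from the paper): the Prisoner's Dilemma admits the potential P shown below, and its maximizer coincides with the game's unique Nash equilibrium.

```latex
% Exact potential: for every player i and unilateral deviation,
%   u_i(s_i', s_{-i}) - u_i(s_i, s_{-i}) = P(s_i', s_{-i}) - P(s_i, s_{-i}).
% Rows are player 1's actions (C, D); columns are player 2's actions (C, D).
\[
u_1 = \begin{pmatrix} 3 & 0 \\ 5 & 1 \end{pmatrix}, \qquad
u_2 = \begin{pmatrix} 3 & 5 \\ 0 & 1 \end{pmatrix}, \qquad
P = \begin{pmatrix} 0 & 2 \\ 2 & 3 \end{pmatrix}.
\]
% Every unilateral change in a player's payoff equals the change in P,
% and the potential maximizer (D, D) is the unique Nash equilibrium.
```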
-
Learning in Multiagent Systems by Sandip Sen, Gerhard Weiss. MIT Press, 1999. Chapter 6 of the book Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence, edited by Gerhard Weiss.
-
The Theory of Learning in Games by Drew Fudenberg, David K. Levine. MIT Press, 1999. In economics, most noncooperative game theory has focused on equilibrium in games, especially Nash equilibrium and its refinements. The traditional explanation for when and why equilibrium arises is that it results from analysis and introspection by the players in a situation where the rules of the game, the rationality of the players, and the players' payoff functions are all common knowledge. Both conceptually and empirically, this theory has many problems. In The Theory of Learning in Games Drew Fudenberg and David Levine develop an alternative explanation that equilibrium arises as the long-run outcome of a process in which less than fully rational players grope for optimality over time. The models they explore provide a foundation for equilibrium theory and suggest useful ways for economists to evaluate and modify traditional equilibrium concepts.
-
Algorithms, Games, and the Internet by Christos H. Papadimitriou. STOC, 2001. If the Internet is the next great subject for Theoretical Computer Science to model and illuminate mathematically, then Game Theory, and Mathematical Economics more generally, are likely to prove useful tools. In this talk I survey some opportunities and challenges in this important frontier.
-
Foundations of strategic equilibrium by Hillas J, Kohlberg E. Handbook of Game Theory with Economic Applications, 2002. This is the third volume of the Handbook of Game Theory with Economic Applications. Since the publication of Volume 1 a decade ago, game theory has continued to develop at a furious pace, and today it is the dominant tool in economic theory. The three volumes together cover the fundamental theoretical aspects, a wide range of applications to economics, several chapters on applications to political science and individual chapters on applications to disciplines as diverse as evolutionary biology, computer science, law, psychology and ethics. The authors are the most eminent practitioners in the field, including three Nobel Prize winners. The topics covered in the present volume include strategic ("Nash") equilibrium; incomplete information; two-person non-zero-sum games; noncooperative games with a continuum of players; stochastic games; industrial organization; bargaining; inspection; economic history; the Shapley value and its applications to perfectly competitive economies, to taxation, to public goods and to fixed prices; political science; law; mechanism design; and game experimentation.
-
Reasoning about uncertainty by Halpern, Joseph Y. MIT press, 2017. Uncertainty is a fundamental and unavoidable feature of daily life; in order to deal with uncertainty intelligently, we need to be able to represent it and reason about it. In this book, Joseph Halpern examines formal ways of representing uncertainty and considers various logics for reasoning about it. While the ideas presented are formalized in terms of definitions and theorems, the emphasis is on the philosophy of representing and reasoning about uncertainty; the material is accessible and relevant to researchers and students in many fields, including computer science, artificial intelligence, economics (particularly game theory), mathematics, philosophy, and statistics.

Halpern begins by surveying possible formal systems for representing uncertainty, including probability measures, possibility measures, and plausibility measures. He considers the updating of beliefs based on changing information and the relation to Bayes' theorem; this leads to a discussion of qualitative, quantitative, and plausibilistic Bayesian networks. He considers not only the uncertainty of a single agent but also uncertainty in a multi-agent framework. Halpern then considers the formal logical systems for reasoning about uncertainty. He discusses knowledge and belief; default reasoning and the semantics of default; reasoning about counterfactuals, and combining probability and counterfactuals; belief revision; first-order modal logic; and statistics and beliefs. He includes a series of exercises at the end of each chapter.
-


General-Sum Games

Convergence Problems of General-Sum Multiagent Reinforcement Learning by Michael Bowling. ICML, 2000. Stochastic games are a generalization of MDPs to multiple agents, and can be used as a framework for investigating multiagent learning. Hu and Wellman (1998) recently proposed a multiagent Q-learning method for general-sum stochastic games. In addition to describing the algorithm, they provide a proof that the method will converge to a Nash equilibrium for the game under specified conditions. The convergence depends on a lemma stating that the iteration used by this method is a contraction mapping. Unfortunately the proof is incomplete. In this paper we present a counterexample and identify the flaw in the lemma’s proof. We also introduce strengthened assumptions under which the lemma holds, and examine how this affects the classes of games to which the theoretical result can be applied.
-
Friend-or-foe Q-learning in general-sum games by Michael L. Littman. ICML, 2001. This paper describes an approach to reinforcement learning in multiagent general-sum games in which a learner is told to treat each other agent as either a "friend" or a "foe". This Q-learning-style algorithm provides strong convergence guarantees compared to an existing Nash-equilibrium-based learning rule.
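A Python sketch of the two stage-game backups the algorithm switches between; the foe case is simplified here to a pure-strategy maximin, whereas the paper uses the mixed-strategy minimax value (a small linear program), and the resulting state value feeds an otherwise standard Q-learning target r + gamma * V(s').

```python
import numpy as np

def friend_value(Q_s):
    """Friend backup: assume the other agent helps, so take the max over joint actions."""
    return Q_s.max()

def foe_value(Q_s):
    """Foe backup: assume the other agent is adversarial.

    Simplification: pure-strategy maximin; the full algorithm uses the
    minimax value over mixed strategies, obtained from a linear program.
    """
    return Q_s.min(axis=1).max()

# Q_s[a_me, a_other]: joint-action Q-values in some state (illustrative numbers).
Q_s = np.array([[1.0, -1.0],
                [0.5,  0.5]])
print(friend_value(Q_s), foe_value(Q_s))   # 1.0 if the other agent is a friend, 0.5 if a foe
```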
-
Correlated-Q Learning by Amy Greenwald, Keith Hall, Roberto Serrano. NeurIPS workshop on multi-agent learning, 2002. Bowling named two desiderata for multiagent learning algorithms: rationality and convergence. This paper introduces correlated-Q learning, a natural generalization of Nash-Q and FF-Q that satisfies these criteria. Nash-Q satisfies rationality, but in general it does not converge. FF-Q satisfies convergence, but in general it is not rational. Correlated-Q satisfies rationality by construction. This paper demonstrates the empirical convergence of correlated-Q on a standard testbed of general-sum Markov games.
-
Nash Q-Learning for General-Sum Stochastic Games by Junling Hu, Michael Wellman. JMLR, 2003. We extend Q-learning to a noncooperative multiagent context, using the framework of general-sum stochastic games. A learning agent maintains Q-functions over joint actions, and performs updates based on assuming Nash equilibrium behavior over the current Q-values. This learning protocol provably converges given certain restrictions on the stage games (defined by Q-values) that arise during learning. Experiments with a pair of two-player grid games suggest that such restrictions on the game structure are not necessarily required. Stage games encountered during learning in both grid environments violate the conditions. However, learning consistently converges in the first grid game, which has a unique equilibrium Q-function, but sometimes fails to converge in the second, which has three different equilibrium Q-functions. In a comparison of offline learning performance in both games, we find agents are more likely to reach a joint optimal path with Nash Q-learning than with a single-agent Q-learning method. When at least one agent adopts Nash Q-learning, the performance of both agents is better than using single-agent Q-learning. We have also implemented an online version of Nash Q-learning that balances exploration with exploitation, yielding improved performance.
-
Towards a Pareto Optimal Solution in general-sum games by Sandip Sen, Stephane Airiau, Rajatish Mukherjee. AAMAS, 2003. Multiagent learning literature has investigated iterated two-player games to develop mechanisms that allow agents to learn to converge on Nash Equilibrium strategy profiles. Such an equilibrium configuration implies that there is no motivation for one player to change its strategy if the other does not. Often, in general sum games, a higher payoff can be obtained by both players if one chooses not to respond optimally to the other player. By developing mutual trust, agents can avoid iterated best responses that will lead to a lesser payoff Nash Equilibrium. In this paper we work with agents who select actions based on expected utility calculations that incorporate the observed frequencies of the actions of the opponent(s). We augment these stochastically-greedy agents with an interesting action revelation strategy that involves strategic revealing of one's action to avoid worst-case, pessimistic moves. We argue that in certain situations, such apparently risky revealing can indeed produce better payoff than a non-revealing approach. In particular, it is possible to obtain Pareto-optimal solutions that dominate Nash Equilibrium. We present results over a large number of randomly generated payoff matrices of varying sizes and compare the payoffs of strategically revealing learners to payoffs at Nash equilibrium.
-

Hierarchical Learning

Concurrent layered learning by S. Whiteson and P. Stone. AAMAS, 2003. Hierarchies are powerful tools for decomposing complex control tasks into manageable subtasks. Several hierarchical approaches have been proposed for creating agents that can execute these tasks. Layered learning is such a hierarchical paradigm that relies on learning the various subtasks necessary for achieving the complete high-level goal. Layered learning prescribes training low-level behaviors (those closer to the environmental inputs) prior to high-level behaviors. In past implementations these lower-level behaviors were always frozen before advancing to the next layer. In this paper, we hypothesize that there are situations where layered learning would work better were the lower layers allowed to keep learning concurrently with the training of subsequent layers, an approach we call concurrent layered learning. We identify a situation where concurrent layered learning is beneficial and present detailed empirical results verifying our hypothesis. In particular, we use neuro-evolution to concurrently learn two layers of a layered learning approach to a simulated robotic soccer keepaway task. The main contribution of this paper is evidence that there exist situations where concurrent layered learning outperforms traditional layered learning. Thus, we establish that, when using layered learning, the concurrent training of layers can be an effective option.
-
Recent Advances in Hierarchical Reinforcement Learning by Andrew G. Barto, Sridhar Mahadevan. Discrete Event Dynamic Systems, 2003. Reinforcement learning is bedeviled by the curse of dimensionality: the number of parameters to be learned grows exponentially with the size of any compact encoding of a state. Recent attempts to combat the curse of dimensionality have turned to principled ways of exploiting temporal abstraction, where decisions are not required at each step, but rather invoke the execution of temporally-extended activities which follow their own policies until termination. This leads naturally to hierarchical control architectures and associated learning algorithms. We review several approaches to temporal abstraction and hierarchical organization that machine learning researchers have recently developed. Common to these approaches is a reliance on the theory of semi-Markov decision processes, which we emphasize in our review. We then discuss extensions of these ideas to concurrent activities, multiagent coordination, and hierarchical memory for addressing partial observability. Concluding remarks address open challenges facing the further development of reinforcement learning in a hierarchical setting.
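A minimal Python sketch of the semi-Markov (SMDP) Q-learning backup that underlies these temporally abstract methods; the dict-based Q-table and option names are illustrative.

```python
from collections import defaultdict

def smdp_q_update(Q, s, option, reward_sum, k, s_next, options, alpha=0.1, gamma=0.95):
    """One SMDP Q-learning backup for a temporally extended option (sketch).

    reward_sum is the discounted reward accumulated while the option ran for k
    primitive steps (r_1 + gamma*r_2 + ... + gamma**(k-1)*r_k); the next state's
    value is discounted by gamma**k.
    """
    best_next = max(Q[s_next][o] for o in options)
    target = reward_sum + (gamma ** k) * best_next
    Q[s][option] += alpha * (target - Q[s][option])

Q = defaultdict(lambda: defaultdict(float))
options = ["go-to-door", "go-to-goal"]                 # hypothetical options
smdp_q_update(Q, "room1", "go-to-door", reward_sum=-3.0, k=4, s_next="hall", options=options)
print(dict(Q["room1"]))
```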
-
Learning to Communicate and Act using Hierarchical Reinforcement Learning by Mohammad Ghavamzadeh, Sridhar Mahadevan. AAMAS, 2004. In this paper, we address the issue of rational communication behavior among autonomous agents. The goal is for agents to learn a policy to optimize the communication needed for proper coordination, given the communication cost. We extend our previously reported cooperative hierarchical reinforcement learning (HRL) algorithm to include communication decisions and propose a new multiagent HRL algorithm, called COM-Cooperative HRL. In this algorithm, we define cooperative subtasks to be those subtasks in which coordination among agents significantly improves the performance of the overall task. Those levels of the hierarchy which include cooperative subtasks are called cooperation levels. Coordination skills among agents are learned faster by sharing information at the cooperation levels, rather than the level of primitive actions. We add a communication level to the hierarchical decomposition of the problem below each cooperation level. Before making a decision at a cooperative subtask, agents decide if it is worthwhile to perform a communication action. A communication action has a certain cost and provides each agent at a certain cooperation level with the actions selected by the other agents at the same level. We demonstrate the efficacy of the COM-Cooperative HRL algorithm as well as the relation between the communication cost and the learned communication policy using a multiagent taxi domain.
-

Incomplete Information Games

Formulation of Bayesian analysis for games with incomplete information by J-F. Mertens, S. Zamir. International Journal of Game Theory, 1985. A formal model is given of Harsanyi's infinite hierarchies of beliefs. It is shown that the model closes with some Bayesian game with incomplete information, and that any such game can be approximated by one with a finite number of states of the world.
-
Generating and Solving Imperfect Information Games by D. Koller and A. Pfeffer, IJCAI, 1995. Work on game playing in AI has typically ignored games of imperfect information such as poker. In this paper, we present a framework for dealing with such games. We point out several important issues that arise only in the context of imperfect information games, particularly the insufficiency of a simple game tree model to represent the players' information state and the need for randomization in the players' optimal strategies. We describe Gala, an implemented system that provides the user with a very natural and expressive language for describing games. From a game description, Gala creates an augmented game tree with information sets which can be used by various algorithms in order to find optimal strategies for that game. In particular, Gala implements the first practical algorithm for finding optimal randomized strategies in two-player imperfect information competitive games [Koller et al., 1994]. The running time of this algorithm is polynomial in the size of the game tree, whereas previous algorithms were exponential. We present experimental results showing that this algorithm is also efficient in practice and can therefore form the basis for a game playing system.
-
Regret minimizing equilibria and mechanisms for games with strict type uncertainty by Hyafil N, Boutilier C. arXiv preprint arXiv:1207.4147, 2012. Mechanism design has found considerable application to the construction of agent-interaction protocols. In the standard setting, the type (e.g., utility function) of an agent is not known by other agents, nor is it known by the mechanism designer. When this uncertainty is quantified probabilistically, a mechanism induces a game of incomplete information among the agents. However, in many settings, uncertainty over utility functions cannot easily be quantified. We consider the problem of incomplete information games in which type uncertainty is strict or unquantified. We propose the use of minimax regret as a decision criterion in such games, a robust approach for dealing with type uncertainty. We define minimax-regret equilibria and prove that these exist in mixed strategies for finite games. We also consider the problem of mechanism design in this framework by adopting minimax regret as an optimization criterion for the designer itself, and study automated optimization of such mechanisms.
-

No-Regret Learning

On no-regret learning, fictitious play, and Nash equilibrium by Jafari, C., Greenwald, A., Gondek, D., Ercal, G. ICML, 2001. This paper addresses the question: what is the outcome of multi-agent learning via no-regret algorithms in repeated games? Specifically, can the outcome of no-regret learning be characterized by traditional game-theoretic solution concepts, such as Nash equilibrium? The conclusion of this study is that no-regret learning is reminiscent of fictitious play: play converges to Nash equilibrium in dominance-solvable, constant-sum, and general-sum 2x2 games, but cycles exponentially in the Shapley game. Notably, however, the information required of fictitious play far exceeds that of no-regret learning.
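For reference, a Python sketch of one standard no-regret rule, regret matching; it illustrates the class of algorithms under discussion rather than the specific no-regret learners studied in the paper.

```python
import numpy as np

def regret_matching_step(cum_regret, payoffs, action_played, rng):
    """Update cumulative regrets and sample the next action by regret matching.

    payoffs[a] is the payoff action a would have earned this round against the
    opponents' realized play (so payoffs[action_played] is the payoff received).
    """
    cum_regret += payoffs - payoffs[action_played]     # regret for not having played each action
    positive = np.maximum(cum_regret, 0.0)
    if positive.sum() > 0:
        probs = positive / positive.sum()
    else:
        probs = np.ones(len(payoffs)) / len(payoffs)   # uniform if no positive regret yet
    return cum_regret, rng.choice(len(payoffs), p=probs)

rng = np.random.default_rng(0)
cum_regret = np.zeros(3)
payoffs = np.array([0.0, 1.0, 0.5])                    # illustrative counterfactual payoffs
cum_regret, next_action = regret_matching_step(cum_regret, payoffs, action_played=0, rng=rng)
print(next_action)
```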
-
Convergence and No-Regret in Multiagent Learning by Michael Bowling. NeurIPS, 2004. Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be able to exploit a learner’s particular dynamics. In the worst case, this could result in poorer performance than if the agent was not learning at all. These challenges are identifiable in the two most common evaluation criteria for multiagent learning algorithms: convergence and regret. Algorithms focusing on convergence or regret in isolation are numerous. In this paper, we seek to address both criteria in a single algorithm by introducing GIGA-WoLF, a learning algorithm for normal-form games. We prove the algorithm guarantees at most zero average regret, while demonstrating the algorithm converges in many situations of self-play. We prove convergence in a limited setting and give empirical results in a wider variety of situations. These results also suggest a third new learning criterion combining convergence and regret, which we call negative non-convergence regret (NNR).
-

Opponent Modelling

Learning other agents' preferences in multiagent negotiation by authors. AAAI, 1996. In multiagent systems, an agent does not usually have complete information about the preferences and decision making processes of other agents. This might prevent the agents from making coordinated choices, purely due to their ignorance of what others want. This paper describes the integration of a learning module into a communication-intensive negotiating agent architecture. The learning module gives the agents the ability to learn about other agents’ preferences via past interactions. Over time, the agents can incrementally update their models of other agents’ preferences and use them to make better coordinated decisions. Combining both communication and learning, as two complementary knowledge acquisition methods, helps to reduce the amount of communication needed on average, and is justified in situations where communication is computationally costly or simply not desirable (e.g. to preserve individual privacy).
-
Learning Models of Other Agents Using Influence Diagrams by Dicky Suryadi, Piotr J. Gmytrasiewicz. AAAI, 2000. We adopt decision theory as a descriptive paradigm to model rational agents. We use influence diagrams as a modeling representation of agents, which is used to interact with them and to predict their behavior. In this paper, we provide a framework that an agent can use to learn the models of other agents in a multi-agent system (MAS) based on their observed behavior. Since the correct model is usually not known with certainty our agents maintain a number of possible models and assign a probability to each of them being correct. When none of the available models is likely to be correct, we modify one of them to better account for the observed behaviors. The modification refines the parameters of the influence diagram used to model the other agent’s capabilities, preferences, or beliefs. The modified model is then allowed to compete with the other models and the probability assigned to it being correct can be arrived at based on how well it predicts the behaviors of the other agent already observed.
-
Learning mutual trust by Bikramjit Banerjee, Rajatish Mukherjee, Sandip Sen. Workshop on Deception, Fraud and Trust in Agent Societies, 2000. Multiagent learning literature has looked at iterated two-player games to develop mechanisms that allow agents to learn to converge on Nash Equilibrium strategy profiles. Such an equilibrium configuration implies that there is no motivation for one player to change its strategy if the other does not. Often, in general sum games, a higher payoff can be obtained by both players if one chooses not to respond optimally to the other player. By developing mutual trust, agents can avoid iterated best responses that will lead to a lesser payoff Nash Equilibrium. In this paper we consider 1-level agents (modelers) who select actions based on expected utility considering probability distributions over the actions of the opponent(s). We show that in certain situations, such stochastically-greedy agents can perform better (by developing mutually trusting behavior) than those that explicitly attempt to converge to Nash Equilibrium.
-
Learning about other agents in a dynamic multiagent system by Junling Hu, Michael Wellman. Journal of Cognitive Systems Research, 2001. We analyze the problem of learning about other agents in a class of dynamic multiagent systems, where performance of the primary agent depends on behavior of the others. We consider an online version of the problem, where agents must learn models of the others in the course of continual interactions. Various levels of recursive models are implemented in a simulated double auction market. Our experiments show learning agents on average outperform non-learning agents who do not use information about others. Among learning agents, those with minimum recursion assumption generally perform better than the agents with more complicated, though often wrong assumptions.
-
Representing and aggregating conflicting beliefs by Maynard-Zhang P, Lehmann D. Journal of Artificial Intelligence Research, 2003. We consider the two-fold problem of representing collective beliefs and aggregating these beliefs. We propose a novel representation for collective beliefs that uses modular, transitive relations over possible worlds. They allow us to represent conflicting opinions and they have a clear semantics, thus improving upon the quasi-transitive relations often used in social choice. We then describe a way to construct the belief state of an agent informed by a set of sources of varying degrees of reliability. This construction circumvents Arrow’s Impossibility Theorem in a satisfactory manner by accounting for the explicitly encoded conflicts. We give a simple set-theory-based operator for combining the information of multiple agents. We show that this operator satisfies the desirable invariants of idempotence, commutativity, and associativity, and, thus, is well-behaved when iterated, and we describe a computationally effective way of computing the resulting belief state. Finally, we extend our framework to incorporate voting.
-

Relational Learning

Multi-agent Relational Reinforcement Learning by Tom Croonenborghs, Karl Tuyls, Jan Ramon, and Maurice Bruynooghe. LAMAS, 2005. In this paper we report on using a relational state space in multi-agent reinforcement learning. There is growing evidence in the Reinforcement Learning research community that a relational representation of the state space has many benefits over a propositional one. Complex tasks such as planning or information retrieval on the web can be represented more naturally in relational form. Yet, this relational structure has not been exploited for multi-agent reinforcement learning tasks and has only been studied in a single-agent context so far. In this paper we explore the powerful possibilities of using Relational Reinforcement Learning (RRL) in complex multi-agent coordination tasks. More precisely, we consider an abstract multi-state coordination problem, which can be considered as a variation and extension of repeated stateless Dispersion Games. Our approach shows that RRL allows a complex state space in a multi-agent environment to be represented more compactly and allows for fast convergence of learning agents. Moreover, with this technique, agents are able to make complex interactive models (in the sense of learning from an expert), to predict what other agents will do and generalize over this model. This enables complex multi-agent planning tasks, in which agents need to be adaptive and learn, to be solved with more powerful tools.
-

Repeated Games

Approximation to Bayes risk in repeated plays by James Hannan. Contributions to the Theory of Games, 1959. This paper is concerned with the development of a dynamic theory of decision under uncertainty. The results obtained are directly applicable to the development of a dynamic theory of games in which at least one player is, at each stage, fully informed on the joint empirical distribution of the past choices of strategies of the rest. Since the decision problem can be imbedded in a sufficiently unspecified game theoretic model, the paper is written in the language and notation of the general two person game, in which, however, player I’s motivation is completely unspecified.
-
Bounded complexity justifies cooperation in finitely repeated prisoner’s dilemma by Abraham Neyman. Economics Letters, 1985. Cooperation in the finitely repeated prisoner's dilemma is justified, without departure from strict utility maximization or complete information, but under the assumption that there are bounds (possibly very large) to the complexity of the strategies that the players may use.
-
Noncomputable strategies and discounted repeated games by John H. Nachbar, William R. Zame. Economic Theory, 1996. A number of authors have used formal models of computation to capture the idea of “bounded rationality” in repeated games. Most of this literature has used computability by a finite automaton as the standard. A conceptual difficulty with this standard is that the decision problem is not “closed.” That is, for every strategy implementable by an automaton, there is some best response implementable by an automaton, but there may not exist any algorithm for finding such a best response that can be implemented by an automaton. However, such algorithms can always be implemented by a Turing machine, the most powerful formal model of computation. In this paper, we investigate whether the decision problem can be closed by adopting Turing machines as the standard of computability. The answer we offer is negative. Indeed, for a large class of discounted repeated games (including the repeated Prisoner's Dilemma) there exist strategies implementable by a Turing machine for which no best response is implementable by a Turing machine.
-
On Multiagent Q-Learning in a Semi-competitive Domain by Tuomas W. Sandholm, Robert H. Crites. Adaptation and Learning in Multiagent Systems, 1996. Q-learning is a recent reinforcement learning (RL) algorithm that does not need a model of its environment and can be used on-line. Therefore it is well-suited for use in repeated games against an unknown opponent. Most RL research has been confined to single agent settings or to multiagent settings where the agents have totally positively correlated payoffs (team problems) or totally negatively correlated payoffs (zero-sum games). This paper is an empirical study of reinforcement learning in the iterated prisoner's dilemma (IPD), where the agents' payoffs are neither totally positively nor totally negatively correlated. RL is considerably more difficult in such a domain. This paper investigates the ability of a variety of Q-learning agents to play the IPD game against an unknown opponent. In some experiments, the opponent is the fixed strategy Tit-for-Tat, while in others it is another Q-learner. All the Q-learners learned to play optimally against Tit-for-Tat. Playing against another learner was more difficult because the adaptation of the other learner creates a nonstationary environment in ways that are detailed in the paper. The learners that were studied varied along three dimensions: the length of history they received as context, the type of memory they employed (lookup tables based on restricted history windows or recurrent neural networks (RNNs) that can theoretically store features from arbitrarily deep in the past), and the exploration schedule they followed. Although all the learners faced difficulties when playing against other learners, agents with longer history windows, lookup table memories, and longer exploration schedules fared best in the IPD games.
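For concreteness, a minimal sketch of one of the simpler settings studied here: a tabular Q-learner whose state is a one-step history window, playing the iterated prisoner's dilemma against fixed Tit-for-Tat. The payoff values and hyperparameters are standard assumptions, not the paper's exact settings; with a high discount factor the learner typically settles on cooperation, the optimal response to Tit-for-Tat.

```python
import random
from collections import defaultdict

ACTIONS = ["C", "D"]
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}  # learner's payoff

def tit_for_tat(history):
    return "C" if not history else history[-1][0]   # echo the learner's last move

def q_learn_ipd(rounds=20000, alpha=0.1, gamma=0.95, eps=0.1, window=1):
    Q = defaultdict(float)                           # Q[(state, action)]
    history, state = [], ()
    for _ in range(rounds):
        a = random.choice(ACTIONS) if random.random() < eps else \
            max(ACTIONS, key=lambda act: Q[(state, act)])
        o = tit_for_tat(history)
        r = PAYOFF[(a, o)]
        history.append((a, o))
        next_state = tuple(history[-window:])
        target = r + gamma * max(Q[(next_state, act)] for act in ACTIONS)
        Q[(state, a)] += alpha * (target - Q[(state, a)])
        state = next_state
    return Q
```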
-
Prediction, Optimization, and Learning in Repeated Games by John H. Nachbar. Econometrica, 1997. Consider a two-player discounted repeated game in which each player optimizes with respect to a prior belief about his opponent's repeated game strategy. One would like to argue that if beliefs are cautious, then each player's best response will be in the support, loosely speaking, of his opponent's belief and that, therefore, players will learn as the game unfolds to predict the continuation path of play. If this conjecture were true, a convergence result due to Kalai and Lehrer would imply that the continuation path of the repeated game would asymptotically resemble that of a Nash equilibrium. One would thus have constructed a theory in which Nash equilibrium behavior is a necessary long-run consequence of optimization by cautious players. This paper points out an obstacle to such a theory. Loosely put, in many repeated games, if players optimize with respect to beliefs that satisfy a diversity condition termed neutrality, then each player will choose a strategy that his opponent was certain would not be played.
-
Adaptive Game Playing Using Multiplicative Weights by Yoav Freund, Robert E. Schapire. Games and Economic Behavior, 1999. We present a simple algorithm for playing a repeated game. We show that a player using this algorithm suffers average loss that is guaranteed to come close to the minimum loss achievable by any fixed strategy. Our bounds are nonasymptotic and hold for any opponent. The algorithm, which uses the multiplicative-weight methods of Littlestone and Warmuth, is analyzed using the Kullback–Leibler divergence. This analysis yields a new, simple proof of the min–max theorem, as well as a provable method of approximately solving a game. A variant of our game-playing algorithm is proved to be optimal in a very strong sense.
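A minimal sketch of the multiplicative-weights idea for the row player of a matrix game, assuming losses lie in [0, 1]. The adversarial column choice is just a stand-in opponent; averaging the mixed strategies over rounds is what yields an approximate minimax solution.

```python
import numpy as np

def multiplicative_weights(loss, rounds=2000, beta=0.95):
    """loss[i, j]: row player's loss (in [0, 1]) for action i against column j."""
    n = loss.shape[0]
    w = np.ones(n)
    avg = np.zeros(n)
    for _ in range(rounds):
        p = w / w.sum()
        j = int(np.argmax(p @ loss))   # stand-in adversary: worst column for p
        w *= beta ** loss[:, j]        # exponentially penalize lossy actions
        avg += p
    return avg / rounds                # approximate minimax strategy for the row player
```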
-
Implicit negotiation in repeated games by Peter Stone and Michael L. Littman. ATAL, 2001. In business-related interactions such as the on-going high-stakes FCC spectrum auctions, explicit communication among participants is regarded as collusion, and is therefore illegal. In this paper, we consider the possibility of autonomous agents engaging in implicit negotiation via their tacit interactions. In repeated general-sum games, our testbed for studying this type of interaction, an agent using a _best response_ strategy maximizes its own payoff assuming its behavior has no effect on its opponent. This notion of best response requires some degree of learning to determine the fixed opponent behavior. Against an unchanging opponent, the best-response agent performs optimally, and can be thought of as a _follower_, since it adapts to its opponent. However, pairing two best-response agents in a repeated game can result in suboptimal behavior. We demonstrate this suboptimality in several different games using variants of Q-learning as an example of a best-response strategy. We then examine two _leader_ strategies that induce better performance from opponent followers via stubbornness and threats. These tactics are forms of implicit negotiation in that they aim to achieve a mutually beneficial outcome without using explicit communication outside of the game.
-
Leading Best-Response Strategies in Repeated Games by Peter Stone, Michael L. Littman. IJCAI, 2001. A companion study of implicit negotiation: leader strategies in repeated games that induce favorable play from best-response opponents without any explicit communication.
-
Sophisticated EWA Learning and Strategic Teaching in Repeated Games by Colin F. Camerer, Teck-Hua Ho, Juin-Kuan Chong. Journal of Economic Theory, 2002. Most learning models assume players are adaptive (i.e., they respond only to their own previous experience and ignore others' payoff information) and behavior is not sensitive to the way in which players are matched. Empirical evidence suggests otherwise. In this paper, we extend our adaptive experience-weighted attraction (EWA) learning model to capture sophisticated learning and strategic teaching in repeated games. The generalized model assumes there is a mixture of adaptive learners and sophisticated players. An adaptive learner adjusts his behavior the EWA way. A sophisticated player rationally best-responds to her forecasts of all other behaviors. A sophisticated player can be either myopic or farsighted. A farsighted player develops multiple-period rather than single-period forecasts of others' behaviors and chooses to 'teach' the other players by choosing a strategy scenario that gives her the highest discounted net present value. We estimate the model using data from p-beauty contests and repeated trust games with incomplete information. The generalized model is better than the adaptive EWA model in describing and predicting behavior. Including teaching also allows an empirical learning-based approach to reputation formation which predicts better than a quantal-response extension of the standard type-based approach.
-
A Polynomial-time Nash Equilibrium Algorithm for Repeated Games by Michael L. Littman, Peter Stone. ACM Conference on Electronic Commerce, 2003. With the increasing reliance on game theory as a foundation for auctions and electronic commerce, efficient algorithms for computing equilibria in multiplayer general-sum games are of great theoretical and practical interest. The computational complexity of finding a Nash equilibrium for a one-shot bimatrix game is a well-known open problem. This paper treats a closely related problem, that of finding a Nash equilibrium for an average-payoff repeated bimatrix game, and presents a polynomial-time algorithm. Our approach draws on the “folk theorem” from game theory and shows how finite-state equilibrium strategies can be found efficiently and expressed succinctly.
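The folk-theorem construction alluded to here can be illustrated in heavily simplified form: compute each player's threat value, pick a joint action that gives both players at least that much, and support it with the threat of perpetual minimax punishment. The sketch below uses pure-strategy threats only; the actual algorithm works with mixed threats and mixtures over joint actions, so treat this as an illustration rather than the paper's method.

```python
import numpy as np

def pure_threat_value(payoff):
    """Payoff a player can guarantee with a pure action under worst-case
    punishment (a simplification of the mixed minimax threat)."""
    return payoff.min(axis=1).max()

def folk_theorem_profile(A, B):
    """A[i, j], B[i, j]: payoffs of the row and column players.
    Returns a joint action that weakly dominates both threat values; the
    equilibrium strategy plays it forever and punishes any deviation."""
    ta = pure_threat_value(A)      # row player's threat value
    tb = pure_threat_value(B.T)    # column player's threat value
    feasible = [(i, j) for i in range(A.shape[0]) for j in range(A.shape[1])
                if A[i, j] >= ta and B[i, j] >= tb]
    if not feasible:
        return None                # would require mixing over joint actions
    return max(feasible, key=lambda ij: A[ij] + B[ij])
```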
-
The Role of Reactivity in Multiagent Learning by Bikramjit Banerjee, Jing Peng. AAMAS, 2004. In this paper we take a closer look at a recently proposed classification scheme for multiagent learning algorithms. Based on this scheme an exploitation mechanism (we call it the Exploiter) was developed that could beat various Policy Hill Climbers (PHC) and other fair opponents in some repeated matrix games. We show on the contrary that some fair opponents may actually beat the Exploiter in repeated games. This clearly indicates a deficiency in the original classification scheme which we address. Specifically, we introduce a new measure called Reactivity that measures how fast a learner can adapt to an unexpected hypothetical change in the opponent’s policy. We show that in some games, this new measure can approximately predict the performance of a player, and based on this measure we explain the behaviors of various algorithms in the Matching Pennies game, which was inexplicable by the original scheme. Finally we show that under certain restrictions, a player that consciously tries to avoid exploitation may be unable to do so.
-

Robotic Teams

An adaptive communication protocol for cooperating mobile robots by H. Yanco and L. Stein. MIT Press, 1993. We describe mobile robots engaged in a cooperative task that requires communication. The robots are initially given a fixed but uninterpreted vocabulary for communication. In attempting to perform their task, the robots learn a private communication language. Different meanings for vocabulary elements are learned in different runs of the experiment. As circumstances change, the robots adapt their language to allow continued success at their task.
-
Robo-shepherd: Learning complex robotic behaviors by A. Schultz, J. Grefenstette, and W. Adams. ASME Press, 1996. This paper reports on recent results using genetic algorithms to learn decision rules for complex robot behaviors. The method involves evaluating hypothetical rule sets on a simulator and applying simulated evolution to evolve more effective rules. The main contributions of this paper are (1) the task learned is a complex behavior involving multiple mobile robots, and (2) the learned rules are verified through experiments on operational mobile robots. The case study involves a shepherding task in which one mobile robot attempts to guide another robot to a specified area.
-
Learning Roles: Behavioral Diversity in Robot Teams by Tucker Balch. AAAI, 1997. This paper describes research investigating behavioral specialization in learning robot teams. Each agent is provided a common set of skills (motor schema-based behavioral assemblages) from which it builds a task-achieving strategy using reinforcement learning. The agents learn individually to activate particular behavioral assemblages given their current situation and a reward signal. The experiments, conducted in robot soccer simulations, evaluate the agents in terms of performance, policy convergence, and behavioral diversity. The results show that in many cases, robots will automatically diversify by choosing heterogeneous behaviors. The degree of diversification and the performance of the team depend on the reward structure. When the entire team is jointly rewarded or penalized (global reinforcement), teams tend towards heterogeneous behavior. When agents are provided feedback individually (local reinforcement), they converge to identical policies.
-
Reinforcement Learning in the Multi-Robot Domain by Maja J. Mataric. Autonomous Robots, 1997. This paper describes a formulation of reinforcement learning that enables learning in noisy, dynamic environments such as in the complex concurrent multi-robot learning domain. The methodology involves minimizing the learning space through the use of behaviors and conditions, and dealing with the credit assignment problem through shaped reinforcement in the form of heterogeneous reinforcement functions and progress estimators. We experimentally validate the approach on a group of four mobile robots learning a foraging task.
-
Genetic Programming Produced Competitive Soccer Softbot Teams for RoboCup97 by Sean Luke. Genetic Programming, 1998. At RoboCup, teams of autonomous robots or software softbots compete in simulated soccer matches to demonstrate cooperative robotics techniques in a very difficult, real-time, noisy environment. At the IJCAI/RoboCup97 softbot competition, all entries but ours used human-crafted cooperative decision-making behaviors. We instead entered a softbot team whose high-level decision making behaviors had been entirely evolved using genetic programming. Our team won its first two games against human-crafted opponent teams, and received the RoboCup Scientific Challenge Award. This report discusses the issues we faced and the approach we took to use GP to evolve our robot soccer team for this difficult environment.
-
Reinforcement learning soccer teams with incomplete world models by M. Wiering, R. Salustowicz, and J. Schmidhuber. Journal of Autonomous Robots, 1999. We use reinforcement learning (RL) to compute strategies for multiagent soccer teams. RL may profit significantly from world models (WMs) estimating state transition probabilities and rewards. In high-dimensional, continuous input spaces, however, learning accurate WMs is intractable. Here we show that incomplete WMs can help to quickly find good action selection policies. Our approach is based on a novel combination of CMACs and prioritized sweeping-like algorithms. Variants thereof outperform both Q(lambda)-learning with CMACs and the evolutionary method Probabilistic Incremental Program Evolution (PIPE) which performed best in previous comparisons.
-
Evolving control for distributed micro air vehicles by A. Wu, A. Schultz, and A. Agah. IEEE, 1999. In this paper, we focus on the task of large area surveillance. Given an area to be surveilled and a team of MAVs with appropriate sensors, the task is to dynamically distribute the micro air vehicles (MAVs) appropriately in the surveillance area for maximum coverage based on features present on the ground, and to adjust this distribution over time as changes in the team or on the ground occur. We have developed a system that will learn rule sets for controlling the individual MAVs in a distributed surveillance team. Since each rule set governs an individual MAV, control of the overall behavior of the entire team is distributed; there is no single entity controlling the actions of the entire team. Currently, all members of the MAV team utilize the same rule set; specialization of individual MAVs through the evolution of unique rule sets is a logical extension to this work.
-
Exploiting embodiment in multi-robot teams by B. B. Werger and M. Mataric. IRIS, 1999. This paper describes multi-robot experiments and systems which exploit their embodied nature to reduce needs for sensory, effector, and computational resources. Many natural phenomena, such as territorial markings or ant pheromone trails, take great advantage of the ability to mark the environment, and of the natural dissipative processes which cause these markings to decay. In this way, globally complex behavior can result from simple local rules. The information invariants ([Donald 1995], [Donald et al 1994]) literature raises the issue of robots similarly recording information, or even "programs," into the physical environment. This paper provides example systems that dynamically encode information and "programs" into the physical environment, and by so doing, increase their own robustness and reduce their resource requirements and computational complexity. The main experimental system that we present, a robot "chain" used for foraging, is modeled after the natural phenomenon of ant ph...
-
Reward and Diversity in Multirobot Foraging by Tucker Balch. IJCAI, 1999. This research seeks to quantify the impact of the choice of reward function on behavioral diversity in learning robot teams. The methodology developed for this work has been applied to multirobot foraging, soccer, and cooperative movement. This paper focuses specifically on results in multirobot foraging. In these experiments three types of reward are used with Q-learning to train a multirobot team to forage: a local performance-based reward, a global performance-based reward, and a heuristic strategy referred to as shaped reinforcement. Local strategies provide each agent a specific reward according to its own behavior, while global rewards provide all the agents on the team the same reward simultaneously. Shaped reinforcement provides a heuristic reward for an agent's action given its situation. The experiments indicate that local performance-based rewards and shaped reinforcement generate statistically similar results; they both provide the best performance and the least diversity. Finally, learned policies are demonstrated on a team of Nomadic Technologies' Nomad-150 robots.
-
Heterogeneity in the Coevolved Behaviors of Mobile Robots: The Emergence of Specialists by Mitchell A. Potter, Lisa A. Meeden, Alan C. Schultz. IJCAI, 2001. Many mobile robot tasks can be most efficiently solved when a group of robots is utilized. The type of organization, and the level of coordination and communication within a team of robots affects the type of tasks that can be solved. This paper examines the tradeoff of homogeneity versus heterogeneity in the control systems by allowing a team of robots to coevolve their high-level controllers given different levels of difficulty of the task. Our hypothesis is that simply increasing the difficulty of a task is not enough to induce a team of robots to create specialists. The key factor is not difficulty per se, but the number of skill sets necessary to successfully solve the task. As the number of skills needed increases, the more beneficial and necessary heterogeneity becomes. We demonstrate this in the task domain of herding, where one or more robots must herd another robot into a confined space.
-
The necessity of average rewards in cooperative multirobot learning by P. Tangamchit, J. Dolan, and P. Khosla. IEEE, 2002. Learning can be an effective way for robot systems to deal with dynamic environments and changing task conditions. However, popular single-robot learning algorithms based on discounted rewards, such as Q learning, do not achieve cooperation (i.e., purposeful division of labor) when applied to task-level multirobot systems. A task-level system is defined as one performing a mission that is decomposed into subtasks shared among robots. We demonstrate the superiority of average-reward-based learning such as the Monte Carlo algorithm for task-level multirobot systems, and suggest an explanation for this superiority.
-
Trail-laying robots for robust terrain coverage by J. Svennebring and S. Koenig. ICRA, 2003. Robotics researchers have studied robots that can follow the trails laid by other robots. We, on the other hand, study robots that leave trails in the terrain to cover closed terrain once or repeatedly. How to design such ant robots has so far been studied only theoretically for gross robot simplifications. In this paper, we describe for the first time how to build physical ant robots that cover terrain. We show that a modified version of node counting can model the behavior of the ant robots and report on first experiments that we performed to understand their behavior better. These experiments confirm that our ant robots indeed cover terrain robustly even if the trails are of uneven quality, the ant robots are moved without realizing this, or some trails are destroyed. Finally, we report the results of a large-scale experiment where ten simulated ant robots covered a factory floor of 25 by 25 meters repeatedly over 85 hours without any ant robots getting stuck.
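The node-counting rule that the abstract says models these ant robots is easy to state: mark the current location, then move to the least-marked adjacent location. A minimal sketch, where `counts` is a dict of visit counts and `neighbors(pos)` is a hypothetical function returning traversable adjacent cells:

```python
import random

def node_counting_step(position, counts, neighbors):
    """One step of node counting: strengthen the trail at the current cell,
    then move to the adjacent cell with the weakest trail (ties broken randomly)."""
    counts[position] = counts.get(position, 0) + 1
    options = neighbors(position)
    least = min(counts.get(c, 0) for c in options)
    return random.choice([c for c in options if counts.get(c, 0) == least])
```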
-
Co-Evolving Team Capture Strategies for Dissimilar Robots by H. Joseph Blumenthal, Gary B. Parker. AAAI, 2004. Evolving team members to act cohesively is a complex and challenging problem. To allow the greatest range of solutions in team problem solving, heterogeneous agents are desirable. To produce highly specialized agents, team members should be evolved in separate populations. Co-evolution in separate populations requires a system for selecting suitable partners for evaluation at trial time. Selecting too many partners for evaluation drives computation time to unreasonable levels, while selecting too few partners blinds the GA from recognizing highly fit individuals. In previous work, we employed a method based on punctuated anytime learning which periodically tests a number of partner combinations to select a single individual from each population to be used at trial time. We began testing our method in simulation using a two-agent box pushing task. We then expanded our research by simulating a predator-prey scenario in which all the agents had the exact same capabilities. In this paper, we report the expansion of our work by applying this method of team learning to five dissimilar robots.
-
Efficient Reward Functions for Adaptive Multi-rover Systems by Kagan Tumer, Adrian Agogino. LAMAS, 2005. This chapter focuses on deriving reward functions that allow multiple agents to co-evolve efficient control policies that maximize a system-level reward in noisy and dynamic environments. The solution we present is based on agent rewards satisfying two crucial properties. First, the agent reward function and global reward function have to be aligned, that is, an agent maximizing its agent-specific reward should also maximize the global reward. Second, the agent has to receive sufficient “signal” from its reward, that is, an agent’s action should have a large influence over its agent-specific reward. Agents using rewards with these two properties will evolve the correct policies quickly. This hypothesis is tested in episodic and non-episodic, continuous-space multi-rover environments where rovers evolve to maximize a global reward function over all rovers. The environments are dynamic (i.e., they change over time), noisy, and have restrictions on communication between agents. We show that a control policy evolved using agent-specific rewards satisfying the above properties outperforms policies evolved using global rewards by up to 400%. More notably, in the presence of a larger number of rovers or rovers with noisy and communication-limited sensors, the proposed method outperforms global reward by a higher percentage than in noise-free conditions with a small number of rovers.
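The abstract does not name a specific construction, but a standard reward satisfying both properties (alignment with the global reward and sensitivity to the agent's own action) is the difference reward D_i = G(z) - G(z with agent i's contribution removed). A minimal sketch, assuming a callable global reward and a dict of per-agent contributions:

```python
def difference_reward(global_reward, joint, agent_id, null_action=None):
    """D_i = G(z) - G(z_-i): agent i's marginal contribution to the global reward.
    Improving D_i improves G (alignment), and D_i responds more sharply to
    agent i's own choice than G does (signal)."""
    counterfactual = dict(joint)
    counterfactual[agent_id] = null_action   # replace agent i's contribution with a null
    return global_reward(joint) - global_reward(counterfactual)
```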
-

Stochastic Games

An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning by Michael Bowling, Manuela Veloso. Carnegie Mellon University, 2000. Learning behaviors in a multiagent environment is crucial for developing and adapting multiagent systems. Reinforcement learning techniques have addressed this problem for a single agent acting in a stationary environment, which is modeled as a Markov decision process (MDP). But, multiagent environments are inherently non-stationary since the other agents are free to change their behavior as they also learn and adapt. Stochastic games, first studied in the game theory community, are a natural extension of MDPs to include multiple agents. In this paper we contribute a comprehensive presentation of the relevant techniques for solving stochastic games from both the game theory and reinforcement learning communities. We examine the assumptions and limitations of these algorithms, and identify similarities between these algorithms, single agent reinforcement learners, and basic game theory techniques.
-
A Multiagent Reinforcement Learning Algorithm using Extended Optimal Response by Nobuo Suematsu, Akira Hayashi. AAMAS, 2002. Stochastic games provide a theoretical framework for multiagent reinforcement learning. Based on the framework, a multiagent reinforcement learning algorithm for zero-sum stochastic games was proposed by Littman and it was extended to general-sum games by Hu and Wellman. Given a stochastic game, if all agents learn with their algorithm, we can expect that the policies of the agents converge to a Nash equilibrium. However, agents with their algorithm always try to converge to a Nash equilibrium independent of the policies used by the other agents. In addition, in case there are multiple Nash equilibria, agents must agree on the equilibrium they want to reach. Thus, their algorithm lacks adaptability in a sense. In this paper, we propose a multiagent reinforcement learning algorithm. The algorithm uses the extended optimal response which we introduce in this paper. It will converge to a Nash equilibrium when other agents are adaptable, otherwise it will make an optimal response. We also provide some empirical results in three simple stochastic games, which show that the algorithm can realize what we intend.
-
A Polynomial-time Nash Equilibrium Algorithm for Repeated Stochastic Games by Enrique Munoz de Cote, Michael L. Littman. Preprint, 2012. We present a polynomial-time algorithm that always finds an (approximate) Nash equilibrium for repeated two-player stochastic games. The algorithm exploits the folk theorem to derive a strategy profile that forms an equilibrium by buttressing mutually beneficial behavior with threats, where possible. One component of our algorithm efficiently searches for an approximation of the egalitarian point, the fairest pareto-efficient solution. The paper concludes by applying the algorithm to a set of grid games to illustrate typical solutions the algorithm finds. These solutions compare very favorably to those found by competing algorithms, resulting in strategies with higher social welfare, as well as guaranteed computational efficiency.
-

Theoretical and Conceptual Frameworks

Distributed reinforcement learning by G. Weiß. Sankt Augustin: Infix Verlag, 1995. Distributed reinforcement learning...
-
Markov games as a framework for multiagent reinforcement learning by Michael L. Littman. ICML, 1994. In the Markov decision process (MDP) formalization of reinforcement learning, a single adaptive agent interacts with an environment defined by a probabilistic transition function. In this solipsistic view, secondary agents can only be part of the environment and are therefore fixed in their behavior. The framework of Markov games allows us to widen this view to include multiple adaptive agents with interacting or competing goals. This paper considers a step in this direction in which exactly two agents with diametrically opposed goals share an environment. It describes a Q-learning-like algorithm for finding optimal policies and demonstrates its application to a simple two-player game in which the optimal policy is probabilistic.
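The Q-learning-like algorithm described here is commonly known as minimax-Q: the value of a state is the value of the zero-sum matrix game defined by Q(s, ., .), which can be obtained with a small linear program. A sketch under those assumptions; the table layouts and SciPy-based game solver are illustrative choices, not the paper's code.

```python
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(Qs):
    """Value and maximin mixed strategy of the zero-sum game Qs[a, o]."""
    n_a, n_o = Qs.shape
    c = np.zeros(n_a + 1)
    c[-1] = -1.0                                     # maximize v == minimize -v
    A_ub = np.hstack([-Qs.T, np.ones((n_o, 1))])     # v - sum_a pi_a * Qs[a, o] <= 0
    A_eq = np.hstack([np.ones((1, n_a)), np.zeros((1, 1))])
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n_o), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * n_a + [(None, None)])
    return res.x[-1], res.x[:-1]                     # game value, maximin policy

def minimax_q_update(Q, V, s, a, o, r, s_next, alpha=0.1, gamma=0.9):
    """Q[s] is an (own actions x opponent actions) array; V maps states to values."""
    Q[s][a, o] += alpha * (r + gamma * V[s_next] - Q[s][a, o])
    V[s], _ = solve_matrix_game(Q[s])
```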
-
Realistic multi-agent reinforcement learning by J Schmidhuber. ECAI, 1996. A "realistic multi-agent reinforcement learning system" consists of multiple "realistic reinforcement learners". Each leads a single life lasting from birth to unknown death. In between it tries to maximize cumulative reward. Its actions and learning algorithms consume part of its life --- computational resources are limited. The expected reward for a certain behaviour may change over time, partly because of the learner's or other learners' actions. Therefore, no learner can expect to collect (and generalize from) voluminous performance statistics. For this reason, previous approaches to multi-agent reinforcement learning are either very limited or heuristic by nature. I introduce a novel, sound, simple principle for learning in such general but typical situations. The principle allows for plugging in a wide variety of learning algorithms. I mention experiments with complex, non-Markovian environments that demonstrate the principle's practical feasibility.
-
Multi-agent learning with the success-story algorithm by J Schmidhuber and J Zhao. LNAI, 1996. We study systems of multiple reinforcement learners. Each leads a single life lasting from birth to unknown death. In between it tries to accelerate reward intake. Its actions and learning algorithms consume part of its life; computational resources are limited. The expected reward for a certain behavior may change over time, partly because of other learners' actions and learning processes. For such reasons, previous approaches to multi-agent reinforcement learning are either limited or heuristic by nature. Using a simple backtracking method called the "success-story algorithm", however, at certain times called evaluation points, each of our learners is able to establish success histories of behavior modifications: it simply undoes all those of the previous modifications that were not empirically observed to trigger lifelong reward accelerations (computation time for learning and testing is taken into account). Then it continues to act and learn until the next evaluation point. Success histories can be enforced despite interference from other learners. The principle allows for plugging in a wide variety of learning algorithms. An experiment illustrates its feasibility.
-
Agents learning about agents: A framework and analysis by J. Vidal and E. Durfee. AAAI, 1997. We provide a framework for the study of learning in certain types of multi-agent systems (MAS), that divides an agent's knowledge about others into different "types". We use concepts from computational learning theory to calculate the relative sample complexities of learning the different types of knowledge, given either a supervised or a reinforcement learning algorithm. These results apply only for the learning of a fixed target function, which would probably not exist if the other agents are also learning. We then show how a changing target function affects the learning behaviors of the agents, and how to determine the advantages of having lower sample complexity. Our results can be used by a designer of a learning agent in a MAS to determine which knowledge he should put into the agent and which knowledge should be learned by the agent.
-
The moving target function problem in multiagent learning by J. Vidal and E. Durfee. IEEE, 1998. We describe a framework that can be used to model and predict the behavior of MASs with learning agents. It uses a difference equation for calculating the progression of an agent's error in its decision function, thereby telling us how the agent is expected to fare in the MAS. The equation relies on parameters which capture the agents' learning abilities (such as its change rate, learning rate and retention rate) as well as relevant aspects of the MAS (such as the impact that agents have on each other). We validate the framework with experimental results using reinforcement learning agents in a market system, as well as by other experimental results gathered from the AI literature.
-
What is 'multi' in multi-agent learning? by G. Weiß and P. Dillenbourg. Pergamon Press, 1999. Learning in multi-agent environments constitutes a research and application area whose importance is broadly acknowledged in artificial intelligence. Although there is a rapidly growing body of literature on multi-agent learning, almost nothing is known about the intrinsic nature of and requirements for this kind of learning. This observation is the starting point of this chapter which aims at providing a more general characterization of multi-agent learning. This is done in an interdisciplinary way from two different perspectives: the perspective of single-agent learning (the 'machine learning perspective') and the perspective of human-human collaborative learning (the 'psychological perspective'). The former leads to a 'positive' characterization: three types of learning mechanisms - multiplication, division, and interaction - are identified and illustrated that can occur in multi-agent but not in single-agent settings. The latter leads to a 'negative' characterization: several cognitive processes like conflict resolution, mutual regulation, and explanation are identified and discussed that are most essential to human-human collaborative learning, but have been largely ignored so far in the available multi-agent learning approaches. Misunderstanding among humans is identified as a major source of these processes, and its important role in the context of multiagent systems is stressed. This chapter also offers a brief guide to agents and multi-agent systems as studied in artificial intelligence, and suggests directions for future research on multi-agent learning.
-
Predicting the expected behavior of agents that learn about agents: the CLRI framework by J. Vidal and E. Durfee. Springer, 2003. We describe a framework and equations used to model and predict the behavior of multi-agent systems (MASs) with learning agents. A difference equation is used for calculating the progression of an agent’s error in its decision function, thereby telling us how the agent is expected to fare in the MAS. The equation relies on parameters which capture the agent's learning abilities, such as its change rate, learning rate and retention rate, as well as relevant aspects of the MAS such as the impact that agents have on each other. We validate the framework with experimental results using reinforcement learning agents in a market system, as well as with other experimental results gathered from the AI literature. Finally, we use PAC-theory to show how to calculate bounds on the values of the learning parameters.
-
Adversarial reinforcement learning by W. Uther and M. Veloso. Carnegie Mellon University, 2003. Reinforcement Learning has been used for a number of years in single-agent environments. This article reports on our investigation of Reinforcement Learning techniques in a multi-agent and adversarial environment with continuous observable state information. We introduce a new framework, two-player hexagonal grid soccer, in which to evaluate algorithms. We then compare the performance of several single-agent Reinforcement Learning techniques in that environment. These are further compared to a previously developed adversarial Reinforcement Learning algorithm designed for Markov games. Building upon these efforts, we introduce new algorithms to handle the multi-agent, the adversarial, and the continuous-valued aspects of the domain. We introduce a technique for modelling the opponent in an adversarial game. We introduce an extension to Prioritized Sweeping that allows generalization of learnt knowledge over neighboring states in the domain; and we introduce an extension to the U Tree generalizing algorithm that allows the handling of continuous state spaces. Extensive empirical evaluation is conducted in the grid soccer domain.
-
Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm by Junling Hu, Michael P. Wellman. ICML, 1998. In this paper, we adopt general-sum stochastic games as a framework for multiagent reinforcement learning. Our work extends previous work by Littman on zero-sum stochastic games to a broader framework. We design a multiagent Q-learning method under this framework, and prove that it converges to a Nash equilibrium under specified conditions. This algorithm is useful for finding the optimal strategy when there exists a unique Nash equilibrium in the game. When there exist multiple Nash equilibria in the game, this algorithm should be combined with other learning techniques to find optimal strategies.
-
Multiagent reinforcement learning in stochastic games by Hu, J., Wellman, M.P. Cambridge University Press, 1999. We adopt stochastic games as a general framework for dynamic noncooperative systems. This framework provides a way of describing the dynamic interactions of agents in terms of individuals' Markov decision processes. By studying this framework, we go beyond the common practice in the study of learning in games, which primarily focuses on repeated games or extensive-form games. For stochastic games with incomplete information, we design a multiagent reinforcement learning method which allows agents to learn Nash equilibrium strategies. We show in both theory and experiments that this algorithm converges. From the viewpoint of machine learning research, our work helps to establish the theoretical foundation for applying reinforcement learning, originally defined for single-agent systems, to multiagent systems.
-
Distributed Value functions by J Schneider, WK Wong, A Moore and M Riedmiller. Carnegie Mellon University, 1999. Many interesting problems, such as power grids, network switches, and traffic flow, that are candidates for solving with reinforcement learning (RL), also have properties that make distributed solutions desirable. We propose an algorithm for distributed reinforcement learning based on distributing the representation of the value function across nodes. Each node in the system only has the ability to sense state locally, choose actions locally, and receive reward locally (the goal of the system is to maximise the sum of rewards over all nodes and over all time). However each node is allowed to give its neighbors the current estimate of its value function for the states it passes through. We present a value function learning rule, using that information, that allows each node to learn a value function that is an estimate of a weighted sum of the future rewards for all the nodes in the network. With this representation, each node can choose actions to improve the performance of the overall system.
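A minimal sketch of the neighbour-weighted value update described here. The data layout (per-node value tables, one weight per neighbour summing to one) is an assumption made for illustration.

```python
def dvf_update(V, node, state, reward, next_states, weights, alpha=0.1, gamma=0.9):
    """Distributed value-function update for one node (sketch).
    V[j]: node j's value table over its own local states.
    next_states[j]: the local state neighbour j reports after the transition.
    weights[j]: how much this node weighs neighbour j's estimate (sums to 1)."""
    neighbour_estimate = sum(w * V[j][next_states[j]] for j, w in weights.items())
    V[node][state] += alpha * (reward + gamma * neighbour_estimate - V[node][state])
```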
-
Incremental Reinforcement Learning for Designing Multi-Agent Systems by Olivier Buffet, Alain Dutech, François Charpillet. AGENTS, 2001. Complex systems seem more easily described and controlled if seen as Multi-Agent Systems (MAS): constellations of satellites, multi-robot applications... The design of a MAS is usually built "by hand", with the need of repeated simulations to tune the system. Here, we propose to build the system by having each agent locally learn its own behavior. The use of Reinforcement Learning (RL) in the context of MAS suffers from several limitations which can make the learning task nearly impossible: Combinatorial explosion. The computational burden of RL algorithms grows exponentially with the number of states and actions in the system. Hidden global state. Usual situated agents can only rely on an imperfect, local and partial perception of their environment. Credit assignment problem. When a positive reward is given by the environment, it is not always evident to credit positively the "good" actions that led to this reward. Our answer to these problems is to use a decentralized adapted incremental learning algorithm based on a classical RL technique.
-
Learning to share meaning in a multi-agent system by A. Williams. Springer, 2004. The development of the semantic Web will require agents to use common domain ontologies to facilitate communication of conceptual knowledge. However, the proliferation of domain ontologies may also result in conflicts between the meanings assigned to the various terms. That is, agents with diverse ontologies may use different terms to refer to the same meaning or the same term to refer to different meanings. Agents will need a method for learning and translating similar semantic concepts between diverse ontologies. Only recently have researchers diverged from the last decade's "common ontology" paradigm to a paradigm involving agents that can share knowledge using diverse ontologies. This paper describes how we address this agent knowledge sharing problem of how agents deal with diverse ontologies by introducing a methodology and algorithms for multi-agent knowledge sharing and learning in a peer-to-peer setting. We demonstrate how this approach will enable multi-agent systems to assist groups of people in locating, translating, and sharing knowledge using our Distributed Ontology Gathering Group Integration Environment (DOGGIE) and describe our proof-of-concept experiments. DOGGIE synthesizes agent communication, machine learning, and reasoning for information sharing in the Web domain.
-

Theses

Layered Learning in Multi-Agent Systems by Peter Stone. PhD thesis, 1998. Multi-agent systems in complex, real-time domains require agents to act effectively both autonomously and as part of a team. This dissertation addresses multi-agent systems consisting of teams of autonomous agents acting in real-time, noisy, collaborative, and adversarial environments. Because of the inherent complexity of this type of multi-agent system, this thesis investigates the use of machine learning within multi-agent systems. The dissertation makes four main contributions to the fields of Machine Learning and Multi-Agent Systems. First, the thesis defines a team member agent architecture within which a flexible team structure is presented, allowing agents to decompose the task space into flexible roles and allowing them to smoothly switch roles while acting. Team organization is achieved by the introduction of a locker-room agreement as a collection of conventions followed by all team members. It defines agent roles, team formations, and pre-compiled multi-agent plans. In addition, the team member agent architecture includes a communication paradigm for domains with single-channel, low-bandwidth, unreliable communication. The communication paradigm facilitates team coordination while being robust to lost messages and active interference from opponents. Second, the thesis introduces layered learning, a general-purpose machine learning paradigm for complex domains in which learning a mapping directly from agents' sensors to their actuators is intractable. Given a hierarchical task decomposition, layered learning allows for learning at each level of the hierarchy, with learning at each level directly affecting learning at the next higher level. Third, the thesis introduces a new multi-agent reinforcement learning algorithm, namely team-partitioned, opaque-transition reinforcement learning (TPOT-RL). TPOT-RL is designed for domains in which agents cannot necessarily observe the state changes when other team members act. It exploits local, action-dependent features to aggressively generalize its input representation for learning and partitions the task among the agents, allowing them to simultaneously learn collaborative policies by observing the long-term effects of their actions. Fourth, the thesis contributes a fully functioning multi-agent system that incorporates learning in a real-time, noisy domain with teammates and adversaries. Detailed algorithmic descriptions of the agents' behaviors as well as their source code are included in the thesis. Empirical results validate all four contributions within the simulated robotic soccer domain. The generality of the contributions is verified by applying them to the real robotic soccer and network routing domains. Ultimately, this dissertation demonstrates that by learning portions of their cognitive processes, selectively communicating, and coordinating their behaviors via common knowledge, a group of independent agents can work towards a common goal in a complex, real-time, noisy, collaborative, and adversarial environment.
-
Multiagent Learning in the Presence of Agents with Limitations by Michael Bowling. Thesis, 2003. Learning to act in a multiagent environment is a challenging problem. Optimal behavior for one agent depends upon the behavior of the other agents, which are learning as well. Multiagent environments are therefore non-stationary, violating the traditional assumption underlying single-agent learning. In addition, agents in complex tasks may have limitations, such as physical constraints or designer-imposed approximations of the task that make learning tractable. Limitations prevent agents from acting optimally, which complicates the already challenging problem. A learning agent must effectively compensate for its own limitations while exploiting the limitations of the other agents. My thesis research focuses on these two challenges, namely multiagent learning and limitations, and includes four main contributions. First, the thesis introduces the novel concepts of a variable learning rate and the WoLF (Win or Learn Fast) principle to account for other learning agents. The WoLF principle is capable of making rational learning algorithms converge to optimal policies, and by doing so achieves two properties, rationality and convergence, which had not been achieved by previous techniques. The converging effect of WoLF is proven for a class of matrix games, and demonstrated empirically for a wide range of stochastic games. Second, the thesis contributes an analysis of the effect of limitations on the game-theoretic concept of Nash equilibria. The existence of equilibria is important if multiagent learning techniques, which often depend on the concept, are to be applied to realistic problems where limitations are unavoidable. The thesis introduces a general model for the effect of limitations on agent behavior, which is used to analyze the resulting impact on equilibria. The thesis shows that equilibria do exist for a few restricted classes of games and limitations, but even well-behaved limitations do not preserve the existence of equilibria, in general. Third, the thesis introduces GraWoLF, a general-purpose, scalable, multiagent learning algorithm. GraWoLF combines policy gradient learning techniques with the WoLF variable learning rate. The effectiveness of the learning algorithm is demonstrated in both a card game with an intractably large state space, and an adversarial robot task. These two tasks are complex and agent limitations are prevalent in both. Fourth, the thesis describes the CMDragons robot soccer team strategy for adapting to an unknown opponent. The strategy uses a notion of plays as coordinated team plans. The selection of team plans is the decision point for adapting the team to its current opponent, based on the outcome of previously executed plays. The CMDragons were the first RoboCup robot team to employ online learning to autonomously alter its behavior during the course of a game. These four contributions demonstrate that it is possible to effectively learn to act in the presence of other learning agents in complex domains when agents may have limitations. The introduced learning techniques are proven effective in a class of small games, and demonstrated empirically across a wide range of settings that increase in complexity.
-
An Analysis of Cooperative Coevolutionary Algorithms by R. Paul Wiegand. Thesis, 2003. Coevolutionary algorithms behave in very complicated, often quite counterintuitive ways. Researchers and practitioners have yet to understand why this might be the case, how to change their intuition by understanding the algorithms better, and what to do about the differences. Unfortunately, there is little existing theory available to researchers to help address these issues. Further, little empirical analysis has been done at a component level to help understand intrinsic differences and similarities between coevolutionary algorithms and more traditional evolutionary algorithms. Finally, attempts to categorize coevolution and coevolutionary behaviors remain vague and poorly defined at best. The community needs directed investigations to help practitioners understand what particular coevolutionary algorithms are good at, what they are not, and why. This dissertation improves our understanding of coevolution by posing and answering the question: “Are cooperative coevolutionary algorithms (CCEAs) appropriate for static optimization tasks?” Two forms of this question are “How long do they take to reach the global optimum” and “How likely are they to get there?” The first form of the question is addressed by analyzing their performance as optimizers, both theoretically and empirically. This analysis includes investigations into the effects of coevolution-specific parameters on optimization performance in the context of particular properties of potential problem domains. The second leg of this dissertation considers the second form of the question by looking at the dynamical properties of these algorithms, analyzing their limiting behaviors again from theoretical and empirical points of view. Two common cooperative coevolutionary pathologies are explored and illustrated, in both formal and practical settings. The result is a better understanding of, and appreciation for, the fact that CCEAs are not generally appropriate for the task of static, single-objective optimization. In the end a new view of the CCEA is offered that includes analysis-guided suggestions for how a traditional CCEA might be modified to be better suited for optimization tasks, or might be applied to more appropriate tasks, given the nature of its dynamics.
-
Planning under uncertainty in complex structured environments by Guestrin CE. Stanford University, 2003. Many real-world tasks require multiple decision makers (agents) to coordinate their actions in order to achieve common long-term goals. Examples include: manufacturing systems, where managers of a factory coordinate to maximize profit; rescue robots that, after an earthquake, must safely find victims as fast as possible; or sensor networks, where multiple sensors collaborate to perform a large-scale sensing task under strict power constraints. All of these tasks require the solution of complex long-term multiagent planning problems in uncertain dynamic environments.
Factored Markov decision processes (MDPs) allow us to represent complex uncertain dynamic systems very compactly by exploiting problem-specific structure. Specifically, the state of the system is described by a set of variables that evolve stochastically over time using a compact representation called a dynamic Bayesian network (DBN). A DBN exploits locality by assuming that the short-term evolution of a particular variable only depends on a few other variables, e.g., the state of a section of a factory is only directly affected by a few other sections. In the long-term, all variables in a DBN usually become correlated. Factored MDPs often allow for an exponential reduction in representation complexity. However, the complexity of exact solution algorithms for such MDPs grows exponentially in the number of variables, and in the number of agents.
This thesis builds a formal framework and approximate planning algorithms that exploit structure in factored MDPs to solve problems with many trillions of states and actions very efficiently.
-
Multiagent Reinforcement Learning in Markov Games: Asymmetric and Symmetric approaches by Ville Kononen. PhD dissertation, 2004. Modern computing systems are distributed, large, and heterogeneous. Computers, other information processing devices and humans are very tightly connected with each other and therefore it would be preferable to handle these entities more as agents than stand-alone systems. One of the goals of artificial intelligence is to understand interactions between entities, whether they are artificial or natural, and to suggest how to make good decisions while taking other decision makers into account. In this thesis, these interactions between intelligent and rational agents are modeled with Markov games and the emphasis is on adaptation and learning in multiagent systems. Markov games are a general mathematical tool for modeling interactions between multiple agents. The model is very general, for example common board games are special instances of Markov games, and particularly interesting because it forms an intersection of two distinct research disciplines: machine learning and game theory. Markov games extend Markov decision processes, a well-known tool for modeling single-agent problems, to multiagent domains. On the other hand, Markov games can be seen as a dynamic extension to strategic form games, which are standard models in traditional game theory. From the computer science perspective, Markov games provide a flexible and efficient way to describe different social interactions between intelligent agents. This thesis studies different aspects of learning in Markov games. From the machine learning perspective, the focus is on a very general learning model, i.e. reinforcement learning, in which the goal is to maximize the long-time performance of the learning agent. The thesis introduces an asymmetric learning model that is computationally efficient in multiagent systems and enables the construction of different agent hierarchies. In multiagent reinforcement learning systems based on Markov games, the space and computational requirements grow very quickly with the number of learning agents and the size of the problem instance. Therefore, it is necessary to use function approximators, such as neural networks, to model agents in many real-world applications. In this thesis, various numeric learning methods are proposed for multiagent learning problems. The proposed methods are tested with small but non-trivial example problems from different research areas including artificial robot navigation, simplified soccer game, and automated pricing models for intelligent agents. The thesis also contains an extensive literature survey on multiagent reinforcement learning and various methods based on Markov games. Additionally, game-theoretic methods and methods originated from computer science for multiagent learning and decision making are compared.
-

Applications

Traffic Management

Multiagent Traffic Management: Opportunities for Multiagent Learning by Kurt Dresner, Peter Stone. LAMAS, 2005. Traffic congestion is one of the leading causes of lost productivity and decreased standard of living in urban settings. In previous work published at AAMAS, we have proposed a novel reservation-based mechanism for increasing throughput and decreasing delays at intersections. In more recent work, we have provided a detailed protocol by which two different classes of agents (intersection managers and driver agents) can use this system. We believe that the domain created by this mechanism and protocol presents many opportunities for multiagent learning on the parts of both classes of agents. In this paper, we identify several of these opportunities and offer a first-cut approach to each.
-

Network Routing

Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach by Justin A. Boyan, Michael L. Littman. NeurIPS, 1994. This paper describes the Q-routing algorithm for packet routing, in which a reinforcement learning module is embedded into each node of a switching network. Only local communication is used by each node to keep accurate statistics on which routing decisions lead to minimal delivery times. In simple experiments involving a 36-node, irregularly connected network, Q-routing proves superior to a nonadaptive algorithm based on precomputed shortest paths and is able to route efficiently even when critical aspects of the simulation, such as the network load, are allowed to vary dynamically. The paper concludes with a discussion of the tradeoff between discovering shortcuts and maintaining stable policies.
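The core Q-routing update is compact enough to sketch. Below, `Q[x][(dest, y)]` estimates the time for a packet at node x bound for `dest` to be delivered if forwarded to neighbor y; variable names are illustrative, but the update follows the rule Boyan and Littman describe.

```python
# Minimal sketch of the Q-routing update (names are illustrative, not from the paper's code).
def q_routing_update(Q, neighbors, x, y, dest, queue_time, transit_time, eta=0.5):
    """Node x forwarded a packet bound for `dest` to neighbor y.

    Q[node][(dest, nbr)] : estimated delivery time if `node` forwards a
                           packet for `dest` via `nbr`.
    queue_time  : time the packet waited in x's queue.
    transit_time: time to transmit the packet from x to y.
    """
    # Neighbor y reports its best estimate of the remaining delivery time.
    remaining = min(Q[y][(dest, z)] for z in neighbors[y])
    # One-step temporal-difference correction of x's estimate.
    target = queue_time + transit_time + remaining
    Q[x][(dest, y)] += eta * (target - Q[x][(dest, y)])


def route(Q, neighbors, x, dest):
    """Greedy routing decision: forward to the neighbor with the lowest estimate."""
    return min(neighbors[x], key=lambda y: Q[x][(dest, y)])
```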
-
Ants and reinforcement learning: A case study in routing in dynamic networks by D. Subramanian, P. Druschel, and J. Chen. IJCAI, 1997. We investigate two new distributed routing algorithms for data networks based on simple biological "ants" that explore the network and rapidly learn good routes, using a novel variation of reinforcement learning. These two algorithms are fully adaptive to topology changes and changes in link costs in the network, and have space and computational overheads that are competitive with traditional packet routing algorithms: although they can generate more routing traffic when the rate of failures in a network is low, they perform much better under higher failure rates. Both algorithms are more resilient than traditional algorithms, in the sense that random corruption of routing has limited impact on the computation of paths. We present convergence theorems for both of our algorithms drawing upon the theory of non-stationary and stationary discrete-time Markov chains over the reals.
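As a generic illustration of the ant-style idea (not necessarily the paper's exact rule), each node keeps a probabilistic routing table, and an ant that arrives from a neighbor reinforces that neighbor in proportion to how cheap its sampled path was:

```python
# Hedged sketch of a probabilistic ("ant") routing-table update; the reinforcement
# form and constant k are illustrative assumptions, not the paper's exact algorithm.
def ant_update(table, node, source, via, path_cost, k=1.0):
    """table[node][source] maps each neighbor of `node` to the probability of
    forwarding traffic destined for `source` through it. An ant that originated
    at `source` arrives at `node` via neighbor `via`, carrying `path_cost`."""
    probs = table[node][source]
    delta = k / path_cost                 # cheaper sampled paths give larger reinforcement
    probs[via] = (probs[via] + delta) / (1.0 + delta)
    for nbr in probs:
        if nbr != via:
            probs[nbr] = probs[nbr] / (1.0 + delta)   # renormalize the remaining entries
```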
-

Scheduling

Multi-Machine Scheduling - A Multi-Agent Learning Approach by Wilfried Brauer, Gerhard Weiß. International Conference on Multi-Agent Systems, 1998. Multi-machine scheduling, that is, the assignment of jobs to machines such that certain performance demands like cost and time effectiveness are fulfilled, is a ubiquitous and complex activity in everyday life. This paper presents an approach to multi-machine scheduling that follows the multiagent learning paradigm known from the field of Distributed Artificial Intelligence. According to this approach the machines collectively and as a whole learn and iteratively refine appropriate schedules. The major characteristic of this approach is that learning is distributed over several machines, and that the individual machines carry out their learning activities in a parallel and asynchronous way.
-

Telecommunications

Application of distributed AI and cooperative problem solving to telecommunications by R. Weihmayer and H. Velthuijsen. IOS Press, 1994. A cursory survey of applications in the DAI literature suggests that a primary area of using DAI technology for real-world systems can be found in telecommunications. This paper explores the current state-of-the-art of DAI and cooperative problem solving in this domain. Telecommunication networks have proven in the last five years to be a fertile ground for problems involving the coordination of distributed intelligence. A wide range of problems from distributed traffic management to resolution of service interactions in intelligent networks have led to conceptual studies and prototype developments involving systems of cooperative agents. Although there are currently no fielded systems that can be said to be DAI-based, we believe these will begin to appear within the next five years. We give an overview of the full range of potential applications found in the literature and additionally consider four applications in more detail, including the authors' contributions to the field.
-

Cognitive Science

Sociobiology: The New Synthesis by E. Wilson. Belknap Press, 1975. Sociobiology was brought together as a coherent discipline in Sociobiology: The New Synthesis (1975), the book now reprinted, but it was originally conceived in my earlier work The Insect Societies (1971) as a union between entomology and population biology. This first step was entirely logical, and in retrospect, inevitable. In the 1950s and 1960s studies of the social insects had multiplied and attained a new but still unorganized level.
-
To help or not to help by M. Sekaran and S. Sen. Conference of the Cognitive Science Society, 1995. Any designer of intelligent agents in a multiagent system is faced with the choice of encoding a strategy of interaction with other agents. If the nature of other agents is known in advance, a suitable strategy may be chosen from the continuum between completely selfish behavior on one extreme and a philanthropic behavior on the other. In an open and dynamic system, however, it is unrealistic to assume that the nature of all other agents, possibly designed and used by users with very different goals and motivations, is known precisely. In the presence of this uncertainty, is it possible to build agents that adapt their behavior to interact appropriately with the particular group of agents in the current scenario? We address this question by borrowing from the simple yet powerful concept of reciprocal behavior. We propose a stochastic decision making scheme which promotes reciprocity among agents. Using a package delivery problem, we show that reciprocal behavior can lead to system-wide cooperation, and hence close to optimal global performance can be achieved even though each individual agent chooses actions to benefit itself. More interestingly, we show that agents who do not help others perform worse in the long run when compared with reciprocal agents. Thus it is in the best interest of every individual agent to help other agents.
-
Multiagent systems: Milestones and new horizons by S. Sen. Elsevier, 1997. Research in multiagent systems (MAS), or Distributed AI, dates back to the late 1970s. Initial work in the area focused on distributed interpretation of sensor data, organizational structuring, and generic negotiation protocols. But several recent developments have helped reshape the focus of the field. Like the rest of AI, the field has matured from being largely exploratory in nature to focusing on formal theories of negotiation, distributed reasoning, multiagent learning, and communication languages. The field is also maturing to the point of developing its first few fielded applications. The recent widespread interest in the internet, the world-wide-web, and intelligent agent applications has further fueled the need for techniques and mechanisms by which agents representing users can effectively interact with other agents in open dynamic environments. The development of several new international workshops and conferences has helped focus research in the area. The field is poised at a critical juncture, with stimulating problems and challenges promising some very exciting developments in the next few years.
-

Software and Frameworks

Keepaway soccer: A machine learning testbed by P. Stone and R. Sutton. In A. Birk, S. Coradeschi, and S. Tadokoro, editors. Springer, 2002. Keepaway soccer has been previously put forth as a testbed for machine learning. Although multiple researchers have used it successfully for machine learning experiments, doing so has required a good deal of domain expertise. This paper introduces a set of programs, tools, and resources designed to make the domain easily usable for experimentation without any prior knowledge of RoboCup or the Soccer Server. In addition, we report on new experiments in the Keepaway domain, along with performance results designed to be directly comparable with future experimental results. Combined, the new infrastructure and our concrete demonstration of its use in comparative experiments elevate the domain to a machine learning benchmark, suitable for use by researchers across the field.
-

Commerce and Economics

Characterization of satisfactory mechanisms for the revelation of preferences for public goods by Green J, Laffont JJ. Econometrica: Journal of the Econometric Society, 1977. Problems connected with social decision processes in models with public goods have troubled economists for some time. Recently a negative result of Gibbard and Satterthwaite precluded the possibility of finding non-dictatorial deterministic mechanisms for choosing social states which have the property that individuals believe it to be impossible to manipulate the mechanism to their own advantage. They may, in particular, reveal preferences other than their own, and the resulting social choice may then be distorted away from the Pareto optimum relative to their true tastes. Gibbard and Satterthwaite's requirements are quite strong. In particular, arbitrary individual preferences are allowed. In a more specialized context, where the decision concerns the level of public goods and monetary transfers among individuals, Groves and Groves and Loeb assumed that preferences are monotonic in income and that the willingness-to-pay for alternative levels of the public good is independent of income. In such environments they found a class of mechanisms with the properties that stating one's true preferences is a dominant strategy for each individual and that a Pareto optimum is selected. In this paper, we show that the mechanisms proposed by Groves and Loeb are the only ones which have these desirable characteristics. This result enables us to concentrate the search for optimal mechanisms within this class and to use criteria other than individual incentive compatibility to distinguish among these. We have pursued this direction in other papers. In addition, we show that well-defined mechanisms which select Pareto optimal outcomes (referred to as satisfactory mechanisms), independently of the question of truthful revelation, are essentially isomorphic to the mechanisms proposed by Groves.
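For concreteness, the Groves class characterized in this paper has a simple closed form (standard notation, not quoted from the paper): the mechanism picks the decision maximizing reported total value, and pays each agent the reported value of everyone else plus an arbitrary term that does not depend on the agent's own report.

```latex
% Groves mechanisms (standard form). k is the public decision, theta_i agent i's report.
\[
  k^*(\theta) \in \arg\max_{k} \sum_{j} v_j(k, \theta_j),
  \qquad
  t_i(\theta) \;=\; \sum_{j \neq i} v_j\bigl(k^*(\theta), \theta_j\bigr) \;+\; h_i(\theta_{-i}).
\]
% Truth-telling is a dominant strategy because agent i's utility
% v_i(k^*, theta_i) + t_i equals total reported welfare plus a term outside i's control.
```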
-
Pricing in agent economies using multi-agent Q-learning by G. Tesauro and J. O. Kephart. Autonomous Agents and Multi-Agent Systems, Springer, 2002. This paper investigates how adaptive software agents may utilize reinforcement learning algorithms such as Q-learning to make economic decisions such as setting prices in a competitive marketplace. For a single adaptive agent facing fixed-strategy opponents, ordinary Q-learning is guaranteed to find the optimal policy. However, for a population of agents each trying to adapt in the presence of other adaptive agents, the problem becomes non-stationary and history dependent, and it is not known whether any global convergence will be obtained, and if so, whether such solutions will be optimal. In this paper, we study simultaneous Q-learning by two competing seller agents in three moderately realistic economic models. This is the simplest case in which interesting multi-agent phenomena can occur, and the state space is small enough so that lookup tables can be used to represent the Q-functions. We find that, despite the lack of theoretical guarantees, simultaneous convergence to self-consistent optimal solutions is obtained in each model, at least for small values of the discount parameter. In some cases, exact or approximate convergence is also found even at large discount parameters. We show how the Q-derived policies increase profitability and damp out or eliminate cyclic price “wars” compared to simpler policies based on zero lookahead or short-term lookahead. In one of the models (the "Shopbot" model) where the sellers’ profit functions are symmetric, we find that Q-learning can produce either symmetric or broken-symmetry policies, depending on the discount parameter and on initial conditions.
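A minimal tabular version of this setup might look like the sketch below: two sellers alternate price updates, each treating the rival's current price as its state. The demand/profit model, parameters, and the simplification of using the rival's pre-move price as the next state are placeholders of mine, not the paper's economic models.

```python
import random
from collections import defaultdict

# Hedged sketch of simultaneous Q-learning by two pricing agents on a small price grid.
# The toy profit model below is a placeholder, not one of the paper's three models.
PRICES = [round(0.1 * i, 1) for i in range(1, 11)]        # price grid 0.1 .. 1.0

def profit(my_price, rival_price):
    # Toy demand: the cheaper seller captures most of the market.
    share = 0.8 if my_price < rival_price else (0.5 if my_price == rival_price else 0.2)
    return share * my_price * (1.0 - my_price)

def choose(Q, state, eps):
    if random.random() < eps:
        return random.choice(PRICES)                       # epsilon-greedy exploration
    return max(PRICES, key=lambda p: Q[(state, p)])

def run(episodes=50_000, alpha=0.1, gamma=0.5, eps=0.1):
    Q = [defaultdict(float), defaultdict(float)]           # one Q-table per seller
    prices = [random.choice(PRICES), random.choice(PRICES)]
    for _ in range(episodes):
        for i in (0, 1):                                   # sellers move in alternation
            state = prices[1 - i]                          # state = rival's current price
            action = choose(Q[i], state, eps)
            reward = profit(action, prices[1 - i])
            # Simplification: the rival has not yet moved, so reuse its current price.
            best_next = max(Q[i][(state, p)] for p in PRICES)
            Q[i][(state, action)] += alpha * (reward + gamma * best_next - Q[i][(state, action)])
            prices[i] = action
    return Q, prices
```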
-
Combinatorial information market design by Hanson R. Information Systems Frontiers, 2003. Information markets are markets created to aggregate information. Such markets usually estimate a probability distribution over the values of certain variables, via bets on those values. Combinatorial information markets would aggregate information on the entire joint probability distribution over many variables, by allowing bets on all variable value combinations. To achieve this, we want to overcome the thin market and irrational participation problems that plague standard information markets. Scoring rules avoid these problems, but instead suffer from opinion pooling problems in the thick market case. Market scoring rules avoid all these problems, by becoming automated market makers in the thick market case and simple scoring rules in the thin market case. Logarithmic versions have cost and modularity advantages. After introducing market scoring rules, we consider several design issues, including how to represent variables to support both conditional and unconditional estimates, how to avoid becoming a money pump via errors in calculating probabilities, and how to ensure that users can cover their bets, without needlessly preventing them from using previous bets as collateral for future bets.
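The logarithmic market scoring rule mentioned here has a well-known closed form; the sketch below shows the cost function, the implied prices, and what a trader pays to move the outstanding quantities. The liquidity parameter `b` is the only free parameter, and the variable names are mine.

```python
import math

# Logarithmic market scoring rule (LMSR) market maker -- a standard formulation
# of the mechanism Hanson describes.
def cost(q, b):
    """Cost potential C(q) = b * log(sum_i exp(q_i / b)) over outstanding shares q."""
    m = max(q)                                    # subtract the max for numerical stability
    return m + b * math.log(sum(math.exp((qi - m) / b) for qi in q))

def prices(q, b):
    """Instantaneous price of each outcome: p_i = exp(q_i/b) / sum_j exp(q_j/b)."""
    m = max(q)
    exps = [math.exp((qi - m) / b) for qi in q]
    z = sum(exps)
    return [e / z for e in exps]

def trade_cost(q, delta, b):
    """Amount a trader pays to move outstanding shares from q to q + delta."""
    q_new = [qi + di for qi, di in zip(q, delta)]
    return cost(q_new, b) - cost(q, b)

# Example: three mutually exclusive outcomes, buy 10 shares of outcome 0.
q0 = [0.0, 0.0, 0.0]
print(prices(q0, b=100.0))                        # uniform prices initially
print(trade_cost(q0, [10.0, 0.0, 0.0], b=100.0))  # slightly more than 10/3
```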
-
Efficiency loss in a network resource allocation game by Johari R, Tsitsiklis JN. Mathematics of Operations Research, 2004. We explore the properties of a congestion game in which users of a congested resource anticipate the effect of their actions on the price of the resource. When users are sharing a single resource, we establish that the aggregate utility received by the users is at least 3/4 of the maximum possible aggregate utility. We also consider extensions to a network context, where users submit individual payments for each link in the network they may wish to use. In this network model, we again show that the selfish behavior of the users leads to an aggregate utility that is no worse than 3/4 of the maximum possible aggregate utility. We also show that the same analysis extends to a wide class of resource allocation systems where end users simultaneously require multiple scarce resources. These results form part of a growing literature on the “price of anarchy,” i.e., the extent to which selfish behavior affects system efficiency.
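The single-resource mechanism behind the 3/4 bound is the proportional allocation (Kelly) mechanism, which can be stated in one line; the summary below uses standard notation rather than the paper's exact statement.

```latex
% Proportional allocation of a capacity-C resource: user i submits a bid w_i >= 0
% and pays w_i; the resource is divided in proportion to the bids.
\[
  x_i \;=\; \frac{w_i}{\sum_j w_j}\, C,
  \qquad
  \text{payoff}_i \;=\; U_i(x_i) - w_i .
\]
% At any Nash equilibrium of the resulting game with price-anticipating users,
% \[ \sum_i U_i(x_i) \;\ge\; \tfrac{3}{4}\, \max_{\sum_i y_i \le C,\; y \ge 0} \sum_i U_i(y_i). \]
```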
-
Worst-case optimal redistribution of VCG payments by Guo M, Conitzer V. In Proceedings of the 8th ACM conference on Electronic commerce, 2007. For allocation problems with one or more items, the well-known Vickrey-Clarke-Groves (VCG) mechanism is efficient, strategy-proof, individually rational, and does not incur a deficit. However, the VCG mechanism is not (strongly) budget balanced: generally, the agents’ payments will sum to more than 0. If there is an auctioneer who is selling the items, this may be desirable, because the surplus payment corresponds to revenue for the auctioneer. However, if the items do not have an owner and the agents are merely interested in allocating the items efficiently among themselves, any surplus payment is undesirable, because it will have to flow out of the system of agents. In 2006, Cavallo proposed a mechanism that redistributes some of the VCG payment back to the agents, while maintaining efficiency, strategy-proofness, individual rationality, and the non-deficit property. In this paper, we extend this result in a restricted setting. We study allocation settings where there are multiple indistinguishable units of a single good, and agents have unit demand. (For this specific setting, Cavallo’s mechanism coincides with a mechanism proposed by Bailey in 1997.) Here we propose a family of mechanisms that redistribute some of the VCG payment back to the agents. All mechanisms in the family are efficient, strategy-proof, individually rational, and never incur a deficit. The family includes the Bailey-Cavallo mechanism as a special case. We then provide an optimization model for finding the optimal mechanism — that is, the mechanism that maximizes redistribution in the worst case — inside the family, and show how to cast this model as a linear program. We give both numerical and analytical solutions of this linear program, and the (unique) resulting mechanism shows significant improvement over the Bailey-Cavallo mechanism (in the worst case). Finally, we prove that the obtained mechanism is optimal among all anonymous deterministic mechanisms that satisfy the above properties.
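The Bailey-Cavallo baseline that this paper improves on is easy to state for the multi-unit, unit-demand setting: run VCG, then rebate to each agent 1/n of the VCG revenue the market would have generated without that agent. The sketch below follows that definition directly; it illustrates the baseline mechanism, not the paper's worst-case-optimal family.

```python
# Illustrative implementation of VCG plus the Bailey-Cavallo rebate for m identical
# units and unit-demand bidders (the baseline this paper improves on in the worst case).
def vcg_revenue(bids, m):
    """Total VCG revenue: each of the m winners pays the (m+1)-th highest bid."""
    s = sorted(bids, reverse=True)
    if len(s) <= m:
        return 0.0                       # no competition beyond the available units
    return m * s[m]                      # s[m] is the (m+1)-th highest bid

def bailey_cavallo(bids, m):
    """Each agent's rebate: 1/n of the VCG revenue with that agent removed."""
    n = len(bids)
    return [vcg_revenue(bids[:i] + bids[i + 1:], m) / n for i in range(n)]

bids = [10.0, 8.0, 6.0, 4.0, 2.0]
m = 2
print(vcg_revenue(bids, m))              # 12.0: both winners pay the third-highest bid, 6
print(bailey_cavallo(bids, m))           # rebates sum to 9.6 < 12.0, so no deficit
```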
-
The Price of Anarchy and the Design of Scalable Resource Allocation Mechanisms by Johari R. Algorithmic Game Theory, 2007. In this chapter, we study the allocation of a single infinitely divisible resource among multiple competing users. While we aim for efficient allocation of the resource, the task is complicated by the fact that users’ utility functions are typically unknown to the resource manager. We study the design of resource allocation mechanisms that are approximately efficient (i.e., have a low price of anarchy), with low communication requirements (i.e., the strategy spaces of users are low dimensional). Our main results concern the proportional allocation mechanism, for which a tight bound on the price of anarchy can be provided. We also show that in a wide range of market mechanisms that use a single market-clearing price, the proportional allocation mechanism minimizes the price of anarchy. Finally, we relax the assumption of a single market-clearing price, and show that by extending the class of Vickrey–Clarke–Groves mechanisms all Nash equilibria can be guaranteed to be fully efficient.
-

Misc Theory and Approaches

The papers below were difficult to categorise and are therefore collected under the general catch-all category of miscellaneous theory and approaches.

A logical approach to the dynamics of commitments by Meyer JJ, van der Hoek W, van Linder B. Artificial Intelligence, 1999. In this paper we present a formalisation of motivational attitudes, the attitudes that are the driving forces behind the actions of agents. We consider the statics of these attitudes both at the assertion level, i.e., ranging over propositions, and at the practition level, i.e., ranging over actions, as well as the dynamics of these attitudes, i.e., how they change over time. Starting from an agent’s wishes, which form the primitive, most fundamental motivational attitude, we define its goals as induced by those wishes that do not yet hold, i.e., are unfulfilled, but are within the agent’s practical possibility to bring about, i.e., are implementable for the agent. Among these unfulfilled, implementable wishes the agent selects those that qualify as its goals. Based on its knowledge on its goals and practical possibilities, an agent may make certain commitments. In particular, an agent may commit itself to actions that it knows to be correct and feasible to bring about some of its known goals. As soon as it no longer knows its commitments to be useful, i.e., leading to fulfillment of some goal, and practically possible, an agent is able to undo these commitments. Both the act of committing as well as that of undoing commitments is modelled as a special model-transforming action in our framework, which extends the usual state-transition paradigm of Propositional Dynamic Logic. In between making and undoing commitments, an agent is committed to all the actions that are known to be identical for all practical purposes to the ones in its agenda. By modifying the agent’s agenda during the execution of actions in a straightforward way, it is ensured that commitments display an intuitively acceptable behaviour with regard to composite actions.
-
Learning to Weigh Basic Behaviors in Scalable Agents by Olivier Buffet, Alain Dutech, François Charpillet. AAMAS, 2002. We are working on the use of Reinforcement Learning (RL) algorithms to automatically design reactive situated agents limited to only local perceptions. Unfortunately, as good RL algorithms suffer from combinatorial explosion, their use is generally limited to simple problems. As shown on the tile-world example of figure 1, we propose to overcome these difficulties by making the hypothesis, as in Brooks' subsumption architecture [1], that a complex problem can be efficiently dealt with if considered as a combination of simple problems. This short presentation gives basic ideas on RL algorithms (section 2). Then the three steps of our method are presented: how basic behaviors are learned for each basic motivation (sec. 3), how the scene is decomposed into key figures to find the basic behaviors currently involved (sec. 4), and how to combine them into a complex global behavior using learned weights (sec. 5). A few words are given on the experiments conducted on the tile-world problem (sec. 6) and precede a conclusion.
-
Applications of distributed artificial intelligence in industry. by H. Van Dyke Parunak. John Wiley & Sons, 1996. In many industrial applications, large centralized software systems are not as effective as distributed networks of relatively simpler computerized agents. For example, to compete effectively in today's markets, manufacturers must be able to design, implement, reconfigure, resize, and maintain manufacturing facilities rapidly and inexpensively. Because modern manufacturing depends heavily on computer systems, these same requirements apply to manufacturing control software, and are more easily satisfied by small modules than by large monolithic systems. This paper reviews industrial needs for Distributed Artificial Intelligence (DAI), giving special attention to systems for manufacturing scheduling and control. It describes a taxonomy of such systems, gives case studies of several advanced research applications and actual industrial installations, and identifies steps that need to be taken to deploy these technologies more broadly.
-