At Silo.AI we have a weekly research club where we look into interesting techniques and methods within the fast-moving field of AI. Last week, I gave a presentation on Multi-agent Reinforcement Learning (MARL), which I compiled into this blog post. The importance of learning in multi-agent environments is widely acknowledged in artificial intelligence. Let’s take a look at MARL and its applications.
Multiple reinforcement learning agents
MARL aims to build multiple reinforcement learning agents in a shared, multi-agent environment. The actions of all the agents jointly determine the next state of the system, and the agents can behave cooperatively, competitively, or in a mix of both.
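To make the "joint actions determine the next state" idea concrete, here is a minimal toy sketch of a two-agent environment. All class and method names are illustrative, not from any specific MARL library: the agents act simultaneously on a small grid, and a shared cooperative reward is given when they meet.

```python
class TwoAgentGridWorld:
    """Toy multi-agent environment: the next state depends on the
    joint action of all agents, not on any single agent alone.
    Illustrative sketch only, not a real library API."""

    def __init__(self, size=5):
        self.size = size
        self.positions = [0, size - 1]  # one position per agent

    def step(self, joint_action):
        # joint_action holds one action per agent: -1 = left, +1 = right
        for i, a in enumerate(joint_action):
            self.positions[i] = max(0, min(self.size - 1, self.positions[i] + a))
        # cooperative setting: every agent is rewarded when the agents meet
        reward = 1 if self.positions[0] == self.positions[1] else 0
        return tuple(self.positions), [reward, reward]


env = TwoAgentGridWorld()
state, rewards = env.step([+1, -1])  # both agents act simultaneously
```

A competitive or mixed setting would differ only in how the reward vector is assigned: opposing rewards for competition, partially aligned rewards for mixed behaviour.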
MARL is not a new field or concept, but due to its complex nature it received little attention until around 20 years ago. These days, however, an increasing number of researchers are interested in MARL, since it has many promising real-world applications.
The main algorithms in the MARL area
There are three major categories of algorithms in the MARL area: policy-based methods, value-based methods, and hybrids of the two. One such hybrid is the actor-critic method, which is drawing more and more attention in academia.
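As a minimal example of the value-based category, the sketch below runs independent Q-learning on a two-player coordination game: each agent keeps its own Q-values and treats the other agent as part of the environment. This is an illustrative toy, not a reference implementation of any published algorithm; the game and all names are assumptions made for the example.

```python
import random

random.seed(0)

# Independent Q-learning on a 2-player coordination game:
# each agent gets reward 1 only when both pick the same action.
n_actions = 2
q = [[0.0] * n_actions for _ in range(2)]  # one Q-vector per agent
alpha, epsilon = 0.1, 0.2                  # learning rate, exploration rate

for _ in range(2000):
    actions = []
    for agent in range(2):
        if random.random() < epsilon:          # explore: random action
            actions.append(random.randrange(n_actions))
        else:                                  # exploit: greedy action
            qa = q[agent]
            actions.append(qa.index(max(qa)))
    reward = 1.0 if actions[0] == actions[1] else 0.0
    for agent in range(2):                     # each agent updates alone
        a = actions[agent]
        q[agent][a] += alpha * (reward - q[agent][a])

# After training, both agents should prefer the same action.
best = [qa.index(max(qa)) for qa in q]
```

A policy-based method would instead adjust each agent's action probabilities directly, and an actor-critic method would combine the two: a learned value estimate (the critic) guiding the policy update (the actor).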
The applications of MARL are still being researched and tested in academia, and at the moment few use cases exist in practice. MARL tries to tackle complex problems in complex systems, so thorough testing is needed before real deployment. However, in some fields, such as online distributed resource allocation and cellular network optimisation, it may be applied in the near future, as the required level of safety is easier to achieve there. Let’s take a look at some potential applications of MARL.
- Online Distributed Resource Allocation
Applying multi-agent learning to come up with effective resource allocation across a distributed computing network.
Zhang, Chongjie, Victor R. Lesser, and Prashant J. Shenoy. “A Multi-Agent Learning Approach to Online Distributed Resource Allocation.” IJCAI. Vol. 9. 2009.
- Cellular Network Optimisation
Applying MARL in LTE networks to guide base stations in maximising mobile service quality.
Pandey, Binda. “Adaptive Learning For Mobile Network Management.” 2016.
- Smart Grid Optimisation
Applying MARL to control power flow in an electrical power grid with optimum efficiency.
Riedmiller, Martin, Andrew Moore, and Jeff Schneider. “Reinforcement learning for cooperating and communicating reactive agents in electrical power grids.” Workshop on Balancing Reactivity and Social Deliberation in Multi-Agent Systems. Springer, Berlin, Heidelberg. 2000.
- Smart Traffic Light Control
Applying MARL to control traffic lights so as to minimise the waiting time of each car in a city, making the lights more adaptive based on estimates of expected waiting time.
Wiering, M. A. “Multi-agent reinforcement learning for traffic light control.” ICML, 2000.
Challenges with MARL
There are many challenges in MARL waiting to be tackled. First, all the agents in the system jointly determine the system state, so the so-called curse of dimensionality becomes a bottleneck: the agents’ action spaces interact, and the complexity of the system grows exponentially with the number of agents. Second, the non-stationary nature of a multi-agent system makes the problem harder to approach: because all the agents interact with the system, the best policy for one agent changes as the other agents’ policies change. A third challenge is the exploration and exploitation trade-off. In multi-agent settings the exploration space is much larger, as changes in other agents’ behaviour introduce new states that need to be explored. How to keep the system running stably (exploitation) while evolving at a reasonable rate (exploration) then becomes a concern, especially in real-life applications.
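The exponential growth mentioned above is easy to quantify: with n agents, each choosing from k actions, there are k to the power of n joint actions. A quick worked example (function name is mine, for illustration):

```python
# The joint action space grows exponentially with the number of agents:
# n agents with k actions each give k**n possible joint actions.
def joint_action_space_size(n_agents, n_actions_per_agent):
    return n_actions_per_agent ** n_agents


# Even a modest 5 actions per agent explodes quickly:
sizes = [joint_action_space_size(n, 5) for n in (1, 2, 5, 10)]
# → [5, 25, 3125, 9765625]
```

Ten agents with five actions each already yield nearly ten million joint actions, which is why naive tabular methods do not scale to multi-agent settings.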
Overall, MARL has great potential because it is much closer to our multi-agent real world, where each individual makes its own decisions and optimises its actions to complete a task. MARL agents have the ability to learn from humans, cooperate with humans, and help humans achieve their goals. In Silo.AI‘s vision, AI is a human-in-the-loop solution: humans are augmented by AI, and AI learns from humans to augment them better in an iterative manner. The reinforcement learning framework naturally embeds this human-in-the-loop concept, and this could be a future direction for the field.
Sources and interesting material
General MARL research summaries
https://github.com/LantaoYu/MARL-Papers — a good collection of MARL papers
https://github.com/openai/multiagent-particle-envs — a MARL environment compatible with OpenAI Gym
https://github.com/geek-ai/MAgent — a MARL research platform
https://github.com/google/dopamine — Google’s Dopamine reinforcement learning framework (single-agent)