In our weekly research club we review a paper on a current and interesting topic in AI, be it related to machine learning, natural language processing or computer vision. This time I got to lead the discussion on grounded language learning and the so-called “BabyAI”, a research platform introduced by Chevalier-Boisvert et al. to “support investigations towards including humans in the loop of grounded language learning”. With the BabyAI platform, the paper aims to find ways for humans to teach AI (more specifically, artificial reinforcement learning [RL] agents) through language instructions. Interactive training of artificial agents would be beneficial not only for its practicality but also for scientific reasons. Let’s take a look at the paper in more detail.
Grounded language learning means learning the connection between symbols and their meaning: in other words, understanding the real-world referents behind linguistic tokens (in this case, instructions for the artificial agent such as “go to the red ball”). It studies questions similar to those psychology asks when researching how human children learn language, but applied to how computers and machines acquire human language skills. The need for grounded language learning stems from humans’ desire to customise AI to their specific needs and to be able to instruct it in natural, real-world language.
BabyAI is a language-learning platform that consists of 19 levels of increasing difficulty, each requiring the agent to master a subset of competencies. These competencies are presented in Table 1 below. In the paper, the authors discuss the process that would be needed to train a neural network-based agent on some of the BabyAI levels.
Table 1. Competencies of the BabyAI platform for learning language.
| Competency | What the agent should learn |
|---|---|
| Room Navigation (ROOM) | to navigate a 6×6 room |
| Ignoring Distracting Boxes (DISTR-BOX) | to navigate the environment even when there are multiple distracting grey box objects in it |
| Ignoring Distractors (DISTR) | same as DISTR-BOX, but distractor objects can be boxes, keys or balls of any colour |
| Maze Navigation (MAZE) | to navigate a 3×3 maze of 6×6 rooms in which the rooms are randomly connected to each other with doors |
| Unblocking the Way (UNBLOCK) | to navigate the environment even when it requires moving the objects that are in the way |
| Unlocking Doors (UNLOCK) | to find the key and unlock the door if the instruction requires this explicitly |
| Guessing to Unlock Doors (IMP-UNLOCK) | to guess that in order to execute instructions, the agent needs to identify the door that needs to be unlocked, find the respective key, unlock the door and proceed further with the execution |
| Go To Instructions (GOTO) | to understand “go to” instructions, e.g. “go to the red ball” |
| Open Instructions (OPEN) | to understand “open” instructions, e.g. “open the door on your left” |
| Pickup Instructions (PICKUP) | to understand “pick up” instructions, e.g. “pick up a box” |
| Put Instructions (PUT) | to understand “put” instructions, e.g. “put a ball next to the blue key” |
| Location Language (LOC) | to understand instructions in which objects are referred to not only by their shape and colour but also by their location relative to the initial position of the agent, e.g. “go to the red ball in front of you” |
| Sequences of Commands (SEQ) | to understand composite instructions that require the agent to execute a sequence of instruction clauses, e.g. “put the red ball next to the green box after you open the door” |
In the paper, BabyAI is a 2D gridworld where synthetic instructions, formulated in “Baby Language” (a synthetic subset of English), tell the agent to achieve a certain end result, for example finding its way to an object by unlocking doors along the way. These tasks form the levels discussed above in Table 1. The authors believe that human–machine teaching needs to progress step by step, just as human–human teaching would.

As said, the paper aims to evaluate the feasibility of grounded language learning for artificial reinforcement learning agents taught by humans. In doing so, it provides a platform for benchmarking and evaluating the training of different RL agents. This is explored within the curriculum learning and imitation learning frameworks (more on these below). Chevalier-Boisvert et al. present several initial experiments in which agents are trained with the help of a heuristic expert teacher instead of an actual human teacher. This, they believe, will help create an interactive teaching environment where the teacher teaches according to the learner’s current abilities, similarly to a human teacher.
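To make the task structure concrete, here is a minimal sketch (not the authors’ code, and much simpler than the actual BabyAI environment) of what a gridworld “go to” instruction involves: the instruction names a colour and a shape, the agent locates the matching object, and navigation reduces to pathfinding around walls.

```python
from collections import deque

def find_object(grid, colour, shape):
    """Return the (row, col) of the first object matching the instruction."""
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell == (colour, shape):
                return (r, c)
    return None

def shortest_path_length(grid, start, goal):
    """Breadth-first search over free cells; walls are marked 'W'."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols \
                    and (nr, nc) not in seen and grid[nr][nc] != 'W':
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None

# A toy 3x4 room: None = empty floor, tuples = (colour, shape) objects.
grid = [
    [None, None, ('red', 'ball'), None],
    [None, 'W',  None,            None],
    [None, None, ('blue', 'key'), None],
]
goal = find_object(grid, 'red', 'ball')   # parsed from "go to the red ball"
steps = shortest_path_length(grid, (2, 0), goal)
```

The real platform poses the same problem from partial pixel-level observations with a learned policy, which is what makes the levels hard; this sketch only illustrates the task semantics.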
Data efficiency has long been a problem in real-life applications of reinforcement learning. It refers to the number of trials it takes for the agent to learn its environment and perform given tasks, and according to the paper it is the main challenge for human-in-the-loop training of RL agents. As the authors note, the deep learning methods typically used in imitation learning and RL require millions of reward-function queries or a similarly unrealistic number of examples, which no real human teacher could feasibly provide.
Several techniques already exist for improving the data efficiency of training reinforcement learning agents, such as the aforementioned curriculum learning, in which the agent first trains on simpler tasks before proceeding to more difficult ones, and imitation learning, in which the agent learns from demonstrations and examples, i.e. by being shown how to perform the task. The data-efficiency problem will likely be solved eventually, but for now the best we can do is follow the area and contribute to its development.
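The curriculum-learning idea can be sketched as a simple scheduler: keep the agent on its current level until its recent success rate crosses a threshold, then advance to the next, harder level. The level names, threshold and window size below are illustrative assumptions, not values from the paper.

```python
class Curriculum:
    """Advance through levels once the agent masters the current one."""

    def __init__(self, levels, threshold=0.9, window=100):
        self.levels = levels
        self.threshold = threshold   # success rate required to advance
        self.window = window         # episodes used to estimate it
        self.index = 0
        self.results = []            # 1 = solved episode, 0 = failed

    @property
    def current_level(self):
        return self.levels[self.index]

    def record(self, solved):
        """Log one episode outcome and advance the level when ready."""
        self.results.append(1 if solved else 0)
        recent = self.results[-self.window:]
        if (len(recent) == self.window
                and sum(recent) / self.window >= self.threshold
                and self.index < len(self.levels) - 1):
            self.index += 1
            self.results = []        # restart statistics on the new level

# Toy usage with made-up level names, easy -> hard, and a fake agent
# that solves every episode; it climbs one level per 100 episodes.
curriculum = Curriculum(["GoToLocal", "GoTo", "GoToSeq"])
for _ in range(200):
    curriculum.record(True)
```

A real training loop would sample episodes from `curriculum.current_level` and feed the episode outcome back through `record`; the paper’s heuristic expert teacher plays a similar role by supplying demonstrations matched to what the learner can currently do.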
In conclusion, the authors find that the field still needs substantial development before language teaching by a human could become feasible. In the paper the authors only conduct experiments using a bot to train the artificial agent. It would have been interesting to see even a limited initial experiment with a real human teacher, but that will likely be the topic of future research.
The paper is a useful evaluation of the current situation, and I believe it is a good resource precisely because it reports a somewhat “negative” result, namely that something is not yet possible, which is not very typical of scientific papers. The setup, experiments and results reasonably support that conclusion.
References and resources:
Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Salem Lahlou, Lucas Willems, Chitwan Saharia, Thien Huu Nguyen, Yoshua Bengio (2018). BabyAI: First Steps Towards Grounded Language Learning With a Human In the Loop. https://arxiv.org/pdf/1810.08272.pdf
The platform and pre-trained models presented by the authors in the article are available online: https://github.com/mila-udem/babyai.

The topic is studied under the term “grounded language learning”; related papers and similar environments have been presented before:

Grounded Language Learning in a Simulated 3D World: https://arxiv.org/abs/1706.06551

Microsoft’s Malmö: https://github.com/Microsoft/malmo / https://www.ijcai.org/Proceedings/16/Papers/643.pdf
Other materials based on our weekly research club:
Silo.AI Academy Resources: https://silo.ai/research/#academy
Join the Silo.AI Research Community Slack, a 450+ strong community of machine learning, computer vision and NLP researchers around the world: a global network for sharing AI-related news, events, open positions, ideas and learnings. Sign up by submitting a request at https://silo.ai/research/#slack.