Machine Theory of Mind

In lab meeting last week, we read Machine Theory of Mind, a recent paper from Neil Rabinowitz and his collaborators at DeepMind & Google Brain (a trimmed version of the paper was presented at ICML 2018). Here, Theory of Mind (ToM) is broadly defined as the ability to represent the mental states of others. This paper aims to demonstrate ToM in an artificial agent. While designing & training such an agent constitutes one challenge, the authors must first devise a scenario in which ToM can be convincingly shown. Inspired by the Sally-Anne test — a classic test of ToM from developmental psychology that evaluates whether a child understands that others can hold false beliefs — the authors construct an analogous test, then train an agent to successfully pass it.

The paper consists of a series of experiments that build in complexity toward this final test. Each experiment has three key parts. First, the environment: a simple 11×11 grid-world containing walls and 4 colored boxes, all placed at random in each new world. Second, the agents: each agent belongs to a particular "species" defined by its policy for acting in the environment. Agents can behave randomly, algorithmically, or with a learned policy (via deep RL). The trajectory of a particular agent within a particular environment constitutes an episode. Reward within an episode is generally maximized by navigating to a box of a particular color as fast as possible. However, each experiment adjusts how much the agents can see and remember, and some add more complex subgoals. Finally, the observer: a meta-learning agent, called ToMnet, that observes the episodes of many agents in many environments to learn a prior over the behavior of an agent species. At test time, ToMnet uses a novel agent's past episodes and its trajectory so far in the current episode to infer a posterior and predict the agent's future behavior.
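To make the setup concrete, here is a minimal Python sketch of such a grid-world and of one goal-driven species. It is not the authors' code: the number of walls, their random placement, the four color names, and the rule that the agent routes around boxes it does not want (assuming that stepping on a box consumes it) are illustrative assumptions, and the helper names `make_world`, `shortest_path`, `run_episode`, and `to_actions` are hypothetical. The agent follows a breadth-first shortest path, which is one simple way to "navigate to a box of a particular color as fast as possible."

```python
import random
from collections import deque

SIZE = 11  # the grid-world is 11x11
COLORS = ["red", "green", "blue", "yellow"]  # hypothetical color names
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def shortest_path(start, goal, blocked):
    """Breadth-first search on the 4-connected grid; returns the list of cells
    from start to goal, or None if goal is unreachable."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        for dr, dc in ACTIONS.values():
            nxt = (cell[0] + dr, cell[1] + dc)
            if (0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE
                    and nxt not in blocked and nxt not in parent):
                parent[nxt] = cell
                queue.append(nxt)
    return None


def run_episode(walls, boxes, start, preferred):
    """Hypothetical goal-driven species: take the shortest route to the
    preferred box, treating walls and the other boxes as obstacles
    (assumption: stepping on any box would consume it)."""
    blocked = walls | {cell for color, cell in boxes.items() if color != preferred}
    return shortest_path(start, boxes[preferred], blocked)


def make_world(n_walls=10, seed=None):
    """Randomly place walls, the four boxes, and the agent's start cell,
    resampling until every box is reachable."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(SIZE) for c in range(SIZE)]
    while True:
        rng.shuffle(cells)
        walls = set(cells[:n_walls])
        boxes = dict(zip(COLORS, cells[n_walls:n_walls + len(COLORS)]))
        start = cells[n_walls + len(COLORS)]
        if all(run_episode(walls, boxes, start, c) is not None for c in COLORS):
            return walls, boxes, start


def to_actions(path):
    """Convert a cell path into the action names the agent took."""
    names = {delta: name for name, delta in ACTIONS.items()}
    return [names[(b[0] - a[0], b[1] - a[1])] for a, b in zip(path, path[1:])]


if __name__ == "__main__":
    walls, boxes, start = make_world(seed=0)
    path = run_episode(walls, boxes, start, preferred="green")
    print(f"{len(path) - 1} steps:", to_actions(path))
```

Each call to `make_world` draws a fresh layout, so running `run_episode` with a fixed `preferred` color across many worlds yields roughly the kind of per-agent episode collection an observer learns from; a random species would instead sample from `ACTIONS` at each step.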

From Figure 4 in the paper. (a) An example environment. The red arrows represent the trajectory of a goal-driven agent on a previous episode. (b) A new environment, but the same agent as in (a). Using a prior learned over the agent’s species (e.g., agents of this species seek to consume a box of a specific color) in combination with this agent’s past episodes (e.g., this agent consumed a green box), ToMnet is tasked with predicting the agent’s future behavior. (c) For the new environment in (b), ToMnet outputs the next predicted action (go right) and the predicted color of the box that will eventually be consumed (green). (d) ToMnet also predicts the agent’s successor representation.
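The successor representation in panel (d) is, in standard RL terms, the expected discounted number of future visits to each cell: M(s, s') = E[Σ_t γ^t · 1(s_t = s') | s_0 = s]. The sketch below estimates it by accumulating γ^t at each cell a trajectory visits and averaging over several sampled trajectories from the same starting cell. The discount `gamma`, the function names, and the lack of any normalization are assumptions for illustration; the paper's exact prediction target may be scaled differently.

```python
import numpy as np


def empirical_sr(path, size=11, gamma=0.9):
    """Discounted occupancy of each cell along one trajectory.

    path[0] is the agent's current cell; the result M satisfies
    M[r, c] = sum over t of gamma**t * [path[t] == (r, c)].
    """
    sr = np.zeros((size, size))
    for t, (r, c) in enumerate(path):
        sr[r, c] += gamma ** t
    return sr


def estimate_sr(paths, size=11, gamma=0.9):
    """Monte Carlo estimate of the successor representation from the current
    cell: average the discounted occupancies of several sampled trajectories
    that all start there."""
    return np.mean([empirical_sr(p, size, gamma) for p in paths], axis=0)


# Example: two sampled continuations from cell (5, 5).
paths = [[(5, 5), (5, 6), (5, 7), (4, 7)],
         [(5, 5), (6, 5), (7, 5)]]
M = estimate_sr(paths, gamma=0.5)
print(M[5, 5], M[5, 6], M[6, 5])  # 1.0 0.25 0.25
```

Cells the agent is likely to pass through soon get large values, which is why a predicted successor representation, drawn over the grid, looks like a heat map of the agent's expected route.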

To probe ToM in ToMnet, the authors introduce a species of agent with both a limited field of view and a subgoal. For example, an agent that can only see the squares adjacent to it must first navigate to a "star" in the grid-world before finally navigating to the blue box to achieve maximum reward. In certain environments, the agent passes the blue box early in its initial search and so knows directly where to go after finding the star, even if the blue box is not visible to the agent from the star's location. The test comes when the experimenter swaps the locations of the boxes while the agent is on the star and the boxes are out of its view. While the agent is blind to the swap, ToMnet is not. This sets up the analogue of the Sally-Anne test: will ToMnet fail to recognize that the swap occurred outside the agent's field of view, and thus mistakenly predict that the agent will move toward the new location of the blue box? Or will ToMnet recognize that the agent holds the false belief that no swap occurred, and thus correctly predict that the agent will move toward the old location of the blue box?
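The logic of the test can be spelled out with a hand-coded observer that tracks what the agent has seen. This is not how ToMnet works (ToMnet has to learn this from data); it only shows which prediction counts as passing. The visibility model (a square window of radius `radius` that ignores walls), the box and star coordinates, and the names `visible`, `BeliefTracker`, and `sally_anne` are all hypothetical.

```python
def visible(pos, cell, radius):
    """Assumed visibility model: a square window around the agent; walls ignored."""
    return max(abs(pos[0] - cell[0]), abs(pos[1] - cell[1])) <= radius


class BeliefTracker:
    """Hypothetical observer-side model of what a partially sighted agent believes."""

    def __init__(self, radius):
        self.radius = radius
        self.belief = {}  # color -> cell where the agent last saw that box

    def observe(self, pos, boxes):
        # Drop memories that the agent can now see are out of date.
        for color, cell in list(self.belief.items()):
            if visible(pos, cell, self.radius) and boxes[color] != cell:
                del self.belief[color]
        # Record every box currently in view.
        for color, cell in boxes.items():
            if visible(pos, cell, self.radius):
                self.belief[color] = cell


def sally_anne(radius):
    """Return (cell the agent believes holds blue, cell that actually holds blue)."""
    boxes = {"blue": (1, 1), "red": (9, 9)}  # hypothetical layout
    star = (8, 2)
    tracker = BeliefTracker(radius)
    tracker.observe((1, 2), boxes)  # early in its search, the agent passes the blue box
    tracker.observe(star, boxes)    # the agent reaches the star
    boxes["blue"], boxes["red"] = boxes["red"], boxes["blue"]  # experimenter swaps
    tracker.observe(star, boxes)    # the agent looks around again from the star
    return tracker.belief.get("blue"), boxes["blue"]


for radius in (1, 7):
    believed, actual = sally_anne(radius)
    verdict = "false belief" if believed != actual else "true belief"
    print(radius, believed, actual, verdict)
# 1 (1, 1) (9, 9) false belief
# 7 (9, 9) (9, 9) true belief
```

A passing observer predicts that the radius-1 agent heads back to (1, 1), the old location, while an observer that ignores the agent's view would wrongly predict (9, 9). The wider radius shows, in a toy way, how field of view changes whether a swap produces a false belief at all.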

ToMnet predicts behavior reflecting the agent's false belief, successfully passing the test. Importantly, further results show that ToMnet is sensitive to how different fields of view make an agent "differentially vulnerable to acquire false beliefs," and that ToMnet still passes the test even when it never saw swap events during training. Thus, ToMnet "learns that agents can act based on false beliefs," providing a compelling proof of concept for Machine Theory of Mind.