Breathe in fresh air, and keep up with cutting-edge technology!
AIRS proudly presents its event series AIRS in the AIR. Join us online every Tuesday to explore cutting-edge technologies, industrial applications, and development trends in artificial intelligence and robotics.
Over the past decade we have seen many cases of artificial intelligence beating humans at competitive tasks such as Go and video games, each one sparking a new wave of attention and debate.
Is surpassing humans really all that AI can be used for? Can we build artificial intelligence that cooperates with humans? How do we endow AI systems with social cognition and social attributes, and build a society in which AI systems and humans collaborate?
In the fourth session of AIRS in the AIR, Joel Z. Leibo, a research scientist at DeepMind (Google's AI company), and Jakob Foerster, Associate Professor in the Department of Engineering Science at the University of Oxford, will present their latest research on multi-agent reinforcement learning.
TOPIC 1: Reverse engineering the social-cognitive capacities, representations, and motivations that underpin human cooperation to help build cooperative artificial general intelligence
ABSTRACT:
As a route to building cooperative artificial general intelligence, I propose we try to reverse engineer human cooperation. As humans, we employ a set of social-cognitive capacities, representations, and motivations that underlie our critical ability to cooperate with one another. Here I will argue that we need to figure out how human cooperation works so that we can build general artificial intelligence that cooperates like humans do. Specifically, in this talk I will describe how to use Melting Pot, an evaluation methodology and suite of test scenarios for multi-agent reinforcement learning, to further this goal of reverse engineering human cooperation in order to build cooperative artificial general intelligence.
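To give a flavour of the evaluation methodology mentioned above, the sketch below shows a generic scenario-evaluation loop in the spirit of Melting Pot: a "focal" population of learned agents is dropped into a test scenario alongside background co-players it has never trained with, and only the focal agents' rewards are scored. The environment interface and the build_scenario and agent.step names are illustrative placeholders, not the actual Melting Pot API.

```python
# Illustrative sketch of a Melting Pot-style evaluation loop.
# Names and interfaces are placeholders, not the real Melting Pot API.
import numpy as np

def evaluate_focal_population(build_scenario, focal_agents, episodes=100):
    """Average per-capita return of the focal agents over test episodes."""
    returns = []
    for _ in range(episodes):
        env = build_scenario()       # scenario = substrate + fixed background bots
        observations = env.reset()   # one observation per focal player
        episode_return = np.zeros(len(focal_agents))
        done = False
        while not done:
            # Each focal agent acts on its own observation; background bots
            # are assumed to be controlled inside the scenario itself.
            actions = [agent.step(obs)
                       for agent, obs in zip(focal_agents, observations)]
            observations, rewards, done = env.step(actions)
            episode_return += np.asarray(rewards)
        returns.append(episode_return.mean())  # per-capita focal return
    return float(np.mean(returns))
```

The point of scoring agents only against co-players they never trained with is to test generalized cooperation rather than memorized conventions, which is the capability the talk argues is needed for cooperative AGI.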
SPEAKER:
Joel Z. Leibo is a research scientist at DeepMind. He obtained his PhD from MIT in 2013, where he worked on the computational neuroscience of face recognition with Tomaso Poggio. His current research is aimed at the following questions:
● How can we get deep reinforcement learning agents to perform complex cognitive behaviors like cooperating with one another in groups?
● How should we evaluate the performance of deep reinforcement learning agents?
● How can we model processes like cumulative culture that gave rise to unique aspects of human intelligence?
TOPIC 2: Zero-Shot Coordination and Off-Belief Learning
ABSTRACT:
There has been a large body of work studying how agents can learn communication protocols in decentralized settings, using their actions to communicate information. Surprisingly little work has studied how this can be prevented, even though preventing it is a crucial prerequisite from a human-AI coordination and AI-safety point of view.
The standard problem setting in Dec-POMDPs is self-play, where the goal is to find a set of policies that play optimally together. Policies learned through self-play may adopt arbitrary conventions and implicitly rely on multi-step reasoning based on fragile assumptions about other agents' actions, and thus fail when paired with humans or independently trained agents at test time. To address this, we present off-belief learning (OBL). At each timestep, OBL agents follow a policy π_1 that is optimized assuming past actions were taken by a given, fixed policy π_0, but that future actions will be taken by π_1. When π_0 is uniform random, OBL converges to an optimal policy that does not rely on inferences based on other agents' behavior.
OBL can be iterated in a hierarchy, where the optimal policy from one level becomes the input to the next, thereby introducing multi-level cognitive reasoning in a controlled manner. Unlike existing approaches, which may converge to any equilibrium policy, OBL converges to a unique policy, making it suitable for zero-shot coordination (ZSC).
OBL can be scaled to high-dimensional settings with a fictitious transition mechanism, and it shows strong performance both in a toy setting and on Hanabi, the benchmark problem for human-AI coordination and ZSC.
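For readers who want the construction in symbols, a rough sketch follows. This is our paraphrase of the idea described in the abstract, not the speakers' exact formulation; the notation is our own.

```latex
% Sketch of the off-belief learning (OBL) target (notation ours).
% \tau is the action-observation history; \mathcal{B}_{\pi_0}(\cdot \mid \tau) is the
% posterior over hidden states computed as if all past actions had been taken by \pi_0;
% \tau' is \tau extended by the action a and the resulting observation; future play
% follows \pi_1.
\[
  Q^{\pi_0 \to \pi_1}(\tau, a)
    = \mathbb{E}_{s \sim \mathcal{B}_{\pi_0}(\cdot \mid \tau)}
      \Big[ r(s, a)
            + \gamma \, \mathbb{E}_{s' \sim P(\cdot \mid s, a)}
              \big[ V^{\pi_1}(\tau') \big] \Big],
  \qquad
  \pi_1(\tau) \in \operatorname*{arg\,max}_a \, Q^{\pi_0 \to \pi_1}(\tau, a).
\]
% Iterating the construction gives the cognitive hierarchy mentioned above:
% \pi^{(k+1)} = \mathrm{OBL}\big(\pi^{(k)}\big), starting from a uniform-random \pi^{(0)}.
```

Because the posterior is computed as if past actions were random, π_1 cannot extract information from conventions in its partners' behavior, which is what makes the resulting policy a candidate for zero-shot coordination.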
SPEAKER:
Jakob Foerster started as an Associate Professor at the Department of Engineering Science at the University of Oxford in the fall of 2021. During his PhD at Oxford, he helped bring deep multi-agent reinforcement learning to the forefront of AI research and interned at Google Brain, OpenAI, and DeepMind.
After his PhD he worked as a research scientist at Facebook AI Research in California, where he continued doing foundational work. He was the lead organizer of the first Emergent Communication workshop at NeurIPS in 2017 and has helped organize it ever since. In 2019 he was awarded a prestigious CIFAR AI Chair.
His past work addresses how AI agents can learn to cooperate and communicate with other agents; most recently, he has been developing and addressing the zero-shot coordination problem setting.
1. The services and content of this event are provided by the organizer, the AIRS Institute; Huodongxing (活动行) only provides ticketing support. Please read the event details carefully before participating.
2. If any problems or disputes arise while participating in the event, the two parties should resolve them through friendly consultation; you may also contact Huodongxing for assistance.