Connecting Language and Vision to Actions

Introduction

Research at the intersection of natural language processing and computer vision has made rapid progress, from generating natural language descriptions of images and videos, to answering questions about them, to holding free-form conversations about visual content. The challenge now is to extend this progress to embodied agents that take actions and interact with their visual environments.

This tutorial will provide a comprehensive yet accessible introduction to the key innovations that have driven progress in language and vision modeling (such as multi-modal pooling, visual and co-attention, dynamic network composition, methods for incorporating external knowledge, and cooperative/adversarial games). We will then discuss some of the challenges in building models for tasks that combine language, vision, and actions, and survey recently released interactive 3D environments that can be used for these tasks (such as House3D, HoME, MINOS, Matterport3D Simulator, Gibson, Thor, and Chalet).

Slides

[Peter] Introduction [pptx]
[Peter] Neural Building Blocks
- CNNs and RNNs; captioning, VQA [pptx]
- Attention mechanisms [pptx]
[Qi] Neural Building Blocks (continued) and Tricks [pptx] [pdf]
- Multi-modal Pooling
- Dynamic Network Composition
- External Knowledge / Memory
- Engineering Tricks
[Abhishek] Reinforcement Learning in Language & Vision [keynote] [pdf]
- Train/test discrepancy in language generation
- Optimizing for metrics
- Adversarial training
- Downstream task-based training & evaluation
Embodied Agents & Environments
- [Abhishek] Introduction and EmbodiedQA [keynote] [pdf]
- [Peter] Vision-Language Navigation [pptx]
- [Abhishek] Comparing environments, and next steps [keynote] [pdf]

Presenters


Peter Anderson, Australian National University / Macquarie University
Abhishek Das, Georgia Tech
Qi Wu, The University of Adelaide

ACL 2018 Tutorial Sunday, July 15, 9:00 AM — 12:30 PM, Room 218 Melbourne Convention and Exhibition Centre, Australia
