AI Seminar: Angel Chang, Grounding Natural Language to 3D

Amii's Canada CIFAR AI Chair Angel Chang presents "Grounding Natural Language to 3D" at the AI Seminar (June 12, 2020).

The Artificial Intelligence (AI) Seminar is a weekly meeting at the University of Alberta where researchers interested in AI can share their research. Presenters include both local speakers from the University of Alberta and visitors from other institutions. Topics range from foundational theoretical work to innovative applications of AI techniques to new fields and problems.

Bio: Dr. Angel Xuan Chang is an Assistant Professor of Computer Science at Simon Fraser University. Dr. Chang's research focuses on the intersection of natural language understanding, computer graphics, and AI. Her research connects language to 3D representations of shapes and scenes and addresses the grounding of language for embodied agents in indoor environments. She has worked on methods for synthesizing 3D scenes and shapes and for 3D scene understanding, and she helped create several datasets for 3D deep learning (ShapeNet, ScanNet, Matterport3D). She received her Ph.D. in Computer Science from Stanford University under the supervision of Chris Manning. Dr. Chang received the SGP 2018 Dataset Award for her work on the ShapeNet dataset. She is a recipient of the TUM-IAS Hans Fischer Fellowship and a Canada CIFAR AI Chair.

Abstract: In popular imagination, household robots that we can instruct to "bring me my red mug from the kitchen" or ask "where are my glasses?" are common. For a robot to execute such an instruction or answer such a question, it needs to parse and interpret natural language, understand the 3D environment it is in (e.g., what objects exist and how they are described), navigate to locate the target object, and then formulate an appropriate response. While there has been previous work on the language-to-vision grounding problem in the 2D domain, there is much less work on methods operating with 3D representations such as those required by these scenarios. As a first step in this direction, we introduce the new task of 3D object localization in scenes using natural language descriptions. As input, we assume a point cloud of a scanned 3D scene along with a free-form text description of a specified target object. Through crowdsourcing, we collect a dataset of natural language descriptions of objects in the ScanNet dataset and create a benchmark with several baseline methods for this challenging task of predicting the 3D bounding box of a referred object from a natural language description. I will conclude by briefly summarizing various other ongoing projects in the area of grounding natural language to 3D interactive environments.
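To make the task setup in the abstract concrete, here is a minimal sketch of the input/output interface: a scanned point cloud plus a free-form description go in, and a 3D axis-aligned bounding box for the referred object comes out. All names below are hypothetical, and the toy "localizer" simply matches per-point object labels against words in the description; it is not the authors' method, which would ground the description with learned vision-language features rather than an oracle label match.

```python
# Hypothetical sketch of the 3D-localization task interface (not the
# authors' implementation). A real model consumes raw point geometry
# and text; here we cheat with per-point object labels for illustration.
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class LocalizationQuery:
    points: List[Point]   # scanned scene point cloud (x, y, z)
    labels: List[str]     # toy stand-in for per-point object identity
    description: str      # free-form text, e.g. "the red mug ..."


def locate(query: LocalizationQuery) -> Tuple[Point, Point]:
    """Toy localizer: select points whose object label appears in the
    description, and return their axis-aligned bounding box (min, max)."""
    hits = [p for p, lab in zip(query.points, query.labels)
            if lab in query.description.lower()]
    if not hits:
        raise ValueError("no object matched the description")
    xs, ys, zs = zip(*hits)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


# Usage: the two "mug" points define the predicted box; the "chair" is ignored.
q = LocalizationQuery(
    points=[(0.0, 0.0, 0.0), (0.2, 0.1, 0.3), (2.0, 2.0, 1.0)],
    labels=["mug", "mug", "chair"],
    description="the red mug on the kitchen counter",
)
bbox = locate(q)  # ((0.0, 0.0, 0.0), (0.2, 0.1, 0.3))
```

The axis-aligned min/max corner pair is one common way to represent the predicted 3D bounding box; benchmark methods are then scored by how well this box overlaps the ground-truth box of the described object.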