Mateusz Malinowski: From image recognition, to visual question answering, to holistic reasoning

Показать описание

Abstract:
Owing to Deep Learning and large-scale datasets, we have seen a seismic shift in computer vision, where more traditional problems such as image recognition are one-by-one tackled with a great success. This has opened up an opportunity to attack even more challenging multimodal problems such as Visual Question Answering, where a machine's understanding of the visual world is challenged by questions about the surrounding scene.
In this talk, I will present motivation behind Visual Question Answering together with concrete manifestations. I will show tools that are commonly used to build question answering machines, namely CNN and LSTM. I will also introduce more modern approaches such as Relation Networks, Hyperbolic Attention Networks, and Hard Attention. I will conclude this talk with possible future directions