Visual Question Answering | VQA | Vision & Lang Transformer | ViLT | Show-Ask-Attend | Deep learning

Visual Question Answering (VQA): given

1. an image and
2. a question about the image,

the system attempts to answer the question.

This is done here with two different deep learning models:

1. Show-Ask-Attend-Answer deep learning model
2. Vision & Language Transformer model (ViLT)

The models are pretrained on COCO and run with PyTorch; the answer is predicted from the output logits / probabilities.
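
The "attend" step of a Show-Ask-Attend-Answer style model can be pictured as question-guided soft attention over a grid of image-region features. Below is a minimal, simplified sketch in plain Python. It assumes scaled dot-product scoring between a question vector and each region feature, and it assumes both are already in the same feature space. The original model learns its attention layer, so this is not that layer, and the toy feature values are made up for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(regions: List[Vector], question: Vector) -> Tuple[Vector, Vector]:
    """Weight each image region by its relevance to the question.

    Assumption (simplification): relevance is a scaled dot product, and region
    features already share the question's dimension (a real model would learn
    projections and the attention layer itself).
    Returns (attention weights over regions, attended image vector).
    """
    d = len(question)
    scores = [
        sum(r_k * q_k for r_k, q_k in zip(r, question)) / math.sqrt(d)
        for r in regions
    ]
    weights = softmax(scores)
    # Weighted sum of region features, one output component per dimension.
    attended = [sum(w * r[k] for w, r in zip(weights, regions)) for k in range(d)]
    return weights, attended


# Hypothetical toy data: 4 image regions with 3-dim features and 1 question vector.
regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]
question = [0.0, 2.0, 0.0]
weights, attended = attend(regions, question)
print([round(w, 3) for w in weights])
```

In a full model of this kind, the attended image vector would be combined with the question encoding and passed to a classifier over candidate answers.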
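
Both models pick the answer from a fixed set of candidate answers, so the prediction step comes down to turning the output logits into probabilities and taking the top entries. Below is a minimal sketch in plain Python. The answer vocabulary and logits are hypothetical, and softmax is just one common way to get probabilities from logits. It also prints a text bar chart of the top-k probabilities as a dependency-free stand-in for a bar plot.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits to probabilities that sum to 1 (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(answers: List[str], logits: List[float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most probable (answer, probability) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(zip(answers, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def text_bar_chart(pairs: List[Tuple[str, float]], width: int = 40) -> str:
    """Render one '#' bar per answer, with length proportional to its probability."""
    lines = []
    for answer, p in pairs:
        bar = "#" * round(p * width)
        lines.append(f"{answer:>8} | {bar} {p:.3f}")
    return "\n".join(lines)


# Hypothetical vocabulary and logits, e.g. for the question "What color is the bus?"
answers = ["red", "blue", "yellow", "white", "2"]
logits = [3.1, 1.2, 0.4, 2.5, -1.0]
best = top_k(answers, logits, k=3)
print(text_bar_chart(best))
```

The predicted answer is simply `best[0][0]`. With a real model, the logits would come from the model's output and the vocabulary from its label mapping.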

#computervision #imageprocessing #imageprocessingpython #python #deeplearning #attention #vqa #nlp #lstm #pytorch
Comments

Can you share the code implementation of your experiments with the ViLT model, including the bar plots?

arsenyivanov