Meta-Transformer: A Unified Framework for Multimodal Learning
In this video we explain Meta-Transformer, a unified framework for multimodal learning.
With Meta-Transformer, we can use the same pre-trained transformer to process information from 12 different modalities, significantly more than comparable prior works such as ImageBind by Meta AI.
We review the architecture of Meta-Transformer, which is composed of a Data-to-Sequence Tokenizer, a Unified Multimodal Model, and task-specific heads, and explain how Meta-Transformer is used to create models that solve end tasks for different modalities.
Next, we dive deeper into the pre-training process of the unified multimodal model, which is based on the LAION-2B dataset and trained using a contrastive learning approach.
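The three-stage flow described above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the tokenizers, projection matrices, dimensions, and the classification head are all hypothetical stand-ins; the point is only that different modalities pass through the *same* shared encoder.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 8  # illustrative embedding size

def image_tokenizer(image):
    """Modality-specific tokenizer: flatten 2x2 patches into token embeddings."""
    h, w = image.shape
    patches = image.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3)
    patches = patches.reshape(-1, 4)             # (num_patches, 4)
    proj = rng.standard_normal((4, EMBED_DIM))   # hypothetical patch projection
    return patches @ proj                        # (num_patches, EMBED_DIM)

def text_tokenizer(token_ids, vocab=100):
    """Modality-specific tokenizer: look up embeddings from a toy table."""
    table = rng.standard_normal((vocab, EMBED_DIM))
    return table[token_ids]                      # (seq_len, EMBED_DIM)

# Unified Multimodal Model: stand-in for the shared, frozen pre-trained
# transformer that every modality's token sequence passes through.
W_shared = rng.standard_normal((EMBED_DIM, EMBED_DIM))
def shared_encoder(tokens):
    return np.tanh(tokens @ W_shared)

def task_head(features, num_classes=3):
    """Task-specific head: mean-pool the tokens, then a linear classifier."""
    head = rng.standard_normal((EMBED_DIM, num_classes))
    return features.mean(axis=0) @ head

# Both modalities flow through the SAME shared encoder; only the
# tokenizer in front and the head behind differ.
img_logits = task_head(shared_encoder(image_tokenizer(rng.standard_normal((4, 4)))))
txt_logits = task_head(shared_encoder(text_tokenizer(np.array([1, 5, 7]))))
print(img_logits.shape, txt_logits.shape)  # (3,) (3,)
```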
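The contrastive objective mentioned above can be sketched as a symmetric CLIP-style InfoNCE loss over image-text pairs: matched pairs on the diagonal of a similarity matrix are pulled together, all mismatched pairs are pushed apart. The function name, temperature, and shapes below are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings."""
    # L2-normalize so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (batch, batch); diagonal = true pairs

    def xent(l):
        # cross-entropy with the correct class on the diagonal
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # average image-to-text and text-to-image directions
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
batch, dim = 4, 16
img = rng.standard_normal((batch, dim))

# Perfectly aligned pairs give a much lower loss than random pairings.
aligned = info_nce(img, img)
random_pairs = info_nce(img, rng.standard_normal((batch, dim)))
print(aligned < random_pairs)
```

Minimizing this loss drives embeddings of matching image-text pairs toward each other, which is what lets the shared encoder later serve modalities it was never directly trained on.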
We finish by reviewing some of the results presented in the paper.
👍 Please like & subscribe if you enjoy this content
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
Chapters:
0:00 Introducing Meta-Transformer
0:55 Meta-Transformer Architecture
3:10 Pre-training
4:46 Results