NExT-GPT: Any-to-Any Multimodal LLM

In this video we explain NExT-GPT, a multimodal large language model (MM-LLM) introduced in the research paper titled "NExT-GPT: Any-to-Any Multimodal LLM".

We carefully review the NExT-GPT framework and its different components to understand how it uses an LLM as its core agent to both process inputs and generate outputs across multiple modalities (a minimal code sketch of this layout follows below).
We then review a multimodal conversation example to get a better intuition for what can be done with such a framework.
Next, we dive into how NExT-GPT was trained, walking through a few diagrams from the paper.
Finally, we review interesting results from the paper.
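For viewers who prefer code, here is a minimal sketch of the three-tier layout the paper describes: frozen modality encoders feed small trainable input projections into the LLM, and trainable output projections map the LLM's modality-signal outputs to frozen diffusion decoders. The class names, dimensions, and stub modules below are illustrative assumptions, not the authors' implementation.

# Minimal sketch of NExT-GPT's "any-to-any" layout, based on the paper's
# high-level description. Names, sizes, and the toy LLM/encoder stand-ins
# are assumptions for illustration only.
import torch
import torch.nn as nn

LLM_DIM = 512  # hypothetical hidden size of the core LLM

class InputProjector(nn.Module):
    """Maps a frozen modality encoder's embedding into the LLM token space.
    On the encoding side, only these small projection layers are trained."""
    def __init__(self, enc_dim: int):
        super().__init__()
        self.proj = nn.Linear(enc_dim, LLM_DIM)

    def forward(self, enc_emb: torch.Tensor) -> torch.Tensor:
        return self.proj(enc_emb)

class OutputProjector(nn.Module):
    """Maps the LLM's hidden states for modality-signal tokens into the
    conditioning space of a pretrained diffusion decoder (image, video,
    or audio generator)."""
    def __init__(self, cond_dim: int):
        super().__init__()
        self.proj = nn.Linear(LLM_DIM, cond_dim)

    def forward(self, llm_hidden: torch.Tensor) -> torch.Tensor:
        return self.proj(llm_hidden)

# Frozen components are stand-ins here: a random tensor plays the role of
# an ImageBind-style encoder output, and an identity module plays the LLM.
image_embedding = torch.randn(1, 1024)          # pretend encoder output
to_llm = InputProjector(enc_dim=1024)
llm_core = nn.Identity()                        # stand-in for the frozen LLM
to_image_decoder = OutputProjector(cond_dim=768)

tokens = to_llm(image_embedding)                # image -> LLM token space
hidden = llm_core(tokens)                       # LLM reasons over the tokens
cond = to_image_decoder(hidden)                 # hidden state -> decoder condition
print(cond.shape)                               # torch.Size([1, 768])

The design choice this sketch highlights is that only the lightweight projection layers are trained, while the encoders, the LLM, and the diffusion decoders all stay frozen, which is what keeps NExT-GPT's training cost low.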

👍 Please like & subscribe if you enjoy this content
-----------------------------------------------------------------------------------------------------

Chapters:
0:00 Introduction & Motivation
1:03 NExT-GPT Framework
4:36 Conversation Example
5:32 Training NExT-GPT
8:40 Results
Comments:

KDlSW: just use voice generating AI for the audio plz

じある: please fix your audio... it's really bad

TheRealHassan: The drawer's hand in your video is VERY distracting.