Multimodal AI: LLMs that can see (and hear)

Показать описание

Multimodal (Large) Language Models expand an LLM's text-only capabilities to include other modalities. Here are three ways to do this.

Resources:

References:

--

Introduction - 0:00
Multimodal LLMs - 1:49
Path 1: LLM + Tools - 4:24
Path 2: LLM + Adapaters - 7:20
Path 3: Unified Models - 11:19
Example: LLaMA 3.2 for Vision Tasks (Ollama) - 13:24
What's next? - 19:58