Multimodal AI: LLMs that can see (and hear)


Multimodal (Large) Language Models expand an LLM's text-only capabilities to include other modalities. Here are three ways to do this.

--

Introduction - 0:00
Multimodal LLMs - 1:49
Path 1: LLM + Tools - 4:24
Path 2: LLM + Adapters - 7:20
Path 3: Unified Models - 11:19
Example: LLaMA 3.2 for Vision Tasks (Ollama) - 13:24
What's next? - 19:58
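
The video's worked example runs LLaMA 3.2 Vision locally through Ollama. A minimal sketch of that workflow using the `ollama` Python client is below; it assumes you have installed the package (`pip install ollama`), have the Ollama server running, and have pulled the model with `ollama pull llama3.2-vision`. The image path `photo.jpg` is a placeholder.

```python
# Sketch: asking a local multimodal model about an image via Ollama.
# Assumptions (not from the video itself): the `ollama` Python package is
# installed, the Ollama server is running, and `llama3.2-vision` is pulled.

def build_vision_message(prompt: str, image_path: str) -> list[dict]:
    """Build an Ollama chat message that attaches a local image file."""
    return [{"role": "user", "content": prompt, "images": [image_path]}]

if __name__ == "__main__":
    import ollama  # third-party client for the local Ollama server

    response = ollama.chat(
        model="llama3.2-vision",
        messages=build_vision_message("What is in this image?", "photo.jpg"),
    )
    print(response["message"]["content"])
```

Because the image is passed alongside plain text in the same message, the model can ground its text answer in the visual input, which is the "unified model" behavior the video contrasts with tool-calling pipelines.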
Comments

I'm excited to kick off this new series! Check out more resources and references in the description :)

ShawhinTalebi

Hi, Shaw Talebi.
Please make some videos on LangChain, LangGraph, and AI agents.
Your teaching style is the best and the simplest.

mohsinshah

Great video! Please do videos on LangChain and AI agents.

sam-uwgf

I am trying to make an avatar that can control my computer with Open Interpreter and the HeyGen live-stream API.

mysteryman

Use dark mode, man!
I'll skip this video.

Ilan-Aviv

I have versions of all of the above, both open-sourced and not.

jonnylukejs