Gemini 2.0 Flash

In this video, I look at Google's latest release, Gemini 2.0 Flash: how it handles various multimodal tasks and how it has improved over previous versions.

For more tutorials on using LLMs and building agents, check out my Patreon

🕵️ Interested in building LLM Agents? Fill out the form below

👨‍💻Github:

⏱️Time Stamps:
00:00 Intro
02:23 Multimodal Audio Output
04:18 Multimodal Inline Image Output
07:25 Multimodal Live API
12:12 Native Tool Use
12:54 Unified SDK
13:29 Google Gemini 2.0 Flash Blog
Comments
Author

Man that conversation with Gemini and in Thai was so so cool.

imadsaddik
Author

Sam speaks Thai! Quite the flex to slip in there!

mshonle
Author

Woo. The versatility of the voice, going from a whisper to different expressions, is next level. Similar to the NotebookLM podcast feature. Impressive stuff!

jacobgoldenart
Author

I've been building a VLM-controlled, Turtlebot2-based ROS robot (recently switched over to Gemini from Haiku 😢iykyk). Today's announcement was awesome. Native spatial reasoning is incredible and undersold! 3D bounding box creation is kinda wow. Not to mention the real-time speech, video and audio in.
The normies are not ready. I showed my septuagenarian parents my robot for the first time yesterday - at first they thought it was cute because it has STT and TTS, vision, silly animated face and arms... until they realized they had this weird alien intelligence wandering around their home and got creeped out 😆🤣and tbh i don't really blame them. What a time to be alive!
Thanks, Sam! Glad you've got early access - looking forward to seeing more!

thenoblerot
Author

The voice is damn good, I'll give it that. It sounds as good as or better than Advanced Voice. Also, we have seen native image output from OpenAI in the demo.

countofst.germain
Author

One thing I love is that even if AGI won’t exist in the near future, we are definitely in a new Industrial Revolution! I’m excited ❤

dandushi
Author

It's day 5 for OpenAI and they are live, but here I am watching your overview of Gemini 2 Flash. And this is way more interesting.

MojaveHigh
Author

Man your video was insane. Google is definitely going for OpenAI and Anthropic with 2.0

lydedreamoz
Author

Finally, a real alternative to Advanced Voice Mode.

firesoul
Author

Fascinating, multimodal, greatest experience. Thank you, Gemini

PrabhakarKrishnamurthyprof
Author

I've said it before and I'll say it again: I'm really happy that Google is back on track this year and focusing on two things: shipping regularly for developers, and working on foundation and LLM enhancement. Keeping these two aligned is really something, and now look, they are the best at providing this kind of real-time communication with an LLM in such a native way. Amazing.

unclecode
Author

Wow, this could be very interesting for doing some customer guidance RAG work. My day has now been reorganised!

paulmiller
Author

Finally! Been waiting for Google to release something we can actually build with! It's go time, Sam!

klammer
Author

I'd love a video on how to use Gemini to make a voice-based customer service agent. When it generates audio, can it make tool calls in the same response? Do you get a transcript of the audio and then use that for decision making, etc.? I'm familiar with how to make general agentic workflows but not how to integrate audio or phone systems.

tonyrungeetech
Author

Great review, it helped me understand more. Thank you 😊

yossawat
Author

First time I've been genuinely impressed with Gemini. Nice flex on the Thai by Sam and Gemini.

FredPauling
Author

Can't wait until I use my Nuclear Powered Data Center with my own LLM!

KkfightStarBaal
Author

Google, with this, is gonna destroy OpenAI's $200 subscriptions

amandamate
Author

One of the biggest shocks in this video is that you speak Thai fluently.

TreeLuvBurdpu
Author

Awesome! I just updated my AI knowledge with your video. I can't wait for the next video.

phongthe