Gemini 2.0 Flash

In this video, I look at Google's latest release, Gemini 2.0 Flash: how it handles various multimodal tasks and how it has improved over previous versions.

For more tutorials on using LLMs and building agents, check out my Patreon

🕵️ Interested in building LLM Agents? Fill out the form below

👨‍💻Github:

⏱️Time Stamps:
00:00 Intro
02:23 Multimodal Audio Output
04:18 Multimodal Inline Image Output
07:25 Multimodal Live API
12:12 Native Tool Use
12:54 Unified SDK
13:29 Google Gemini 2.0 Flash Blog

Comments

Man that conversation with Gemini and in Thai was so so cool.

imadsaddik

Sam speaks Thai! Quite the flex to slip in there!

mshonle

Woo. The versatility of the voice to go from a whisper to different expressions is next level. Similar to the NotebookLM podcast feature. Impressive stuff!

jacobgoldenart

I've been building a VLM-controlled, Turtlebot2-based ROS robot (recently switched over to Gemini from Haiku 😢 iykyk). Today's announcement was awesome. Native spatial reasoning is incredible and undersold! 3D bounding box creation is kinda wow. Not to mention the real-time speech, video and audio in.
The normies are not ready. I showed my septuagenarian parents my robot for the first time yesterday - at first they thought it was cute because it has STT and TTS, vision, silly animated face and arms... until they realized they had this weird alien intelligence wandering around their home and got creeped out 😆🤣and tbh i don't really blame them. What a time to be alive!
Thanks, Sam! Glad you've got early access - looking forward to seeing more!

thenoblerot

The voice is damn good, I'll give it that. It sounds as good as or better than Advanced Voice. Also, we have already seen native image output from OpenAI in the demo.

countofst.germain

One thing I love is that even if AGI won’t exist in the near future, we are definitely in a new Industrial Revolution! I’m excited ❤

dandushi

It's day 5 for OpenAI and they are live, but here I am watching your overview of Gemini 2 Flash. And this is way more interesting.

MojaveHigh

Man your video was insane. Google is definitely going for OpenAI and Anthropic with 2.0

lydedreamoz

Finally, a real alternative to Advanced Voice Mode.

firesoul

Fascinating, multimodal, greatest experience. Thank you Gemini

PrabhakarKrishnamurthyprof

I've said it before and I'll say it again: I'm really happy that Google is back on track this year and focusing on two things: shipping regularly for developers, and working on foundation and LLM enhancements. Keeping these two aligned is really something, and now look, they are the best at providing this kind of real-time communication with an LLM in such a native way. Amazing.

unclecode

Wow, this could be very interesting for doing some customer guidance RAG work. My day has now been reorganised!

paulmiller

Finally! Been waiting for Google to release something we can actually build with! It's go time, Sam!

klammer

I'd love a video on how to use Gemini to make a voice-based customer service agent. When it generates audio, can it make tool calls in the same response? Do you get a transcript of the audio and then use that for decision making, etc.? I'm familiar with how to make general agentic workflows but not how to integrate audio or phone systems.

tonyrungeetech