Use Gemini 2.0 to Build a Realtime Chat App with Multimodal Live API

In this video, I introduce how to build a real-time chat app with voice and video interaction using the Gemini 2.0 Multimodal Live API.

TIME STAMPS:
00:00 Overview of Gemini 2.0
02:04 Google AI Studio
04:52 Multimodal Live API
08:18 Code Walkthrough
17:09 Run the App

USEFUL LINKS:

MY CONNECT:
Comments

google-genai has been upgraded, and the session.send() definition was changed. Make sure you run the demo code with google-genai==0.3.0.

yeyulab
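
For reference, a minimal sketch of a text-turn Live session against the pinned version. The model name, the v1alpha api_version flag, and the send()/receive() signatures are taken from the public google-genai 0.3.0 examples of that period, not from the video's exact code:

```python
import asyncio
from google import genai

# Pin the SDK: pip install google-genai==0.3.0
# The Live API was served under the v1alpha API version at the time.
client = genai.Client(api_key="YOUR_API_KEY",
                      http_options={"api_version": "v1alpha"})

CONFIG = {"generation_config": {"response_modalities": ["TEXT"]}}

async def main():
    async with client.aio.live.connect(model="gemini-2.0-flash-exp",
                                       config=CONFIG) as session:
        # In 0.3.0, send() takes the payload plus an end_of_turn flag.
        await session.send("Hello, Gemini!", end_of_turn=True)
        async for response in session.receive():
            if response.text:
                print(response.text, end="")

asyncio.run(main())
```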

"Incredible video! This is super valuable for all developers. Would love to see an implementation of Twilio incoming calls using the Google Gemini 2.0 Flash model!"

bunnynikhil

This video was so helpful, thank you for the good work. Always worth buying you some coffee. I was wondering if it's possible to train this system on my plumbing business data, such that when customers live-stream a video of their plumbing issues, the API can analyse the issue, advise accordingly, and then tell the customer that we have the solution in stock/inventory and what it costs. Maybe the customer could even purchase the item at the same time. (My wild thought.)

thabisonaha

May I know what input the LLM takes from the websocket? I mean, is there any specific format for the audio, and what does it return to the websocket?

learningtech
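
For what it's worth: the Live API docs of that period specify raw 16-bit little-endian PCM at 16 kHz, mono, as audio input, with output returned as 16-bit PCM at 24 kHz; over the raw websocket the chunks travel base64-encoded inside a realtime_input message. A hedged sketch of the wrapper, with field names assumed from the BidiGenerateContent websocket docs:

```python
import base64
import json

def make_audio_message(pcm_chunk: bytes) -> str:
    """Wrap one chunk of raw 16 kHz, 16-bit mono PCM for the websocket."""
    return json.dumps({
        "realtime_input": {
            "media_chunks": [{
                # Some examples use "audio/pcm;rate=16000" for the mime type.
                "mime_type": "audio/pcm",
                "data": base64.b64encode(pcm_chunk).decode("ascii"),
            }]
        }
    })
```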

Great video! Are we able to choose between different voices?

Swollphin
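
The Live API at launch documented a handful of prebuilt voices (Puck, Charon, Kore, Fenrir, and Aoede) selectable through a speech_config block. A sketch of the config, with the field nesting assumed from the v1alpha reference rather than the video's code:

```python
# Assumed config shape: speech_config nests inside generation_config in
# the Live API setup; "Puck" is one of the documented voice names.
CONFIG = {
    "generation_config": {
        "response_modalities": ["AUDIO"],
        "speech_config": {
            "voice_config": {
                "prebuilt_voice_config": {"voice_name": "Puck"}
            }
        },
    }
}
```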

Isn't Gemini 2.0 Flash multimodal-capable from the beginning anyway? Or was your development for local purposes?

RealLexable

The video was very helpful; with its help I was able to build my own application. But I'm having trouble deploying it. Could you please make a video on how to deploy this application? For information, I'm having an issue deploying it on Render: pyaudio cannot be used. How do I solve this? Please help.

mayank
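
On the pyaudio point: pyaudio is only needed for local microphone/speaker access, so a deployed web app can drop it from requirements entirely and let the browser capture and play the audio while the server just relays bytes. A rough relay sketch, assuming the websockets package and google-genai 0.3.0; the names and message shapes are illustrative, not the repo's exact code:

```python
import asyncio
import base64
import json

import websockets
from google import genai

client = genai.Client(http_options={"api_version": "v1alpha"})
CONFIG = {"generation_config": {"response_modalities": ["AUDIO"]}}

async def relay(browser_ws, path=None):  # path kept for older websockets
    """Bridge one browser websocket to one Gemini Live session."""
    async with client.aio.live.connect(model="gemini-2.0-flash-exp",
                                       config=CONFIG) as session:
        async def browser_to_gemini():
            async for message in browser_ws:
                # Browser sends {"mime_type": "audio/pcm", "data": <base64>};
                # the SDK expects raw bytes in the blob.
                chunk = json.loads(message)
                await session.send({"mime_type": chunk["mime_type"],
                                    "data": base64.b64decode(chunk["data"])})

        async def gemini_to_browser():
            async for response in session.receive():
                if response.data:  # PCM bytes from the model
                    await browser_ws.send(response.data)

        await asyncio.gather(browser_to_gemini(), gemini_to_browser())

async def main():
    async with websockets.serve(relay, "0.0.0.0", 8080):
        await asyncio.Future()  # run until killed

asyncio.run(main())
```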

Hey, I have one doubt; I have already tried implementing this and am facing some issues. Here is the approach I followed for the live audio streaming I want: first I take input through the microphone, convert it to base64, and send it to the websocket; there I decode the audio and send it to the LLM, and I want to send the response audio back to the user via websockets. But I'm running into problems: I'm unable to get the response from the LLM. Could you please help?

learningtech
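
For the pipeline described above, a common cause of getting nothing back is sending audio at the wrong rate (input should be 16 kHz, 16-bit mono PCM) or forgetting to base64-decode on the server before handing the bytes to the model. A minimal capture-side sketch, assuming pyaudio and the websocket-client package, with the relay URL hypothetical:

```python
import base64
import json

import pyaudio
import websocket  # pip install websocket-client

RATE, CHUNK = 16000, 1024  # Live API input wants 16 kHz, 16-bit mono PCM
ws = websocket.create_connection("ws://localhost:8080")  # hypothetical relay

audio = pyaudio.PyAudio()
stream = audio.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                    input=True, frames_per_buffer=CHUNK)
try:
    while True:
        pcm = stream.read(CHUNK, exception_on_overflow=False)
        ws.send(json.dumps({"mime_type": "audio/pcm",
                            "data": base64.b64encode(pcm).decode("ascii")}))
finally:
    stream.close()
    audio.terminate()
```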

Can this API be used to deploy and launch a web application to be used by others?

adarsh

Did Google update the API? Mine seems to be failing with a "timed out during handshake" error since yesterday.

aryanchauhan

Hello my friend, I'm a person with low vision/sight. I have some questions:

1- Is this running Gemini locally on your device, depending on my GPU and CPU?

If yes, what is the minimum to get this working locally?

2- Can I use it to share my screen with it?

I'm asking about this because I feel like it would be helpful for my case when using my PC. I know I can do this with Google AI Studio, but I want it local to be as fast as possible.

ahmedal-ani

Could you check the git repo? It's not working properly. The config with text works okay-ish, but the audio config isn't working.

niv

Can I get the response in both audio and text?

I tried:

CONFIG = {"generation_config": {"response_modalities": ["AUDIO", "TEXT"]}} but it gives an error.

AbdulRahman-vjel
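
Beyond the missing quote in the snippet, the Live API at that time accepted only one response modality per session, which is most likely why ["AUDIO", "TEXT"] is rejected. A sketch of the working shape:

```python
# One modality per session (a documented Live API limit at the time):
CONFIG = {"generation_config": {"response_modalities": ["AUDIO"]}}
# or, for text only:
# CONFIG = {"generation_config": {"response_modalities": ["TEXT"]}}
```

A common workaround for showing text alongside speech is to run the session with AUDIO and transcribe the returned audio locally, or to open a second TEXT session.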

This tutorial is misleading. Ran `pip install google-genai` in the console successfully, but the project wasn't populated with all the files (`1.txt`, `index.html`, etc.), and `Demo Source Code` is just a list of YouTube video links.

chrisBBGun

It is only showing the response in text. How do I make it talk back?

simphyy
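
Most likely the config requests TEXT; switching the response modality to AUDIO and playing the returned 24 kHz PCM is the usual route. A playback sketch assuming google-genai 0.3.0 and pyaudio, with response.data carrying the PCM bytes assumed from the SDK examples of that period:

```python
import asyncio

import pyaudio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY",
                      http_options={"api_version": "v1alpha"})
CONFIG = {"generation_config": {"response_modalities": ["AUDIO"]}}

async def main():
    pa = pyaudio.PyAudio()
    # The Live API returns 16-bit PCM at 24 kHz.
    speaker = pa.open(format=pyaudio.paInt16, channels=1, rate=24000,
                      output=True)
    async with client.aio.live.connect(model="gemini-2.0-flash-exp",
                                       config=CONFIG) as session:
        await session.send("Say hello out loud.", end_of_turn=True)
        async for response in session.receive():
            if response.data:
                speaker.write(response.data)
    speaker.close()
    pa.terminate()

asyncio.run(main())
```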

Getting this error:
Error in Gemini session: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)
Gemini session closed.

lohithnh
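
That error usually means Python cannot find a usable CA bundle (common on macOS installs). Two frequently suggested fixes: run the "Install Certificates.command" script that ships in the Python application folder, or point the websocket client at certifi's bundle. A sketch of the latter, assuming the connection is opened with the websockets package:

```python
import ssl

import certifi  # pip install certifi

# Build an SSL context backed by certifi's CA bundle and pass it to the
# websocket client, e.g. websockets.connect(uri, ssl=ssl_context).
ssl_context = ssl.create_default_context(cafile=certifi.where())
```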